Concatenative languages [1] have the property that every token sequence is a valid program.
For languages using single bits as tokens, every bit sequence is a valid program. One such language is Chris Barker's zot [2].
Inspired by zot, I defined a concatenative version of Binary Lambda Calculus that shares the same property [3].
[1] https://en.wikipedia.org/wiki/Concatenative_programming_lang...
[2] https://en.wikipedia.org/wiki/Iota_and_Jot#Zot
[3] https://cstheory.stackexchange.com/questions/32309/concatena...
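To make the property concrete, here is a minimal Python sketch of Jot, zot's I/O-free relative (both are described on the Wikipedia page in [2]). It assumes the standard Jot rules: the empty program denotes I, appending a 0 to a program F gives [F] S K, and appending a 1 gives λx.λy.[F](x y). Combinators are modeled as curried Python closures, and the names `jot` and `append_one` are illustrative. Because Python evaluates eagerly, some bit strings may never finish evaluating, but no bit string is ever rejected as malformed.

```python
def I(x):
    return x  # I x = x

def K(x):
    return lambda y: x  # K x y = x

def S(x):
    return lambda y: lambda z: x(z)(y(z))  # S x y z = x z (y z)

def append_one(f):
    # [F1] = λx.λy. [F](x y); f is bound here, so each step keeps its own F
    return lambda x: lambda y: f(x(y))

def jot(bits):
    """Return the combinator denoted by a string over {'0', '1'}."""
    f = I  # the empty program denotes I
    for b in bits:
        if b == "0":
            f = f(S)(K)  # [F0] = [F] S K
        elif b == "1":
            f = append_one(f)
        else:
            raise ValueError(f"not a bit: {b!r}")  # only non-bits are refused
    return f
```

Working the rules by hand shows that 11100 denotes K and 11111000 denotes S, and the sketch reproduces this: `jot("11100")("a")("b")` returns `"a"`.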
> Concatenative languages [1] have the property that every token sequence is a valid program.
I don't think this is correct? Concatenative languages have the property that if a and b are both valid programs, then the program a || b is valid (where || means "concatenate"). But that property doesn't imply that every sequence of tokens is valid.
For example, in Cat,
[1 2
is not grammatically valid.
You're right. I should have qualified it as "some concatenative languages".
Is there a particular term for concatenative languages with this property?
So does Prolog count as a concatenative language given the standard definition of concatenative languages?
For context, a Prolog program is a set of definite clauses so the union of two Prolog programs is a Prolog program.
In fact, this property forms the basis of the concept of (syntactic) generality in Prolog: if P1, P2, P3 are three Prolog programs such that P1 is the union of P2 and P3, then we say that P1 generalises each of P2 and P3. Does this concept of generality also exist for concatenative languages?
They used an OCR program though, which probably has a bias toward attempting to close parentheses.
> making Jot a natural Gödel numbering of all algorithms.
This sounds very cool. I wish I understood both Jot and that sentence.
A Gödel numbering is simply a mapping to integers (that is easily decoded). If your programs are arbitrary binary strings, then you're basically already done, since bitstrings are in 1-1 correspondence with integers:
empty  0  1  00  01  10  11  000  001  ...
    0  1  2   3   4   5   6    7    8  ...
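The ordering in that table (shorter strings first, then lexicographic) has a closed form: put a 1 in front of the bit string, read the result as binary, and subtract one. A minimal Python sketch, with illustrative function names:

```python
def bits_to_int(bits: str) -> int:
    # The extra leading 1 keeps leading zeros significant ("0" vs "00")
    return int("1" + bits, 2) - 1

def int_to_bits(n: int) -> str:
    # Inverse for n >= 0: write n + 1 in binary, drop "0b" and the leading 1
    return bin(n + 1)[3:]

table = ["", "0", "1", "00", "01", "10", "11", "000", "001"]
print([bits_to_int(b) for b in table])  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Since each function undoes the other, every integer names exactly one bit string and vice versa.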
But doesn't Gödel numbering imply some sort of uniqueness, i.e. that each algorithm is only present once?
Otherwise, wouldn't any language assigning a valid meaning to any sequence of 0 and 1 be a Gödel numbering, even if only by saying that "an unrecognised sequence is a noop"?
> A Gödel numbering is simply a mapping to integers (that is easily decoded).
"easily" is arguable. Sure, I could multiply/factor an enormous number and count how many copies of each prime it has, but I'd much rather concatenate/split some digits.
I enjoyed footnote 5:
This feature does enable a neat quine: the Perl program "Illegal division by zero at /tmp/quine.pl line 1.", when saved in the appropriate location, outputs "Illegal division by zero at /tmp/quine.pl line 1." The reason for this behavior is left as an exercise for the reader.
I wrote a blog post to explain it at https://dotat.at/@/2019-04-04-a-curious-perl-quine.html
And also a superficially-related but actually rather different Python quine:
File "quine.py", line 1
File "quine.py", line 1
^
IndentationError: unexpected indent
Can you help out a reader who does not know any Perl?
I tried it in the REPL and found that "Illegal division" can't locate method "illegal" in package "division", so presumably that gets ignored, same with method "by" in package "zero", and that "at /tmp" is the simplest version of the string that produces the error message, which apparently is more severe than the missing package warnings and terminates the program?
I'd guess the / is the operator for division, and the "tmp" is getting initialized as a variable and coerced into an integer? But "/tmp" doesn't do it, and "/tmp/" does something with regex, so I'm not sure why the parser would split it there.
The footnote is originally referenced here:
Figure 6 represents the string "gggijgziifiiffif", which by pure coincidence happens to accurately represent the authors' verbal reaction upon learning that "unquoted strings" were a feature intentionally included in the Perl language.
So, the hint is that this has to do with the "unquoted strings" feature (aka "bare words"[1]). See the sibling comment about the actual parse: "at" and "tmp" are seen as strings.
The strings get coerced into numbers due to being used with the numeric "/" operator (that's normal Perl behavior). Since the strings can't be parsed as numbers, they become "0". So, you get division by 0.
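A rough Python model of that coercion, as a sketch: it assumes only plain decimal strings matter (Perl's actual rules also recognize forms such as "Inf" and "NaN", and emit warnings under `use warnings`), and the helper names are hypothetical.

```python
import re

# Approximation of Perl's string-to-number rule: skip leading whitespace,
# take the longest leading decimal number, and use 0 if there is none.
_NUMERIC_PREFIX = re.compile(r"\s*([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)")

def perl_numify(s: str) -> float:
    m = _NUMERIC_PREFIX.match(s)
    return float(m.group(1)) if m else 0.0  # "at", "tmp", ... become 0

def perl_divide(a: str, b: str) -> float:
    """Mimic Perl's numeric '/' applied to two strings."""
    divisor = perl_numify(b)
    if divisor == 0:
        raise ZeroDivisionError("Illegal division by zero")
    return perl_numify(a) / divisor

print(perl_numify("12abc"))  # 12.0
try:
    perl_divide("at", "tmp")
except ZeroDivisionError as err:
    print(err)  # Illegal division by zero
```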
> The strings get coerced into numbers due to being used with the numeric "/" operator (that's normal Perl behavior). Since the strings can't be parsed as numbers, they become "0".
Huh, that's an odd failure of Perl's normal ethic of doing things that appear to make sense in context. A number should be 0 by default in additive contexts and 1 by default in multiplicative contexts.
Non-Perl user here: why do people love Perl but hate JS? I imagine Perl had a lot more thought put into it (than JS's 10 days), but this kind of implicit conversion sounds like exactly the kind of thing that constantly bites me in the ass in JS land.
(Actually, implicit unquoted strings sound so nightmarish it's comical, but let's do one question at a time...)
I used to think the solution was static typing, but then I found none of the same infuriating bugs in Python, which has dynamic but strong typing, forcing you to be explicit about type conversions.
Edit: I think I've hit the nesting limit, so please reply to parent comment and I will find it.
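To illustrate the dynamic-but-strong distinction being drawn here, a small Python sketch (the helper name is made up for the example): mixing a str and an int raises TypeError instead of being silently converted, so the conversion has to be written out.

```python
def try_add(a, b):
    """Return a + b, or None when Python refuses to combine the operand types."""
    try:
        return a + b
    except TypeError:
        return None

print(try_add("1", 1))  # None: str + int raises TypeError
print(int("1") + 1)     # 2: explicit conversion to int
print("1" + str(1))     # 11: explicit conversion to str
```

For comparison, JavaScript evaluates "1" + 1 to "11", while Perl's numeric + gives 2.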
PHP had this too in earlier versions. So if you missed an import that defined constants, the constants would just evaluate as their names. People liked to use this particularly for associative array indexes, like $array[userid] instead of $array['userid']. Naturally a terrible idea, because the inverse is also true with regard to constants: defining a constant named "userid" would then change the meaning of the program.
Easiest way is to Deparse it:
$ perl -MO=Deparse tp.pl
'division'->Illegal('zero'->by('at' / 'tmp' / 'quine' . 'line'->pl(1)));
tp.pl syntax OK
So I believe this is what causes it (note that "at" and "tmp" and such are "barewords"):
$ perl -e 'at / tmp'
Illegal division by zero at -e line 1.
It might be that when you run Perl with no qualifier you get a recent version of Perl that turns on strict warnings for you, and (rightly) warns about undefined words instead of quietly trying to evaluate them anyway.
But yes, it converts the strings to integers and divides them, as shown in the sibling comment.
By default, even on a recent perl it'll act like you're on a really old perl and run just fine.
With warnings, it runs, but tells you about all of the mistakes you made. With strict, it doesn't run.
You can do this with Python as well for indentation errors
Jokes aside, isn't it wrong that OCR software still always produces textual output from images which are not text? More than a decade ago I OCRed an old book, and I remember how annoying it was to deal with all the garbage text produced from small pictures, smudges, and dirt. It looks like not much progress has been made in the field since then.
That question seems to be of the same kind as the one in the OP. Isn't something wrong when a random scribble creates a valid execution in Perl?
> It looks like not much progress has been made in the field since then.
LLMs help here. From my own experiments, ChatGPT is a pretty good "smart, context-aware" OCR agent.
Using image embeddings and evaluating a hundreds-of-billions-parameter LLM for OCR is like hunting rabbits with the Yamato's 18-inch naval gun.
Well, using a human is bringing an interstellar railgun to hunt rabbits, so I guess it's still better?
Not really. Proper OCR in the broadest sense (extracting text from arbitrary PDFs that intermingle tables, images, etc., or from handwritten artistic posters) requires a full understanding of semantic intent.
You are perhaps imagining more constrained scenarios of straight lines of consistent text on a page with well-known artifacts of "noise" (smudges, print imperfections, and so on).
Yes, there has been progress. But the featured article is meant to be fun!
Related:
93% of Paint Splatters Are Valid Perl Programs (2019) - https://news.ycombinator.com/item?id=27929730 - July 2021 (163 comments)
Also:
93% of Paint Splatters Are Valid Perl Programs (2019) - https://news.ycombinator.com/item?id=38754686 - Dec 2023 (1 comment)
I understand that this discusses recognizing paint splatters as characters with a given "optical character recognition" program, which seems disposed to almost always recognize pain as some combination of characters. Of the many possible ways this could be realized, this is absolutely welcome and in the spirit.
However, it initially made me think of other possible ways to do this, such as taking patches of color and empty space as 0s and 1s, and the totality of it as a program. I think the vast majority of those cases would be pointless noise.
So there are two extremes, one with mostly noise, one with mostly meaning. I suppose the game-within-the-game here is to find the form of interpretation that credits paint splatters with the most possible meaning while, to the greatest extent possible, the meaning truly comes from the structure and not from how aggressive the rules are at choosing to see meaning.
> disposed to almost always recognize pain as some combination of characters.
Well, break out the EEG; let's see if pain is also a valid Perl program.
The opposite is definitely true
Just make sure you don't accidentally autovivify another instance of yourself.
"What's in the box?"
"Perl."
With Generative AI, you can create new and innovative paint splatters that evaluate to working software, faster than ever before. Generative AI enables a new class of creators to harness and leverage text to image workflows, driving value for businesses of all sizes. New AI models are capable of embedding working software and machine readable codes into a wide variety of high resolution content, engaging viewers and providing creators new and exciting ways to grow their audiences.
More cutting-edge computational research here: https://sigbovik.org/
Clever variation on the old "indistinguishable from line noise" jokes.
(For those who weren't frequently exposed to "line noise"... Imagine an ASCII character video terminal that's interpreting a stream of bytes to display meaningful text. Now imagine that the communication channel gets corrupted somehow (say, someone picks up the phone handset while the modem is online, or there's interference on the cable), and there's no error correction or checksumming, so the bytes being interpreted effectively become randomized. Random letters, digits, punctuation, control characters, etc., get interpreted and displayed; this is familiar, and you know it's random and why... but the joke is that it's still actually a valid Perl program.)
Great, now you've made me realize that line noise is in the same category of things I'll never be able to explain to Kids These Days (like broadcast schedules), and now I might as well get it over with and tie an onion to my belt.