93% of paint splatters are valid Perl programs (2019)

1 week ago/126 comments/mcmillen.dev

Concatenative languages [1] have the property that every token sequence is a valid program.

For languages using single bits as tokens, every bit sequence is a valid program. One such language is Chris Barker's zot [2].

Inspired by zot, I defined a concatenative version of Binary Lambda Calculus that shares the same property [3].

[1] https://en.wikipedia.org/wiki/Concatenative_programming_lang...

[2] https://en.wikipedia.org/wiki/Iota_and_Jot#Zot

[3] https://cstheory.stackexchange.com/questions/32309/concatena...

7 days ago by deathanatos

> Concatenative languages [1] have the property that every token sequence is a valid program.

I don't think this is correct? Concatenative languages have the property that if a and b are both valid programs, then the program a || b is valid (where || means "concatenate"). But that property doesn't imply that every sequence of tokens is valid.

For example, in Cat,

  [1 2

is not grammatically valid.
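
To make the distinction concrete, here is a minimal Python sketch of a toy language whose only syntactic rule is that `[` and `]` must balance. This is an assumption for illustration, not Cat's actual grammar, and `is_valid` is a hypothetical helper.

```python
def is_valid(tokens):
    """Toy validity check: brackets must balance (hypothetical rule, not Cat's real grammar)."""
    depth = 0
    for tok in tokens:
        if tok == "[":
            depth += 1
        elif tok == "]":
            depth -= 1
            if depth < 0:  # a close bracket with no matching open
                return False
    return depth == 0


a = "[ 1 2 ] dup".split()
b = "3 [ swap ] apply".split()

print(is_valid(a), is_valid(b))   # each program checked on its own
print(is_valid(a + b))            # concatenation of two valid programs
print(is_valid("[ 1 2".split()))  # an arbitrary token sequence
```

In this toy model, concatenating two balanced programs always gives a balanced program, yet the unbalanced sequence `[ 1 2` is still rejected.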

7 days ago by tromp

You're right. I should have qualified it as "some concatenative languages".

6 days ago by joshmarlow

Is there a particular term for concatenative languages with this property?

7 days ago by YeGoblynQueenne

So does Prolog count as a concatenative language given the standard definition of concatenative languages?

For context, a Prolog program is a set of definite clauses so the union of two Prolog programs is a Prolog program.

In fact, this property forms the basis of the concept of (syntactic) generality in Prolog: if P1, P2, P3 are three Prolog programs such that P1 is the union of P2 and P3, then we say that P1 generalises each of P2 and P3. Does this concept of generality also exist for concatenative languages?

7 days ago by dheera

They used an OCR program though, which probably has a bias toward attempting to close parentheses.

7 days ago by kqr

> making Jot a natural Gödel numbering of all algorithms.

This sounds very cool. I wish I understood both Jot and that sentence.

7 days ago by tromp

A Gödel numbering is simply a mapping to integers (that is easily decoded). If your programs are arbitrary binary strings, then you're basically already done, since bitstrings are in 1-1 correspondence with integers:

    empty  0  1  00  01  10  11  000  001  ...
        0  1  2   3   4   5   6    7    8  ...
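
The correspondence in the table above can be sketched in a few lines of Python. The function names `bits_to_int` and `int_to_bits` are made up for this illustration; the trick is to prefix a `1` so that leading zeros are not lost, then shift by one so the empty string maps to 0.

```python
def bits_to_int(bits: str) -> int:
    # Prefix a 1 so leading zeros survive, read as binary, then subtract 1
    # so that the empty string maps to 0.
    return int("1" + bits, 2) - 1


def int_to_bits(n: int) -> str:
    # Inverse mapping: bin(n + 1) looks like "0b1...", so drop the "0b1" prefix.
    return bin(n + 1)[3:]


# Reproduce the first few columns of the table.
for n in range(9):
    print(n, repr(int_to_bits(n)))
```

Every bitstring gets exactly one integer and vice versa, which is why a language that accepts all bitstrings needs no extra encoding step.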

7 days ago by kqr

But doesn't Gödel numbering imply some sort of uniqueness, i.e. that each algorithm is only present once?

Otherwise, wouldn't any language assigning a valid meaning to any sequence of 0 and 1 be a Gödel numbering, even if only by saying that "an unrecognised sequence is a noop"?

7 days ago by Dylan16807

> A Gödel numbering is simply a mapping to integers (that is easily decoded).

"easily" is arguable. Sure, I could multiply/factor an enormous number and count how many copies of each prime it has, but I'd much rather concatenate/split some digits.

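For comparison, the multiply/factor scheme mentioned above can be sketched as follows, assuming the classic encoding in which the i-th symbol code becomes the exponent of the i-th prime. The names `primes`, `godel_encode`, and `godel_decode` are hypothetical, and trial division is used only because it keeps the demo short.

```python
def primes():
    """Yield 2, 3, 5, ... by trial division (fine for a small demo)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def godel_encode(symbols):
    """Encode positive integers a1, a2, ... as 2**a1 * 3**a2 * 5**a3 * ..."""
    number = 1
    for p, a in zip(primes(), symbols):
        if a < 1:
            raise ValueError("symbol codes must be >= 1")
        number *= p ** a
    return number


def godel_decode(number):
    """Recover the exponents by dividing out each prime in turn."""
    if number < 1:
        raise ValueError("not a valid encoding")
    symbols = []
    for p in primes():
        if number == 1:
            break
        a = 0
        while number % p == 0:
            number //= p
            a += 1
        if a == 0:  # a skipped prime means this is not an encoding of this form
            raise ValueError("not a valid encoding")
        symbols.append(a)
    return symbols


code = godel_encode([3, 1, 2])  # 2**3 * 3**1 * 5**2
print(code, godel_decode(code))
```

Decoding requires factoring, which is the cost being contrasted with simply splitting a string of digits.
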
7 days ago by interroboink

I enjoyed footnote 5:

⁵ This feature does enable a neat quine: the Perl program “Illegal division by zero at /tmp/quine.pl line 1.”, when saved in the appropriate location, outputs “Illegal division by zero at /tmp/quine.pl line 1.” The reason for this behavior is left as an exercise for the reader.

7 days ago by fanf2

I wrote a blog post to explain it at https://dotat.at/@/2019-04-04-a-curious-perl-quine.html

And also a superficially-related but actually rather different Python quine:

  File "quine.py", line 1
   File "quine.py", line 1
   ^
 IndentationError: unexpected indent

7 days ago by LeifCarrotson

Can you help out a reader who does not know any Perl?

I tried it in the REPL and found that "Illegal division" can't locate method "illegal" in package "division", so presumably that gets ignored, same with method "by" in package "zero", and that "at /tmp" is the simplest version of the string that produces the error message, which apparently is more severe than the missing package warnings and terminates the program?

I'd guess the / is the operator for division, and the "tmp" is getting initialized as a variable and coerced into an integer? But "/tmp" doesn't do it, and "/tmp/" does something with regex, so I'm not sure why the parser would split it there.

7 days ago by interroboink

The footnote is originally referenced here:

    Figure 6 represents the string “gggijgziifiiffif”, which by pure
    coincidence happens to accurately represent the authors’ verbal reaction
    upon learning that “unquoted strings” were a feature intentionally
    included in the Perl language.⁵

So, the hint is that this has to do with the "unquoted strings" feature (aka "bare words"[1]).

See the sibling comment about the actual parse — "at" and "tmp" are seen as strings.

The strings get coerced into numbers due to being used with the numeric "/" operator (that's normal Perl behavior). Since the strings can't be parsed as numbers, they become "0". So, you get division by 0.

[1] https://perlmaven.com/barewords-in-perl
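
As a rough illustration of the coercion described above, here is a Python sketch that models it under simplifying assumptions: a string is converted using its leading numeric prefix, or 0 if there is none. The helpers `numify` and `perl_divide` are hypothetical, and real Perl has more cases than this handles.

```python
import re

# Rough model (assumption): optional whitespace, then a decimal number prefix.
_LEADING_NUMBER = re.compile(r"\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)")


def numify(s: str) -> float:
    """Return the leading number in s, or 0.0 if s does not start with one."""
    m = _LEADING_NUMBER.match(s)
    return float(m.group(1)) if m else 0.0


def perl_divide(a: str, b: str) -> float:
    """Divide two strings after coercing both to numbers."""
    divisor = numify(b)
    if divisor == 0:
        raise ZeroDivisionError("Illegal division by zero")
    return numify(a) / divisor


print(numify("at"), numify("tmp"))  # neither bareword starts with a number

try:
    perl_divide("at", "tmp")
except ZeroDivisionError as err:
    print(err)
```

Under this model, both "at" and "tmp" become 0, so the division fails in the same way the quine's error message describes.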

7 days ago by thaumasiotes

> The strings get coerced into numbers due to being used with the numeric "/" operator (that's normal Perl behavior). Since the strings can't be parsed as numbers, they become "0".

Huh, that's an odd failure of Perl's normal ethic of doing things that appear to make sense in context. A number should be 0 by default in additive contexts and 1 by default in multiplicative contexts.

7 days ago by andai

Non Perl user here: why do people love Perl but hate JS? I imagine Perl had a lot more thought put into it (than JS's 10 days), but this kind of implicit conversion sounds like exactly the kind of thing that bites me in the ass constantly in JS land.

(Actually, implicit unquoted strings sound so nightmarish it's comical, but let's do one question at a time...)

I used to think the solution was static typing, but then I found none of the same infuriating bugs in Python, which has dynamic but strong typing, forcing you to be explicit about type conversions.

Edit: I think I've hit the nesting limit, so please reply to parent comment and I will find it.

7 days ago by asddubs

PHP had this too in earlier versions, so if you missed an import that defined constants, the constants would just evaluate to their names. People liked to use this particularly for associative array indexes, like $array[userid] instead of $array['userid']. Naturally a terrible idea, because the inverse is also true with regard to constants: defining a constant named "userid" would then change the meaning of the program.

7 days ago by hoytech

Easiest way is to Deparse it:

    $ perl -MO=Deparse tp.pl 
    'division'->Illegal('zero'->by('at' / 'tmp' / 'quine' . 'line'->pl(1)));
    tp.pl syntax OK

So I believe this is what causes it (note that "at" and "tmp" and such are "barewords"):

    $ perl -e 'at / tmp'
    Illegal division by zero at -e line 1.

7 days ago by kqr

It might be that when you run Perl with no qualifier, you get a recent version of Perl which turns on strict warnings for you, and (rightly) warns about undefined words instead of quietly trying to evaluate them anyway.

But yes, it converts the strings to integers and divides them, as shown in the sibling comment.

7 days ago by TheDauthi

By default, even on a recent perl it'll act like you're on a really old perl and run just fine.

With warnings, it runs, but tells you about all of the mistakes you made. With strict, it doesn't run.

7 days ago by teaearlgraycold

You can do this with Python as well for indentation errors

7 days ago by undefined

[deleted]

7 days ago by broken-kebab

Jokes aside, isn't it wrong that OCR software still always produces a textual result from images which are not text? More than a decade ago I OCRed an old book, and I remember how annoying it was to deal with all the garbage text produced from small pictures, smudges, and dirt. It looks like there hasn't been much progress in the field since.

7 days ago by kzrdude

That question seems to be the same kind as the question in the OP. Isn't something wrong when a random scribble creates a valid execution in Perl?

7 days ago by jonahx

> It looks like there's not much progress done since in the field

LLMs help here. In my own experiments, ChatGPT is a pretty good "smart, context-aware" OCR agent.

7 days ago by Kubuxu

Using image embeddings and evaluating an LLM with hundreds of billions of parameters for OCR is like hunting rabbits with Yamato’s 18in naval gun.

7 days ago by manquer

Well, using a human is bringing an interstellar rail gun to hunt rabbits, so I guess it's still better?

6 days ago by jonahx

Not really. Proper OCR in the broadest sense (extracting text from arbitrary pdfs that intermingle tables, images, etc, or from hand written artistic posters) requires a full understanding of semantic intent.

You are perhaps imagining more constrained scenarios of straight lines of consistent text on a page with well-known artifacts of "noise" (smudges, print imperfections, and so on).

7 days ago by petters

Yes, there has been progress. But the featured article is meant to be fun!

7 days ago by dang

93% of Paint Splatters Are Valid Perl Programs (2019) - https://news.ycombinator.com/item?id=27929730 - July 2021 (163 comments)

Also:

93% of Paint Splatters Are Valid Perl Programs (2019) - https://news.ycombinator.com/item?id=38754686 - Dec 2023 (1 comment)

7 days ago by glenstein

I understand that this discusses recognizing paint splatters as characters with a given "optical character recognition" program, which seems disposed to almost always recognize pain as some combination of characters. Of the many possible ways this could be realized, this is absolutely welcome and in the spirit.

However, it did give me the initial impression about other possible ways to do this, such as taking patches of color and empty space as 0s and 1s, and the totality of it as a program. I think the vast majority of those cases would be pointless noise.

So there's two extremes, one with mostly noise, one with mostly meaning. I suppose the game-within-the-game here is to find what form of interpretation does the most to credit paint splatters with the most possible meaning where, to the greatest extent possible, the meaning truly comes from the structure and not from how aggressive the rules are at choosing to see meaning.

7 days ago by bee_rider

> disposed to almost always recognize pain as some combination of characters.

Well, break out the eeg, let’s see if pain is also a valid perl program.

7 days ago by EvgeniyZh

The opposite is definitely true

7 days ago by hnzix

Just make sure you don't accidentally autovivify another instance of yourself.

7 days ago by akurtzhs

“What’s in the box?”

“Perl.”

7 days ago by iamleppert

With Generative AI, you can create new and innovative paint splatters that evaluate to working software, faster than ever before. Generative AI enables a new class of creators to harness and leverage text to image workflows, driving value for businesses of all sizes. New AI models are capable of embedding working software and machine readable codes into a wide variety of high resolution content, engaging viewers and providing creators new and exciting ways to grow their audiences.

7 days ago by dmbche

More cutting edge computational research here : https://sigbovik.org/

7 days ago by neilv

Clever variation on the old "indistinguishable from line noise" jokes.

(For those who weren't frequently exposed to "line noise"... Imagine an ASCII character video terminal that's interpreting a stream of bytes to display meaningful text. Now imagine that the communication channel gets corrupted somehow (say, someone picks up the phone handset while the modem is online, or there's interference on the cable), and there's no error correction or checksumming, so the bytes being interpreted effectively become randomized. So random letters, digits, punctuation, control characters, etc., are being interpreted and displayed, and this is familiar, and you know it's random and why... but the joke is that it's still actually a valid Perl program.)

7 days ago by saalweachter

Great, now you've made me realize that line noise is in the same category of things I'll never be able to explain to Kids These Days (like broadcast schedules), and now I might as well get it over with and tie an onion to my belt.
