Writing Parsers Like it is 2017 [pdf]

JoshTriplett | 205 points

This should be titled more honestly: perhaps something like "Case studies in rapidly replacing dangerous C parser code with Rust using nom".

The examples are interesting and well presented. But the first sections trying to put a veneer of respectability on rust-all-the-things were a bit rough and got my cynic sense tingling.

Yes, you believe Rust will produce better results: don't try to justify that with facts you don't have ("Several languages were tested .." is bullshit unless you show some data; likewise the assertions about type-safety and no-GC being essential properties). The data you do have (implementations produced, integrated and tested in a paper-like time frame) are valuable; unfortunately they're cheapened and buried under this false veneer.

saywatnow | 7 years ago

Is it possible to have "parser generators" (not necessarily in the formal sense of the term) that produce recursive descent parsers? Even if mathematically they can't be perfect, could we have "good enough" ones? Nobody uses parser generators because even though P.G.s "work" on the barest level of ingesting source code and spitting out an AST, they don't do anything beyond that. For example, trying to get helpful error messages (think: from old GCC to Clang) from a P.G.'s output is close to impossible. So instead everyone writes custom parsers, and that introduces these problems.

Can't we have smarter parser generators that do make debugging nice, but are still formally verified?

Analemma_ | 7 years ago

As long as you're writing parsers, it is forever 1969. Knuth invented all you need to know just recently. Off you go and parse!

kazinator | 7 years ago

Nice overview of the pitfalls of C parsers, their hardening, a presentation of Rust's advantages, parser combinators, the nom crate, its usage, the application to VLC and an intrusion detector, the integration with those complex existing C codebases, fuzzing, and a few ideas to improve Rust for more security.

And I am happy to see the ANSSI here :)

BigIQ | 7 years ago

No mention of instaparse? https://github.com/Engelberg/instaparse

"Parsing should be something as easy as regexp"

souenzzo | 7 years ago

In my experience, avoiding unlimited recursion and stack overflows is one of the most annoying parts of writing a parser, so I find it surprising that the paper doesn't even mention that topic.
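
To make the recursion concern concrete, here is a minimal sketch of one common mitigation: a hand-written recursive descent parser that carries an explicit depth counter and rejects input that nests too deeply, instead of letting the call stack overflow. The toy bracket grammar, the names `parse_nested` and `DepthLimitExceeded`, and the default limit of 64 are all assumptions made for illustration; none of them come from the paper or from nom.

```python
class DepthLimitExceeded(ValueError):
    """Raised when the input nests deeper than the configured limit."""


def parse_nested(text, max_depth=64):
    """Parse a toy grammar such as "[a,[b,c]]" into nested Python lists.

    max_depth is a hypothetical knob; 64 is an arbitrary default.
    """
    pos = 0

    def parse_value(depth):
        nonlocal pos
        if pos < len(text) and text[pos] == "[":
            # Refuse to open another list once the limit is reached,
            # so hostile input cannot drive the recursion arbitrarily deep.
            if depth >= max_depth:
                raise DepthLimitExceeded(
                    f"nesting deeper than {max_depth} at offset {pos}"
                )
            pos += 1  # consume "["
            items = []
            if pos < len(text) and text[pos] == "]":
                pos += 1  # empty list
                return items
            while True:
                items.append(parse_value(depth + 1))
                if pos < len(text) and text[pos] == ",":
                    pos += 1
                    continue
                if pos < len(text) and text[pos] == "]":
                    pos += 1
                    return items
                raise ValueError(f"expected ',' or ']' at offset {pos}")
        # Atom: a run of characters up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in "[],":
            pos += 1
        if start == pos:
            raise ValueError(f"expected a value at offset {pos}")
        return text[start:pos]

    result = parse_value(0)
    if pos != len(text):
        raise ValueError(f"trailing input at offset {pos}")
    return result
```

With this sketch, `parse_nested("[a,[b,c]]")` yields `["a", ["b", "c"]]`, while `parse_nested("[" * 1000 + "]" * 1000)` fails with `DepthLimitExceeded` after 64 levels rather than exhausting the stack. The other usual option is to replace the recursion with an explicit stack, which trades readability for a bound on memory that the parser controls directly.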

CodesInChaos | 7 years ago

Are there any Context-sensitive algorithms/parsers/generators?

The list at https://en.wikipedia.org/wiki/Context-sensitive_grammar only contains two links: one of them, "LuZc", seems completely dead, with "lorem ipsum" under Downloads, and the other, "bnf2xml", seems to be misplaced since BNF is not context-sensitive.

JoelJacobson | 7 years ago

Parser combinators are less powerful than what parser generators can do, in terms of expressiveness and efficiency. And we've known parser generators since the 70s.

So I'm not sure what the point is of the title of the article.
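
For anyone unfamiliar with the term, a rough sketch of what "parser combinators" means in practice: small parsing functions are composed by higher-order functions into larger parsers, with no separate grammar file or generation step. The helper names below (`char`, `alt`, `seq`, `many1`, `fmap`) are illustrative assumptions and are not nom's API; each parser takes `(text, pos)` and returns `(value, new_pos)` on success or `None` on failure.

```python
def char(c):
    """Parser that matches the single character c."""
    def parse(text, pos):
        if pos < len(text) and text[pos] == c:
            return c, pos + 1
        return None
    return parse


def alt(*parsers):
    """Try each parser in order and return the first success."""
    def parse(text, pos):
        for p in parsers:
            r = p(text, pos)
            if r is not None:
                return r
        return None
    return parse


def seq(*parsers):
    """Run parsers one after another, collecting their values in a list."""
    def parse(text, pos):
        values = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            value, pos = r
            values.append(value)
        return values, pos
    return parse


def many1(p):
    """Apply p one or more times (p must consume input on success)."""
    def parse(text, pos):
        values = []
        r = p(text, pos)
        while r is not None:
            value, pos = r
            values.append(value)
            r = p(text, pos)
        return (values, pos) if values else None
    return parse


def fmap(p, f):
    """Transform the value produced by p with the function f."""
    def parse(text, pos):
        r = p(text, pos)
        if r is None:
            return None
        value, new_pos = r
        return f(value), new_pos
    return parse


# Composite parsers built purely by combining the pieces above.
digit = alt(*(char(d) for d in "0123456789"))
number = fmap(many1(digit), lambda ds: int("".join(ds)))
pair = fmap(seq(number, char(","), number), lambda xs: (xs[0], xs[2]))
```

Here `pair("12,345", 0)` returns `((12, 345), 6)` and `pair("12;345", 0)` returns `None`. The ordered-choice, backtrack-on-failure behaviour of `alt` is the kind of design the comment above is contrasting with table-driven generated parsers.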

amelius | 7 years ago

Sorry for the low effort comment but Clever Cloud is an awesome name

oldsj | 7 years ago