Ragel rocks (part 1 of 2)
Zed Shaw is spot-on: “The Ragel State Machine Compiler is one kick-ass piece of software.”
As part of Kinetic development, I need a way to parse several custom-designed languages, each with its custom grammar. The term popularized by Martin Fowler is DSLs, short for “domain-specific languages.”

http://www.flickr.com/photos/dilaudid/278649026/
In the past, when faced with this task of parsing DSLs, I took one of two approaches:
Approach #1: For parsing complex DSLs, the typical starting point, and the one taught in college compiler design courses, is to first define the language’s custom syntax via a comprehensive set of BNF grammar rules.
In junior high, I was one of those kids who actually enjoyed diagramming sentences. Well, defining BNF grammars for DSLs requires a similar masochistic mindset. Parser generators such as ANTLR allow these BNF grammars to be defined, and custom logic to be executed when the constituent words/tokens are “found” in the input text.
I’m over-simplifying the process of course, as any CS major who wasted away Memorial Day weekend staring at lex/yacc debugging messages can attest. And that is in fact the primary drawback of this approach of using parser generators: it’s usually too heavy-handed unless you really want to design a fairly rich custom DSL.
It’s no surprise that parsers and compilers are written by a relatively-tiny and arcane community of computer engineers. That business developer you’re paying $90/hour for? Chances are, he or she has never written a custom DSL parser.
Thankfully, I’ve only had to go down this road twice in my programming career (both for personal projects). I absolutely love the power of this approach–using full-fledged parser generator tools gives me the freedom to define complex DSLs–but the price in terms of software complexity and time is difficult to justify for most projects.
Approach #2: For that reason, I generally opt to keep my DSLs simple. Simpler DSLs means simpler parsing. This allows me to just cobble together straightforward text-parsing routines. These routines rely on basic multi-stage regular expression (regex) matchers and core string-manipulation functions. Nothing fancy, it’s anything any engineer can do.
Yes, it’s simplistic and dirty with the taint of hackiness. But–more importantly, it gets the job done every time.
The ugly catch? Debugging and maintaining these simplistic parsers is no fun. There’s a pearl of programming wisdom:
Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.”
Step away from several complicated regexes, and try patching that code a few years later. I’ve had to do that on several occasions, and each time, I growled at my former self. I would’ve had more luck trying to decipher the green letters in the Matrix. Add a pair of parentheses to an existing regex, and you risk breaking your back-references (Ruby 1.9′s support of named groups only helps very slightly here). The pain of maintaining regexes and convoluted string-manipulation logic generally leads to parsing code collecting more than its fair share of dust.
Which is why I was thrilled to discover Ragel.
(to be continued tomorrow…)
