Ragel rocks (part 2 of 2)
Yesterday, I touched on two strategies for developing domain-specific language (DSL) parsers: Hardcore DSLs may warrant using a full-fledged parser generator (such as JavaCC or ANTLR) with support for BNF grammar rules. For lighter-weight DSLs, regex hackery and substring functions often suffice to get the job done.
My challenge is–Kinetic’s language waffles somewhere in the middle of the “hardcore DSL” vs. “lightweight DSL” spectrum. Language design is largely about making hard choices about your language’s limitations. Right now, Kinetic is far from “hardcore.” It doesn’t support if/else/while control statements, exception handling, class definitions, and many amenities found in modern programming languages. And I’m ok with that: The target user for Kinetic isn’t the ninja coder; rather, Kinetic is intended to appeal to less-technical folks who want an easy way to describe complex diagrams.
But my ego won’t allow me to classify it as a “lightweight DSL” either. The current Kinetic language spec supports macro definitions, variable assignments, file references, embedded scripting, and a fairly expressive node-selection syntax. And new language features will be added as the project evolves.
Lucky for me–this middle-ground happens to be Ragel‘s sweet spot.
So what exactly does Ragel do? It’s a tool to describe finite state automata (FSA). Basically, an FSA defines the DSL parser’s behavior as it encounters each and every character in the input text. An FSA describes “states” and “transitions”:

http://www.complang.org/ragel/number_lg.png
For example, if a parser sees a “a” followed by a “d” followed by another “d”, it can conclude that it saw that word “add.” As the parser hops from “state” to “state” (from letter to letter), we can instruct it to perform special actions during those “state transitions”. For example, the parser can increment a counter variable every time it transitions from “a” to “d”.
The elegance of Ragel actually makes it fun to create DSL parsers. Remember how earlier I noted that “hardcore parser-generators” offer the benefit of BNF notation? Well, Ragel’s syntax isn’t quite as powerful as BNF, but it gets us mostly there. These Ragel rules are more maintainable than regexes. And, since it turns out that every regex is in fact an FSA (every regex engine starts by converting a regex to an FSA), if you’re already familiar with regexes, you’ll find it incredibly easy to pick up Ragel’s slick syntax.
Here’s some Kinetic language grammar defined using Ragel (compare to BNF):
unquoted_string = ([^"]+);
quoted_string = (["] unquoted_string ["] | (["] ["]));
node = (upper (('_' | alpha | digit)*?))
tagname = (lower+ ('_' | lower)*)
variable_name = (alpha (('_' | alpha | digit)*?))
Then, Ragel performs its magic –voila! — and generates code for your custom DSL parser. Peek at the Ragel-generated code and you’ll see thousands upon thousands of lines like this:
self._parser_actions = [
0, 1, 0, 1, 1, 1, 2, 1,
2, 6, 22, 2, 7, 4, 2, 7,
23, 2, 16, 24, 2, 19, 23, 2,
19, 24, 2, 20, 24, 2, 23, 0,
3, 1, 6, 22, 3, 6, 1, 22,
3, 8, 13, 23, 3, 10, 18, 23,
3, 10, 18, 24, 3, 11, 15, 23]
And that, dear friend, is precisely why you never want to write a finite state automata by hand.
The generated Ruby/Java/C++ parser code is completely standalone. In other words, once you use Ragel to generate your parser, there’s no need to include Ragel libraries in your code distribution. This is simply brilliant.
If you’ve read this far, you may also want to check out Ragel’s other features. Frankly I think Ragel is underrated, and is worth adding to your quiver.
Thanks Adrian Thurston, for your gift to us. Ragel rocks!
