Rubby - the Little Language That Could
A couple of weeks ago we had the pleasure of attending RailsCamp NZ 2013 at the beautiful Camp Kaitoke. We knew that it was RailsCamp tradition to have a project to work on over the course of the weekend, and although we have plenty of Rails-related projects, we decided that we wanted to work on our own language. We’ve had this idea for a little language, much like CoffeeScript, sloshing around in the back of our brains for a while, and we thought it was about time we got it out. Thus Rubby was born.
Rubby consists of a transpiler that converts Rubby code into idiomatic Ruby, for example:
[Rubby code listing, 16 lines, not preserved]
Which transpiles into the following Ruby:
[Ruby output listing, 19 lines, not preserved]
And while the output obviously still needs a little tweaking (specifically, adding whitespace between methods and deciding when to put parens on method arguments), it’s mostly feature-complete.
We worked almost non-stop on Rubby at RailsCamp, but probably would have given it up if it weren’t for the quiet enthusiasm of Bardoe and Brett. Over the weekend, Rubby went from a barely passable lexer and parser to having a basically functional transpiler and REPL. We were hugely proud to be able to stand up on Sunday night and demonstrate our achievement to the other campers.
How it works
Rubby is based around Chris Wailes’ RLTK library, albeit with a number of Rubby-specific patches. Rubby code is run through the lexer, which processes the input into a stream of tokens, each with an optional value (e.g. <- emits just a token with no value, while 'foo' emits a STRING token with the value 'foo'). The only real magic in the lexer is in Rubby::Lexer::Environment#indent_token_for, where it attempts to measure whitespace after a newline and emit the correct number of indentation tokens.
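To make the indentation step a little more concrete, here is a rough sketch of the general technique rather than Rubby's actual lexer. The Token struct, the INDENT, DEDENT and LINE token names, the indent_tokens method and the fixed two-space indent width are all assumptions made for illustration.

```ruby
# Hypothetical token: a type plus an optional value (nil when omitted).
Token = Struct.new(:type, :value)

# Turn changes in leading whitespace into INDENT / DEDENT tokens.
# Assumes indentation is made of spaces, `width` spaces per level.
def indent_tokens(source, width: 2)
  tokens = []
  depth = 0
  source.each_line do |line|
    text = line.chomp
    next if text.strip.empty? # blank lines do not change the depth

    new_depth = text[/\A */].length / width
    if new_depth > depth
      (new_depth - depth).times { tokens << Token.new(:INDENT) }
    elsif new_depth < depth
      (depth - new_depth).times { tokens << Token.new(:DEDENT) }
    end
    depth = new_depth

    # A real lexer would tokenise the rest of the line; here it is kept whole.
    tokens << Token.new(:LINE, text.strip)
  end
  # Close any blocks still open at the end of the input.
  depth.times { tokens << Token.new(:DEDENT) }
  tokens
end
```

Calling indent_tokens on a snippet whose second line is indented one level deeper yields a LINE token, an INDENT, a second LINE and a closing DEDENT, which is the shape of stream a parser for an indentation-sensitive language expects.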
The token stream is then passed into the parser, which is in essence a massive state machine: given a particular token, it builds a list of possible next tokens. If there are multiple possible actions, it tries each one until it either succeeds in consuming the entire token stream or runs out of actions (a syntax error). The parser emits an abstract syntax tree built from Rubby’s AST node classes.
Next, the transpiler walks through every node in the syntax tree, calling #modify_ast on each, which allows nodes to make modifications to other nodes in the AST (for example, the InstanceArgument node modifies its parent method definition to contain instance variable assignments). This can’t be done at the same time as collapsing the AST into Ruby, because a node may need to modify an already collapsed peer to implement a language feature.
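The two-pass idea can be sketched roughly like this. Only the #modify_ast and #to_ruby method names and the InstanceArgument node come from the description above; the Node and MethodDef classes, the add_assignment helper and all constructor signatures are hypothetical stand-ins, not Rubby's real classes.

```ruby
# Hypothetical base class: every node knows its parent and can
# optionally rewrite the tree before any Ruby is generated.
class Node
  attr_accessor :parent

  def modify_ast; end
end

class InstanceArgument < Node
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Pass 1: ask the enclosing method to assign this argument
  # to an instance variable of the same name.
  def modify_ast
    parent.add_assignment(name)
  end

  def to_ruby
    name
  end
end

class MethodDef < Node
  attr_reader :name, :args, :body

  def initialize(name, args, body = [])
    @name = name
    @args = args
    @body = body
    @assignments = []
    args.each { |arg| arg.parent = self }
  end

  def add_assignment(arg_name)
    @assignments << "@#{arg_name} = #{arg_name}"
  end

  def modify_ast
    args.each(&:modify_ast)
  end

  # Pass 2: collapse to a nested array, one level of nesting per indent.
  def to_ruby
    ["def #{name}(#{args.map(&:to_ruby).join(', ')})", @assignments + body, "end"]
  end
end
```

Because #modify_ast has finished before #to_ruby is first called, the method definition already contains its instance variable assignments by the time it is collapsed.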
Once all this is done, the transpiler calls #to_ruby on the root node of the AST, which in turn calls #to_ruby on its children (if required) and returns a large nested array of Ruby statements, where an increase in nesting corresponds to an increase in indentation. This array is then passed into the RubyFormatter, which joins these arrays with the correct indenting and returns the final Ruby representation of the program.
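As an illustration of how a nested array can become indented source, here is a minimal sketch of the joining step. It is not the RubyFormatter itself: the indented_lines and format_ruby method names and the two-space indent are assumptions.

```ruby
# Flatten a nested array of statements into lines, adding one level
# of indentation (two spaces, by assumption) per level of nesting.
def indented_lines(statements, depth = 0)
  statements.flat_map do |item|
    if item.is_a?(Array)
      indented_lines(item, depth + 1)
    else
      ["  " * depth + item]
    end
  end
end

# Join the lines into the final source text.
def format_ruby(statements)
  indented_lines(statements).join("\n") + "\n"
end

nested = ["class Foo", ["def bar", ["42"], "end"], "end"]
puts format_ruby(nested)
```

Keeping indentation out of the nodes means each #to_ruby only has to decide what is nested inside what, and the layout is applied in one place.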
Rubby still needs a bunch of features before we can contemplate a 1.0 release, most pressingly support for interpolated regular expressions and a convincing ensure syntax. We also want to submit a pull request to make ActiveSupport::Dependencies language agnostic, something that the existing Polyglot hook gets us near to, but not all the way. If you’d like to help with that, or with Rubby in general (there are a bunch of Cucumber features tagged as @todo), then we could really use your help.
I hope people enjoy programming in Rubby as much as we enjoyed writing it and I’m really keen for any feedback whatsoever.