Nleyten@1300: the joy of parsing


I know this is bound to brand me as an incurable language geek, but I find Abstract Syntax Trees things of extraordinary beauty. Yep, I am still immersed in creating the languages and associated parsers for the markup of Lambdium stories and comments, and I am thoroughly enjoying every part of this trade.

I have now completed the definition and most of the parsing tools for the full-fledged Lambtex parser. As it turned out, introducing support for labelling and references forced me to rethink some of my prior assumptions about the Lambtex language. It was a difficult birth of sorts, and the language definition went through several revisions before arriving at its present state. However, I am now quite pleased with the result: Lambtex manages to be both simple and powerful, gracefully handling most complex cases without burdening beginners with an arcane syntax. I will soon write a small manual (to be part of Nleyten's help system), and you will then get a clearer idea of what I mean.

I also realise that I am lucky to be creating both the language and the tools to parse it at the same time. The two play an intricate dance together: even minor changes to a language's syntax can have a profound effect on the parsing tools. I am also thankful to be doing this in OCaml; the language is just perfect for compiler writing!

As for the SVN changelog at revision 1300, it reads "Adapted tokenizer to new Lambtex scanner". As is typical of most compilers, the Lambtex parsing chain is composed of four separate modules. At the lowest level there is a scanner that splits the input into recognisable language atoms; closely tied to it is a tokenizer that converts those atoms into proper parser tokens; then comes the parser proper (I am using Menhir to generate it; Menhir is similar to ocamlyacc, but better); and finally a postprocessor verifies that the document is semantically correct (for example, that all references point to valid targets) and produces its final version.
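To make the four-stage chain a little more concrete, here is a deliberately tiny OCaml sketch of how such a pipeline might fit together. The markup it accepts (plain text plus `\label{name}` and `\ref{name}`) is a hypothetical stand-in, not actual Lambtex syntax, and every type and function name below is invented for illustration. Since Menhir-generated code cannot be inlined in a short example, the parser stage is a hand-written pattern-matching parser standing in for a Menhir grammar.

```ocaml
(* Hypothetical toy markup: plain text plus \label{name} and \ref{name}.
   This is NOT real Lambtex syntax; it only illustrates the four stages. *)
exception Error of string

(* Stage 1: scanner. Splits raw input into language atoms. *)
type atom =
  | A_text of string     (* run of ordinary characters *)
  | A_command of string  (* backslash followed by letters, e.g. \ref *)
  | A_lbrace
  | A_rbrace

let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

let scan (s : string) : atom list =
  let n = String.length s in
  let buf = Buffer.create 16 in
  let acc = ref [] in
  (* Emit any pending run of ordinary characters as one text atom. *)
  let flush_text () =
    if Buffer.length buf > 0 then begin
      acc := A_text (Buffer.contents buf) :: !acc;
      Buffer.clear buf
    end
  in
  let i = ref 0 in
  while !i < n do
    (match s.[!i] with
     | '{' -> flush_text (); acc := A_lbrace :: !acc; incr i
     | '}' -> flush_text (); acc := A_rbrace :: !acc; incr i
     | '\\' ->
       flush_text ();
       let j = ref (!i + 1) in
       while !j < n && is_letter s.[!j] do incr j done;
       acc := A_command (String.sub s (!i + 1) (!j - !i - 1)) :: !acc;
       i := !j
     | c -> Buffer.add_char buf c; incr i)
  done;
  flush_text ();
  List.rev !acc

(* Stage 2: tokenizer. Maps atoms onto the parser's token type. *)
type token = TEXT of string | LABEL | REF | LBRACE | RBRACE | EOF

let tokenize (atoms : atom list) : token list =
  let to_token = function
    | A_text t -> TEXT t
    | A_command "label" -> LABEL
    | A_command "ref" -> REF
    | A_command c -> raise (Error ("unknown command \\" ^ c))
    | A_lbrace -> LBRACE
    | A_rbrace -> RBRACE
  in
  List.map to_token atoms @ [EOF]

(* Stage 3: parser. A hand-written stand-in for a Menhir grammar;
   builds the abstract syntax tree (here just a flat node list). *)
type node = Text of string | Label of string | Ref of string

let rec parse (toks : token list) : node list =
  match toks with
  | [EOF] -> []
  | TEXT t :: rest -> Text t :: parse rest
  | LABEL :: LBRACE :: TEXT name :: RBRACE :: rest -> Label name :: parse rest
  | REF :: LBRACE :: TEXT name :: RBRACE :: rest -> Ref name :: parse rest
  | _ -> raise (Error "syntax error")

(* Stage 4: postprocessor. Checks that labels are unique and that every
   reference has a target, then renders references as label numbers. *)
let postprocess (doc : node list) : string =
  let labels = Hashtbl.create 8 in
  (* First pass: number labels in order of appearance. *)
  List.iter
    (function
      | Label name ->
        if Hashtbl.mem labels name then raise (Error ("duplicate label " ^ name));
        Hashtbl.add labels name (Hashtbl.length labels + 1)
      | _ -> ())
    doc;
  (* Second pass: resolve each reference, failing on unknown targets. *)
  let render = function
    | Text t -> t
    | Label _ -> ""
    | Ref name ->
      (match Hashtbl.find_opt labels name with
       | Some n -> string_of_int n
       | None -> raise (Error ("undefined reference " ^ name)))
  in
  String.concat "" (List.map render doc)

(* The whole chain: scanner, tokenizer, parser, postprocessor. *)
let compile (src : string) : string =
  src |> scan |> tokenize |> parse |> postprocess
```

Because the postprocessor collects all labels in a first pass before resolving any reference, a reference may appear before its target: `compile "See section \\ref{intro}.\\label{intro}"` yields `"See section 1."`, while a reference to an undefined label raises `Error`. This is one reason to keep semantic checks in a separate stage after parsing rather than inside the grammar itself.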

 
