Thursday 4 February 2016

Library authors: Don't forget the examples!

These are exciting times in the OCaml community. Compiler development is proceeding at a brisk pace, with several long-awaited features on the horizon (flambda, multicore, and modular implicits, just to name a few). The tooling has also improved dramatically in the past few years, making the time before OPAM and Merlin seem like a very distant and best forgotten dark age. Moreover, the community has grown to the point where it is very likely that you will find a library that tackles your particular needs, whatever they may be. Some pain points remain, however. In this post I'll address a particularly frustrating one: the issue of library documentation (or lack thereof). Frustrating not only because it is endemic to our community, but also because it can be mitigated with minimal burden to library authors.

Ideally, a library's documentation would consist of an introductory tutorial garnished with multiple examples, plus the API reference. While the latter is indisputably useful, for libraries with a large surface API it is not enough. In such cases, users faced solely with the API reference are likely to scratch their heads wondering where to begin and how the various pieces fit together. The API reference only becomes useful once users have built a proper mental model of the library, which is a lot easier to do after reading through a tutorial.

Unfortunately, if a project includes any documentation at all, it is very likely that it consists solely of the API reference rather than a tutorial. The reason is fairly obvious to any developer: the API reference sits right there in the code (usually in the mli file), and imposes a minimal burden in creation and maintenance. Writing a tutorial, on the other hand, is a whole extra task that robs time away from the actual coding. It's also not as fun.

There is however a middle ground between writing a full-fledged tutorial and not writing one at all. Moreover, it's a middle ground that imposes minimal inconvenience to developers while providing tremendous value to users. I'm talking about simply including some usage examples with your library.

What makes examples so special? Foremost, a good example provides one of the critical advantages of a tutorial: condensing a large API into a concrete starting point. Consider the case of Cohttp, which offers a relatively large API further complicated by the need to support both Lwt and Async backends. Despite this, getting started with Cohttp is actually fairly easy because it includes a couple of trivial examples. Any OCaml developer can look at the half-dozen lines of each example and immediately build a mental model of how the Cohttp library is structured.
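To give a flavour of what such a half-dozen-line example looks like, here is a sketch of a minimal Lwt-based Cohttp client in the spirit of the examples shipped with the library. (This is written from memory, not taken from Cohttp's repository; module names such as Cohttp_lwt_unix.Client and Cohttp_lwt.Body have shifted slightly across Cohttp versions, so treat it as a sketch rather than copy-paste material.)

```ocaml
(* Sketch of a minimal Cohttp client under Lwt.  Module names are
   assumptions and vary slightly between Cohttp versions. *)
open Lwt.Infix

let fetch uri =
  Cohttp_lwt_unix.Client.get (Uri.of_string uri) >>= fun (resp, body) ->
  let code = Cohttp.Code.code_of_status (Cohttp.Response.status resp) in
  Printf.printf "Response code: %d\n" code;
  Cohttp_lwt.Body.to_string body

let () = print_endline (Lwt_main.run (fetch "http://example.com/"))
```

A newcomer can glance at this and immediately see the division of labour: Client issues the request, Response carries the metadata, Body holds the payload, and Lwt_main.run drives the whole thing.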

Besides the advantages they bring to users, examples are not very burdensome to developers. For one, they can be derived from private examples used in testing. Moreover, it's fairly trivial to keep examples up-to-date with the latest library API, because in most cases you can rely on the compiler doing the heavy lifting of verifying that the example is still valid. All in all, there is an asymmetry at play: a minimal effort on the developer's part will make a tremendous difference to users. Therefore, my plea to OCaml library developers is to please include examples with your libraries!

Now that (hopefully) I've convinced you to exemplify your libraries, you should also consider a few common sense rules for maximising the effectiveness of examples:

  • Place your examples in a directory named examples. This is the de facto standard in the OCaml community.

  • Start simple, even if the example is trivial. Resist the temptation to demonstrate all the awesome features of your library in a single example.

  • Fully qualify identifiers. In other words, avoid global opens and over-using custom operators. Also, note that parenthesised local opens (the Module.(expr) syntax) are preferable to global opens anyway. And if you insist on showing off the powers of conciseness afforded by your library, please consider having two versions of the same example: one using full qualification, and one without.

  • Order your examples. Suppose you have a sundry collection of examples. Simply number them from the simplest to the most complex. Otherwise, users are faced with a directory full of examples without knowing where to start. (This is the approach I have chosen for the examples for the Lambdoc library.)

  • Include a Makefile or build instructions for the examples, particularly if the building process is not obvious.

  • Keep your examples up-to-date.
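On the full-qualification point above, here is a toy illustration (not taken from any particular library) of the two styles side by side:

```ocaml
(* Fully qualified: every identifier names its home module. *)
let total = List.fold_left ( + ) 0 (List.map String.length ["foo"; "bar"])

(* Parenthesised local open: the open is scoped to a single expression,
   so readers still know which module the identifiers come from. *)
let total' = List.(fold_left ( + ) 0 (map String.length ["foo"; "bar"]))

(* Both evaluate to 6. *)
let () = assert (total = total')
```

The second form buys most of the conciseness of a global open while keeping the example readable to someone who has never seen the library before.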

Thursday 29 October 2015

On-the-fly lexer switching with Menhir

Suppose you have a markup language that requires different lexers for different contexts (you have different verbatim environments, for instance). You can often solve this problem entirely within the lexer. This is the approach typically used for dealing with comments when parsing programming language source code, for example. What if, however, the markup language is complex enough that you need some help from the parser to know in which context you sit at any given moment, and therefore which lexer is the right one to call?

The situation above makes the familiar OCamllex (or Sedlex) + Menhir combination problematic. So much so that even if you otherwise have a strong preference for these tools and the grammar of your language is a nice fit for an LR(1) parser generator, you may be forced to adopt some scannerless parsing technique or some exotic parser generator. Nevertheless, I would like to share a couple of approaches which allow you to have your cake and eat it too. That is, perform on-the-fly lexer switching within an OCamllex (Sedlex, actually) + Menhir framework.

The markup we want to parse is a very simplified sibling of Lambtex, supporting only paragraphs, quote environments, and two different kinds of verbatim-like environments (verbatim proper and source). Within paragraphs, only plain text, bold text, and hyperlinks are supported. Here you'll find a complete sample of our markup language. (Note that this dumbed-down Lambtex is indeed so simple that you could get away with parsing the verbatim-like environments entirely within the lexer context. You'll have to indulge me here!)

The first approach relies on Menhir's new Inspection API, which essentially allows for the current parser state to be inspected from the outside. This solution demands that Menhir be run in incremental mode, which in turn demands the use of the table-based back-end. As the Menhir manual notes, the table-based back-end is generally slower than the default code-based back-end. On the plus side, this solution does not require any hacks within the parser specification itself. It does, however, require a mildly complex Tokenizer layer between the Lexer and the Parser.
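As a rough illustration of the first approach, the driving loop of the Tokenizer might look something like the sketch below. This is not the post's actual code: Parser, Lexer_general, Lexer_literal, the TEXT token, and Ast.t are all assumptions standing in for the real modules, and only the MenhirInterpreter names come from Menhir's incremental and inspection APIs.

```ocaml
(* Sketch: driving a table-based Menhir parser incrementally, using the
   inspection API to pick a lexer before each token is read.  All module
   and token names other than Menhir's own are hypothetical. *)
module I = Parser.MenhirInterpreter

let rec drive lexbuf (checkpoint : Ast.t I.checkpoint) =
  match checkpoint with
  | I.InputNeeded _ ->
      (* Ask whether a literal-mode token is acceptable in the current
         parser state, and choose the lexer accordingly. *)
      let lexer =
        if I.acceptable checkpoint (Parser.TEXT "") Lexing.dummy_pos
        then Lexer_literal.token
        else Lexer_general.token in
      let token = lexer lexbuf in
      let start, stop = lexbuf.Lexing.lex_start_p, lexbuf.Lexing.lex_curr_p in
      drive lexbuf (I.offer checkpoint (token, start, stop))
  | I.Shifting (_, _, _)
  | I.AboutToReduce (_, _) -> drive lexbuf (I.resume checkpoint)
  | I.HandlingError _      -> failwith "Syntax error"
  | I.Accepted ast         -> ast
  | I.Rejected             -> assert false
```

The key point is that the parser never needs to know about lexer switching at all; the Tokenizer interrogates the checkpoint from the outside.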

The second approach relies on a hack made practical by Menhir's ability to produce parameterised parsers, i.e., parsers which are in fact OCaml functors. Suppose thus that we declared our parser to be parameterised by a module C obeying signature Context.S:

%parameter <C: Context.S>

The hack itself consists of using side-effects within the parser specification to set the current lexing context (note the set_general and set_literal rules below). The fact that the parser is parameterised allows us to contain the side-effects within each instantiation of the functor. Though the hack would also work without parser parameterisation, the resulting parser would not be reentrant and could not be safely used in a multi-threaded application. Lastly, note that this approach also inserts a Tokenizer layer between the Lexer and the Parser. It is however much simpler than the one required by the first approach.

block:
  | set_literal BEGIN_VERBATIM set_general TEXT END_VERBATIM  {Ast.Verbatim $4}

set_general:
  | /* empty */  {C.(set General)}

set_literal:
  | /* empty */  {C.(set Literal)}

You may have noticed the seemingly odd placement of the set_literal and set_general producers within the sole production of the block rule. These seem to be placed one position before where they should be. The reason is simple: remember that we have to take into account the lookahead token!
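For concreteness, the context module expected by the %parameter declaration might be structured along the following lines. This is a guess at the general shape (the actual code lives in the post's repository): a tiny piece of mutable state wrapped in a generative functor, so that each parser instantiation gets its own copy and the whole arrangement stays reentrant.

```ocaml
(* Hypothetical sketch of the Context module assumed by the parser. *)
module Context =
struct
  type context = General | Literal

  module type S =
  sig
    val set : context -> unit
    val get : unit -> context
  end

  (* Generative functor: each application creates fresh state. *)
  module Make () : S =
  struct
    let current = ref General
    let set ctx = current := ctx
    let get () = !current
  end
end
```

A tokenizer for one parser instance would then create its own context with Context.Make (), pass the same module to the parser functor, and consult get () before each token request to decide which lexer to run.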

And that's it. Both of these approaches work and each has its advantages and disadvantages. I'm leaning towards the second approach for a cleaner reimplementation of Lambtex's current parser, though I can imagine that even hairier markups may require the extra flexibility afforded by the first approach. To conclude, note that I've been deliberately terse in explanation because the complete code is available on Github. Just bear in mind that it is littered with debug statements.

Monday 17 August 2015

Announcing Lambdoc 1.0-beta4

I'm happy to announce the release of version 1.0-beta4 of Lambdoc, a library providing support for semantically rich documents in web applications. Though Lambdoc was designed with Ocsigen/Eliom integration in mind, it does not actually depend on the Ocsigen server or Eliom, and you may use it with other frameworks. In fact, you may find it useful outside the web application domain altogether.

An overview of Lambdoc's features may be found in previous posts announcing the beta1 and beta3 releases. Between beta3 and beta4, the most salient changes are as follows:

  • Introduction of Lambdoc_core_foldmap, a module for aiding the construction of functions for deep traversal and transformation of a document tree. The basic idea is inspired by the compiler's Ast_mapper module, so it should be widely familiar. Moreover, the foldmapper is the result of a functor parameterised by a custom monad, so it's easily integrated in an application using Lwt or Async if the foldmapping requires doing monadic I/O. The tutorial directory includes some examples using Lambdoc_core_foldmap:

    • Tutorial 7 illustrates one of the simplest possible applications of this feature: a function that counts the number of bold sequences used in a document.
    • Tutorial 8 depicts a link validator that uses Cohttp to verify that all external links are live. Note that it registers itself as a parsing postprocessor, allowing any found errors to be reported together with other unrelated document errors. Moreover, it lives under the Lwt monad.
    • Tutorial 9 implements a simple document transformer which replaces all instances of Eastasia with Eurasia and vice-versa.
  • The addition of Lambdoc_core_foldmap enabled the simplification of the extension mechanism. Previous versions of Lambdoc featured hooks for reading/writing link and image URLs. All of those hooks are now gone.

  • Lambdoc documents may now carry information about the parsed source (location, etc) in every attribute. I briefly entertained the possibility of making the attribute polymorphic, thus allowing documents to carry custom metadata. However, at this moment I have no practical need for this extra flexibility, and I am wary of increasing complexity in the name of hypothetical use cases.

Lambdoc 1.0-beta4 should be available in OPAM any moment now. Documentation is still a work in progress, and since OCamldoc gets terribly confused with Lambdoc's heavy use of module aliases, we may have to wait for Codoc before proper API docs can get generated. In a small effort to ameliorate this situation, the examples directory includes a tutorial with self-contained demos of Lambdoc's various features.

Monday 30 March 2015

Announcing Lambdoc 1.0-beta3

I'm happy to announce the release of version 1.0-beta3 of Lambdoc, a library providing support for semantically rich documents in web applications. Lambdoc was designed with Ocsigen/Eliom integration in mind, though you may of course use it with other frameworks (it does not actually depend on the Ocsigen server or Eliom). In fact, you may find it useful outside the web application domain altogether.

An overview of Lambdoc's features may be found in the post I wrote announcing the first beta of Lambdoc. The good news is that in the intervening months, some of the most pressing issues with the library have been fixed, and it is now much closer to completion. The bad news is that backward-incompatible changes were required. For most uses these amount to no more than a module renaming fixable by search-and-replace. The extension mechanism suffered a complete overhaul, however (more on that below), and is manifestly incompatible with the first beta. My apologies if anyone was inconvenienced by this; the usual caveat emptor regarding beta software still applies.

Lambdoc 1.0-beta3 should hit the OPAM repos any moment now.

Salient changes since beta 1

  • The module structure was reorganised, with the module packs being ditched in favour of a flatter structure reliant on module aliases.
  • OASIS is now used for the build system.
  • Completely revamped extension mechanism. Extensions are now easily composable, and output raw AST values instead of Lambdoc_core values, allowing for greater flexibility. The examples directory contains some illustrations of the power offered by the extension mechanism.

What to expect before a 1.0 release

Though Lambdoc is perfectly useful right now, there are still some issues to resolve before I'm willing to tag a final 1.0 release. The Markdown support, in particular, is still far from complete. Other prominent issues include #24, #28, #29, #31, #32, and #33. Fortunately, though some of these issues may require backward incompatible changes, these are pretty minor.


Massive kudos to Gabriel "Drup" Radanne and Edwin Török for their feedback and code contributions.

Thursday 18 September 2014

Announcing Lambdoc 1.0-beta1

I'm happy to announce release 1.0-beta1 of Lambdoc, a library providing support for semantically rich documents in web applications. Lambdoc was designed with Ocsigen/Eliom integration in mind, though you may of course use it with other frameworks (it does not actually depend on the Ocsigen server or Eliom).

A brief overview of Lambdoc's features

  • A rich set of supported document features, including tables, figures, math, and source-code blocks with syntax-highlighting.
  • Built-in support for multiple input markups (see below), and easy integration of additional custom markups.
  • Runtime customisation of available document features. You may, for instance, declare that a certain class of users may not format text passages in bold.
  • Detailed error messages for mistakes in input markup.
  • A simple macro mechanism.
  • An extension mechanism.
  • The CLI application lambcmd, which allows conversion from any input markup to any output markup from the comfort of the command line.
  • Ships with decent looking CSS, easily customisable to your needs. Note that you'll need CCSS (available on OPAM) if you wish to modify the source for the CSS.

Supported input markups

This first beta of Lambdoc ships with built-in support for four different input markup languages:

  • Lambtex: Shamelessly inspired by LaTeX, Lambtex is my take on what LaTeX should look like if one were to get rid of all legacy baggage and gear it towards publishing on the web. Lambtex supports all of Lambdoc's features, and even has a complete manual (which by the way I also recommend if you want a comprehensive list of all document features supported in Lambdoc).
  • Lambwiki: Largely inspired by the Wiki Creole syntax, Lambwiki is a light-weight markup language. Though it does not support some of Lambdoc's more advanced features, it is veritably light and its syntactic conventions are IMHO more memorable than Markdown's. Moreover, it also has a complete manual.
  • Lambxml: An XML markup largely compatible with HTML. I don't find XML to be particularly human-friendly, but Lambxml might prove useful as a gateway for external XML-outputting tools.
  • Markdown: Love it or hate it, Markdown is ubiquitous, and as such supporting it is practically mandatory. Lambdoc supports Markdown via the OMD library, and therefore you should refer to OMD's documentation to learn about the supported flavour of Markdown. Note that Lambdoc's integration of OMD is still experimental, and there are still some issues to be resolved before the final 1.0 release. Prominently, OMD does not currently preserve location information, which is required for Lambdoc's error reporting mechanism. Fortunately, this issue has been acknowledged upstream.

Supported output markups

The only supported output markup is HTML5 via Tyxml. However, the functorial implementation used allows easy integration with Eliom.

Developer documentation

Unfortunately, developer documentation for this beta release is still sparse. Ocsigen/Eliom users are advised to take a look at the four-part tutorial included in the examples directory. The first step of the tutorial is a very minimalistic and straightforward illustration of how Lambdoc can be integrated in Eliom applications. Each subsequent step builds upon this foundation by introducing one new feature. Hopefully this will be enough to get you started.

About the extension mechanism

The extension mechanism is the latest addition to Lambdoc. It allows for the attachment of custom hooks to the processing of inline links, inline and block images, and the generic extern block. It is still somewhat experimental, but hopefully flexible enough to cover most use cases. Check out the last step of the tutorial for a basic example, or the source of lambcmd for a more complex real-world example which uses Bookaml to enable the special protocol isbn for links to books.

On the betaness of this release

Besides the aforementioned issues with the OMD integration, the lack of proper documentation, and the experimental character of the extension mechanism, the beta moniker for this release is also justified by the somewhat ad-hoc build system (I'm not sure OASIS even supports a project using module packs internally). Fortunately, using OPAM should spare you the trouble of worrying about this issue.

One important caveat: though I have no plans for further changes to the API, the betaness of this release also means I'll have no compunction in making them should the need arise.

Concluding remarks

The package is now available on OPAM. It has a tonne of dependencies, but since they are all packaged in OPAM, this shouldn't be a hassle. Note that some of the dependencies (Lwt, Ocsigenserver, Bookaml) apply only to the lambcmd CLI utility, and not the library itself. (Yes, I'm considering simplifying lambcmd for subsequent releases.)

Your comments/suggestions/criticisms are of course welcome. Feel free to send me an email or to open a ticket on the project's page at Github. I'll be particularly thankful if you find any bugs.

Tuesday 19 August 2014

Announcing Camlhighlight 3.0

I'm happy to announce release 3.0 of Camlhighlight, a library offering syntax highlighting and pretty-printing facilities for displaying code samples in Ocsigen applications. The library works by interfacing with the library portion of GNU Source-highlight, a popular application supporting the most common programming and markup languages.

This version features a smaller dependency set, now requiring Tyxml instead of forcing a dependency on Eliom. However, full compatibility with Eliom applications is maintained. The functorial interface used seems hairy at first glance, but it's actually not that complicated in practice. As an example, if your application uses only Tyxml and you wish to write a syntax-highlighted sample as a Html5.elt, you will first need to apply the Camlhighlight_write_html5.Make functor using Tyxml's Html5 and Svg modules as parameter:

module Html5_writer = Camlhighlight_write_html5.Make
(struct
    include Html5.M
    module Svg = Svg.M
end)

Similarly, if you intend the Html5_writer module to be Eliom compatible, then use Eliom's Html5.F.Raw and Svg.F.Raw modules as parameter:

module Html5_writer = Camlhighlight_write_html5.Make
(struct
    include Eliom_content.Html5.F.Raw
    module Svg = Eliom_content.Svg.F.Raw
end)

The package should be hitting the OPAM repository soon. Eliom users should beware that Camlhighlight requires Tyxml >= 3.2, whereas Eliom 4.0.0 requires Tyxml < 3.2. Therefore, should you want to use Camlhighlight in an Eliom application, you are advised to install the development version of Eliom (please see the Ocsigen site for instructions regarding Ocsigen's development repo for OPAM).

Thursday 7 August 2014

Announcing Bookaml 1.0

I'm happy to announce release 1.0 of Bookaml, a simple library for validating ISBNs, gathering information about a book given its ISBN, or finding any number of books matching given search criteria.

Bookaml is closely tied to the Amazon Product Advertising API, which it uses internally for retrieving book information. Therefore, if you intend to use the library's facilities beyond basic validation of ISBN numbers, you will need an Amazon Web Services account and associated access keys (these are freely obtainable, though registration is required).

(Note that though adding more features related to the Amazon Product Advertising API is an ever-present temptation, this is not the goal of Bookaml. You may find Bookaml's code a good starting point for that purpose, in which case I encourage you to fork the project. Bookaml itself should remain focused on its original goal.)

The core library depends on Batteries, Calendar, Cryptokit, and Ocamlnet's Netstring.

Bookaml is engine-agnostic on the matter of which XML parser and HTTP fetcher are actually used for contacting the Amazon servers and parsing the result. The intention was to be friendly towards monadic libraries such as Lwt or Async. Bookaml ships with two such engines, each having their own set of dependencies:

  • Bookaml_ocamlnet: Tyxml (for the XML parser) and Ocamlnet's Netclient (for HTTP fetching).
  • Bookaml_ocsigen: Tyxml (for the XML parser) and Ocsigenserver (for HTTP fetching). Note that this engine lives in the Lwt monad.

The API documentation is available online. Of particular interest are the Bookaml_ISBN module for handling and validating ISBNs (which may be used even without an AWS account), and the ENGINE module type, which documents the functions used for gathering book information and searching for books.

Another good place to get started is to peruse the small included example, which illustrates the use of the library from within an Ocsigen/Eliom server-side application (note that you will need actual AWS credentials if you intend to run this example).

Wednesday 28 May 2014

Announcing PG'OCaml 2.0

I'm happy to announce version 2.0 of PG'OCaml, a library offering type-safe access to PostgreSQL databases for OCaml applications.

This version no longer depends on Batteries, which hopefully will entice more Core users to give it a spin. Below is the list of the remaining changes, straight from the changelog:

  • Dario Teixeira and Jacques-Pascal Deplaix: fixing issues with arrays. This requires all array types to change from 'a array to 'a option array, which breaks backward compatibility.
  • Dario Teixeira's patch making PostgreSQL's NUMERIC type be converted to/from OCaml strings. This change is not backwards compatible, requiring a bump in the major version number (though there seems to be no actual code in the wild relying on the previous behaviour).
  • Dario Teixeira's patch adding function 'uuid', which exposes the unique connection handle identifier.
  • Jacques-Pascal Deplaix's patches adding 'catch', 'transact', 'alive', 'inject', and 'alter' functions.

Note that a couple of changes break backward compatibility, hence the new major version number. These changes were required to fix some long-standing issues, so I trust you'll be understanding.

This new release should be hitting OPAM soon. Alternatively, you can grab the source from the project's homepage. Happy hacking!

Tuesday 27 May 2014

Announcing Litiom 3.0

I'm happy to announce release 3.0 of Litiom, a small library aiming to complement Eliom, the web programming framework that is part of the Ocsigen project. Litiom is basically a collection of modules offering high-level constructs for Web programming.

Along with some minor adjustments so that Litiom works with the newly released Eliom 4.0, Litiom 3.0 takes advantage of the bump in major version number to make a long-sought reorganisation of the API. Yes, this means that Litiom 3.0 is not backwards-compatible with Litiom 2.1. On the bright side, porting code using the old Litiom to the new one is mostly a matter of search & replace, and should not require any major refactoring. Here's the changelog:

  • Split off choose related code into Litiom_choice module. As a consequence, functors and interfaces have been renamed.
  • In CHOOSABLE signature, rename elems to all for compatibility with Jane Street's Enumerate syntax extension.
  • Function choose now accepts an optional transform function as parameter (defaults to String.capitalize).
  • List of elements now of type t list (may be empty) instead of t * t list. Accordingly, parameter allowed in function choose is now also of type t list. (The function will raise Invalid_argument if provided with an empty list, however).
  • Check for value only once in function choose.
  • Function select now takes a required optional parameter, in line with changes in Eliom 4.0.

If you're porting code from Litiom 2.1 to 3.0 and you've never used the choose related code, then you'll just have to rename some modules/signatures/functors. For instance, signature Litiom_type.SIMPLE_BASE is now called Litiom_type.STRINGABLE, and functor Litiom_type.Make_simple is now just Litiom_type.Make. If on the other hand you used choose facilities, bear in mind that those have moved into their own Litiom_choice module. Therefore, the old functor Litiom_type.Make_choice is now Litiom_choice.Make.

The API documentation is available online, and includes some concrete examples to get you up-to-speed. Also, the package is already available in OPAM. Happy hacking!

Thursday 22 May 2014

Announcing CCSS 1.5

CCSS is a preprocessor/pretty-printer for CSS (Cascading Style Sheets), extending the language with support for variables and arithmetic. Version 1.5 is out now, and is already available in OPAM for your convenience.

This version features support for @keyframes (CSS3 animations). And of course, you may take advantage of CCSS's variables to make your keyframe declarations easier to maintain. Consider, for example, the source below:

Inflexion_point: 50%;

@keyframes test
{
    from                    {opacity: 0;}
    Inflexion_point         {opacity: 0.5;}
    Inflexion_point + 1%    {opacity: 0.6;}
    to                      {opacity: 1;}
}

And the corresponding output from CCSS:

@keyframes test
{
    from    {opacity: 0;}
    50%     {opacity: 0.5;}
    51%     {opacity: 0.6;}
    to      {opacity: 1;}
}

This version also brings a change to the internal AST representation that though minor, is bound to raise a few eyebrows, and thus requires a brief explanation. In CSS, the @charset at-rule is special in the sense that if present, it may only appear at the very start of the file. In previous versions, CCSS's AST encoded this requirement by treating @charset differently than other at-rules. In this new version, this is no longer the case. The end result is a cleaner and more concise AST, but it does mean that CCSS will now accept @charset statements beyond the start of the file.

Before you cry "regression", bear in mind that CCSS was never meant to be a standards-compliant CSS parser nor a CSS linter. The tool will happily accept semantically meaningless CSS while simultaneously rejecting perfectly valid CSS (it requires declaration blocks to always be terminated by a semicolon, for example, which the standard does not require). Therefore, making CCSS slightly less standards-compliant is a viable option if the upside is a simpler, saner, and easier-to-maintain grammar. Having said that, in practice you won't find many incompatibilities, and I'll happily fix any egregious ones.

Monday 7 October 2013

Announcing CCSS 1.4

I've recently released version 1.4 of CCSS, a preprocessor/pretty-printer for CSS (Cascading Style Sheets). Though CCSS most definitely falls into the category of software developed to scratch a personal itch, it has gained other users in the meantime. Consequently, I reckon a few words are in order concerning the features/idiosyncrasies of this tool.

For those not familiar with CCSS or CSS preprocessors in general, the basic rationale for these tools is to fill some glaring shortcomings in the vanilla CSS language, namely the lack of variables and basic arithmetic operations. Many tools go further still, providing new ways to structure and organise CSS declarations.

There is certainly no lack of CSS preprocessors available, most of which have large user bases and plenty of fancy features. So, why CCSS? Does the world really need yet another CSS preprocessing tool? I cannot speak for the world. I can, however, relate that some years ago I was in need of a CSS preprocessor that supported variables, arithmetic, and did not choke on the few CSS3 constructs I was introducing into my stylesheets. Moreover, I was looking for a nice, clean syntax that felt like a natural extension to vanilla CSS. Finally, I don't think it was unreasonable to expect a tool performing such a simple task to be fast. Unfortunately, the tools I tried back then did not quite meet these requirements, particularly where speed was concerned.

OCaml hackers being who they are, I reckoned that a better tool for my needs could be written in a weekend. And that's how CCSS was born. Fortunately, I was about right concerning the time estimate, which just goes to show how awesome the OCaml language and assorted tools (namely Menhir and Ulex) are for writing compilers.

This little history brings us to the most important point about CCSS that potential users should be aware of: it was never meant to be a fully-compliant superset of vanilla CSS. To illustrate this point, consider the humble semicolon, which according to the CSS spec is a declaration separator, and optional after the last declaration in a block. Omitting the last semicolon is, however, a brittle practice (it's all too easy to forget to add the semicolon when a new declaration is appended to the block), which is why users are advised to always write a semicolon at the end of each declaration, effectively treating it as a declaration terminator. Guess what? The CCSS grammar treats the semicolon as a mandatory terminator — spec be damned.

Variables are perhaps the most useful extension to CSS, and CCSS does support them. In previous versions, variables could only be assigned to expressions. The recently released version 1.4, however, introduces the possibility of assigning a whole declaration block to a variable. This feature is similar to what other preprocessors term mixins, so this is the nomenclature I'm using too.

The example shown below illustrates the use of variables to declare commonly used expressions. Note that syntactically, variable identifiers are distinguished by starting with a mandatory uppercase letter. This was chosen to make tooling (such as syntax highlighting in editors) easier.

Foo: 20em;
Bar: 1px solid black;

width: Foo;
border: Bar;

The second example concerns the use of variables to declare mixins, i.e., declaration blocks that can be included within subsequent declaration blocks:

color: #fff;
background: #000;
font-weight: bold;

CCSS also extends CSS expressions with basic arithmetic operations (addition, subtraction, multiplication, and division). The operands must be CSS quantities (either dimensionless or with an attached unit), or other expressions that ultimately resolve into a quantity. Moreover, variables whose value is a quantity (or an expression which resolves into a quantity) may also be used as operand.

The operators are '+', '-', '*', and '÷'. Note that multiplication and division have precedence over addition and subtraction, but you may use parentheses to group operations. In addition, the choice of the non-ASCII character '÷' as division operator betrays CCSS's origins as a tool designed to scratch a personal itch: it can be input with just a few easy-to-remember keystrokes in VIM, the editor I use. I had thus no motivation to find an ugly multi-character token to represent it. Consider thus the following input:

Left: 10em;
Right: 5em;
Total: Left + Right;

h1
{
    padding: (1.5em + 0.5em) * 2;
    width: 2 * Total;
}

CCSS will produce the following output:

h1
{
    padding: 4em;
    width: 30em;
}

The reader will have noticed that CCSS must be unit-aware when performing arithmetic. As a matter of fact, the program performs a basic sanity check on units, and will complain if you try, for example, to add "1em" to "10px". By default, CCSS makes no attempt to convert units even when they are convertible, such as "cm" and "mm". If you wish for CCSS to attempt unit conversion, provide the option "--convert" on the command line (short form "-c").

Units can be grouped into four categories, and conversion is possible only between units belonging to the same category. Upon conversion, the result is expressed in the unit of the first operand. The categories and corresponding units are as follows:

  • length: mm, cm, in, pt, pc
  • angle: deg, grad, rad
  • time: ms, s
  • frequency: hz, khz

As an illustration of unit conversion, the result of each of the following additions is the same, "2in":

h1
{
    foo1: 1in + 1in;
    foo2: 1in + 2.54cm;
    foo3: 1in + 25.4mm;
    foo4: 1in + 72pt;
    foo5: 1in + 6pc;
}

And that's it! Note that the project's development has recently moved to GitHub, so fork away!

Wednesday 24 October 2012

Announcing OCaml-bitcoin 1.0

OCaml-bitcoin is a library offering an OCaml interface to the original Bitcoin client API. It is a small and straightforward library, whose announcement would normally warrant nothing more than a quick post on the caml-list. However, given the controversy surrounding Bitcoin and my own conflicted view on the subject, I reckon some extra words are in order.

The problem with Bitcoin

TL;DR: it's the deflation thing.

Economics is still, at best, a protoscience. This remark is not meant to be snarky or dismissive of the entire discipline, however. Quite the contrary — it is simply an acknowledgement that grounding Economics in the great edifice of science is a tremendously complex effort, one which in all likelihood will require several more decades before the prefix proto even begins to lose its relevance. There are two main factors at play. First, the foundation upon which Economics must be grounded — Psychology — is itself still in the early stages of shedding its own proto prefix. Second, one must consider the enormous difficulty of conducting controlled experiments in Economics, particularly as the scale moves into the realm of Macroeconomics.

Take for example the seemingly simple problem of determining whether an electronic commodity with a fixed and finite supply can function as a viable currency — the Bitcoin problem, essentially. Some decades from now (for sufficiently large values of "some"), behavioural economics will have developed to the point where we are able to construct a highly sophisticated simulated human that closely models actual human economic behaviour, and we should also have the computing power to simulate the interaction of many millions of these "individuals". With such a tool in hand we will be able to grasp all manner of emergent phenomena that arise in an economy under different currency regimes. The Bitcoin problem and all its variations will become experimentally testable, and the current bickering over the feasibility of Bitcoin should become a settled matter.

We are very far from the sci-fi scenario described above, however, which is mostly why there are several different schools of economic thought with often diametrically opposed predictions and solutions (perhaps I'm being overly charitable towards economists as a class, though; I reckon their most significant disagreements have more to do with the blinders of ideology than the genuine fog of a protoscience). Personally, I find that the most compelling argument is made by those who argue against the viability of a deflationary currency. The reasoning is fairly simple: suppose Bitcoin starts to gain some traction, and becomes useful beyond the need to buy Alpaca socks. Being limited in supply, one should expect Bitcoins to increase sharply in value. However, Bitcoins are more than just run-of-the-mill scarce, as total production is hard-capped at about 21 million coins (a number which should be reached around 2140, though close to 90% of those coins will have been generated before this decade is out). Moreover, once attrition is factored in, the number of usable Bitcoins will actually decrease slowly from that maximum. Now, if you hold a commodity which is increasing in value and has a fixed supply, you could not be faulted for guessing that its value can only continue to rise, and therefore you would choose to hold on to your Bitcoins instead of spending them. In addition, such a commodity becomes an attractive choice for speculation — people will acquire Bitcoins for the sole purpose of later selling them at a higher price, without any intention of actually spending them on goods or services. To give a concrete example, suppose you have acquired some Bitcoins and find yourself in a future where Bitcoin is gaining traction and increasing in value. Furthermore, you intend to purchase socks from some website offering either Bitcoins or Euros as payment options. Which would you choose? I reckon most people would opt to spend the currency that slowly decreases in value (the Euro) over the one that is gaining value every day.

Why is the above scenario a bad thing for Bitcoin? First, because if people are stashing Bitcoins under the mattress instead of actually using them as the preferred means of payment on the Internet, then Bitcoin will have failed at one of its primary goals, that of revolutionising online payments. I realise there may be some for whom Bitcoin's primary goal is instead that of providing a convenient store of value. For them, Bitcoin failing as a currency is not much of an issue, since they see Bitcoin's usefulness as a currency merely as a welcome side effect. Besides this being a minority view (seriously, who wasn't attracted to Bitcoin primarily because of its potential as a truly revolutionary currency?), I think its premise rests on very shaky ground. Which brings me to my second point: I think it will be impossible to dissociate Bitcoin's value from its usefulness as a currency. If people aren't actually using Bitcoins for online purchases, fewer websites will have an incentive to keep supporting it, and Bitcoin's usefulness as a currency will decrease. Moreover, a currency whose value changes quickly — even if that change is positive — is a pain to handle. The likely overall result would be a crash in confidence in Bitcoin, and a consequent sudden loss in value. Obviously, supporters of Bitcoin's economic model argue against this outcome. For them, if Bitcoin's value were to start rising so fast as to endanger its usefulness as a currency, then counteracting forces would quickly nip the bubble in the bud, forcing a correction and ensuring a fairly stable valuation over time (though still slowly increasing over the long term).

At its heart, the disagreement about Bitcoin can therefore be reduced to different estimations of the nature of the forces acting to push up or bring down the value of the currency. Bitcoin's supporters argue that these forces form a negative feedback loop, and thus Bitcoin should enjoy the stability required to become a feasible currency and store of value. In the opposite corner, Bitcoin's detractors argue that once we factor in the "madness of crowds" and the human tendency towards herding behaviour, the Bitcoin system has an inherent tendency towards positive feedback and sudden state reversals: anything but stability. In other words, boom-bust cycles are part of the nature of deflationary currencies.

As I've already stated, I find the detractors' argument more compelling. Therefore, should Bitcoin start to get some traction and mainstream attention, I expect we will see the formation of an impressive bubble followed by a spectacular crash. Also to be expected during the ramp-up stage are pronouncements that "this time is different" and the customary denials of the existence of a bubble. In many ways a repeat of the events of the Summer of 2011, but larger in scale. We humans are a sorry lot.

At this point the reader may be asking: "If you really think Bitcoin has such a serious flaw, why support it at all? Moreover, you cannot fault people for construing the release of this library as a tacit endorsement, now can you?" My answer is that the Bitcoin idea has enough potential to warrant support, even if in its current incarnation it may be doomed. Moreover, should Bitcoin fail, it is crucial that it fails on its own (de)merits. The alternative — Bitcoin failing due to being outlawed, for example — would rob us of a valuable lesson in economics. Finally, I still hope that more mainstream visibility and the influx of newcomers may tip the balance of the Bitcoin community in such a way that those who take the potential fatal flaw of a deflationary currency seriously are no longer a minority.

Even if you disagree about my assessment, I hope I have at least conveyed the reason why I'm genuinely curious about the outcome of the great Bitcoin experiment, and thus why I would very much like to see it played out in full. Though I may be leaning towards the "Bitcoin cannot work" camp, I've also made clear that I'm fully aware of how difficult it is to predict the behaviour of billions of complex entities interacting with each other. In other words, it would not surprise me greatly if it turned out that I was wrong and that Bitcoin could work after all.

On the upside: Bitcoin is transparency friendly

One underappreciated advantage of Bitcoin is the transparency it enables. Suppose an organisation wanted to open its books to public scrutiny; the aim might be to assure donors that every donation is in fact received, or to guarantee to the tax authorities that all income is accounted for. With Bitcoin, the organisation simply has to make public a master address to which all incoming payments are forwarded. Any donor or client can then use the blockchain to verify that their payment reached the intended destination. Similarly, tax authorities can verify compliance by making random anonymous purchases and checking whether those transactions are properly forwarded to the public master address.

The scheme outlined above does not even require that all payments use the same address (which would be an accounting nightmare). Instead, the organisation can generate a different receiving address for each transaction, with the extra provision that after a certain period (say, one day), all the coins received in that transaction would be used as inputs to another transaction whose target address is the public master address. Though this adds one extra layer of indirection, it still makes it trivial to verify that any given payment is accounted for.

(Note that the low-level control over which transaction outputs are used as inputs for new transactions is a feature introduced by the Raw Transactions API in the recently released Bitcoin 0.7. OCaml-bitcoin has full support for this new API, though users are advised to stand clear of it unless they have a good understanding of the Bitcoin system.)

To say that there is an (infantile) level of anti-tax and anti-government sentiment on the main Bitcoin Forum would be an understatement. So yes, the irony that Bitcoin could become the tax authority's best friend has not escaped me, and this is a realisation that I find most amusing.

Technical notes concerning OCaml-bitcoin

Behind the scenes, OCaml-bitcoin works by making JSON-RPC calls over the network to a running Bitcoin daemon offering the official client API. The obvious alternatives would have been to bind to a C library such as Libbitcoin, or to build in OCaml a full-blown implementation of a Bitcoin node. The former is an uncomfortable proposition until such third-party libraries have been vetted by time, whereas the latter — though without question a very interesting project which I hope someone will embark on — would have required a time investment orders of magnitude larger than what I had allotted to this side project.

Parsing and outputting JSON is handled via Yojson. For the most part, the fact that JSON-RPC is the method of interfacing with the Bitcoin client is hidden from users of OCaml-bitcoin. Ideally, it ought to be completely hidden; the only reason it is not is that this first release does not convert all complex JSON objects into proper OCaml records. This seemed a prudent choice given that the official client's API is still in flux, but it may be fixed in a future release.
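As a rough illustration of what Yojson handles under the hood, the snippet below builds and serialises a JSON-RPC request of the general shape the Bitcoin daemon expects. The exact field set is illustrative, not a copy of OCaml-bitcoin's internals.

```ocaml
(* Illustrative only: a JSON-RPC request of the general shape sent to the
   Bitcoin daemon, built and serialised with Yojson.  The fields shown here
   are assumptions for the sake of the example. *)
let request =
  `Assoc
    [ ("method", `String "getblockcount");
      ("params", `List []);
      ("id", `Int 0) ]

let () = print_endline (Yojson.Safe.to_string request)
```

Building the payload as a polymorphic-variant value and letting Yojson handle escaping and serialisation is considerably less error-prone than assembling JSON strings by hand.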

The API offered by OCaml-bitcoin is nearly identical to the original Bitcoin client API. The function names are the same, and so is the set of mandatory/optional parameters. The main difference is that a couple of calls have been split into two separate calls each: getrawtransaction is split into getrawtransaction proper and getrawtransaction_verbose, whereas getwork is split between getwork_with_data and getwork_without_data. The reason for the split is that the original calls have very different behaviour and results depending on the presence of an optional parameter, and therefore in my opinion they should have been different calls to start with. Note that OCaml-bitcoin's API documentation includes a fairly complete description of each call, and fixes some of the mistakes and ambiguities found in the Bitcoin wiki. You may find it a useful resource even if you have no intention of using OCaml-bitcoin.

It should be noted that the official Bitcoin client parses incoming JSON-RPC calls in a brain-damaged manner. Essentially, all parameters are parsed by position, including optional parameters. This means that if you want to provide the optional parameter at position n, then you are obliged to also provide the optional parameter at position n-1, even if you are happy with the latter's default value. I tried to hide this brain damage from the OCaml-bitcoin API by internally using default values equal to those assigned by the Bitcoin client. There are however two calls where this is simply not possible: listsinceblock and signrawtransaction. If you wish to provide optional parameters to these functions make sure you obey the n requires n-1 rule, or you'll get a runtime error.
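To make the "n requires n-1" behaviour concrete, here is a small sketch of how labelled optional arguments end up flattened into the positional params array. The parameter names and defaults are made up for illustration; they are not OCaml-bitcoin's actual signatures.

```ocaml
(* Hypothetical sketch: serialising two optional parameters into the
   positional array the Bitcoin client expects.  Because parameters are
   matched by position, supplying the second optional forces the first one
   to be sent too, baked in with the client's documented default. *)
let positional ?(verbose = false) ?limit () =
  match limit with
  | None -> [ `Bool verbose ]            (* trailing optionals may simply be dropped *)
  | Some n -> [ `Bool verbose; `Int n ]  (* position 2 drags position 1 along *)

let () = assert (positional ~limit:10 () = [ `Bool false; `Int 10 ])
```

This is exactly the hiding of defaults described above; it breaks down only for calls such as listsinceblock and signrawtransaction, where no safe default can be chosen internally.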

A minor rant concerning the use of floats for monetary values is also in order. Though internally the official Bitcoin client uses 64-bit integers for amounts (the integer value being the multiple of the base BTC unit, which is equal to 10 nanoBTC and informally called a "Satoshi"), the designer of the JSON-RPC API decided to convert these amounts into floats. Though there may be enough precision for this not to be much of an issue if proper care is taken, it still opens the door to improper use. Moreover, since users of the API are bound to use internally either 64-bit integers (the choice I made for OCaml-bitcoin) or some special fixed-decimal type, the floats will have to be converted back into some sensible format anyway. Therefore, the decision to choose floats for serialising BTC amounts strikes me as a poor one.

The actual communication with the Bitcoin client requires an implementation of HTTP's POST method on the OCaml side. While for many applications it would be convenient if OCaml-bitcoin were to provide a Lwt-friendly API, for others being forced to live under Lwt's monad would be an unwelcome outcome (for the purposes of this discussion, you may replace all references to "Lwt" with "Async"). For this reason, I decided not to impose any actual HTTP client implementation. Instead, the module offering the API is available as the result of a functor which takes as parameter a module implementing the basic POST method (of module type Bitcoin.HTTPCLIENT). This module must also define the monad under which the API calls are wrapped. Should no actual monad be required, then the identity monad can be declared, and thus users of the API need not be burdened by having to "live inside the monad".
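The functor arrangement just described can be sketched as follows. The module and function names here are simplified stand-ins, not OCaml-bitcoin's exact signatures:

```ocaml
(* Simplified sketch of the pattern: the API functor is parameterised over
   an HTTP client, which also fixes the monad wrapping every call. *)
module type HTTPCLIENT =
sig
  module Monad :
  sig
    type 'a t
    val return : 'a -> 'a t
    val bind : 'a t -> ('a -> 'b t) -> 'b t
  end
  val post_string : host:string -> port:int -> string -> string Monad.t
end

(* A blocking client built on the identity monad: callers of the resulting
   API never notice a monad at all. *)
module Blocking =
struct
  module Monad =
  struct
    type 'a t = 'a
    let return x = x
    let bind x f = f x
  end
  let post_string ~host:_ ~port:_ request = "reply to " ^ request  (* stub *)
end

module Make (Client : HTTPCLIENT) =
struct
  let getblockcount () =
    Client.Monad.bind
      (Client.post_string ~host:"127.0.0.1" ~port:8332 "getblockcount")
      Client.Monad.return
end

module Api = Make (Blocking)
let () = print_endline (Api.getblockcount ())  (* plain string, no monad in sight *)
```

An Lwt-based client would instead instantiate Monad with Lwt's type and operations, and every API call would then return a promise, without any change to the functor body.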

For convenience's sake, OCaml-bitcoin ships with two concrete implementations of Bitcoin.HTTPCLIENT. The module Bitcoin_ocamlnet is based on OCamlnet's Http_client, whereas Bitcoin_ocsigen makes use of Ocsigen's Ocsigen_http_client module. Note that the latter implementation uses the Lwt monad.

As for keeping track of connections, the usual solution is to have a function that creates a new connection handle, and have this handle be a mandatory parameter to any subsequent calls. This is not the solution used by OCaml-bitcoin. Instead, there are two different mechanisms by which you can configure the connection. First, since the module offering the API is already created via a functor, it makes sense to use this same mechanism to also indicate the connection endpoint. Obligingly, the functor that creates the API module actually takes two parameters: the previously described Bitcoin.HTTPCLIENT, and a Bitcoin.CONNECTION declaring a single value default of type conn_t option which if set defines the default connection. Second, each call accepts an optional parameter conn containing the connection endpoint information. If present, this parameter overrides any default connection, and is mandatory if no default connection was provided upon the functor invocation.
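The two configuration mechanisms can be sketched as below, again with simplified types and names rather than OCaml-bitcoin's exact signatures:

```ocaml
(* Simplified sketch of connection configuration: a functor-level default
   plus a per-call ?conn override.  Types and names are illustrative. *)
type conn_t = { host : string; port : int }

module type CONNECTION =
sig
  val default : conn_t option
end

module Localhost =
struct
  let default = Some { host = "127.0.0.1"; port = 8332 }
end

module Make (Conn : CONNECTION) =
struct
  (* ?conn overrides the functor's default; with neither, the call fails. *)
  let getbalance ?conn () =
    match conn, Conn.default with
    | Some c, _ | None, Some c -> Printf.sprintf "querying %s:%d" c.host c.port
    | None, None -> invalid_arg "no connection configured"
end

module Api = Make (Localhost)
let () = print_endline (Api.getbalance ())
```

Applications talking to a single daemon bake the endpoint in once at functor-application time, while applications juggling several daemons pass ~conn explicitly on each call.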

To conclude, note that the test directory contains OUnit-based unit tests. The tests are not comprehensive — the nature of the Bitcoin network requires a fairly long wait before one can be certain a transaction occurred as intended — but they do cover a fair amount of important functionality. Obviously, the tests expect to be run on Bitcoin's Testnet, and will abort on startup if they find themselves being used on the main network. Just make sure you have at least half a dozen coins in your Testnet balance, as these will be sent around (mostly back towards your own wallet, though you may lose some change to fees). You can obtain Testnet coins for free from the faucet.