nLeyten

Use Ocaml, save the world


(Note: this post is written in a tongue-in-cheek manner; take it too seriously at your own peril).

Since this blog is syndicated in the Ocamlcore Planet, readers may be wondering about the implications for our favourite language of the coming energy crisis I just wrote about. And in general, just how well are different computer languages adapted to an energy-constrained economy?

There are two different factors to consider: a) how efficient is the language at converting CPU cycles into useful work, and b) how efficient is the language at converting programmer time into useful programmes.

The importance of the first factor should be self-evident. If not, note that computers consume an increasing percentage of our energy, and that data centres all over the world also require a significant amount of power. Moreover, there is a wide disparity in the amount of time different languages require to complete the same task. This difference boils down to different approaches to programme execution (compiled languages tend to be more efficient than interpreted ones) and the quality of the code generated by compilers. The Computer Language Shootout, though not without its flaws, provides some ballpark figures on the efficiency of different languages. As you can see, Ocaml fares very well on the existing benchmarks.

The second factor, though it may not be obvious at first sight, may indeed be even more relevant to this discussion. Programmer time requires energy not only to sustain the Homo sapiens specimen that does the programming, but also because programming is to a large extent done in front of an energy-hungry computer. Therefore, by making the programmer more productive, a computer language can result in significant energy savings.

Measuring language productivity is a difficult task. Nevertheless, I would like to posit that the availability of a strong static type checker such as Ocaml's (one that catches most common mistakes at compile-time) is a plus in this regard. Even more so if we consider runtime errors as a loss of productivity. Another good proxy for language productivity is the terseness of its programmes. Again, the Computer Language Shootout may be of assistance.

So, how well does Ocaml fare when these two factors are taken into account? Very well, I am pleased to say. In fact, using some reasonable parametrisation, the results of the Shootout show Ocaml as the top rated language. We can therefore conclude that Ocaml will also have a role to play in saving civilisation from an energy crisis.

Peak Oil revisited


More than a year ago I posted here my ramblings on the possibility of world oil production being at, or close to, its peak. At the time, the topic of "Peak Oil" was still rather obscure, and there were still many advocating that the world's production capacity could continue to grow for decades to come. They were the ones saying that the price would soon drop down to $20 or $30 a barrel. Needless to say, since then events have precipitated, and just last week the price of oil jumped above the psychological barrier (for computer scientists, anyway!) of $127 a barrel (127 being of course the largest integer quantity that can be expressed with only 7 bits). It seems like a perfect time to revisit and update those predictions.

On the whole, I have nothing to retract from what I wrote 14 months ago. I may have been overly optimistic in some areas (more on that below) and events seem to be happening faster than I expected, but I still expect the picture to remain more-or-less the same. While I may now be slightly less optimistic about our prospects for the coming couple of decades, overall I still think that the transition away from fossil fuels will occur without the catastrophic collapse of civilisation presaged by those with an inclination for doom.

Note, however, that these are far from being cornucopian predictions. If reality scares you, perhaps you should be visiting this site instead.

Is the peak real and happening now?

In other words, how can we be sure the rise in prices reflects a real constraint on the production side instead of just speculation and/or conscious throttling on the part of exporting nations in order to drive up revenue? Moreover, even if the constraints are real, couldn't this just be a temporary situation caused by a past underestimation of the current demand needs (bear in mind the huge rise in demand in the past few years by China and others), and that everything will be resolved as soon as production catches up?

Well, regardless of the actual numbers, we should reflect anyway on the utter and complete absurdity of having these crucial statistics about the wealth of the world economy be dependent on whims and state secrets of despotic regimes (Saudi Arabia, for one). This is a shortsightedness for which we could pay very dearly.

In any case, what leads me to believe we are now on the peak's "bumpy" plateau is the following:

  • Discoveries peaked in the 60s and despite all technological advancements, have been declining ever since. Recent discoveries (despite all the media attention they get) are pathetic in comparison to the fields discovered decades ago, and are typically made off-shore in very deep waters (read expensive, dangerous, and desperate locations). This fact alone shows that oil production is bound to peak sooner or later, though of course does not prove it is actually occurring now.
  • The number of countries past their peak keeps increasing. Three well-known recent examples are Mexico, Norway and the United Kingdom. The latter went in less than a decade from being a major exporter to a net importer of oil; as for Mexico, predictions are that in less than ten years it will cease exports altogether (bad news for the US, which imports a huge amount of oil from its southern neighbour).
  • The two big exporters, Saudi Arabia and Russia, seem unable to increase their production. Many argue they may also be close to peaking.
  • There are indeed a couple of oil megaprojects coming online in the next couple of years, but afterwards the situation seems dire. We seem to be now in what has been called a "Peak Lite" scenario. This means that even though the absolute peak may not yet have been reached, the oil industry is starting to show the symptoms of a post-peak situation. Namely, production is unable to keep up with demand, leading to a significant increase in prices.
  • Currently, world production sits at about 87 million barrels per day. If you take into account the new projects coming online, and the rates of depletion in existing fields, it is possible that this number may yet increase slightly within the next couple of years. However, I very much doubt it will ever reach 90 mbpd. Couple this with the ongoing increase in demand and the Export Land Model (more on that below), and I would bet against oil ever going below $100 a barrel (barring a major global recession, of course).

Aggravating circumstances

  • I didn't factor in the so-called "Export Land Model" (ELM), but in retrospect I should have. In short, because the internal demand for oil in the big exporting countries is increasing at a fast pace, the amount of oil available for export decreases at a much faster rate than what could be expected from depletion alone. This means that once past the peak, the decline in oil available for export may be much steeper than anticipated. In other words, the transition away from fossil fuels could feel more like shock therapy than a smooth ride.
  • Society is still by and large in denial. Moreover, most people (including politicians and journalists) are completely clueless about basic principles of thermodynamics and the orders of magnitude involved in energy discussions. To give you a concrete example that illustrates most people's complete detachment from the workings of the physical world, some weeks ago a leading Portuguese news magazine published a vision of society in 15 years time; they talked of "cars running on wind power", and showed a picture of a tiny wind turbine on top of a futuristic-looking car! It would almost be funny if the consequences of this kind of thinking were not so tragic.
  • Energy Return on Energy Invested (EROEI) is a problem that goes beyond oil. Not only do fossil fuels offer relatively cheap and very dense energy, but you get a huge return for each unit of energy you invest in extracting them. In the early days of the oil industry, when extraction was as simple as digging a hole in the backyard, EROEI was as high as 100:1. Today, because drilling at huge depths and off-shore is quite energy-intensive, the EROEI for oil has decreased to 20:1 or 15:1. And note that for renewables and nuclear, the EROEI is even much lower than that. (As a side note, take heed that scholars have pointed out that a decrease in EROEI was a contributing factor in the decline of the Roman empire. One of the most interesting interviews I've listened to recently addresses this issue: it features Thomas Homer-Dixon discussing his latest book, "The Upside of Down"; I very strongly recommend listening to it).
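
The Export Land Model arithmetic is easy to play with. What follows is a minimal sketch, not a forecast: the exporter is hypothetical, and every figure in it (starting production and consumption, the 5% decline and the 2.5% growth rates) is an assumption chosen purely for illustration.

```ocaml
(* Hypothetical exporting country. All figures are illustrative
   assumptions, in million barrels per day (mbpd). *)
let production0 = 2.0    (* initial production *)
let consumption0 = 1.0   (* initial domestic consumption *)
let decline = 0.05       (* production falls 5% per year *)
let growth = 0.025       (* domestic consumption grows 2.5% per year *)

let production year =
  production0 *. ((1.0 -. decline) ** float_of_int year)

let consumption year =
  consumption0 *. ((1.0 +. growth) ** float_of_int year)

(* Net exports are whatever is left once domestic demand is met. *)
let exports year = max 0.0 (production year -. consumption year)

let () =
  for year = 0 to 10 do
    Printf.printf "year %2d: production %.2f, consumption %.2f, exports %.2f\n"
      year (production year) (consumption year) (exports year)
  done
```

Under these assumptions production is still above 1.5 mbpd after five years, yet exports have already dropped to less than half of their starting value. That is the whole point of the model: the importing countries feel the decline much sooner and much harder than the production figures alone would suggest.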

Consequences for industry, commerce, and tourism

I won't repeat what I wrote last year. I merely wish to emphasise a few points that I find most relevant:
  • Locality will become much more important. With transportation costs rising significantly, many current assumptions will no longer be valid. This will have impacts on industry (suddenly it won't make as much sense to produce cheap plastic toys in China), agriculture (such as importing out-of-season fruit half-way across the globe), and of course, tourism. I think you can find countless other examples where the assumption that transportation costs are near zero has shaped our economy.
  • The airline industry as we know it is seriously screwed. Note however that "screwed" is not the same as "doomed to extinction", as many are quick to predict. Fares are still ludicrously cheap by historical standards, and even if five years from now they are 3-4 times higher, there will still be a sizable amount of air travel going on. Nevertheless, air travel will take a serious hit from the effects of Peak Oil, and I reiterate that current plans for expanding and building new airports will look very foolish in less than ten years' time.

    In addition, the travel industry will undoubtedly adapt and many of its facets will look fairly different in 10 or 15 years from now. Part of the adaptations will be purely technological (as an example, consider that turboprop planes are more fuel efficient than jets), others will come from purely a business choice (the "no frills" concept will go away, because in a context of much higher fares due to fuel costs, the cost of frills will be proportionally tiny, thus giving an advantage to companies that do provide them), and others may stem from a rethinking of the very concept of travel. Consider for example the possibilities offered by the eventual return of the airship (as in the "Hindenburg", hopefully minus the fire and oh-the-humanity). Presently, the "getting there" part of air travelling consists of just the ordeal of having to sit in a confined space for hours on end. What if instead the holiday trip were to start the moment you stepped on a comfortable and spacious airship? It wouldn't matter so much if a trip took a couple of days instead of a few hours.

  • I doubt that the economy will fare well during the first 5-10 years of the transition. The scenario I find most likely is a return of the stagflation that plagued the 1970s. However, note that the economy doing badly for an extended period of time is not the same as catastrophic economic collapse. Last time I checked, there were no zombie hordes roaming the streets during the 1970s (though they would explain the whole Disco phenomenon).

Winners and losers

In the grand scope of things we'll all be losers. Even if somehow an energy-rich country were able to isolate itself from the events on the rest of the world, it would still lose simply because the misfortune of any single country reflects the moral collective failure of all humanity.

With the above caveat in mind, some countries are without a doubt better prepared than others to face an energy crisis. Here's my take on some noteworthy examples:

  • I stick to my previous observation that China will suffer serious growing pains. I realise that the consensus among many is that this will be "the century of China", but I beg to differ. In the short term, the rise in oil prices will impact its status as the "world's factory" (note that its export-based economy is dependent on the assumption that transport costs are negligible). In the longer term, China faces a serious sustainability crisis. The challenges of the energy crisis will combine with the effects of climate change, soil erosion, and aquifer depletion. Bear in mind that this is a country that needs to feed 1.3 billion people.
  • The picture for India is similar to that of China, though India's environmental problems are not yet quite as severe. On the other hand, there is the further complication that India's population time bomb is far from being under control.
  • Most of the developing world is in serious trouble. Note, however, that the energy crisis is only one small part of the problems faced: climate change, environmental degradation, and — the elephant in the room — overpopulation. Many countries are indeed just disasters waiting to happen (Haiti comes immediately to mind; it's just one hurricane away from catastrophe). Also, I wouldn't be surprised to see Somalia-style collapses of individual states.
  • Countries rich in resources and with fairly low population densities are likely to fare relatively well. Brazil and Canada come to mind. Note, however, that I would not include Australia in that list. Though rich in energy resources, Australia is bound to face problems stemming mainly from the degradation of its very fragile environment and the effects of climate change.
  • Europe as a whole should fare relatively well, though there are of course stark differences in the degree of preparedness of different countries. France, thanks to its reliance on nuclear power, extensive train network, and ongoing projects to build and expand tram and light rail lines is an example to follow. The United Kingdom, on the other hand, will be a victim of years of neglect in its public transit system (particularly railways), and its electric power production capacity. In short, problems stemming from what has been termed "The Anglo Disease".
  • As it currently stands, the United States is very badly prepared for an energy crisis. I would really, really, like to think that Americans will soon wake up and put their ingenuity to work. They would embark on a "Manhattan plan" for fuel alternatives, abandon the suburbia experiment, and redesign their cities in a public-transit and walk-friendly manner. A friend of mine likes to remind me of a quote by Churchill, who said that "You can always count on Americans to do the right thing — after they've tried everything else".

    Unfortunately, recent history does not bode well for the type of reaction we can expect from the US. And note that I'm not just referring to the invasion of Iraq (is there still anyone who maintains the illusion that its purpose was not to take control of the sizable Iraqi reserves?). No, that is simply a reflection of the path Americans chose to take in the 1980 election. Back then, they were given the choice between a candidate aware of the future energy crisis and who advocated some near-term sacrifices to move their economy away from fossil fuels (that candidate was Jimmy Carter), and a candidate who instead told the Americans to screw conservation, that it was "morning in America", and that it felt much better to simply bury their heads in the sand and not to worry about the future. The latter candidate was Ronald Reagan, who, as you might recall, won by a landslide. (On a related note, it is disheartening to see Hillary Clinton, otherwise the most intellectually capable of all three major candidates, follow the short-sighted and populist gimmick proposed by McCain of dropping the gas tax for the holidays).

    Note, however, that I wouldn't be so quick to dismiss the US entirely. It's a huge country, rich in resources (one can sometimes forget that even though it is well past its peak, the US is still the third largest producer of oil in the world), a major agricultural exporter, and with a population density that is sufficiently low to make the country sustainable in the long term.

    In a sense, the biggest challenges the US faces are a crooked political system, and a blind faith in capitalism and the invisible hand of the markets. The biggest danger for the US (and the rest of the world) is that the energy crisis is successfully exploited by the usual ensemble of populist and extremist politicians. Despite this, I reckon that Americans will eventually do the right thing, since they've been trying everything else up till now, after all.

    (A brief note on populism: the energy crisis we're facing is in large part a consequence of past populist and short-sighted policies; it is somewhat ironic and definitely discouraging to note that the very same populism that got us into trouble in the first place is the one that typically profits the most from situations of crisis).

In the short term

Though there may indeed be a speculative component to the current price, and therefore some easing won't be surprising, I very much doubt oil will ever again get below $100 (again, barring a major recession).

OPEC will have their annual meeting in September. If any, the increases in production announced will be small and insufficient to meet the rise in demand. A couple of months later, the International Energy Agency (the watchdog organisation over matters of energy) is scheduled to issue a report concerning oil reserves. I expect their current rosy predictions that production could grow all the way to 2030 to be revised. This is likely to be the moment that "Peak Oil" and "Energy Crisis" enter the mainstream media. Hopefully not too long afterwards, the true nature of the problem (that these are just aspects of a much broader "Sustainability Crisis") will become the focus of society's concerns.

Oh, the humanity!


Labelled and optional arguments, and polymorphic variants rank among Ocaml's most interesting features. They can lead, however, to some pretty funky error messages. Here's one example:

File "main.ml", line 160, characters 38-44:
Error: This expression has type
         (unit, unit,
          [ `Attached of
              [ `Internal of ([< `Coservice | `Service ] as 'a) * [ `Get ] ]
              Eliom_services.a_s ],
          [< Eliom_services.suff ] as 'b, 'c, unit, [ `Registrable ])
         Eliom_services.service ->
         (Eliom_sessions.server_params ->
          ('d, Litiom_blocks.out_t) Litiom_blocks.t ->
          Eliom_predefmod.Xhtml.page Lwt.t) ->
         (unit, unit,
          [ `Attached of
              [ `Internal of [< `Coservice | `Service ] * [ `Get ] ]
              Eliom_services.a_s ],
          [< Eliom_services.suff ] as 'e, 'f, unit, [ `Registrable ])
         Eliom_services.service ->
         (Eliom_sessions.server_params ->
          ('g, Litiom_blocks.out_t) Litiom_blocks.t ->
          Eliom_predefmod.Xhtml.page Lwt.t) ->
         carry:unit ->
         Eliom_sessions.server_params ->
         (unit, int,
          [> `Attached of
               [> `Internal of [> `Coservice ] * [> `Post ] ]
               Eliom_services.a_s ],
          'e, 'f, [ `One of int ] Eliom_parameters.param_name,
          [> `Registrable ])
         Eliom_services.service
       but is here used with type
         (unit, unit,
          [ `Attached of [ `Internal of 'a * [ `Get ] ] Eliom_services.a_s ],
          'b, 'c, unit, [ `Registrable ])
         Eliom_services.service ->
         (Eliom_sessions.server_params ->
          ('d, Litiom_blocks.out_t) Litiom_blocks.t ->
          Eliom_predefmod.Xhtml.page Lwt.t) ->
         carry:unit ->
         Eliom_sessions.server_params ->
         (unit, 'h, [< Eliom_services.post_service_kind ],
          [< Eliom_services.suff ], 'i,
          [< int Eliom_parameters.setoneopt ] Eliom_parameters.param_name,
          [< Eliom_services.registrable ])
         Eliom_services.service

Obvious, isn't it?

Simple benchmarks on the Ocsigen server


I have looked for a web framework shootout similar to the one that exists for computer languages. Sure, performance is hardly the deciding factor when choosing a web framework, and benchmarks don't say anything about features, robustness, and security. Note, however, that the same can be said about programming languages in general and that does not prevent people from using the shootout in pissing contests. I can, for example, easily place Ocaml on top of that list just by giving a larger weight to the gzip size of the programme (a choice not altogether arbitrary: bear in mind that gzip size is a good indicator of a language's expressiveness; the smaller, typically the more expressive a language is).

Another important caveat to consider is that the bottleneck for the typical web application lies on the database side, not on the webserver. In many cases, the webserver does little more than to fetch and to do minimal processing on the results provided by the database.

Anyway, though I can't make comparisons between these numbers and those from other frameworks using other languages (who in their right mind would want to use PHP?), it is nevertheless interesting to have an idea of how well Ocsigen applications perform. I am therefore sharing the results of some simple tests I ran on two different machines: an old Celeron running at 500 MHz (a very slow machine by today's standards), and a modern Intel Pentium Dual E2160 running at 1.80 GHz. Note that architecture-wise, the former machine is an x86, while the latter is an AMD64.

All of the tests concern the generation of dynamic content using Ocsigen's Eliom. Though test 1 always produces the same result and could therefore easily be cached, that has not been done. The idea was to test the most demanding operation for a web application: the dynamic generation of pages (typically personalised for individual users). Also, there are two sets of results for each test; one running Ocsigen in bytecode, and another using native code. Note that because Ocsigen relies on dynlink, and Ocaml 3.10 does not support the dynamic linking of native code, I had to use the Ocaml CVS HEAD for the tests. Native dynlinking will arrive with Ocaml 3.11 (coming this summer?).

I used Siege to perform the actual benchmarking. Each test was performed with 10 concurrent client threads, and ran for 30 minutes. Siege was executed on the same machine where the Ocsigen server was located. The results are presented in terms of the number of transactions performed per second. The higher the better, of course.

Test 1: simple page

This service takes no parameters and simply outputs a page with a "bench1" header. This is the Eliom handler that creates the page:

let bench1_handler sp () () =
  Lwt.return
    (html
      (head (title (pcdata "bench1")) [])
      (body [h1 [pcdata "bench1"]]))

And here are the results obtained:


            Celeron    Pentium Dual
Bytecode     110.41          590.49
Native       197.32         1461.10

Test 2: arithmetic on integer parameters

This service takes two integers as GET parameters, and outputs the result of some common arithmetic functions performed on those numbers:

let rec gcd a = function
  | 0 -> a
  | b -> gcd b (a mod b)

let lcm a b = (a * b) / (gcd a b)

let bench2_handler sp (a, b) () =
  Lwt.return
    (html
      (head (title (pcdata "bench2")) [])
      (body
        [
          h1 [pcdata "bench2:"];
          p [pcdata (Printf.sprintf "%d plus %d is %d" a b (a+b))];
          p [pcdata (Printf.sprintf "%d minus %d is %d" a b (a-b))];
          p [pcdata (Printf.sprintf "%d times %d is %d" a b (a*b))];
          p [pcdata (Printf.sprintf "%d divided by %d is %d" a b (a/b))];
          p [pcdata (Printf.sprintf "%d mod %d is %d" a b (a mod b))];
          p [pcdata (Printf.sprintf "The GCD of (%d,%d) is %d" a b (gcd a b))];
          p [pcdata (Printf.sprintf "The LCM of (%d,%d) is %d" a b (lcm a b))]
        ]))
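
For those who want to poke at the arithmetic without setting up Ocsigen, here is a small standalone sketch. The gcd and lcm functions are copied from the handler code; divide and raises_div_by_zero are hypothetical helpers added only for this illustration.

```ocaml
(* Same definitions as in the benchmark code. *)
let rec gcd a = function
  | 0 -> a
  | b -> gcd b (a mod b)

let lcm a b = (a * b) / (gcd a b)

(* Hypothetical helper: plain integer division, as used by the handler. *)
let divide a b = a / b

(* Hypothetical helper: run f and report whether it raised Division_by_zero. *)
let raises_div_by_zero f =
  try ignore (f ()); false with Division_by_zero -> true

let () =
  Printf.printf "gcd 12 18 = %d\n" (gcd 12 18);
  Printf.printf "lcm 4 6 = %d\n" (lcm 4 6);
  Printf.printf "7 / 0 raises: %b\n"
    (raises_div_by_zero (fun () -> divide 7 0));
  Printf.printf "lcm 0 0 raises: %b\n"
    (raises_div_by_zero (fun () -> lcm 0 0))
```

The last two lines show why a zero parameter matters: integer division by zero raises Division_by_zero in Ocaml rather than returning some sentinel value, so a handler that divides by a request parameter needs either validated input or an exception handler.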

Note that the client had to request a page with two GET parameters. These were generated randomly with the special shell variable $RANDOM, and stored in a urls.txt file read by siege. $RANDOM produces random integers between 0 and 32767 (yes, I know there was a chance that "b" would be 0, causing a division-by-zero exception; I checked the urls.txt beforehand, making sure that didn't happen). Anyway, here are the results obtained:


            Celeron    Pentium Dual
Bytecode      85.76          444.06
Native       172.39         1215.19

Test 3: generate page with 100 pseudo-random paragraphs

The final test is a bit more demanding. It takes no parameters, and outputs a page consisting of 100 paragraphs containing both a deterministic and a random component:

let random_paragraph =
  let rng = Cryptokit.Random.device_rng "/dev/urandom"
  and to_hex = Cryptokit.Hexa.encode () in
  fun i ->
    let random_num = Cryptokit.Random.string rng 80 in
    let random_str = Cryptokit.transform_string to_hex random_num in
    p [pcdata (Printf.sprintf "This is paragraph %d: %s." i random_str)]

let bench3_handler sp () () =
  Lwt.return
    (html
      (head (title (pcdata "bench3")) [])
      (body
        [
          h1 [pcdata "bench3:"];
          div (List.init 100 random_paragraph)
        ]))

Results obtained:


            Celeron    Pentium Dual
Bytecode       8.01           40.66
Native        28.67          185.00

Conclusion

First, there is a clear advantage to using native code (no surprise there!). Ocaml 3.11 will therefore be very welcome for Ocsigen users. Moreover, note that the byte/native code difference is even more acute on the AMD64 architecture. There is at the moment still some discussion on whether or not the AMD64 port should use the "-dlcode" compiler option by default (this option is necessary for native dynlinking). Hopefully further tests will come to the conclusion that there is little or no performance impact of using it.
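
To put a number on the byte/native difference, the short sketch below computes the native-to-bytecode throughput ratio for each test and machine. The figures are copied from the three result tables; the results list and the speedup function are just names invented for this illustration.

```ocaml
(* Throughput in transactions per second, copied from the result tables:
   (test, machine, bytecode, native). *)
let results = [
  ("test 1", "Celeron",      110.41,  197.32);
  ("test 1", "Pentium Dual", 590.49, 1461.10);
  ("test 2", "Celeron",       85.76,  172.39);
  ("test 2", "Pentium Dual", 444.06, 1215.19);
  ("test 3", "Celeron",        8.01,   28.67);
  ("test 3", "Pentium Dual",  40.66,  185.00);
]

(* How many times faster native code is than bytecode for one row. *)
let speedup (_, _, bytecode, native) = native /. bytecode

let () =
  List.iter
    (fun ((test, machine, _, _) as row) ->
      Printf.printf "%s, %-12s native/bytecode = %.2f\n"
        test machine (speedup row))
    results
```

In all three tests the ratio comes out higher on the Pentium Dual than on the Celeron, which is what the remark about AMD64 refers to. Bear in mind that the two machines differ in far more than architecture, so the comparison is suggestive rather than conclusive.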

It would be interesting to investigate how other languages/frameworks fare on similar tests. Even more interesting will be to repeat these tests in a more real-world scenario, with an actual application that accesses a database, etc. I intend to conduct these soon on Lambdium-light.

Nleyten@1400: Netplexing


The Ocamlnet set of libraries ranks among the many gems available for Ocaml. The blurb on the project's page describes it this way:

Ocamlnet implements a number of Internet protocols (http client & server, cgi and cgi variants, SunRPC, FTP, POP, SMTP) and is a strong base for web and Internet programming. Many protocols can even be driven in an asynchronous way, since Ocamlnet defines a framework for asynchronous implementations (equeue). There is also a generic server framework (netplex). There are a number of accompanying data structures like mail messages, URLs, buffers, channels, and also routines for character set conversions.

Netplex is of particular interest for me at the moment. It is basically a generic server framework, allowing one to build full-fledged servers with very little effort. If you've ever built a server from scratch, you will know that while building a basic server is relatively straightforward, things start to get complicated if you intend to move from a simple toy into a production-quality application. Pretty soon you'll start having to worry about keeping pools of slave workers, managing the workload, and synchronising the entire apparatus. Netplex takes care of all of that stuff for you. The end result is that you get an efficient, secure, and fully-featured server with just a few lines of code.

I am using Netplex to build a "parsing server" (aka the 'Parserver'). It is basically a little daemon that sits in the background providing a service that translates a document formatted in one of the supported markups (like Lambtex) into the native representation used by Lambdium. Placing this service in an external, independent daemon makes sense for a number of reasons:

  1. it makes the system more scalable, since we can easily move the parsing server into its own machine(s) if necessary.
  2. it makes the system more secure, since the parsing daemon can run as 'nobody' and shield the rest of the system from potential attacks that exploit the parsing routines.
  3. Applications built using the Ocsigen framework should be built using the Lwt library for cooperative threads (Lwt stands for 'Lightweight Threads'). At present no parser-generator includes support for Lwt. It does make sense therefore to place the parsing outside the main application.

As for SVN's 1400 revision log itself, it reads "Defined the protocol for communication with Parserver".

Ocsigen 1.0.0 released!


Vincent Balat announced today on the Ocsigen mailing-list the release of version 1.0.0 of that web server + web programming framework. This is great news, as that magical "1.0.0" will hopefully attract more users to this exciting project. Moreover, I can assure you this is a rock solid release; in fact, it has been fairly mature since the 0.6 days, and the 0.99.x series has been out and about for almost a year now. (In case you haven't noticed, this is the framework I have been using to develop Lambdium and Lambdium-light).

So, why should you care about Ocsigen? Well, it offers much more than a high-performance web server written in Ocaml (though that's also cool). Integrated with Ocsigen comes Eliom, a web programming framework based on the Ocaml language. It brings the advantages of a safe, functional, strongly-typed, and amazingly fast language to web development. Here are some of the highlights:

  • Static checking of XHTML:

    Either using native Ocaml or Ocamlduce, the type system makes sure at compile time (no runtime penalties!) that the pages you generate are valid XHTML.

  • Web sites consistent by construction:

    Again thanks to Ocaml's type system, there's just no way for a website to include broken links, type mismatching of form fields, wrong evaluation of page parameters, and similar brittleness. And again, all this is checked at compile-time.

  • One page, one function:

    This not only helps to ensure the modularity and conciseness of web sites, but it is a more elegant solution to the consistency problem than the templating offered by other frameworks.

  • Continuation-based web programming:

    This is one of those features that is a bit hard to explain, but whose advantages are obvious once you actually start using it. It does simplify tremendously the correct use of the "back" button in browsers or "what if" scenarios when users open multiple tabs from the same form.

  • Automatic handling of sessions:

    Eliom takes care of most of the low-level stuff, automatically taking care of session management. Moreover, it offers you a wealth of possibilities, including the choice between volatile or persistent sessions (even if the web server is restarted!), public versus private sessions, etc.

  • Amazing speed without sacrificing expressiveness:

    Ocaml is known to be a very fast language, comparable to C++ in terms of speed. Moreover, it is a very high-level language, with a degree of expressiveness comparable to Python or Ruby. Wouldn't you like to have the best of both worlds? Now you can.
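
To give a flavour of how a type system can rule out malformed pages at compile time, here is a toy sketch in plain Ocaml. It is emphatically not Eliom's actual API: the types inline, item and block and the render functions are all invented for this illustration, and the text is not even escaped.

```ocaml
(* Toy typed markup, NOT Eliom's API. The types encode which elements
   may be nested inside which. *)
type inline = Text of string | Em of inline list
type item = Li of inline list
type block = P of inline list | Ul of item list

(* Render every element of a list and glue the results together. *)
let concat_map f xs = String.concat "" (List.map f xs)

let rec render_inline = function
  | Text s -> s  (* a real library would escape s here *)
  | Em children -> "<em>" ^ concat_map render_inline children ^ "</em>"

let render_item (Li children) =
  "<li>" ^ concat_map render_inline children ^ "</li>"

let render_block = function
  | P children -> "<p>" ^ concat_map render_inline children ^ "</p>"
  | Ul items -> "<ul>" ^ concat_map render_item items ^ "</ul>"

let () =
  print_endline
    (render_block (Ul [Li [Text "one"]; Li [Em [Text "two"]]]))

(* This would be rejected by the compiler, because a list may only
   contain items, never a paragraph:
     render_block (Ul [P [Text "oops"]])                              *)
```

The invalid nesting in the final comment never makes it to the browser, or even to a test run: the programme simply does not compile. The real thing is considerably more sophisticated, but the principle is the same.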

In short, Ocsigen brings web development from the infant age into maturity. So, if you are curious, or just simply tired of dealing with the slowness of Ruby on Rails or the epic brain damage that goes by the name of PHP, I suggest you pay a visit to Ocsigen's site and take a look at the tutorial. The only prerequisite is that you know the Ocaml language. But it's well worth it.

S-expressions for long-term storage of Ocaml values


Call it marshalling, pickling, serialisation, or whatever else you wish; this operation, where a value (or "variable" in non-FP languages) is extracted from a programme at runtime in order to be stored on disk or transferred over the wire, is critical for many applications. Most programming languages therefore include in their standard libraries some means of performing it.

Ocaml ships with the venerable Marshal module. It is extremely simple to use: suppose we wished to convert a value manuscript of type Manuscript.t into a string str holding its marshalled representation. This would suffice: let str = Marshal.to_string manuscript []. The inverse operation (taking a string with a marshalled representation and converting it back into a programme value) is also dead easy: let manuscript : Manuscript.t = Marshal.from_string str 0.
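
For readers who want to try this, the following is a minimal self-contained sketch of such a round trip. Since Manuscript.t is not shown in this post, the record type manuscript below is a made-up stand-in, and round_trip is just an illustrative helper name.

```ocaml
(* Hypothetical stand-in for Manuscript.t, which is not shown in the post. *)
type manuscript = { title : string; paragraphs : string list }

(* Marshal a value to a string and read it straight back. *)
let round_trip (m : manuscript) : manuscript =
  (* The empty list means "no extra marshalling flags". *)
  let str = Marshal.to_string m [] in
  (* Read back starting at offset 0. The annotation tells the compiler
     which type to expect; nothing checks it at runtime. *)
  (Marshal.from_string str 0 : manuscript)

let () =
  let original = { title = "Draft"; paragraphs = ["One"; "Two"] } in
  let restored = round_trip original in
  assert (restored = original);
  Printf.printf "%s has %d paragraphs\n"
    restored.title (List.length restored.paragraphs)
```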

The attentive observer will have noticed the type annotation in the unmarshalling example above; in general, unmarshalling does require an explicit type annotation (which otherwise is rarely needed in Ocaml). The reason is that the type inference mechanism may not have enough information to determine the type of the marshalled representation (imagine, for example, that the routines performing the marshalling and unmarshalling reside in different programmes!). And here lies the Achilles' heel of marshalling in Ocaml: should the programmer make a mistake in specifying the type to be unmarshalled, the programme will most surely segfault. This problem also occurs if one version of the programme stores a marshalled value, which is then read back by a subsequent version of the same programme where the value's type has been modified (even if ever so slightly).

Over the years, a number of extensions to the Ocaml language offering type-safe marshalling have appeared (HashCaml may be the best known, but it's not the only one). And judging from comments by Xavier Leroy (the primary developer of the Ocaml language) at the Ocaml users meeting in Paris this last January, there's a good chance that type-safe marshalling will make it into the core language in the near future.

In Lambdium-light, stories and comments are stored in the database backend using the marshalled representation offered by the Marshal module. While this works and is extremely fast, it does have the problem of not being very resilient to future changes in the story and comment format. Therefore, I have been looking at alternatives to Marshal that provide some degree of backwards compatibility while not sacrificing too much speed.

I suspect that the XML-fanboys in the audience wouldn't even think twice, but personally I am far from finding XML a good solution for many of the problem domains where it is applied. For this reason, I asked about this problem on the Caml-list.

Anyway, I am currently leaning towards a solution based on Sexplib, a library that converts Ocaml values to/from S-expressions. It comes with a syntax extension that, given a type t, automatically writes the sexp_of_t and t_of_sexp "marshalling" functions. Making the transition from Marshal to Sexplib is therefore very straightforward. Another advantage is that S-expressions are essentially just text and are therefore human-readable. Moreover, the format is very compact (a lot more than XML!) and fairly easy to parse. Speed-wise, while obviously not being as fast as Marshal, it is still reasonably fast, especially in native code.
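
To give a feel for what such conversion functions look like, here is a hand-written sketch that deliberately does not depend on Sexplib. The sexp type below has the same two-constructor shape as Sexplib's Sexp.t, but it is redefined locally; the manuscript record, the function names and the naive printer are all assumptions made for illustration, and the code Sexplib actually generates will differ in its details.

```ocaml
(* Same shape as Sexplib's Sexp.t, redefined here so that the sketch has
   no dependencies. *)
type sexp = Atom of string | List of sexp list

(* Hypothetical stand-in for Manuscript.t. *)
type manuscript = { title : string; paragraphs : string list }

(* Roughly what a generated sexp_of_t might do for this record. *)
let sexp_of_manuscript m =
  List [
    List [Atom "title"; Atom m.title];
    List [Atom "paragraphs"; List (List.map (fun p -> Atom p) m.paragraphs)];
  ]

(* The inverse. A value of the wrong shape raises an exception instead
   of crashing the process. *)
let manuscript_of_sexp = function
  | List [
      List [Atom "title"; Atom title];
      List [Atom "paragraphs"; List ps];
    ] ->
      let paragraph = function
        | Atom p -> p
        | List _ -> failwith "manuscript_of_sexp: paragraph must be an atom"
      in
      { title; paragraphs = List.map paragraph ps }
  | _ -> failwith "manuscript_of_sexp: unexpected shape"

(* Naive printer: it does not quote atoms that contain spaces or
   parentheses, which a real library has to do. *)
let rec string_of_sexp = function
  | Atom a -> a
  | List l -> "(" ^ String.concat " " (List.map string_of_sexp l) ^ ")"

let () =
  let m = { title = "Draft"; paragraphs = ["One"; "Two"] } in
  let s = sexp_of_manuscript m in
  print_endline (string_of_sexp s);  (* ((title Draft) (paragraphs (One Two))) *)
  assert (manuscript_of_sexp s = m)
```

Because the reader pattern-matches on the structure, a change in the record's format shows up as a catchable exception, and old data can be handled with an extra match case.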

Suppose I have a fairly large story of type Manuscript.t. On my machine, and using Ocaml byte code, marshalling and unmarshalling this value 100,000 times takes approximately 19.68 seconds. Using Sexplib, these operations take 1175 seconds, which is about 60 times slower. However, the times in native code are respectively 17.98 and 105.3 seconds: Sexplib is less than 6 times slower than Marshal. Given the other advantages of Sexplib, these are numbers I can live with.
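
The ratios quoted above follow directly from the four timings. As a small check, the snippet below recomputes them; slowdown is just an illustrative helper name, and the only inputs are the numbers already given.

```ocaml
(* How many times slower Sexplib is than Marshal for a given pair of
   timings (both in seconds). *)
let slowdown ~marshal ~sexplib = sexplib /. marshal

let () =
  (* Byte code: prints 59.7x *)
  Printf.printf "byte code: %.1fx\n" (slowdown ~marshal:19.68 ~sexplib:1175.0);
  (* Native code: prints 5.9x *)
  Printf.printf "native:    %.1fx\n" (slowdown ~marshal:17.98 ~sexplib:105.3)
```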

If you are curious about how I got these numbers (and you should — never take anyone's word at face value when benchmarks are involved!), here follows the run-down of the small programme I used for testing.

run_marshal is a function that, given a manuscript, performs a marshalling followed by an unmarshalling using the Marshal module. run_sexplib does the same thing but using Sexplib. Note that the latter function actually first converts the Manuscript.t into its Sexp.t representation and then this latter value into a string (and vice-versa for the reverse operation):

(* Round trip through the Marshal module. *)
let run_marshal manuscript () =
  let marshalled = Marshal.to_string manuscript [] in
  let unmarshalled : Manuscript.t = Marshal.from_string marshalled 0 in
  ignore unmarshalled


(* Round trip through Sexplib: value -> Sexp.t -> string and back. *)
let run_sexplib manuscript () =
  let manuscript_sexp_old = Manuscript.sexp_of_t manuscript in
  let mach_str = Sexplib.Sexp.to_string_mach manuscript_sexp_old in
  let manuscript_sexp_new = Sexplib.Sexp.of_string mach_str in
  let manuscript_new = Manuscript.t_of_sexp manuscript_sexp_new in
  ignore manuscript_new

I also define a generic benchmarking function that loops a provided function 100,000 times. It uses Unix.gettimeofday to retrieve timing information:

(* Returns the wall-clock time, in seconds, taken by 100,000 calls. *)
let benchmark test =
  let start = Unix.gettimeofday () in
  for _i = 1 to 100000 do
    test ()
  done;
  let finish = Unix.gettimeofday () in
  let duration = finish -. start in
  duration

Finally, the main programme simply creates a new manuscript (assume that function get_manuscript returns a new parsed manuscript) and calls the benchmark function with the marshal and sexplib routines:

let () =
  let manuscript = get_manuscript () in
  let duration_marshal = benchmark (run_marshal manuscript) in
  let duration_sexplib = benchmark (run_sexplib manuscript) in
  Printf.printf "Marshal: %f\n" duration_marshal;
  Printf.printf "Sexplib: %f\n" duration_sexplib
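
The programme above cannot be compiled without the Manuscript module, so here is a hedged, standard-library-only variant for anyone who wants to experiment. It makes several assumptions that are not in the original: the manuscript record is a made-up stand-in, the iteration count is an optional parameter, and Sys.time (processor time) replaces Unix.gettimeofday (wall-clock time) so that the unix library does not have to be linked. Only the Marshal half is reproduced.

```ocaml
(* Hypothetical stand-in for Manuscript.t. *)
type manuscript = { title : string; paragraphs : string list }

(* Run [test] [iterations] times and return the elapsed processor time
   in seconds. Sys.time measures CPU time, not wall-clock time. *)
let benchmark ?(iterations = 100_000) test =
  let start = Sys.time () in
  for _i = 1 to iterations do
    test ()
  done;
  Sys.time () -. start

(* One marshalling followed by one unmarshalling. *)
let run_marshal m () =
  let marshalled = Marshal.to_string m [] in
  let (_ : manuscript) = Marshal.from_string marshalled 0 in
  ()

let () =
  (* A synthetic value with 100 short paragraphs. *)
  let m = { title = "Draft"; paragraphs = List.init 100 string_of_int } in
  let duration = benchmark ~iterations:10_000 (run_marshal m) in
  Printf.printf "Marshal: %f\n" duration
```

The absolute numbers will of course depend on the machine and on the size of the value, so they are not comparable with the figures quoted earlier.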

Nleyten@1300: the joy of parsing


I know this is bound to brand me as an incurable language geek, but I find Abstract Syntax Trees a thing of extraordinary beauty. Yep, I am still immersed in the creation of the languages and associated parsers for the markup of Lambdium stories and comments. And I'm finding that I'm really enjoying all the stuff in this trade.

I have now completed the definition and most of the parsing tools for the full-fledged Lambtex parser. As it turned out, introducing support for labelling and references forced me to rethink some of the prior assumptions I had for the Lambtex language. It was a difficult birth of sorts, and the language definition did go through several revisions until it arrived in its present state. I am now however quite pleased with the result: Lambtex manages to be both simple and powerful, gracefully handling most complex cases without burdening beginners with an arcane syntax. I will write a small manual (to be part of Nleyten's help system) soon enough; you will then get a clearer idea of what I mean.

I also realise that I'm lucky to be creating both the language and the tools to parse it at the same time. The two play an intricate dance together, and even minor changes to the syntax of a language can have a profound effect on the parsing tools. I am also thankful to be doing this in Ocaml: the language is just perfect for compiler writing!

As for the SVN changelog at revision 1300, it reads "Adapted tokenizer to new Lambtex scanner". The Lambtex parsing chain is composed of four separate modules (as is typical with most compilers): at the lowest level there is a scanner that splits the input into recognisable language atoms; closely tied to it there is a tokenizer that converts those atoms into proper parser tokens; then we have the parser proper (I am using Menhir to generate the parser; it's similar to ocamlyacc, but better); and finally there is a postprocessor that verifies that the document is semantically correct (all references point to valid targets, for example) and produces its final version.
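
For readers unfamiliar with this kind of architecture, the four stages can be sketched with a toy example. Everything below is invented for illustration and has nothing to do with the real Lambtex syntax or code: the toy input is a line of words in which "#name" defines a label and "@name" refers to one, and the parser is written by hand rather than generated with Menhir.

```ocaml
(* Stage 1, scanner: split the raw input into atoms (here, words). *)
let scan (input : string) : string list =
  String.split_on_char ' ' input |> List.filter (fun s -> s <> "")

(* Stage 2, tokenizer: turn each atom into a parser token. *)
type token = LABEL of string | REF of string | WORD of string

let tokenize (atom : string) : token =
  let rest () = String.sub atom 1 (String.length atom - 1) in
  if String.length atom > 0 && atom.[0] = '#' then LABEL (rest ())
  else if String.length atom > 0 && atom.[0] = '@' then REF (rest ())
  else WORD atom

(* Stage 3, parser: build a (very flat) syntax tree by grouping
   consecutive words into a single Text node. *)
type node = Label of string | Ref of string | Text of string list
type ast = node list

let parse (tokens : token list) : ast =
  let flush words acc =
    if words = [] then acc else Text (List.rev words) :: acc in
  let rec go words acc = function
    | [] -> List.rev (flush words acc)
    | WORD w :: rest -> go (w :: words) acc rest
    | LABEL l :: rest -> go [] (Label l :: flush words acc) rest
    | REF r :: rest -> go [] (Ref r :: flush words acc) rest
  in
  go [] [] tokens

(* Stage 4, postprocessor: check that every reference has a target. *)
let postprocess (doc : ast) : (ast, string) result =
  let labels =
    List.filter_map (function Label l -> Some l | _ -> None) doc in
  let dangling =
    List.filter_map
      (function Ref r when not (List.mem r labels) -> Some r | _ -> None)
      doc
  in
  match dangling with
  | [] -> Ok doc
  | r :: _ -> Error ("reference to unknown label: " ^ r)

(* The whole chain, from raw text to a checked document. *)
let compile input =
  input |> scan |> List.map tokenize |> parse |> postprocess

let () =
  match compile "see @fig for details #fig" with
  | Ok doc -> Printf.printf "valid document with %d nodes\n" (List.length doc)
  | Error msg -> print_endline msg
```

Keeping the stages separate means that a change in the surface syntax mostly touches the scanner and tokenizer, while the semantic checks live in the postprocessor alone.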

The bottomless pit of cultural relativism takes another victim


There are groups of people whose comments have to be taken always with a grain of salt. Heads of religious confessions are obviously one such group — let's face it, you don't get to be the head of organised lunacy unless you're pretty much messed up as an individual. That's just a fact of life.

You must have seen it already on the news, but if not, here's a link. In short, Rowan Williams, aka the "Archbishop of Canterbury" (that's one of the titles that certifies the person has a fucked up moral sense; other such titles include "Pope" and "Imam"), stated that adopting some aspects of Sharia law in the UK would be unavoidable and a good thing.

I can't even begin to understand the layers upon layers of perverted thinking that could have provoked such statements. If ever anyone needed a perfect example of the dangers of cultural relativism, this one will be hard to beat.

Another thing that I find particularly troubling with this sort of news is the continued reverence these sorts of reactionary idiots get from journalists. Seriously guys, take heed from a long-learned lesson in Internet newsgroups: don't feed the trolls!

(A quiet announcement)


By the way, though I did make an announcement on the Ocsigen mailing-list, I never mentioned in this blog that version 0.80 of Lambdium-light has been available from the usual place for the last couple of weeks (sometimes I forget that no one who reads this blog subscribes to that mailing-list!). It's feature-complete, though the code will still undergo some refactoring.