Saturday, February 28, 2009

ibrowse module for Erlang

In the project I am working on (I cannot disclose what it is for now), I had to make a lot of web service calls over HTTP. The server is written in Erlang as a gen_server which spawns a process for each request.

Apache is used as the user-facing HTTP server. We have written a content handler (an Apache module) which acts as a C node by connecting to a local or remote (configurable) Erlang VM. The gen_server in the Erlang VM is registered as a named process (let's say foo). Each incoming user request is sent to the registered process, and the response from that process is sent back to the user (as XML or JSON).
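To make the setup concrete, here is a minimal sketch of such a server. The module name foo_server, the {request, From, Payload} message shape and the handle_request/1 placeholder are assumptions made for illustration, not our actual code. It also assumes the C node sends plain messages to the registered name, which is why they arrive in handle_info/2.

```erlang
-module(foo_server).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Register the server locally under the name foo, as in the text.
start_link() ->
    gen_server:start_link({local, foo}, ?MODULE, [], []).

init([]) ->
    {ok, no_state}.

%% Plain messages sent to the registered name end up here.
%% The {request, From, Payload} shape is an assumption for this sketch.
handle_info({request, From, Payload}, State) ->
    %% One short-lived process per request, so slow back-end calls
    %% do not block the server loop.
    spawn(fun() -> From ! {response, handle_request(Payload)} end),
    {noreply, State};
handle_info(_Other, State) ->
    {noreply, State}.

handle_call(_Request, _From, State) ->
    {reply, {error, unsupported}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Hypothetical placeholder for the real work (back-end calls, rendering).
handle_request(Payload) ->
    {echo, Payload}.
```

From an Erlang shell the same path can be exercised with foo ! {request, self(), hello} followed by a receive for the {response, ...} message.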

Each process spawned by the gen_server makes one or more web service calls to the back-end servers. We were using Erlang's built-in HTTP client. It turned out to be hard to configure for persistent connections to the back-end (web service) servers, as there is not much documentation available. As a result there was a lot of socket churn: every web service call opened a new HTTP connection and closed it afterwards. The back-end servers are not necessarily physically close to our servers, so on top of the communication overhead we also paid the connection setup overhead on every call. If the ping time between our server and the remote server is 25 ms, the TCP three-way handshake adds about one extra round trip (roughly 25 ms) before the request can even be sent. Having a persistent connection solves this problem.

We came across the ibrowse module. It has a simple configuration for maintaining persistent connections, as well as settings for the maximum number of connections and the maximum number of HTTP requests that can be pipelined on each connection. This improved our latency by 60% at 400 qps, and there is plenty of room to grow up to 1000 qps (on a single box).
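A rough sketch of that configuration follows. The module name backend_client, the host backend.example.com, the port and the two limits are made-up values for illustration. Return values are deliberately not matched in start/0, since they may differ between ibrowse versions.

```erlang
-module(backend_client).
-export([start/0, call/1]).

%% Hypothetical back-end host and port; replace with real values.
-define(HOST, "backend.example.com").
-define(PORT, 80).

start() ->
    _ = ibrowse:start(),
    %% Keep at most 20 connections open to this host and port ...
    _ = ibrowse:set_max_sessions(?HOST, ?PORT, 20),
    %% ... and allow up to 10 pipelined requests on each of them.
    _ = ibrowse:set_max_pipeline_size(?HOST, ?PORT, 10),
    ok.

%% Issue a GET request; ibrowse picks a connection for this host and port,
%% so the TCP handshake is not repeated on every call.
call(Path) ->
    Url = "http://" ++ ?HOST ++ Path,
    case ibrowse:send_req(Url, [], get) of
        {ok, Status, _Headers, Body} -> {ok, Status, Body};
        {error, Reason} -> {error, Reason}
    end.
```

The two limits are the knobs to tune under load: more sessions means more parallel connections, a larger pipeline means more outstanding requests per connection.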

I could also go back to Erlang's default HTTP client, provided I can set the profile for httpc when starting inets. I did not find decent documentation on how to do so. If anyone knows how, let me know.

Wednesday, February 18, 2009

iolist vs list in Erlang

I started using Erlang a few months back. Erlang has very little documentation (in most cases) and none at all (in some cases).

In Erlang, strings are implemented as lists. Unlike in C, a string is a linked list, so every character in a string occupies 2 machine words: one for the character and another for the pointer to the next element. On a 32-bit machine each byte of text therefore takes 8 bytes, and on a 64-bit box 16 bytes. This is an issue when you are transporting strings over the network.
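The arithmetic can be checked with a small helper. The module and function names here are made up for illustration. It counts only the list cells of a flat string and only the payload bytes of the binary, ignoring the binary's own small header.

```erlang
-module(string_cost).
-export([list_bytes/1, binary_bytes/1]).

%% Bytes used by the list cells of a flat string:
%% two machine words per character.
list_bytes(String) when is_list(String) ->
    2 * length(String) * erlang:system_info(wordsize).

%% Payload bytes when the same text is held in a binary.
binary_bytes(String) when is_list(String) ->
    byte_size(list_to_binary(String)).
```

On a 64-bit VM, string_cost:list_bytes("hello") comes to 80 while string_cost:binary_bytes("hello") is 5.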

In our case the data travels between the Erlang VM and a C node. If a string is handled in its list representation, Erlang works with 8x the number of bytes of the plain text (on a 32-bit machine) before anything goes on the wire. It is better to transport the string as a binary (use the erlang:list_to_binary/1 function), which keeps the data at roughly one byte per character.

If a string is constructed in the VM and transported to a C node, it is easier to build an iolist and call iolist_to_binary at the end to send it as a binary. This avoids the overhead of string concatenation and also saves a lot of garbage collection.

For example,

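construct_string() ->
    %% Nested lists of binaries and strings form a valid iolist.
    [[<<"this is">>, <<" a">>], [" very long", <<" string.">>], <<" This string will go to c-node">>].

transport_to_code() ->
    %% Flatten once, at the end, into a single binary.
    iolist_to_binary(construct_string()).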
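The same idea works when a response is built piece by piece. The sketch below uses a made-up key=value line format and module name, purely to illustrate the pattern. Each line is a small nested iolist, and the data is copied only once, when iolist_to_binary/1 flattens everything at the end.

```erlang
-module(iolist_demo).
-export([render/1]).

%% Build "key=value\n" lines from {Key, Value} pairs, where keys and
%% values may be binaries or strings. No ++ concatenation is needed:
%% the pieces are just nested into lists.
render(Pairs) ->
    Lines = [[Key, <<"=">>, Value, <<"\n">>] || {Key, Value} <- Pairs],
    iolist_to_binary(Lines).
```

For example, iolist_demo:render([{<<"a">>, "1"}, {"b", <<"2">>}]) returns <<"a=1\nb=2\n">>.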

I saw a huge improvement in memory as well as CPU utilization by using binaries instead of strings. The lesson learned: do not use strings unless absolutely required. If binaries will do the job, use them.
