Wednesday, February 18, 2009

iolist vs list in Erlang

I started using Erlang a few months back. Erlang has very little documentation (in most cases) and no documentation (in some cases).

In Erlang, strings are implemented as lists. Unlike in C, a string is a linked list, so every character occupies two machine words: one for the character and one for the pointer to the next element. On a 32-bit machine each character therefore takes 8 bytes, and on a 64-bit machine 16 bytes. This is an issue when you are transporting strings over the network.
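To make the two-words-per-character cost concrete, here is a minimal sketch that measures it. The module name string_cost is hypothetical, and erts_debug:flat_size/1 is an undocumented debugging helper, so treat the numbers as indicative rather than guaranteed.

```erlang
-module(string_cost).
-export([words/1, bytes/1]).

%% Heap words used by a term. erts_debug:flat_size/1 is an undocumented
%% debugging helper, so the result is indicative only.
words(Term) ->
    erts_debug:flat_size(Term).

%% The same cost in bytes, using the VM's word size (4 or 8).
bytes(Term) ->
    words(Term) * erlang:system_info(wordsize).
```

For the five-character string "hello", string_cost:words("hello") should report 10 words (five list cells of two words each), while byte_size(<<"hello">>) is 5 bytes of payload.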

It becomes a particular issue when transporting data between the Erlang VM and a C node. If a string is transported in its internal representation, Erlang will transport 8 times the number of bytes of the plain string. It is better to transport the string as a binary (use the erlang:list_to_binary/1 function). This reduces the network overhead by 8x (on a 32-bit machine).
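The conversion itself is a one-liner. The sketch below wraps it in a hypothetical module, string_wire, whose function names are made up for illustration; only erlang:list_to_binary/1 and erlang:binary_to_list/1 are real built-ins.

```erlang
-module(string_wire).
-export([encode/1, decode/1]).

%% Convert a string (a list of byte values 0..255) to a binary before sending.
encode(String) when is_list(String) ->
    erlang:list_to_binary(String).

%% Convert a received binary back to a string, only if a list is really needed.
decode(Bin) when is_binary(Bin) ->
    erlang:binary_to_list(Bin).
```

Note that list_to_binary/1 fails with badarg if the list contains integers above 255. For strings with such code points, unicode:characters_to_binary/1 is the usual alternative.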

If a string is constructed in the VM and then transported to a C node, it is easier to build an iolist and call iolist_to_binary/1 at the end to transport it as a binary. This avoids the overhead of string concatenation and also saves a lot of garbage collection.

For example,

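%% Build the message as an iolist: a nested list of binaries and strings.
%% Nothing is copied or concatenated here.
construct_string() ->
    [[<<"this is">>, <<" a">>], [" very long", <<" string.">>], <<" This string will go to c-node">>].

%% Flatten the iolist into a single binary just before sending it.
transport_to_code() ->
    iolist_to_binary(construct_string()).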
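The same idea scales to data built in a loop. Below is a minimal sketch, assuming the input is a list of {Key, Value} pairs whose elements are strings or binaries; the module iolist_demo and the function render/1 are hypothetical names used only for illustration.

```erlang
-module(iolist_demo).
-export([render/1]).

%% Build one "Key=Value" line per pair without concatenating anything.
%% Keys and values are assumed to be strings or binaries.
render(Pairs) ->
    IoList = [[Key, <<"=">>, Value, <<"\n">>] || {Key, Value} <- Pairs],
    %% The only copy happens here, when the iolist is flattened.
    iolist_to_binary(IoList).
```

Calling iolist_demo:render([{"a", <<"1">>}, {<<"b">>, "2"}]) returns <<"a=1\nb=2\n">>. Strings and binaries can be mixed freely because both are valid iolist elements.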

I saw a huge improvement in memory as well as CPU utilization by using binaries instead of strings. The lesson learned: do not use strings unless they are absolutely required. If binaries will do the job, use them.
