Thursday, November 13, 2008

Erlang and Garbage collection

In my new project, we did all of the coding in Erlang, with Yaws as our front end. When running a load test, we observed that memory usage kept climbing until it ultimately crashed the Erlang virtual machine. We did a thorough analysis of our code to find any possible memory leak (unlikely in the classic sense, as Erlang uses garbage collection) and also studied the code for any recursion that was not tail recursion. We did not find any. It was pretty puzzling. Then I found one strange thing, and it is all about how Erlang garbage collection works.



It turns out that Erlang's garbage collection works per process. Each process has its own heap and stack, and a process is collected only when its own heap fills up; there is no global collector sweeping all processes in the background. If a process is long running and does a lot of work for each call, garbage accumulates in it and gets released very slowly. Data that survives a collection is moved to the process's old heap, which is reclaimed only by a much less frequent full sweep, and large binaries stay allocated for as long as any process still holds a reference to them. So even when a collection does run, it may not reclaim everything. Thus, memory keeps on growing.
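To see the per-process nature of this for yourself, here is a minimal sketch. The module name gc_demo and the workload are made up for illustration; only erlang:process_info/2 and erlang:garbage_collect/1 are real BIFs. It keeps one long-lived process busy, then compares that process's memory before and after forcing a collection on it.

```erlang
-module(gc_demo).
-export([run/0]).

%% Hypothetical demo: one long-lived process that creates garbage per request.
run() ->
    Pid = spawn(fun loop/0),
    %% Send 100 requests and wait for all replies.
    [Pid ! {work, self()} || _ <- lists:seq(1, 100)],
    [receive done -> ok end || _ <- lists:seq(1, 100)],
    %% Memory (in bytes) held by that one process, before and after
    %% explicitly forcing a collection on it.
    {memory, Before} = erlang:process_info(Pid, memory),
    true = erlang:garbage_collect(Pid),
    {memory, After} = erlang:process_info(Pid, memory),
    Pid ! stop,
    {Before, After}.

loop() ->
    receive
        {work, From} ->
            %% Build a large temporary list; it is garbage once summed.
            _ = lists:sum(lists:seq(1, 100000)),
            From ! done,
            loop();
        stop ->
            ok
    end.
```

Calling gc_demo:run() returns {Before, After}. The exact numbers depend on the VM and its settings, so treat them as something to observe rather than a fixed result.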


In our code, we were running a gen_server. This gen_server was doing a lot of work in its own process. We were running several hundred of these gen_servers (monitored by a supervisor). Child specs for the supervisor were created dynamically depending on the initial configuration. In a nutshell, it was doing the following:




-module(foo_server).
-behaviour(gen_server).

% Not outlining all exported functions here, only the relevant ones

execute(Params) ->
    % Assumes the server is registered locally as foo_server.
    gen_server:call(?MODULE, {run, Params}).

handle_call({run, Params}, _From, State) ->
    % some_module:execute/1 does most of the work, inside the gen_server process.
    Result = some_module:execute(Params),
    {reply, Result, State}.


Note that some_module:execute/1 runs in the same process as the gen_server. The idea was that the gen_server is monitored by the supervisor, so we get process monitoring for free. But since the same process gets called over and over again, memory accumulates in it. Running the load test kept increasing memory usage.
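One way to confirm which processes are holding the memory during a load test is to rank them by size. The helper below is only a sketch: mem_top and top/1 are names I made up, built on the real BIFs erlang:processes/0 and erlang:process_info/2.

```erlang
-module(mem_top).
-export([top/1]).

%% Return the N largest processes as {Bytes, Pid} pairs, biggest first.
top(N) ->
    Sizes = [{Bytes, Pid}
             || Pid <- erlang:processes(),
                %% process_info/2 returns undefined for a process that has
                %% already exited; the pattern below silently skips those.
                {memory, Bytes} <- [erlang:process_info(Pid, memory)]],
    lists:sublist(lists:reverse(lists:sort(Sizes)), N).
```

Running mem_top:top(10) from the shell a few times while the load test is going should show whether the same long-lived pids keep growing.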



Now Erlang processes come to the rescue. Because the code is functional, it was pretty easy to change it to run some_module:execute/1 in a different process and return the result to the caller via message passing. A short-lived process gives back its whole heap when it exits, so nothing is left behind to collect. I made the following changes:


-module(foo_server).

execute(Params) ->
    MySelf = self(),
    % Run the heavy work in a short-lived process; its heap is freed when it exits.
    spawn(fun() -> Result = some_module:execute(Params), MySelf ! {MySelf, Result} end),
    receive
        {MySelf, R} -> R
    after 1000 -> {error, timeout}
    end.


Eureka! Running the same load test did not result in increased memory usage. Memory was pretty stable even after running the load test for more than an hour.
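The spawned version above waits the full second if the worker crashes, and a reply that arrives after the timeout could be mistaken for the answer to a later call. A slightly more defensive variant, as a sketch only, might use spawn_monitor/1 and a unique reference. The module name foo_worker and the execute/2 signature are my own; Fun stands in for a call such as some_module:execute(Params).

```erlang
-module(foo_worker).
-export([execute/2]).

%% Run Fun (a zero-arity fun) in a throwaway process and wait for its result.
%% Timeout is in milliseconds.
execute(Fun, Timeout) when is_function(Fun, 0) ->
    Parent = self(),
    Tag = make_ref(),  % unique per call, so stale replies never match
    {Pid, MRef} = spawn_monitor(fun() -> Parent ! {Tag, Fun()} end),
    receive
        {Tag, Result} ->
            %% Drop the monitor and any 'DOWN' message already queued.
            erlang:demonitor(MRef, [flush]),
            Result;
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The worker died before replying.
            {error, Reason}
    after Timeout ->
        exit(Pid, kill),
        erlang:demonitor(MRef, [flush]),
        {error, timeout}
    end.
```

For example, foo_worker:execute(fun() -> some_module:execute(Params) end, 1000) would play the same role as execute(Params) above. A reply sent just before the kill can still land in the caller's mailbox, but because it carries a fresh reference it will not match any later call.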


1 comment:

Unknown said...

Dude thanks! I have been going over code for the last 3 days trying to figure out why mem was not being released.
