Concurrent future...: 2010

Thursday, October 14, 2010

PHP author(s) don't understand functional programming

PHP tries to emulate functional programming by providing functions like array_reduce. But what its authors lack is knowledge of the principles of functional programming.

I had a problem where multiple arrays needed to be merged. The array_merge function takes its arrays as separate arguments, so merging a whole list of arrays needs a for loop or the array_reduce function [I thought]. But it turns out that array_reduce is not as functional as I imagined. Even though the documentation says that the last parameter to this function is the initial value for the fold, actually it is not.

If the initial parameter is not an integer, PHP defaults it to zero. For the above problem I tried the following:

array_reduce($my_arrays, "array_merge", array());

What I expected was for it to work as

$initial = array();
foreach ( $my_arrays as $m ) { $initial = array_merge($initial, $m); }

What I got was
PHP Warning: array_merge(): Argument #1 is not an array in Command line code on line 1
PHP Warning: array_merge(): Argument #1 is not an array in Command line code on line 1

This kind of behavior leads to serious problems in a system and is hard to debug.
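For comparison, here is a minimal sketch of what a fold with a real initial value looks like, written in Java rather than PHP. The names FoldSketch, foldLeft, merge and mergeAll are made up for this illustration; the point is only that the accumulator starts out as the empty list that was passed in, never as zero.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

final class FoldSketch {
    /** Generic left fold: the accumulator starts as `initial`, whatever its type is. */
    static <A, T> A foldLeft(List<T> items, A initial, BiFunction<A, T, A> step) {
        A acc = initial;
        for (T item : items) {
            acc = step.apply(acc, item);
        }
        return acc;
    }

    /** Plays the role of array_merge for two lists: returns a new list, left then right. */
    static List<Integer> merge(List<Integer> left, List<Integer> right) {
        List<Integer> out = new ArrayList<Integer>(left);
        out.addAll(right);
        return out;
    }

    /** Equivalent of the foreach loop above: fold merge over all lists, starting from an empty list. */
    static List<Integer> mergeAll(List<List<Integer>> lists) {
        List<Integer> initial = new ArrayList<Integer>();
        return foldLeft(lists, initial, FoldSketch::merge);
    }

    public static void main(String[] args) {
        List<List<Integer>> myArrays = Arrays.asList(
            Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5));
        System.out.println(mergeAll(myArrays)); // prints [1, 2, 3, 4, 5]
    }
}
```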

Sunday, October 10, 2010

State of the Interview process

I have attended several interviews and have also interviewed several people. I have noticed a difference between the way other people interview and the way I interview.

In the interviews I have given for several job positions, at companies I don't want to mention here, most of the time I felt that the person interviewing me was not achieving the goal of the interview at all. What that person was trying to do was prove to me that he/she is smarter than me in every respect.

For me, interview is a combination of 2 words, "inter" and "view". That means exchanging views among the people involved in the process. Thus what I expect is a debate on the subject in question. But mostly what happens is not a debate.

In one interview, I was given a problem to solve. Instead of listening to my solution and debating it, the interviewer was playing some game on his iPhone. I was explaining the solution to him and he was not even listening to me. Instead he was trying to be a smart ass, saying my solution was wrong. My question is: how can a solution be wrong if it solves the problem? The solution may not be efficient enough. Nevertheless, a solution that works is never wrong. The other person may have a different solution in mind (which may or may not be the more efficient one), but either way the purpose of the interview is not achieved.

What I like to do when I am conducting an interview is not to prove that I am smarter than the interviewee (which I am :-)). What I am trying to gauge is the knowledge and the potential of the person opposite me. I am a believer that a person willing to learn and push his/her limits can do wonders. No one is born smart. Smartness is achieved by hard work and the will to gain more knowledge. I always try to find a person who fits my immediate job description and is smart/hard-working enough to achieve more in the near future.

At least in the technology field, what I find is that the interview is a lost cause. Most people I interviewed with were trying to prove that they are smarter than me or that they are the smartest people in the world. They did not even try to find out whether I could fit the current job and do more in the future.

Another thing I try to gauge in the interview is the personality of the person I am interviewing. No matter how smart one is, unless that person has the right attitude and personality, he/she cannot fit into the team. It takes only one bad apple to spoil the whole basket of good apples. It is all about team work.

Without good team dynamics, it is not possible to achieve the long term goals. It takes one person with the wrong attitude in a big team to lead the project to failure. What happens is that this person will demotivate others in the team. It is the responsibility of the manager to solve this problem and also the responsibility of the team members to raise the issue. Again it is the responsibility of the manager to see the signs and take action at an early stage.

Having social skills is as important as knowledge of the field he/she is working in. Social skills refer to personality, how comfortable one can make others, how good a team player one can be, etc.

Thursday, September 9, 2010

Content vs Ads

Looking at several websites known for providing content, I wonder what is more important to these websites: content or ads.

There was a time when I used to get ads within the content, but nowadays I am searching for content within ads. The page is full of ads and somewhere hidden is a small pretext of content. I have to click on several ads (or real links) to figure out what the actual content is. Most of the time I end up clicking on ads which look like content (thanks to Google AdSense ads buried in the content). Google makes it easy for publishers to format the ads so that they look like the content. "Ads by Google" is like the fine print in a terms of service notice. It is barely visible.

Apart from them, even big names like CNN are doing it, but in a different way. They display big banner ads at the top of the page, which push the content below the first fold of the page. They make sure that you view the ads. Whether or not you click on the ads is not important; they are banner ads mostly used for brand awareness anyway.

Look at this page from CNN. Big BMW ad at the top of the page covering approximately 80% of the page.

Only the title of the article is viewable on the first fold of the page. Hello... I already know the title of the article from the website that sent me here (in this case Yahoo! Finance). I am here to read the content of the article. You are already making money from the content distribution. Please stop throwing the ads in my face...

Monday, September 6, 2010

Spelling checker in Craigslist

In spite of Craigslist being a market leader in the listing business, I don't think it understands anything about search. People make mistakes while typing search terms. So nowadays it is absolutely necessary to have a spelling checker and correction service along with the search. It is missing in Craigslist. I found out today when I was searching for a car. I typed "toyta car". Instead of suggesting the corrected spelling, it just threw a no-results page in my face.

Nowadays it is not hard to build spelling correction into a system. It is not even required to build your own spelling correction system. There are a lot of APIs available. The BOSS API is one of the prominent ones that provides spelling correction.

http://boss.yahooapis.com/ysearch/spelling/v1/toyta+car?appid=
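To show how little is needed for a basic "did you mean" feature, here is a rough sketch in Java. It is not how BOSS works internally; it is a swapped-in technique (plain Levenshtein edit distance against a list of known terms), and the class name, method names and the term list are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class DidYouMeanSketch {
    /** Levenshtein distance: minimum number of single-character inserts, deletes and substitutions. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the closest known term within maxDistance edits, or null when nothing is close enough. */
    static String suggest(String typed, List<String> knownTerms, int maxDistance) {
        String best = null;
        int bestDistance = maxDistance + 1;
        for (String term : knownTerms) {
            int d = editDistance(typed, term);
            if (d < bestDistance) {
                best = term;
                bestDistance = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> makes = Arrays.asList("honda", "toyota", "ford"); // hypothetical index terms
        System.out.println("Did you mean: " + suggest("toyta", makes, 2)); // prints Did you mean: toyota
    }
}
```

A real listing site would take the known terms from its own search index and would probably weight suggestions by how often each term is searched.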

Friday, August 27, 2010

HTML Message Ajax Design Pattern

With more and more browsers supporting the XMLHttpRequest object, which sends data to the server and gets the response without reloading the page, it is possible to refresh only a part of the page. There are various techniques to do this. One such technique is for the server to send XML back to the browser, where Javascript converts the XML into HTML by creating DOM nodes on the fly. This is useful but has some limitations.

* If the Javascript is turned off, there is no graceful degradation of application
* Since the Javascript is running on the browser (client side), changing the structure of the generated HTML is hard
* This is not feasible when HTML generated is complex

One can overcome these limitations by sending an HTML snippet in the response. Instead of sending XML back to the browser, generate the HTML on the server side itself. On the browser side, replace the content of the DOM node with the HTML response from the server.

If Javascript is turned off in the browser, the user will at least see the HTML content in the browser instead of raw XML. So there is a graceful degradation of service on the client side.

If the generated HTML needs to change, the application does not depend on changing the Javascript and hoping that the client will reload the page and get the new Javascript. Even if the Javascript is changed, some web proxies may not pull the new version, so browsers (clients) behind those proxies will continue to get the old Javascript and render the old HTML.

Various frameworks like Symfony and Django make it possible to generate complex HTML with the least effort. If the request is not made via XMLHttpRequest, it is possible to send the whole page (with header, footer etc.) in the response. If the request is made via the XMLHttpRequest object, send the response without the layout decoration (just the HTML snippet). Javascript running on the client side will replace the content of the DOM node as required.

Take, for example, an application where the loaded page has many graphics and only a part of the page changes when the user takes some action, like filling in a form to send an email. It is absolutely unnecessary to generate the whole page again just for the simple action of sending an email. I will explain this with the example of sending feedback via a form embedded in the page, using the PHP Symfony framework and Javascript (AJAX).

The URL for the page is /new-page. This page has a feedback form in it. The action of this form is handled by /feedback. The server side code for /feedback expects a form with text content, sends an email to the product manager, and sends back the same page as /new-page but with the form replaced by a thank you note.

Below is a code for the new-page action.

Code for new-page response.

Now is the magic of jQuery Javascript library to submit the content to /feedback url and use the response content to replace the HTML form.

Server side code for handling the post from the browser to /feedback

HTML response from /feedback

In the Symfony framework it is possible to disable/enable the layout at run-time via the setLayout() method of the sfAction instance. Setting the parameter to false disables the layout and sends only the content of the action template back to the caller.
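The layout switch itself is framework independent. Below is a small sketch of the decision in plain Java (not Symfony code). It assumes the request headers are available as a simple map and relies on the X-Requested-With: XMLHttpRequest header that jQuery adds to its Ajax requests; the class name, method names and the markup are made up for illustration.

```java
import java.util.Collections;
import java.util.Map;

final class FeedbackResponseSketch {
    /** The HTML snippet that replaces the feedback form. */
    static final String SNIPPET = "<div id=\"feedback\">Thank you for your feedback.</div>";

    /** jQuery marks its Ajax requests with this header; a real server should match header names case-insensitively. */
    static boolean isAjax(Map<String, String> headers) {
        return "XMLHttpRequest".equals(headers.get("X-Requested-With"));
    }

    /** Ajax callers get only the snippet; everybody else gets the snippet wrapped in the full layout. */
    static String render(Map<String, String> headers) {
        if (isAjax(headers)) {
            return SNIPPET;
        }
        return "<html><body><h1>New page</h1>" + SNIPPET + "</body></html>";
    }

    public static void main(String[] args) {
        System.out.println(render(Collections.singletonMap("X-Requested-With", "XMLHttpRequest")));
        System.out.println(render(Collections.<String, String>emptyMap()));
    }
}
```

Because the non-Ajax branch still returns a complete page, a browser with Javascript turned off submits the form normally and sees the thank you note inside the full layout.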

There you go.. A simple way to improve the usability of the site even without Javascript support.

Wednesday, August 25, 2010

Execute action within another action

In the current project I am working on (using the Symfony framework), I came across a situation where I had to call one action within another action and capture the content of the 2nd action. This content is included within the result of the 1st action. After doing some research, I found that it is possible to do so by calling the getPresentationFor method of the controller object.

Within the action's execute method,

$content = $this->getController()->getPresentationFor($module, 'my2ndAction');
$this->content = $content;

Thursday, August 5, 2010

File System Over SSH

Some of the development I am doing requires me to work on a remote machine. Since the development is done in Java, I need the Eclipse environment. I cannot run Eclipse on the remote system, as it is a server box and does not have a display attached to it. I found an elegant solution for this, thanks to Macfusion.

I can mount the remote machine's directory on my local machine and work as if it were a local directory. All the dirty work of communication is taken care of. I don't need NFS mounting, which is not secure. Macfusion lets me mount the directory on the remote machine over ssh.

Sunday, July 11, 2010

Why does Erlang scare managers?

Managing is all about control. Managing is about knowing (almost) everything about what one is controlling. If the team is using something the manager doesn't understand, or isn't willing to understand, the manager is losing control. Again it is all about control.

Most managers are comfortable with technologies like PHP, Java etc. Languages with different ideas, like Erlang and LISP, are far-fetched for many managers. They don't want to learn about these technologies or ideas. Having them in the team forces managers out of their comfort zone and takes control away from them. This is not good news for them. One argument I hear every time I make the case for Erlang (it is better in terms of hardware utilization and lines of code) is that "it is harder to hire Erlang developers than PHP/Java developers". As I see it, competent developers would rather learn new things than stay in the same-old-same-old world. Competent developers are capable of learning new technologies and new ideas.

Of course, there is an initial learning curve. It is there anywhere you go. Developers/engineers need to learn the new way of doing things anywhere they go. Some companies are willing to give developers more time to get familiar with the internal technologies and some are not. But it is an absolute necessity. Not everyone follows the same development methods or procedures. In a big organization, these change from group to group.

Sunday, June 27, 2010

To stay or to leave....

There comes a time when one has to stop doing the things one has been doing for a long time. It is hard to do that... Sometimes it feels impossible... Some people lack the courage to do it, some people lack the motivation, and for some people the situation doesn't let them...

Once the decision is taken, is it possible to revert that decision? Yes, I think it is possible to do that. The question is what the cost of reverting the decision is. I don't think it is much...

Tuesday, April 20, 2010

Dog is not Cat

While talking to my friend today, something struck me regarding the notion of object oriented programming, hierarchy of classes and how these concepts are over used.

Object-oriented design has been used heavily for thousands of years. For example, constructing a home. If you think about it, a home is built in a very modular way (aka the object-oriented way). Every part of the home is an object [door, window, wall, roof etc.]. They can be put together in many ways to construct a home. The final shape of the home may change, but the basic building blocks remain the same.

Now to the point of using this in computer programming. I think object-oriented programming is misunderstood in the software industry. People seem to confuse reusability of code with object orientation. For example, take the hierarchy of animals. Animal is a base class, and Cat is derived from class Animal. Now let's say I want to declare a new class Dog. Since the Dog class was not considered while implementing the Cat class, common functionality of Animal (like see, listen etc.) is implemented in the Cat class. Now when one wants to design the Dog class, since Cat already has the functionality required by Dog, one derives Dog from Cat. This is so wrong. Logically, when you derive one class from another, you are implying an "is-a" relation. That means, by deriving Dog from Cat [just because the functionality required by Dog is implemented in Cat], you are implying that Dog is a Cat.

In cases like these, it is better to sit back and think a little. Maybe code refactoring will help here. The functionality required by both Dog and Cat [and maybe Elephant, Tiger etc.] can be moved into the Animal class and implemented in terms of some abstract concepts like eyes, legs, ears etc.
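A minimal sketch of the refactored hierarchy in Java may make the point concrete. The method bodies are placeholders made up for illustration; what matters is that see and listen live in Animal, and that Cat and Dog are siblings rather than parent and child.

```java
/** Shared behaviour lives in the base class, so no animal has to inherit from another animal. */
abstract class Animal {
    private final String name;

    Animal(String name) {
        this.name = name;
    }

    String see() {
        return name + " sees with its eyes";
    }

    String listen() {
        return name + " listens with its ears";
    }

    /** Only the behaviour that really differs is left to the subclasses. */
    abstract String speak();
}

final class Cat extends Animal {
    Cat() {
        super("Cat");
    }

    String speak() {
        return "Meow";
    }
}

/** Dog is-an Animal, not a Cat, yet it still reuses see() and listen(). */
final class Dog extends Animal {
    Dog() {
        super("Dog");
    }

    String speak() {
        return "Woof";
    }
}
```

With this layout, adding Elephant or Tiger later means writing one more subclass of Animal and touching neither Cat nor Dog.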

Now enough of messing your brain.. Go back to work and start thinking about the designs you have done earlier and how you could have done it better [not that what you did earlier was wrong].

Saturday, January 30, 2010

Hacking search suggestions

The OpenSearch specification has an extension for search suggestions. Major search providers have their own search suggestion entry points. After doing some searching of my own, I found the entry points for the 3 major search providers:

Bing - http://api.search.live.com/osjson.aspx?query={Search Term}
Google - http://suggestqueries.google.com/complete/search?q={Search Term}&client=firefox
Yahoo - http://ff.search.yahoo.com/gossip?output=fxjson&command={searchTerms}

For the Google entry point, removing the client parameter also provides the number of results in the response. That format does not follow the OpenSearch standard.
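Calling one of these entry points only requires substituting a URL-encoded query into the template. Here is a small Java sketch; it reuses the Google URL listed above, with the placeholder normalized to {searchTerms} as an assumption of this example, and the class and method names are hypothetical. It only builds the URL and does not send a request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SuggestUrlSketch {
    /** Google entry point from the list above, with the placeholder written as {searchTerms}. */
    static final String GOOGLE_TEMPLATE =
        "http://suggestqueries.google.com/complete/search?q={searchTerms}&client=firefox";

    /** Replaces the {searchTerms} placeholder with the URL-encoded query. */
    static String fill(String template, String query) {
        try {
            return template.replace("{searchTerms}", URLEncoder.encode(query, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://suggestqueries.google.com/complete/search?q=toyota+car&client=firefox
        System.out.println(fill(GOOGLE_TEMPLATE, "toyota car"));
    }
}
```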

Who is more open? Google or Yahoo or Bing?

While looking at the search result page HTML from Google, Yahoo and Bing, I discovered that Google does not add auto discovery to its search result page, whereas Yahoo and Bing do.

That makes me wonder why Yahoo and Bing do not get as much credit as Google for adopting open technologies/standards.

Friday, January 22, 2010

http request pipeline in Erlang

I tried to use Erlang's http module for highly concurrent requests. It was not performing well due to pipelining and persistent connection issues. This seems to be solved in the R13 release. I figured out how to use http profiles to enable pipelining/persistent connections selectively for one server but not for others [if the application is sending requests to multiple hosts].

The first step in the process is to create a new http profile. It can be done in 2 ways. The first is to run a stand-alone http connection manager (httpc_manager).


{ok, Pid} = inets:start( httpc, [{profile, other}] ).

As per the documentation, this is not desirable, as all the benefits of the OTP framework are lost.

Dynamically started services will not be handled by application takeover and failover behavior when inets is run as a distributed application. Nor will they be automatically restarted when the inets application is restarted, but as long as the inets application is up and running they will be supervised and may be soft code upgraded. Services started as stand_alone, e.i. the service is not started as part of the inets application, will lose all OTP application benefits such as soft upgrade. The "stand_alone-service" will be linked to the process that started it. In most cases some of the supervision functionality will still be in place and in some sense the calling process has now become the top supervisor

The 2nd method is to run it as part of the inets application via a configuration file.

Have a config file with the following content (say inets.config):


[{inets, 
[{services,[{httpc,[{profile, server1}]},
            {httpc, [{profile, server2}]}]}]
}].

Run the erlang shell as


erl -config inets.config

This will start 3 http profiles [server1, server2 and default].

Now the question is how to use the newly created profiles. Let's say the application is using 2 web services hosted at foo1.example.com and foo2.example.com. The web service at foo1.example.com is hosted on a web server which can support a lot of persistent connections [keep-alive connections]. The web service at foo2.example.com is hosted on a normal web server which is not optimized for a large number of persistent connections.

In the application set the profile for server1 for the connections to foo1.example.com. This can be done by changing the http options listed here.


http:set_options([{max_sessions, 20}, {pipeline_timeout, 20000}], server1).

NOTE: It is required to set the pipeline timeout in order to enable http pipelining.

The profile can be specified at request time.


http:request( "http://foo1.example.com/v1/get_info/dudefrommangalore", server1).

There is no interface provided by httpc_manager or inets to get the number of sessions open to a server. But the good news is that the session information is kept in an ets table. One can query the ets table to get the list of persistent connections.


ets:tab2list(httpc_manager_server1_session_db).

Output is something like


{tcp_session,{{"foo1.example.com",80},
               <0.103.0>},
              false,http,#Port<0.1032>,...}

<0.103.0> is a Pid of httpc_handler gen server process. It is possible to get the status of this process via standard OTP sys module.


sys:get_status(erlang:list_to_pid("<0.103.0>")).

It is also possible to get all the pipelined requests on each persistent connection. For that it is necessary to get the pid of the httpc_manager via inets:services_info(). This call returns the pid of the httpc_manager for every profile.


[{httpc,<0.52.0>,[{profile,server1}]},
 {httpc,<0.53.0>,[{profile,server2}]},
 {httpc,<0.41.0>,[{profile,default}]}]

From the pid, get the status of httpc_manager gen server process.


sys:get_status(erlang:list_to_pid( "<0.52.0>")).

The ets table identifier is part of the state tuple in the output below (24596 in this run).


15> sys:get_status(erlang:list_to_pid("<0.52.0>")).
{status,<0.52.0>,
        {module,gen_server},
        [[{'$ancestors',[httpc_profile_sup,httpc_sup,inets_sup,
                         <0.36.0>]},
          {'$initial_call',{httpc_manager,init,1}}],
         running,<0.40.0>,[],
         [httpc_manager_server1,
          {state,[],24596,
                 {undefined,28693},
                 httpc_manager_server1_session_db,httpc_manager_server1,
                 {options,{undefined,[]},
                          0,2,5,120000,2,disabled,false,inet,default,...}},
          httpc_manager,infinity]]}

Get the content of that ets table to see the pipelined connections.


ets:tab2list(24596).

The application can tune the http options to make better use of the network bandwidth and get the most out of the machine and the network.

Tuesday, January 19, 2010

Erlang process mailbox performance

I came across this performance issue in Erlang while doing pattern matching against the mailbox [a.k.a. selective message processing]. Here is the original code:


-module (perf).

-export( [start/0] ).

start() ->
  S = erlang:now(),
  Pids = spawn_n(fun test/1, 10000, []),
  wait(Pids),
  E = erlang:now(),
  io:format( "Total time: ~p~n", [timer:now_diff(E, S)/1000] ).

spawn_n(_F, 0, Acc) -> Acc;
spawn_n(F, N, Acc) -> 
  Me = self(),
  Pid = spawn(fun() -> F(Me) end),
  spawn_n(F, N-1, [Pid|Acc]).

test(Pid) -> Pid ! {self(), ok}.

wait([]) -> ok;
wait([Pid|Pids]) -> 
    receive {Pid, ok} -> ok end,
    wait(Pids).

Run time for perf:start() was 1.3 seconds

Erlang (BEAM) emulator version 5.6.5 [source] [smp:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.6.5 (abort with ^G)
1> perf:start().
Total time: 1368.038
ok
2>

Now I changed wait(Pids) to wait(lists:reverse(Pids)). After this change, run time for perf:start() was 83 milliseconds.

1> perf:start().
Total time: 83.037
ok

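Roughly a 16x improvement just by changing the order in which the mailbox is scanned. spawn_n builds the Pid list with the most recently spawned process at the head, while the replies arrive roughly in spawn order. Each receive in the original wait/1 therefore had to walk past almost every queued message before finding a match. After reversing the list, the message being waited for is usually at or near the front of the mailbox.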
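The size of the effect can be estimated outside Erlang by simply counting how many queued messages a selective receive has to inspect. The Java sketch below is a simulation under two simplifying assumptions: all replies are already in the mailbox when waiting starts, and they arrived exactly in spawn order. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

final class MailboxScanSketch {
    /** Counts the messages inspected while waiting for each sender in waitOrder, scanning from the front every time. */
    static long inspected(List<Integer> arrivalOrder, List<Integer> waitOrder) {
        LinkedList<Integer> mailbox = new LinkedList<Integer>(arrivalOrder);
        long count = 0;
        for (Integer wanted : waitOrder) {
            Iterator<Integer> it = mailbox.iterator();
            while (it.hasNext()) {
                count++;
                if (it.next().equals(wanted)) {
                    it.remove(); // a matched message leaves the mailbox
                    break;
                }
            }
        }
        return count;
    }

    /** Sender ids 1..n, ascending or descending. */
    static List<Integer> range(int n, boolean descending) {
        List<Integer> out = new ArrayList<Integer>();
        for (int i = 1; i <= n; i++) {
            out.add(descending ? n - i + 1 : i);
        }
        return out;
    }

    public static void main(String[] args) {
        int n = 10000;
        List<Integer> arrival = range(n, false);
        System.out.println("newest first: " + inspected(arrival, range(n, true)));  // prints newest first: 50005000
        System.out.println("oldest first: " + inspected(arrival, range(n, false))); // prints oldest first: 10000
    }
}
```

Waiting in newest-first order inspects n(n+1)/2 messages in total, while waiting in arrival order inspects only n. The measured speed-up is smaller than that ratio because spawning the processes and sending the messages take time too.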

Little things like this are usually overlooked and the language is blamed for the performance issues.

Monday, January 18, 2010

Concurrency in Java - Part 2

In the earlier post I covered the basic cached thread pool.

Another facility offered by Java's concurrency framework is scheduling a task to run after a certain time or at a regular interval (like a standard unix cron job). There are 2 ways to schedule a task at a regular interval. The first is to run the task at a fixed rate, regardless of when the previous run finished. The second is to run the task and then wait for a fixed interval after the previous run is done.

The second one is helpful in situations like crawlers. It is necessary to download pages with some politeness factor [wait for some time before downloading another page from the same website].


ScheduledExecutorService executionService = Executors.newScheduledThreadPool(2);

The above code snippet creates a pool of 2 threads. This service can run tasks after a delay or at regular intervals.

ScheduledExecutorService provides a method called schedule:


Runnable task = new Runnable() {
   public void run() {
      System.out.println( "I am responsible for downloading a page" );
   }
};
/* TimeUnit and ScheduledFuture are defined within java.util.concurrent package */
ScheduledFuture<?> future = executionService.schedule(task, 2000, TimeUnit.MILLISECONDS);

The above code schedules a task to run after 2 seconds [2000 milliseconds]. The schedule method returns a future object. This future object can be used to check the status of the task [isDone method] or to cancel the task [cancel method]. Read more about the future object and the methods available here.

ScheduledExecutorService also supports repeated execution of a task via scheduleAtFixedRate and scheduleWithFixedDelay.

The scheduleAtFixedRate method schedules the task at a regular interval measured from the start of each run, regardless of when the previous run finished. If a run takes longer than the interval, the next run starts late; the same task is never run concurrently with itself.

The scheduleWithFixedDelay method is similar to scheduleAtFixedRate, except that it waits for the previous execution to finish, then waits for the fixed interval, and only then runs the task again.
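Here is a small runnable sketch of both repeating variants. The class name, the 50 millisecond interval and the "stop after three runs" rule are choices made for this example only; a real crawler would use a much longer delay.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class PoliteCrawlerSketch {
    /** Runs the task repeatedly until it has executed three times; returns true if that happened within 5 seconds. */
    static boolean runThreeTimes(boolean fixedDelay) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        final CountDownLatch remaining = new CountDownLatch(3);
        Runnable task = new Runnable() {
            public void run() {
                System.out.println("I am responsible for downloading a page");
                remaining.countDown();
            }
        };
        // Fixed delay measures the pause from the end of the previous run,
        // fixed rate measures the period from the start of the previous run.
        ScheduledFuture<?> handle = fixedDelay
            ? scheduler.scheduleWithFixedDelay(task, 0, 50, TimeUnit.MILLISECONDS)
            : scheduler.scheduleAtFixedRate(task, 0, 50, TimeUnit.MILLISECONDS);
        try {
            return remaining.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            handle.cancel(false); // a repeating task runs until it is cancelled
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("fixed delay finished: " + runThreeTimes(true));
        System.out.println("fixed rate finished: " + runThreeTimes(false));
    }
}
```

Note that a repeating task never completes on its own, so the returned future is mainly useful for cancelling it.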

Sunday, January 17, 2010

Concurrency in Java

Java 5.x introduced the concurrency framework. It makes the developer's life easier when running multiple threads. The framework also takes care of thread caching, thus reducing the number of threads spawned in the system.

When the concept of threads was introduced in operating systems, threads were considered light-weight processes. As CPU clock speeds increase dramatically and more CPU cores become available to programs, even these light-weight processes are deemed too costly to start. Thus the concept of cached threads was introduced. Erlang solves this problem with ultra-light-weight processes. Millions of such processes can be spawned within a few seconds. This is not the case with kernel threads. Even when kernel threads are used, there is a cost of context switching to move those threads from wait state to run state.

I am new to the Java concurrency framework, so I am taking baby steps to learn the classes available in Java 5.x. All the concurrency related classes are in the java.util.concurrent package.

The first step in using these classes is the Executors class. This class has several class methods to create thread pools.


import java.util.concurrent.Executors;
import java.util.concurrent.ExecutorService;

ExecutorService threadPool = Executors.newCachedThreadPool();

The above code creates a cached thread pool. The behavior of this thread pool is documented here.

Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available. These pools will typically improve the performance of programs that execute many short-lived asynchronous tasks. Calls to execute will reuse previously constructed threads if available. If no existing thread is available, a new thread will be created and added to the pool. Threads that have not been used for sixty seconds are terminated and removed from the cache. Thus, a pool that remains idle for long enough will not consume any resources. Note that pools with similar properties but different details (for example, timeout parameters) may be created using ThreadPoolExecutor constructors

A task can be submitted to the newly created thread pool for execution. A task must implement the Runnable interface. Submitting a task to the ExecutorService returns an instance of the Future interface. This is a wrapper around the submitted task. It can be used to query the submitted task for completion, as well as to cancel the task.



Future task = threadPool.submit( new Runnable() {
       public void run() {
          System.out.println( "Hello world from within the thread pool" );
       }
  });

Finally, wait for the task to be completed.


  while ( !task.isDone() ) {
      Thread.yield();  // polling works, but task.get() would block without spinning
  }

Once the task is completed and the thread pool is no longer necessary, send the shutdown message to the thread pool to terminate all the threads it created.


  threadPool.shutdown();
  while (!threadPool.isTerminated()) {
      Thread.yield();  // awaitTermination() is the blocking alternative
  }
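Put together, the snippets above can be assembled into one small program. This is a sketch rather than the original downloadable source: the class name HelloThreadPool and the 5 second timeout are made up, and it uses the blocking calls get() and awaitTermination() instead of the polling loops.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class HelloThreadPool {
    /** Submits one task, waits for it, shuts the pool down; returns true if the pool terminated cleanly. */
    static boolean runOnce() {
        ExecutorService threadPool = Executors.newCachedThreadPool();
        try {
            Future<?> task = threadPool.submit(new Runnable() {
                public void run() {
                    System.out.println("Hello world from within the thread pool");
                }
            });
            task.get();            // blocks until the task has completed
            threadPool.shutdown(); // no new tasks accepted, idle threads are released
            return threadPool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;          // the task itself threw an exception
        } finally {
            threadPool.shutdown(); // no effect if the pool is already shut down
        }
    }

    public static void main(String[] args) {
        System.out.println("terminated cleanly: " + runOnce());
    }
}
```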

There it is. My first Hello world code using the concurrent thread pool in Java. As and when I learn new methods in this framework, I will write about them here.

Until then, happy thread pooling and utilizing all the cores on the system.

Update: Download the source from here