Tuesday, January 24, 2012

Virgil: Remote Hadoop Job Deployment via REST


In the latest release, Virgil takes its Hadoop support one step further, allowing you to deploy jobs to remote Hadoop clusters via REST.

We've taken the Ruby example and added the ability to deploy those Ruby scripts remotely.
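To make "Ruby scripts" concrete, here is a minimal sketch of the kind of job being deployed: a word count written as a `map` and a `reduce` function. The method names, their arguments (a row key plus a hash of columns; a key plus its emitted values) and the return shapes are assumptions for illustration, not Virgil's documented contract, which is described on the Virgil Wiki.

```ruby
# Hypothetical word-count job. The map/reduce signatures and return
# shapes are assumptions for illustration, not Virgil's documented API.

# map: receives one Cassandra row (row key plus a hash of
# column name => value) and emits a [word, "1"] pair for every
# lowercased word found in the column values.
def map(row_key, columns)
  pairs = []
  columns.each_value do |text|
    text.split.each { |word| pairs << [word.downcase, "1"] }
  end
  pairs
end

# reduce: receives one word and all of its emitted counts, and returns
# a hash of row key => { column name => value } to write back.
def reduce(key, values)
  total = values.map(&:to_i).sum
  { key => { "count" => total.to_s } }
end

# Local smoke test: group the map output by key (the step Hadoop would
# perform on a real cluster) and feed each group to reduce.
if __FILE__ == $PROGRAM_NAME
  emitted = map("doc1", { "body" => "the quick fox jumps over the lazy dog" })
  grouped = emitted.group_by(&:first).transform_values { |ps| ps.map(&:last) }
  grouped.each { |word, counts| p reduce(word, counts) }
end
```

Run directly, it prints one `{word => {"count" => n}}` hash per word; on the cluster, Hadoop performs the grouping that `group_by` imitates here.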

In our deployment model, we run Virgil, Cassandra and Hadoop on each node in the cluster, with an HTTP load balancer in front of the Virgil instances. 

With this latest release of Virgil, we've added a new shell script that allows you to start Virgil with the Hadoop configuration on the classpath.

bin/virgil-hadoop -host localhost


This starts Virgil, connecting to the Cassandra instance on localhost. Additionally, we point the Hadoop configuration (in $VIRGIL_HOME/mapreduce/conf/) at the collocated Hadoop cluster. The Hadoop configuration is described in detail on the Virgil Wiki.

Then, when you sling Ruby at the REST API, that script is deployed to the entire cluster. Since we started Virgil with "localhost", that address is used as the initial "seed address" for the Map/Reduce job, which means each node will use its local Cassandra instance to access the Cassandra cluster, minimizing the data passed around the network.
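Slinging a script at the REST API boils down to an HTTP POST. The sketch below uses Ruby's standard `Net::HTTP`; the `/virgil/job` path, the `jobName`, `inputKeyspace` and `inputColumnFamily` query parameters, and port 8080 are hypothetical placeholders rather than Virgil's confirmed API, so check the Virgil Wiki for the real endpoint before relying on them.

```ruby
require "net/http"
require "uri"

# Builds (but does not send) a POST that uploads a Ruby job script.
# HYPOTHETICAL: the "/virgil/job" path and the query parameter names
# are placeholders, not Virgil's documented endpoint.
def build_job_request(base_url, job_name, script, params = {})
  uri = URI.join(base_url, "/virgil/job")
  uri.query = URI.encode_www_form({ "jobName" => job_name }.merge(params))
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "text/plain"
  request.body = script
  [uri, request]
end

# Sends the request and returns the Net::HTTPResponse. base_url can be
# any Virgil node or the HTTP load balancer in front of them.
def deploy_job(base_url, job_name, script, params = {})
  uri, request = build_job_request(base_url, job_name, script, params)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end

# Usage: ruby deploy.rb path/to/wordcount.rb
# (host, port and keyspace/column family names are assumptions)
if __FILE__ == $PROGRAM_NAME && ARGV.any?
  script = File.read(ARGV.first)
  response = deploy_job("http://localhost:8080", "wordcount", script,
                        "inputKeyspace" => "dummy",
                        "inputColumnFamily" => "book")
  puts "#{response.code} #{response.body}"
end
```

Building the request separately from sending it keeps the URL construction checkable without a live cluster, and because the load balancer can route the POST to any Virgil instance, `base_url` can simply be the balancer's address.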

Give it a try and let us know what you think.

Next up: Cassandra Triggers.