Monday, November 30, 2009

quick script: open hadoop jobtracker UI with elastic map reduce

If you've ever logged into the Hadoop master with Amazon's Elastic MapReduce, you've seen something like:

The Hadoop UI can be accessed via the command: lynx http://localhost:9100/

Great, but lynx? Not as nice as Firefox or Safari...

It's easy enough to do some ssh port forwarding so you can use your browser of choice and access the hadoop UI from your machine.

But, after getting tired of typing in the ssh options a bunch of times, I finally put together a short script that automates it a bit. The script takes in the public hostname of your hadoop master (you can get this from elastic-mapreduce --list), then picks a random port number, sets up the ssh forwarding, and opens the page in a new browser window.

I call it hcon for 'hadoop console'. After configuring the script with the path to your emr key file, you run it like:

hcon ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com

Here's the full script, but in case you're curious the magic lines (wrapped) are:

ssh -f -N -o "StrictHostKeyChecking no" \
  -L "${LPORT}:localhost:9100" \
  -i "${KEYFILE}" "hadoop@${HOST}"
$BROWSER "http://localhost:${LPORT}"

(Yes, for this, I turn off StrictHostKeyChecking).
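If you'd rather roll your own than grab the script, the steps described above (pick a random port, forward it over ssh, open the browser) can be sketched roughly like this. This is not the actual hcon script, just a minimal bash version of the same idea. The default key path, the `xdg-open` fallback, the 20000-29999 port range, and the helper names `pick_port` and `forward_spec` are all my assumptions for illustration.

```shell
# Rough sketch of an hcon-style helper; not the original script.
# The KEYFILE and BROWSER defaults are assumptions, adjust for your setup.
KEYFILE="${KEYFILE:-$HOME/.ssh/emr-key.pem}"   # hypothetical key path
BROWSER="${BROWSER:-xdg-open}"                 # e.g. "open" on a Mac

# Pick a local port in 20000-29999 (bash's $RANDOM yields 0-32767).
# Note: this does not check whether the port is already in use.
pick_port() {
  echo $(( 20000 + RANDOM % 10000 ))
}

# Build the argument for ssh -L: local port -> Hadoop UI on the master.
forward_spec() {
  printf '%s:localhost:9100' "$1"
}

hcon() {
  if [ "$#" -ne 1 ]; then
    echo "usage: hcon <master-public-hostname>" >&2
    return 1
  fi
  local host="$1" lport
  lport="$(pick_port)"
  # -f: go to the background after authenticating
  # -N: run no remote command, just forward the port
  ssh -f -N -o "StrictHostKeyChecking no" \
      -L "$(forward_spec "$lport")" \
      -i "$KEYFILE" "hadoop@${host}" || return 1
  # BROWSER is left unquoted on purpose so it can carry arguments.
  $BROWSER "http://localhost:${lport}"
}
```

Source it from your shell profile and call it the same way as above: hcon ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com. The forwarding ssh process keeps running in the background after the browser opens, so you'll need to kill it yourself when you're done.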

Anyway, try it out and let me know if it's helpful at all.

Friday, November 13, 2009

Monday, November 2, 2009

Using Hudson to manage crons

We've been using Hudson for several months now to manage our builds -- we probably have 80-90 different projects that it's responsible for. It's an awesome system for continuous integration and testing.


It's also an awesome system for scheduling and managing generic jobs. We've only just begun to use it as a cron server, but it's clear that it has numerous advantages over the more traditional way of using the unix cron service directly.


  • Notification plugins -- Hudson can be easily configured to send email and Jabber notifications when cron jobs start, succeed, or fail. You can also track your scheduled jobs via RSS.

  • Stdout/Stderr logging -- Hudson saves the stdout and stderr from each run automatically.

  • SCM integration -- if you need to update a job, just check the changes into SVN (or whatever SCM system you use). Hudson will automatically pick up the changes the next time your job is run.

  • Nice web interface -- never underestimate the productivity gains from having a good UI. It can be surprisingly tricky to determine exactly which crons are running on a generic Unix box. Not so with Hudson.


At Bizo, we believe that developers should be getting their hands dirty in the operational aspects of their projects -- Hudson gives us an easy interface for managing our scheduled jobs using the same tools that we're familiar with for managing our build processes. Hudson is such a great tool for continuous integration that it's easy to overlook how good it is at the simpler task of managing generic scheduled jobs.