Wednesday, July 22, 2009

Dependency management for Scala scripts using Ivy

I'm quickly becoming a huge fan of Scala scripting. Because Scala is Java-compatible, we can easily use our existing Java code base in scripts. This is especially convenient as we're moving our reporting to Hive, which supports script-based Hadoop streaming for custom Mappers and Reducers.

The one very annoying thing about Scala scripting is managing dependencies. My initial method was to have my bash preamble manually download the required libraries to the current directory and add them to the Scala classpath. So, my scripts looked something like this:


#!/bin/sh

# Download each required jar only if it is not already in the current directory
if [ ! -f commons-lang.jar ]; then
    s3cmd get [s3-location]/commons-lang.jar commons-lang.jar
fi

if [ ! -f google-collect.jar ]; then
    s3cmd get [s3-location]/google-collect.jar google-collect.jar
fi

if [ ! -f hadoop-core.jar ]; then
    s3cmd get [s3-location]/hadoop-core.jar hadoop-core.jar
fi

# Re-run this file as a Scala script; the quotes keep arguments containing spaces intact
exec /opt/local/bin/scala -classpath commons-lang.jar:google-collect.jar:hadoop-core.jar "$0" "$@"

!#
(scala code here)


This method has some rather severe scaling problems as the complexity of the dependency graph increases. I was about to step into the endless cycle of testing my script, finding the missing or conflicting dependencies, and re-editing it to download and include the appropriate files.

Fortunately, there was an easy solution. We're already using Ivy to manage dependencies in our compiled projects, and Ivy can be run in standalone mode outside of Ant. The key is the "-cachepath" command line option, which makes Ivy write a classpath listing the cached dependency jars to a specified file. So, now the preamble of my scripts looks like this:


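#!/bin/bash

# Temporary file that will receive the classpath from Ivy
tempfile=$(mktemp /tmp/tfile.XXXXXXXXXX)

# Resolve dependencies; -cachepath writes the classpath of the cached jars to the file
/usr/bin/java -jar /mnt/bizo/ivy-script/ivy.jar -settings /mnt/bizo/ivy-script/ivyconf.xml -cachepath "${tempfile}" > /dev/null

# Strip line endings so the classpath is a single line
classpath=$(tr -d "\n\r" < "${tempfile}")

rm "${tempfile}"

exec /opt/local/bin/scala -classpath "${classpath}" "$0" "$@"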

!#
(scala code here)


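Now all I need is a standard ivy.xml file living next to my script. Since the preamble does not pass "-ivy", Ivy picks up ivy.xml from the current working directory, so the script should be run from its own directory. Ivy then automagically resolves all of my dependencies and inserts them into the script's classpath for me.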
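
For readers who have not written an ivy.xml before, here is a rough sketch of what a minimal one might look like, wrapped in a small shell helper that writes it to disk. Everything in it is an assumption for illustration: the helper name write_sample_ivy_xml, the com.example/my-script module info, and the org, name, and rev values of each dependency. The real coordinates depend on the repositories your ivyconf.xml points at.

```shell
#!/bin/bash

# Hypothetical helper: write a minimal ivy.xml into the given directory
# (defaults to the current directory).
write_sample_ivy_xml() {
    dir="${1:-.}"
    cat > "${dir}/ivy.xml" <<'EOF'
<ivy-module version="2.0">
    <!-- Placeholder module info; any organisation/module name works for a script -->
    <info organisation="com.example" module="my-script"/>
    <dependencies>
        <!-- Placeholder coordinates; adjust org/name/rev to match your repository -->
        <dependency org="commons-lang" name="commons-lang" rev="2.4"/>
        <dependency org="org.apache.hadoop" name="hadoop-core" rev="0.20.2"/>
    </dependencies>
</ivy-module>
EOF
}

# Demo: write into a scratch directory so no real ivy.xml gets overwritten
scratch=$(mktemp -d /tmp/ivy-demo.XXXXXXXXXX)
write_sample_ivy_xml "${scratch}"
cat "${scratch}/ivy.xml"
```

To use it for real, call write_sample_ivy_xml with the script's directory and then edit the dependency entries by hand. Transitive dependencies are resolved by Ivy, so only the direct ones need to be listed.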

Crisis averted. Life is once again filled with joy and happiness.
