Monday, September 22, 2008

Capistrano deployment with S3

We recently started using Capistrano for our deployments. It's a great tool and it really simplifies doing remote deployments to EC2.

A lot of our projects are Java, or Java web projects, so we need to deploy the result of a build, not the raw source. I really didn't want to store builds alongside the source code in Subversion. Before Capistrano, I had written a couple of install shell scripts that would fetch the latest project binary from S3, using a bucket name structure like:

my-bucket:my-path/1.12/
my-bucket:my-path/current/

with 'current' always pointing to the latest version.

There is a capistrano-s3 project that checks out from your SCM, creates a .tar.gz, pushes it to S3, and then pushes it out to your servers.

This isn't exactly what I wanted, since it's still missing a build step. I thought I could get by with something pretty simple -- just an SCM implementation backed by S3. It turns out it was pretty easy.

Here's the S3 SCM implementation. You need to have s3sync installed. Drop it in as capistrano/recipes/deploy/scm/s3.rb somewhere on your Ruby load path (that's where Capistrano 2 looks for an SCM named :s3).
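The implementation itself was embedded in the original post; as a rough, self-contained reconstruction (the class name, helper methods, and the config-hash stand-in for Capistrano's `variable` lookup are all assumptions, not the post's actual code), an S3-backed SCM strategy might look like:

```ruby
# Simplified sketch of an S3-backed SCM strategy for Capistrano 2.
# In a real setup this would subclass Capistrano::Deploy::SCM::Base;
# here a plain config hash stands in for Capistrano's variable lookup
# so the sketch runs on its own.
class S3SCM
  def initialize(config = {})
    @config = config
  end

  # With no explicit :branch set, deploy from the 'current' prefix.
  def head
    @config[:branch] || "current"
  end

  # Capistrano SCM strategies return shell command strings that are
  # executed on the target host; this one builds an s3sync invocation
  # that pulls my-bucket:my-path/<revision>/ into the release directory.
  def checkout(revision, destination)
    "#{environment}s3sync.rb -r #{@config[:repository]}/#{revision}/ #{destination}"
  end

  # For this strategy, sync and export do the same thing as checkout.
  alias_method :sync, :checkout
  alias_method :export, :checkout

  # S3 prefixes aren't real revisions, so pass the name straight through.
  def query_revision(revision)
    revision
  end

  private

  # Prefix the AWS credentials as environment variables when they are
  # set in the Capfile (set :access_key, ...; set :secret_key, ...),
  # since s3sync reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.
  def environment
    return "" unless @config[:access_key]
    "AWS_ACCESS_KEY_ID=#{@config[:access_key]} " \
      "AWS_SECRET_ACCESS_KEY=#{@config[:secret_key]} "
  end
end
```

The key point is that checkout, sync, and export can all collapse into the same "mirror this S3 prefix down" command, because there's no real version-control history to walk.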
And you can now use the following:

set :scm, :s3
set :repository, "my-bucket:my-path"

By default it will look in my-bucket:my-path/current/. You can also set

set :branch, "1.12"

and it will look in my-bucket:my-path/1.12/

If your AWS keys aren't available in the environment for s3sync, you can also set them in your Capfile:

set :access_key, "my key"
set :secret_key, "my secret"

That's basically it. I'm pretty new to both Capistrano and Ruby, so any comments, feedback, etc. would be appreciated.


Timo said...

That's good stuff Larry. I'll definitely be using it for the Analyze deployment.

Nathan said...

Nice post.

I like the approach; it's much more scalable when the server count you're deploying to gets largish.

The one tradeoff I'm still unclear about is the notion of moving around a tar.gz of the entire repository versus keeping a cached-copy on the servers and just updating the files that changed.

One could implement either deploy style via S3 -- doing the non-tar.gz version would be a fairly simple modification on the recipe you've already provided. E.g., one would simply keep an S3 copy of the codebase files updated via s3sync (e.g., just update the files that change and/or optionally use the md5-checksumming features to check them all), and then do the inverse on the download side (just update the stuff that changes).
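That file-level style boils down to a pair of s3sync invocations, one on each side. A minimal sketch (the helper names, bucket name, and paths are made up for illustration, and the exact s3sync flags are an assumption about its CLI):

```ruby
# Hypothetical sketch of the non-tar.gz deploy style described above:
# keep an S3 copy of the codebase updated file-by-file with s3sync,
# then do the inverse on each server to refresh a cached copy in place.

# Upload side: push only changed files from the build output up to S3
# (s3sync compares files and skips ones that haven't changed).
def push_command(local_dir, bucket_prefix)
  "s3sync.rb -r #{local_dir}/ #{bucket_prefix}/"
end

# Download side: update the server's cached copy in place instead of
# unpacking a fresh tar.gz on every deploy.
def pull_command(bucket_prefix, cached_copy)
  "s3sync.rb -r #{bucket_prefix}/ #{cached_copy}/"
end
```

Swapping the source and destination arguments is all that distinguishes the two directions, which is what makes this style such a small modification to the recipe.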

I guess which version would be fastest / most efficient would depend on the size of the repository: very large repositories that either don't compress well or take a long time to compress/unpack would probably do better just updating the files that changed in a cached copy.