Thursday, October 21, 2010

An experiment in file distribution from S3 to EC2 via bittorrent

Amazon's autoscaling service is fantastic. It allows you to dynamically scale the number of instances running your application based on a variety of triggers, including CPU usage, request latency, I/O usage, and more. Thus, you can increase your capacity in response to increased demand for your services.

One difficulty with this approach is that your response time can never be shorter than the time it takes to spin up a new instance with your application running on it. This isn't a big deal for most servers, but some of our backend systems need multi-GB databases and indexes loaded onto them at startup.

There are several strategies for working around this, including baking the indexes into the AMI and distributing them via EBS volume; however, I was intrigued by the possibility of using S3's bittorrent support to enable peer-to-peer downloads of data. In an autoscaling situation, there are presumably several instances with the necessary data already running, and using bittorrent should allow us to quickly copy that file to a new instance.
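For context, S3 exposed a torrent for a publicly readable object when "?torrent" was appended to the object's URL. The snippet below is only a sketch of building such a URL; the bucket and key names are hypothetical and are not the ones used in this experiment.

```python
from urllib.parse import quote


def s3_torrent_url(bucket: str, key: str) -> str:
    """Build the URL that asks S3 for a .torrent file for a public object."""
    # quote() leaves "/" alone by default, so nested keys keep their path shape.
    # The "?torrent" query string requests the torrent instead of the object.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}?torrent"


# Hypothetical bucket and key, for illustration only.
print(s3_torrent_url("example-bucket", "indexes/part-00.zip"))
```

The resulting .torrent file is what a client such as Bittornado would be pointed at.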

Test setup:

All instances were m1.smalls running Ubuntu Lucid in us-east-1, spread across two availability zones. The test file was a 1GB partition of a larger zip file.

For a client, I used the version of Bittornado available in the standard repository (apt-get install -y bittornado). Download and upload speeds were simply read off of the curses interface.

For reference, I clocked a straight download of this file directly from S3 as taking an average of 57 seconds, which translates into almost 18 MB/s.
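The arithmetic behind that figure, as a quick sketch. It assumes 1GB means 1024 MB; the function name is just for illustration.

```python
def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Average transfer rate in MB/s for a file of size_mb moved in seconds."""
    return size_mb / seconds


FILE_SIZE_MB = 1024      # assumption: 1GB = 1024 MB
DIRECT_S3_SECONDS = 57   # average direct download time measured above

rate = throughput_mb_per_s(FILE_SIZE_MB, DIRECT_S3_SECONDS)
print(f"direct S3 download: {rate:.2f} MB/s")
```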

Test results:

First, I launched a single instance and started downloading the file from S3 via bittorrent. S3 only gave me 70-75KB/s, far less than the almost 18 MB/s I saw with a direct S3 download.

While the first instance was still downloading, I launched a second. The second instance quickly caught up to the first, then the download rate on each instance settled at 140-150KB/s, with upload rates at half that. Clearly, S3 was giving each instance 70-75KB/s of bandwidth, and the peers were cooperating by sharing their downloaded fragments with each other.

To verify this behavior, I then launched two more instances and hooked them into the swarm. Again, the new peers quickly caught up to the existing instances, and download rates settled down to 280-300KB/s on each of the four instances.
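These numbers fit a very simple model: if S3 seeds each peer at a fixed rate and the peers trade everything they receive, each peer's effective download rate is the per-peer S3 rate times the number of peers. The sketch below is my reading of the measurements, not a description of how S3 actually throttles; the function names are made up for illustration.

```python
def swarm_rate_kb_per_s(peers: int, s3_rate_per_peer: float) -> float:
    """Effective per-peer download rate when peers share all S3-fed fragments."""
    return peers * s3_rate_per_peer


def seconds_to_download(size_kb: float, rate_kb_per_s: float) -> float:
    """Time to fetch size_kb at a steady rate."""
    return size_kb / rate_kb_per_s


S3_RATE_RANGE = (70, 75)    # KB/s that S3 gave each peer in the test
FILE_SIZE_KB = 1024 * 1024  # assumption: 1GB = 1024 * 1024 KB

for peers in (1, 2, 4):
    low, high = (swarm_rate_kb_per_s(peers, r) for r in S3_RATE_RANGE)
    # Even at the optimistic end, time to finish stays far above a direct download.
    minutes = seconds_to_download(FILE_SIZE_KB, high) / 60
    print(f"{peers} peers: {low}-{high} KB/s, at least {minutes:.0f} min for 1GB")
```

Under this model, even four cooperating peers would need most of an hour to pull the 1GB file out of S3 over bittorrent.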

So, there's clearly some serious throttling going on when downloading from S3 via bittorrent. However, the point of this experiment is not the S3 -> EC2 download speed but the EC2 <-> EC2 file sharing speed.

Once all four of these instances were seeding, I added a fifth instance to the swarm. Download rates on this instance maxed out at around 12-13 MB/s. Once this instance was seeding, I added a sixth instance to the swarm to see if bandwidth would continue to scale up, but I didn't see an appreciable difference.

So, it looks like using bittorrent within EC2 is actually only about two-thirds as fast as downloading directly from S3 (12-13 MB/s versus almost 18 MB/s). In particular, even with a better-tuned environment (e.g., moving to larger instances to avoid sharing physical bandwidth with other instances), it doesn't look like we would get any significant reduction in download time by using bittorrent.
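To put the comparison in terms of wall-clock time, here is a rough sketch using the rates measured above. It assumes 1GB = 1024 MB and a steady transfer rate; the helper names are illustrative only.

```python
def relative_speed(peer_mb_per_s: float, direct_mb_per_s: float) -> float:
    """Peer-to-peer rate as a fraction of the direct S3 rate."""
    return peer_mb_per_s / direct_mb_per_s


def transfer_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Time to move size_mb at a steady rate."""
    return size_mb / rate_mb_per_s


FILE_SIZE_MB = 1024                    # assumption: 1GB = 1024 MB
DIRECT_MB_PER_S = FILE_SIZE_MB / 57    # direct S3 download took 57 seconds
PEER_RATE_RANGE = (12, 13)             # MB/s seen on the fifth instance

for peer_rate in PEER_RATE_RANGE:
    fraction = relative_speed(peer_rate, DIRECT_MB_PER_S)
    seconds = transfer_seconds(FILE_SIZE_MB, peer_rate)
    print(f"{peer_rate} MB/s: {fraction:.0%} of direct speed, {seconds:.0f} s for 1GB")
```

In other words, the swarm would take somewhere around 80 seconds to deliver a file that a direct S3 download delivers in 57.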
