Monday, November 3, 2008

disk write performance on amazon ec2

For a few different projects here at Bizo, we've relied on Berkeley DB. For key/value storage and lookup, it's incredibly fast.

When using any DB, one of your main performance concerns for writes is going to be disk I/O. So, how fast is the I/O at amazon ec2?

I decided to do a quick test using BerekelyDB's writetest program. This is a small C program meant to simulate transaction writes to the BDB log file by repeatedly performing the following operations: 1. Seek to the beginning of a file, 2. write to the file, 3. flush the file write to disk. Their documentation suggests that "the number of times you can perform these three operations per second is a rough measure of the minimum number of transactions per second of which the hardware is capabable." You can find more details in their reference guide under Transaction throughput.

A quick disclaimer: This is not an exhaustive performance test! I am only measuring the above operations using the provided test program on a stock machine setup and single configuration.

For this test, I'm using an image based off of Eric Hammond's Ubuntu 8.04 image, on an ec2 small instance.

Anecdotally, I've noticed that writes to /tmp (/dev/sda1) seemed a lot faster than writes to /mnt (/dev/sda2, and the large disk in an ec2 instance), so we'll be testing both of these, as well as a 100 gig Elastic Block Store (EBS) volume mounted on /vol. All three are formatted as ext3, and mounted with defaults.

I made a small change to the test program (diff here), to print out the data size and file location in its output.


The writetest program was run with -o 10000 (10,000 operations) against each mount point with file sizes: 256, 512, 1024, 2048, 4096, and 8192 (bytes). Each run was repeated 50 times. You can download the test script here.


You can view the raw results here. I hacked together a small perl script (here), to join the rows together for a single run for a single file size. I then imported this data into Numbers, to generate some graphs. You can download both my raw csv files and Numbers spreadsheets here.

On to some graphs!


As you can see from this test, for BerkeleyDB log writes, /dev/sda1 (mounted as /), seems to have the most variance, but is also clearly faster than any of the other devices. Unfortunately, in an ec2 small instance, this is only 10G. You're expected to do all of your storage on /dev/sda1 (mounted as /mnt), which is much slower. Rounding it out is the EBS volume, which has a ton of cool features, but is slower still.

As a follow-up, it would be interesting to try EBS again using XFS, which most of the EBS guides recommend, due to its ability to freeze file writes for snapshots. I'm not sure if it's any better than ext3 for our BerkeleyDB write operations, but it's worth a shot.

No comments: