Tuesday, October 26, 2010

Rolling out to 4 Global Regional Datacenters in 25 minutes

Sometimes I just have to sit back and reflect on the amazing operational power available on AWS. As you know, we are hardcore AWS-ers here at Bizo, and we've been running in all 4 regions for several months. Recently we needed to roll out a new service that we wanted to be globally load balanced (GSLB), and the rollout was astoundingly quick and easy. The total time it took us to go from 0 to 4 regions was 25 minutes!!! Amazing!

Only 25 minutes to set up a service that is running in 4 regions and 8 datacenters and that will autoscale to handle pretty much any amount of load we send it!

Shout out to AWS and Dynect for making it almost too easy...

Thursday, October 21, 2010

An experiment in file distribution from S3 to EC2 via bittorrent

Amazon's autoscaling service is fantastic. It allows you to dynamically scale the number of instances running your application based on a variety of triggers, including CPU usage, request latency, I/O usage, and more. Thus, you can increase your capacity in response to increased demand for your services.

One difficulty with this approach is that you can't respond any faster than the time it takes to spin up a new instance with your application running on it. This isn't a big deal for most servers, but some of our backend systems need multi-GB databases and indexes loaded onto them at startup.

There are several strategies for working around this, including baking the indexes into the AMI and distributing them via EBS volumes; however, I was intrigued by the possibility of using S3's bittorrent support to enable peer-to-peer downloads of data. In an autoscaling situation, there are presumably several instances with the necessary data already running, and using bittorrent should allow us to quickly copy that file to a new instance.
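For anyone who hasn't used S3's bittorrent support: as far as I know, S3 served a .torrent file for a publicly readable object when you appended the ?torrent subresource to the object's URL. Here is a minimal Python sketch of building that URL. The bucket and key names are made up for illustration, and the helper name is mine, not part of any AWS library.

```python
from urllib.parse import quote


def s3_torrent_url(bucket: str, key: str) -> str:
    """Build the URL that returns the .torrent file for a public S3 object.

    Assumes the virtual-hosted-style S3 endpoint; `bucket` and `key`
    are hypothetical values supplied by the caller.
    """
    # quote() percent-encodes spaces and other unsafe characters
    # while leaving the "/" separators in the key intact.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}?torrent"


# Hypothetical bucket and key, for illustration only.
print(s3_torrent_url("example-bucket", "indexes/part-1.zip"))
```

The resulting .torrent file can then be handed to any bittorrent client running on the instance.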

Test setup:

All instances were m1.smalls running Ubuntu Lucid in us-east-1, spread across two availability zones. The test file was a 1GB partition of a larger zip file.

For a client, I used the version of BitTornado available in the standard repository (apt-get install -y bittornado). Download and upload speeds were simply read off the curses interface.

For reference, I clocked a straight download of this file directly from S3 as taking an average of 57 seconds, which translates into almost 18 MB/s.

Test results:

First, I launched a single instance and started downloading from S3 via bittorrent. S3 only gave me 70-75KB/s, far less than the roughly 18 MB/s I got from a direct S3 download.

While the first was still downloading, I launched a second instance. The second instance quickly caught up to the first, then the download rate on each instance settled at 140-150KB/s, with upload rates at half that. Clearly, S3 was giving each instance 70-75KB/s of bandwidth, and the peers were cooperating by sharing their downloaded fragments.

To verify this behavior, I then launched two more instances and hooked them into the swarm. Again, the new peers quickly caught up to the existing instances, and download rates settled down to 280-300KB/s on each of the four instances.
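The numbers so far fit a very simple model: if S3 hands each peer its own 70-75KB/s and the peers immediately share what they receive, every instance should see roughly the per-peer S3 rate multiplied by the number of peers. The sketch below just spells out that arithmetic in Python. It is my reading of the observations, not a description of how S3 actually throttles, and the function name is invented for illustration.

```python
def per_instance_rate_kb_s(peers, s3_rate_per_peer=(70, 75)):
    """Estimate each instance's download rate (KB/s) while S3 is the only seed.

    Assumes S3 gives every peer the same throttled rate and that peers
    exchange fragments fast enough that S3 is the only bottleneck.
    """
    low, high = s3_rate_per_peer
    # Each instance gets its own S3 share plus the shares relayed
    # by the other (peers - 1) instances.
    return peers * low, peers * high


for n in (1, 2, 4):
    low, high = per_instance_rate_kb_s(n)
    print(f"{n} instance(s): {low}-{high} KB/s each")
```

For 1, 2, and 4 instances this gives 70-75, 140-150, and 280-300 KB/s, which matches the rates I read off the clients.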

So, there's clearly some serious throttling going on when downloading from S3 via bittorrent. However, the point of this experiment is not the S3 -> EC2 download speed but the EC2 <-> EC2 file sharing speed.

Once all four of these instances were seeding, I added a fifth instance to the swarm. Download rates on this instance maxed out at around 12-13 MB/s. Once this instance was seeding, I added a sixth instance to the swarm to see if bandwidth would continue to scale up, but I didn't see an appreciable difference.

So, it looks like using bittorrent within EC2 is actually only about two-thirds as fast as downloading directly from S3 (12-13 MB/s versus almost 18 MB/s). In particular, even with a better-tuned environment (e.g., moving to larger instances to eliminate sharing physical bandwidth with other instances), it doesn't look like we would get any significant decrease in download times by using bittorrent.
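To make the comparison concrete, here is the arithmetic as a small Python sketch. It assumes 1GB means 1024 MB, and it uses only the figures quoted in this post (57 seconds for the direct download, 12-13 MB/s between peers). The helper names are mine.

```python
FILE_MB = 1024          # the 1GB test file, assuming 1GB = 1024 MB
S3_DIRECT_SECONDS = 57  # average time for a direct S3 download


def throughput_mb_s(size_mb, seconds):
    """Average transfer rate in MB/s."""
    return size_mb / seconds


def transfer_seconds(size_mb, rate_mb_s):
    """Time needed to move size_mb at a steady rate."""
    return size_mb / rate_mb_s


s3_rate = throughput_mb_s(FILE_MB, S3_DIRECT_SECONDS)
print(f"direct S3: {s3_rate:.1f} MB/s")

# Peer-to-peer rates observed on the fifth instance.
for peer_rate in (12, 13):
    ratio = peer_rate / s3_rate
    seconds = transfer_seconds(FILE_MB, peer_rate)
    print(f"{peer_rate} MB/s: {ratio:.2f}x the S3 rate, about {seconds:.0f} s")
```

At 12-13 MB/s the same file takes somewhere around 80 seconds or more, against 57 seconds straight from S3.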

Friday, October 1, 2010

Killing java processes

I often want to kill java processes, be it an unresponsive Eclipse, a blown-out jEdit after I try to open a 2GB file, a stalled JUnit test suite, a borked scalac compiler daemon or a random Tomcat instance.

It gets tiring to write,
$ jps -lv
48231 /opt/eclipse-3.5.1/org.eclipse.equinox.launcher_1.0.201.jar -Xmx1024m
10258 /opt/boisvert/jedit-4.3.2/jedit.jar -Xmx192M
5295 sun.tools.jps.Jps -Dapplication.home=/opt/boisvert/jdk1.6.0_21 -Xms8m
followed by,
$ kill 48231
You know, with the cut & paste in-between ... so I have this Ruby shell script called killjava, a close cousin of killall:
$ killjava -h
killjava [-9] [-n] [java_main_class]

-9, --KILL Send KILL signal instead of TERM
-n, --no-prompt Do not prompt user, kill all matching processes
-h, --help Show this message
that does the job. It's not like I use it every day, but every time I use it, I'm glad it's there.

Download the script from GitHub (requires Ruby and a UNIX-based OS).
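If you just want the gist of the idea without grabbing the script, here is a rough sketch in Python. To be clear, this is not the Ruby script itself, only an illustration of the same approach: list JVMs with jps, filter by main class, and send a signal. All function names are made up, and it assumes jps is on your PATH and that main class or jar paths contain no spaces.

```python
import os
import signal
import subprocess


def parse_jps(output):
    """Parse `jps -l` or `jps -lv` output into (pid, main_class) pairs."""
    processes = []
    for line in output.splitlines():
        parts = line.split(None, 2)  # pid, main class or jar, JVM args
        if parts and parts[0].isdigit():
            name = parts[1] if len(parts) > 1 else ""
            processes.append((int(parts[0]), name))
    return processes


def matching(processes, pattern=""):
    """Keep processes whose main class contains pattern, skipping jps itself."""
    return [(pid, name) for pid, name in processes
            if pattern.lower() in name.lower() and not name.endswith("Jps")]


def kill_java(pattern="", force=False, prompt=True):
    """Send TERM (or KILL when force=True) to every matching java process."""
    out = subprocess.run(["jps", "-l"], capture_output=True,
                         text=True, check=True).stdout
    sig = signal.SIGKILL if force else signal.SIGTERM
    for pid, name in matching(parse_jps(out), pattern):
        if not prompt or input(f"Kill {pid} {name}? [y/N] ").lower() == "y":
            os.kill(pid, sig)
```

Calling kill_java("jedit") would prompt before sending TERM to the jEdit process from the listing above, and kill_java("jedit", force=True, prompt=False) would roughly correspond to killjava -9 -n jedit.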

Modern IDEs influencing coding style?

It would be nice if globals, locals, and members could be syntax colored differently. That would be better than g_ and m_ prefixes.

- John Carmack

I saw this from John Carmack last week and thought, what a great idea! It seems very natural and easy to do and makes a lot more sense than crazy prefix conventions. I've been mostly programming in Java, so the conventions are a little different, but I'd love it if we could get rid of using redundant "this" qualifiers to signal member variables, and the super ugly ALL_CAPS for constants... it just seems so outdated.

Eclipse actually provides this kind of highlighting already:

Notice that the member variable "greeting" is always in blue, while the non-member variables are never highlighted. Also, the public static constant "DEFAULT_GREETING" is blue and italicized.

Notice that if you rename DEFAULT_GREETING, it's still completely recognizable as a constant:

I think it's interesting that modern IDEs are able to give us so much more information about the structure of our programs. Stuff you used to have to explicitly call out via conventions like these. How long until we're ready to make the leap and change our code conventions to keep up with our tools?

The main argument against relying on tools to provide this kind of information is that not all tools have caught up. I'm not sure I completely buy this. Hopefully you're not actually remotely editing production code in vi or something. Admittedly, there are a lot of web apps for viewing commits and performing code reviews, and they're unlikely to be as fully featured as your favorite IDE. Still, the context there is often limited enough to avoid confusion, and the majority of our time is spent in our IDEs anyway.

So, can we drop the ALL_CAPS already?