Monday, March 12, 2012

Fault Tolerant MongoDB on EC2

While working on a project at Bizo I needed to connect a Rails app to a MongoDB backend both of which run in Amazon's Cloud (EC2). At Bizo we have a policy to not use non Amazon services when possible (to limit risk) - so we normally run most of our services straight off of EC2. I'd like to share what I've learned as best practices throughout the experience as I hope it might save some time and frustration for others.


Replica sets are the preferred way to run a distributed, fault tolerant MongoDB service. But as with any distributed system, nodes will eventually fail. Now replica sets are pretty good at handling failures, but they can't save you if too many nodes fail.
Specifically a replica set requires a minimum of two nodes to function at all times (1 primary and 1 secondary node). Thus a good rule of thumb is to run **at least 3 nodes** in a replica set, that way if a node fails your database service doesn't go down with it. The Rails app I was working with doesn't experience enormous amounts of traffic so 3 m1.large (64bit) nodes were sufficient for my needs. What follows is a rundown of our setup and how it handles common needs of fault tolerant systems.

Best Practices

Minimize Failure with AutoScaling, Availability Zones, CloudWatch and EBS Volumes

  • Use Autoscaling Groups, CloudWatch and EBS Volumes to replace failed nodes as soon as they go down. Since we run three nodes, our replica set is insulated from failure due to a single node crashing. But if two nodes crash the replica set goes with them. To solve this we use Cloudwatch alarms to trigger the Autoscaling Group whenever a node goes down - that way a new replacement node is automatically brought online within a few minutes of a failure to reduce the risk of nodes sequentially failing. Additionally each node stores it's data on an EBS Volume (network attachable hard drive) - that way when a node fails, it's replacement doesn't startup with missing data - it simply mounts the previous node's EBS.
  • To protect against multiple nodes failing simultaneously run each node in a separate availability zone. The above isn't sufficient to protect against things like hardware failures as all 3 instances could wind up on the same hardware. Running each node in a separate availability zone guarantees that our mongo instances run with a reasonable amount of separation (eg. they don't all end up on the same hardware box). Ideally you'd run each node in its own region (separate data center), but this causes headaches trying to configure firewalls as Amazon does not allow security groups to be used across multiple regions (see security below). So unless you want to setup a VPN for cross region communication - you're probably better off just running in separate availability zones.
    Assuming you've created the group and are running three nodes, each in a separate availability zone you can configure the auto scaling group using Amazon's command line tools like so:
    as-update-auto-scaling-group my-mongo-service \
    --region us-east-1 \
    --availability-zones us-east-1a us-east-1b us-east-1c \
    --max-size 3 \
    --min-size 3 \
    --desired-capacity 3
    Now if any node fails then a new one will startup to take its place in the proper availability zone.

If Everything Fails, have backups

It's always good to have backups just in case something really bad happens. Fortunately since we use EBS Volumes this is really easy - we create nightly snapshots of the primary node's EBS Volume on our cron server using Amazon's command line tools (ec2-create-snapshot). These snapshots are persisted to S3 and we can easily restore our replica set from these backups.

Use Elastic IPs

As nodes fail and are replaced you want both your Replica set and your database clients to be able to find and connect to the new nodes. The easiest way to do this in Amazon is to use Elastic IPs - special* static ip addresses that can be assigned to individual instances. Since each instance runs in a separate availability zone we just need one Elastic IP per zone. When a new node starts up to replace a failed instance, it checks which zone it was started in and assigns itself the matching Elastic IP. Both the client and replica set configuration should point at the Elastic IP address - that way failures and startups of new nodes will be seamless to your app. This is because cross-security group openings in the firewall need to use internal (not external) addresses and the DNS url in the console will resolve to an internal ip address from an EC2 instance.


This is where the headaches can start. Ideally you want to restrict access to your MongoDB instances to just your client application using Amazon's security groups. The way we normally set this up is to give your mongo instances a security group, say mongo-db-prod and your client app a security group, say cool-app-prod. Then mongo-db-prod would grant access on port 27017 (default mongodb port) to security group: cool-app-prod. Unfortunately what's not documented very well is that if you use the **external Elastic IP** addresses in your configuration it **will not work** with security groups! Instead you have to use the Elastic IPS DNS url (found in Amazon's web console) for security groups to work properly.

A Final Caveat

One thing to be careful of is if you require more than 5 nodes in a replica set you'll run into a problem using Elastic IPs - Amazon by default only allows 5 eips per region. You'll need to either ask Amazon to increase this limit on your account or seek out an alternative setup.
Well, that's it but If you have another setup for running MongoDB on EC2 I'd love to here it. Until next time.

No comments: