Monday, October 12, 2009

s3fsr 1.4 released

s3fsr is a tool we built at Bizo to help quickly get files into/out of S3. It's had a few 1.x releases, but with 1.4 we figured it was finally worth a post.

Overview

While there are a lot of great S3 tools out there, s3fsr's niche is that it's a FUSE/Ruby user-land file system.

For a command line user, this is handy, because it means you can do:
# mount yourbucket in ~/s3
s3fsr yourbucketname ~/s3

# see the directories/files
ls ~/s3/

# upload
mv ~/local.txt ~/s3/remotecopy.txt

# download
cp ~/s3/remote.txt ~/localcopy.txt
Behind the scenes, s3fsr is talking to the Amazon S3 REST API and getting/putting directory and file content. It will cache directory listings (not file content), so ls/tab completion will be quick after the initial short delay.
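
To give a feel for the caching, here's a minimal sketch of a time-based directory-listing cache--this is illustrative only, not s3fsr's actual code, and the class name and TTL are assumptions:

# Sketch of a time-based directory listing cache (illustrative, not s3fsr's code).
class DirCache
  def initialize(ttl = 60) # seconds to keep a listing around (assumed TTL)
    @ttl = ttl
    @entries = {} # path => [listing, fetched_at]
  end

  # Return the cached listing for path, or fetch and cache it.
  def listing(path)
    listing, fetched_at = @entries[path]
    return listing if listing && (Time.now - fetched_at) < @ttl
    listing = yield # the block stands in for the real S3 listing call
    @entries[path] = [listing, Time.now]
    listing
  end
end

cache = DirCache.new
files = cache.listing('/dir1') { ['foo.txt', 'bar.txt'] } # block fakes the S3 fetch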

S3 And Directory Conventions

A unique aspect of s3fsr, and a specific annoyance it was written to address, is that it understands several different directory conventions used by various S3 tools.

This directory convention problem stems from Amazon's decision to forgo any explicit notion of directories in the API, and instead force everyone to realize that S3 is not a file system but a giant hash table of string key -> huge byte array.

Let's take an example--you want to store two files, "/dir1/foo.txt" and "/dir1/bar.txt" in S3. In a traditional file system, you'd have 3 file system entries: "/dir1", "/dir1/foo.txt", and "/dir1/bar.txt". Note that "/dir1" gets its own entry.

In S3, without tool-specific conventions, storing "/dir1/foo.txt" and "/dir1/bar.txt" really means only 2 entries. "/dir1" does not exist of its own accord. The S3 API, when reading and writing, never parses keys apart by "/", it just treats the whole path as one big key to get/set in its hash table.

For Amazon, this "no /dir1" approach makes sense due to the scale of their system. If they let you have a "/dir1" entry, pretty soon API users would want the equivalent of a "rm -fr /dir1", which, for Amazon, means instead of a relatively simple "remove the key from the hash table" operation, they have to start walking a hierarchical structure and deleting child files/directories as they go.

When the keys are strewn across a distributed hash table like Dynamo, this increases the complexity and makes the runtime nondeterministic.

Which Amazon, being a bit OCD about their SLAs and 99th percentiles, doesn't care for.

So, no S3 native directories.

There is one caveat--the S3 API lets you progressively infer the existence of directories by probing the hash table keys with prefixes and delimiters.

In our example, if you probe with "prefix=/" and "delimiter=/", S3 will then, and only then, split & group the "/dir1/foo.txt" and "/dir1/bar.txt" keys on "/" and return you just "/dir1/" as what the S3 API calls a "common prefix".

Which is kind of like a directory. Except that you have to create the children first, and then the directory pops into existence. Delete the children, and the directory pops out of existence.
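
To make that concrete, here's a tiny Ruby simulation of how a prefix/delimiter listing groups the flat keys from our example--a pure in-memory sketch, no real S3 calls:

# Simulate S3's flat keyspace and its prefix/delimiter listing (no real S3 here).
keys = ['/dir1/foo.txt', '/dir1/bar.txt'] # the only 2 entries that actually exist

def list(keys, prefix, delimiter)
  matches = keys.select { |k| k.start_with?(prefix) }
  common_prefixes, contents = [], []
  matches.each do |key|
    rest = key[prefix.length..-1]
    if (i = rest.index(delimiter))
      # everything up to and including the delimiter is grouped as a "common prefix"
      common_prefixes << prefix + rest[0, i + delimiter.length]
    else
      contents << key
    end
  end
  [common_prefixes.uniq, contents]
end

p list(keys, '/', '/')      # => [["/dir1/"], []] -- "dir1" only exists as a common prefix
p list(keys, '/dir1/', '/') # => [[], ["/dir1/foo.txt", "/dir1/bar.txt"]]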

This brings us to the authors of tools like s3sync and S3 Organizer--their users want the familiar "make a new directory, double click it, make a new file in it" idiom, not a backwards "make the children files first" idiom, which is understandably not what they expect.

So, the tool authors got creative and basically added their own "/dir1" marker entries to S3 when users perform a "new directory" operation, to get back to the "directory first" idiom.

Note this is a hack: issuing a "DELETE /dir1" to S3 will not recursively delete the child files, because to S3 "/dir1" is just a meaningless key with no relation to any other key in the hash table. So now the burden is on the tool to do its own recursive iteration/deletion of the directories.
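
For example, a recursive delete has to look something like the sketch below--`s3`, `list_keys`, and `delete_key` are hypothetical names made up for illustration, not a real client API:

# Sketch of the recursive delete a tool has to do itself (S3 won't).
# `s3` is a hypothetical client -- list_keys and delete_key are made-up names.
def rm_rf(s3, dir_key)
  # delete every child entry under the directory prefix
  s3.list_keys(:prefix => dir_key + '/').each do |child_key|
    s3.delete_key(child_key)
  end
  # finally delete the marker entry itself (e.g. "/dir1" or "/dir1_$folder$")
  s3.delete_key(dir_key)
end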

Which is cool, and actually works pretty well, except that the two primary tools implemented marker entries differently:
  • s3sync created marker entries (e.g. a "/dir1" entry) with hard-coded content that etags (hashes) to a specific value. This known hash is nice because it makes it easy to distinguish directory entries from file entries when listing S3 entries--since S3 knows nothing about directories, the tool has to infer on its own which keys represent files and which represent directories.
  • S3 Organizer created marker entries as well, but instead of a known etag/hash, they suffixed the directory name, so the key for "/dir1" is actually "/dir1_$folder$". It's then the job of the tool to recognize the suffix as a marker directory entry, strip it off before showing the name to the user, and show a directory icon instead of a file icon.
So, if you use an S3 tool that does not understand these 3rd party conventions, browsing a well-used bucket will likely end up looking odd, with obscure/duplicate entries:
/dir1 # s3sync marker entry file
/dir1 # common prefix directory
/dir1/foo.txt # actual file entry
/dir2_$folder$ # S3 Organizer marker entry file
/dir2 # common prefix directory
/dir2/foo.txt # actual file entry
This quickly becomes annoying.

And so s3fsr understands all three conventions--s3sync, S3 Organizer, and common prefixes--and just generally tries to do the right thing.
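
Roughly, recognizing the marker conventions boils down to a check like this--a sketch, not s3fsr's actual code, and the s3sync etag constant is a placeholder for the real known hash:

# Sketch of telling marker directories from files across the conventions.
# S3SYNC_DIR_ETAG is a placeholder -- the real value is the md5 etag of
# s3sync's hard-coded marker content.
S3SYNC_DIR_ETAG = '<known s3sync marker etag>'

# entry has a key and an etag; common prefixes come back separately from the listing
def directory_name(entry)
  if entry[:key] =~ /(.*)_\$folder\$$/
    $1                                  # S3 Organizer marker: strip the suffix
  elsif entry[:etag] == S3SYNC_DIR_ETAG
    entry[:key]                         # s3sync marker: known content hash
  else
    nil                                 # a plain file entry
  end
end

p directory_name(:key => '/dir2_$folder$', :etag => 'abc') # => "/dir2"
p directory_name(:key => '/dir1/foo.txt', :etag => 'def')  # => nil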

FUSE Rocks

One final note is that the FUSE project is awesome. Implementing mountable file systems that users can "ls" around in usually involves messy, error-prone kernel integration that is hard to write and, if the file system code misbehaves, can screw up your machine.

FUSE takes a different approach and does the messy kernel code just once, in the FUSE project itself, and then it acts as a proxy out to your user-land, process-isolated, won't-blow-up-the-box process to handle the file system calls.

This proxy/user-land indirection does degrade performance, so you wouldn't use it for your main file system, but for scenarios like s3fsr, it works quite well.

And FUSE language bindings like fusefs for Ruby make it a cinch to develop too--s3fsr is all of 280 LOC.
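
To show how little ceremony is involved, here's a minimal fusefs hello-world along the lines of the gem's own sample--this is independent of s3fsr and assumes the classic fusefs API of set_root/mount_under/run:

require 'fusefs'

# A tiny read-only file system: one directory containing one file.
class HelloDir
  def contents(path)      # what `ls` sees
    ['hello.txt']
  end
  def file?(path)
    path == '/hello.txt'
  end
  def read_file(path)     # what `cat` sees
    "Hello from user land!\n"
  end
  def size(path)
    read_file(path).length
  end
end

FuseFS.set_root(HelloDir.new)
FuseFS.mount_under(ARGV.shift || '/tmp/hello') # e.g. ruby hello.rb ~/hello
FuseFS.run # blocks, serving kernel file system calls from this user-land process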

Wrapping up

Let us know if you find s3fsr useful--hop over to the github site, install the gem, kick the tires, and submit any feedback you might have.

