Friday, May 1, 2009

google app engine (java) and s3

After struggling for way too long, I finally (sort of) got app engine talking to s3.

Background


I've used the python app engine before for a few small personal projects, and it rocks. Not having to worry at all about physical machines or deployment is a huge win. And, on top of that, they provide some really nice services right out of the box (authentication, memcache, a data store, etc.). So, when they announced the availability of app engine for java, we were all really excited.

Of course, there are some limitations: no threads, no sockets (this really sucks), and not all of the JRE classes... BUT, it's probably enough for most applications.

And, it just so happens I'm working on a project where all this sounds okay.

Local Environment


They provide a really nice local development environment. Of course, there's not 100% correlation between things working locally and things working remotely. It's to be expected, but can be a pain.

Some things to watch out for:

Connecting to S3


We normally use jets3t for all of our s3 access. It's a great library. However, it's not a great app engine choice because it uses threads and sockets. Modifying it to work on app engine seemed like a big task, so I thought using a simpler library as a base would be better.

The author of jets3t published some s3 sample code as part of his AWS book. After making some small changes to get over the XPath problem, I still couldn't get it to work. The code worked fine locally, and it would work when I first deployed to app engine, but after that it would fail with IOException: Unknown... Everything looked pretty straightforward. I even tried setting up a fake s3 server to see if there was some weird issue with the headers... nothing.

So, I decided to try out some even simpler approaches. After all, it's just a simple REST service, right? That led me to two different paths that (sort of) worked.

J2ME and J2SE Toolkit for Amazon S3 -- this is a small SOAP library designed for use in a J2ME environment. It works in app engine! At least for GETs (all I tested). It is very basic and only supports objects up to 1MB.

S3Shell in Java -- This is a small REST library designed for use as a shell. There's a small bug (mentioned in the comments), and you need to remove some references to com.sun classes (for Base64 encoding), but otherwise it seems to work pretty well! You will have problems using PUTs locally, but it works fine in production.
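Both of those boil down to signing plain HTTP requests, which is why a hand-rolled REST client is tractable: S3's request signing (the scheme current in 2009, now called "v2") is just an HMAC-SHA1 over a newline-joined canonical string. A minimal sketch, with illustrative names not taken from either library, and java.util.Base64 standing in for the encoders those libraries bundle:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.util.Base64;

// Sketch of S3's 2009-era ("v2") request signing: an HMAC-SHA1 over a
// newline-joined canonical string, base64-encoded. Names are illustrative.
public class S3SignSketch {
  static String sign(String secret, String verb, String contentMd5,
                     String contentType, String date, String resource) throws Exception {
    String stringToSign = verb + "\n" + contentMd5 + "\n"
        + contentType + "\n" + date + "\n" + resource;
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(new SecretKeySpec(secret.getBytes("UTF-8"), "HmacSHA1"));
    // HMAC-SHA1 output is 20 bytes, so the base64 form is always 28 chars
    return Base64.getEncoder().encodeToString(mac.doFinal(stringToSign.getBytes("UTF-8")));
  }
}
```

The Authorization header is then "AWS " + accessKeyId + ":" + signature.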

Putting it all together


I decided to go with the S3Shell code. I had to make a few changes (as mentioned above), but so far so good. I put up a modified copy of the code on github, or you can download the jar directly. This code should work fine as-is on google app engine. As mentioned, there's an issue with local PUTs (Please vote for it!).

The functionality is pretty basic. If you add any other app-engine-supported features (I'd definitely like to get some metadata support), let me know.

8 comments:

God is Great said...

hi, i have followed the above procedure, but I get this error in the s3sh.java file:

- java.io.FileOutputStream is not supported by Google App Engine's Java runtime environment

larry said...

You don't really need to use the s3sh.java -- I left it in there, but in your application you should just use S3Store directly, e.g.:

S3Store s3 = new S3Store(AWS_HOST, AWS_KEY, AWS_SECRET);
List<String> buckets = s3.listBuckets();

And you won't have any problems.

Oliver said...

Hi there, I'm trying to use the jar you suggested, but I'm running into trouble with GAE because it doesn't support the use of classes from the sun.misc package.

This package is used in S3Store, line 565:

contentMD5 = new BASE64Encoder().encode(md.digest());

Any thoughts?

larry ogrodnek said...

Oliver,
Are you using the 1.0.5 jar on github? (http://github.com/downloads/ogrodnek/s3-simple/s3-shell-1.0.5.jar)?

It includes its own Base64 class, and doesn't reference the sun.misc.* stuff...
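If you're stuck on an older copy, that line can be swapped for any standard Base64 encoder. A sketch of computing the Content-MD5 value without sun.misc (java.util.Base64 is a modern, Java 8+ stand-in for the bundled encoder; the class name here is illustrative):

```java
import java.security.MessageDigest;
import java.util.Base64;

// Sketch: compute S3's Content-MD5 header value without sun.misc classes.
// java.util.Base64 stands in for the encoder bundled with the 1.0.5 jar.
public class ContentMd5Sketch {
  static String contentMd5(byte[] body) throws Exception {
    MessageDigest md = MessageDigest.getInstance("MD5");
    // MD5 digest is 16 bytes -> 24 base64 chars, padded with "=="
    return Base64.getEncoder().encodeToString(md.digest(body));
  }
}
```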

Buddhika.jm said...

I also tried this. It works nicely with small files of a few kilobytes, but I wanted to store files of about 10MB. When I tried it with large files, the program did not send the file to the Amazon S3 bucket as I expected. Do you have any idea where the problem is? Please help.

larry ogrodnek said...

Buddhika.jm, you're probably running into a problem with appengine request deadlines. By default all HTTP requests have a deadline of 5 seconds. This can be extended to a maximum of 10 seconds. See: http://code.google.com/appengine/docs/java/urlfetch/overview.html

Amazon has support for uploading files in chunks (see: http://docs.amazonwebservices.com/AmazonS3/latest/API/index.html?mpUploadInitiate.html)

This may be able to help you... maybe you can schedule some chunks to upload using appengine's task API?

s3-simple does not currently have support for multipart upload. If you add it, please let me know and I will incorporate it.

Buddhika.jm said...

Hi Larry,
I have used a servlet to store the file as below.

import java.io.IOException;
import java.io.InputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.commons.fileupload.FileItemIterator;
import org.apache.commons.fileupload.FileItemStream;
import org.apache.commons.fileupload.FileUploadException;
import org.apache.commons.fileupload.servlet.ServletFileUpload;
import org.apache.commons.io.IOUtils;

public class UploaderServlet extends HttpServlet {
  public void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    S3Store s3 = new S3Store("s3.amazonaws.com", "public Key", "Private key");
    s3.setBucket("eyetask");
    ServletFileUpload upload = new ServletFileUpload();
    try {
      // read the first uploaded file from the multipart request
      FileItemIterator iterator = upload.getItemIterator(req);
      FileItemStream item = iterator.next();
      InputStream stream = item.openStream();
      String filename = item.getName();
      byte[] fileBytes = IOUtils.toByteArray(stream);
      s3.storeItem(filename, fileBytes, "public-read");
    } catch (FileUploadException e) {
      // don't swallow the error and fall through to a NullPointerException
      throw new IOException(e.getMessage());
    }
  }
}

The error I am getting when uploading is “Uncaught exception from servlet com.google.apphosting.api.ApiProxy$RequestTooLargeException: The request to API call urlfetch.Fetch() was too large”. This error occurs only if the size of the file I am going to upload exceeds 1MB. Do you have a solution for this?

larry ogrodnek said...

Thanks for the clarification. It sounds like a similar issue. There's nothing provided in the library to help with this, but I think you might be able to use s3's multipart upload support to break the file into smaller chunks on the client side...
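The client-side splitting itself is the easy part; a sketch of slicing a payload into parts that stay under urlfetch's ~1MB request limit (uploading each part, e.g. via S3's multipart API, still has to be wired up separately, since s3-simple doesn't provide it):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a payload into parts small enough for urlfetch's ~1MB
// request limit. Sending each part to S3 is left to the caller.
public class ChunkSketch {
  static final int PART_SIZE = 1024 * 1024;  // stay at/under 1MB per request

  static List<byte[]> split(byte[] data) {
    List<byte[]> parts = new ArrayList<byte[]>();
    for (int off = 0; off < data.length; off += PART_SIZE) {
      int len = Math.min(PART_SIZE, data.length - off);
      byte[] part = new byte[len];
      System.arraycopy(data, off, part, 0, len);
      parts.add(part);
    }
    return parts;
  }
}
```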