Thursday, January 27, 2011

Adventures In GWT-land Part #1: Awkward Baby Steps

Over the past several months we’ve been working on a (super) secret shiny new GWT application (it’s basically going to rock your socks off). This was my first GWT application, and coming from a non-Java, non-GWT background where I wrote raw Javascript pretty often, it’s been interesting to say the least. What follows is the first in a multi-part series where I’d like to reflect on life in GWT-land and hopefully provide a few cool tips and code samples along the way.

First Steps:

Stepping into GWT for the first time when you’re used to pure Javascript, or “normal” CSS/HTML front-end development in general, is pretty awkward. The biggest adjustment is that all client-side logic is defined in Java classes which later get compiled into several permutations of obfuscated Javascript. GWT also tries to be helpful and obfuscates all of your CSS classes in an attempt to prevent namespace collisions (more on this in a future post). If you’re thinking that Firebug becomes much less useful when working with GWT, you’re absolutely right...but thanks to some really good debugging tools in GWT, this isn’t a terrible loss and you really won’t need it.

What’s with this Java -> Javascript stuff?

Why would somebody want to write a compiler for Java -> Javascript? One of the primary arguments for doing this is type-safety...which is fair enough I suppose - compile time checking is nice to have. You also get the benefit of native Java debugging tools: despite the huge advances in client-side debugging in the past few years, Java’s debugging is still superior (mostly because of static typing). It’s really nice to be able to step through line by line, set breakpoints and inspect typed objects. However, despite these benefits, writing Java code that compiles into Javascript still feels weird.

The biggest reason for this awkwardness is that normally programmers write code in a language that is more expressive than their compile target, e.g. C/C++ compiles to assembly, Java to the JVM’s bytecode, CoffeeScript to Javascript etc. But with GWT your compile target is actually more expressive than the code you write, and it’s not just a little bit more expressive, it’s a lot more expressive - which is a strange feeling indeed. Javascript has lambdas, prototypal inheritance, a less verbose syntax and a dynamic type system - all of which lead to a more expressive language, allowing you to do more with less code. Java, on the other hand, has none of these - expressing things that are normally trivial in Javascript (like a custom event system, currying etc.) can become a major chore (sometimes 4-5 classes or more) in GWT. Actually, the whole process is rather akin to writing C/C++ code that compiles into Ruby or Python (ok, perhaps a slight exaggeration...but really only by a bit). But fear not, below are some tips to help make life easier for you in GWT-land.

Adjusting to life in GWT-land

  1. First, use the gwt-mpv framework - it will save your sanity and quite possibly your soul. One of our awesome developers, Stephen, has created a very nice model-view-presenter framework on top of GWT that removes a huge amount of boilerplate. It has some really nice stuff like validation, two-way data binding, code generation for tedious boilerplate and more. Your fingers and brain will thank you for saving them from the verbosity of vanilla GWT, trust me...I’m an engineer.

  2. Abandon the notion of separation of content (HTML), presentation (CSS), and behavior (Javascript) found in traditional front-end development. GWT, being a framework, is opinionated and uses widgets instead. Widgets are generally responsible for all three of these things at once - they know their own CSS classes, keep their own data model and have event handlers. The idea behind this is that you can just drop a widget onto any page and have it “just work” with no external dependencies. The disadvantage is that the approach is inflexible if you want to change any one of those three things while holding the others constant. In practice widgets work quite well until you need to change or extend one that lives in somebody else’s jar - then they can quickly turn into a pain. So choose your imported widgets carefully or be prepared to fork things (or mash ctrl-c ctrl-v a lot).

  3. Even though you’re writing in Java, you’re still working on the client side. Custom events are great for decoupling your objects and making them easy to test, as you only have to talk to a single event handler instead of multiple objects. Don’t go overboard though: you still want to avoid having events that call events that call events etc., as you’ll end up jumping all over the place trying to find out what really was supposed to happen when the original event fired.
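
To make the custom event point a bit more concrete, here’s a rough sketch in plain Java of the kind of plumbing a single custom event tends to need. This is not GWT’s or gwt-mpv’s actual event API: the names ItemSavedEvent, ItemSavedHandler and SimpleEventBus are made up for illustration, and a real GWT app would more likely build on the framework’s own handler classes.

```java
import java.util.ArrayList;
import java.util.List;

public class CustomEventSketch {

    // Hypothetical event type: carries the data that listeners care about.
    static final class ItemSavedEvent {
        final String itemName;

        ItemSavedEvent(String itemName) {
            this.itemName = itemName;
        }
    }

    // One handler interface per event type (this is where the class count grows).
    interface ItemSavedHandler {
        void onItemSaved(ItemSavedEvent event);
    }

    // A tiny bus: publishers and subscribers only know about the bus, not each other.
    static final class SimpleEventBus {
        private final List<ItemSavedHandler> handlers = new ArrayList<ItemSavedHandler>();

        void addHandler(ItemSavedHandler handler) {
            handlers.add(handler);
        }

        void fire(ItemSavedEvent event) {
            // Iterate over a copy so a handler can register another handler while firing.
            for (ItemSavedHandler handler : new ArrayList<ItemSavedHandler>(handlers)) {
                handler.onItemSaved(event);
            }
        }
    }

    // Registers one handler, fires one event, and returns what the handler recorded.
    static List<String> runDemo() {
        SimpleEventBus bus = new SimpleEventBus();
        final List<String> log = new ArrayList<String>();
        bus.addHandler(new ItemSavedHandler() {
            @Override
            public void onItemSaved(ItemSavedEvent event) {
                log.add("saved: " + event.itemName);
            }
        });
        bus.fire(new ItemSavedEvent("report"));
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

In Javascript the same thing is roughly an array of functions and a loop, which is the expressiveness gap described earlier. The upside is on the testing side: a test only needs to register one handler on the bus and fire an event at it.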

Wednesday, January 26, 2011

EMR/Hive: recovering a large number of partitions

If you try to run "alter table ... recover partitions" on a table with a large number of partitions, you may run into this error:

FAILED: Error in metadata: org.jets3t.service.S3ServiceException: Failed to sanitize XML document destined for handler class$ListBucketHandler null 'null' -- ResponseCode: -1, ResponseStatus: null, RequestId: null, HostId: null
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

There's some discussion in the AWS forums. The underlying cause is that the process runs out of memory while building the partition list.

A workaround is to increase HADOOP_HEAPSIZE. This can be done with an EMR bootstrap action. On an m1.large instance, 2G seems to do the trick for us.

Upload a script like the following somewhere in s3:

You can now run this bootstrap action as part of your job:

elastic-mapreduce --create --alive \
--name "large partitions..." --hive-interactive \
--num-instances 1 --instance-type m1.large \
--hadoop-version 0.20 \
--bootstrap-action s3://<bucket/path>/

You should now be able to load your partitions.