bizo developer blog: August 2010

While I was at one of the Google I/O GWT sessions (courtesy of Bizo), a Google presenter mentioned how one of their internal GWT applications tracks user actions.

The idea is really just a souped-up, AJAX version of server-side access logs: capturing, buffering, and sending fine-grained user actions up to the server for later analysis.

The Google team was using this data to make A/B-testing-style decisions about features–which ones were being used, not being used, tripping users up, etc.

I thought the idea was pretty nifty, so I flushed out an initial implementation in BizAds for Bizo’s recent hack day. And now I am documenting my wild success for Bizo’s first post-hack-day “beer & blogs” day.

No Access Logs

Traditional, page-based web sites typically use access logs for site analytics. For example, the user was on a.html, then b.html. Services like Google Analytics can then slice and dice your logs to tell you interesting things.

However, desktop-style one-page webapps don’t generate these access logs–the user is always on the first page–so they must rely on something else.

This is pretty normal for AJAX apps, and Google Analytics already supports it via its asynchronous API.

We had already been doing this from GWT with code like:

public native void trackInGA(final String pageName) /*-{
  $wnd._gaq.push(['_trackPageview', pageName]);
}-*/;

And since we’re using a MVP/places-style architecture (see gwt-mpv), we just call this on each place change. Done.

Google Analytics is back in action, not a big deal.

Beyond Access Logs

What was novel, to me, about this internal Google application’s approach was how the tracked user actions were much more fine-grained than just “page” level.

For example, which buttons the user hovers over. Which ones they click (even if it doesn’t lead to a page load). What client-side validation messages are tripping them up. Any number of small “intra-page” things that are nonetheless useful to know.

Obviously there are a few challenges, mostly around not wanting to detract from the user experience:

How much data is too much?
Tracking the mouse over of every element would be excessive. But the mouse over of key elements? Should be okay.

How often to send the data?
If you wait too long while buffering user actions before uploading them to the server, the user may leave the page and you’ll lose them. (Unless you use a page unload hook, and the browser hasn’t crashed.)

If you send data too often, the user might get annoyed.

The key to doing this right is having metrics in place to know whether you’re prohibitively affecting the user experience.

The internal Google team had these metrics for their application, and that allowed them to start out batch uploading actions every 30 seconds, then every 20 seconds, and finally every 3 seconds. Each time they could tell the users’ experience was not adversely affected.

Unfortunately, I don’t know what exactly this metric was (I should have asked), but I imagine it’s fairly application-specific–think of GMail and average emails read/minute or something like that.

Implementation

I was able to implement this concept rather easily, mostly by reusing existing infrastructure our GWT application already had.

When interesting actions occur, I have the presenters fire a generic UserActionEvent, which is generated using gwt-mpv-apt from this spec:

@GenEvent
public class UserActionEventSpec {
  @Param(1)
  String name;
  @Param(2)
  String value;
  @Param(3)
  boolean flushNow;
}

Initiating the tracking an action is now just as simple as firing an event:

UserActionEvent.fire(
   eventBus,
   "someAction",
   "someValue",
   false);

I have a separate, decoupled UserActionUploader, which is listening for these events and buffers them into a client-side list of UserAction DTOs:

private class OnUserAction implements UserActionHandler {
  public void onUserAction(final UserActionEvent event) {
    UserAction action = new UserAction();
    action.user = defaultString(getEmailAddress(), "unknown");
    action.name = event.getName();
    action.value = event.getValue();
    actions.add(action);
    if (event.getFlushNow()) {
      flush();
    }
  }
}

UserActionUploader sets a timer that every 3 seconds calls flush:

private void flush() {
  if (actions.size() == 0) {
    return;
  }
  ArrayList<UserAction> copy =
    new ArrayList<UserAction>(actions);
  actions.clear();
  async.execute(
    new SaveUserActionAction(copy),
    new OnSaveUserActionResult());
}

The flush method uses gwt-dispatch-style action/result classes, also generated by gwt-mpv-apt, to the server via GWT-RPC:

@GenDispatch
public class SaveUserActionSpec {
  @In(1)
  ArrayList<UserAction> actions;
}

This results in SaveUserActionAction (okay, bad name) and SaveUserActionResult DTOs getting generated, with nice constructors, getters, setters, etc.

On the server-side, I was able to reuse an excellent DatalogManager class from one of my Bizo colleagues (unfortunately not open source (yet?)) that buffers the actions data on the server’s hard disk and then periodically uploads the files to Amazon’s S3.

Once the data is in S3, it’s pretty routine to setup a Hive job to read it, do any fancy reporting (grouping/etc.), and drop it into a CSV file. For now I’m just listing raw actions:

-- Pick up the DatalogManager files in S3
drop table dlm_actions;
create external table dlm_actions (
    d map<string, string>
)
partitioned by (dt string comment 'yyyyddmmhh')
row format delimited
fields terminated by '\n' collection items terminated by '\001' map keys terminated by '\002'
location 's3://<actions-dlm-bucket>/<folder>/'
;
 
alter table dlm_actions recover partitions;
 
-- Make a csv destination also in S3
create external table csv_actions (
    user string,
    action string,
    value string
)
row format delimited fields terminated by ','
location 's3://<actions-report-bucket/${START}-${END}/parts'
;
 
-- Move the data over (nothing intelligent yet)
insert overwrite table csv_actions
select dlm.d["USER"], dlm.d["ACTION"], dlm.d["VALUE"]
from dlm_actions dlm
where
    dlm.dt >= '${START}00' and dlm.dt < '${END}00'
;

Then we use Hudson as a cron-with-a-GUI to run this Hive script as an Amazon Elastic Map Reduce job once per day.

Testing

Thanks to the awesomeness of gwt-mpv, the usual GWT widgets, GWT-RPC, etc., can be doubled-out and testing with pure-Java unit tests.

For example, a method from UserActionUploaderTest:

UserActionUploader uploader = new UserActionUploader(registry);
StubTimer timer = (StubTimer) uploader.getTimer();
 
@Test
public void uploadIsBuffered() {
  eventBus.fireEvent(new UserActionEvent("someaction", "value1", false));
  eventBus.fireEvent(new UserActionEvent("someaction", "value2", false));
  assertThat(async.getOutstanding().size(), is(0)); // buffered
 
  timer.run();
  final SaveUserActionAction a1 = async.getAction(SaveUserActionAction.class);
  assertThat(a1.getActions().size(), is(2));
  assertAction(a1, 0, "anonymous", "someaction", "value1");
  assertAction(a1, 1, "anonymous", "someaction", "value2");
}

The usual GWT timers are stubbed out by a StubTimer, which we can manually tick via timer.run() to deterministically test timer-delayed business logic.

That’s It

I can’t say we have made any feature-altering decisions for BizAds based on the data gathered from this approach yet–technically its not live yet. But it’s so amazing that surely we will. Ask me about it sometime in the future.

For this last hackday, I decided to work on something more hardware hacking related. At this year's Maker Fair, I was really inspired by all the cool stuff people were building, so I picked up an arduino and started playing around with a couple of things.

I've always wanted to have some cool old-school analog VU type meters displaying web requests.

Here's my completed hackday project:

Here's a view of the components from the back:

It's battery operated and receives data wirelessly over RF from another arduino I have hooked up via serial to my laptop.

It's pretty simple, but I'm still totally psyched about how it came out.

The main components are some analog panel meters (kinda pricey, but awesome), and an RF receiver. The frame is a piece of scrap acrylic from TAP Plastics that I drilled and cut to size, and the stand is a piece of a wire clothes hanger bent to shape.

Connected to my computer is a another arduino (actually a volksduino) that receives updates over USB and sends the data out over RF:

You may be asking, why bother with wireless if you need a computer hooked up through serial anyway. Or you may ask why not just connect to a wireless network directly.

Well, I wanted the meters to be able to be moved around, or mounted on a wall... I wanted them wireless. But, it turns out that wireless and even ethernet solutions for connecting an arduino to the internet directly are comparatively pretty expensive. Even using bluetooth is expensive. My long term plan is to have a single arduino connected to the internet directly (via ethernet or wireless), and have it serve as a proxy over RF for the others... So this is a bit of work towards that.

I wrote a bit of Java code to connect to amazon's cloudwatch to pull the load balancer statistics for two of our services. I then discovered it's near impossible to connect to anything over USB in Java... It is ridiculous. Luckily, it's REALLY easy to do this with Processing, so I wrote a simple processing program that used my cloudwatch library and wrote it out to serial.

And that's really it. The arduino reads data over serial, and periodically sends it over RF. The arduino hooked up to the meters simply reads the values over RF and sets the meters to display a scaled version of the results. They're showing requests per second. We get a huge amount of requests per second with these services, so the numbers on the dial aren't actually correct (I need to make some custom faceplates). It also flashes an LED every time it gets a RF transmission.

Here's a quick video of it in action:

The one thing I'm not crazy about is that the maximum resolution you can get from cloudwatch is stats per minute, so the meters don't actually change as often as I would like.

Still, pretty cool. I'm looking forward to building some more displays like this in the future.

Links

Friday, August 13, 2010

Collecting User Actions with GWT

No Access Logs

Beyond Access Logs

Implementation

Testing

That’s It

hackday: analog meters

Blog Archive

Links

Friday, August 13, 2010

Collecting User Actions with GWT

No Access Logs

Beyond Access Logs

Implementation

Testing

That’s It

hackday: analog meters

Blog Archive

Subscribe To