Thursday, June 11, 2009

Google Visualizations Java Data Source Library

As with any data-oriented company, most of our projects revolve around collecting data, processing data, and exposing data to users. In that third category, we've been moving towards Google Visualizations to draw our pretty graphs and charts. So, while the free Android phone and Google Wave were attracting a lot of attention at Google I/O, from a practical standpoint, I was actually most excited about Google's new Data Source Java Library. We had previously written something similar to this in-house, but we were still working on some of the optional parts of the specification when this library was released.

In a nutshell, Google Visualizations is a JavaScript library that draws charts and graphs. The data is inserted in one of three ways: programmatically in JavaScript, via a JSON object, or by pointing the JavaScript at a Data Source URL. For example, Google spreadsheets have built-in functionality to expose their contents as a Data Source, so you can just point the JavaScript at a special URL, and a graph of your spreadsheet's data will pop up on your webpage. If you use the last method (a Data Source URL), you can also use Gadgets to easily create custom dashboards displaying your data.

The Data Source Java Library makes it very easy to implement a Data Source backed by whatever internal data store you might be using: it's just a matter of creating a DataTable object and populating it with data. The library provides everything else, up to and including the servlet to drop into your web container. (We ended up implementing a Spring controller instead. The library provides helper code for this; I estimate that using a Spring controller instead of a servlet cost us four lines of code.)
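To give a feel for what "creating a table and populating it" amounts to, here is a minimal, self-contained sketch. It does not use the library: SketchTable and its methods are hypothetical stand-ins I made up for illustration, and the output only approximates the cols/rows JSON shape that Google Visualizations reads. With the real library you would fill its DataTable and let it do the rendering for you.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a data table; NOT the library's DataTable class. */
final class SketchTable {
    private final List<String[]> columns = new ArrayList<>(); // each entry: {id, label, type}
    private final List<Object[]> rows = new ArrayList<>();

    SketchTable addColumn(String id, String label, String type) {
        columns.add(new String[] {id, label, type});
        return this;
    }

    SketchTable addRow(Object... values) {
        if (values.length != columns.size()) {
            throw new IllegalArgumentException("expected " + columns.size() + " values");
        }
        rows.add(values);
        return this;
    }

    /** Renders the table as JSON with a "cols" array and a "rows" array. */
    String toJson() {
        StringBuilder sb = new StringBuilder("{\"cols\":[");
        for (int i = 0; i < columns.size(); i++) {
            String[] c = columns.get(i);
            if (i > 0) sb.append(',');
            sb.append("{\"id\":").append(quote(c[0]))
              .append(",\"label\":").append(quote(c[1]))
              .append(",\"type\":").append(quote(c[2])).append('}');
        }
        sb.append("],\"rows\":[");
        for (int r = 0; r < rows.size(); r++) {
            if (r > 0) sb.append(',');
            sb.append("{\"c\":[");
            Object[] row = rows.get(r);
            for (int i = 0; i < row.length; i++) {
                if (i > 0) sb.append(',');
                Object v = row[i];
                // Numbers are emitted bare; everything else is emitted as a quoted string.
                String cell = (v instanceof Number) ? v.toString() : quote(String.valueOf(v));
                sb.append("{\"v\":").append(cell).append('}');
            }
            sb.append("]}");
        }
        return sb.append("]}").toString();
    }

    /** Minimal escaping (backslash and double quote only); enough for a sketch. */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Something like new SketchTable().addColumn("day", "Day", "string").addColumn("hits", "Hits", "number").addRow("Mon", 12).toJson() yields one column list and one row, which is the part of the job that really is straightforward.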

The best part is that it also implements a SQL-like query language for you, so you can expose your data in different forms (which are required by different visualizations) based on the query passed in the URL you request. Dumping data into JSON objects is very straightforward. Writing a parser and interpreter for queries is a real pain.
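As a rough illustration of what "parameters to the URL" means here: a Data Source request carries its query in the tq URL parameter. The sketch below just builds such a URL with the standard library. The class name, the example.com address, and the day and hits columns are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper; only illustrates how a query rides along on the URL. */
final class QueryUrlSketch {
    /** Appends the query to the data source URL as a URL-encoded tq parameter. */
    static String withQuery(String dataSourceUrl, String query) {
        try {
            String separator = dataSourceUrl.contains("?") ? "&" : "?";
            return dataSourceUrl + separator + "tq=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Calling withQuery("http://example.com/datasource", "select day, sum(hits) group by day") gives one endpoint a per-day total, while a different query string against the same URL could return the raw rows for a table view.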

The library lets you specify how much of the query language you want to implement yourself and which parts you want the library to worry about. The only (small) complaint I have is that this configuration is rather coarse-grained: we wanted to support basic column SELECTs ourselves (to improve performance on our backend) but have the library handle the aggregation functions (which our backend does not support). It wasn't too tough to work around this restriction, although it does cost us a bit of extra parsing (so we can get a copy of the complete query) and redundant column filtering (because both our code and the library process the SELECT clause).
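The division of labor we were after looks roughly like the sketch below. This is not the library's API or our production code; the class, method names, and the map-per-row representation are all hypothetical. The first method plays the part of our backend (fetch only the columns the query mentions), and the second plays the part of the library (apply an aggregation the backend can't do).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of splitting one query between a backend and a post-processing step. */
final class SplitQuerySketch {
    /** Backend step: keep only the requested columns from each row. */
    static List<Map<String, Object>> selectColumns(List<Map<String, Object>> rows,
                                                   List<String> columns) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> projected = new LinkedHashMap<>();
            for (String column : columns) {
                projected.put(column, row.get(column));
            }
            out.add(projected);
        }
        return out;
    }

    /** Post-processing step: sum(valueColumn) grouped by groupColumn. */
    static Map<Object, Double> sumBy(List<Map<String, Object>> rows,
                                     String groupColumn, String valueColumn) {
        Map<Object, Double> totals = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            // Assumes valueColumn holds non-null numeric values.
            double value = ((Number) row.get(valueColumn)).doubleValue();
            totals.merge(row.get(groupColumn), value, Double::sum);
        }
        return totals;
    }
}
```

For a query like "select day, sum(hits) group by day", the backend would be asked for just day and hits, and the grouping and summing would happen afterwards on the slimmed-down rows.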
