Wednesday, August 26, 2009

Multidimensional Data Visualization in Python - Mixing Chaco and Mayavi

In a previous post, I recreated an infographic using the Chaco plotting library. Inspired by Peter Wang's lightning talk (scroll to about 5:15 in the video) at the recent SciPy Conference, I've extended the idea to explore a "4D" data set (three axes plus the color/size of the points), with a 5th dimension (the date) acting as an interactive filter.

Since it's a whole lot easier to demonstrate than to describe, I made a short screencast of me playing with it:


While a bit hackish, the code is available for anyone wishing to play with or improve it.

I know I've said this before, but it bears repeating -- Mayavi is awesome.

Saturday, August 8, 2009

A Very Simple GUI Using Traits

I ran across this post today about some magic with simple syntax for GUI building that Richard Jones is playing around with. It occurred to me to give a simple example of traits usage to do the same thing.

So, thanks to traits, our class definition looks quite clean, and we can provide some manifest typing as well. I show a screen shot of my ipython session, which was launched with ipython -wthread:

Executing line [5] pops up the GUI shown. One nice thing about traits is the built-in MVC architecture: when I change the value of the choices class attribute, the listening label is notified and updates automatically. Notice that the value of f.choices inspected at the command line is updated as well:
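The notification behavior described above can be illustrated without any GUI toolkit. What follows is a minimal plain-Python sketch of the observer idea, not the Traits API and not the code from the screenshot; the names Model, Label, choice and on_choice_changed are hypothetical stand-ins for the wiring that traits does automatically.

```python
class Label:
    """Stand-in for the GUI label that displays the current choice."""

    def __init__(self):
        self.text = ""

    def update(self, new_value):
        # Called by the model whenever the observed value changes.
        self.text = "You chose: %s" % new_value


class Model:
    """Holds a 'choice' value and notifies listeners when it changes."""

    def __init__(self, choice):
        self._listeners = []
        self._choice = choice

    def on_choice_changed(self, callback):
        # Register a callable to be invoked with each new value.
        self._listeners.append(callback)

    @property
    def choice(self):
        return self._choice

    @choice.setter
    def choice(self, value):
        # Store the value, then push it to every registered listener.
        self._choice = value
        for callback in self._listeners:
            callback(value)


label = Label()
model = Model("a")
model.on_choice_changed(label.update)
model.choice = "b"  # plain attribute assignment also updates the label
```

The point of the sketch is only that the assignment and the label update are decoupled: the model knows nothing about labels, just that someone asked to be told about changes.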

The main drawback I see in this approach compared with Richard's is the size of the tool chain. The TraitsGUI piece requires wxPython (or Qt; it works with either) and some other dependencies.

Saturday, August 1, 2009

Infographic in Python using Chaco

This week I ran across a blog post about this New York Times infographic, which explains one of the measures of the "business cycle" based on industrial production (the data comes from the OECD originally). [Update: I also saw the post over at Juice Analytics in which they implemented this in Excel.] Never one to pass up an opportunity to re-invent the wheel, I thought it would be a good exercise to implement this in Python using the excellent Chaco plotting toolkit, which comes included in some Python distributions. So, here's the beginning of that effort, after a few hours digging around the docs and hacking together a GUI:

This is a good start. I've posted the code for this at github.

To really flesh it out, you'd need to add in the Composite Leading Indicator data and make some of the elements update based on the selected range. It would also be cool to dynamically switch out the data for various countries, or view them concurrently. Any takers?

Information Density

I think what makes such a simple interface so compelling is that you are able to see the relationship between three pieces of data. Cross-plots are a great mechanism for visualizing relationships between two data sets, but they're made even more useful when you can highlight a range in a common index (e.g. time, in the case of time-series data, or depth, in the case of depth-indexed data in the geophysics arena). Even with a lot of information presented, the display is very clean--even sparse.
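The range-highlighting idea above boils down to filtering two series by a shared index. Here is a minimal sketch in plain Python, independent of Chaco; the function name select_range and the sample numbers are made up for illustration.

```python
def select_range(index, xs, ys, low, high):
    """Return the (x, y) pairs whose index value lies in [low, high]."""
    return [(x, y) for i, x, y in zip(index, xs, ys) if low <= i <= high]


# Made-up illustrative data: a shared index (years) and two series.
index = [2001, 2002, 2003, 2004, 2005]
series_a = [1.0, 1.5, 1.2, 0.8, 1.1]
series_b = [10, 12, 11, 9, 10]

# Points to highlight on the cross-plot of series_a vs. series_b.
highlighted = select_range(index, series_a, series_b, 2002, 2004)
```

In an interactive plot, low and high would come from whatever range the user drags out on the index axis, and the highlighted pairs would be redrawn on the cross-plot each time the range changes.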

State and State-Transition

This particular graphic also reveals the "state" of the business cycle by partitioning the graph into quadrants. This data set has a very straightforward state inherent in its construction, but one might imagine more sophisticated calculations of state decorating time series data such as this. I'm beginning to investigate the application of this to stock price streams and some derived state that can be displayed in ways that can be "replayed" and analyzed. Whether real "information" can be teased out of the data remains to be seen, but I'll try to leverage the visual cortex to gain intuition about the data.
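As a rough sketch of what a quadrant-based state might look like in code, the following assumes each point is an index level compared against a trend value of 100 and against the previous period. The quadrant labels, the trend value, the function names cycle_state and state_transitions, and the sample numbers are all assumptions for illustration, not taken from the posted code.

```python
def cycle_state(level, previous_level, trend=100.0):
    """Classify one point into a quadrant from its level and direction."""
    above = level >= trend
    rising = level >= previous_level
    if above and rising:
        return "expansion"
    if above:
        return "downturn"
    if rising:
        return "recovery"
    return "slowdown"


def state_transitions(levels, trend=100.0):
    """List (position, old_state, new_state) for each change of quadrant.

    position is the index into levels at which the new state begins.
    """
    states = [cycle_state(cur, prev, trend)
              for prev, cur in zip(levels, levels[1:])]
    return [(i + 2, old, new)
            for i, (old, new) in enumerate(zip(states, states[1:]))
            if old != new]


# Made-up illustrative series of index levels.
levels = [99.0, 99.5, 100.5, 101.0, 100.2, 99.4, 99.8]
transitions = state_transitions(levels)
```

Recording the transitions rather than just the states is what would make a "replay" possible: each entry marks the moment the series crosses into a new quadrant.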

Any comments/suggestions about the approach are welcome.