Thursday, September 1, 2011

Modern Portfolio Theory - A Python Implementation

I was surprised last week to find there was no accessible Python implementation of the calculation of the Efficient Frontier (as defined by Markowitz in his presentation of Modern Portfolio Theory ~1957).* Since the problem seemed simple to solve with the tools at hand, I set out to "right the wrong" and develop an open implementation (available under an open, BSD license via github) that meets my needs.

In the process of building the basic metrics, I was able to experiment with NumPy's datetime support -- as mentioned in my earlier post. The API's are still a bit in flux, but I thought I would write up a full workflow using the tools as they are. I would greatly appreciate feedback/contributions/suggestions.  So let's dive in ...

Thursday, August 25, 2011

NumPy's New datetime Support and a Financial Database Roundtrip

Last month when I was in Austin, TX at the 10th Annual (!) SciPy Conference I asked around about the status of native datetime support in NumPy. Based on the response of my good friends at Enthought, I've been living under a rock for a while -- it's been finished for some time now.

So, a month later, I've wrangled some time to take a look. I also happened to have a particular problem I wanted to solve. Here's a teaser preview of the output of my Modern Portfolio Theory implementation:

I'll get into the details of that implementation in a later post. For now, let's tackle some data handling.

The immediate task I undertook was to refactor a simple asset price data fetcher (from Yahoo Finance, in this particular case) that I use. I always end up hammering the Yahoo Finance server to pull down price data when I'm doing some ad-hoc analysis, and I thought I should build a simple function to stash that data away locally in a sqlite database file. I know there are other approaches for fetching and manipulating this data out there -- notably pandas aggregation functions, paired with the matplotlib finance functions. I've looked to those tools for guidance, but wanted to have a bit more functional approach, fewer (or at least different) dependencies, and SQL as an interface to the data. I'll say up front that all the code is available under a BSD license via github.

First, I cleaned up the old data fetcher and made it more functional (in the programming sense -- rather than object oriented). Really, I just wanted to be able to do this:

I prefer my api for this over the matplotlib api mainly because I don't want to import datetime, followed by all the startdate = datetime.datetime(2000,1,1) nonsense, just to get a date (pun intended). Anyway, once I've got my data, I want to stuff it into a sqlite db file and be able to (losslessly) pull it back out into a numpy array. This seems fairly simple because pysqlite offers conversion support for python datetime objects. Unfortunately, I wanted dates stored as floats of seconds since the epoch, so I had to do a bit more work. This resulted in the code:

These two functions provide for accurate round-tripping despite the nuances of daylight savings time treatment by numpy and sqlite.

Another important point in the use of the converter function is how one determines that the type is a "datetime." Sqlite uses datetime as a designator for a built-in type -- I'm just overriding that when I register the converter function. In order to have the proper type recognized, we need to specify the detect_types argument when we instantiate our connection to the database for INSERT-ing or SELECT-ing. Also, notice how I specify the type using the as keyword in my SELECT statement:

Now, I have a nice interface for this kind of interaction with my data:

I'm still looking into potential problems with how queries might perform where I've indexed a float column for the date. Also, any suggestions about how to better handle the DST issue would be most welcome. It's pretty hackish now and I'd like to avoid wading into the morass that is Python's congenital complex about round-tripping of timestamps and how to properly do timezones.

Next step: Port some metrics code over to use the numpy datetimes rather than my manual handling of dates. I'll be sharing this code as well as an explanation in a later post.

UPDATE: I've added a handy convenience function, price_utils.symbol_exists to check what data is in the sqlite database for a particular symbol.

It returns a tuple of the form (num_recs, first_date, last_date). Again, the code is available via github. Enjoy!

Wednesday, March 3, 2010

Culture Whiplash: India & Singapore Trip

I'm a few weeks back from a 1 week around-the-world trip to India and Singapore. I travelled with an old friend to visit some of his friends and coworkers in India. The Singapore trip was business related and very short. Like everyone, I had my preconceived notions of India. I must say, I was at turns, surprised, impressed and bewildered by what I saw. I'll recount a few things here -- mostly to aid my own remembering1.


Poverty in India is ubiquitous and acute. While the most heartbreaking cases are the destitute beggars, orphans, and widows who are starving "out in the open," -- the real tragedy is that vast areas of rural India are dotted with villages wherein the majority of people are living hand to mouth. These villages are in a bizarre state of pre-industrialization. Most have a common well from which people haul water with a cell tower in sight and a cell phone in most pockets. I found myself asking why a generous, resourceful Indian culture allows these backward conditions to persist. My initial impulse to assume that a Western template of development offers a solution is, on further consideration, incorrect. Perhaps the "solution" to India's problems will never come from Westernization. Western patterns of infrastructure can serve as a distant example only. For instance, adding a million wastebaskets around India will not solve India's garbage problem (every public waste can I saw was empty, with litter lining the streets). The big, obvious problems that I witnessed -- sanitation, nutrition (India contributes to over half the world total of child deaths each year, primarily due to malnutrition) -- are so rooted in cultural norms that they call for uniquely Indian solutions.

To offer another example, in the backward villages through which we travelled, women who have lost their husbands are either banished from their homes, or not allowed outside the home because of superstition (seeing a widow is a "bad omen"). The Western solution to this would be legislation mandating various agencies to provide a social safety net, education initiatives, etc., etc. The scale of this one problem is, by itself, enough (given Western approaches) to completely overwhelm the coffers of any Indian state. It seems hopeless.

I am encouraged by the work of my new friends Paul and Grace Moses. A part of their ministry (in addition to a children's home) is to serve food to widows (over 250 of them currently) once a month. The rations are around US$4.00 per widow and consist of a kilo each of rice, sugar and dal (lentils) as well as some vitamins. Not only does this meet the immediate need of feeding the widows, it provides a mechanism by which they may be welcomed back into their families.

Now, to my Western sensibilities (and sense of justice), the idea of reintegrating a widow into a family strictly because of what she has to offer is unseemly at best. However, this program works with all the leverage available to it. My sense is that India will need many more indirect steps such as this -- particularly to development at the village level. Applying traditional Western thought to these problems is like saying that "in order to solve a math problem, we must first build a computer."

Anyway, that's probably quite enough opinion for one post.

We visited a beach and resort town on the southeast coast in the state of Tamil Nadu. The town of Mamallapuram (also referred to as Mahabalipuram) has a couple of interesting attractions: a beachside Hindu temple and a batholith/stock that has been carved with some fantastic art. Off the coast are some undersea remains of construction that are the subject of much speculation. The town was hit by a 30 ft wave in the 2004 tsunami, though very little of the tsunami's affects are visible. We were told that there was a large migration from the coastal areas since the tsunami, but its hard for me to discern movement within a population like India's -- people are everywhere.


We departed from the Chennai Airport late at night and arrived in Singapore early the next morning. That overnight flight may as well have been a trip to another planet. Singapore is culturally opposite India in many ways. The economics are vastly different, of course, and attitudes and the general societal posture are also markedly different. For example, upon disembarking at the airport in Chennai, India there was a row of machine-gun armed officers lining the exit breezeway to the parking lots, with filth and waste behind every column. Conversely, in Singapore the only police presence (besides a few cameras dotting the train terminals) was the lone police car we saw driving through downtown -- and the streets, sidewalks, terminals and stations were spotless. I'm sure money has a lot to do with this difference, but I've never seen a more glaring example of the broken window theory. Of course, Singapore had the highest per-capita execution rate in the world between 1994 and 1999. Twenty four of the 136 folks executed during that time were foreign nationals.

That said, Singapore is a beautiful place. The climate was warm, but pleasant. There is quite a bit to do (particularly if you like to shop) with tons of development taking place -- even in the midst of a global recession. It's hard to imagine that many of the development projects are in the billions of dollars (notably, a casino) while malnutrition is rampant in the same hemisphere. Crazy stuff.

This was the trip of a lifetime -- and one I hope to repeat soon. I'm anxious to see some of the friends I met along the way again.

1 I fully expect these first impressions of India to be corrected and refined as I see and learn more. So, as G.K. Chesterton once said, "There is nothing the matter with Americans except their ideals. The real American is all right; it is the ideal American who is all wrong."

Wednesday, February 17, 2010

Python Jobs Trend

As Matt Asay pointed out, there seems to be a real uptick in jobs in the Python space. Here's the chart:

You can play with the chart here.

While the raw number of jobs is still much lower, the rates of change are encouraging for open languages.

Tuesday, October 6, 2009

TDD in Python: P&L Statement - Step 2

As a continuation of my TDD experiment, I've implemented a bit more functionality toward a P&L statement. As usual, I've posted all the code to github. This exercise implements the queueing I listed as a need in my first post on this subject. So, for a "story problem":

Eddie buys 1000 shares of AAPL at 10:30am for 185.25 and later, at 11:00am he buys another 1500 shares for 184.00. As he unrolls his holdings, Eddie sells 1500 shares for 186.00. What is Eddie's realized gain/loss? What's the cost basis for Eddie's current holdings?

For this post, I'd like to take a first step in solving my story problem by defining my queue a bit better. The real answer to the problem above is "it depends on whether you use fifo or lifo queueing." I've conceived a Holding object to serve as my queue for any positions I've entered for a particular instrument (as uniquely identified by its symbol). So, let's define the behavior we expect out of a fifo-ordered Holding queue in the form of a test:

def portfolio_holding_fifo_test():
""" test of 'fifo' and 'lifo' queuing by adding and removing
a position.
p1 = position.Position(symbol="AAPL", qty=1000, price=185.25,
multiplier=1., fee=7.0, total_amt=185250.,
p2 = position.Position(symbol="AAPL", qty=1500, price=184.00,
multiplier=1., fee=7.0, total_amt=276000.,
p3 = position.Position(symbol="AAPL", qty=-1500, price=186.00,
multiplier=1., fee=7.0, total_amt=279000.,

h = portfolio.Holding()


h.remove_from(p3, order='fifo')

assert h.qty==1000
assert len(h.positions)==1
# simple check to make sure the position that we expect is left over...
p = h.positions[0]
assert p.price==184.00

As you can see from lines 5-13 we use our earlier Position object to define the positions that we're holding or removing. I then introduce the notion of a holding object which I can add_to or remove_from. My assertions at the end of my test are some not-very-comprehensive checks to see if the queue behaves as I would expect ... the first position is drawn from when removing the final position, then any remaining shares are drawn from the second position.

Since I expect I'll need to generalize my queuing, I'll write a similar test with 'lifo' ordering:

def portfolio_holding_lifo_test():
""" test of 'lifo' queuing by adding and removing
a position.
p1 = position.Position(symbol="AAPL", qty=1000, price=185.25,
multiplier=1., fee=7.0, total_amt=185250.,
p2 = position.Position(symbol="AAPL", qty=1500, price=184.00,
multiplier=1., fee=7.0, total_amt=276000.,
p3 = position.Position(symbol="AAPL", qty=-1500, price=186.00,
multiplier=1., fee=7.0, total_amt=279000.,

h = portfolio.Holding()


h.remove_from(p3, order='lifo')

assert h.qty==1000
assert len(h.positions)==1
# simple check to make sure the position that we expect is left over...
p = h.positions[0]
assert p.price==185.25

So, I've written my tests first -- now for the code. I create my Holding class and put it in a file called Setting up the class and creating my add_to method was a cinch:

# Enthought imports
from enthought.traits.api import (HasTraits, Float, Instance, List)

# Local imports
from position import Position

class Holding(HasTraits):
""" Queue for held positions in the same security (as identified
by symbol). The removal of entries are handled in a 'fifo'
or a 'lifo' order, depending on the order argument of the
remove_from method.

# Total quantity for a particular holding
qty = Float

# List of positions making up the holding
positions = List(Instance(Position))

def add_to(self, position):
self.qty += position.qty

As you can see, I've again used Traits to make my life easier. In this case I've got a simple qty Float value which I intend to use for ease in checking the total quantity held in my Holding object (as well as a checksum). I also have defined a List of positions as another Trait of my class. From there, the add_to method is pretty much the obvious implementation.

The hard part is in the remove_from, which actually implements the various behavior of the queue ordering. I've punted (for now) on my original goal of a 'wifo' (worst-in-first-out) queue order, as I suspect I'll need to refactor my position class to handle different sorting. The important bits of the code are in the handling of a full position, a partial position, or multiple positions depending of the quantity to be removed. Rather than reproducing all the code here, I'll just provide the link. I use recursion in the case of multiple removals, otherwise it's pretty easy to follow.

This was a fairly simple step, yet I feel much closer to my portfolio/statement goals -- and I feel like I can "turn my back" to some extent on the code I've written so far. That is, it is reasonably tested code that meets the api I've defined by my tests.

Some TDD observations:
1. It is getting a little easier to think in terms of tests first as I've done this second exercise.
2. I did write some additional tests to try to get at the edges of the functionality I was looking for, and to provide more comprehensive regression coverage.
3. Small tests force you into a good habit of writing small chunks of code that work, heightening the impression of progress.

Next up: 1) aggregating the holdings into a real portfolio and 2) introducing a method of logging activity and reporting performance in the form of a statement.

As always, comments or corrections are welcome!

Tuesday, September 29, 2009

Test Driven Python Development Experiment: P&L Statement - Step 1

I began trading equities in my personal portfolio earlier this Spring using an account with Interactive Brokers (whom I'd recommend for their nice tools, good access to products, and published APIs). One of the issues I've had with them, however, is the lack of clarity in a daily P&L statement. Without bogging down in too many details, it boils down to the fact that I'd like to see a few things that are not provided out-of-the box by Interactive Brokers:

  • Realized gain rolled up for each day.

  • An intraday, calculated P&L in the event that I wish to use it as part of an automated trading system.

  • A choice between FIFO, LIFO and what I call WIFO (worst-in-first-out, a more accurate and better-rhyming name for HIFO) handling of trades.

Developing these capabilities seems like an excellent exercise for me to experiment with Test Driven Development (TDD) and share my experiences with others, so consider this a test for TDD using a below-average developer with simple needs. The current state of the code is available at github.

Position Object

The first thing needed is a data structure to hold information about each position. The data populating this data structure will come from a text log of transactions, or, potentially, a database with transaction records. Interactive Brokers provides several formats of text output. I chose to use a text file provided in a TradeLog format. Looking at a few lines of a typical log gives an idea of the type and format of data provided:

ACT_INF|UXXXXXXX|Test Company|Advisor Client|1111 Main Street, San Antonio TX 78201 United States

OPT_TRD|305665068|APVGG|AAPL JUL09 135 C|ISE|BUYTOOPEN|O|20090512|10:22:55|USD|23.00|100.00|5.55|12765.00|-16.10|1.00
OPT_TRD|305668016|APVGG|AAPL JUL09 135 C|NASDAQOM|SELLTOCLOSE|C|20090512|10:24:59|USD|-23.00|100.00|5.61|-12903.00|-10.35|1.00
OPT_TRD|305681552|APVGG|AAPL JUL09 135 C|ISE|BUYTOOPEN|O|20090512|10:36:59|USD|46.00|100.00|5.30|24380.00|-32.20|1.00
OPT_TRD|305721409|APVGG|AAPL JUL09 135 C|ISE|BUYTOOPEN|O|20090512|11:25:29|USD|47.00|100.00|5.15|24205.00|-32.90|1.00
OPT_TRD|305767844|APVGG|AAPL JUL09 135 C|ISE|BUYTOOPEN|O|20090512|12:30:38|USD|23.00|100.00|4.70|10810.00|-16.10|1.00
OPT_TRD|306301447|APVGG|AAPL JUL09 135 C|NASDAQOM|BUYTOOPEN|O|20090513|14:19:58|USD|30.00|100.00|3.19|9570.00|-13.50|1.00

The optimal way to put this information into a data structure in python would probably be a to use a numpy structured array. However, since performance is not an issue (at this time), and I'd like to use this problem as a TDD test for myself, I'll forgo the numpy approach and use more of a brute-force, object-oriented method.

If I were to codify some requirements for my data structure by writing a test, it might look something like this:

import position

def position_test():
""" Dead simple test to see if I can store data in my position object
and get it back out
p = position.Position(symbol="AAPL", qty=1000, price=185.25,
multiplier=1., fee=7.0, total_amt=185250.,

assert p.price==185.25
assert p.description==""
assert p.display_date=="05/22/2003"
assert p.display_time=="08:11:08"

OK, so there are several things implicit in this test (don't worry, I'll break all those asserts into different tests before going forward). One obvious point is that I seem to require some fairly sophisticated handling of datetimes and default values. Other than that, we have a big pile of keyword arguments and a bunch of attributes in a fairly brain-dead object. The conventional approach is to build a giant __init__ with a bunch of lines like this nonsense:

if not description is None:
self.description = description
self.description = 0.0

Frankly, we deserve better than to waste our lives on silly boilerplate code for initialization, validation and defaults. This is where Traits comes to the rescue (there's a nice tutorial here). I'll not go into all the details here, but you'll see how how traits allows us to have much cleaner code as I progress.

As a very simple first test, let's take our first assert and make a stand-alone test:

def position_attribute_test():
""" Dead simple test to see if I can store data in my position object
and get it back out
p = position.Position(symbol="AAPL", qty=1000, price=185.25,
multiplier=1., fee=7.0, total_amt=185250.,

assert p.price==185.25

We can solve it with the following code, which includes all the fields (attributes/traits) I think I'll need:

from enthought.traits.api import (HasTraits, Enum, Float, Int,

class Position(HasTraits):
""" Simple object to act as a data structure for a position

While all attributes (traits) are optional, classes that contain or
collect instances of the Position class will probably require the following:
symbol, trans_date, qty, price, total_amt


symbol = Str
id = Int
description = Str
trans_date = Float
qty = Float
price = Float
multiplier = Float(1.0)
fee = Float
exchange_rate = Float(1.0)
currency = Str("USD")
total_amt = Float
filled = Str
exchange = Str

I now can throw my test code into a directory called "test" adjacent to this module (which I named and I use the excellent nosetest tool to harvest and run any tests I've got. It passes! ... but that was actually not much of a challenge, so let's look at another test function:

def position_initialization_test():
""" Test to see if I can handle fields for which I provide no data.
p = position.Position(symbol="AAPL", qty=1000, price=185.25,
multiplier=1., fee=7.0, total_amt=185250.,

assert p.description==""

... this is similarly a soft pitch over the middle of the plate. Traits has already solved this for us. The test passes with no change to the code.

How about something a little more challenging:

def position_dates_test():
""" Test to see if I'm handling dates correctly
p = position.Position(symbol="AAPL", qty=1000, price=185.25,
multiplier=1., fee=7.0, total_amt=185250.,

assert p.date_display=="05/22/2003"
assert p.time_display=="08:11:08"

So now I have to actually write some code. I won't go into all the painful issues with Python date handling, I'll just provide a link to the date utility code that I wrote to help smooth things over.

I do want to step back and write a test that defines a need for date handling that is currently a bit broken (at least in Python 2.5) -- converting a python datetime to a timestamp and a timestamp to a python datetime:

from date_util import dt_from_timestamp, dt_to_timestamp

def date_util_test():
""" Simple test of correctly transforming a timestamp to a python datetime,
and back to a timestamp

# test a not-very-random sequence of times
for ts in range(0, 1250000000, 321321):
# simply see if we can round-trip the timestamp and get the same result
dt = dt_from_timestamp(ts)
ts2 = int(dt_to_timestamp(dt))
assert ts2 == ts

Picking back up with my position_dates_test ... the instructive part of what I need to do here is in using a Property Trait as a good human interface for my trans_date. The trans_date value is actually a float value indicating seconds (and fractional seconds) since the epoch (which happens to fall in the same year that we put the first man on the moon, the summer of love, Woodstock and, notably, the year of my birth -- 1969!). While the current tally of seconds since 1969 may be at the front of my mind, it's not a particularly good way to represent dates for most people. In fact, most GUIs that handle datetime entries separate the date and time. This makes sense -- we're used to interacting with the two with separate pieces of hardware (a calendar and a clock). The date and time properties are defined after the other traits in my class like so:

date_display = Property(Regex(value='11/17/1969',
time_display = Property(Regex(value='12:01:01',

# Property methods
def _get_date_display(self):
return dt_from_timestamp(self.trans_date, tz=Eastern).strftime("%m/%d/%Y")

def _set_date_display(self, val):
tm = self._get_time_display()
trans_date = datetime.strptime(val+tm, "%m/%d/%Y%H:%M:%S" )
trans_date = trans_date.replace(tzinfo=Eastern)
self.trans_date = dt_to_timestamp(trans_date)

def _get_time_display(self):
t = dt_from_timestamp(self.trans_date, tz=Eastern).strftime("%H:%M:%S")
return t

def _set_time_display(self, val):
trans_time = datetime.strptime(self._get_date_display()+val, "%m/%d/%Y%H:%M:%S")
trans_time = trans_time.replace(tzinfo=Eastern)
self.trans_date = dt_to_timestamp(trans_time)

The getter and setter methods behave just as you would expect, and the whole thing provides a meaningful interface to the timestamp stored in my trans_date trait. There are some imports that I don't show here for brevity, but it's fairly clean code considering what it does. The best part of all -- my test passes!

Finally, I want to try another test, because I know I'll need to sort my Position objects within a collection. Here's a simple test function:

def position_sort_test():
""" Test to see if I can collect and sort these properly.
The objective is to have the objects sort by the trans_date

p0 = position.Position(id=102, symbol="AAPL", qty=1000, price=185.25,
multiplier=1., fee=7.0, total_amt=185250.,

p1 = position.Position(id=103, symbol="AAPL", qty=-1000, price=186.25,
multiplier=1., fee=7.0, total_amt=-186250.,

p2 = position.Position(id=101, symbol="AAPL", qty=500, price=184.00,
multiplier=1., fee=7.0, total_amt=62000.,

plist = [p0, p1, p2]


assert plist[0]==p2
assert plist[1]==p0
assert plist[2]==p1

This just collects a few positions into a list, calls the .sort method on the list and checks to see if it accurately sorts by the trans_date. The method I add to my class to accomplish this is actually quite simple. I define a __cmp__ method which correctly compares objects of this class and I'm good to go:

# support reasonable sorting based on trans_date
def __cmp__(self, other):
if self.trans_date < other.trans_date:
return -1
elif self.trans_date > other.trans_date:
return 1
else: return 0

This pretty much wraps up the functionality I want (so far) in my position class. This is evidenced by my nosetests results:

Oh, and one more thing -- thanks to the miracle of traits I can call the edit_traits method on any position object and get a nice form with all the fields. I've added a view definition to my position class to pare this down a bit:

traits_view = View(Item('symbol', label="Symb"),
buttons=['OK', 'Cancel'], resizable=True)

Now if p is a Position object then calling p.edit_traits() pops up the following dialog:

Again, all the code for this is available via github.

Given this simple exercise in TDD, my initial impression is that it's helpful for ensuring good test coverage for features, but I really need to go back and write corner-case tests and tests which will pick up minor regressions in the state of my code better. Also, I found that considering how to best design the code was often a separate exercise from designing tests -- so I could really sense I was switching mental contexts to try to drive the development with well thought-out tests. I'm willing to concede that this is unique to this developer. Overall, I think the jury is still out. I get the feeling I don't have enough experience with TDD to do it well, but hope springs eternal.

I'll handle proper queueing and adding/removing positions in a portfolio in the next post in this series. Until then, please raise any questions or corrections in the comments below.

Wednesday, August 26, 2009

Multidimensional Data Visualization in Python - Mixing Chaco and Mayavi

In a previous post, I recreated an infographic using the Chaco plotting library. Inspired by Peter Wang's lightning talk (scroll to about 5:15 in the video) at the recent SciPy Conference, I've extended this idea a bit to show the exploration of a "4D" data set (three axes and the color/size of the points) and using a 5th dimension (the date) as an interactive filter.

Since it's a whole lot easier to demonstrate than to describe, I made a short screencast of me playing with it:


While a bit hackish, the code is available for anyone wishing to play with or improve it.

I know I've said this before, but it bears repeating -- Mayavi is awesome.