Tuesday, September 29, 2009

Test Driven Python Development Experiment: P&L Statement - Step 1

I began trading equities in my personal portfolio earlier this Spring using an account with Interactive Brokers (whom I'd recommend for their nice tools, good access to products, and published APIs). One of the issues I've had with them, however, is the lack of clarity in a daily P&L statement. Without bogging down in too many details, it boils down to the fact that I'd like to see a few things that are not provided out-of-the box by Interactive Brokers:

  • Realized gain rolled up for each day.

  • An intraday, calculated P&L in the event that I wish to use it as part of an automated trading system.

  • A choice between FIFO, LIFO and what I call WIFO (worst-in-first-out, a more accurate and better-rhyming name for HIFO) handling of trades.

Developing these capabilities seems like an excellent exercise for me to experiment with Test Driven Development (TDD) and share my experiences with others, so consider this a test for TDD using a below-average developer with simple needs. The current state of the code is available at github.

Position Object


The first thing needed is a data structure to hold information about each position. The data populating this data structure will come from a text log of transactions, or, potentially, a database with transaction records. Interactive Brokers provides several formats of text output. I chose to use a text file provided in a TradeLog format. Looking at a few lines of a typical log gives an idea of the type and format of data provided:

ACCOUNT_INFORMATION
ACT_INF|UXXXXXXX|Test Company|Advisor Client|1111 Main Street, San Antonio TX 78201 United States

OPTION_TRANSACTIONS
OPT_TRD|305665068|APVGG|AAPL JUL09 135 C|ISE|BUYTOOPEN|O|20090512|10:22:55|USD|23.00|100.00|5.55|12765.00|-16.10|1.00
OPT_TRD|305668016|APVGG|AAPL JUL09 135 C|NASDAQOM|SELLTOCLOSE|C|20090512|10:24:59|USD|-23.00|100.00|5.61|-12903.00|-10.35|1.00
OPT_TRD|305681552|APVGG|AAPL JUL09 135 C|ISE|BUYTOOPEN|O|20090512|10:36:59|USD|46.00|100.00|5.30|24380.00|-32.20|1.00
OPT_TRD|305721409|APVGG|AAPL JUL09 135 C|ISE|BUYTOOPEN|O|20090512|11:25:29|USD|47.00|100.00|5.15|24205.00|-32.90|1.00
OPT_TRD|305767844|APVGG|AAPL JUL09 135 C|ISE|BUYTOOPEN|O|20090512|12:30:38|USD|23.00|100.00|4.70|10810.00|-16.10|1.00
OPT_TRD|306301447|APVGG|AAPL JUL09 135 C|NASDAQOM|BUYTOOPEN|O|20090513|14:19:58|USD|30.00|100.00|3.19|9570.00|-13.50|1.00

The optimal way to put this information into a data structure in python would probably be a to use a numpy structured array. However, since performance is not an issue (at this time), and I'd like to use this problem as a TDD test for myself, I'll forgo the numpy approach and use more of a brute-force, object-oriented method.

If I were to codify some requirements for my data structure by writing a test, it might look something like this:


import position

def position_test():
""" Dead simple test to see if I can store data in my position object
and get it back out
"""
p = position.Position(symbol="AAPL", qty=1000, price=185.25,
multiplier=1., fee=7.0, total_amt=185250.,
trans_date=1053605468.54)

assert p.price==185.25
assert p.description==""
assert p.display_date=="05/22/2003"
assert p.display_time=="08:11:08"

OK, so there are several things implicit in this test (don't worry, I'll break all those asserts into different tests before going forward). One obvious point is that I seem to require some fairly sophisticated handling of datetimes and default values. Other than that, we have a big pile of keyword arguments and a bunch of attributes in a fairly brain-dead object. The conventional approach is to build a giant __init__ with a bunch of lines like this nonsense:

if not description is None:
self.description = description
else:
self.description = 0.0

Frankly, we deserve better than to waste our lives on silly boilerplate code for initialization, validation and defaults. This is where Traits comes to the rescue (there's a nice tutorial here). I'll not go into all the details here, but you'll see how how traits allows us to have much cleaner code as I progress.

As a very simple first test, let's take our first assert and make a stand-alone test:

def position_attribute_test():
""" Dead simple test to see if I can store data in my position object
and get it back out
"""
p = position.Position(symbol="AAPL", qty=1000, price=185.25,
multiplier=1., fee=7.0, total_amt=185250.,
trans_date=1053605468.54)

assert p.price==185.25

We can solve it with the following code, which includes all the fields (attributes/traits) I think I'll need:

from enthought.traits.api import (HasTraits, Enum, Float, Int,
Str)

class Position(HasTraits):
""" Simple object to act as a data structure for a position

While all attributes (traits) are optional, classes that contain or
collect instances of the Position class will probably require the following:
symbol, trans_date, qty, price, total_amt

"""


side = Enum("BUYTOOPEN", ["SELLTOCLOS", "BUYTOOPEN", "SELLTOOPEN", "BUYTOCLOSE"])
symbol = Str
id = Int
description = Str
trans_date = Float
qty = Float
price = Float
multiplier = Float(1.0)
fee = Float
exchange_rate = Float(1.0)
currency = Str("USD")
total_amt = Float
filled = Str
exchange = Str

I now can throw my test code into a directory called "test" adjacent to this module (which I named position.py) and I use the excellent nosetest tool to harvest and run any tests I've got. It passes! ... but that was actually not much of a challenge, so let's look at another test function:

def position_initialization_test():
""" Test to see if I can handle fields for which I provide no data.
"""
p = position.Position(symbol="AAPL", qty=1000, price=185.25,
multiplier=1., fee=7.0, total_amt=185250.,
trans_date=1053605468.54)


assert p.description==""

... this is similarly a soft pitch over the middle of the plate. Traits has already solved this for us. The test passes with no change to the code.

How about something a little more challenging:

def position_dates_test():
""" Test to see if I'm handling dates correctly
"""
p = position.Position(symbol="AAPL", qty=1000, price=185.25,
multiplier=1., fee=7.0, total_amt=185250.,
trans_date=1053605468.54)

assert p.date_display=="05/22/2003"
assert p.time_display=="08:11:08"

So now I have to actually write some code. I won't go into all the painful issues with Python date handling, I'll just provide a link to the date utility code that I wrote to help smooth things over.

I do want to step back and write a test that defines a need for date handling that is currently a bit broken (at least in Python 2.5) -- converting a python datetime to a timestamp and a timestamp to a python datetime:

from date_util import dt_from_timestamp, dt_to_timestamp


def date_util_test():
""" Simple test of correctly transforming a timestamp to a python datetime,
and back to a timestamp
"""

# test a not-very-random sequence of times
for ts in range(0, 1250000000, 321321):
# simply see if we can round-trip the timestamp and get the same result
dt = dt_from_timestamp(ts)
ts2 = int(dt_to_timestamp(dt))
assert ts2 == ts


Picking back up with my position_dates_test ... the instructive part of what I need to do here is in using a Property Trait as a good human interface for my trans_date. The trans_date value is actually a float value indicating seconds (and fractional seconds) since the epoch (which happens to fall in the same year that we put the first man on the moon, the summer of love, Woodstock and, notably, the year of my birth -- 1969!). While the current tally of seconds since 1969 may be at the front of my mind, it's not a particularly good way to represent dates for most people. In fact, most GUIs that handle datetime entries separate the date and time. This makes sense -- we're used to interacting with the two with separate pieces of hardware (a calendar and a clock). The date and time properties are defined after the other traits in my class like so:

date_display = Property(Regex(value='11/17/1969',
regex='\d\d[/]\d\d[/]\d\d\d\d'),
depends_on='trans_date')
time_display = Property(Regex(value='12:01:01',
regex='\d\d[:]\d\d[:]\d\d'),
depends_on='trans_date')

###################################
# Property methods
def _get_date_display(self):
return dt_from_timestamp(self.trans_date, tz=Eastern).strftime("%m/%d/%Y")

def _set_date_display(self, val):
tm = self._get_time_display()
trans_date = datetime.strptime(val+tm, "%m/%d/%Y%H:%M:%S" )
trans_date = trans_date.replace(tzinfo=Eastern)
self.trans_date = dt_to_timestamp(trans_date)
return

def _get_time_display(self):
t = dt_from_timestamp(self.trans_date, tz=Eastern).strftime("%H:%M:%S")
return t

def _set_time_display(self, val):
trans_time = datetime.strptime(self._get_date_display()+val, "%m/%d/%Y%H:%M:%S")
trans_time = trans_time.replace(tzinfo=Eastern)
self.trans_date = dt_to_timestamp(trans_time)
return

The getter and setter methods behave just as you would expect, and the whole thing provides a meaningful interface to the timestamp stored in my trans_date trait. There are some imports that I don't show here for brevity, but it's fairly clean code considering what it does. The best part of all -- my test passes!

Finally, I want to try another test, because I know I'll need to sort my Position objects within a collection. Here's a simple test function:

def position_sort_test():
""" Test to see if I can collect and sort these properly.
The objective is to have the objects sort by the trans_date
trait.
"""


p0 = position.Position(id=102, symbol="AAPL", qty=1000, price=185.25,
multiplier=1., fee=7.0, total_amt=185250.,
trans_date=1045623459.68)

p1 = position.Position(id=103, symbol="AAPL", qty=-1000, price=186.25,
multiplier=1., fee=7.0, total_amt=-186250.,
trans_date=1053605468.54)

p2 = position.Position(id=101, symbol="AAPL", qty=500, price=184.00,
multiplier=1., fee=7.0, total_amt=62000.,
trans_date=1021236990.02)

plist = [p0, p1, p2]

plist.sort()

assert plist[0]==p2
assert plist[1]==p0
assert plist[2]==p1

This just collects a few positions into a list, calls the .sort method on the list and checks to see if it accurately sorts by the trans_date. The method I add to my class to accomplish this is actually quite simple. I define a __cmp__ method which correctly compares objects of this class and I'm good to go:

# support reasonable sorting based on trans_date
def __cmp__(self, other):
if self.trans_date < other.trans_date:
return -1
elif self.trans_date > other.trans_date:
return 1
else: return 0

This pretty much wraps up the functionality I want (so far) in my position class. This is evidenced by my nosetests results:

Oh, and one more thing -- thanks to the miracle of traits I can call the edit_traits method on any position object and get a nice form with all the fields. I've added a view definition to my position class to pare this down a bit:

traits_view = View(Item('symbol', label="Symb"),
Item('date_display'),
Item('time_display'),
Item('qty'),
buttons=['OK', 'Cancel'], resizable=True)

Now if p is a Position object then calling p.edit_traits() pops up the following dialog:

Again, all the code for this is available via github.

Given this simple exercise in TDD, my initial impression is that it's helpful for ensuring good test coverage for features, but I really need to go back and write corner-case tests and tests which will pick up minor regressions in the state of my code better. Also, I found that considering how to best design the code was often a separate exercise from designing tests -- so I could really sense I was switching mental contexts to try to drive the development with well thought-out tests. I'm willing to concede that this is unique to this developer. Overall, I think the jury is still out. I get the feeling I don't have enough experience with TDD to do it well, but hope springs eternal.

I'll handle proper queueing and adding/removing positions in a portfolio in the next post in this series. Until then, please raise any questions or corrections in the comments below.