Avoiding overfitting in an evolving market

This is a methodological note on how we approach in sample and out of sample testing.  Avoiding overfitting is of course paramount to building robust quantitative models.  Reserving some portion of one's historical data for validating one's models is a great way to know whether you've overfit: if out of sample performance is comparable to in sample performance, you have some evidence that your modeling was less subject to data mining (in the pejorative sense of the term).  Slicing data by time is probably the most common way to do this, with the initial, say, 60-75% of the dates in the sample used for training (in sample) and the most recent 25-40% of dates used for validation (out of sample).
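The plain chronological split described above can be sketched in a few lines; a minimal example, where the function name and the 70% default are illustrative rather than anything we prescribe:

```python
def time_split(dates, train_frac=0.7):
    """Split a chronologically sorted sequence of dates into a training
    (in sample) block and a validation (out of sample) block.

    train_frac is the fraction of dates assigned to training; 0.7 sits
    in the 60-75% range mentioned in the text.
    """
    cut = int(len(dates) * train_frac)
    return dates[:cut], dates[cut:]
```

Because the split is by position in a sorted sequence, every training date strictly precedes every validation date.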

A difficulty of this approach is that markets evolve over time.  If you build a model that works well on data ending several years ago, it may not reflect the increasing crowdedness of a trade, or changes in market microstructure or the macroeconomic environment, for example.  To address this problem we've devised a "striped" approach for our model building which alternates in sample and out of sample dates for some portion of the history, as follows:

Note that for the 2005-2016 period, 60% of the dates are in sample, with a contiguous block at the beginning and alternating months towards the end.  The last two years are reserved as fully out of sample.  You'll also notice that in the striped 2010-2014 period the particular in sample months change each year: in even years (2010, 2012, 2014), odd months (Jan/Mar/May/Jul/Sep/Nov) are in sample; and in odd years, even months are in sample.  This helps to avoid seasonal bias in our results if, for example, our returns all come from earnings season.
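The striping rule above can be expressed as a simple date labeler.  This is a minimal sketch, assuming the layout described in the text (2005-2009 contiguous in sample, 2010-2014 striped, 2015-2016 fully out of sample); the function name and year boundaries are illustrative:

```python
from datetime import date

def is_in_sample(d: date) -> bool:
    """Label a date under the striped in sample / out of sample scheme.

    Assumed layout, matching the text:
      2005-2009: contiguous in sample block
      2010-2014: striped -- even years use odd months (Jan/Mar/...),
                 odd years use even months (Feb/Apr/...)
      2015-2016: fully out of sample
    """
    if d.year <= 2009:
        return True          # contiguous in sample block
    if d.year >= 2015:
        return False         # fully reserved out of sample
    # Striped years: tie month parity to year parity so the same
    # calendar months are not always in sample (avoids seasonal bias).
    if d.year % 2 == 0:
        return d.month % 2 == 1   # even year -> odd months in sample
    return d.month % 2 == 0       # odd year -> even months in sample
```

Alternating the month parity with the year parity is what prevents, say, every January from landing in the training set.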

The benefit here is that we're able to look at relatively recent data without burning all of the out of sample data through that relatively recent date.  The cost is that one must be very careful not to let out of sample data "bleed" into in sample periods which come afterwards.  In particular, we completely throw away data on anything we're trying to predict (for example, returns) from the out of sample data prior to doing any modeling.
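That target-discarding step can be sketched as a masking operation.  A minimal example using pandas; the column names and function signature are hypothetical, not our production interface:

```python
import numpy as np
import pandas as pd

def mask_out_of_sample_targets(df, in_sample_mask, target_cols):
    """Return a copy of df with the target columns (e.g. forward
    returns) blanked out on out of sample dates, so they cannot leak
    into any subsequent model fitting.

    in_sample_mask is a boolean Series aligned with df's index;
    target_cols lists the columns being predicted.
    """
    out = df.copy()
    out.loc[~in_sample_mask, target_cols] = np.nan
    return out
```

Note that only the prediction targets are removed; feature columns on out of sample dates are untouched, since features alone cannot reveal realized performance.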

This is just one of several techniques we use to enhance our model construction, some of which I touched on in an article series last year on best practices in quant research.  The result is that we should see out of sample performance which is broadly comparable with, and roughly contemporaneous with, in sample performance, as we saw for our recently released CAM1 model:

It’s comforting to see that the degradation in cumulative returns when going out of sample is modest.  Good in sample selection is a very useful item in a quant’s toolbox and deserves careful consideration.