I will never forget the iconic mistake I made during my graduate training programme as a quantitative analyst.
My first assignment was to design and implement a multi-factor ranking model for the head of quants at an investment firm. This involved delivering a ‘What-if-analysis’ tool that would allow a fund manager to:
I immersed myself in the problem. I developed the application over a six-month period. While testing the beta version, I played with various factors and weights. At this point I found a set of factors and weights that appeared to ‘shoot the lights out’. I showed this model to the head of quants, who agreed with me: I had achieved something remarkable – a consistent high-return model, achieving in some cases annual returns of around 100%.
The model was an investor’s dream. It seemed too good to be true. It was too good to be true. So where had I gone wrong?
The concept of a data declaration date was not foreign to me. All financial line item forecasts (estimates) come with a declaration date (revision date). When calculating factors such as a 12-month forecasted earnings growth, one needs to consider the most recent forecast declared on, or before, the date under consideration.
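To make the rule concrete, here is a minimal Python sketch of picking the most recent forecast declared on, or before, a given date. The `Forecast` class, the `latest_forecast` function and the sample figures are hypothetical names and values invented for illustration; they are not taken from any particular system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class Forecast:
    declared_on: date  # declaration (revision) date of the estimate
    value: float       # the forecast itself, e.g. forward earnings per share


def latest_forecast(forecasts: Sequence[Forecast], as_at: date) -> Optional[Forecast]:
    """Return the most recent forecast declared on or before `as_at`, or None."""
    known = [f for f in forecasts if f.declared_on <= as_at]
    return max(known, key=lambda f: f.declared_on) if known else None


# Illustrative revisions only; the numbers are made up.
revisions = [
    Forecast(date(2020, 1, 15), 1.10),
    Forecast(date(2020, 4, 15), 1.25),
    Forecast(date(2020, 7, 15), 0.95),
]

# On 1 May 2020 the July revision does not exist yet, so April's is used.
chosen = latest_forecast(revisions, date(2020, 5, 1))
```

Any factor built on forecasts, such as a 12-month forecasted earnings growth, would then be calculated from `chosen` rather than from the latest revision on file.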
My blind spot was in not realising that the data declaration date is just as important for historically declared fundamental data, such as headline earnings per share. My data load process would import fundamental data and save it against the date it was applicable to, without consideration of the date on which it was actually declared. In other words, the simulation used values that had not yet been declared, and so could not have been known, at the point in time being simulated.
The following figure illustrates the ‘information as at date’ (IAAD) concept. When retrospectively modelling the period up to the IAAD, the model would, for example, utilise the stored value for EPS3. This led to the ‘shooting the lights out’ scenario, as the modeller had overlooked the fact that the data value for EPS3 had not been reported at that time. The model was effectively knowing the unknowable:
Quintessence stores research data against the dates to which they are applicable (the value date). In addition, Quintessence stores the date on which the values were declared (the declaration date). This allows What-if-analysis models to look at any period in time, and only consider what information would have been available at the time, regardless of what values are stored for the period.
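The difference between the two lookups can be sketched in a few lines of Python. This is only an illustration under assumed names: the `Observation` record, the two lookup functions and the EPS figures are hypothetical and do not represent Quintessence's actual schema or API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class Observation:
    item: str
    value_date: date        # the date the value is applicable to
    declaration_date: date  # the date the value was actually declared
    value: float


def naive_latest(rows: Sequence[Observation], item: str, as_at: date) -> Optional[Observation]:
    """Biased lookup: filters on the value date only (the mistake described above)."""
    hits = [r for r in rows if r.item == item and r.value_date <= as_at]
    return max(hits, key=lambda r: r.value_date, default=None)


def point_in_time_latest(rows: Sequence[Observation], item: str, as_at: date) -> Optional[Observation]:
    """Unbiased lookup: also requires the value to have been declared by `as_at`."""
    hits = [
        r for r in rows
        if r.item == item and r.value_date <= as_at and r.declaration_date <= as_at
    ]
    # Prefer the latest period; for the same period, the latest declaration.
    return max(hits, key=lambda r: (r.value_date, r.declaration_date), default=None)


# Made-up figures: EPS3 applies to 30 June 2020 but is only declared on 20 August 2020.
rows = [
    Observation("EPS", date(2019, 6, 30), date(2019, 8, 20), 1.0),   # EPS1
    Observation("EPS", date(2019, 12, 31), date(2020, 2, 25), 2.0),  # EPS2
    Observation("EPS", date(2020, 6, 30), date(2020, 8, 20), 3.0),   # EPS3
]

iaad = date(2020, 7, 15)
biased = naive_latest(rows, "EPS", iaad)            # picks EPS3, not yet declared
correct = point_in_time_latest(rows, "EPS", iaad)   # picks EPS2
```

With an IAAD of 15 July 2020, the naive lookup returns EPS3 even though it had not been reported, while the point-in-time lookup falls back to EPS2, the latest value that had been declared by that date.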
Never underestimate the importance of data declaration dates when modelling with data, retrospectively or otherwise.