How to Invest Systematically and Gain Alpha – Part II

By Vinesh Jha, CEO & founder of ExtractAlpha

In my previous post, I discussed some of the benefits of systematic investing. In this post, we will take a look at a few pitfalls.

Systematic investing typically relies on backtesting. This is the process of stepping through historical data and seeing how a strategy would have performed if it had been used at the time. However, backtesting is not without its pitfalls. 

There’s a really nice presentation by Deutsche Bank’s former quant research team on this topic, which I’d encourage everyone to read. Most of the things I’ll mention here are discussed there too, with perhaps a slightly different take.

Sample size matters

The first issue is sample size. Many quants we talk to ask for 5 to 10 years of historical data before they will consider backtesting a new strategy or factor. Unfortunately, many of the most interesting new alternative datasets simply don’t have that much history available. For example, ExtractAlpha’s Transcripts Model – Asia is based on earnings call transcripts for Japan and Greater China which simply weren’t available 10 years ago. So we need to make compromises in some cases. The 5-10 year minimum should really be considered more of a guideline than a rule. If you’re working on very high frequency data, for example, you’ll get plenty of data points within just one year. Conversely, if you’re looking at macroeconomic effects, you may not experience a sufficient diversity of regimes even within a single decade, so your backtest may not capture a broad enough range of possible future conditions.
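One rough way to see why the required history depends on the strategy: if period returns are assumed to be independent and identically distributed, the t-statistic of the mean return is approximately the annualized Sharpe ratio times the square root of the number of years. The sketch below uses that approximation only; it ignores autocorrelation, fat tails, and regime changes, and the function names are illustrative.

```python
import math

def sharpe_t_stat(annual_sharpe: float, years: float) -> float:
    """Approximate t-statistic of the mean return, assuming i.i.d. returns."""
    return annual_sharpe * math.sqrt(years)

def years_needed(annual_sharpe: float, target_t: float = 2.0) -> float:
    """Years of history needed for the t-statistic to reach target_t."""
    return (target_t / annual_sharpe) ** 2

for sr in (0.5, 1.0, 2.0):
    print(f"Sharpe {sr}: t-stat after 5 years = {sharpe_t_stat(sr, 5):.2f}, "
          f"years needed for t = 2: {years_needed(sr):.1f}")
```

Under these assumptions a strategy with a Sharpe ratio of 0.5 needs about 16 years to reach a t-statistic of 2, while one with a Sharpe ratio of 2 needs about one year, which is consistent with treating the 5-10 year figure as a guideline.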

I wish I had a time machine

Of course, things have changed substantially in ten years. We’ve been through high and low volatility regimes, trading costs have come down, and quant investing has become more competitive. We’ll discuss some good practices for taking those issues into account in the next post. For now we will mention two related dangers of looking back several years: lookahead bias and its pernicious special case, survivorship bias.

Lookahead bias refers to the practice of assuming that we would have known something historically which we did not in fact know. A cynic could argue that all backtesting is an exercise in lookahead bias, but we can certainly control for certain aspects of lookahead in our test design. For example, quarterly financials are not reported immediately after a company’s fiscal quarter ends, and if we don’t make conservative enough assumptions about the data’s historical availability, we may “peek ahead” into the future in assuming we know the health of a company’s balance sheet before it was made available. Having accurately timestamped data – and knowing what time zone that timestamp is in! – is a good way to avoid lookahead bias. 
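As a minimal sketch of the reporting-lag issue, the example below compares a naive lookup (data is known on the fiscal period end date) with a conservative one (data is known only after an assumed lag). The records, the 90-day lag, and the function name are all hypothetical; with real timestamped data you would use the actual publication time instead of a fixed lag.

```python
from datetime import date, timedelta

# Hypothetical quarterly records: (fiscal period end, book value per share).
FUNDAMENTALS = [
    (date(2020, 3, 31), 10.0),
    (date(2020, 6, 30), 12.0),
    (date(2020, 9, 30), 9.0),
]

def known_value(as_of: date, lag_days: int):
    """Return the most recent value whose assumed availability date
    (period end + lag_days) is on or before as_of, else None."""
    known = [(end, value) for end, value in FUNDAMENTALS
             if end + timedelta(days=lag_days) <= as_of]
    return max(known)[1] if known else None

as_of = date(2020, 7, 15)
print(known_value(as_of, lag_days=0))   # 12.0: naive, peeks at Q2 before it is reported
print(known_value(as_of, lag_days=90))  # 10.0: conservative, still using Q1
```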

Lookahead can manifest in other, sneakier ways too; one common culprit is data which is presented as static but which really should be stored as a time series. The classic example is index membership. Let’s suppose we want to test whether the stocks in the S&P 500 with tickers starting with the letter A typically outperform. If we take the current 500 members and select out the A’s, we can build a portfolio of those stocks and compare it to the S&P 500 index. Here we go:

Pretty awesome, right? We beat the benchmark handily! But of course we didn’t know back in January 2000 which stocks would be in today’s S&P 500, and the fact that they survived to become large caps means that almost by definition they did relatively well over this period, regardless of what letters were in their ticker symbols. That’s survivorship bias, one of the most common errors for a new quant to make, and the term for a dataset which does not exhibit survivorship bias is point-in-time (PiT). I talked about a more ESG-flavored variant of this issue in this post which questions the WSJ’s claim that more diverse companies outperform.
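A point-in-time membership table can be sketched as a set of date intervals per ticker, queried as of each backtest date. The tickers, dates, and function name below are invented for illustration and do not describe real index history.

```python
from datetime import date

# Hypothetical membership intervals: ticker -> list of (added, removed).
# removed is None while the stock is still in the index.
MEMBERSHIP = {
    "AAA": [(date(1995, 1, 1), date(2008, 9, 15))],  # later dropped out
    "ABC": [(date(2012, 6, 1), None)],               # joined later
    "XYZ": [(date(1990, 1, 1), None)],
}

def members_as_of(as_of: date) -> set:
    """Tickers that were index members on the given date."""
    return {
        ticker
        for ticker, spans in MEMBERSHIP.items()
        for added, removed in spans
        if added <= as_of and (removed is None or as_of < removed)
    }

print(sorted(members_as_of(date(2000, 1, 3))))  # ['AAA', 'XYZ']
print(sorted(members_as_of(date(2020, 1, 2))))  # ['ABC', 'XYZ']
```

Building the January 2000 portfolio from the 2020 membership would wrongly include ABC and exclude AAA, which is exactly the survivorship error described above.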

As an aside, there’s another problem with my backtest here: I’m comparing an equally weighted portfolio to a cap-weighted index. If the smaller names in the portfolio outperformed the larger names which dominate the index, that could cause apparent outperformance that’s not really attributable to our strategy. It’s always worth thinking about what the right benchmark should really be; for example, I often see long/short portfolios nonsensically benchmarked against a long-only index.
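The weighting mismatch is easy to illustrate with made-up numbers. In the sketch below, the same three hypothetical stocks produce very different one-period returns depending only on whether they are equally weighted or cap-weighted.

```python
# Hypothetical one-period data: ticker -> (market cap, return).
STOCKS = {
    "BIG":   (900.0, 0.02),
    "MID":   (90.0, 0.10),
    "SMALL": (10.0, 0.30),
}

def equal_weighted_return(stocks) -> float:
    """Average return with the same weight on every stock."""
    return sum(ret for _, ret in stocks.values()) / len(stocks)

def cap_weighted_return(stocks) -> float:
    """Average return with weights proportional to market cap."""
    total_cap = sum(cap for cap, _ in stocks.values())
    return sum(cap * ret for cap, ret in stocks.values()) / total_cap

print(f"{equal_weighted_return(STOCKS):.2%}")  # 14.00%
print(f"{cap_weighted_return(STOCKS):.2%}")    # 3.00%
```

The 11-point gap here comes entirely from the weighting scheme, not from any stock selection, so an equally weighted portfolio is more fairly compared with an equally weighted version of its universe.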

Reality check

I’m a big fan of scanning academic research for idea generation, but academics tend to ignore some realities which practitioners simply can’t. We can’t ignore transaction costs and market impact; we may find a factor which seems to predict returns very nicely over short horizons, but if we trade large amounts of capital too quickly we’ll move the market against us and generate excessive trading costs. And if we don’t act quickly enough on such a factor because our trading process exhibits too much latency, the factor’s efficacy may decay. (Fast moving factors can still be useful in a long horizon strategy if implemented correctly, though.)
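A very simple way to bring costs into a backtest is to subtract a charge proportional to turnover from each period's gross return. The sketch below assumes a flat cost in basis points per unit traded, which is a simplification: real market impact tends to grow with trade size, so a linear model will understate costs for large portfolios. All numbers are hypothetical.

```python
def net_returns(gross_returns, turnovers, cost_bps=10.0):
    """Subtract a linear trading cost from each period's gross return.

    turnovers: fraction of the portfolio traded in each period.
    cost_bps: assumed all-in cost per unit traded, in basis points.
    """
    cost = cost_bps / 10_000.0
    return [g - t * cost for g, t in zip(gross_returns, turnovers)]

# A fast factor: higher gross return, but the whole book turns over daily.
fast = net_returns([0.0020], [1.0])
# A slow factor: lower gross return, 10% daily turnover.
slow = net_returns([0.0010], [0.1])
print(f"fast net: {fast[0]:.4%}, slow net: {slow[0]:.4%}")
```

With these assumed inputs the fast factor gives up half of its gross return to costs, while the slow factor gives up a tenth.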

Furthermore, if we have a strategy which includes short sales, we may not be able to realistically borrow sufficient amounts of the stocks we need, if the demand for borrow exceeds the supply. This is a somewhat difficult thing to simulate historically, but we can make assumptions about our universe which can mitigate the effects of this shortcoming somewhat. We’ll touch on that in the next post too.

Monkey business

The final pitfall I’ll mention is perhaps the hardest one to overcome: the risk of overfitting a model. Part 1’s example of butter in Bangladesh is related to overfitting: Leinweber sifted through economic variables until he found one which spuriously explained the S&P 500. Similarly, if we try a sufficient number of factors and a sufficient number of parametrizations of those factors, we will eventually find the proverbial monkey at a typewriter who seems to have written Shakespeare. What’s the likelihood that he will continue to write Shakespeare going forward?
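The monkey-at-a-typewriter effect can be simulated directly. The sketch below generates strategies whose daily returns are pure noise with zero expected return, then reports the best annualized Sharpe ratio found among them. The volatility, horizon, and seed are arbitrary assumptions.

```python
import random
import statistics

def best_random_sharpe(n_factors: int, n_days: int = 252, seed: int = 0) -> float:
    """Annualized Sharpe ratio of the best of n_factors pure-noise strategies."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_factors):
        # Daily returns with zero true mean: any apparent edge is luck.
        daily = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        sharpe = statistics.mean(daily) / statistics.stdev(daily) * 252 ** 0.5
        best = max(best, sharpe)
    return best

for n in (1, 10, 100):
    print(n, round(best_random_sharpe(n), 2))
```

Because the best result can only improve as more candidates are tried, the top backtest among many attempts says little on its own about future performance.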

The best controls for overfitting are hypothesis development and parsimony. We should have a good economic or behavioral intuition for why a factor should work or not work, and then model it in a simple way. We can then run some sensitivity analysis to make sure we haven’t stumbled upon a randomly good parametrization, but simple expressions of an idea tend to be more robust than complex ones. This holds even if we are utilizing advanced machine learning techniques.

In summary

The pitfalls you can avoid while backtesting are:

  • Sample size: Using a backtest with a small sample size can lead to inaccurate results. The minimum recommended sample size is 5-10 years, but this may vary depending on the strategy and the market conditions.
  • Lookahead bias: This is the backtesting pitfall of using information that was not available at the time you are simulating. This can lead to inflated returns and unrealistic expectations.
  • Survivorship bias: This is a special case of lookahead bias where, in a backtest, we use information on which companies or other entities later survived (or thrived or failed), which was not known at the time.
  • Transaction costs: Backtests often do not take into account the costs of trading, such as commissions and slippage. These costs can significantly reduce the returns of a strategy.
  • Overfitting: This is the practice of fitting a model too closely to the historical data. This can lead to a model that performs well in the backtest but does not generalize well to new data.

I hope you found this discussion of pitfalls useful. In the next and final post of the series, I cover some best practices for successful systematic investment research.
