How to Build Predictive Models with Financial Data

Learn how to harness financial data for predictive modeling, improving accuracy and insights for better decision-making in finance.


Predictive financial modeling transforms historical data into actionable forecasts using advanced statistical methods and machine learning. Companies leveraging these techniques often see revenue increases of 10-20% compared to traditional approaches. Here’s a quick breakdown of how to get started:

  1. Understand Predictive Modeling: It combines past data, mathematical rules, and machine learning to forecast outcomes like market trends, risks, and earnings.
  2. Key Applications: Common uses include stock trading, risk assessment, fraud detection, and cash flow forecasting.
  3. Data Preparation: High-quality data is critical. Clean and preprocess data to fix missing values, standardize formats, and remove errors.
  4. Feature Engineering: Create meaningful features, such as financial ratios, lagged variables, or technical indicators, to improve model accuracy.
  5. Model Selection: Choose models based on your goals and data complexity – options include linear regression, decision trees, SVMs, and deep learning (e.g., LSTM for time series).
  6. Validation: Use techniques like k-fold cross-validation and backtesting to ensure reliability. Regular updates and monitoring are essential to maintain accuracy.
  7. Alternative Data: Incorporate unique data sources like satellite imagery, social media sentiment, or credit card trends for sharper predictions.

Quick Overview

  • Steps: Data preparation → Feature engineering → Model selection → Validation
  • Tools: Python libraries (pandas, Scikit-learn, Featuretools, etc.)
  • Metrics: MAE, RMSE, Sharpe ratio, precision, recall
  • Best Practices: Regular updates, ensemble methods, and monitoring for data drift

Building predictive models requires a mix of technical skills, quality data, and thoughtful validation. By following these steps, you can create models that provide accurate and actionable financial insights.


Preparing Financial Data

High-quality data is the foundation of reliable market forecasts. Poor data leads to large errors and costly decisions, so building strong models starts with sound data and a disciplined preparation process.

Sourcing Financial and Alternative Datasets

Finding good financial data isn’t about grabbing the first dataset you come across. You need to evaluate its quality, coverage, and compatibility with your systems. Jonathan Gerber, who heads RVW Wealth, puts it this way:

"Historical consistency is crucial when selecting financial datasets for predictive modeling. Datasets must demonstrate reliability over time and have a proven track record of forecasting outcomes correctly. Regulatory compliance is also nonnegotiable to protect clients’ interests." [5]

Traditional financial data sources – price histories, earnings reports, and macroeconomic indicators – remain the foundation of financial analysis. But in recent years, alternative data has changed the game. Consumer Edge, for example, tracks transactions from more than 100 million credit and debit cards, linked to over 30,000 brands and more than 700 companies, revealing consumer spending patterns in near real time [4].

For more specialized needs, platforms like LobbyingData.com provide detailed visibility into lobbying activity directed at U.S. policymakers. They track more than 1.6 million transactions and over 200,000 organizations, offering early signals of regulatory changes that could affect specific sectors or companies [4].

Timing matters too. Dana Ronald, who runs the Tax Crisis Institute, stresses this point:

"Accuracy, relevance, and timeliness are key when choosing financial datasets. In finance, even slightly outdated data can lead to inaccurate predictions. Always ensure your data aligns with your model’s goals and comes from credible sources." [5]

When evaluating datasets, consider factors such as accessibility, depth of coverage, fit for your use case, and the credibility of the source. Vendors should be transparent about where their data comes from and how it is verified [2]. Also confirm that the data meets the requirements of your workflow [3].

Once you have your datasets in hand, the next step is cleaning and structuring them for accurate analysis.

Data Cleaning and Preprocessing

Financial data rarely arrives ready to use. Missing values, inconsistent formats, and outright errors are common and must be addressed. Data cleaning may be tedious, but it is an essential step that directly affects how well your model performs.

For missing values, choose a strategy that suits the data. Forward-filling works well for short gaps in stock prices, while interpolation or exclusion may be better for longer gaps. In earnings data, a missing value might mean no announcement was made – a fact that can itself be informative.

Standardize dates and times to a single format, such as MM/DD/YYYY, and account for market holidays and trading hours. If your data spans multiple time zones, convert timestamps to Eastern Time (ET) to align with U.S. market hours.
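In pandas, this kind of timezone standardization takes only a couple of lines. A minimal sketch, with illustrative timestamps and prices:

```python
import pandas as pd

# Hypothetical trade records with timestamps stored in UTC
ts = pd.DataFrame(
    {"price": [101.5, 102.0]},
    index=pd.to_datetime(["2024-03-01 14:30:00", "2024-03-01 21:00:00"], utc=True),
)

# Convert to Eastern Time so rows line up with U.S. market hours
ts.index = ts.index.tz_convert("America/New_York")
```

Here 14:30 UTC becomes 9:30 AM ET (the market open) and 21:00 UTC becomes 4:00 PM ET (the close), so market-hours filters behave as expected.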

For international data, convert values to U.S. dollars using current exchange rates, and keep in mind that currency fluctuations can shift relationships in your model.

Handle outliers with care. Some extreme values reflect genuine market moves, while others are simply errors. Techniques like the interquartile range (IQR) or z-scores can help flag anomalies, but review each one before making changes.
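A minimal IQR-based screen might look like this (the return values are made up for illustration; in practice you would inspect each flagged point rather than delete it automatically):

```python
import pandas as pd

# Daily returns with one suspicious spike (illustrative values)
returns = pd.Series([0.001, -0.002, 0.003, 0.0015, 0.25, -0.001, 0.002])

# Flag points outside 1.5 * IQR — review each flag before changing anything
q1, q3 = returns.quantile([0.25, 0.75])
iqr = q3 - q1
outliers = returns[(returns < q1 - 1.5 * iqr) | (returns > q3 + 1.5 * iqr)]
```

In this toy series, only the 25% single-day return is flagged — which could be a data error, or a real event like a merger announcement.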

Check that related data points are internally consistent. For instance, a company’s market capitalization should equal its share price multiplied by its shares outstanding.

After cleaning and validating your data, the right tools can streamline the remaining preparation steps.

Tools for Data Preparation

For preparing financial data, Python’s pandas library is indispensable. It simplifies handling missing values, merging datasets, and reshaping data [6] [8].

Other Python libraries, such as NumPy, Scikit-learn, and Statsmodels, can further speed up your preparation workflow [7]. NumPy handles numerical computation efficiently, while Scikit-learn provides utilities like scalers and encoders to ready data for machine learning.

Start by loading your data with pandas’ read_csv() or read_excel() functions, which cope well with messy files. Pandas also helps manage missing values through methods like fillna() for filling forward, dropna() for removing incomplete rows, and interpolate() for estimating missing values [6] [8].

Transforming data is straightforward with pandas as well. You can reformat dates, resample daily data to monthly frequency, and compute rolling averages or volatility measures. For large datasets, pandas can cut memory use through dtype downcasting and Parquet storage, which is far more efficient than CSV files.
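The steps above can be sketched in a few lines. This example uses an in-memory frame in place of a CSV file; the column names and prices are illustrative:

```python
import numpy as np
import pandas as pd

# Illustrative daily close prices with one gap
idx = pd.date_range("2024-01-01", periods=6, freq="D")
prices = pd.DataFrame({"close": [100.0, np.nan, 102.0, 101.0, 103.0, 104.0]},
                      index=idx)

# Forward-fill the short gap, then compute a 3-day rolling average
prices["close"] = prices["close"].ffill()
prices["ma3"] = prices["close"].rolling(window=3).mean()

# Downsample to month-end values to shrink the dataset
monthly = prices["close"].resample("M").last()
```

In real use you would load the frame with read_csv() and likely downcast numeric columns before saving to Parquet.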

Creating and Selecting Features

Once your data is cleaned and prepped, the next step is crafting features that help your model identify patterns and improve predictions. This process, known as feature engineering, plays a key role in boosting the performance of machine learning models in financial data analysis [10]. The right features can turn a mediocre model into one that delivers actionable trading insights.

Building Predictive Features

Financial markets churn out massive amounts of raw data, but raw numbers alone won’t cut it. The challenge is to transform this data into meaningful features that capture hidden trends and behaviors in the market.

Start with financial ratios – these are often some of the most insightful features. Ratios like price-to-earnings, debt-to-equity, and return on equity can reveal more about a company’s financial health than raw numbers ever could. For example, a company’s current ratio (current assets divided by current liabilities) offers a clearer picture of liquidity than simply comparing assets and liabilities.

Lagged variables are another powerful tool. By using historical data points, such as stock prices from 5, 10, or 20 days ago, you can create features that help predict future trends. Other lagged metrics, like volume or volatility, can also be useful. Time series data opens the door for techniques like rolling statistics – think 20-day moving averages, 30-day volatility, or rolling correlations – which help capture market trends over time [9].
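A short sketch of lagged and rolling features with pandas (the prices are illustrative, and shorter windows are used here than the 5/10/20-day horizons mentioned above):

```python
import pandas as pd

# Toy closing-price series
close = pd.Series([10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.2, 11.5],
                  index=pd.date_range("2024-01-01", periods=8, freq="D"))

features = pd.DataFrame({
    "lag_1": close.shift(1),                       # yesterday's close
    "lag_5": close.shift(5),                       # close five days ago
    "ma_3": close.rolling(3).mean(),               # 3-day moving average
    "vol_3": close.pct_change().rolling(3).std(),  # 3-day volatility
})
```

Note that shift() and rolling() leave NaNs at the start of each column; those rows are usually dropped before training.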

Technical indicators derived from market data can also be turned into effective features. Indicators like Bollinger Bands and MACD (Moving Average Convergence Divergence) condense complex price and volume patterns into single, digestible numbers for your model.

Don’t stop there – consider feature interactions. Combining features through arithmetic operations can uncover relationships that individual features might miss. For instance, multiplying a stock’s momentum indicator by its trading volume creates a "momentum-volume" feature that reflects both price movement and activity levels.

Time-based features add another layer of depth. Extract details like the day of the week, month, quarter, or whether it’s earnings season. Markets often behave differently on Mondays compared to Fridays or during December compared to other months.
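Extracting calendar features like these takes only a few lines of pandas (the dates are chosen for illustration):

```python
import pandas as pd

dates = pd.DataFrame(index=pd.to_datetime(["2024-03-04", "2024-03-08", "2024-12-02"]))

# Calendar features: markets can behave differently by weekday or month
dates["day_of_week"] = dates.index.dayofweek   # Monday = 0
dates["month"] = dates.index.month
dates["quarter"] = dates.index.quarter
dates["is_december"] = (dates.index.month == 12).astype(int)
```

An earnings-season flag would work the same way, though it requires an earnings calendar rather than the date alone.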

If you have access to alternative data, use it to measure sentiment, news flow, or unusual activity. For example, social media data can be analyzed to count positive and negative mentions over specific timeframes. These engineered features set the stage for selecting the most impactful predictors for your model.

Choosing the Most Relevant Features

Once you’ve built a diverse set of features, the next step is narrowing them down to the ones that truly matter. Feature selection is essential for improving your model’s accuracy and efficiency while avoiding overfitting – a problem where the model performs well on training data but struggles with new data [12].

Filter methods provide a quick way to screen features. These methods rely on statistical measures to rank features without requiring a model to be trained, making them computationally efficient [11]. For example, correlation analysis can identify features that move in sync with your target variable, while mutual information can uncover non-linear relationships that correlation might miss [9].

Wrapper methods take a more hands-on approach by testing different combinations of features using actual models. These methods split data into subsets and evaluate performance as features are added or removed [11]. Techniques like forward selection (starting with no features and adding them one by one) and backward elimination (starting with all features and removing the least helpful ones) can uncover feature combinations that boost performance, though they require more time.

Embedded methods blend the strengths of filter and wrapper techniques while keeping computational costs manageable [11]. For instance, Lasso regression can automatically select features by shrinking less important ones to zero, while Ridge regression reduces the influence of less critical features without eliminating them entirely.
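A toy sketch of Lasso-based selection on synthetic data, where only the first two of five features carry signal:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Target depends only on the first two columns; the rest are noise
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Lasso shrinks unhelpful coefficients toward (often exactly) zero
model = Lasso(alpha=0.1).fit(X, y)
selected = [i for i, c in enumerate(model.coef_) if abs(c) > 1e-6]
```

On this data the three noise features are driven to exactly zero, leaving only the two informative columns. The alpha parameter controls how aggressively coefficients are shrunk and is normally tuned by cross-validation.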

The impact of smart feature selection can be substantial. In one study, feature selection improved the accuracy of K-Nearest Neighbors from 49% to 82%, a Decision Tree from 84% to 86%, and a Multi-layer Perceptron from 71% to 78% [11]. Another experiment showed overall model accuracy reaching 90%, with precision and recall each improving by 5.5% [11].

Dimensionality reduction techniques like Principal Component Analysis (PCA) offer another way to streamline features. PCA combines original features into new ones that retain key patterns while discarding noise [9].

Tree-based methods, such as random forests, can also be highly effective for feature selection. For example, using random forest feature selection improved model accuracy by 8% compared to stepwise selection [13].
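A minimal sketch of ranking features by a random forest’s importance scores, on synthetic data where only the first feature matters:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
# Only the first feature drives the target in this toy setup
y = 5.0 * X[:, 0] + rng.normal(scale=0.5, size=300)

forest = RandomForestRegressor(n_estimators=50, random_state=42).fit(X, y)
# Indices of features, most important first
ranked = np.argsort(forest.feature_importances_)[::-1]
```

In practice you would keep the top-ranked features (or those above an importance threshold) and retrain on that subset.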

By carefully selecting features, you not only simplify your model but also enhance its ability to predict market trends, building on the groundwork laid during data preparation.

Python Tools for Feature Engineering


Python offers a variety of tools to simplify feature engineering for financial modeling. These libraries can save time and automate many of the tasks involved [14].

  • Scikit-learn: A go-to library for feature selection and engineering. Use tools like SelectKBest to pick top features based on statistical tests. For example, the mutual_info_regression function in Scikit-learn’s feature_selection module can calculate the mutual information score between each feature and the target variable [16]. Scaling tools like MinMaxScaler and StandardScaler ensure features are on a comparable scale [16].
  • Feature-engine: This library is tailored for feature engineering workflows. Unlike Scikit-learn, it works directly with dataframes, preserving column order and names – particularly useful for financial datasets where feature tracking is critical [15].
  • Featuretools: Ideal for automating feature creation, especially with time-based and relational data. It can generate lagged features, rolling statistics, and aggregations across time windows [14].
  • Tsfresh: Designed for time-series data, Tsfresh extracts hundreds of features and applies statistical tests to identify the most relevant ones [14].
  • ta: This package specializes in technical indicators like Bollinger Bands and MACD, which are often used in stock price prediction models [16].

Start with straightforward features like ratios and moving averages, then layer in more advanced ones. Be sure to test each feature’s impact on your model before adding it to the final set [9].

Choosing and Testing Models

Once you’ve fine-tuned your features, the next step is selecting the right predictive model for your financial data. The model you choose can have a big impact on the quality of your predictions, so it’s essential to understand the strengths and limitations of different approaches.

Picking the Right Predictive Model

Financial markets are known for their complexity, with patterns that can be challenging to decode. The ideal model for your needs will depend on your specific use case, the nature of your data, and the computational resources at your disposal. According to McKinsey, companies that leverage advanced analytics are 23 times more likely to acquire customers, 6 times more likely to retain them, and 19 times more likely to boost profitability [17].

  • Linear regression is a straightforward option, offering interpretability and low computational demands. It’s great for identifying basic relationships but often struggles with the non-linear dynamics typical of financial data.
  • Decision trees provide a visual, easy-to-understand approach, making them useful for tasks like credit scoring. However, they can overfit without proper tuning.
  • Support Vector Machines (SVM) excel in handling high-dimensional data and non-linear patterns using kernel functions. For instance, SVM has achieved an 87% accuracy rate in predicting stock market trends, compared to 75% for linear regression [17].
  • Deep learning models, particularly LSTM (Long Short-Term Memory) networks, are powerful for capturing intricate temporal patterns. One study reported a 90% accuracy rate for LSTM in forecasting stock price movements [17].

For time series data, models like ARIMA are particularly effective when dealing with seasonal trends and economic indicators.

When selecting a model, aim for a balance between accuracy and ease of interpretation.

Model Type         | Best Use Cases                             | Accuracy Potential | Interpretability | Computational Cost
Linear Regression  | Basic correlations, quick insights         | Moderate           | High             | Low
Decision Trees     | Credit scoring, classification             | Moderate           | High             | Low
SVM                | High-dimensional data, non-linear patterns | High               | Low              | High
LSTM/Deep Learning | Time series forecasting, complex patterns  | High               | Low              | High

Once you’ve chosen a model, the next step is to evaluate its performance using metrics tailored to financial applications.

Measuring Model Performance

After selecting your model, it’s crucial to evaluate how well it performs in real-world financial decision-making. For regression tasks, metrics like Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared are commonly used. For classification tasks, such as predicting market movements, metrics like accuracy, precision, recall, and the F1-score are essential – especially when dealing with imbalanced datasets.

In financial applications, domain-specific metrics are equally important. The Sharpe ratio, for example, measures risk-adjusted returns, while Net Present Value (NPV) and Internal Rate of Return (IRR) are critical for investment decisions. Additionally, the Capital Asset Pricing Model (CAPM) helps assess expected returns relative to market risk [19]. Models evaluated using MAE and RMSE have shown a 15% improvement in accuracy compared to those relying solely on R-squared [20].
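As one concrete example, the Sharpe ratio can be computed from daily returns in a few lines (the returns are hypothetical, and the risk-free rate is assumed to be zero for simplicity):

```python
import numpy as np

# Hypothetical daily strategy returns
daily_returns = np.array([0.002, -0.001, 0.0015, 0.003, -0.0005, 0.001])

# Annualized Sharpe ratio: mean excess return over volatility,
# scaled by sqrt(252) trading days per year
sharpe = np.sqrt(252) * daily_returns.mean() / daily_returns.std(ddof=1)
```

With a real strategy you would subtract the risk-free rate from each return before averaging.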

"Choosing the right evaluation metric is critical to a given project and can be a measure of the experience and maturity of those involved in it."
– Antonio Pedro Ramos, PhD, Research Scientist at José Luiz Egydio Setúbal Foundation [18]

Using multiple metrics ensures that your evaluations align with your business objectives.

Best Practices for Model Validation

A robust validation process is key to ensuring your model performs well on unseen data. Techniques like k-fold cross-validation (with at least five folds) provide a reliable assessment and can reduce prediction errors by up to 20% compared to simple train-test splits [20]. For time series data, it’s important to use cross-validation methods that respect the sequence of the data to avoid look-ahead bias.
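Scikit-learn’s TimeSeriesSplit implements exactly this kind of order-preserving cross-validation: every fold trains on the past and validates on the future, so no future information leaks into training.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Placeholder feature matrix standing in for a prepared dataset
X = np.arange(20).reshape(-1, 1)

# Each fold's training window ends before its validation window begins
tscv = TimeSeriesSplit(n_splits=4)
splits = list(tscv.split(X))
```

Each (train, test) pair from splits can be fed to any estimator’s fit/predict cycle, or passed to cross_val_score via the cv argument.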

Out-of-sample testing, where a holdout dataset is reserved for evaluation, offers a realistic measure of how the model will perform in the future. Additionally, backtesting, which simulates real-time performance by applying the model to historical data, is a valuable tool for validating trading strategies.

To keep models relevant, regular recalibration is essential. Combining predictions from multiple algorithms through ensemble methods can improve performance by as much as 25% [20]. Monitoring for data drift is equally important, as addressing drift proactively can enhance model robustness by approximately 30% in high-variance environments.

Other best practices include benchmarking against simple baseline models, documenting testing conditions for reproducibility, and incorporating feedback from end-users. Peer reviews and audits further strengthen the reliability of your solutions. Automated testing is another area where organizations see benefits – 92% report higher accuracy with automation compared to manual methods [20]. Setting up automated validation pipelines ensures consistent evaluation as you refine and improve your models.


Building Models with Python

Once you’ve completed model selection and validation, Python becomes your go-to tool for integrating data preparation, modeling, and even incorporating alternative data sources to craft more accurate financial predictions.

Setting Up the Workflow

Python’s ecosystem is a treasure trove for financial modeling. Libraries like NumPy and SciPy lay the groundwork for mathematical and statistical computations. Pandas is indispensable for handling and analyzing data, especially time-series data, which is a cornerstone of finance. As Dan Buckley from DayTrading.com puts it:

"Python is a cornerstone in finance, offering a variety of packages and libraries that cater to various financial analysis needs." [21]

For visualizing trends and insights, Matplotlib and Seaborn are your go-to tools. When it comes to machine learning, scikit-learn provides algorithms for predictive modeling, while statsmodels is great for statistical analysis and model fitting.

To get started, import the essential libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm

For accessing financial data, tools like Quandl are invaluable, and libraries such as Zipline are excellent for algorithmic trading.

With your environment ready, you can move on to training and evaluating your model using the prepared data.

Training and Testing the Model

The modeling process begins with your cleaned and prepared dataset. Start by loading your data and using Pandas’ .describe() and .info() methods to evaluate its quality. Visual tools like Matplotlib and Seaborn can help you uncover patterns, outliers, and correlations, which are critical for feature selection. For time-series data, always split your dataset chronologically to avoid look-ahead bias.

Once you’ve identified the key features, follow a workflow like this to train your model:

# Split data chronologically
train_size = int(len(data) * 0.8)
train_data = data[:train_size]
test_data = data[train_size:]

# Prepare features and target
X_train = train_data[feature_columns]
y_train = train_data['target']
X_test = test_data[feature_columns]
y_test = test_data['target']

# Train model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

# Evaluate performance
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

Evaluate your model using statistical metrics like mean squared error (MSE) and R². Depending on your goals, you might also consider financial metrics such as the Sharpe ratio or maximum drawdown.
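For instance, maximum drawdown can be computed directly from an equity curve (the values here are hypothetical):

```python
import numpy as np

# Hypothetical equity curve of a strategy
equity = np.array([100.0, 105.0, 103.0, 110.0, 99.0, 104.0])

# Maximum drawdown: worst peak-to-trough decline, as a fraction
running_peak = np.maximum.accumulate(equity)
max_drawdown = ((equity - running_peak) / running_peak).min()
```

In this series the worst decline is the fall from 110 to 99, a drawdown of 10%.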

Improving Predictions with ExtractAlpha


After assessing your model’s performance, you can take it a step further by integrating alternative data sources to improve prediction accuracy. For instance, ExtractAlpha offers specialized datasets and signals that can significantly enhance your model. Their TrueBeats signal, for example, delivered 7% returns year-to-date with a 3.2 Sharpe ratio. Similarly, the Estimize signal yielded 6% returns with a 2.1 Sharpe ratio, while the Digital Revenue Signal achieved 5% returns with a 2.9 Sharpe ratio [24].

Vinesh Jha, CEO and founder of ExtractAlpha, explains their approach:

"We understand what you need – we are creators and quant consumers of data – and we offer research capabilities. Think of ExtractAlpha as your alt data research arm – focusing solely on identifying and delivering value found in datasets." [22]

Their research goes beyond raw data. For instance, their analysis of New Constructs‘ Core Earnings metric found a 48% autocorrelation with next year’s value, compared to just 31% for net income. This gap between Core Earnings and reported Net Income proved to be a strong predictor, with a long/short portfolio strategy based on this insight generating annualized returns of 10.1% and a Sharpe ratio of 1.44 between 2015 and 2021 [25].

To integrate these insights into your Python workflow, you can use ExtractAlpha’s APIs to access their datasets and incorporate the data into your model as additional features. As one portfolio manager in London remarked:

"We like testing your data because it’s clean, and we don’t have to go back and forth like we do with most other providers." [23]

Organizations that leverage predictive analytics in their forecasting processes report a 10–20% improvement in accuracy compared to those relying on traditional methods [1]. By combining Python’s robust modeling tools with ExtractAlpha’s alternative data, you can build financial models that deliver sharper predictions and better investment decisions.

Conclusion

Throughout our discussion on data sourcing, feature engineering, and model validation, we’ve highlighted the core elements that contribute to creating effective financial models. Predictive modeling takes raw financial data and transforms it into actionable insights by blending technical skills with practical business strategies.

Key Takeaways

  • Data quality is the foundation of reliability. As we’ve explored, ensuring data accuracy, completeness, and timeliness is essential for dependable predictions. Following solid data quality practices directly impacts the trustworthiness of your models [27].
  • Feature engineering drives performance. According to the Pareto principle, roughly 20% of features often account for 80% of a model’s performance [20]. Identifying and refining these key features can significantly boost your model’s accuracy.
  • Model validation avoids costly errors. Techniques like k-fold cross-validation reduce errors by up to 20% compared to simple train-test splits [20]. Additionally, ensemble methods can improve prediction accuracy by as much as 25% over individual models [20].
  • Ongoing monitoring keeps models effective. Regular updates to your models can improve forecasting accuracy by 10-15% [20]. In fast-changing markets, timely adjustments can enhance performance by nearly 25% [20].
  • Alternative data adds an edge. Specialized datasets, like those from ExtractAlpha, bring in alternative data signals that complement traditional metrics, giving your models a competitive advantage.

Next Steps for Financial Professionals

To elevate your predictive modeling efforts, start by clearly defining the business problem you aim to solve. This focus will guide your data collection and analysis efforts [26]. Implement strong data governance policies, ensuring secure storage and maintaining data integrity [27].

Transparency is key – document your assumptions, methodologies, and outputs thoroughly [28]. This practice becomes even more critical as your models grow in complexity and play a larger role in investment decisions.

Regularly revisit and update your models to align with evolving market conditions, regulatory changes, and economic trends [28]. A structured schedule for recalibration and performance reviews will help maintain accuracy and relevance.

Finally, integrating high-quality alternative data with tools like Python’s modeling capabilities and proper validation techniques can create a robust framework for financial forecasting. By applying the principles in this guide, you’ll be better equipped to enhance your investment strategies and make more informed decisions.

FAQs

What are the biggest challenges in cleaning and preparing financial data for predictive modeling, and how can they be solved?

Cleaning and preparing financial data often presents hurdles such as missing or inconsistent entries, outliers, and the overwhelming size and complexity of datasets. If left unresolved, these issues can compromise the accuracy of your predictive models.

Here’s how you can address these common challenges:

  • Handle missing data with imputation techniques, like filling gaps using averages, medians, or predictive algorithms.
  • Spot and manage outliers by using statistical methods or setting thresholds based on your specific domain knowledge.
  • Simplify large datasets by employing methods like dimensionality reduction (e.g., principal component analysis) and ensure all data is uniformly formatted.

By implementing these approaches and utilizing tools like Python libraries – such as pandas, NumPy, and scikit-learn – you can transform your financial data into a clean, dependable foundation for accurate modeling.

How can using alternative data improve the accuracy of financial predictive models, and what are some examples?

How Alternative Data Enhances Financial Predictions

Alternative data brings a fresh perspective to financial predictive models by providing insights that traditional data sources might miss. These unconventional data sets offer real-time information, uncovering trends and patterns that help analysts make sharper predictions.

For instance, satellite images can be used to estimate foot traffic at retail stores, while social media sentiment gives a glimpse into public opinion. Geolocation data helps track customer visits, and shipping or logistics data sheds light on supply chain activity. By blending these data sources into their analysis, financial experts can better predict market movements, evaluate company performance, and anticipate economic shifts – resulting in smarter investment decisions.

What should I consider when choosing a predictive model for financial data analysis?

When choosing a predictive model for financial data analysis, start by examining the characteristics of your data – its complexity, quality, and structure. The model you select should align closely with your specific objectives, whether that’s forecasting stock prices, predicting earnings, or spotting market trends.

Key considerations include the model’s interpretability, its capacity to handle large datasets, and its ability to adjust to shifting market dynamics. Additionally, assess the model’s accuracy, reliability, and its fit for your particular financial tasks. Striking the right balance between these factors will help you select a model that serves your needs well and aids in making sound decisions.

Related posts

More To Explore

Proprietary Trading Firms in Nebraska

Introduction Nestled in the heart of the Great Plains, Nebraska, known for its expansive landscapes and Midwestern charm, is witnessing a transformation in its economic

Alan Kwan

Alan joined ExtractAlpha in 2024. He is a tenured associate professor of finance at the University of Hong Kong, where he serves as the program director of the MFFinTech, teaches classes on quantitative trading and big data in finance, and conducts research in finance specializing in big data and alternative datasets. He has published research in prestigious journals and regularly presents at financial conferences. He previously worked in technical and trading roles at DC Energy, Bridgewater Associates, Microsoft and advises several fintech startups. He received his PhD in finance from Cornell and his Bachelors from Dartmouth.

John Chen

John joined ExtractAlpha in 2023 as the Director of Partnerships & Customer Success. He has extensive experience in the financial information services industry, having previously served as a Director of Client Specialist at Refinitiv. John holds dual Bachelor’s degrees in Commerce and Architecture (Design) from The University of Melbourne.

Chloe Miao

Chloe joined ExtractAlpha in 2023. Prior to joining, she was an associate director at Value Search Asia Limited. She earned her Masters of Arts in Global Communications from the Chinese University of Hong Kong.

Matija Ratkovic

Matija is a specialist in software sales and customer success, bringing experience from various industries. His career, before sales, includes tech support, software development, and managerial roles. He earned his BSc and Specialist Degree in Electrical Engineering at the University of Montenegro.

Jack Kim

Jack joined ExtractAlpha in 2022. Previously, he spent 20+ years supporting pre- and after-sales activities to drive sales in the Asia Pacific market. He has worked in many different industries including, technology, financial services, and manufacturing, where he developed excellent customer relationship management skills. He received his Bachelor of Business in Operations Management from the University of Technology Sydney.

Perry Stupp

Perry brings more than 20 years of Enterprise Software development, sales and customer engagement experience focused on Fortune 1000 customers. Prior to joining ExtractAlpha as a Technical Consultant, Perry was the founder, President and Chief Customer Officer at Solution Labs Inc. a data analytics company that specialized in the analysis of very large-scale computing infrastructures in place at some of the largest corporate data centers in the world.

Janette Ho

Janette has 22+ years of leadership and management experience in FinTech and analytics sales and business development in the Asia Pacific region. In addition to expertise in quantitative models, she has worked on risk management, portfolio attribution, fund accounting, and custodian services. Janette is currently head of relationship management at Moody’s Analytics in the Asia-Pacific region, and was formerly Managing Director at State Street, head of sales for APAC Asset Management at Thomson Reuters, and head of Asia for StarMine. She is also a board member at Human Financial, a FinTech firm focused on the Australian superannuation industry.

Leigh Drogen

Leigh founded Estimize in 2011. Prior to Estimize, Leigh ran Surfview Capital, a New York-based quantitative investment management firm trading medium-frequency momentum strategies. He was also an early member of the team at StockTwits, where he worked on product and business development. Leigh is now the CEO of StarKiller Capital, an institutional investment management firm in the digital asset space.

Andrew Barry

Andrew is the CEO of Human Financial, a technology innovator that is pioneering consumer-led solutions for the superannuation industry. Andrew was previously CEO of Alpha Beta, a global quant hedge fund business. Prior to Alpha Beta he held senior roles in a number of hedge funds globally.

Natallia Brui

Natallia has 7+ years of experience as an IT professional. She currently manages our Estimize platform. Natallia earned a BS in Computer & Information Science from Baruch College and a BS in Economics from BSEU in Belarus. She has a background in finance, cybersecurity, and data analytics.

June Cook

June has a background in B2B sales, market research, and analytics. She has 10 years of sales experience in healthcare, private equity M&A, and the tech industry. She holds a B.B.A. from Temple University and an M.S. in Management and Leadership from Western Governors University.

Jenny Zhou, PhD

Jenny joined ExtractAlpha in 2023. Prior to that, she worked as a quantitative researcher for Chorus, a hedge fund under AXA Investment Managers. Jenny received her PhD in Finance from the University of Hong Kong in 2023. Her research covers ESG, natural language processing, and market microstructure. Jenny received her Bachelor’s degree in Finance from The Chinese University of Hong Kong in 2019. Her research has been published in the Journal of Financial Markets.

Kristen Gavazzi

Kristen joined ExtractAlpha in 2021 as a Sales Director. As a past employee of StarMine, Kristen has extensive experience in analyst performance analytics and helped to build out the sell-side solution, StarMine Monitor. She received her BS in Business Management from Cornell University.

Triloke Rajbhandary

Triloke has 10+ years of experience in designing and developing software systems in the financial services industry. He joined ExtractAlpha in 2016. Prior to that, he worked as a senior software engineer at HSBC Global Technologies. He holds a Master of Applied Science degree from Ryerson University, specializing in signal processing.

Qayyum Rajan

Qayyum (“Q”) joined ExtractAlpha in 2024 as the head of a new division, EA Labs. Q is a data scientist recognized for his innovative work in fintech and venture building. Prior to ExtractAlpha, he founded Nuu Ventures, a venture studio that acquired and scaled startups with a focus on lean growth and strategic exits. Previously, he co-founded iComply Investor Services and ESG Analytics, leveraging AI to assess ESG performance. A recipient of British Columbia’s Top 30 Under 30 award, Q also serves on the Fintech Advisory Committee for the BC Securities Commission and is known for his commitment to disrupting traditional business models through technology.

Yunan Liu, PhD

Yunan joined ExtractAlpha in 2019 as a quantitative researcher. Prior to that, he worked as a research analyst at ICBC, covering the macro economy and the Asian bond market. Yunan received his PhD in Economics & Finance from The University of Hong Kong in 2018. His research fields cover Empirical Asset Pricing, Mergers & Acquisitions, and Intellectual Property. His research outputs have been presented at major conferences such as AFA, FMA, and FMA (Asia). Yunan received his Master’s degree in Operations Research from the London School of Economics in 2013 and his Bachelor’s degree in International Business from Nottingham University in 2012.

Willett Bird, CFA

Prior to joining ExtractAlpha in 2022, Willett was a sales director for Vidrio Financial. Willett was based in Hong Kong for nearly two decades, where he oversaw FIS Global’s Asset Management and Commercial Banking efforts. He also worked at FactSet, where he built the Asian Portfolio and Quantitative Analytics team and oversaw FactSet’s Southeast Asian operations. Willett completed his undergraduate studies at Georgetown University and finished a joint-degree MBA from the Northwestern Kellogg School and the Hong Kong University of Science and Technology in 2010. Willett also holds the Chartered Financial Analyst (CFA) designation.

Julie Craig

Julie Craig is a senior marketing executive with decades of experience marketing high-tech, fintech, and financial services offerings. She joined ExtractAlpha in 2022. She was formerly with AlphaSense, where she led marketing at a startup now valued at $4B. Prior to that, she was with Interactive Data, where she led marketing initiatives and a multi-million-dollar budget for an award-winning product line for individual and institutional investors.

Jeff Geisenheimer

Jeff is the CFO and Head of Operations and Compliance at ExtractAlpha, directing our financial, operational, compliance, and strategic management. He previously served as CFO at Estimize and at two publicly traded firms, Multex and Market Guide. Jeff also served as CFO at private-equity–backed companies including Coleman Research, Ford Models, Instant Information, and Moneyline Telerate. He has also held roles as advisor, partner, and board member at Total Reliance, CreditRiskMonitor, Mochidoki, and Resurge.

Vinesh Jha

Vinesh founded ExtractAlpha in 2013 with the mission of bringing analytical rigor to the analysis and marketing of new datasets for the capital markets. Since ExtractAlpha’s merger with Estimize in early 2021, he has served as the CEO of both entities. From 1999 to 2005, Vinesh was the Director of Quantitative Research at StarMine in San Francisco, where he developed industry-leading metrics of sell-side analyst performance as well as successful commercial alpha signals and products based on analyst, fundamental, and other data sources. Subsequently, he developed systematic trading strategies for proprietary trading desks at Merrill Lynch and Morgan Stanley in New York. Most recently, he was Executive Director at PDT Partners, a spinoff of Morgan Stanley’s premier quant prop trading group, where, in addition to research, he applied his experience in the communication of complex quantitative concepts to investor relations. Vinesh holds an undergraduate degree from the University of Chicago and a graduate degree from the University of Cambridge, both in mathematics.
