The Quant Stack

Automating Alpha Pt.2 - Best Practices

Best Practices, Tips, and Tricks For Automating Alpha Discovery

Quant Arb
Jun 18, 2024

Introduction

In the previous article, we provided a very high-level run-through of automated alpha generation. We'll continue with that bird's-eye view in this article; the next one will be much more code-focused. That one has taken a fairly long while precisely because of the heavy focus on code, hence its place later in the series.

I think a bit of code is useful for getting people experimenting faster, which, of course, is the only real way to fully grasp any of what I tell you. However, it is hard to convey high-level ideas through code, such as the mental frameworks for approaching the challenge itself.

Index


  1. Introduction

  2. Index

  3. Input Data Normalization

  4. Tuning of Selection Likelihoods

  5. Known Strategies as Base Alphas

  6. Too Many Inputs & Unrelated Data

  7. Automated Alpha is Great For Bad Researchers

  8. Breadth Beats Depth

  9. Exponential Temporal Weighting

  10. The Many Outweigh The Few

  11. Holistic Perspective

Input Data Normalization


It is important to ensure that our data has been properly normalized before we use it as input for our genetic algorithm, meaning the inputs should be homogeneous and comparable. Raw price and financial data are not. Take high - low: stocks with a higher average price will produce much larger values purely as a function of price level. This adds noise to the process and makes the search harder.

We could choose to work only with returns, but this again limits us by ruling out perfectly acceptable formulas such as:

\(\alpha = \frac{(high-low)}{close} \)

This is fine because the normalization ensures the signal does not scale with the overall level of close, which we, as researchers, know is not where any real alpha lies. By embedding this rule, we make sure the algorithm knows it as well.

For financial data, we can normalize by market capitalization, or simply by price if the figure is already on a per-share basis.
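To make this concrete, here is a minimal sketch (the tickers, column names, and figures are purely illustrative) showing how a raw high - low range scales with price level, while the close-normalized range and market-cap-normalized fundamentals stay comparable across names:

```python
import pandas as pd

# Two hypothetical stocks with the same ~2% daily range but very different price levels.
bars = pd.DataFrame({
    "ticker": ["CHEAP", "PRICEY"],
    "high":   [10.20, 1020.0],
    "low":    [10.00, 1000.0],
    "close":  [10.10, 1010.0],
})

# Raw range: ~100x larger for the expensive stock, purely from price level.
bars["raw_range"] = bars["high"] - bars["low"]

# Normalized range: comparable across both names.
bars["norm_range"] = (bars["high"] - bars["low"]) / bars["close"]
print(bars[["ticker", "raw_range", "norm_range"]])

# Fundamentals get the same treatment: divide company-level totals by market cap,
# or divide by price when the figure is already per-share.
fundamentals = pd.DataFrame({
    "ticker":     ["CHEAP", "PRICEY"],
    "net_income": [5.0e7, 5.0e9],     # totals scale with company size
    "market_cap": [1.0e9, 1.0e11],
    "eps":        [0.50, 50.0],       # per-share figures scale with price
    "close":      [10.10, 1010.0],
})
fundamentals["earnings_yield"] = fundamentals["net_income"] / fundamentals["market_cap"]
fundamentals["eps_yield"] = fundamentals["eps"] / fundamentals["close"]
print(fundamentals[["ticker", "earnings_yield", "eps_yield"]])
```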

This leaves us with three options:

  1. Unconstrained optimization, and we simply assume that alphas like high - low will evolve into a normalized version of themselves in the algorithm’s attempt to reduce noise.

  2. Forcefully normalize everything, and use close-to-close returns (instead of close) or P/E ratio (instead of profits).

  3. Use normalization-based inputs instead of the raw data, i.e., for High, we turn it into High/VWAP, where VWAP is our chosen normalization variable for all raw price-based input data sources. Under this model, every raw price input is divided by VWAP (a minimal sketch of this approach follows the list).
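As a rough sketch of option 3, assuming bar data arrives as a DataFrame with a vwap column (the column names and function are hypothetical), the raw price columns could be wrapped before they ever reach the genetic algorithm:

```python
import pandas as pd

# Raw price-based columns that share VWAP as their default normalization variable.
PRICE_COLUMNS = ["open", "high", "low", "close"]

def normalize_price_inputs(bars: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the bar data where every raw price column has been
    divided by VWAP, so the search only ever sees scale-free price inputs."""
    out = bars.copy()
    for col in PRICE_COLUMNS:
        out[col] = out[col] / out["vwap"]
    return out
```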

Option 3 is by far the best here: we set a default normalization source for every single type of raw data we have. If we have the equation:

\(\alpha = high - low\)

It then becomes:

\(\alpha = \frac{high}{VWAP} - \frac{low}{VWAP} = \frac{(high-low)}{VWAP}\)

This is a great solution because the VWAP eventually factors out if we introduce something like a division by close into the equation:

\(\alpha = \frac{\frac{(high-low)}{VWAP}}{\frac{Close}{VWAP}} = \frac{(high-low)}{Close}\)
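A quick symbolic check (using sympy, purely for illustration) confirms that the VWAP terms cancel exactly as written above:

```python
import sympy as sp

high, low, close, vwap = sp.symbols("high low close vwap", positive=True)

# Alpha built from VWAP-normalized inputs, then divided by the normalized close.
alpha = (high / vwap - low / vwap) / (close / vwap)

print(sp.simplify(alpha))  # -> (high - low)/close
```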

Normally, this would force the tree to grow from two layers (a bottom layer holding high and low, and a top layer for the subtraction) to three layers, when all we are really doing is normalizing the equation, not adding any complexity. However, if we deviate from the default normalization metric that all raw price inputs must use (VWAP) and divide by Close instead, then we genuinely do end up with three layers.

Thus, when counting the height of the tree, we should ignore a division node if it meets the following criteria:

  1. Division operation

  2. Denominator is a default normalization variable

  3. The numerator(s) belong to the raw input group for which that denominator is the default normalization variable

This lets us normalize our alphas without penalizing the search process for the normalization, while still not forcing it into a single choice of normalizer if it decides the extra complexity is worthwhile.
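Here is a minimal sketch of how this exemption might be wired into the height calculation, assuming a simple expression-tree node class and a hypothetical mapping from raw-input groups to their default normalizers (only terminal numerators are handled, for brevity):

```python
from dataclasses import dataclass, field

# Default normalization variable per raw-input group (hypothetical grouping).
DEFAULT_NORMALIZER = {
    "price": "vwap",               # open/high/low/close are divided by VWAP
    "fundamental": "market_cap",
}
RAW_INPUT_GROUP = {
    "open": "price", "high": "price", "low": "price", "close": "price",
    "net_income": "fundamental", "total_debt": "fundamental",
}

@dataclass
class Node:
    op: str                        # "div", "sub", ... or a terminal name like "high"
    children: list = field(default_factory=list)

def is_default_normalization(node: Node) -> bool:
    """A division that only normalizes: the denominator is the default
    normalization variable for the numerator's raw-input group."""
    if node.op != "div" or len(node.children) != 2:
        return False
    numerator, denominator = node.children
    group = RAW_INPUT_GROUP.get(numerator.op)
    return group is not None and denominator.op == DEFAULT_NORMALIZER.get(group)

def effective_height(node: Node) -> int:
    """Tree height used for the complexity penalty, ignoring
    default-normalization divisions."""
    if not node.children:
        return 1
    if is_default_normalization(node):
        return 1                   # counts the same as the raw terminal itself
    return 1 + max(effective_height(child) for child in node.children)

# (high/vwap) - (low/vwap): counted as two layers, same as plain high - low.
tree = Node("sub", [
    Node("div", [Node("high"), Node("vwap")]),
    Node("div", [Node("low"), Node("vwap")]),
])
print(effective_height(tree))        # 2

# (high/close) - (low/close): close is not the default normalizer, so three layers.
tree_close = Node("sub", [
    Node("div", [Node("high"), Node("close")]),
    Node("div", [Node("low"), Node("close")]),
])
print(effective_height(tree_close))  # 3
```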

Tuning of Selection Likelihoods


This post is for paid subscribers.