Fixing Learning To Rank
Why LTR doesn't work and how to make it work
Introduction
A recent paper called “Learning To Rank” has gained quite a lot of attention, and rightfully so: it addresses a really interesting problem for statistical arbitrage managers, namely how to get your forecast to align with what you are actually trading. In reality, we are not primarily concerned with mean squared error but with the relative performance of assets. To address this, the paper proposes that we should be learning to rank assets rather than learning to predict their returns. This makes sense for statistical arbitrage managers, as you will usually hold a portfolio with some assets long and some assets short. You care mostly about how they perform relative to one another, not about their absolute performance (both assets could go up 1000s of % but you only make money on the difference).
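To make the gap between forecast error and trading outcome concrete, here is a minimal sketch. The returns and both forecasts are made-up numbers, and the equal-weight "long the top half, short the bottom half" rule is an assumption chosen for illustration, not the paper's portfolio construction. Forecast A is close in magnitude but orders the assets badly; forecast B is badly scaled but orders them correctly.

```python
# Hypothetical one-period realized returns for four assets (made-up numbers).
realized = [0.03, 0.01, -0.01, -0.03]

# Forecast A: small errors in magnitude, but the ordering is wrong.
forecast_a = [0.00, 0.02, 0.01, -0.02]
# Forecast B: badly scaled, but the ordering is exactly right.
forecast_b = [0.20, 0.10, -0.10, -0.20]


def mse(forecast, actual):
    """Mean squared error between forecast and realized returns."""
    return sum((f - r) ** 2 for f, r in zip(forecast, actual)) / len(actual)


def long_short_pnl(forecast, actual, n_side):
    """Equal-weight long the n_side highest forecasts, short the n_side lowest."""
    order = sorted(range(len(forecast)), key=lambda i: forecast[i], reverse=True)
    longs, shorts = order[:n_side], order[-n_side:]
    long_ret = sum(actual[i] for i in longs) / n_side
    short_ret = sum(actual[i] for i in shorts) / n_side
    return long_ret - short_ret


mse_a, mse_b = mse(forecast_a, realized), mse(forecast_b, realized)
pnl_a = long_short_pnl(forecast_a, realized, n_side=2)
pnl_b = long_short_pnl(forecast_b, realized, n_side=2)

print(f"A: mse={mse_a:.6f} pnl={pnl_a:.4f}")
print(f"B: mse={mse_b:.6f} pnl={pnl_b:.4f}")
```

In this toy case forecast A wins on mean squared error while forecast B wins on long-short PnL, which is the mismatch that ranking objectives are meant to address.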
To tackle this and get us closer to predicting the difference, the paper presents 3 LTR models:
LambdaMART
ListNet (Neural Network)
ListMLE (Neural Network)
The issue is that NONE OF THEM WORK as described. BUT I will explain why the version described in the paper underperforms the benchmark, and how to fix it so that our improved version does significantly better than the benchmark.
Selecting A Model
The paper presents 3 options for models. Two are based on neural networks and one is based on decision trees. From my testing it was immediately clear that the best performing model was the tree-based one. This is fairly expected and usually ends up being the case in the world of quant, especially when you are not working with high-frequency data. Thus, we go with LambdaMART.
LambdaMART gets its name from two ideas: LambdaRank, which is a smart way of computing gradients that target ranking metrics like NDCG even though those metrics are not differentiable themselves, and MART (Multiple Additive Regression Trees), i.e. gradient boosted decision trees.
LambdaMART works by sequentially building an ensemble of decision trees that optimise a ranking objective. It does not directly predict returns; instead it learns a scoring function which orders assets from most attractive to least attractive, where in our case the target ordering is the rank of forward returns. During training, LambdaRank computes gradients which push incorrectly ordered pairs of assets to swap places, with larger emphasis placed on swaps that would improve the ranking metric more (NDCG, for example).
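The pairwise mechanics described above can be sketched in a few lines of plain Python. This is a simplified illustration of the textbook LambdaRank update for a single cross-section, not the paper's code and not any library's implementation. The integer relevance labels (think of forward returns bucketed into ranks), the scores, the function name lambda_pushes and the sigma parameter are all assumptions made for the example.

```python
import math


def dcg(relevance_in_rank_order):
    """Discounted cumulative gain with the common 2^rel - 1 gain."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevance_in_rank_order))


def lambda_pushes(scores, relevance, sigma=1.0):
    """Per-asset score adjustments (the negative of the LambdaRank gradient)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    position = {asset: pos for pos, asset in enumerate(order)}
    ideal = dcg(sorted(relevance, reverse=True))
    pushes = [0.0] * n
    for i in range(n):
        for j in range(n):
            if relevance[i] <= relevance[j]:
                continue  # only pairs where asset i should rank above asset j
            # Absolute change in NDCG if i and j swapped current positions.
            gain_diff = (2 ** relevance[i] - 1) - (2 ** relevance[j] - 1)
            disc_diff = (1 / math.log2(position[i] + 2)
                         - 1 / math.log2(position[j] + 2))
            delta_ndcg = abs(gain_diff * disc_diff) / ideal
            # Close to 1 when the pair is badly mis-ordered, close to 0 otherwise.
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            pushes[i] += sigma * rho * delta_ndcg  # push the better asset up
            pushes[j] -= sigma * rho * delta_ndcg  # push the worse asset down
    return pushes


# Hypothetical cross-section: asset 0 has the best forward-return bucket
# but the model currently gives it the lowest score.
relevance = [3, 2, 1, 0]
scores = [0.1, 0.9, 0.5, 0.2]
pushes = lambda_pushes(scores, relevance)
print([round(p, 4) for p in pushes])
```

In LambdaMART each new tree is then fitted to these per-asset values, so the weighting by the NDCG change determines which mistakes the ensemble works hardest to correct.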
Why LambdaMART Fails
In many real-world scenarios that use LTR models, this is an excellent objective. Unfortunately, it is grossly misaligned with what actually generates PnL in a statistical arbitrage manager's context.


