Researching HFT Strategies
How to work well with high frequency data
Introduction
This article is a mix of various thoughts I have about how to work well with HFT data and perform high quality quantitative research. HFT data is interesting to deal with because it's large, cumbersome to process, and often messy. You want to find out what happened, but you need to dig through 20 different data types from your internal logs, and on top of that the dataset is often big enough to keep your server busy indefinitely if you naively attempt a multi-year backtest.
How should we approach things such that we end up somewhere productive?
Well, that's roughly what I aim to talk about in this article. I can't promise this will be a detailed tutorial on how to be an HFT researcher - you won't get anywhere near that far just from reading articles. To travel that far you'll need to work with the data itself, and probably get a bit of mentorship along the way, as most of us do. But I do think this article offers insights that would otherwise take ages of toiling around to figure out. Part of my professional experience has involved running research in HFT operations, and through that I've learned how to organize the research process so that it produces useful results. These are observations from that experience, and I hope they're useful for those in the industry who work with HFT data regularly.