Systematic trading is a great way to practice financial statistics. However, it is not an easy way to make money.
The idea is alluring because it seems to depend only on being able to program and having a good idea, neither of which is especially hard. Unlike manual trading, a computer strategy isn't affected by bad judgment or emotion, and it can process far more data than a human can, at least in terms of the sheer volume of numbers it can look at. The main point, though, is that its historic performance can be examined before deciding to invest, which should at least be an indication of the performance it will realise when the strategy is live.
Although systematic trading has these advantages, it is best to start any research programme with a realistic impression of the difficulties that will haunt it. This post goes through a few of the problems you are likely to hit.
Before fast computers with good internet connections were commonplace, the people who had the capability to trade systematically really did find it comparatively easy to make money. Back then there were plenty of reasonably predictable correlations that could be monetised. The problem was that as more time and money flowed into the industry, traders saturated first the easier strategies and then the not-so-easy ones, and the opportunities went away. It is a general rule that if people make money from a strategy, and they have enough capital, they will saturate it until adding more capital earns them nothing extra. Anyone who comes in after that obviously won't make money from the strategy unless they are better at exploiting it.
This leads to an inescapable conclusion: to make money from a particular strategy, you need to be either better than the other people running it, or the only person running it. "Better" here means having better and faster data, more robust IT systems, better risk models and controls, better cost models and controls, and more experienced people helping you. If that describes you, you are playing a different game, and this article will be less relevant to you.
For everyone else, your best hope is to pick a strategy that nobody else is using. That means either using new data that nobody else has, or using commonly available data in a new way.
Using new data is actually a good recommendation. There is as much data about as there is stuff in the world, and new data could be anything from phone-survey results to weather forecasts. You can even buy data on which private jet landed where and feed that into a strategy. The problem is that some data isn't very useful, some is expensive, and some is time-consuming and fiddly to work with. Some data appears to be new but, on close examination, turns out to be surprisingly well used in other forms: news sentiment is popular at the moment, but it is worth remembering that it gives you less information, not more, than the original news feeds, which are actively traded on anyway. Following the holiday movements of CEOs is all very well, but there will be portfolio managers who know those CEOs personally, and they will have far more information than a time series of airports.
But there is genuine potential in new data, if the data bears up to scrutiny.
A well-known approach is to look at price charts for patterns that help you guess where the price is going. But prices, along with volumes, debt, earnings, profits, and many more data sets, make up a common body of data that a very large number of quant houses have access to and use as heavily as they can. For this kind of data, you need to do something that (a) nobody else is doing with the data, and (b) has an economic reason for working (otherwise it will just be a spurious correlation, and it will die away).
(a) You should have reason to think that nobody is running the same strategy on the same data. The strategy should be truly original, not just a very complicated way of doing what has been done before. A more complicated strategy might have lots of little catches, for instance behaving differently when both volume and price are high, unless the company is heavily indebted. Or the complication might be mathematical sophistication: you could convolve hourly returns with vectors taken from the hydrogenic wave function and regress the resulting vectors against forward returns. You might be right that few people have tried exactly that, but you shouldn't assume that the space of linear kernels applied to prices hasn't been thoroughly explored, because it has. Either way, taking an obvious strategy and adding complexity probably won't make it all that different, yet it will make spurious statistical signals harder to exclude: the larger your search space, the larger the number of spurious false positives.
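The danger of a large search space can be made concrete with a small simulation. The sketch below (hypothetical numbers, assuming NumPy is available) backtests thousands of strategies whose true edge is exactly zero; the best of them still looks impressive in-sample, purely by chance.

```python
import numpy as np

rng = np.random.default_rng(0)

n_days = 252 * 5          # five years of daily P&L
n_strategies = 10_000     # size of the search space

# Pure noise: every strategy's daily P&L is i.i.d. standard normal,
# so the true Sharpe ratio of every strategy is exactly zero.
pnl = rng.standard_normal((n_strategies, n_days))

# Annualised in-sample Sharpe ratio of each strategy.
sharpe = pnl.mean(axis=1) / pnl.std(axis=1) * np.sqrt(252)

print(f"best in-sample Sharpe: {sharpe.max():.2f}")
print(f"strategies with Sharpe > 1: {(sharpe > 1).sum()}")
```

With these numbers the best strategy typically shows an annualised Sharpe ratio well above 1, and over a hundred strategies clear that bar, despite none of them having any edge at all. A bigger search space only makes the best spurious candidate look better.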
Combining several data sets is another way to come up with something new, but the combinatorial explosion that creates so many candidate strategies also makes it explosively harder to be sure that your particular combination isn't spurious. And the economically obvious combinations will already have been looked at extensively, as well as a fair few that aren't economically obvious.
(b) The strategy has to be economically rational. If there is no reason behind it, it won't work. Prices move because of information and because people trade; there is no secret black box somewhere containing a weird set of rules waiting to be uncovered. If prime-numbered days of the year seem to always have positive returns, it is a coincidence. Complete statistical certainty that an economically unusual strategy works would be a reason to believe in it, but if you appear to have achieved that, it is probably because your statistics are wrong: it is very difficult to get statistical certainty in finance. Achieving statistical confidence in finance deserves a blog post of its own, and the topic is important. In some ways it is the main reason that it is hard to find new working strategies on old data: it can be argued that a given data set supports only a small number of orthogonal strategies that can be shown to work with statistical significance, and as you open up your search space you will mostly find rehashes of simpler strategies, along with more and more spurious ones.
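A back-of-envelope calculation shows why statistical certainty is so hard to come by here. Under a simplifying i.i.d.-returns assumption, the standard error of an annualised Sharpe ratio estimated from T years of data is roughly 1/√T, so confirming a Sharpe ratio SR at two standard errors takes about (2/SR)² years. This is a minimal sketch under that assumption, not part of the original post; the function name is made up.

```python
import math

def years_for_significance(sr: float, z: float = 2.0) -> float:
    """Approximate years of data needed to distinguish an annualised
    Sharpe ratio `sr` from zero at `z` standard errors, assuming the
    estimate's standard error over T years is about 1/sqrt(T)."""
    return (z / sr) ** 2

for sr in (2.0, 1.0, 0.5):
    print(f"Sharpe {sr:.1f}: ~{years_for_significance(sr):.0f} years of data")
```

A strategy with a true Sharpe ratio of 0.5, respectable by most standards, would need on the order of sixteen years of data to reject pure luck at two standard errors, and markets rarely stand still that long.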
It's not all bad, though. The above describes the initial difficulty of finding a data set on which to base a good strategy, but a few factors make things slightly less bleak than they could be:
1) There are lots of potential strategies that won’t be fully exploited if they’re too small for the big fish to care about. A strategy that only makes a few tens of thousands of pounds a year won’t be interesting to many quant houses.
2) The turnover is high: People come and go. Hedge funds come and go. People will stop looking at a given strategy for more reasons than just that it doesn’t make money any more.
3) Quants don’t talk to each other as much as people do in other industries. And that guardedness means that there isn’t a vast body of knowledge on which a new hedge fund can build. This is good news for small fish, since it means that they’re not starting the race with much of a handicap, at least intellectually.
In conclusion, systematic strategies have to be based on either new data or genuinely new methods, both of which pose their own problems. This is subtly different from good old human decision-making. People have access to a wealth of information not available to a computer: they have seen the new smartphone with the same eyes that will choose whether or not to buy it, and so can judge whether it will be a success. People also understand the underlying factors that make a company successful far better than any current computer system. Most good human stock pickers will have a genuine economic insight that, when informed by the right data and analytics, is both rational and somewhat unique. The crucial difference is that they use a much more diverse set of data for each decision, and they combine that data not by statistical inference against previous price movements, but by understanding the significance of each data point in its own right.