All posts by Hugo Martay

This is a short and completely non-technical post that outlines our view on how to hedge a portfolio.

Our view is very simple and has 5 parts to it:

      • A portfolio manager takes 2 kinds of risk: stock risk, where you lose money because your stocks go down for their own reasons, and factor risk, where you lose money because the market as a whole crashes.
      • Stock risk is minimized by having a well diversified portfolio with not too much weight in any one stock.
      • A lot of the factor risk can be eliminated by hedging the market using an ETF or index future.
      • If you don’t have access to a risk-management utility, then we’d recommend hedging about 70% of your net position. For example, a $75M long and $25M short portfolio would be hedged with roughly $35M short in a market future.
      • Do not be tempted to completely net out your position: $75M long, $25M short, with a $50M short hedge in a market index is not a good idea at all (unless your position is entirely large-cap names that are actually constituents of the index). If you dollar-neutral hedge, you’ll probably lose money if the stock market goes up because your stocks won’t go up as much as the market does.
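The rule of thumb in the fourth bullet takes only a few lines to express. The 70% ratio and the $75M/$25M example come from this post; the function name and everything else are purely illustrative:

```python
# Rule-of-thumb hedge from the post: short roughly 70% of the net dollar
# position in a market future.

def hedge_size(long_dollars: float, short_dollars: float,
               hedge_ratio: float = 0.70) -> float:
    """Dollar size of the short market-future hedge for a long/short book."""
    net = long_dollars - short_dollars   # net long exposure
    return hedge_ratio * net             # short this much in a market future

# The post's example: $75M long, $25M short -> roughly a $35M short hedge.
hedge = hedge_size(75e6, 25e6)
```

Setting the ratio to 100% instead gives the $50M dollar-neutral hedge that the last bullet warns against.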

We have checked that this rule holds both when the market is calm and during crashes. There is some variation in the ideal way to hedge from year to year, but it’s not true that during a crash you should make sure you’re dollar neutral.

The value of 70% comes from lots of what-if analysis and beta calculations, and seems pretty robust. It does vary slightly depending on what’s in your portfolio, the investment time, and what you’re using to hedge, but as a rule of thumb, it appears to work well in a lot of situations.



This summarises the general view that OTAS Technologies has toward hedging.


A previous post discussed the problem that the average beta of stocks in a typical portfolio is less than 1.

The consequence of this was that if you try to use a dollar hedge, you frequently end up with an overall short position — in other words, the dollar-hedged portfolio should be expected to lose money if the market goes up. The conclusion is that you should typically only hedge about 60%-70% of your dollar position, depending on the exact makeup of your portfolio.

However, you could worry that in times of crisis, the stocks might move together much more — if they’re driven by large-scale macro forces, then perhaps their correlations might go up.

I attempt to answer this question here by plotting average beta vs time. The article is somewhat technical, but the interesting result is in Figure 1, so feel free to scroll to that.




I picked the 1000 largest Euro-denominated (so that we don’t just measure currency vol) stocks and worked with their beta against the Eurostoxx-50 index, for the last 10 years. Each stock has its beta calculated in a moving (Gaussian) window with width 100 trading days, and the mean is taken.
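The procedure can be sketched as follows, with synthetic returns standing in for the real data, a true beta of 0.8 baked in, and a Gaussian weighting of width 100 days. This is an illustration of the method, not the production calculation:

```python
import numpy as np

# Synthetic stand-ins for a stock and the index; true beta is 0.8 by construction.
rng = np.random.default_rng(0)
n_days = 1000
market = rng.normal(0.0, 0.01, n_days)                  # index daily returns
stock = 0.8 * market + rng.normal(0.0, 0.01, n_days)    # stock daily returns

def gaussian_beta(stock_r, market_r, centre, width=100):
    """Beta estimated in a moving window with Gaussian weights."""
    days = np.arange(len(market_r))
    w = np.exp(-0.5 * ((days - centre) / width) ** 2)
    m_mean = np.average(market_r, weights=w)
    s_mean = np.average(stock_r, weights=w)
    cov = np.average((stock_r - s_mean) * (market_r - m_mean), weights=w)
    var = np.average((market_r - m_mean) ** 2, weights=w)
    return cov / var

# Beta through time, then the mean across windows, as in the figures below.
betas = [gaussian_beta(stock, market, c) for c in range(200, 801, 100)]
mean_beta = float(np.mean(betas))
```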

Beta is hard to estimate based on too little data. This is because it involves dividing two quantities which are both products of returns. Given that stock returns tend to have quite noticeable outliers, the product of two returns (for instance the stock’s return multiplied by the market’s return), can vary wildly from day to day. If you keep outliers in the calculation, then the resulting beta is weighted heavily towards what happened to stocks on just one or two high volatility days, but if you take the outliers out (or clip them to a permissible range), then you don’t answer the question “what happens in high volatility conditions”.

So we need several months’ worth of data realistically, to get a handle on whether beta is currently high or low. There’s also a risk that the numbers we get will be specific to the methodology we choose. For instance, if, during a crash, all the stocks were to crash, but not all on the same day, then the 1-day returns might show low beta, but the 5-day returns might have a much higher beta.

The only way round this is to try lots of methodologies and see if they agree.
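A minimal version of “try lots of methodologies”: estimate beta from raw 1-day returns, from clipped (winsorized) returns, and from 5-day returns, and check that the answers roughly agree. The data is synthetic with a true beta of 0.8, fat-tailed noise, and an arbitrary clipping level:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2500                                            # roughly ten years of days
market = 0.01 * rng.standard_t(df=4, size=n)        # fat-tailed index returns
stock = 0.8 * market + 0.01 * rng.standard_t(df=4, size=n)

def beta(s, m):
    return np.cov(s, m)[0, 1] / np.var(m, ddof=1)

def clip_outliers(x, n_sd=4.0):
    """Clip returns to a permissible range of +/- n_sd standard deviations."""
    lim = n_sd * np.std(x)
    return np.clip(x, -lim, lim)

beta_raw = beta(stock, market)                      # outliers kept in
beta_clipped = beta(clip_outliers(stock), clip_outliers(market))

# Non-overlapping 5-day returns.
s5 = stock[: n - n % 5].reshape(-1, 5).sum(axis=1)
m5 = market[: n - n % 5].reshape(-1, 5).sum(axis=1)
beta_5d = beta(s5, m5)
```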



The average beta has been fairly constant over the last 10 years, and seems not to be particularly correlated to market volatility:


The beta fluctuates from year to year, but seems not to be directly related to volatility.

The uncertainty comes from assuming that the beta variation for a 10-day window is completely noise, and scaling the observed noise to the window used here (100 days).

If we try varying the return time, we get a similar shape, but shorter timescales have lower betas. This is completely expected if we take into account the short-term mean reversion: There is a slight tendency for stocks to revert from one day to the next, and although difficult to profit from, the effect is strong enough to increase correlations for longer timescales:



Shorter return periods give lower betas because of short-term mean-reverting price fluctuations.


Then, trying median beta rather than mean, and trying a less aggressive outlier reducing process:


Different methods give vaguely similar betas.

So, at least for these 1000 stocks, the beta seems to be fairly unconnected to volatility.

One last test was to try the beta vs a home-made market (the largest 100 stocks in the same universe):


When a different definition of ‘Market’ is used, you get a comparable beta time series, but the details are different.



Beta did vary from year to year, and the variation seems to be significant – but the uncertainty in the beta estimate is itself difficult to quantify. There seems not to be a strong link between mean beta and volatility, though.


Underlying data courtesy of Stoxx. The Stoxx indices are the intellectual property (including registered trademarks) of STOXX Limited, Zurich, Switzerland and/or its licensors (“Licensors”), which is used under license. None of the products based on those Indices are sponsored, endorsed, sold or promoted by STOXX and its Licensors and neither of the Licensors shall have any liability with respect thereto.


This will be a short post discussing how risk is normally modelled in finance and how OTAS uses risk models. Today I’ve been improving code that deals with risk models, so I thought I would write a bit about them.

The audience for this is either someone from outside finance who isn’t familiar with finance norms, or someone in finance who has not had the chance to study risk models in detail.

The definition of risk

Intuitively, the term ‘risk’ should be something to do with the chances that you will lose an uncomfortable amount of money. In the equities business it is normally defined to be the standard deviation of returns. So if in a given year, your portfolio makes perhaps on average £500k, but fluctuating so that perhaps on a bad year it loses £500k, or on a good year it makes £1.5M, your risk is probably about £1M.

This can catch people out: the definition that is almost universally used (for equities) includes the chance of making lots of money as well as the chance of losing lots of money. You could make the argument that if the stock can go up by 10%, then it could go down 10% just as easily. You could imagine situations where that’s not true, though: if 10 companies were bidding for an amazing contract that only one of them would win, then (if you buy shares in one company) you’re more likely to make lots of money than lose it.

In fact, the reason that standard deviation of returns is used is that it’s simple to calculate. That might sound as if the technical teams are deciding to be lazy in order to make life easy, but actually trying to estimate risk in a better way is nightmarishly difficult – it’s not that the quant team would have to sit and think about the problem for *ages*, it’s that the problem becomes guesswork. Getting the standard deviation of a portfolio’s returns takes a surprisingly large number of data points in finance (because fat tails make the calculation converge more slowly than expected), but getting a complete picture of how the risk works including outliers, catastrophic events, bidding wars, etc., takes far, far more data.

Since there isn’t enough data out there, the missing gaps would have to be filled by guesswork. And so most people stick to a plain standard-deviation-based risk model.

Having a simple definition means that everyone agrees on what risk numbers are: If someone asks you to keep your risk to less than 5% per year, they and you can look at your portfolio and largely agree that a good estimate of risk would be under the threshold. Then most people can look at the actual returns at the end of the year and say whether or not the risk was under the 5% target.

How risk models work

Let’s accept that risk is modelled in terms of the standard deviation of portfolio returns. To estimate your realised risk, you just take how many dollars you made each day for the last year or so, and take a standard deviation. The risk model, though, is used to make predictions for the future.

The risk model could just contain a table of every possible or probable portfolio and the predicted risk for that portfolio, but it would be a huge table. On the other hand, that is a complete description of what a risk model does: it just tells you the risk for any hypothetical portfolio. We can simplify this a bit by noting that if you double a portfolio’s position, the risk must double, so we don’t have to store every portfolio. In fact, similar reasoning means that if we have the standard deviation for N(N+1)/2 suitably chosen portfolios, we can work out the standard deviation for every portfolio.

Another way of saying the same is that all we need is the standard deviation for each stock, and the correlation between every stock and every other: If we know how volatile Vodafone is, and how volatile Apple is, and the correlation between them, then we can work out the volatility of any portfolio just containing Vodafone and Apple.
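For a two-stock book like the Vodafone-and-Apple case, that calculation is the standard two-asset formula. The vols, correlation and weights below are invented for the illustration:

```python
import math

vol_a, vol_b = 0.25, 0.30    # annualised volatilities of the two stocks
rho = 0.4                    # correlation between them
w_a, w_b = 0.6, 0.4          # portfolio weights

# Portfolio variance = the two own-variance terms plus the cross term.
variance = (w_a * vol_a) ** 2 + (w_b * vol_b) ** 2 \
    + 2.0 * w_a * w_b * vol_a * vol_b * rho
portfolio_vol = math.sqrt(variance)  # about 22.6% annualised
```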

In the first instance, all you can do to predict the future correlation between two stocks is to look at their history – if they were historically correlated, we can say that they probably will be correlated in the future. However, we can probably do slightly better than that, and simplify the risk model at the same time using the following trick:

We make the assumption that the only reason that two stocks are correlated is that they share some factor in common: If a little paper manufacturer in Canada is highly correlated to a mid-sized management consultancy firm in Australia, we might say that it’s only because they’re both correlated to the market. Basically you have, say, 50 hypothetical influences, (known as “factors”) such as “telecoms”, or “large cap stocks” or “the market”, and you say that stocks can be correlated to those factors. You then ban the risk model from having any other view of correlation: The risk model won’t accept that two stocks are simply correlated to each other – it will only say that they’re both correlated to the same factors.

This actually helps quite a bit because the risk model ends up being much smaller – this step reduces the size of the risk model on the hard drive by up to 100 times, and it also speeds up most calculations that use it. If the factors are chosen carefully, it can also improve accuracy – the approximation that stocks are only correlated via a smallish number of factors can theoretically end up averaging out quite a lot of noise that would otherwise make the risk model less accurate.
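The compression can be sketched with the usual factor-model decomposition: covariance = loadings × factor covariance × loadings-transposed, plus stock-specific variances on the diagonal. The sizes and numbers below are illustrative, not OTAS’s actual model:

```python
import numpy as np

rng = np.random.default_rng(2)
n_stocks, n_factors = 2000, 50

B = rng.normal(size=(n_stocks, n_factors))        # factor loadings
F = np.diag(rng.uniform(0.01, 0.05, n_factors))   # factor covariance
D = rng.uniform(0.01, 0.04, n_stocks)             # stock-specific variances

# A full covariance needs ~N*N numbers; the factor model needs N*K + K*K + N.
full_size = n_stocks * n_stocks
factor_size = n_stocks * n_factors + n_factors * n_factors + n_stocks
compression = full_size / factor_size             # ~38x smaller here

# Risk of a portfolio w, computed without ever forming the N x N matrix.
w = rng.normal(size=n_stocks) / n_stocks
factor_exposure = B.T @ w                         # a K-vector
variance = factor_exposure @ F @ factor_exposure + np.sum(w**2 * D)
risk = float(np.sqrt(variance))
```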

What OTAS Tech does with them

OTAS Technologies uses risk models for correlation calculations, for estimating clients’ portfolio risk, and for coming up with hedging advice. Risk models are also useful for working out whether a price movement was due to something stock-specific or whether it was to do with a factor move.


This post is less finance focused and more software-development focused.

At OTAS Technologies, we invest considerable time into making good software-development decisions. A poor choice can lead to thousands of man-hours of extra labour. We make a point of having an ecosystem that can accommodate several languages, even if one of them has to be used to glue the rest together. Our current question is how much Haskell to embed into our ecosystem. We have some major components (Lingo, for example) in Haskell, and we will continue to use and enjoy it. However it is unlikely to be our “main” language for some time. So Haskell is largely guaranteed a place in our future, but it is still pacing out the boundaries of its position with us.

We started off assuming that Python was going to simplify life. It was widely used and links into fast C libraries reasonably well. This worked well at first, but there were significant disadvantages which led us away from Python.


Python’s first disadvantage was obvious from the start: Python is slow, several hundred times slower than C#, Haskell, etc., unless you can vectorise the entire calculation. People coming from a Matlab or Python background would say that you can vectorise almost everything. In my opinion, if you’ve only used Matlab or Python, you shy away from the vast number of useful algorithms that you can’t vectorise. We find that vectorising often works, but in any difficult task, there’ll be at least something that has to be redone with a for-loop. It is true, however, that you can work round this problem by vectorising what you can, and using C (or just learning patience) for what you can’t.

Python’s second disadvantage though, is that large projects become unwieldy because they aren’t statically typed. If you change one object, you can’t know that the whole program still works. And when you look at a difficult set of code, there’s often nowhere that you can go to see what an object is made of. An extreme example actually came from code that went to a database, got an object and worked on that object. The object’s structure was determined by the table structure in the database. That makes it impossible to look at the code and see if it’s correct without comparing it to the tables in the database. This unwieldiness gets worse if you have C modules: Having some functionality that’s separated from the rest of the logic simply because it needs a for-loop makes it hard to look through the code.

Consequently, we now use Python only for really quick analysis. An example is to load a JSON file and quickly plot a histogram of market cap. For this task, it’s still good: there’s no danger of it getting too complicated, nobody needs to read over the code later, and speed generally isn’t a problem.

The comparison process

The last week, though, has been spent comparing Haskell to C#. Haskell is an enormously sophisticated language that has, somewhat surprisingly, become easy to use and set up, thanks to the efforts of the Haskell community (Special thanks to FPComplete there). However, it’s a very different language to C#, and this makes it hard to compare. There is an impulse to try not to compare languages because the comparison can decay into a flame war. If you google X vs Y, where X and Y are languages, you come across a few well informed opinions, and a lot of badly informed wars.

There are several reasons for this, in my opinion. Often the languages’ advantages are too complicated for people to clearly see what they are and how important they are, and each language has some advantages and some disadvantages. A lot of the comparisons are entirely subjective – so two people can have the same information and disagree about it. The choice is often a psychology question regarding which is clearer (for both the first developer, and the year-later maintainer) and a sociology question about how the code will work with a team’s development habits.

There’s another difficulty that most of the evaluators are invested in one or the other language – nobody wants to be told that their favorite language, that they have spent 5 years using, is worse. We even evaluate ourselves according to how good we are at language X and how useful that is overall. A final difficulty is that there’s peer pressure and stigma and kudos attached to various languages.

So what’s the point? The task seems so difficult that perhaps it’s not worth attempting.

The point is that it’s the only way to decide between languages. And actually all the way through the history of science and technology, there have been comparisons between different technologies, and sure, the comparison’s difficult, but you can’t not compare things because the conclusions are valuable. The important thing is to remember that the comparison’s not going to be final, and that mistakes can be made, and that the comparison will be somewhat subjective, but that doesn’t make it pointless.


As a disclaimer, I am not a Haskell expert. I have spent a couple of months using Haskell, and I gravitate towards the lens and HMatrix libraries. I have benefited from sitting beside Alex Bates, who is considerably better at Haskell than I am.

Haskell has a very sophisticated type system, and the language is considered to be safe, concise and fast. For me, I suspect that I could write a lot of code in a way that makes it more general than I could in C# – Often I find that my C# code is tied to a specific implementation. You don’t have to: you could make heavy use of interfaces, but my impression is that Haskell is better for writing very general code.

Haskell also has an advantage when you wish to use only pure functions – its type system makes it explicit when you are using state and when you are not. However, in my experience, unintended side effects are actually a very minor problem in C#. Sure, a function that says it’s estimating the derivative of a spline curve *might* be using system environment variables as temporary variables. But probably not. If you code badly, you can get into a mess, but normal coding practices make it rare (in my experience) that state actually causes problems. Haskell’s purity guarantees can theoretically help the compiler, though, and may pay dividends when parallelising code – but I personally do not know. I personally reject the idea that state in itself is inherently bad – a lot of calculations are simpler when expressed using state. Of course, Haskell can manipulate state if you want it to, but at the cost of slightly (in my opinion) more complexity and, in the 3 or so examples that I’ve looked at, slightly longer code too.

Haskell is often surprisingly fast – something that tends to surprise non-Haskell users (and the surprise is usually accompanied by having to admit that the Haskell way is, in fact, faster). Any extra complexity and loss of conciseness is something that better-designed libraries might overcome. This is possibly accentuated for the highest-level code: my impression is that lambda functions and the equivalent of LINQ expressions produce less speed overhead in Haskell than in C#.

Another advantage of Haskell is safety – null reference exceptions are not easy to come by in Haskell, and some other runtime errors are eliminated. However, you still can get a runtime exception in Haskell from other things (like taking the head of an empty list, or finding the minus-1st element of an HMatrix array). On the other hand, exception handling is currently less uniform (I think) than C#, and possibly less powerful, so again, we have room for uncertainty.

Some disadvantages of Haskell seem to be that you can accidentally do things that ruin performance (for instance, using lists in a way that makes an O(N) algorithm O(N^2)) or give yourself stack overflows (by doing recursion the wrong way). However, I know that these are considered to be newbie-related, and not serious problems. When I was using Haskell for backtests, I quickly settled on a numerics library and a way of doing things that didn’t in fact have either of these problems. However, when I look over online tutorials, it’s remarkable how many tutorial examples don’t scale up to N=1e6 because of stack-overflow problems.

For me, perhaps the most unambiguously beneficial Haskell example was to produce HTML in a type-safe way. A set of functions are defined that form a domain-specific language. This library is imported, and the remaining Haskell code just looks essentially like a variant of HTML – except that it gets checked at compile time, and you have access to all the tools at Haskell’s disposal. You could do that in C#, but it would not look pretty, and as far as I know, it’s not how many people embed HTML in C#.

But we are just starting the comparison process, and it will be interesting in the coming days, months and years to find out exactly what the strengths and weaknesses of this emerging technology are.

In a future blog post, we’ll write about the areas in which each of the two wins and loses. But for now, it’s better to leave it unconcluded.

The dollar hedge is used throughout finance, and it is a bad idea.

To see why dollar hedges are used so widely, look into the reasons that people use them in the first place.

An uninformed investor might decide to invest in a stock partly because they believe in the stock itself, and partly because they have an understanding that the equity market as a whole should increase in value, so they are quite happy to have exposure to both stock and market.

The dollar hedge is where you make sure that the total net dollar value of your portfolio is zero, by shorting or buying a hedging instrument such as a market future in order to balance your other positions.

The dollar hedge is where you make sure that the total net dollar value of your portfolio is zero, by shorting or buying a hedging instrument such as a market future in order to balance your other positions.

As the investor’s portfolio widens to more stocks, the market exposure adds up steadily, whereas the exposure to individual stocks tends to be diluted by diversification: If you have 1000 bets on 1000 stocks, the chances of losing money because they all happen to be under-performing stocks is small, but the chance of losing money because you’re long on all of them and the market crashes is very high. So if the aim is not to get market exposure, the investor can limit their risk by taking an opposing position in something like an index future or ETF. This is particularly important if there is a net long or short bias in their stock-positions.

It seems intuitive that if the investor has $1M long in equities, then they should have roughly $1M short in the index future. After all, the market future is meant to represent the market as a whole. But there is a problem: this almost always over-hedges. This can be explained in terms of the average beta of your portfolio.

The beta of a stock to an index is how much you expect it to respond to movements in that index. A beta of 100% means that if the index goes up 5%, then the stock will also go up by 5% on average. A beta of 50% to an index going up 5% would mean only 2.5% expected rise in that stock. On average the beta should be 100% – but *only* if the stock is one of the index constituents. In fact, the most traded indices (or futures / ETFs on those indices) have quite a small list of constituents (for instance the FTSE 100 or Eurostoxx 50), and if you’re trading outside that small list (which you typically will), the average beta does *not* have to be equal to 100%, and in fact is normally lower.

OTAS Technologies makes extensive use of in-house risk models. We noticed this effect when we found that the average beta for a range of sensible portfolios was significantly less than 100% to the Eurostoxx 50. We spent some time fixing things, putting checks in place, and analysing our smoothing and data cleaning processes. Useful though that was, ultimately the effect was real.

We suspect that the effect is partly due to simple maths: different things drive different stocks, and a random stock outside the FTSE 100 will not necessarily be pushed around by the same thing as the FTSE 100. Then there’s a capitalization bias: These indices focus on large cap names, whereas the average portfolio might not. Then there’s also the possibility that the market indices drive themselves: Because they’re considered to be a proxy for the market, they get traded by people who take large macro views, and perhaps that causes their constituents to behave subtly differently to the average stock.

The effect on the average portfolio manager of getting this wrong can be stark. The beta hedge is guaranteed (if the beta is calculated correctly) to reduce the risk of the portfolio, but the dollar-neutral hedge is not. We have seen extremely plausible portfolios where in fact the dollar-neutral hedge *increases* risk by even more than the beta-neutral hedge decreases it. The most important effect, though, is for portfolio managers who tend to have long ideas and so end up with a short hedge. If they pick a dollar-neutral hedge, they will have an overall short exposure to the market. This will increase their risk, and get them negative drift (assuming that the market has a slight long-term upward drift). This is frequently the cause of the complaint that “We had good positions today, but the market went up and we lost money overall on the hedge”. It’s certainly the case that a well-hedged book can lose money if the market goes up. But on average, if the portfolio is well balanced, using beta- not dollar-neutral hedging, the portfolio will not normally have down days simply because the market went up.
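The difference between the two hedges can be sketched in a few lines. The positions and betas below are made up for the illustration, not a real portfolio:

```python
# A toy long/short book: dollars held per stock, and each stock's (invented)
# beta to the index being used as the hedge.
positions = {"A": 40e6, "B": 35e6, "C": -25e6}
betas = {"A": 0.65, "B": 0.70, "C": 0.60}

# Dollar-neutral: cancel the net dollar exposure (the post argues this over-hedges).
dollar_neutral_hedge = -sum(positions.values())

# Beta-neutral: cancel the beta-weighted exposure instead.
beta_neutral_hedge = -sum(positions[s] * betas[s] for s in positions)
```

Here the beta-neutral hedge comes out at $35.5M short rather than $50M, i.e. roughly 70% of the net position, in the same ballpark as the rule of thumb from the earlier post.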

To summarise, there’s good news and there’s bad news. The good news is that you don’t need to hedge as much as dollar-neutral. The bad news is that you might currently be short.


A plot of all the stocks denominated in euros in the OTAS universe, sorted by market cap. The y-axis is their beta to the Eurostoxx-50 index futures. You can see that some stocks have a beta well above 100%, but the majority are significantly below 100%, as shown by the red and blue averaging lines. The red and blue are to the future and to the ETF respectively. (The calculation used almost 6 years of returns from 2010, 5-day returns, and some winsorizing.)

In part 2, coming soon, we’ll hopefully look at some specific examples of when the dollar hedge has messed up a portfolio’s risk profile and performance.



Underlying data courtesy of Stoxx. The Stoxx indices are the intellectual property (including registered trademarks) of STOXX Limited, Zurich, Switzerland and/or its licensors (“Licensors”), which is used under license. None of the products based on those Indices are sponsored, endorsed, sold or promoted by STOXX and its Licensors and neither of the Licensors shall have any liability with respect thereto.

This follows an earlier post which described why a new entrant into quantitative finance should expect it to be difficult to find a strategy to trade.

Let us assume that the reader has chosen a data set, but hasn’t finalised the details. The next step would be to simulate the trades the program would have made historically, and see whether it would have made money. This is called a backtest, and it is an essential part of systematic trading.

The easiest way to do this badly is through inaccuracy: The backtest might not reflect how the strategy would actually have traded.

Inaccurate backtests

The most common mistakes to make in this scenario are the following:

  • Free rebalancing: The backtest allows the strategy to stay at, for instance, $1M long on Apple from day to day without incurring trading costs. You might think that this would not make much difference, but it does. It gives the strategy free access to a mean-reversion alpha (if the price goes up, the strategy has to sell stock to rebalance, and vice versa), and will make money if the strategy is long on average. Unless you treat rebalancing costs accurately, your backtest won’t be accurate. For the more technical reader, note that a simple regression is equivalent to taking a free-rebalancing strategy’s z-score, so treat simple regressions with suspicion.
  • Forward looking: The backtest uses data that would not be available until after the trade that depends on it. This is normally just a programming bug.
  • The universe was chosen based on current observables. For instance, picking the current largest 100 stocks as your universe. None of the current largest 100 stocks have suffered a calamitous collapse in the last few years, or they wouldn’t be in the largest 100, so you’ve post-selected a biased universe. This will make your strategy look like it would have made money when it wouldn’t have. This is a form of forward looking.
  • T+0: Backtests often use data available at the close of day x to trade at the close price of day x. This sounds like just a tiny change from an intraday strategy that trades with a 5-minute delay, but a lot of strategies appear to work at T+0 and not with better assumptions. If the strategy isn’t evaluated with at least the delay that the trading strategy would have, then its predicted performance is not valid.
  • The cost model is wrong. People tend to underestimate trading costs. Often, the trading costs are assumed to be negligible, or it is assumed that once they’ve found a strategy that works, the way it trades can be manipulated to minimise the costs. That is comparable to perfecting the way that you drive 42km, and assuming that that’s a valuable way to prepare for a marathon. The only exception to that rule is ultra-long-term strategies.

These should be avoided, since they’re all problems that can be solved with care. Please note that they all normally make the backtest appear better than real life, not just different.
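A toy backtest skeleton that avoids the free-rebalancing and T+0 pitfalls above might look like this. The signal, cost level and data are all made up; this is a sketch of the bookkeeping, not a real strategy:

```python
import numpy as np

rng = np.random.default_rng(3)
n_days = 500
returns = rng.normal(0.0005, 0.01, n_days)   # the traded asset's daily returns

# Toy signal: sign of the trailing 5-day average return (momentum-ish).
# Using the trailing average avoids looking forward in time.
trailing_ma = np.convolve(returns, np.ones(5) / 5, mode="full")[:n_days]
signal = np.sign(trailing_ma)

cost_rate = 0.001                            # 10bp paid per dollar traded
positions = np.zeros(n_days)
positions[1:] = signal[:-1]                  # fills happen one day later: no T+0
traded = np.abs(np.diff(positions, prepend=0.0))  # rebalancing is not free
pnl = positions * returns - traded * cost_rate    # daily P&L after costs
total_pnl = float(pnl.sum())
```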

What a backtest gives us

A backtest provides the reader with a graph of profit/loss (P&L) and other statistics relating to the strategy’s trading patterns, risk profile and performance. The most important statistic, though, is the level of statistical confidence that the average daily profit, after costs, is positive.

This concept can catch people out. It is not enough that the strategy had a positive average daily profit over the backtest, because the backtest P&L might have just been lucky. Instead we use the average daily profit divided by its uncertainty (standard error), and call that a z-score.
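In code, the statistic is just the mean daily P&L divided by its standard error. The P&L numbers below are invented, and a real calculation would use a year or more of days:

```python
import math

daily_pnl = [1200.0, -800.0, 300.0, 2500.0, -400.0, 900.0, -1500.0, 1100.0]
n = len(daily_pnl)
mean = sum(daily_pnl) / n                              # average daily profit
var = sum((x - mean) ** 2 for x in daily_pnl) / (n - 1)
std_error = math.sqrt(var / n)                         # uncertainty of the mean
z_score = mean / std_error
```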

The z-score: why P-values are normally wrong

This z-score is often converted to a probability value, which can be interpreted as the probability that a similar but alpha-neutral strategy would do better. However, the conversion is usually done using dodgy assumptions. The usual way to calculate it is to use “normal” or “Gaussian” statistics to do the conversion (and to assume no correlation, among other things). However, this is not the only way to convert from a z-score to a P-value, and if you use a more accurate way, you’ll get a different value, as shown on the right.

Basically, if someone converts from a z-score to a P-value using the middle thermometer (normal distribution), they’re simply wrong. If they use a different thermometer (there are lots), then you have to get stuck into all the assumptions and distributions that they’re using to make that conversion.
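To make the different-thermometers point concrete, here is the same z-score converted to a one-sided P-value under a normal assumption and under a fatter-tailed Laplace assumption. The Laplace is one plausible alternative for illustration, not necessarily the distribution used for the figure:

```python
import math

def p_normal(z: float) -> float:
    """One-sided P-value under a standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def p_laplace(z: float) -> float:
    """One-sided P-value under a unit-variance Laplace distribution."""
    return 0.5 * math.exp(-z * math.sqrt(2.0))

z = 3.0
# The fatter-tailed 'thermometer' assigns a much larger probability to the
# same z-score: roughly 0.7% under the Laplace versus 0.13% under the normal.
```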

The z-score game:
There are a few pictures below. Each is split into three. The first is a raw image. Either the 2nd or 3rd will also contain the raw image, but both will contain a lot of noise on top. Your task is to guess which has the raw image in it. We’ll start off with an easy example, with a z-score of 36.6.


You probably guessed correctly: the middle image has the attractor in it, the rightmost is just noise. Now try for these slightly harder examples:





This problem is not easy. In fact, for this task, it’s fairly likely that you got it wrong. The correct answers are at the end of the blog post. The challenge is slightly unfair: the kind of noise that I’ve added makes large z-scores somewhat more likely than the Laplace distribution would (right thermometer). But a non-technical reader should be able to see that if the noise is badly behaved, then surprisingly large z-scores can happen fairly often by chance.

So, to conclude the last two sections, P-values are normally wrong, and z-scores can be quite noisy: a z-score of 1.5 doesn’t necessarily mean anything.

The importance of this is that the z-score is both our ranking system and our measure of certainty that the strategy will make money.

In-sample and Out-sample.

Most strategies will not be optimal from the start, so most research programs allow for some tweaking of the strategy. For instance, we might have a threshold for which we would try five different values, a holding period with four different possibilities, and a couple of other options, like take-profit / stop-loss conditions and three cap-weighting scenarios. The strategy doesn’t seem to work that well for our initial idea, so we try a few others. We have now tried the equivalent of almost 1,000 different combinations. Using our image-hidden-in-noise example, we can also try 1,000 noise layers, select the one with the highest z-score, and we are then faced with the following figure:
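The selection effect behind this can be sketched directly (pure Python; 1,000 variants, each with zero true edge, chosen only for illustration): keep the best-looking backtest and its z-score is impressive by construction.

```python
import math
import random
import statistics

def naive_z(series):
    """Average daily profit divided by its standard error."""
    n = len(series)
    return statistics.fmean(series) / (statistics.stdev(series) / math.sqrt(n))

rng = random.Random(1)
days, variants = 252, 1000

# Every variant is pure noise: no strategy here has any real edge.
best_z = max(naive_z([rng.gauss(0.0, 1.0) for _ in range(days)])
             for _ in range(variants))
print(f"best z-score out of {variants} zero-edge variants: {best_z:.2f}")
```

The expected maximum of 1,000 independent standard-normal z-scores is around 3.2, a value that would look highly significant if you forgot how many combinations you tried.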


So, which is the “correct” one? Well, we couldn’t tell before, and we can’t really tell now. It might even be that neither has the mysterious attractor in it. If we use the z-score to gauge how likely it is that they contain the attractor, it says “yes” to both: they’re both quite impressive z-scores. The problem is that we know they only appear to have a high z-score because we kept looking until we saw something that appeared to be good.

For this reason, an in-sample/out-sample approach is taken, where the date range is split in two: the first half of the date range is chosen to be the in-sample period. The strategy is optimised in that period. Later, when we want a final evaluation to see whether we would trust our strategy to make money, we evaluate its performance in the second half. Since we didn’t fit to the second half, we can’t have overfit to it either.

A common mistake here, though, is to use the out-sample period several times: we try one strategy in the in-sample period, and it appears to do quite well, but doesn’t work out-sample. In that case, the strategy is often abandoned, and the researcher goes back to the drawing board. A different idea is then chosen, and the research program is repeated. The problem is that if the second and third research programs all use the same out-sample period, then we’ve overused it. We might get this figure if we tried five research programs:
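A sketch of the overuse problem (pure Python; the numbers of variants and programs are illustrative only): each research program optimises zero-edge variants in-sample and then scores its winner on the same out-sample period, so we end up implicitly taking the best of five out-sample z-scores.

```python
import math
import random
import statistics

def naive_z(series):
    n = len(series)
    return statistics.fmean(series) / (statistics.stdev(series) / math.sqrt(n))

rng = random.Random(2)

def research_program(variants=200, days=252):
    """Pick the variant with the best in-sample z-score, then report the
    winner's z-score on the (shared) out-sample period."""
    best_in, winner_out = -math.inf, 0.0
    for _ in range(variants):
        in_pnl = [rng.gauss(0.0, 1.0) for _ in range(days)]
        out_pnl = [rng.gauss(0.0, 1.0) for _ in range(days)]
        z_in = naive_z(in_pnl)
        if z_in > best_in:
            best_in, winner_out = z_in, naive_z(out_pnl)
    return winner_out

out_zs = [research_program() for _ in range(5)]
print(f"out-sample z-scores: {[round(z, 2) for z in out_zs]}")
print(f"best of five: {max(out_zs):.2f}")
```

Each winner’s out-sample z-score is an honest standard-normal draw, but reporting only the best of five quietly applies the same selection effect to the period that was supposed to stay untouched.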


The key point is that they both look really good (even though at least one isn’t). Taken at face value, we would trade either strategy with these z-scores.

Another quick point: strategies can simply stop working, which means that even good statistics, properly done, don’t guarantee that the strategy will keep working.


It is very important to backtest accurately if we want to avoid wasting time and money on strategies that don’t work in the real world. Backtests give you a z-score, which is a measure of certainty that the strategy makes money. The z-score is often converted to a P-value, and the conversion is almost always done wrong: z-scores have a much larger random range than −1.5 to +1.5. If you try to optimise your strategy, be aware that you won’t be able to tell which variant is better, and that the optimised strategy will appear better than it is. For this reason, people split their date range into an in-sample period and an out-sample period. Unfortunately, it’s almost impossible to avoid overusing your out-sample period, which invalidates it.

The good news:

There are several respects in which the picture has been painted as more bleak than it is.

The main one is that how unreliable a z-score is depends on the type of noise you get, and in finance the noise (for instance, the performance of a well-placed bet that turns out badly) isn’t as bad as presented above. It depends on three things: how correlated the noise is (very correlated in the pictures above, but not that correlated from one day to the next in finance); how fat-tailed the noise is (not very in the example above, but quite fat-tailed in finance); and how much volatility clustering you have (plenty, both here and in strategy P&Ls). The z-scores in the images above tend to be wrong by around 4 either way, whereas in finance it’s probably more like 2. The right-most (Laplace distribution) thermometer in the top graph is probably about right, though.

The second is that even if it’s not possible to be certain that a strategy makes money, a high z-score is still correlated to some extent with strategy performance: a z-score of 3 certainly doesn’t guarantee that the strategy will make money, but it’s more likely to make money than a strategy with a z-score of 0.
To reiterate a point from the previous blog post, it all comes down to the economics: if a strategy has a sound reason behind it, then that counts as additional evidence that it will make money. You should still do backtests, and you should still pay attention to the results, but you should have both in order to have a well-informed view of your chances.


Answers to the z-score game: a: left, b: left, c: left, d: left, e: right, f: right, g: right

Systematic trading is a great way to practice financial statistics. However, it is not an easy way to make money.

The idea is alluring because it seems to depend only on being able to program and having a good idea, neither of which is that hard. Unlike manual trading, a computer strategy isn’t affected by bad judgment or emotion, and it can process far more data than a human can, at least in terms of the sheer number of numbers it can look at. The main point, though, is that its historic performance can be examined before deciding to invest, which should at least be an indication of the performance it will realise when the strategy is live.

Although systematic trading has these advantages, it is best to start any research program with a realistic impression of the difficulties that will haunt it, which is why this post will go through a few of the problems that you might hit when doing this.

Orthonormal basis sets

Orthonormal basis sets are a sophisticated way to decompose market returns into trading signals. However, sophistication like this isn’t new, and on its own it’s not likely to work for you.

Before fast computers with good internet connections were commonplace, the people who had the capability to trade systematically really did find it comparatively easy to make money. Back then, there were lots of correlations that were reasonably predictable and could be monetised. The problem was that as more and more time and money went into the industry, the easier and then the not-so-easy strategies became saturated, and the opportunities went away. It is a general rule that if people make money from a strategy and have enough capital, they’ll saturate the strategy until they can’t make more money by putting more capital into it. Then if someone else comes in, they obviously won’t make money off the strategy unless they’re better at exploiting it.

This leads to two inescapable conclusions: to make money from a particular strategy, you need to be either better than the other people looking at it, or the only person trying it. “Better” here means that you have better, faster data, more robust IT systems, better risk models and risk controls, better cost models and controls, and more experienced people helping you. In that case, you are playing a different game, and this article will not be as relevant to you.

For everyone else, your best hope is to pick a strategy that nobody else is using. This means either using new data that nobody else is using, or using commonly used data in a new way.

New Data

Using new data is actually a good recommendation. There’s as much data about as there is stuff in the world, and new data could be anything from the results of phone surveys to weather forecasts. You can even buy and use data on which private jet landed where, and put that into a strategy. The problem is that some data isn’t that useful, some data is expensive, and some data is time-consuming and fiddly to use. Some data appears to be new, but on closer examination turns out to be surprisingly well used in other forms: news sentiment is popular at the moment, but it’s wise to remember that it gives you less information, not more, than the original news feeds, which are actively used for trading anyway. Following the holiday movements of CEOs around is all very well, but there will be portfolio managers who know the CEOs personally, and they’ll have far more information than a timeseries of airports.

But there is genuine potential in new data, if the data bears up to scrutiny.

Old Data

A well-known approach is to look at price graphs to try to see patterns or things that will help you guess where the price is going. But prices, as well as volumes, debt, earnings, profits, and many more data sets make up a common body of data that a very large number of quant houses have access to, and use as much as they can. For this kind of data, you need to do something that both (a) nobody else is doing with the data, and (b) has an economic reason for why it works (otherwise it’ll just be a spurious correlation, and it’ll die away).

(a) You should have reason to think that nobody is running the same strategy with the same data. The strategy should be truly original, and not just a really complicated way of doing what’s been done before. A more complicated strategy might have lots of little catches, for instance having different behaviour if the volume is high and the price is high, unless the company is heavily indebted. Or the complication might be mathematical sophistication: you could convolve hourly returns with vectors taken from the hydrogenic wave functions and regress the resulting signals against the forward return. You might be right that few people have tried exactly that, but you shouldn’t assume that the space of linear kernels applied to prices hasn’t been thoroughly explored, because it has. Either way, just taking an obvious strategy and adding complexity probably won’t make it all that different, yet it will make it harder to exclude spurious statistical signals: your search space will be larger, so the number of spurious false positives will be larger.
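As a toy illustration of that last point (pure Python; the random kernels are my own stand-in for any family of sophisticated basis vectors), searching even a modest space of linear kernels over pure-noise “returns” produces a best in-sample correlation that is entirely spurious:

```python
import math
import random

rng = random.Random(3)
returns = [rng.gauss(0.0, 1.0) for _ in range(500)]  # pure noise, no signal

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def signal(kernel, t):
    """Convolve the kernel with the trailing window of returns."""
    return sum(k * returns[t - i - 1] for i, k in enumerate(kernel))

best = 0.0
for _ in range(200):  # a tiny search space by industry standards
    kernel = [rng.gauss(0.0, 1.0) for _ in range(10)]
    xs = [signal(kernel, t) for t in range(10, len(returns) - 1)]
    ys = [returns[t + 1] for t in range(10, len(returns) - 1)]
    best = max(best, abs(pearson(xs, ys)))
print(f"best |in-sample correlation| found in pure noise: {best:.3f}")
```

Nothing here predicts anything, yet the search still hands back a “best” kernel; a larger search space only makes the winning spurious correlation look stronger.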

Combining several data sets is another way to come up with something new, but the same effect that means that there are explosively more strategies out there using multiple data sets also makes it explosively harder to be sure that your combination isn’t spurious. And the economically obvious combinations will all be extensively looked at already, as well as a fair few which aren’t economically obvious.

(b) The strategy has to be economically rational. If there is no reason behind it, it won’t work. Prices move because of information and people trading. There’s no secret black box somewhere containing a weird set of rules waiting to be uncovered. If prime-numbered days of the year seem to always have positive returns, it only seems that way because of a coincidence. If you could get complete statistical certainty that an economically unusual strategy works, then there would be reason to believe it works; but if that happens, it is probably because of messed-up statistics: it’s very difficult to get statistical certainty in finance. Achieving statistical confidence in finance will be a topic for another blog post, and the topic is very important. In some ways it’s the main reason that it’s hard to find new working strategies on old data: it might be argued that a given set of data contains only a small number of orthogonal strategies that can be shown to work with statistical significance, and as you open up your search space you will find rehashes of simpler strategies, along with more and more spurious ones.

It’s not all bad though:

This has described the initial difficulties in finding a data set on which to base a good strategy. There are factors which make it slightly less bleak than it could be:

1) There are lots of potential strategies that won’t be fully exploited if they’re too small for the big fish to care about. A strategy that only makes a few tens of thousands of pounds a year won’t be interesting to many quant houses.

2) The turnover is high: People come and go. Hedge funds come and go. People will stop looking at a given strategy for more reasons than just that it doesn’t make money any more.

3) Quants don’t talk to each other as much as people do in other industries. And that guardedness means that there isn’t a vast body of knowledge on which a new hedge fund can build. This is good news for small fish, since it means that they’re not starting the race with much of a handicap, at least intellectually.


In conclusion, systematic strategies have to be based on either new data or genuinely new methods, both of which pose their own problems. This is subtly different from good old human decision-making. People have access to a wealth of information not available to a computer: they have seen the new smartphone with the same eyes that will choose whether or not to buy it, and so can judge whether it will be a success. People also understand the underlying factors that make a company successful far better than any current computer system. It is very likely that most good human stock pickers will have a genuine economic insight that, when informed by the right data and analytics, is both rational and somewhat unique. The crucial difference is that they use a much more diverse set of data for each decision, and they don’t choose how to combine the data based on statistical inferences between the data and previous price movements, but rather understand the significance of each data point in isolation.

Hugo Martay