This post is less focused on finance and more on software development.
At OTAS Technologies, we invest considerable time in making good software-development decisions. A poor choice can lead to thousands of man-hours of extra labour. We make a point of having an ecosystem that can accommodate several languages, even if one of them has to be used to glue the rest together. Our current question is how much Haskell to embed into our ecosystem. We have some major components (Lingo, for example) in Haskell, and we will continue to use and enjoy it. However, it is unlikely to be our “main” language for some time. So its place in our future is largely guaranteed, but we are still pacing out the boundaries of its role.
We started off assuming that Python was going to simplify life. It was widely used and linked to fast C libraries reasonably well. This worked well at first, but there were significant disadvantages, which led us away from Python.
Python
Python’s first disadvantage was obvious from the start: Python is slow. In our experience it is several hundred times slower than C#, Haskell and similar languages unless you can vectorise the entire calculation. People coming from a Matlab or Python background would say that you can vectorise almost everything. In my opinion, if you’ve only used Matlab or Python, you tend to shy away from the many useful algorithms that can’t be vectorised. We find that vectorising often works, but in any difficult task there’ll be at least something that has to be rewritten as a for-loop. It is true, however, that you can work around this problem by vectorising what you can, and using C (or just learning patience) for what you can’t.
Python’s second disadvantage, though, is that large projects become unwieldy because Python isn’t statically typed. If you change one object, you can’t be sure that the whole program still works. And when you look at a difficult piece of code, there’s often nowhere you can go to see what an object is made of. An extreme example came from code that fetched an object from a database and then worked on it. The object’s structure was determined by the table structure in the database, which made it impossible to check the code for correctness without comparing it to the tables in the database. This unwieldiness gets worse if you have C modules: having some functionality separated from the rest of the logic simply because it needs a for-loop makes the code hard to follow.
Consequently, we now use Python only for really quick analysis. An example is loading a JSON file and quickly plotting a histogram of market caps. For this task, it’s still good: there’s no danger of the code getting too complicated, nobody needs to read it later, and speed generally isn’t a problem.
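As an illustration (a hypothetical example, not one of our production algorithms), consider a trading signal with hysteresis: go long when the price rises above an upper threshold, and go flat again only when it falls below a lower one. Each output depends on the previous output, so there is no elementwise formula to vectorise; it is inherently a loop. A minimal Haskell sketch, where the thresholds and the 0/1 position encoding are assumptions, writes that loop as a left scan:

```haskell
-- Hypothetical two-state signal: 1 = long, 0 = flat.
-- Go long when the price rises above `upper`; go flat again only when it
-- falls below `lower`. Each position depends on the previous position,
-- so there is no elementwise formula: it is inherently a loop.
hysteresis :: Double -> Double -> [Double] -> [Int]
hysteresis lower upper = drop 1 . scanl step 0
  where
    -- One iteration of the loop: yesterday's position and today's price in,
    -- today's position out.
    step :: Int -> Double -> Int
    step pos price
      | pos == 0 && price > upper = 1
      | pos == 1 && price < lower = 0
      | otherwise                 = pos
```

Loaded into GHCi, `hysteresis 9 11 [10, 12, 10, 8, 10, 12]` evaluates to `[0,1,1,0,0,1]`. In NumPy this would be a Python for-loop over the prices (or a C extension); in a compiled language the loop itself is not the bottleneck.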
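By way of contrast, here is a minimal Haskell sketch of the database case. The `Stock` type, its fields and the column names are hypothetical, and a row is modelled simply as a list of column-name/value pairs rather than any real database driver's API. The structure of the object is written down once, in the code, and the only place that depends on the table layout is the parsing function:

```haskell
import Text.Read (readMaybe)

-- The shape of the object is declared once and checked by the compiler.
data Stock = Stock
  { ticker    :: String
  , marketCap :: Double  -- hypothetical units, e.g. USD
  } deriving (Show, Eq)

-- Hypothetical raw row: column name -> text value.
type Row = [(String, String)]

-- The only code that knows the table layout. A missing or malformed
-- column gives Nothing here instead of failing deep inside the logic.
parseStock :: Row -> Maybe Stock
parseStock row = do
  t <- lookup "ticker" row
  c <- lookup "market_cap" row >>= readMaybe
  pure (Stock t c)
```

If a field of `Stock` is renamed, every stale use of it becomes a compile error. The mismatch with the database can still happen, but it is confined to `parseStock`.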
The comparison process
Over the last week, though, we have been comparing Haskell to C#. Haskell is an enormously sophisticated language that has, somewhat surprisingly, become easy to use and set up, thanks to the efforts of the Haskell community (special thanks to FPComplete). However, it’s a very different language from C#, and this makes the two hard to compare. There is also a temptation not to compare languages at all, because the comparison can decay into a flame war. If you google “X vs Y”, where X and Y are languages, you come across a few well-informed opinions and a lot of badly informed wars.
There are several reasons for this, in my opinion. Often the languages’ advantages are too complex for people to see clearly what they are and how much they matter, and each language has some advantages and some disadvantages. Many of the comparisons are entirely subjective, so two people can have the same information and still disagree. The choice is often a psychological question about which language is clearer (for both the original developer and the maintainer a year later) and a sociological question about how the code will fit a team’s development habits.
Another difficulty is that most evaluators are invested in one language or the other: nobody wants to be told that their favourite language, which they have spent five years using, is worse. We even judge ourselves by how good we are at language X and how useful that is overall. A final difficulty is the peer pressure, stigma and kudos attached to various languages.
So what’s the point? The task seems so difficult that perhaps it’s not worth attempting.
The point is that it’s the only way to decide between languages. Throughout the history of science and technology, people have compared competing technologies. Sure, the comparisons are difficult, but you can’t avoid making them, because the conclusions are valuable. The important thing is to remember that the comparison won’t be final, that mistakes can be made and that it will be somewhat subjective; none of that makes it pointless.
Haskell
As a disclaimer, I am not a Haskell expert. I have spent a couple of months using Haskell, and I gravitate towards the lens and HMatrix libraries. I have benefited from sitting beside Alex Bates, who is considerably better at Haskell than I am.
Haskell has a very sophisticated type system, and the language is considered to be safe, concise and fast. I suspect that I could write a lot of my code more generally in Haskell than I could in C#: I often find that my C# code is tied to a specific implementation. It doesn’t have to be (I could make heavy use of interfaces), but my impression is that Haskell is better suited to writing very general code.
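As a small sketch of what “general” means here (the function is illustrative, not taken from our code), this `mean` is written against the `Foldable` and `Fractional` type classes rather than a concrete collection, so the same definition accepts a list, a `Maybe`, a `Data.Map` or any other foldable container of any fractional number type:

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Foldable (foldl')

-- Works for any Foldable container (list, Maybe, Data.Map, ...) holding any
-- Fractional number type, instead of one concrete collection type.
mean :: (Foldable t, Fractional a) => t a -> Maybe a
mean xs
  | n == 0    = Nothing
  | otherwise = Just (total / fromIntegral n)
  where
    -- One strict pass computing the running sum and count; the bang
    -- patterns stop unevaluated additions from piling up.
    (total, n) = foldl' step (0, 0 :: Int) xs
    step (!s, !k) x = (s + x, k + 1)
```

`mean [1, 2, 3, 4 :: Double]` gives `Just 2.5`, and an empty container gives `Nothing` rather than a division by zero.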
Haskell also has an advantage when you want to use only pure functions: its type system makes it explicit when you are using state and when you are not. However, in my experience, unintended side effects are actually a very minor problem in C#. Sure, a function that claims to estimate the derivative of a spline curve *might* be using system environment variables as temporary variables. But probably not. If you code badly, you can get into a mess, but normal coding practices make it rare (in my experience) for state to actually cause problems. Haskell’s purity guarantees can, in theory, help the compiler, and may pay dividends when parallelising code, but I don’t know from personal experience. I reject the idea that state is inherently bad: a lot of calculations are simpler when expressed using state. Of course, Haskell can manipulate state if you want it to, but at the cost of (in my opinion) slightly more complexity and, in the three or so examples that I’ve looked at, slightly longer code.
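To illustrate how the type system makes state explicit, here is a minimal sketch of one calculation written twice (the function names are illustrative): once as a pure fold, and once with a mutable accumulator in the `ST` monad from `base`. Both have the same pure type, because `runST` guarantees that the mutable reference cannot escape; a version that touched the outside world would have to say so with an `IO` result type:

```haskell
import Control.Monad.ST (runST)
import Data.Foldable (foldl', forM_)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- Pure version: the type [Double] -> Double promises no side effects.
sumSquaresPure :: [Double] -> Double
sumSquaresPure = foldl' (\acc x -> acc + x * x) 0

-- Stateful version: a mutable accumulator updated in a loop. runST seals
-- the state inside, so from the outside this is still a pure function.
sumSquaresST :: [Double] -> Double
sumSquaresST xs = runST $ do
  acc <- newSTRef 0
  forM_ xs $ \x -> modifySTRef' acc (+ x * x)
  readSTRef acc

-- A version that read environment variables or files would need an IO
-- type, e.g. [Double] -> IO Double, visible to every caller.
```

Both `sumSquaresPure [1, 2, 3]` and `sumSquaresST [1, 2, 3]` return `14.0`; the stateful version is slightly longer, which matches the trade-off described above.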
The extra complexity and loss of conciseness of stateful code is something that better-designed libraries might overcome. Speed, on the other hand, is often a pleasant surprise: Haskell is frequently faster than non-Haskell users expect (and the surprise is often accompanied by having to admit that the Haskell way is, in fact, faster). This advantage is possibly accentuated for the highest-level code: my impression is that lambda functions and the Haskell equivalents of LINQ expressions carry less speed overhead in Haskell than they do in C#.
Another advantage of Haskell is safety: null reference exceptions are hard to come by in Haskell, and some other runtime errors are eliminated too. However, you can still get a runtime exception in Haskell from other things (like taking the head of an empty list, or asking for element -1 of an HMatrix array). Exception handling is also currently less uniform (I think) than in C#, and possibly less powerful, so again, there is room for uncertainty.
Some disadvantages of Haskell seem to be that you can accidentally ruin performance (for instance, by using lists in a way that turns an O(N) algorithm into an O(N^2) one) or give yourself stack overflows (by doing recursion the wrong way). However, I know that these are considered beginner problems, not serious ones. When I was using Haskell for backtests, I quickly settled on a numerics library and a way of doing things that didn’t in fact have either of these problems. Even so, when I look over online tutorials, it’s remarkable how many of the examples don’t scale up to N = 10^6 because of stack overflows.
For me, perhaps the most unambiguously beneficial Haskell example was producing HTML in a type-safe way. A set of functions is defined that forms a domain-specific language. This library is imported, and the remaining Haskell code looks essentially like a variant of HTML, except that it is checked at compile time and you have all of Haskell’s tools at your disposal. You could do that in C#, but it would not look pretty, and as far as I know, it’s not a common way to embed HTML in C#.
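A minimal sketch of the usual way around partial functions such as `head`: put the failure case into the type with `Maybe`, so the compiler makes the caller deal with it. `safeHead` and `openingPriceOr` are illustrative names; `base` already provides the first as `Data.Maybe.listToMaybe`:

```haskell
import Data.Maybe (fromMaybe)

-- `head []` type-checks but throws an exception at runtime. Returning Maybe
-- moves the empty case into the type, so every caller must handle it.
-- (base already provides this function as Data.Maybe.listToMaybe.)
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

-- Hypothetical caller: it has to decide what an empty price series means,
-- here by falling back to a supplied default.
openingPriceOr :: Double -> [Double] -> Double
openingPriceOr def prices = fromMaybe def (safeHead prices)
```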
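The two beginner traps can be sketched side by side; the function names are hypothetical and the point is the shape of each fold. Appending to the end of a list with `(++)` copies the accumulated list each time, so collecting N elements costs O(N^2), while prepending with `(:)` and reversing once is O(N). Similarly, a lazy `foldl` can build a chain of N unevaluated additions before computing anything, which is the classic source of stack overflows in tutorial code; `foldl'` forces the accumulator at each step:

```haskell
import Data.Foldable (foldl')

-- O(N^2): each (++) copies the whole accumulated list to append one element.
collectSlow :: [Int] -> [Int]
collectSlow = foldl (\acc x -> acc ++ [x]) []

-- O(N): prepend with (:) in O(1), then reverse once at the end.
collectFast :: [Int] -> [Int]
collectFast = reverse . foldl' (flip (:)) []

-- Lazy left fold: can build a chain of N unevaluated (+) thunks before
-- anything is added up, which is how tutorial code runs out of stack.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- Strict left fold: forces the running total at every step.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0
```

Whether `sumLazy [1 .. 10^6]` actually overflows depends on the GHC version, optimisation flags and runtime stack settings (recent GHCs allow a much larger stack than older ones, and strictness analysis under `-O` often rescues simple cases), so treat it as a pattern to avoid rather than a guaranteed crash.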
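A toy version of the idea, using nothing beyond `base`. This is not the API of the library referred to above; real libraries such as blaze-html and lucid are far more complete (attributes, doctypes, efficient rendering). Each tag is a Haskell function, so a misspelled tag or a plain string passed where markup is expected is a compile-time error, ordinary Haskell such as `map` builds the page, and text is escaped when rendered:

```haskell
-- A toy HTML tree: an element with a tag name and children, or plain text.
data Html = Element String [Html] | Text String

-- Escape the characters that would otherwise be read as markup.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Tag combinators: a typo such as `bdy_` does not compile.
html_, body_, h1_, ul_, li_ :: [Html] -> Html
html_ = Element "html"
body_ = Element "body"
h1_   = Element "h1"
ul_   = Element "ul"
li_   = Element "li"

text_ :: String -> Html
text_ = Text

render :: Html -> String
render (Text s) = escape s
render (Element tag children) =
  "<" ++ tag ++ ">" ++ concatMap render children ++ "</" ++ tag ++ ">"

-- Hypothetical page: ordinary Haskell (map, function composition) builds it.
page :: [String] -> Html
page tickers =
  html_
    [ body_
        [ h1_ [text_ "Watchlist"]
        , ul_ (map (li_ . pure . text_) tickers)
        ]
    ]
```

`render (page ["VOD", "A&B"])` produces `<html><body><h1>Watchlist</h1><ul><li>VOD</li><li>A&amp;B</li></ul></body></html>`.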
But we are just starting the comparison process, and it will be interesting in the coming days, months and years to find out exactly what the strengths and weaknesses of this emerging technology are.
In a future blog post, we’ll write about the areas in which each language wins and loses. For now, though, it’s better to leave the comparison unconcluded.