Archives

All posts by Alex Bates

OTAS stamps are one of the most recognisable ways in which we visually represent our data. We generate stamps for both single-stock and list data across many metrics, always seeking to present the most important information clearly and concisely. Currently, our stamps are handwritten in the SVG vector graphic format. In this post, I’m going to explore recreating one of our stamps in Haskell using the diagrams framework, leveraging the expressiveness and reuse that come with the abstraction as much as possible. This is not a diagrams tutorial; please refer to the manual for a comprehensive introduction.

OTAS single-stock stamps

I’m going to recreate one of our simpler stamps, the Signals stamp. This is for simplicity: I am confident diagrams is expressive enough to recreate all of our stamps. The Signals stamp displays the most recent close-to-close technical signal which has fired for the stock, showing the number of days ago the signal fired and the average one-month return the stock has experienced when this signal has fired in the past. Here’s an enlarged image of the stamp:

diagrams provides a very general Diagram type which can represent, amongst others, two-dimensional and three-dimensional objects. I’ll build up the stamp by combining smaller elements into a larger whole, focusing first on the core details contained in the inner rectangle: the row of boxes and lines of text. As we’ll see, the framework provides a rich array of primitive two-dimensional shapes with which to begin our diagram, and a host of combinators for combining them intuitively. Bear in mind that the Diagram type is wholly independent of the back-end used to render it: back-ends exist for software such as Cairo and PostScript as well as a first-class Haskell SVG representation provided by the diagrams-svg library.

First, let’s get some imports and definitions out of the way:

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE OverloadedStrings #-}

module Stamps where

import Diagrams.Prelude
import Diagrams.Backend.SVG
import Text.Printf

data SignalsFlag = SignalsFlag
  { avgHistReturn :: Double
  , daysAgo :: Int
  }

lightGrey
  = sRGB24read "#BBBBBB"
mediumGrey
  = sRGB24read "#AAAAAA"
darkGrey
  = sRGB24read "#999999"
brightGreen
  = sRGB24read "#40981C"
mud
  = sRGB24read "#101010"
steel
  = sRGB24read "#2F2F2F"

Our point of reference is Diagrams.Prelude in the diagrams-lib package (I’m using version 1.3), which exports the core types and combinators together with handy tools from lens and base. Additionally, we import Diagrams.Backend.SVG from diagrams-svg as our back-end. As I explained above, Diagrams is back-end agnostic: we import one here only to simplify the type signatures. The remainder of the above snippet defines a representation of a signal and a set of colours, read from a hexadecimal colour code.

Primitive shapes such as squares and rectangles, as well as combinations of these, have type Diagram B (the B type is provided by the back-end). Crucially, each diagram has a local origin which determines how it composes with other diagrams. As you may guess, regular polygons use, by default, the canonical origin, but this may not be as easy to determine for more complex structures.

The most basic combinators are (<>), (|||) and (===) and their list equivalents mconcat, hcat and vcat. (<>) (monoidal append) superimposes the first diagram atop the second (clearly diagrams form a monoid under such an operation), so circle 5 <> square 3 is a circle of radius 5 overlaid on a 3 by 3 square, such that their origins coincide (the resulting diagram has the same origin). (|||) combines two diagrams side-by-side, with the resulting origin equal to the origin of the first shape, while (===) performs vertical composition. These combinators are instances of the more general beside function, but they are sufficient here. Additionally, beneath is a handy synonym for flip (<>), and # is reverse function application, provided to aid readability.
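As a quick illustration of these combinators, here is a minimal sketch, assuming diagrams-lib and diagrams-svg (1.3 or later) are installed. The names overlaid, sideBySide, stacked and row, and the output file combinators.svg, are made up for this example.

```haskell
{-# LANGUAGE FlexibleContexts #-}

module Main where

import Diagrams.Prelude
import Diagrams.Backend.SVG

-- (<>): the circle is drawn atop the square, with their origins coinciding
overlaid :: Diagram B
overlaid = circle 5 <> square 3

-- (|||): horizontal composition; the origin is that of the circle
sideBySide :: Diagram B
sideBySide = circle 1 ||| square 2

-- (===): vertical composition, with the first diagram on top
stacked :: Diagram B
stacked = circle 1 === square 2

-- hcat is the list version of (|||)
row :: Diagram B
row = hcat (replicate 3 (square 1))

-- Render all four examples, stacked vertically, to an SVG file
main :: IO ()
main =
  renderSVG "combinators.svg" (mkWidth 400) $
    vcat [overlaid, sideBySide, stacked, row]
```

Running the module should write combinators.svg to the working directory; the widths and heights of the composed diagrams are simply the sums of their parts along the direction of composition.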

Here’s the function to generate the internals of the signals stamp. Remember, we’re using a SignalsFlag to generate a Diagram B, hence the type signature of signalsStamp.

signalsStamp :: SignalsFlag -> Diagram B
signalsStamp flag
  = vcat $
      [ strutY 1.2 `beneath` text perf
          # fontSizeL 1.2
          # fc white
      , strutY 1 `beneath` text "20d avg. perf."
          # fontSizeL 0.8
          # fc mediumGrey
      , strutY 0.5
      , boxRow
          # alignX 0
      , strutY 1 `beneath` text "days ago"
          # fontSizeL 0.8
          # fc mediumGrey
      , strutY 1 -- Padding for aesthetic
      ]
  where
    perf
      = printf "%+.2f%%" (avgHistReturn flag)
    boxRow :: Diagram B
    boxRow
      = hcat boxes
      where
        boxes
          = map (\k -> box k (k == daysAgo flag)) [5, 4 .. 1]
          where
            box n hasSignal
              = square `beneath` text (show n)
                  # fc white
              where
                square
                  = unitSquare
                      # lc lightGrey
                      # if hasSignal
                          then
                            fc brightGreen
                          else
                            id

Before I explain the code, here’s the resulting SVG:

The centre of the signals stamp

The top level of the function vertically concatenates (vcat) the four components of the diagram. Text is enclosed in strutYs, invisible constructs which represent empty space in the diagram, and which are also used for padding. Font size is adjusted relative to the enclosing strut with the fontSizeL function, so the size of the rendered text is a function both of the size of the strut and the font size. boxRow is a Diagram B representing the row of boxes and needs to be aligned with alignX to recentre the origin horizontally prior to vertical concatenation.

Constructing boxRow requires horizontally concatenating (hcat) five boxes labelled 5 through 1, with the box corresponding to the signal’s daysAgo field shaded green. The unitSquare primitive constructs a 1 by 1 square, which we fill with fc if the boolean hasSignal is true. Remember, to insert text into the square we superimpose the string onto it, here with beneath rather than (<>) so that the square can be written first. A simple map generating the box indices and hasSignal values finishes the construction of the row.

From this small example, I hope it’s clear that simple diagrams, at least, may be constructed in a clear, declarative way. Next, I want to focus on constructing the series of rectangles which will surround the above image, the bezel. This is where the higher-level representation of diagrams really shines: reusability. We can construct a function which will take any diagram, scale it to a uniform size, and then enclose it in our bezel. This could be reused for every stamp we create, with no code duplication. To implement this, we’ll create a function bezel :: String -> Diagram B -> Diagram B which will wrap the given diagram in a series of rectangles and add the given title:

bezel :: String -> Diagram B -> Diagram B
bezel title stamp
  = mconcat
      [ (title' === centre)
          # alignY 0
      , roundedRect (r + innerWidth + outerWidth + outerWidth') (1 + innerWidth + outerWidth + titleHeight + outerWidth') curvature
          # fc mud
          # font "Arial"
      ]
  where
    -- Ratio between width and height of stamp
    r
      = 0.8
    stamp'
      = stamp
          # scaleToY 1
          # scaleToX r
    innerWidth
      = 0.1
    outerWidth
      = 0.025
    outerWidth'
      = 0.075
    curvature
      = 0.075
    titleHeight
      = 0.15
    title'
      = strutY titleHeight `beneath` text title
          # fc darkGrey
          # fontSizeL titleHeight
          # font "Arial"
    centre
      = mconcat
          [ stamp'
              # alignY 0
          , roundedRect (r + innerWidth) (1 + innerWidth) curvature
              # fc steel
          , rect (r + innerWidth + outerWidth) (1 + innerWidth + outerWidth)
              # fc black
          ]

The core specification is given at the top level and in the definition of centre. centre encloses the scaled stamp (of aspect ratio r) in first a steel-coloured rounded rectangle and then a black regular rectangle. The scaling is important, as it allows the function to operate correctly regardless of the width and height of the diagram argument. Then, the title is vertically concatenated (===) to this object and superimposed onto a final mud-coloured rounded rectangle.

Here’s the result of bezel "Signals" mempty, the bezel enclosing the empty diagram:

The empty bezel

And here’s the final result, bezel "Signals" (signalsStamp $ SignalsFlag 2.06 5):

The final stamp

I’m very happy with this result, especially for such straightforward code. I haven’t been truly faithful to the dimensions of the original image, but they could be recreated with a little more effort. As I mentioned, our bezel definition is reusable across other diagrams, so to recreate our other stamps I would only need to focus on the core structure, logic which I’m confident would be straightforward with Diagrams. But Diagrams is capable of far more than this, and I encourage you to explore the manual for an in-depth introduction.

Free monads are quite a hot topic amongst the Haskell community at the moment. With a category-theoretic derivation, getting to grips with them can be rather daunting, and I have found it difficult to understand the engineering benefit they introduce. However, having finally come to grips with the fundamentals of free monads, I thought I’d write a brief introduction in my own terms, focussing on the benefit when designing interpreters and domain specific languages (DSLs). Certainly, the information described below can be found elsewhere, such as this Stack Exchange question or Gabriel Gonzalez’ excellent blog post.

The free monad interpreter pattern allows you to generate a DSL program as pure data with do-notation, entirely separate from the interpreter which runs it. Indeed, you could construct the DSL and generate DSL programs without any implementation of the semantics. Traditionally, your DSL would be constructed from monadic functions hiding the underlying interpreter, with these functions implementing the interpreter logic. For example, a DSL for a stack-based calculator might implement functionality to push an element to the stack as a function push :: Double -> State [Double] (). Our free monad pattern separates the specification from the implementation, so our calculator program would be represented first-class, which can be passed both to the ‘true’ interpreter, as well as any other we wish to construct.

Let’s stick with the example of constructing a DSL for a stack-based calculator. To begin with, we need to construct a functor representing the various operations we can perform.

data Op next
  = Push Double next -- ^ Push element to the stack
  | Pop (Maybe Double -> next) -- ^ Pop element and return it, if it exists
  | Flip next -- ^ Flip top two elements of stack, if they exist
  | Add next -- ^ Add top two elements, if they exist
  | Subtract next
  | Multiply next
  | Divide next
  | End -- ^ Terminate program
  deriving (Functor)

Our calculator programs will be able to push and pop elements from the stack and flip, add, subtract, multiply and divide the top two elements, if they exist. This definition may seem slightly strange: the datatype has a type parameter next which is included in each non-final instruction. In this way, each operation holds the following instructions in the program. The End constructor is special in that no instructions may follow it. Pop is special too: its field is a function, indicating that Maybe Double will be returned as a value in the context of the program (see below).

As I mentioned, Op needs to be a functor, which we derive automatically with the DeriveFunctor language extension. This will always be possible with a declaration of this style because there is exactly one way to define a valid functor instance.
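To make the functor instance concrete, here is a sketch of what one would write by hand for a cut-down version of Op (three constructors only, to keep it short). The describe helper is hypothetical and exists only so the result can be printed; the actual code GHC derives may differ in form.

```haskell
module Main where

-- Cut-down version of Op, without the deriving clause
data Op next
  = Push Double next
  | Pop (Maybe Double -> next)
  | End

-- fmap transforms the 'rest of the program' held in each constructor
instance Functor Op where
  fmap f (Push x next) = Push x (f next)
  fmap f (Pop k)       = Pop (f . k) -- post-compose with the continuation
  fmap _ End           = End         -- nothing to transform

-- Hypothetical helper so that a value can be inspected
describe :: Op String -> String
describe (Push x next) = "Push " ++ show x ++ " then " ++ next
describe (Pop k)       = "Pop then " ++ k Nothing
describe End           = "End"

main :: IO ()
main = do
  putStrLn (describe (fmap show (Push 4 (1 :: Int))))
  putStrLn (describe (fmap (maybe "nothing" show) (Pop id)))
```

Each clause touches only the next field, which is why there is no real choice in how the instance is written.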

As it stands, programs of different lengths will have different types, due to the type parameter next. For example, the program End has type Op next (next is a free type parameter here) while Push 4 (Push 5 (Add End)) has type Op (Op (Op (Op next))). Note that these look suspiciously like lists, the elements of which are Op’s constructors. The free monad of a functor encodes this list-like representation while also providing a way to lift a value to the monadic context. For any functor f, the free monad Free f is the datatype data Free f a = Pure a | Free (f (Free f a)). Notice the similarity to the definition of list.
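For the curious, the datatype and its instances can be written out from scratch in a few lines. The following is a sketch rather than the code of the free package, which provides the real Free and liftF in Control.Monad.Free. The one-instruction Tell language and its runTell interpreter are hypothetical, included only to exercise the instances.

```haskell
{-# LANGUAGE DeriveFunctor #-}

module Main where

-- The free monad of a functor f, as defined in the text
data Free f a
  = Pure a
  | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Free fa) = Free (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g  <*> x = fmap g x
  Free fg <*> x = Free (fmap (<*> x) fg)

-- Bind walks to the end of the 'list' and grafts the continuation there
instance Functor f => Monad (Free f) where
  Pure a  >>= k = k a
  Free fa >>= k = Free (fmap (>>= k) fa)

-- Lift a single instruction into the free monad
liftF :: Functor f => f a -> Free f a
liftF = Free . fmap Pure

-- A one-instruction language (hypothetical, for illustration)
data Tell next = Tell String next
  deriving (Functor)

tell :: String -> Free Tell ()
tell s = liftF (Tell s ())

-- An interpreter that collects the messages and the final value
runTell :: Free Tell a -> ([String], a)
runTell (Pure a) = ([], a)
runTell (Free (Tell s next)) =
  let (ss, a) = runTell next in (s : ss, a)

main :: IO ()
main = print (runTell (tell "a" >> tell "b" >> pure (42 :: Int)))
-- prints (["a","b"],42)
```

Pure plays the role of the empty list carrying a result, and Free plays the role of cons, which is the list-like structure described above.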

Suffice to say, Free f is a monad. I don’t want to dwell on the mathematical significance of Free or its application beyond interpreters: I’d rather focus on how they are used to define DSL programs using do-notation.

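-- Free and liftF are from Control.Monad.Free (free package). The names flip
-- and subtract below clash with the Prelude, so this assumes
-- 'import Prelude hiding (flip, subtract)' at the top of the module.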
type Program = Free Op

push :: Double -> Program ()
push x
  = liftF $ Push x ()

pop :: Program (Maybe Double)
pop
  = liftF $ Pop id

flip :: Program ()
flip
  = liftF $ Flip ()

add :: Program ()
add
  = liftF $ Add ()

subtract :: Program ()
subtract
  = liftF $ Subtract ()

multiply :: Program ()
multiply
  = liftF $ Multiply ()

divide :: Program ()
divide
  = liftF $ Divide ()

end :: forall a. Program a
end
  = liftF End

Program is the monad representing our DSL program, and is defined as Free Op (Free is defined in the free package). We need to provide functions to lift each constructor into Program, using liftF. Notice that almost all of the functions are of type Program (), except for pop, which yields the top of the stack as a Maybe Double in the monadic context, and end, whose universally quantified result type indicates the end of a program: nothing can follow it, so it may stand in for a program returning any type (writing the forall explicitly requires a language extension such as RankNTypes). With these, we can start defining some calculator programs. Remember, we’re simply defining data here: we haven’t defined an interpreter yet!

-- Compute 2 * 2 + 3 * 3, leaving result on top of stack
prog :: forall a. Program a 
prog = do 
  push 2 
  push 2
  multiply 
  push 3 
  push 3 
  multiply 
  add
  end

pop allows us to read values into the Program context, meaning we can use pure Haskell functions in our programs:

-- Double the top element of the stack.
double :: Program ()
double = do
  x <- pop
  case x of 
    Nothing ->
      return ()
    Just x' ->
      push $ x' * 2

Finally, let’s define an interpreter for our DSL, which returns the state of the stack after program execution.

modStack :: (a -> a -> a) -> [a] -> [a]
modStack f (x : x' : xs)
  = f x x' : xs
modStack _ xs
  = xs

-- The rank-2 type of the parameter (polymorphic in the result type) forces
-- the program to be terminated with 'end'.
interpret :: (forall a. Program a) -> [Double]
interpret prog
  = execState (interpret' prog) []
  where
    interpret' :: Program a -> State [Double] ()
    interpret' (Free (Push x next)) = do
      modify (x :)
      interpret' next
    interpret' (Free (Pop next)) = do
      stack <- get
      case stack of
        (x : xs) -> do
          put $ xs
          interpret' $ next (Just x)
        [] ->
          interpret' $ next Nothing
    interpret' (Free (Flip next)) = do
      stack <- get
      case stack of
        (x : x' : xs) ->
          put $ x' : x : xs
        _ -> return ()
      interpret' next
    interpret' (Free (Add next)) = do
      modify $ modStack (+)
      interpret' next
    interpret' (Free (Subtract next)) = do
      modify $ modStack (-)
      interpret' next
    interpret' (Free (Multiply next)) = do
      modify $ modStack (*)
      interpret' next
    interpret' (Free (Divide next)) = do
      modify $ modStack (/)
      interpret' next
    interpret' (Free End)
      = return ()
    -- A program which is not terminated by 'end' finishes with Pure
    interpret' (Pure _)
      = return ()

The interpreter logic is very simple: we pattern match against each Op constructor, mutating an underlying list of doubles where necessary, being sure to recurse on the remainder of the program represented by next. interpret' accepts an arbitrary Program a parameter which doesn’t necessarily have to be terminated by end, although we insist on this by wrapping the function in interpret, which demands a program polymorphic in its result type. This detail is not important here, although it may be for more complex interpreters which require resource clean-up once the interpretation has terminated. Alternatively, end could be appended automatically when interpreting.

The beauty of this programming pattern is the separation of concerns: we can freely define a different interpreter which can operate on the same DSL program. It is often very natural to want to define more than one interpreter: in this case, I may want to execute the program, additionally warning the user when illegal pops or flips are made. This is possible because the interpreter logic does not define the embedded language, unlike in the traditional approach, making this a very attractive design pattern for DSL and interpreter construction.
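As a sketch of such a second interpreter, the following runs a program and also collects a warning for every illegal pop, flip or add. To stay self-contained it inlines a minimal Free (instead of importing the free package) and a cut-down Op. The names interpretWarn, flipTop and demo, and the warning messages, are hypothetical.

```haskell
{-# LANGUAGE DeriveFunctor #-}

module Main where

-- Minimal free monad, standing in for Control.Monad.Free
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Free fa) = Free (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g  <*> x = fmap g x
  Free fg <*> x = Free (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a  >>= k = k a
  Free fa >>= k = Free (fmap (>>= k) fa)

liftF :: Functor f => f a -> Free f a
liftF = Free . fmap Pure

-- A cut-down version of the Op functor
data Op next
  = Push Double next
  | Pop (Maybe Double -> next)
  | Flip next
  | Add next
  | End
  deriving (Functor)

type Program = Free Op

push :: Double -> Program ()
push x = liftF (Push x ())

pop :: Program (Maybe Double)
pop = liftF (Pop id)

flipTop :: Program ()
flipTop = liftF (Flip ())

add :: Program ()
add = liftF (Add ())

end :: Program a
end = liftF End

-- Second interpreter: final stack plus warnings, in program order
interpretWarn :: Program a -> ([Double], [String])
interpretWarn = go []
  where
    go stack (Pure _) = (stack, [])
    go stack (Free op) = case op of
      Push x next -> go (x : stack) next
      Pop next -> case stack of
        x : xs -> go xs (next (Just x))
        []     -> warn "pop on empty stack" (go [] (next Nothing))
      Flip next -> case stack of
        x : x' : xs -> go (x' : x : xs) next
        _           -> warn "flip needs two elements" (go stack next)
      Add next -> case stack of
        x : x' : xs -> go (x + x' : xs) next
        _           -> warn "add needs two elements" (go stack next)
      End -> (stack, [])
    warn msg (s, ws) = (s, msg : ws)

demo :: Program a
demo = do
  push 1
  flipTop  -- only one element on the stack: warning
  _ <- pop
  _ <- pop -- empty stack: warning
  push 2
  push 3
  add
  end

main :: IO ()
main = print (interpretWarn demo)
-- prints ([5.0],["flip needs two elements","pop on empty stack"])
```

The program demo is plain data, so it could equally be passed to a State-based interpreter like the one above without modification.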

OTAS Lingo is our natural language app to summarise many of our analyses in a textual report. Users can subscribe to daily, weekly or monthly Lingo emails for a custom portfolio of stocks or a public index, and receive a paragraph for each metric we offer, highlighting important changes across the time period. Our method of natural language generation is simple: given the state of the underlying data, we generate at least one sentence for each of the metric’s sub-topics, and concatenate the results to form a paragraph. This method of text generation has proved to be very customisable and extensible, despite being time-consuming to implement, and the results are very readable. However, given that we specify hundreds of these sentences in the code base, some of which are rendered very infrequently, how do we leverage compile-time checking to overcome the runtime issues of tools such as printf?

First, a little background. We render our natural language both as HTML and plain text, so we abstract the underlying structure with a GeneratedText type. This type represents dates, times and our internal security identifiers (OtasSecurityIds), in addition to raw text values. HTML and plain text are rendered with two respective functions, which use additional information, such as security identifier format (name, RIC code etc.), to customise the output. Here’s a simplified representation of the generated text type.

type HtmlClass = T.Text

data Token 
  = Text T.Text -- ^ Strict Text from text package
  | Day Day -- ^ From time package
  | OtasId OtasSecurityId 
  | Span [HtmlClass] [Token] -- ^ Wrap underlying tokens in span tag, with given classes, when rendering as HTML

newtype GeneratedText = GeneratedText [Token]
  deriving (Monoid)
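The two renderers might look something like the following sketch. It is heavily simplified and not our production code: String stands in for Text, a plain Int stands in for OtasSecurityId, the Day token is omitted, and securityName is a hypothetical lookup. The HTML renderer escapes text but, for brevity, not class names.

```haskell
module Main where

type HtmlClass = String

-- Simplified tokens (Int is a stand-in for OtasSecurityId)
data Token
  = Text String
  | OtasId Int
  | Span [HtmlClass] [Token]

newtype GeneratedText = GeneratedText [Token]

instance Semigroup GeneratedText where
  GeneratedText a <> GeneratedText b = GeneratedText (a ++ b)

instance Monoid GeneratedText where
  mempty = GeneratedText []

-- Hypothetical identifier-to-name lookup
securityName :: Int -> String
securityName 1 = "Vodafone"
securityName 2 = "Barclays"
securityName n = "Security " ++ show n

-- Plain text: spans are transparent
renderPlain :: GeneratedText -> String
renderPlain (GeneratedText ts) = concatMap go ts
  where
    go (Text t)       = t
    go (OtasId i)     = securityName i
    go (Span _ inner) = concatMap go inner

-- HTML: spans become <span> elements and text is escaped
renderHtml :: GeneratedText -> String
renderHtml (GeneratedText ts) = concatMap go ts
  where
    go (Text t)        = escape t
    go (OtasId i)      = escape (securityName i)
    go (Span cs inner) =
      "<span class=\"" ++ unwords cs ++ "\">" ++ concatMap go inner ++ "</span>"

escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc c   = [c]

sample :: GeneratedText
sample =
  GeneratedText [Text "Volume in "]
    <> GeneratedText [Span ["security"] [OtasId 1]]
    <> GeneratedText [Text " rose >10%."]

main :: IO ()
main = do
  putStrLn (renderPlain sample)
  putStrLn (renderHtml sample)
```

Because the token list is only interpreted at render time, the same GeneratedText value serves both output formats.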

We expose GeneratedText as a newtype, and not a type synonym, to hide the internal implementation as much as possible. Users are provided with functions such as liftOtasId :: OtasSecurityId -> GeneratedText, which generates a single-token text value, an IsString instance for convenience and a Monoid instance for composition. With this, we can also express useful combinators such as list :: [GeneratedText] -> GeneratedText, which will render a list more naturally as English. These combinators may be used to construct sentences such as the following.

-- | List stocks with high volume.
-- E.g. High volume was observed in Vodafone, Barclays and Tesco.
volumeText :: [OtasSecurityId] -> GeneratedText
volumeText sids
  = "High volume was observed in " <> list (map liftOtasId sids) <> "."

This definition is okay, but it is helpful to separate the format of the generated text from the underlying data as much as possible. Think about C-style format strings, which completely separate the format from the arguments to the format. Our code base contains hundreds of text sentences, so the code is more readable with this distinction in place. Additionally, there’s value in being able to treat formats as first class, allowing the structure of a sentence to be built up from sub-formats before being rendered to a generated text value. Enter formatting.

The formatting package is a fantastic type-safe equivalent to printf, allowing construction of typed format strings which can be rendered to strict or lazy text values. The type and number of expected arguments are encoded in the type of the formatter, so everything is checked at compile time: there are no runtime exceptions with formatting. The underlying logic is built upon the HoleyMonoid package, and many formatters are provided to render, for example, numbers with varying precision and in different bases. Formatting uses a text Builder under the hood, but the types will work with any monoid, such as our GeneratedText type, although we do need to reimplement the underlying functionality and formatters. However, this gives us a very clean way of expressing the format of our text:

volumeFormat :: Format r ([OtasSecurityId] -> r)
volumeFormat
  = "High volume was observed in " % list sid % "."

volumeText :: [OtasSecurityId] -> GeneratedText
volumeText 
  = format volumeFormat

The Format type expresses the type and number of arguments expected by the format string, in the correct order. volumeFormat above expects a list of OtasSecurityIds, so has type Format r ([OtasSecurityId] -> r), whereas a format string expecting an OtasSecurityId and a Double would have type Format r (OtasSecurityId -> Double -> r). The r parameter represents the result type, which, in our case, will always be a GeneratedText. Formatters are composed with (%), which keeps track of the expected arguments, and a helpful IsString instance exists to avoid lifting text values. Finally, format will convert a format string to a function expecting the correct arguments, and yielding a GeneratedText value.
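The core of such a reimplementation is small. Here is a sketch in the style of formatting and HoleyMonoid, generalised over the output monoid m; it is not our actual Lingo code. String stands in for GeneratedText, literals are lifted with now rather than an IsString instance, and the name formatter (rendering a security given directly as a String) is hypothetical.

```haskell
module Main where

import Data.List (intercalate)

-- A format accumulates output in a monoid m. The type a is the function
-- type which consumes the arguments, and r is what it finally returns.
newtype Format m r a = Format { runFormat :: (m -> r) -> a }

-- A literal piece of output, expecting no arguments
now :: m -> Format m r r
now x = Format (\k -> k x)

-- A hole, filled by rendering one argument
later :: (a -> m) -> Format m r (a -> r)
later f = Format (\k -> k . f)

-- Sequence two formats, appending their output
infixr 9 %
(%) :: Monoid m => Format m r a -> Format m r' r -> Format m r' a
f % g = Format $ \k ->
  runFormat f (\x -> runFormat g (\y -> k (x <> y)))

-- Turn a format into a function of its arguments
format :: Format m m a -> a
format f = runFormat f id

-- Hypothetical formatter for a security, given here as its name
name :: Format String r (String -> r)
name = later id

-- Higher-order formatter: render the items as an English list
list :: Format String String (a -> String) -> Format String r ([a] -> r)
list item = later (english . map (format item))
  where
    english []  = ""
    english [x] = x
    english xs  = intercalate ", " (init xs) ++ " and " ++ last xs

volumeFormat :: Format String r ([String] -> r)
volumeFormat = now "High volume was observed in " % list name % now "."

volumeText :: [String] -> String
volumeText = format volumeFormat

main :: IO ()
main = putStrLn (volumeText ["Vodafone", "Barclays", "Tesco"])
-- prints: High volume was observed in Vodafone, Barclays and Tesco.
```

Applying volumeText to a value of the wrong type, or to too many arguments, is rejected by the type checker instead of failing at runtime.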

The format string style is especially beneficial as the text becomes longer, as the format is described without any noise from the data being textualised. We use a range of formatters for displaying dates, times, numeric values and security IDs, but we can also encode more complex behaviours as higher-order formatters, such as list above. Users can also define custom formatters to abstract more specialised representations. In practice, we specify all our generated text in Lingo in this way: it’s simply a clearer style of programming.

Several months ago, Stack added support for building and running Haskell modules on-the-fly. This means that you no longer need to include a Cabal file to build and run a single-module Haskell application. Here’s a minimal example, HelloWorld.hs.


#!/usr/bin/env stack
-- stack --resolver lts-3.12 --install-ghc runghc

main
  = putStrLn "Hello, world"

Given HelloWorld.hs has executable permissions, ./HelloWorld.hs will produce the expected output!

Stack can be configured in the second line of the file, with general form -- stack [options] runghc [options]. Above, the --install-ghc switch instructs Stack to download and install the appropriate version of GHC if it’s not already present. Specifying the resolver version is also crucial, as it enables the build to be forever reproducible, regardless of updates to any package dependencies. Happily, no extra work is required to import modules from other packages included in the resolver’s snapshot, although non-resolver packages can be loaded with the --package switch.
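For example, a script which names a package explicitly might look like the following sketch. It assumes --package is given after runghc, in line with the general form above, and uses the text package purely as an example; shout is a hypothetical helper. Listing a package which is already available should be harmless.

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-3.12 --install-ghc runghc --package text

import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Upper-case a greeting (hypothetical helper, just to exercise the import)
shout :: T.Text -> T.Text
shout = T.toUpper

main :: IO ()
main = TIO.putStrLn (shout (T.pack "Hello, world"))
```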

This feature finally allows programmers to leverage the advantages of Haskell in place of traditional scripting languages such as Python. This would be especially useful for ‘glue’ scripts, transforming data representations between programs, given Haskell’s type safety and excellent support for a myriad of data formats.

The Turtle Library

Gabriel Gonzalez’ turtle library attempts to leverage Haskell in place of Bash. I write Bash scripts very occasionally and am prone to forgetting the syntax, so immediately this is an interesting proposition to me. However, does the flexibility of the command line utilities translate usefully to Haskell?

The turtle library serves two purposes. Firstly, it re-exports commonly-used utilities provided by other packages, reducing the number of dependencies in your scripts. Some of these functions are renamed to match their UNIX equivalents. Secondly, it exposes the Shell type, which encapsulates streaming the results of running some of these utilities. For example, stdin :: Shell Text streams all lines from the standard input, while ls :: FilePath -> Shell FilePath streams all immediate children of the given directory.

The Shell type is similar to ListT IO, and shares its monadic behaviour. Binding to a Shell computation extracts an element from the computation’s output stream, and mzero terminates the current stream of computation. Text operations are line-oriented, so the monadic instance is all about feeding a line of output into another Shell computation. Putting this together lets us write really expressive operations, such as the following.


readCurrentDir :: Shell Text
readCurrentDir = do
  file <- ls "."
  isFile <- testfile file
  guard isFile -- Ensure 'file' is a regular file!
  input file -- Read the file

readCurrentDir is a Shell which streams each line of text of each file in the current directory. Remember, guard will terminate the computation when file isn’t a regular file, so its contents will never be read. Without it, we risk an exception being thrown when trying to read a non-regular file.
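The same control flow can be seen without touching the file system by using the ordinary list monad as a stand-in for Shell. This is only an analogy: the entries below are hypothetical data playing the roles of ls, testfile and input, and no turtle functions are involved.

```haskell
module Main where

import Control.Monad (guard)

-- A pretend directory listing: name, whether it is a regular file, and
-- its lines (all hypothetical data)
type Entry = (String, Bool, [String])

entries :: [Entry]
entries =
  [ ("notes.txt", True,  ["first", "second"])
  , ("src",       False, [])
  , ("todo.txt",  True,  ["third"])
  ]

-- Same shape as readCurrentDir, in the list monad
readAll :: [String]
readAll = do
  (_, isFile, contents) <- entries -- bind: take each entry in turn
  guard isFile                     -- mzero: drop this branch, keep going
  contents                         -- stream each line of the file

main :: IO ()
main = mapM_ putStrLn readAll
```

Here guard discards only the branch for the directory entry; the remaining entries still contribute their lines, which is the behaviour described for Shell above.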

The library exposes functions for aggregating and filtering the output of a Shell. grep :: Pattern a -> Shell Text -> Shell Text uses a Pattern to discard non-conforming lines, identical to its namesake. The Pattern type is a monadic backtracking parser capable of both matching and parsing values from Text, and is also used in the sed :: Pattern Text -> Shell Text -> Shell Text utility. The Fold interface collapses the Shell to a single result, and sh and view run a shell to completion in the IO monad.

Perhaps unsurprisingly, Turtle’s utilities are not as flexible as their command-line equivalents, as no variants exist for different command-line arguments. Of course, it would be possible for the library to expose a ‘Grep’ module with a typed representation of the program’s numerous optional parameters, but this seems overkill when I can execute and stream the results with inShell. This, then, informs my use of Turtle in the future. I do see myself using it in place of shell scripts, and not for typed representations of UNIX utilities. The library lifts shell output into an environment which I, as a Haskell programmer, can manipulate efficiently and expressively. It serves as the glue whose syntax and semantics I already reason with every day.

See Also

Stack script interpreter documentation

Here at OTAS, we run a private, internal instance of Hackage to easily share our Haskell packages and documentation amongst our developers. It’s really a core part of our Haskell development workflow and, once fully set-up, integrates seamlessly with Stack. Unfortunately, I’ve found deploying the server to be tricky and poorly-documented, so I’ve written this guide to help make the process easier.

We use Stack for building all our Haskell projects, so we do not explicitly install either GHC or cabal-install on our machines. I’ll assume you have a working installation of Stack on your machine before you begin. Cabal-install users should be able to follow easily enough.

Building the Hackage Server

I’ve found it very frustrating to build the hackage-server packages listed on Hackage. At the time of writing, two versions, 0.4 and 0.5.0, are listed, both of which have an extended list of dependencies which Cabal’s solver isn’t able to resolve for me. As such, I went directly to the GitHub source and decided to build the most recent commit at the time of writing, e0bd7fc2a8701a807a5551afc1ab1e9df4179e81.


git clone git@github.com:haskell/hackage-server.git 
cd hackage-server
git checkout e0bd7fc2a8701a807a5551afc1ab1e9df4179e81

Stack successfully generated this stack.yaml file (using stack init --solver) which you can copy into your working directory to reproduce the build. Simply run stack build to generate the executables.

Running Hackage

Before we’re ready to initialise the server we need to generate private keys using hackage-repo-tool. First, build and copy this program to your path with Stack, then generate and install the keys.


hackage-repo-tool create-keys --keys keys
cp keys/timestamp/.private datafiles/TUF/timestamp.private
cp keys/snapshot/.private datafiles/TUF/snapshot.private
hackage-repo-tool create-root --keys keys -o datafiles/TUF/root.json
hackage-repo-tool create-mirrors --keys keys -o datafiles/TUF/mirrors.json

Now we’re ready to initialise the server! In the following command, substitute the administrator username and password with your chosen credentials.

./hackage-server init --admin=user:password --static-dir=datafiles/

Finally, run the server on your chosen port.

./hackage-server run --port=4050 --static-dir=datafiles/

Now that your server’s up and running (and assuming your firewall’s correctly configured) you’ll be able to access the local Hackage instance through your browser. There you’ll find documentation for administrative functions such as user management. This wiki page may be helpful.

Automatically Building Documentation

We use hackage-build to automatically generate documentation for uploaded packages. This program requires cabal-install, its dependencies and GHC to be accessible on your PATH. Fortunately, we can use Stack to install cabal-install for us!

stack install alex happy cabal-install

Here’s where things get a little tricky. You’ll need to use a different instance of hackage-build for each version of GHC that your packages require to build. In my case, I have packages depending on base versions 4.7.0.2 and 4.8.0.1, so I require GHC versions 7.8.4 and 7.10.2. I’ve installed both of these locally, off-path, at $HOME/.local/bin/ghc-7.8.4 and $HOME/.local/bin/ghc-7.10.2 respectively.

Of course, Hackage will document your packages by building them with cabal-install, which may well fail to find a successful build plan. Before switching over to Stack, I used Stackage snapshots to constrain the range of packages cabal-install considered. Therefore, all my older packages build with lts-1.0, and all my newer packages with lts-3.12. In the future, I’ll release packages using a more recent snapshot, but I’ll always control when this happens.

For me, then, the best course is to run a hackage-build instance for each snapshot I’ve used. Documentation will therefore certainly be built by this instance for a package known to build with the corresponding snapshot. For each snapshot, I initialise hackage-build in a separate directory using the username and password of the account with which I wish to upload the documentation. Note that this account needs to be enabled as a trustee.

./hackage-build init http://localhost:4050 --init-username=user --init-password=password --cache-dir=build-cache-lts-1.0 # For lts-1.0

In the newly created build-cache-lts-1.0 directory I edit the cabal-config file to include both Hackage and the local Hackage as remote-repos, and I add the constraints specified by the corresponding snapshot. The first few lines of the file look like this.


remote-repo: localhost:http://localhost:4050
remote-repo: hackage.haskell.org:http://hackage.haskell.org/packages/archive
remote-repo-cache: build-cache-lts-1.0/cached-tarballs
constraint: AC-Vector ==2.3.2
constraint: BlastHTTP ==1.0.1
constraint: BlogLiterately ==0.7.1.7

Remember, I perform the above sequence for each of my two snapshots. Finally, I can run each build client with its corresponding GHC version. For the lts-1.0 process, I use the following command.

PATH=$HOME/.local/bin/ghc-7.8.4/bin:$PATH ./hackage-build build --cache-dir=build-cache-lts-1.0

From here, it’s simple to script the builder to run continuously. Also check out the output of ./hackage-build --help for more options.

Enabling access through Stack

To alert Stack to your new internal Hackage, include its details in the package-indices section of your global config file (~/.stack/config.yaml). Instructions are provided here.

See Also

hackage-server wiki

Edsko’s commit on setting up TUF for the server

This post discusses my experiences as a software engineer at OTAS Technologies, developing almost exclusively in Haskell for the past six months. Prior to this I’d learnt about and used functional languages solely in academia, in the context of type theory, with little development experience. Here are my views on some pretty fundamental issues I deal with day-to-day.

Tooling

The Haskell toolchain is not simple. Certainly, it took me several weeks before I adopted the system I currently use, although happily this system works successfully. The Haskell platform ships with GHC and Cabal. Cabal is the Haskell build system and dependency manager, and it is at once powerful and fragile. Usage of Cabal is well documented and its operation is not complicated, and yet many resources and tutorials do not stress the importance of sandboxes in Haskell development.

Sandboxes provide a mechanism to isolate the build process of individual Haskell projects. By default Cabal will use a shared (global) package database for storing compiled packages, the dependencies of your library or program. However, sharing compiled packages leads to breakages when their dependencies are updated, an affliction known as ‘Cabal hell’. Sandboxes solve this issue by isolating (sandboxing) compiled dependencies per project, at the expense of vastly longer setup times, as all dependencies need to be re-downloaded and built. However, their use is crucial in Haskell development and I’m unsure why this isn’t enforced by Cabal.

Isolated build environments with sandboxes are half the story of reliable Haskell development. It is also necessary to future-proof your packages, ensuring upgrades to dependencies break neither your package nor another dependency. ‘Stackage’ is a Hackage alternative that provides restrictions on packages, ensuring they are mutually compatible. By restricting your projects to the package dependencies specified by a Stackage snapshot, you guarantee your package stays buildable, since the versions of its dependencies never change underneath it. I make use of these snapshots in all my Haskell projects. I have not yet needed to upgrade a snapshot to use a newer version of a dependency, although I anticipate the process will be simple. Certainly Stackage enables me to upgrade dependencies in a controlled fashion.

I currently advise Haskellers to install the Haskell toolchain local to their user, and not through an OS-level package manager. The reason for this is twofold. Firstly, package managers frequently have an outdated version of GHC. Secondly, package managers also provide packages of popular Haskell libraries, which are then installed to the global package database. This interferes with the sandbox approach described above, and the two are not compatible.

Note that I have not explored packaging solutions such as Nix, as I find my current approach satisfactory, although I will be interested in seeing how the project develops.

Libraries

The Haskell ecosystem includes some fantastic software, available for instant use as libraries served by Hackage. Spend enough time using Hackage and you become familiar with many of the authors! A key skill when developing Haskell projects is identifying libraries that match your needs, the architecture of your program, and your other dependencies. Many people would expect the functionality provided by some of these libraries to be shipped with the compiler (‘random’ and ‘time’ come to mind). However, I see it as an advantage that core functionality is spread across many independent developers, as features are developed and pushed faster. The ecosystem certainly moves fast!

Certain libraries provide types and combinators that seemingly alter the very language. The prototypical example is Edward Kmett’s ‘lens’ library, which provides an extremely general, elegant and powerful framework for navigating data structures. Usage of the lens library is straightforward, but the underlying mathematical concepts are smart, and very interesting to learn about.
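To give a flavour of the idea, here is a minimal sketch of a lens in the van Laarhoven style, written against base only. It is not the lens library itself: the names SimpleLens, viewL, overL, Point, Circle, pointX and centre are all hypothetical, chosen for illustration, and the real library's types are considerably more general.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- A lens focuses on a part 'a' inside a structure 's'.
-- (Hypothetical name; a simplified form of what the lens library offers.)
type SimpleLens s a = forall f. Functor f => (a -> f a) -> s -> f s

-- Read the focused part by choosing the Const functor.
viewL :: SimpleLens s a -> s -> a
viewL l = getConst . l Const

-- Modify the focused part by choosing the Identity functor.
overL :: SimpleLens s a -> (a -> a) -> s -> s
overL l f = runIdentity . l (Identity . f)

-- Illustrative data types, not taken from any real project.
data Point = Point { _x :: Int, _y :: Int } deriving (Eq, Show)
data Circle = Circle { _centre :: Point, _radius :: Int } deriving (Eq, Show)

pointX :: SimpleLens Point Int
pointX f p = fmap (\v -> p { _x = v }) (f (_x p))

centre :: SimpleLens Circle Point
centre f c = fmap (\p -> c { _centre = p }) (f (_centre c))
```

Because a lens here is just a function, lenses compose with ordinary function composition: viewL (centre . pointX) reads the x-coordinate of a circle's centre, and overL (centre . pointX) (+ 1) shifts it.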

It takes experience to understand the gravity of your choice of library. There are libraries which attempt to perform the same functionality (pipes and conduit, warp and happstack etc.) but are architected in vastly different ways, such that converting between the two is far from trivial. For example, the lens library forms the sole interface to my OTAS Base wrapper, and is so heavily used that upgrading the dependency version requires careful consideration.

Using Haskell day-to-day

Haskell development is, frankly, awesome. When I describe my job, I frequently say I’m ‘living the programming dream’ (having the autonomy to choose my software tools and having full control over my projects are two reasons for my happiness, too). Programming in a purely functional language allows you to produce smarter, more correct code, period. The type system forces you to request and record the behavior of your function in its type. If your function needs access to a read-only environment, wrap its type in the Reader transformer. Need to log? Use the Writer monad. If you need to characterise failure, incorporate an Error type into your stack.
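As a rough sketch of what such a stack can look like, the following uses the mtl package to combine a read-only environment, a log and a failure type. The names Config, maxItems, App, checkCount and runApp are hypothetical, and I am assuming ExceptT as the failure layer here rather than any particular Error type.

```haskell
import Control.Monad.Except (ExceptT, runExceptT, throwError)
import Control.Monad.Reader (Reader, asks, runReader)
import Control.Monad.Writer (WriterT, runWriterT, tell)

-- Hypothetical read-only environment.
data Config = Config { maxItems :: Int }

-- Each layer records one effect in the type:
-- failure (ExceptT), logging (WriterT) and a read-only environment (Reader).
type App = ExceptT String (WriterT [String] (Reader Config))

-- Reads the limit, logs the check, and fails if the limit is exceeded.
checkCount :: Int -> App Int
checkCount n = do
  limit <- asks maxItems
  tell ["checking " ++ show n ++ " against limit " ++ show limit]
  if n > limit
    then throwError ("too many items: " ++ show n)
    else return n

-- Peel the layers off from the outside in.
runApp :: Config -> App a -> (Either String a, [String])
runApp cfg app = runReader (runWriterT (runExceptT app)) cfg
```

With this ordering of layers the log survives a failure: runApp (Config 3) (checkCount 5) returns a Left together with the line logged before the error was thrown. Swapping the order of ExceptT and WriterT would change that behaviour, which is one reason the choice of stack deserves thought.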

Because this correctness and clarity stems from restricting the behaviour of your functions, you do need to think carefully about what exactly they need to do. In particular, adding a feature to the software can often require a fundamental change in the monad transformer stack at the heart of your program. Therefore it is imperative you carefully consider the direction the project will take even before you begin development, and always consider the implications of the types you choose.

One source of difficulty for me is knowing to what extent I should generalise my types. This is often the case with monadic stacks based on IO. Annoyingly, core IO actions provided by the Haskell Prelude are of a concrete IO type (e.g. putStrLn "Hello, world!" :: IO ()), and one must explicitly import and use ‘liftIO’ from the ‘MonadIO’ typeclass when performing IO actions deeper in the stack. Handling exceptions requires similar machinery. Ideally, all IO actions would utilise generalised typeclasses such as MonadIO, ‘MonadBase’ and ‘MonadBaseControl’ as fully general frameworks for exception handling and IO actions. However, prematurely generalising your types complicates development, and furthermore many libraries do not currently support the most general interface possible. In practice fully general types lead to much cleaner code, although reasoning about the types and debugging becomes more difficult.
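To make the trade-off concrete, here is a small sketch contrasting a function pinned to one stack with a generalised version. The function names greetConcrete and greetGeneral are hypothetical; liftIO, MonadIO and MonadReader are the real classes from base and mtl.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Reader (MonadReader, ReaderT, ask, runReaderT)

-- Concrete: works only in this exact stack.
-- putStrLn has type String -> IO (), so it must be lifted with liftIO.
greetConcrete :: ReaderT String IO String
greetConcrete = do
  name <- ask
  let msg = "Hello, " ++ name ++ "!"
  liftIO (putStrLn msg)
  return msg

-- General: works in any stack that provides a String environment and IO.
-- The body is the same; only the type is more permissive.
greetGeneral :: (MonadReader String m, MonadIO m) => m String
greetGeneral = do
  name <- ask
  let msg = "Hello, " ++ name ++ "!"
  liftIO (putStrLn msg)
  return msg
```

Both can be run with runReaderT, for example runReaderT greetGeneral "world". The general version keeps working if further layers are later added to the stack, at the cost of a constraint-heavy signature and less direct type errors.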

Finally, I’m beginning to notice limitations in the language whose resolution would greatly improve some aspects of development, although it seems unlikely they will see a solution in the near future. Foremost among these is Haskell’s record system. I would like to see a lens-based anonymous record system implemented in GHC, which would solve the overloaded records problem and enable ‘first class’ record manipulations. Alas, this would be a new dialect of Haskell! Additionally, the Prelude supports many outdated and inefficient concepts deemed beginner friendly (the String type, for example), which are only now being improved upon (the applicative-monad proposal and the foldable-traversable proposal are two key imminent changes).

Certainly though, my experience with Haskell has been extremely positive. I’d be skeptical of anyone disregarding Haskell for any problem domain due to performance (except perhaps embedded or OS-level software) or through lack of libraries. Certainly at OTAS, barring exceptional circumstances I’ll be encouraging all future projects to be implemented in Haskell.