DEFUN 2008 - Real World Haskell

Real World Haskell

Bryan O’Sullivan
bos@serpentine.com

2008-09-27

Welcome!

A few things to expect about this tutorial:
The pace will be rapid
Stop me and ask questions—early and often
I assume no prior Haskell exposure

A little bit about Haskell

Haskell is a multi-paradigm language.
It chooses some unusual, but principled, defaults:
Pure functions
Non-strict evaluation
Immutable data
Static, strong typing
Why default to these behaviours?
We want our code to be safe, modular, and tractable.

Pure functions

Definition
The result of a pure function depends only on its visible inputs:
Given identical inputs, it always computes the same result.
It has no other observable effects.
What are some consequences of this?
Modularity leads to simplified reasoning about behaviour.
Straightforward testing: no need for elaborate frameworks.

Immutable data

Definition
Data is immutable (or purely functional) if it is never modified
after construction.
To “modify” a value, we create a new value.
Both new and old versions can coexist afterwards, so we get
persistent, versioned data for free.
Modification is often easier than with mutable data.
In multithreaded code, we do away with much elaborate
locking.
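Here is the persistence point in concrete form, using ordinary Prelude lists. The names `original`, `extended`, and `versions` are made up for this sketch. Prepending with `:` builds a new list whose tail is the old list, so nothing is copied and the old version stays valid.

```haskell
-- Persistent data: "modifying" a list builds a new value,
-- and the old version remains valid and unchanged.
original :: [Int]
original = [2, 3, 4]

-- A new list whose tail is shared with 'original' (no copying).
extended :: [Int]
extended = 1 : original

-- Both versions coexist afterwards.
versions :: ([Int], [Int])
versions = (original, extended)
```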

Static, strong typing

Definition
A program is statically typed if we know the type of every
expression before the program is run.

Definition
Code is strongly typed if the absence of certain classes of error can
be proven statically.

Safety, modularity, and tractability

Safety:
As few nasty surprises at runtime as possible.
Static typing and eased testing give us confidence.
Modularity:
We can build big pieces of code from smaller components.
No need to focus on the details of the smaller parts.
Tractability:
All of this fits in our brain comfortably...
...leaving plenty of room for the application we care about.

GHC, the Glorious Glasgow Haskell Compiler

Have you got GHC yet?
Download installer for Windows, OS X, or Linux here:
http://www.haskell.org/ghc/download_ghc_683.html

What’s special about GHC?

Mature, portable, optimising compiler
Great tools:
interactive shell and debugger
time and space profilers
code coverage analyser
BSD-licensed, hence suitable for OSS and commercial use

Counting lines

The classic Unix wc command counts the lines in some files:

$ time wc -l *.fasta
9975 1000-Rn_EST.fasta
14032 chr18.fasta
14005 chr19.fasta
13980 chr20.fasta
42017 chr_all.fasta
94009 total

real 0m0.017s

Breaking the problem down

Subproblems to consider:
Get our command line arguments
Read a file
Split it into lines
Count the lines
Let’s work through these in reverse order.

Type signatures

Definition
A type signature describes the type of a Haskell expression:

e :: Double

We read :: as “left has the type right”.
So “e has the type Double”.
Here’s the accompanying definition:
e = 2.7182818

Type signatures are optional

In Haskell, most type signatures are optional.
The compiler can automatically infer types based on our
usage.
Why write type signatures at all, then?
Mostly as useful documentation to ourselves.

GHC’s interactive interpreter

GHC includes an interactive expression evaluator, ghci.
Run it from a terminal window or command prompt:

$ ghci
GHCi, version 6.8.3: http://www.haskell.org/ghc/
:? for help
Loading package base ... linking ... done.
Prelude>

The Prelude> text is ghci’s prompt.
Type :? at the prompt to get (terse) help.

Basic interaction

Let’s enter some expressions:

Prelude> 2 + 2
4
Prelude> True && False
False

We can find out about types:

Prelude> :type True
True :: Bool

Writing a list

Here’s an empty list:

Prelude> []
[]

What do we need to create a longer list?
A value
An existing list
Some glue—the : operator

Prelude> 1:[]
[1]
Prelude> 1:2:[]
[1,2]

Syntactic sugar for lists

What’s the difference between these?
1:2:[]
[1,2]
Nothing—the latter is purely a notational convenience.

Characters and strings

One character:

Prelude> :type 'a'
'a' :: Char

A string is a list of characters:

Prelude> 'a' : 'b' : []
"ab"

Notation:
Single quotes for one Char
Double quotes for a string (written [Char])

Function application

We apply a function to its arguments by juxtaposition:

Prelude> length [2,4,6]
3
Prelude> take 2 [3,6,9,12]
[3,6]

Why refer to this as application, instead of the more familiar
calling?
Haskell is a non-strict language
The result may not be computed immediately

Lists are inductive

Haskell lists are defined inductively.
A list can be one of two things:
An empty list
A value in front of an existing list
We call our friends [] and : value constructors:
They construct values that have the type “list of something.”

Counting lines

Haskell programmers love abstraction.
We won’t worry about counting lines.
Instead, we’ll count the elements in any kind of list.

The type signature of a function

How do we describe a function that computes the length of a list?
len :: [a] -> Integer

The -> notation denotes a function.
The function accepts an [a], and returns an Integer.
What’s an [a]?
A list, whose elements must all be of some type a.

Counting by induction: the base case

An empty list has the length zero.
len [] = 0
This is our first example of pattern matching.
Our function accepts one argument.
If the argument is an empty list, we return zero.
We call this the base case.

Counting by induction: the inductive case

Let’s see if a list value was created using the : constructor.
len (x:xs) = 1 + len xs
If the pattern match succeeds:
The name x is bound to the head of the list.
The name xs is bound to the tail of the list.
The body of the definition is used as the result.

The complete function

Save this in a file named Length.hs:
len :: [a] -> Integer
len []     = 0
len (x:xs) = 1 + len xs

Load the file into ghci
In the same directory, run ghci:

Prelude> :load Length
[1 of 1] Compiling Main ( Length.hs, interpreted )
Ok, modules loaded: Main.
*Main>

The ghci prompt changes when we load files.
Let’s try out our function:

*Main> len []
0
*Main> len (1:[])
1
*Main> len [4,5,6]
3

Generating a list from a list

How might we double every other element of a list?
double (a:b:cs) = a : b*2 : double cs
double cs       = cs
Save this in a file named Double.hs.
Load the ﬁle into ghci.
Try the following expressions:
[1..10]
double [1..10]

Your turn: axpy

The classic Linpack function axpy computes a × x_i + y_i over a
scalar a and each element i of two vectors x and y.
Define it over two lists of numbers in Haskell.
How do we handle lists of different lengths?
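One possible answer, offered as a sketch rather than the intended solution. It assumes that silently truncating to the shorter list is acceptable, which is what the Prelude's `zipWith` does. The second, hand-written version (`axpy'`, a name chosen here) makes the same choice explicit with pattern matching, in the style of `len`.

```haskell
-- axpy over lists: computes a * x_i + y_i for each pair of elements.
-- zipWith stops at the end of the shorter list, so extra elements
-- in the longer list are silently dropped.
axpy :: Num a => a -> [a] -> [a] -> [a]
axpy a xs ys = zipWith (\x y -> a * x + y) xs ys

-- The same behaviour written with pattern matching: stop as soon as
-- either list runs out.
axpy' :: Num a => a -> [a] -> [a] -> [a]
axpy' a (x:xs) (y:ys) = a * x + y : axpy' a xs ys
axpy' _ _ _           = []
```

If truncation is not acceptable, an alternative design is to return a `Maybe [a]` and produce `Nothing` when the lengths differ.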

Splitting text on line boundaries

Haskell provides a large library of built-in functions, the Prelude.
Here’s the Prelude’s function for splitting text by lines:
lines :: String -> [String]
The type String is a synonym for [Char].
A ghci experiment:

*Main> lines "foo\nbar\n"
["foo","bar"]
*Main> len (lines "foo\nbar\n")
2

Reading a file

To read a file, we use the Prelude’s readFile function:

*Main> :type readFile
readFile :: FilePath -> IO String

What does this signature mean?
The FilePath type is just a synonym for String.
The type IO String means here be dragons!
A signature that ends in IO something can have externally
visible side effects.
Here, the side effect is “read the contents of a file”.

Side effects

That innocuous IO in the type is a big deal.
We can tell by its type signature whether a value might have
externally visible effects.
If a type does not include IO, it cannot:
Read files
Make network connections
Launch torpedoes
The ideal is for most code to not have an IO type.

Counting lines in a file

If we invoke code that has side effects, our code must by
implication have side effects too.
countLines :: FilePath -> IO Integer
countLines path = do
  contents <- readFile path
  return (len (lines contents))
We had to add IO to our type here because we use readFile,
which has side effects.
Add this code to Length.hs.

A few explanations

The <- notation means “perform the action on the right,
and assign the result to the name on the left.”
name <- action

The return function takes a pure value, and (here) adds IO to
its type.

Command line arguments

We use getArgs to obtain command line arguments.
import System.Environment (getArgs)

main = do
  args <- getArgs
  putStrLn ("hello, args are " ++ show args)
What’s new here?
The import directive imports the name getArgs from the
System.Environment module.
The ++ operator concatenates two lists.

Pattern matching in an expression

We use case to pattern match inside an expression.
-- Does list contain two or more elements?
atLeastTwo myList =
  case myList of
    (a:b:cs) -> True
    _        -> False
The expression between case and of is matched in turn against
each pattern, until one matches.

Irrefutable and wild card patterns

A pattern usually matches against a value’s constructors.
In other words, it inspects the structure of the value.
A simple pattern, e.g. a plain name like a, contains no
constructors.
It thus matches any value.

Definition
A pattern that always matches any value is called irrefutable.
The special wild card pattern _ is irrefutable, but does not bind a
value to a name.
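A short sketch of the kinds of pattern discussed above. The function names `describe` and `ignore` are invented for the example. Alternatives are tried from top to bottom, so the order of the equations matters.

```haskell
-- The argument xs is a plain name: an irrefutable pattern that binds anything.
describe :: [Int] -> String
describe xs = case xs of
  []    -> "empty"                    -- refutable: only the empty list matches
  [_]   -> "one element"              -- wild card: matches, but binds nothing
  (x:_) -> "starts with " ++ show x   -- x is bound; the tail is ignored

-- The wild card is irrefutable too, and never even looks at its argument.
ignore :: a -> Int
ignore _ = 0
```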

Tuples

A tuple is a fixed-size collection of values.
Items in a tuple can have different types.
Example: (True,"foo")
This has the type (Bool,String)
Contrast tuples with lists, to see why we’d want both:
A list is a variable-sized collection of values.
Each value in a list must have the same type.
Example: [True, False]

The zip function

What does the zip function do? Adventures in function discovery,
courtesy of ghci:
Start by inspecting its type, using :type.
Try it with one set of inputs.
Then try with another.
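Here is roughly what that exploration turns up, written as ordinary definitions so it can be loaded into ghci. The names and file names are illustrative only. In current GHC, `:type zip` reports `[a] -> [b] -> [(a, b)]`, and the result is as long as the shorter input.

```haskell
-- zip :: [a] -> [b] -> [(a, b)]
-- It pairs up elements of two lists, stopping at the shorter one.
pairs :: [(Int, Char)]
pairs = zip [1, 2, 3] "ab"   -- the 3 has no partner, so it is dropped

-- The same function, used the way the line counter uses it:
-- pairing each (hypothetical) file name with its line count.
labelled :: [(String, Integer)]
labelled = zip ["a.txt", "b.txt"] [10, 20]
```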

Making our program runnable

Add the following code to Length.hs:
main = do
  -- Exercise: get the command line arguments

  lengths <- mapM countLines args
  mapM_ printLength (zip args lengths)
  case args of
    (_:_:_) -> printLength ("total", sum lengths)
    _       -> return ()
Don’t forget to add an import directive at the beginning!

The mapM function

The mapM function applies an action to a list of arguments in turn,
and returns the list of results.
The mapM_ function is similar, but returns the value (), aka
unit (“nothing”).
The mapM_ function is useful for the effects it causes, e.g.
printing every element of a list.
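A small sketch contrasting the two functions. The names `halveAll`, `halve`, and `printAll` are hypothetical. `mapM` works in any monad, not just `IO`; in `Maybe`, a single `Nothing` makes the whole result `Nothing`.

```haskell
-- mapM runs an action on each element and collects the results.
-- Here the action is in the Maybe monad, so any Nothing aborts the lot.
halveAll :: [Int] -> Maybe [Int]
halveAll = mapM halve
  where
    halve n
      | even n    = Just (n `div` 2)
      | otherwise = Nothing

-- mapM_ runs the actions but throws the results away, returning ().
-- It is used purely for its effects, such as printing.
printAll :: [Int] -> IO ()
printAll = mapM_ print
```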

Write your own printLength function

Hint: we’ve seen a similar example already, with our getArgs
example.
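One possible `printLength`, assuming the output format of the sample run in “Running our program” (file name, a space, then the count). It takes a pair because it is applied to the elements of `zip args lengths` and to `("total", sum lengths)`. The pure helper `formatLength` is a name introduced here so the formatting can be checked without I/O.

```haskell
-- Takes a (file name, count) pair and prints them separated by a space.
printLength :: (String, Integer) -> IO ()
printLength pair = putStrLn (formatLength pair)

-- Hypothetical pure helper: builds the output line without doing any I/O.
formatLength :: (String, Integer) -> String
formatLength (name, count) = name ++ " " ++ show count
```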

Compiling your program

It’s easy to compile a program with GHC:

$ ghc --make Length

What does the compiler do?
Looks for a source file named Length.hs.
Compiles it to native code.
Generates an executable named Length.

Running our program

Here’s an example from my laptop:

$ time ./Length *.fasta
1000-Rn_EST.fasta 9975
chr18.fasta 14032
chr19.fasta 14005
chr20.fasta 13980
chr_all.fasta 42017
total 94009

real 0m1.533s

Oh, no! Look at that performance!
90 times slower than wc

Faster file processing

Lists are wonderful to work with
But they exact a huge performance toll
The current best-of-breed alternative for file data:
ByteString

What is a ByteString?

They come in two flavours:
Strict: a single packed array of bytes
Lazy: a list of 64KB strict chunks
Each flavour provides a list-like API.

Retooling our word count program

All we do is add an import and change one function:
import qualified Data.ByteString.Lazy.Char8 as B

countLines path = do
  contents <- B.readFile path
  -- len (defined earlier) keeps the FilePath -> IO Integer signature valid
  return (len (B.lines contents))
The “B.” prefixes make us pick up the readFile and lines
functions from the bytestring package.

What happens to performance?

Haskell lists: 1.533 seconds
Lazy ByteString: 0.022 seconds
wc command: 0.015 seconds
Given the tiny data set size, C and Haskell are in a dead heat.

When to use ByteStrings?

Any time you deal with binary data
For text, only if you’re sure it’s 8-bit clean
For i18n needs, fast packed Unicode is under development.
Great open source libraries that use ByteStrings:
binary: parsing/generation of binary data
zlib and bzlib: support for popular
compression/decompression formats
attoparsec: parse text-based files and network protocols

A little bit about JSON

A popular interchange format for structured data: simpler than
XML, and widely supported.
Basic types:
Number
String
Boolean
Null
Derived types:
Object: unordered name/value map
Array: ordered collection of values

JSON at work: Twitter’s search API

From http://search.twitter.com/search.json?q=haskell:

{"text": "Why Haskell? Easiest way to be productive",
 "to_user_id": null,
 "from_user": "galoisinc",
 "id": 936114469,
 "from_user_id": 1633746,
 "iso_language_code": "en",
 "created_at": "Fri, 26 Sep 2008 19:15:35 +0000"}

What is a JSString?

We hide the underlying use of a String:
newtype JSString = JSONString { fromJSString :: String }

toJSString :: String -> JSString
toJSString = JSONString
We do the same with JSON objects:
newtype JSObject a = JSONObject { fromJSObject :: [(String, a)] }

toJSObject :: [(String, a)] -> JSObject a
toJSObject = JSONObject

JSON conversion

In Haskell, we capture type-dependent patterns using typeclasses:
The class of types whose values can be converted to and from
JSON

data Result a = Ok a | Error String

class JSON a where
  readJSON :: JSValue -> Result a
  showJSON :: a -> JSValue

Why JSString, JSObject, and JSArray?

Haskell typeclasses give us an open world:
We can declare a type to be an instance of a class at any time
In fact, we cannot declare the number of instances to be fixed
If we left the String type “naked”, what could happen?
Someone might declare Char to be an instance of JSON
What if someone declared a JSON a => JSON [a] instance?
This is the overlapping instances problem.

Relaxing the overlapping instances restriction

By default, GHC is conservative:
It rejects overlapping instances outright
We can get it to loosen up a bit via a pragma:
{-# LANGUAGE OverlappingInstances #-}
If it finds one most specific instance, it will use it; otherwise it bails out
as before.

Bool as JSON

Here’s a simple way to declare the Bool type as an instance of the
JSON class:
instance JSON Bool where
  showJSON = JSBool

  readJSON (JSBool b) = Ok b
  readJSON _          = Error "Bool parse failed"
This has a design problem:
We’ve plumbed our Result type straight in
If we want to change its implementation, it will be painful

Hiding the plumbing

A simple (but good enough!) approach to abstraction:
success :: a -> Result a
success k = Ok k

failure :: String -> Result a
failure errMsg = Error errMsg
Functions like these are sometimes called “smart constructors”.

Does this affect our code much?

We simply replace the explicit constructors with the functions we
just defined:
instance JSON Bool where
  showJSON = JSBool

  readJSON (JSBool b) = success b
  readJSON _          = failure "Bool parse failed"
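A self-contained sketch of the pieces above that can be loaded into ghci. The slides never define `JSValue`, so the cut-down type here is an assumption; the real Text.JSON type has more constructors. `Result`, the `JSON` class, the smart constructors, and the `Bool` instance follow the slides.

```haskell
-- Assumed, cut-down JSON value type (not the real Text.JSON definition).
data JSValue
  = JSNull
  | JSBool Bool
  | JSNumber Double
  | JSArray [JSValue]
  deriving (Eq, Show)

data Result a = Ok a | Error String
  deriving (Eq, Show)

class JSON a where
  readJSON :: JSValue -> Result a
  showJSON :: a -> JSValue

-- Smart constructors: the only code that knows how Result is built.
success :: a -> Result a
success = Ok

failure :: String -> Result a
failure = Error

instance JSON Bool where
  showJSON = JSBool
  readJSON (JSBool b) = success b
  readJSON _          = failure "Bool parse failed"
```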

JSON input and output

We can now convert between normal Haskell values and our JSON
representation. But...
...we still need to be able to transmit this stuff over the wire.
Which is more fun to mull over? Parsing!

A functional view of parsing

Here’s a super-simple perspective:
Take a piece of data (usually a sequence)
Try to apply an interpretation to it
How might we represent this?

A basic type signature for parsing

Take two type variables, i.e. placeholders for types that we’ll
substitute later:
s—the state (data) we want to parse
a—the type of its interpretation
We get this generic type signature:
s -> a
Let’s make the task more concrete:
Parse a String as an Int

String -> Int
What’s missing?

Parsing as state transformation

After we’ve parsed one Int, we might have more data in our
String that we want to parse.
How to represent this? Return the transformed state and the result
in a tuple.
s -> (a, s)
We accept an input state of type s, and return a transformed
state, also of type s.

Parsing is composable

Let’s give integer parsing a name:
parseDigit :: String -> (Int, String)
How might we want to parse two digits?
parseTwoDigits :: String -> ((Int, Int), String)
parseTwoDigits s =
  let (i, t) = parseDigit s
      (j, u) = parseDigit t
  in ((i, j), u)
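The slides leave `parseDigit` undefined. A minimal version, assumed here for illustration, consumes one leading decimal digit and calls `error` on anything else. That crash-on-bad-input behaviour is exactly the kind of failure that becomes a problem once parses are chained.

```haskell
import Data.Char (digitToInt, isDigit)

-- Assumed parseDigit: consumes one leading decimal digit and returns
-- it with the rest of the input; crashes on anything else.
parseDigit :: String -> (Int, String)
parseDigit (c:cs) | isDigit c = (digitToInt c, cs)
parseDigit s = error ("parseDigit: no digit at start of " ++ show s)

-- As on the slide: thread the leftover input by hand.
parseTwoDigits :: String -> ((Int, Int), String)
parseTwoDigits s =
  let (i, t) = parseDigit s
      (j, u) = parseDigit t
  in ((i, j), u)
```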

Chaining parses more tidily

It’s not good to represent the guts of our state explicitly using
pairs:
Tying ourselves to an implementation eliminates wiggle room.
Here’s an alternative approach.
newtype State s a = State {
    runState :: s -> (a, s)
  }

A newtype declaration hides our implementation. It has no
runtime cost.
The runState function is a deconstructor: it exposes the
underlying value.

Chaining parses

Given a function that produces a result and a new state, we can
“chain up” another function that accepts its result.
chainStates :: State s a -> (a -> State s b) -> State s b
chainStates m k = State chainFunc
  where chainFunc s =
          let (a, t) = runState m s
          in runState (k a) t
Notice that the result type is compatible with the input:
We can chain uses of chainStates!

Injecting a pure value

We’ll often want to leave the current state untouched, but inject a
normal value that we can use when chaining.
pureState :: a -> State s a
pureState a = State $ \s -> (a, s)
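Putting the last two slides together gives a runnable sketch: `State`, `chainStates`, and `pureState` as defined above, plus a hypothetical `parseDigitS` that reworks `parseDigit` as a `State String Int` action. Note that `parseTwoDigitsS` never mentions the intermediate strings.

```haskell
import Data.Char (digitToInt, isDigit)

newtype State s a = State { runState :: s -> (a, s) }

chainStates :: State s a -> (a -> State s b) -> State s b
chainStates m k = State chainFunc
  where chainFunc s =
          let (a, t) = runState m s
          in runState (k a) t

pureState :: a -> State s a
pureState a = State $ \s -> (a, s)

-- Hypothetical: parseDigit as a State action over the remaining input
-- (it still crashes on non-digits).
parseDigitS :: State String Int
parseDigitS = State step
  where
    step (c:cs) | isDigit c = (digitToInt c, cs)
    step s = error ("no digit at start of " ++ show s)

-- Two digits, chained without naming the leftover input.
parseTwoDigitsS :: State String (Int, Int)
parseTwoDigitsS =
  parseDigitS `chainStates` \i ->
  parseDigitS `chainStates` \j ->
  pureState (i, j)
```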

What about computations that might fail?

Try these in ghci:

Prelude> head [1,2,3]
1
Prelude> head []

What gets printed in the second case?

One approach to potential failure

The Prelude defines this handy standard type:
data Maybe a = Just a
             | Nothing
We can use it as follows:
safeHead (x:_) = Just x
safeHead []    = Nothing
Save this in a source file, load it into ghci, and try it out.

Some familiar operations

We can chain Maybe values:
chainMaybes :: Maybe a -> (a -> Maybe b)
            -> Maybe b
chainMaybes Nothing  k = Nothing
chainMaybes (Just x) k = k x
This gives us short circuiting if any computation in a chain fails:
Maybe is the Ur-exception.
We can also inject a pure value into a Maybe-typed computation:
pureMaybe :: a -> Maybe a
pureMaybe x = Just x
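A sketch of short-circuiting in action. `safeHead`, `chainMaybes`, and `pureMaybe` are as on the slides; `firstOfFirst` is a hypothetical example that takes the first element of the first inner list.

```haskell
safeHead :: [a] -> Maybe a
safeHead (x:_) = Just x
safeHead []    = Nothing

chainMaybes :: Maybe a -> (a -> Maybe b) -> Maybe b
chainMaybes Nothing  _ = Nothing
chainMaybes (Just x) k = k x

pureMaybe :: a -> Maybe a
pureMaybe x = Just x

-- If either list is empty, the chain stops at that step with Nothing
-- and the remaining steps never run.
firstOfFirst :: [[a]] -> Maybe a
firstOfFirst xss =
  safeHead xss `chainMaybes` \xs ->
  safeHead xs  `chainMaybes` \x  ->
  pureMaybe x
```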

What do these types have in common?

Chaining:
chainMaybes :: Maybe a -> (a -> Maybe b)
            -> Maybe b
chainStates :: State s a -> (a -> State s b)
            -> State s b
Injection of a pure value:
pureState :: a -> State s a
pureMaybe :: a -> Maybe a

Abstract away the type constructors, and these have identical
types!

Monads

More type-related pattern capture, courtesy of typeclasses:
class Monad m where
  -- chain
  (>>=) :: m a -> (a -> m b) -> m b

  -- inject a pure value
  return :: a -> m a

Instances

When a type is an instance of a typeclass, it supplies particular
implementations of the typeclass’s functions:
instance Monad Maybe where
  (>>=)  = chainMaybes
  return = pureMaybe

instance Monad (State s) where
  (>>=)  = chainStates
  return = pureState
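These instances were written for GHC 6.8. Since GHC 7.10, `Applicative` is a superclass of `Monad`, so a `Monad` instance is rejected unless `Functor` and `Applicative` instances exist too. Below is a sketch of how the `State` instance might look today, reusing `chainStates` and `pureState`; the `tick` and `threeTicks` actions are invented for the example.

```haskell
import Control.Monad (ap, liftM)

newtype State s a = State { runState :: s -> (a, s) }

chainStates :: State s a -> (a -> State s b) -> State s b
chainStates m k = State $ \s ->
  let (a, t) = runState m s
  in runState (k a) t

pureState :: a -> State s a
pureState a = State $ \s -> (a, s)

-- Superclass instances required by modern GHC, defined in terms of
-- the Monad operations (liftM and ap only use >>= and return).
instance Functor (State s) where
  fmap = liftM

instance Applicative (State s) where
  pure  = pureState
  (<*>) = ap

instance Monad (State s) where
  (>>=) = chainStates
  -- return now defaults to pure, so it is not defined here.

-- Hypothetical example: return the current counter, then increment it.
tick :: State Int Int
tick = State $ \n -> (n, n + 1)

threeTicks :: State Int [Int]
threeTicks = do
  a <- tick
  b <- tick
  c <- tick
  return [a, b, c]
```

`Maybe` already has all three instances in the Prelude, so the slide's `Monad Maybe` instance is illustrative only and would clash with the built-in one if compiled.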

Chaining with monads

Using the methods of the Monad typeclass:
parseThreeDigits =
  parseDigit >>= \a ->
  parseDigit >>= \b ->
  parseDigit >>= \c ->
  return (a, b, c)
Syntactically sugared with do-notation:
parseThreeDigits = do
  a <- parseDigit
  b <- parseDigit
  c <- parseDigit
  return (a, b, c)
This now looks suspiciously like imperative code.

Haven’t we forgotten something?

What happens if we want to parse a digit out of a string that
doesn’t contain any?
We’d like to “break the chain” if a parse fails.
We have this nice Maybe type for representing failure.
Alas, we can’t combine the Maybe monad with the State monad.
Different monads do not combine.

But this is awful! Don’t we need lots of boilerplate?

Are we condemned to a world of numerous slightly tweaked custom
monads?
We can adapt the behaviour of an underlying monad.
newtype MaybeT m a = MaybeT {
    runMaybeT :: m (Maybe a)
  }

Can we inject a pure value?

pureMaybeT :: (Monad m) => a -> MaybeT m a
pureMaybeT a = MaybeT (return (Just a))

Can we write a chaining function?

chainMaybeTs :: (Monad m) => MaybeT m a -> (a -> MaybeT m b)
             -> MaybeT m b

x `chainMaybeTs` f = MaybeT $ do
  unwrapped <- runMaybeT x
  case unwrapped of
    Nothing -> return Nothing
    Just y  -> runMaybeT (f y)

Making a Monad instance

Given an underlying monad, we can stack a MaybeT on top of it
and get a new monad.
instance (Monad m) => Monad (MaybeT m) where
  (>>=)  = chainMaybeTs
  return = pureMaybeT

A custom monad in 2 lines of code

A parsing type that can short-circuit:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

newtype MyParser a = MyP (MaybeT (State String) a)
  deriving (Functor, Applicative, Monad, MonadState String)
We use a GHC extension to automatically generate instances of
typeclasses that Haskell 98 cannot derive:
Functor and Applicative (required by Monad in modern GHC)
Monad
MonadState String

What is MonadState?

The State monad is parameterised over its underlying state, as
State s:
It knows nothing about the state, and cannot manipulate it.
Instead, it implements an interface that lets us query and modify
the state ourselves:
class (Monad m) => MonadState s m where
  -- query the current state
  get :: m s

  -- replace the state with a new one
  put :: s -> m ()

Parsing text

In essence:
Get the current state, modify it, put the new state back.
What do we do on failure?
string :: String -> MyParser ()
string str = do
  s <- get
  let (hd, tl) = splitAt (length str) s
  if str == hd
    then put tl
    else fail $ "failed to match " ++ show str

Shipment of fail

We’ve carefully hidden fail so far. Why?
Many monads have a very bad definition of fail: error.
What’s the problem with error?
It throws an exception that we can’t catch in pure code.
It’s only safe to use in catastrophic cases.

Non-catastrophic failure

A bread-and-butter activity in parsing is lookahead:
Inspect the input stream and see what to do next
JSON example:
An object begins with “{”
An array begins with “[”
We look at the next input token to figure out what to do.
If we fail to match “{”, it’s not an error.
We just try “[” instead.

Giving ourselves alternatives

We have two conflicting goals:
We like to keep our implementation options open.
Whether fail crashes depends on the underlying monad.
We need a safer, abstract way to fail.

MonadPlus

A typeclass with two methods:
class Monad m => MonadPlus m where
  -- non-fatal failure
  mzero :: m a

  -- if the first action fails,
  -- perform the second instead
  mplus :: m a -> m a -> m a
To upgrade our code, we replace our use of fail with mzero.

Writing a MonadPlus instance

We can easily make any stack of MaybeT atop another monad a
MonadPlus:
instance Monad m => MonadPlus (MaybeT m) where
  mzero = MaybeT $ return Nothing

  a `mplus` b = MaybeT $ do
    result <- runMaybeT a
    case result of
      Just k  -> return (Just k)
      Nothing -> runMaybeT b
We simply add MonadPlus to the list of typeclasses we ask GHC
to automatically derive for us.
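A runnable sketch of `MyParser` with `MonadPlus` in its deriving list. To stay self-contained, it assumes the `transformers` package (shipped with GHC) supplies `MaybeT` and `State` instead of the hand-written versions above. `MonadState` lives in `mtl`, not `transformers`, so this sketch lifts `get` and `put` by hand as `getRest` and `putRest`; those names, and `runParser`, are introduced here.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Applicative (Alternative)
import Control.Monad (MonadPlus, mplus, mzero)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Maybe (MaybeT (..))
import Control.Monad.Trans.State (State, get, put, runState)

newtype MyParser a = MyP (MaybeT (State String) a)
  deriving (Functor, Applicative, Monad, Alternative, MonadPlus)

-- Hypothetical helpers: read and replace the remaining input.
getRest :: MyParser String
getRest = MyP (lift get)

putRest :: String -> MyParser ()
putRest s = MyP (lift (put s))

-- Run a parser: the result (if any) plus the leftover input.
runParser :: MyParser a -> String -> (Maybe a, String)
runParser (MyP p) = runState (runMaybeT p)

-- Match a literal string, failing with mzero (not fail) on a mismatch.
string :: String -> MyParser ()
string str = do
  s <- getRest
  let (hd, tl) = splitAt (length str) s
  if str == hd
    then putRest tl
    else mzero
```

One caveat of this stack: `MaybeT` over `State` does not undo state changes made by a failed branch of `mplus`. `string` avoids the issue by calling `putRest` only after a successful match.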

Using MonadPlus

Given functions that know how to parse bits of JSON:
parseObject :: MyParser [(String, JSValue)]
parseArray  :: MyParser [JSValue]
We can turn them into a coherent whole:
parseJSON :: MyParser JSValue
parseJSON =
      (parseObject >>= \o -> return (JSObject o))
  `mplus`
      (parseArray >>= \a -> return (JSArray a))
  `mplus`
  ...

The problem of boilerplate

Here’s a repeated pattern from our parser:
foo >>= \x -> return (bar x)
These brief uses of variables, >>=, and return are redundant and
burdensome.
In fact, this pattern of applying a pure function to a monadic result
is ubiquitous.

Boilerplate removal via lifting

We replace this boilerplate with liftM:
liftM :: Monad m => (a -> b) -> m a -> m b

We refer to this as lifting a pure function into the monad.
parseJSON =
      (JSObject `liftM` parseObject)
  `mplus`
      (JSArray `liftM` parseArray)
This style of programming looks less imperative, and more
applicative.
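A quick way to check that `liftM` removes exactly that boilerplate, using `Maybe` as a stand-in monad. The names are hypothetical, and `pick` only mimics the shape of `parseJSON`, with `Left` and `Right` playing the roles of `JSObject` and `JSArray`. In current code, `liftM f m` is more often written `fmap f m` or `f <$> m`.

```haskell
import Control.Monad (liftM, mplus)

-- The boilerplate form and the lifted form are equivalent.
viaBind :: Maybe Int -> Maybe Int
viaBind m = m >>= \x -> return (x + 1)

viaLiftM :: Maybe Int -> Maybe Int
viaLiftM m = (+ 1) `liftM` m

-- Same shape as parseJSON: lift a constructor over each alternative,
-- then take the first alternative that succeeds.
pick :: Maybe Int -> Maybe String -> Maybe (Either Int String)
pick obj arr = (Left `liftM` obj) `mplus` (Right `liftM` arr)
```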

The Parsec library

Our motivation so far:
Show you that it’s really easy to build a monadic parsing
library
But we must concede:
Maybe you simply want to parse stuff
Instead of rolling your own, use Daan Leijen’s Parsec library.

What to expect from Parsec

It has some great advantages:
A complete, concise EDSL for building parsers
Easy to learn
Produces useful error messages
But it’s not perfect:
Strict, so cannot parse huge streams incrementally
Based on String, hence slow
Accepts, and chokes on, left-recursive grammars

Parsing a JSON string

An example of Parsec’s concision:
jsonString = between (char '"') (char '"')
                     (many jsonChar)
Some parsing combinators explained:
between matches its 1st argument, then its 3rd, then its 2nd
many runs a parser until it fails
It returns a list of parse results

Parsing a character within a string

jsonChar = (char '\\' >> (pesc <|> puni))
       <|> satisfy (`notElem` "\"\\")
Between quotes, jsonChar matches a string’s body:
A backslash must be followed by an escape (“\n”) or a Unicode
sequence (“\u2fbe”)
Any other character except a backslash or a double quote is okay
More combinator notes:
The >> combinator is like >>=, but provides only
sequencing, not binding
The satisfy combinator uses a pure predicate.

Your turn!

Write a parser for numbers. Here are some pieces you’ll need:
import Numeric (readFloat, readSigned)
import Text.ParserCombinators.Parsec
import Control.Monad (mzero)
Other functions you’ll need:
getInput
setInput
The type of your parser should look like this:
parseNumber :: CharParser () Rational

Experimenting with your parser

Simply load your code into ghci, and start playing:

Prelude> :load MyParser
*Main> parseTest parseNumber "3.14159"

My number parser

parseNumber = do
    s <- getInput
    case readSigned readFloat s of
      [(n, s')] -> setInput s' >> return n
      _         -> mzero
  <?> "number"

Using JSON in Haskell

A good JSON package is already available from Hackage:
http://tinyurl.com/hs-json
The module is named Text.JSON
Doesn’t use overlapping instances

Part 3

This was going to be a concurrent web application, but I ran out
of time.
It’s still going to be informative and fun!

Concurrent programming

The dominant programming model:
Shared-state threads
Locks for synchronization
Condition variables for notification

The prehistory of threads

Invented independently at least 3 times, circa 1965:
Dijkstra
Berkeley Timesharing System
PL/I’s CALL XXX (A, B) TASK;
Alas, the model has barely changed in almost half a century.

What does threading involve?

Threads are a simple extension to sequential programming.
All that we lose are the following:
Understandability,
Predictability, and
Correctness

Concurrent Haskell

Introduced in 1996, inspired by Id.
Provides a forkIO action to create threads.
The MVar type is the communication primitive:
Atomically modifiable single-slot container
Provides take and put operations (takeMVar and putMVar)
An empty MVar blocks on take
A full MVar blocks on put
We can use MVars to build locks, semaphores, etc.
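A minimal sketch of the basic pattern, using only `base`: a forked thread fills an empty `MVar`, and `takeMVar` blocks the main thread until the value arrives. `sumInBackground` is a made-up name.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- A worker thread computes a result and hands it back through an MVar.
-- takeMVar blocks the caller until the worker has filled it.
sumInBackground :: [Int] -> IO Int
sumInBackground xs = do
  box <- newEmptyMVar
  _ <- forkIO (putMVar box $! sum xs)   -- $! evaluates the sum in the worker
  takeMVar box
```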

What’s wrong with MVars?

MVars are no safer than the concurrency primitives of other
languages.
Deadlocks
Data corruption
Race conditions
Higher order programming and phantom typing can help, but only
a little.

The fundamental problem

Given two correct concurrent program fragments:
We cannot compose another correct concurrent fragment
from them without great care.

Message passing is no panacea

It brings its own difficulties:
The programming model is demanding.
Deadlock avoidance is hard.
Debugging is really tough.
Don’t forget coherence, scaling, atomicity, ...

Lock-free data structures

A focus of much research in the 1990s.
Modus operandi: find a new lock-free algorithm, earn a PhD.
Tremendously difficult to get the code right.
Neither a scalable nor a sustainable approach!
This inspired research into hardware support, followed by:
Software transactional memory

Software transactional memory

The model is loosely similar to database programming:
Start a transaction.
Do lots of work.
Either all changes succeed atomically...
...Or they all abort, again atomically.
An aborted transaction is usually restarted.

The perils of STM

STM code needs to be careful:
Transactional code must not perform non-transactional
actions.
On abort-and-restart, there’s no way to roll back
dropNukes()!
In traditional languages, this is unenforceable.
Programmers can innocently cause serious, hard-to-find bugs.
Some hacks exist to help, e.g. tm_callable annotations.

STM in Haskell

In Haskell, the type system solves this problem for us.
Recall that I/O actions have IO in their type signatures.
STM actions have STM in their type signatures, but not IO.
The type system statically prevents STM code from
performing non-transactional actions!

Firing up a transaction

As usual, we can explore APIs in ghci.
The atomically action launches a transaction:

Prelude> :m +Control.Concurrent.STM

Prelude Control.Concurrent.STM> :type atomically
atomically :: STM a -> IO a

Let’s build a game—World of Haskellcraft

Our players love to have possessions.
data Item = Scroll | Wand | Banjo
  deriving (Eq, Ord, Show)

-- inventory
data Inv = Inv {
    invItems    :: [Item],
    invCapacity :: Int
  } deriving (Eq, Ord, Show)

Inventory manipulation

Here’s how we set up mutable player inventory:
import Control.Concurrent.STM

type Inventory = TVar Inv

newInventory :: Int -> IO Inventory
newInventory cap =
  newTVarIO Inv { invItems = [],
                  invCapacity = cap }
The use of curly braces is called record syntax.

Inventory manipulation

Here’s how we can add an item to a player’s inventory:
addItem :: Item -> Inventory -> STM ()

addItem item inv = do
  i <- readTVar inv
  writeTVar inv i {
      invItems = item : invItems i
    }
But wait a second:
What about an inventory’s capacity?
We don’t want our players to have infinitely deep pockets!

Checking capacity

GHC defines a retry action that will abort and restart a
transaction if it cannot succeed:
import Control.Monad (when)

isFull :: Inv -> Bool
isFull (Inv items cap) = length items == cap

addItem item inv = do
  i <- readTVar inv
  when (isFull i)
    retry
  writeTVar inv i {
      invItems = item : invItems i
    }

Let’s try it out

Save the code in a file, and fire up ghci:

*Main> i <- newInventory 3
*Main> atomically (addItem Wand i)
*Main> atomically (readTVar i)
Inv {invItems = [Wand], invCapacity = 3}

What happens if you repeat the addItem a few more times?

How does retry work?

In principle, all the runtime has to do is retry the transaction
immediately, and spin tightly until it succeeds.
This might be correct, but it’s wasteful.
What happens instead?
The RTS tracks each mutable variable touched during a
transaction.
On retry, it blocks the transaction until at least one of those
variables is modified.
We haven’t told GHC what variables to wait on: it does this
automatically!
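The blocking-and-waking behaviour described above can be observed directly. To keep the sketch short, it uses a hypothetical `Bag` that only counts used slots rather than the `Inv` type. A worker thread tries to take a slot from a full bag and retries; once the main thread frees a slot, the runtime re-runs the worker's transaction. The short `threadDelay` only makes it likely that the worker blocks first; the final result is the same either way.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM

-- Hypothetical stand-in for Inv: only counts how many slots are used.
data Bag = Bag { used :: Int, capacity :: Int }
  deriving (Eq, Show)

-- Claim a slot, or retry (block) while the bag is full.
takeSlot :: TVar Bag -> STM ()
takeSlot tv = do
  b <- readTVar tv
  if used b >= capacity b
    then retry
    else writeTVar tv b { used = used b + 1 }

-- Release a slot; this write is what wakes a blocked takeSlot.
freeSlot :: TVar Bag -> STM ()
freeSlot tv = modifyTVar' tv (\b -> b { used = used b - 1 })

-- Start with a full bag, let a worker try to claim a slot, then free one.
-- Nothing here tells the runtime which TVar the worker is waiting on.
demo :: IO Bag
demo = do
  tv   <- newTVarIO Bag { used = 1, capacity = 1 }
  done <- newEmptyMVar
  _ <- forkIO (atomically (takeSlot tv) >> putMVar done ())
  threadDelay 10000          -- 10 ms: makes it likely the worker blocks first
  atomically (freeSlot tv)
  takeMVar done              -- returns once the worker's transaction commits
  readTVarIO tv
```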

Your turn!

Write a function that removes an item from a player’s inventory:
removeItem :: Item -> Inventory -> STM ()

My item removal action

removeItem item inv = do
  i <- readTVar inv
  case break (== item) (invItems i) of
    (_, [])    -> retry
    (h, _ : t) -> writeTVar inv i {
                      invItems = h ++ t
                    }

Your turn again!

Write an action that lets us give an item from one player to
another:
giveItem :: Item -> Inventory -> Inventory
         -> STM ()

My solution

giveItem item a b = do
  removeItem item a
  addItem item b

What about that blocking?

If we’re writing a game, we don’t want to block forever if a player’s
inventory is full or empty.
We’d like to say “you can’t do that right now”.

One approach to immediate failure

Let’s call this the C programmer’s approach:
addItem1 :: Item -> TVar Inv -> STM Bool
addItem1 item inv = do
  i <- readTVar inv
  if isFull i
    then return False
    else do
      writeTVar inv i {
          invItems = item : invItems i
        }
      return True

What is the cost of this approach?

If we have to check our results everywhere:
The need for checking will spread
Sadness will ensue

The Haskeller’s first loves

We have some fondly held principles:
Abstraction
Composability
Higher-order programming
How can we apply these here?

A more abstract approach

It turns out that the STM monad is a MonadPlus instance:
immediately :: STM a -> STM (Maybe a)
immediately act =
  (Just `liftM` act) `mplus` return Nothing

What does mplus do in STM?

This combinator is defined as orElse:
orElse :: STM a -> STM a -> STM a
Given two transactions j and k:
If transaction j must abort, perform transaction k instead.

A complicated specification

We now have all the pieces we need to:
Atomically give an item from one player to another.
Fail immediately if the giver does not have it, or the recipient
cannot accept it.
Convert the result to a Bool.

Compositionality for the win

Here’s how we glue the whole lot together:
import Data.Maybe (isJust)

giveItemNow :: Item -> Inventory -> Inventory
            -> IO Bool
giveItemNow item a b =
  liftM isJust . atomically . immediately $
    removeItem item a >> addItem item b
Even better, we can do all of this as nearly a one-liner!
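For experimenting in ghci, here is the whole inventory example as one self-contained sketch assembled from the slides, with the `readTVar` lines that the slides elide filled in. The key property to check: if `removeItem` succeeds but `addItem` retries, `orElse` (used by `mplus` for `STM`) discards the entire first branch, so the item is never lost in transit.

```haskell
import Control.Concurrent.STM
import Control.Monad (liftM, mplus, when)
import Data.Maybe (isJust)

data Item = Scroll | Wand | Banjo
  deriving (Eq, Ord, Show)

data Inv = Inv { invItems :: [Item], invCapacity :: Int }
  deriving (Eq, Ord, Show)

type Inventory = TVar Inv

newInventory :: Int -> IO Inventory
newInventory cap = newTVarIO Inv { invItems = [], invCapacity = cap }

isFull :: Inv -> Bool
isFull (Inv items cap) = length items == cap

-- Blocks (retry) while the inventory is full.
addItem :: Item -> Inventory -> STM ()
addItem item inv = do
  i <- readTVar inv
  when (isFull i) retry
  writeTVar inv i { invItems = item : invItems i }

-- Blocks (retry) while the item is absent.
removeItem :: Item -> Inventory -> STM ()
removeItem item inv = do
  i <- readTVar inv
  case break (== item) (invItems i) of
    (_, [])    -> retry
    (h, _ : t) -> writeTVar inv i { invItems = h ++ t }

-- Turn "would block" into an immediate Nothing.
immediately :: STM a -> STM (Maybe a)
immediately act = (Just `liftM` act) `mplus` return Nothing

giveItemNow :: Item -> Inventory -> Inventory -> IO Bool
giveItemNow item a b =
  liftM isJust . atomically . immediately $
    removeItem item a >> addItem item b
```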

Thank you!

I hope you found this tutorial useful!
Slide source available:
http://tinyurl.com/defun08
