Tuesday, December 31, 2013

8600.5

[Photos: Flooded again; Standing water; Pines on the horizon; Bright Winter's Day]

While we have had the occasional day of heavy rain, with a little bit of waterlogging -- though nothing quite as bad as last year -- thanks to the continuing fair weather, including a lovely few days at the start of the holidays, I more than made all the mileage goals I'd set myself. On my bike that was 3008 miles, plus 433 on hired bikes, for a total of 3441 miles; not quite the equivalent of commuting to work every other day.

Maybe I should try for the 10 miles a day average next year.



Monday, December 30, 2013

An introduction to Functional Programming in F# -- Part 6: Continuations and Monads

Hang onto your hats! This final part is where the ride gets wild.


Topics covered in the series

Alas, despite having trailed generative programming, I find I don't have any notes from the seminar series covering that topic explicitly (it was not one I was presenting). If I find something, I'll add it in as an extra post.


Continuations

This style of programming originally arose in handling data streams in a fashion using lazy evaluation, and uses closures to represent each step of the program. I/O is made available as a pair of lazy lists, a request list and a response list, and the program is a function which takes a response list to a request list:

For example:

The hidden causality here is that putting the Getq at the head of the request queue causes the Getp c to exist at the head of the response queue (perhaps by reading the keyboard).
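A sketch of the shape involved, using seq for the lazy lists (the constructor names follow the Getq/Getp naming above; Putq and Done are illustrative, not the original definitions):

    // the program is a function from the (lazy) list of responses to the list of requests
    type Request  = Getq | Putq of char
    type Response = Getp of char | Done

    // echo one character: emit a read request, then inspect the head of the responses
    let echo (responses : Response seq) : Request seq =
        seq {
            yield Getq                            // ask for a character...
            match Seq.head responses with         // ...which makes Getp c appear here
            | Getp c -> yield Putq c
            | Done   -> () }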


In general, the continuation style uses chained closures to construct a complex compound operation from a collection of small basic operations. We introduce the types
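A minimal sketch of such types (the names are assumed):

    type Continuation = Continuation of (unit -> unit)         // one suspended step of the program
    type 'a Producer  = ('a -> Continuation) -> Continuation   // given a consumer for its output, yields the rest of the program
    type 'a Consumer  = 'a -> Continuation                     // builds the rest of the program from a value

    let runK (Continuation k) = k ()                            // apply the finished closure to ()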

Our building blocks will mostly be producers and consumers, and others that are neither:

  • Where possible, the continuations are built during the construction phase.
  • Each step is provided with the rest of the execution as a continuation building block(s).
  • A producer is provided with a consumer which constructs a continuation from the produced data at execution time.

Example -- Console I/O

First we define some primitives:

where we introduce the standard convenience function

to avoid extra parentheses.

We can then define a compound operation, for example:
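A sketch covering both the primitives and a compound built from them (putC, getC and stop are assumed names; the convenience function alluded to above is not shown here -- the backward pipe (<|) would serve the same parenthesis-avoiding purpose):

    let stop = Continuation (fun () -> ())                     // the empty rest-of-program

    // consumer-flavoured primitive: write a character, then continue with k
    let putC (c : char) (k : Continuation) : Continuation =
        Continuation (fun () -> System.Console.Write c; runK k)

    // producer primitive: read a character and hand it to the consumer g at execution time
    let getC (g : char Consumer) : Continuation =
        Continuation (fun () -> runK (g (System.Console.ReadKey(true).KeyChar)))

    // a compound operation: read a character and echo it twice
    let echoTwice = getC (fun c -> putC c (putC c stop))

    // nothing happens until the compound closure is applied to ():
    // runK echoTwice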


Example -- The Lexer revisited

The type constructor lexerResult describes the values that the lexer is going to produce. Note that the lexer is going to be a producer continuation.

The type constructor lexerSystem provides us with an I/O abstraction.
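Plausible shapes for the two type constructors (the details here are assumptions rather than the original definitions):

    type 'a lexerResult =
        | Token of 'a              // a successfully recognised lexeme
        | Error of string          // a lexical-analysis failure
        | EndOfInput

    type 'a lexerSystem =
        { readChar    : unit -> char option       // the input side of the abstraction
          writeResult : 'a lexerResult -> unit }  // the output side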

Given that the lexer is going to be a producer, we will need a consumer to complete its construction.

To allow for lexical analysis failures, we also have a building block for that.

Note that although it looks like a consumer, in practice we always know the message to use at construction time.

Function lexGetChar builds our most primitive producer continuation.
The consumer (g) constructs a continuation from the character option read from the input.
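Sketched against the types assumed above:

    let lexGetChar (sys : 'a lexerSystem) (g : char option Consumer) : Continuation =
        // at execution time, pull a character (option) from the input and let the
        // consumer g build the rest of the program from it
        Continuation (fun () -> runK (g (sys.readChar ())))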

Similarly, lexPutLexeme builds our most primitive consumer, which writes a lexeme to the output.

In order to make decisions based on our inputs we need to express a conditional branch.

and similarly we want to be able to express repetition.


Thus equipped we can start to write the lexer.


Exercise: Complete the lexer.


Summary

We have used closures as continuations to construct functions which compose, by function application, into more complex functions. The end result is a complex closure which is itself just a function; applied to the unit value “()”, it is the program we want, complete with any side effects of the constituent closures. The main problem that we, as programmers, are likely to have with this approach is that the process of construction is counter-intuitive, thanks to the back-to-front construction style: each step of the closure takes an argument representing the rest of the program.

As Eric Lippert put it (in the context of writing asynchronous code in this style in C#):

Remember what I was saying about the pros and cons of CPS?

  • PRO: Arbitrarily complex and interesting control flows can be built out of simple parts – check.
  • CON: The reification of control flow via continuations is hard to read and hard to reason about – check.
  • CON: The code that represents the mechanisms of control flow completely overwhelms the meaning of the code – check.
  • CON: The transformation of ordinary code control flow into CPS is the kind of thing that compilers are good at, and almost no one else is – check.

It is important to note, however, that this is not the only use of continuations. We regularly use continuations as event handlers, and they can also be used to search a set of alternatives, by effectively constructing a set of programs to be tried one after another until one provides a result (does not fail), or all of them fail.

Used appropriately, this is a powerful technique.



Monads

Despite the scary name, monads are constructs that arise quite naturally in functional programming; typically as a result of trying to express the OO notion of a separation of concerns, here between individual functions doing localized transformations, and the larger control flow of the program.

A monad is a type whose values encapsulate a piece of a program, in much the same way as a continuation does. There are, however, differences in the way the values work and the way they compose, which fit into the functional programming style more naturally. We will, for the moment, write a monad as a type M t, where t is the type of value that the monad produces when “run” (except when the run fails, in which case no value is produced). As with continuation programming, a program is constructed by the composition of monads and functions which produce monads. The basic constructions used are:
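F# cannot abstract over the type constructor M itself, so the shapes can only be sketched informally:

    // return : 't -> M 't                     -- wrap a plain value as a trivial monadic step
    // bind   : M 't -> ('t -> M 'u) -> M 'u   -- run a step, feed the value(s) it produces into
    //                                            the function that builds the next step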


Understanding “bind”

The “bind” combinator is often written as the infix operator >>=. This begins to make sense if we use a program layout thus:

We read this as binding value1 to the output(s) of m, using this to produce the monad E1[value1], and then binding value2 to the outputs of that, constructing the monad E2[value2], and so on. A language with monads built in can even provide syntactic sugar for this, such as:
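The layout and the sugar are easiest to see with a concrete bind; purely as an illustration, here >>= is specialised to the Option monad (discussed properly below):

    let (>>=) m f = match m with Some x -> f x | None -> None

    let example (m : int option) =
        m                  >>= (fun value1 ->
        Some (value1 + 1)  >>= (fun value2 ->
        Some (value1 * value2)))

    // with built-in syntactic sugar this might read
    //     let! value1 = m
    //     let! value2 = E1[value1]
    //     return E2[value1; value2]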

Unfortunately the F# type system is not expressive enough for us to write a general >>= along the lines of

so for the moment we will write >>= but mean the comparable operator specialised to the monad of interest.

Derived combinator cases

A monad type M unit is special (think of it as pure side-effect). With such a type there is little point in binding the result values – we know that we will only get ():

A monad which produces a function or functions is also interesting:

Given a function we can construct a function over monads:

Exercise: Prove that Lift f = Apply (Return f)

Given two functions that return monads, we can combine them:

A monad which produces a monad or monads can be flattened (lose a level of monad):

The name join makes sense when the monad involved produces multiple values – the result monad produces all of the values produced by all of the monads produced by mm.

Given a list of monads we can construct a monad of lists:
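Sketches of these derived combinators, written against the Option-specialised >>= above (the general versions have the same shape for any monad; note that >> here deliberately shadows the built-in composition operator, matching the usage in the text):

    let (>>) m k    = m >>= (fun () -> k)                                // sequence, discarding a unit result
    let apply mf mx = mf >>= (fun f -> mx >>= (fun x -> Some (f x)))     // M ('a -> 'b) -> M 'a -> M 'b
    let lift f mx   = apply (Some f) mx                                  // ('a -> 'b) -> M 'a -> M 'b
    let compose f g = fun x -> f x >>= g                                 // combine two monad-returning functions
    let join mm     = mm >>= id                                          // M (M 'a) -> M 'a
    let rec sequence ms =                                                // M 'a list -> M ('a list)
        match ms with
        | []     -> Some []
        | m :: t -> m >>= (fun x -> sequence t >>= (fun xs -> Some (x :: xs)))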


An example of a monad

We introduced the notion of a monad with the description that it handles things like separation of concerns. One very familiar language construct that we have been using from the very first session can be written in monadic form; the type 'a list is just another monad over 'a:
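A sketch: return makes a one-element list, and bind maps the function over the list and concatenates the results.

    let returnList (x : 'a) : 'a list = [ x ]
    let bindList (m : 'a list) (f : 'a -> 'b list) : 'b list = List.collect f m

    // e.g. all pairings of two lists
    let pairs xs ys = bindList xs (fun x -> bindList ys (fun y -> returnList (x, y)))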

Another pure monad construction is Option.

which allows us to chain together a series of operations that might fail, and bypass the calls made after any failure.
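For instance, reusing the Option-specialised >>= from above (tryDivide is an illustrative helper):

    let tryDivide x y = if y = 0 then None else Some (x / y)

    let calc a b c =
        tryDivide a b >>= (fun q ->
        tryDivide q c >>= (fun r ->
        Some (r + 1)))

    // calc 100 5 2 = Some 11; calc 100 0 2 = None, and the later steps are bypassed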

Exercise: Construct a pure monad based on trees.


An I/O Monad

These pure monads do not really have a notion of execution. We can use closures again to construct a monad which itself carries out some side-effecting computation and returns a value. As we did for continuations, we will wrap the closures up using a data type (as before this is to highlight the construction and can be elided):

The principal functions for this are:

The construction for bind (>>=) deliberately constructs a wrapped closure. At first sight it looks the same as f(runIO m), which has the same type, but we need to make sure that the sub-term runIO m is evaluated when the result of the bind is run rather than when it is constructed. As all of the other combinators are defined in terms of these three, we won’t need to worry about evaluation order for them. The issue will arise, however, for most special primitives, as we will want side effects to occur at the right time.
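A sketch of the type and its principal functions (the names are assumed):

    type 'a IO = IO of (unit -> 'a)

    let runIO (IO f) = f ()                                  // execute the wrapped computation
    let returnIO (x : 'a) : 'a IO = IO (fun () -> x)         // a computation that just yields x

    let (>>=) (m : 'a IO) (f : 'a -> 'b IO) : 'b IO =
        // deliberately wrap a new closure, so that runIO m happens only when the
        // result is run, not when the bind is constructed
        IO (fun () -> runIO (f (runIO m)))

    let (>>) (m : unit IO) (k : 'b IO) : 'b IO = m >>= (fun () -> k)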

To read and write characters we can use:

We can, of course, string a lot of these together using the derived combinators >>= and >>. The result does nothing until run. For example:
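A sketch of the character primitives and a small compound built from them (GetC and PutC are assumed names):

    let GetC : char IO = IO (fun () -> System.Console.ReadKey(true).KeyChar)
    let PutC (c : char) : unit IO = IO (fun () -> System.Console.Write c)

    // echo two key presses; nothing happens until the result is run
    let echo2 = (GetC >>= PutC) >> (GetC >>= PutC)
    // runIO echo2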

We can also define read and write for strings:

Exercise: write PutS.


Computation expressions

F# does provide a form of syntactic sugar that allows us to write monadic code, and continuation passing, in a form that is actually comprehensible to the programmer.

Consider the Option monad. If we introduce a singleton type

where Delay has the effect (as in the >>= operation for our I/O monad) of delaying the execution of the operation until the appropriate point in the sequence, we can write the monadic code using chains of >>=? combinators as a sequence of assignment-like let! expressions.

where f, g, h are functions taking plain values to option values.
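A sketch of such a builder and its use; the builder type here is illustrative rather than the original singleton type, and for a pure Option monad there is nothing for Delay to postpone (it matters for effectful monads such as the I/O one above):

    type MaybeBuilder () =
        member this.Bind (m, f) = match m with Some x -> f x | None -> None
        member this.Return x = Some x
        member this.ReturnFrom m = m
        member this.Delay f = f ()

    let maybe = MaybeBuilder ()

    // assuming f, g, h : int -> int option, as in the text
    let pipeline f g h x =
        maybe {
            let! a = f x
            let! b = g a
            let! c = h b
            return a + b + c }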

The actual compiled form of the let! binding

is in the continuation form
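Sketching that expansion for the pipeline above: each let! becomes a call to Bind, with the rest of the computation passed as a function, exactly as in continuation-passing style.

    let pipelineDesugared f g h x =
        maybe.Bind (f x, fun a ->
            maybe.Bind (g a, fun b ->
                maybe.Bind (h b, fun c ->
                    maybe.Return (a + b + c))))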

This not only allows us to write monadic code in something close to the syntactic sugar form as alluded to above, but also to express naturally continuation based code (e.g. asynchronous operations) in a sequential style (indeed the async computation builder is a standard part of the library for exactly this purpose).

We can also express the equivalence of the two notations by building a monadic form from a computation expression, such as the async computation builder:


Concluding Remarks

Now, like Wittgenstein's Ladder, once you have grokked the content of this course, you should be able to discard the often inefficient implementations used for simplicity (such as avoiding cluttering the example code with reversing cons'd lists), and be confident in using a functional style where complex programs can be built from simple pure elements composed together.

As to the F# language itself, we have barely skimmed the surface -- material like active patterns, quotations (the missing last lecture), the standard collections modules, reactive events and more still remains untouched. But in my experience, once the initial hurdle of the programming style has been crossed, picking those up when you realise you need them is the easy part; whereas before the "Aha!" moment the motivation and need for such features will remain mostly obscure, hidden behind the dilemma of "how do you program when your variables don't?".

Exercise: Test your understanding by reworking the course material for another language of your choice. Note that for some functional languages, parts will not be applicable as they stand (e.g. neither assignment nor the shared-state queue will be expressible in Erlang); and that for many languages pattern based dispatch and infix notation will not be available.



Sunday, December 29, 2013

An introduction to Functional Programming in F# -- Part 5: Objects and Exceptions

This session marks a pause for breath between the introductory parts, where we concentrated on doing interesting work with mostly simple immutable lists and algebraic datatypes, and the final part, which will be all heavy lifting. Here we look at other ways of representing compound data, and at how the functional world interacts with the more familiar parts of the .net framework. Also, this week, there are no exercises.

I'm keeping this part in mainly for completeness' sake, as a transcription of the original seminar material from '08, rather than for any great insights it offers. In particular, unlike the previous sessions, there won't be the two steps of comprehension -- first, how the program works, and then, with increasing familiarity, why you wouldn't actually implement things in quite that way in a real program. It'll be more like a shopping list instead.


Topics covered in the series


Assignment

Actually we covered assignment and mutability last time.


Tuples

We used tuples -- collections of a fixed number of items, defined by the types of each item, way back in the first session. Recall

The value returned by this function is a 2-tuple of the Bits type -- written Bits * Bits; but in general the types in a tuple will be heterogeneous.

Tuples are usually best employed as ad hoc datastructures; while there are standard functions

for extracting members of a 2-tuple, getting at parts of longer tuples will involve destructuring pattern matching.
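For example:

    let pair = (1, "one")
    let first  = fst pair                // 1
    let second = snd pair                // "one"

    let (a, b, c) = (1, 2, 3)            // destructuring pattern match for a 3-tuple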


Records

A record is more like a class -- or, really, a 'C'-style struct -- in having a series of named fields. We have actually used records in their simplest form, in some of our previous examples

because functions are just values, we can put them into a record just as well as other types. We can be more object-like if we mix scalar and function values in a record:

which defines a 2D point type with a displacement-vector behaviour.
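A sketch of the sort of definition described (the field names are assumed):

    type Point =
        { x : float
          y : float
          move : Point -> Point }        // displace the argument by this point's own x,y

    let makePoint vx vy : Point =
        { x = vx
          y = vy
          move = fun p -> { x = p.x + vx; y = p.y + vy; move = p.move } }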

Records themselves are immutable, so we can only create new records based off old ones e.g.

There is a copy but replace one or more fields syntax

but this of course violates the contract that the move function displaces its target by the x,y value of the point. However, types can have methods

and with the move member working on the current rather than creation time value of the coordinates, we have a class-like structure, albeit with public fields. And most of the time this is sufficient, if the type itself is not exposed.
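Sketching the member-based version: Move reads the fields of the record it is invoked on, rather than values captured when a function field was created.

    type PointWithMember =
        { x : float; y : float }
        member p.Move (target : PointWithMember) : PointWithMember =
            { x = target.x + p.x; y = target.y + p.y }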


Objects and interfaces

When interfacing with .net libraries, we need to define actual objects or interfaces. Our previous point behaviour can be expressed as an interface

which we can implement as

Note that F# syntax requires us to explicitly implement the interface; as a consequence, we also need to cast our concrete implementation explicitly to the interface type when we wish to refer to it as such.
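A sketch along those lines (the interface and class names are illustrative):

    type IPoint =
        abstract X : float
        abstract Y : float
        abstract Move : IPoint -> IPoint

    type PointClass =
        class
            val vx : float
            val vy : float
            new (x, y) = { vx = x; vy = y }
            interface IPoint with
                member this.X = this.vx
                member this.Y = this.vy
                member this.Move target =
                    PointClass (target.X + this.vx, target.Y + this.vy) :> IPoint
        end

    // the implementation is explicit, so we cast in order to use it as an IPoint
    let ip  = PointClass (1.0, 2.0) :> IPoint
    let ip' = ip.Move ip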

There is an alternative syntax which removes the explicit new member as a constructor, and instead makes the class body act as a constructor function:

The two are similar, but not quite the same; for this second example vx and vy are not (as the result of pasting the code into the interactive prompt shows) instance variables, as they were before, but are locals in the constructor function.

Alternatively, we can define an abstract base class, rather than an interface. This requires an attribute annotation, rather than a keyword:

Note that this time we don't have to cast anything to the base class in order to invoke the move method. A Point' is an AbstractPoint.
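Sketching that form, with the implementing type called Point' as in the text (other details assumed):

    [<AbstractClass>]
    type AbstractPoint () =
        abstract X : float
        abstract Y : float
        abstract Move : AbstractPoint -> AbstractPoint

    type Point' (vx : float, vy : float) =
        inherit AbstractPoint ()
        override this.X = vx
        override this.Y = vy
        override this.Move target = Point' (target.X + vx, target.Y + vy) :> AbstractPoint

    let p  = Point' (1.0, 2.0)
    let p' = p.Move p            // no explicit cast needed to use it as an AbstractPoint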


Exceptions

These have the sort of behaviour that we are familiar with

  • Exceptions do not affect the types of functions that throw them.
  • Exceptions “propagate up the call stack” until a handler is found (or to top level).

There are two built-in exception mechanisms

or we can just raise a standard exception type

New exception types can be declared by subclassing, or by ML syntax

Exceptions are handled much like they are in C#/C++/Java...

where the with construct actually allows pattern matching over the exception (so can be the equivalent of multiple catch clauses)

There is a related construct; a finally clause can also be used with try as in C#, to add some code which is always executed:

though there is no portmanteau try ... catch ... finally as there is in C#.
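Sketches of these constructs (the exception type and the functions here are purely illustrative):

    exception LexerError of string                        // ML-style declaration

    let parseDigit (c : char) =
        if System.Char.IsDigit c then int c - int '0'
        else raise (LexerError (sprintf "not a digit: %c" c))
        // failwith "..." or invalidArg "c" "..." would raise the built-in exception types

    let tryParse c =
        try
            Some (parseDigit c)
        with                                              // pattern matching over the exception
        | LexerError msg -> printfn "lexer: %s" msg; None
        | :? System.ArgumentException -> None             // the equivalent of another catch clause

    let withCleanup () =
        try
            tryParse '7'
        finally
            printfn "always runs"                         // try ... finally, as a separate construct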

Note that because we are in the .net framework here, exceptions are expensive operations, unlike in OCAML, where they are sufficiently cheap as to be a standard control flow construct.

Friday, December 20, 2013

An introduction to Functional Programming in F# -- Part 4: Concurrency and Networking

So far we have dealt with entirely pure functions and immutable data, where the program result might just as well be generated at compile-time, like a C++ template meta-program. This time we look at violating these constraints.

As has been the case for the previous posts, this discussion focuses on the principles of functional style and how to achieve the results we want from first principles, rather than going directly to library code, especially when introducing new concepts.

Topics covered in the series


Assignment

The type of variable that one is familiar with in languages like C# or Java is a reference -- the variable name refers to a box that contains a value. F# has an explicit form, the type 'a ref, to denote a box that can hold values of type 'a, e.g.

where the integer zero is an immutable value all of its own.

We get the value out of the box by the function

We change the value in the box by the function

F# also offers the modifier mutable and the operator <-; a mutable value needs no special treatment to be read, but update is done using the <- operator. Using this we can emulate 'a ref as a record type (as we used to hold computations last time)

We will favour references for the rest of the post, as their mutability is explicit at all sites of use.
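Concretely:

    let box1 = ref 0               // a box of type int ref, holding 0
    let current = !box1            // read with the dereference function (!)
    box1 := !box1 + 1              // update with the assignment function (:=)

    let mutable counter = 0        // the mutable modifier...
    counter <- counter + 1         // ...updated with the <- operator

    // 'a ref emulated as a record with a mutable field, roughly:
    type 'a Box = { mutable contents : 'a }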


Local Persistent State

We can use references to maintain persistent state within program, and encapsulate it, without needing to yet reach for the explicitly object-oriented features of the language. Consider this function:

the function gen produces a generator function that uses a private variable to track the number of times it is called, so producing f 1, then f 2, as in

where successive calls produce 1, 4, 9, 16...
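A sketch of gen along the lines described:

    let gen (f : int -> 'a) =
        let count = ref 0                  // private, persistent state captured by the closure
        fun () ->
            count := !count + 1
            f !count

    let squares = gen (fun n -> n * n)
    // squares (); squares (); ...   produces 1, 4, 9, 16 ...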


Sharing a Reference Variable

If we have the following functions

to extract the head and the tail of a list, we can model a queue using shared mutable state as

Here stream takes an initializing list, and returns functions to add an item, remove an item, and test for exhaustion of the queue.
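A sketch of hd, tl and stream as described (implementation details assumed; this is the naive version the exercise asks you to improve):

    let hd = function x :: _  -> x  | [] -> failwith "empty"
    let tl = function _ :: xs -> xs | [] -> failwith "empty"

    let stream initial =
        let state = ref initial
        let put x = state := !state @ [ x ]          // linear in the length of the queue
        let get () =
            let x = hd !state
            state := tl !state
            x
        let isEmpty () = List.isEmpty !state
        put, get, isEmpty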

Exercise: This is a naive implementation -- the behaviour of put is linear in the number of elements in the queue. Immutable queues are implemented as pairs of lists with occasional list reversals leading to amortized constant time to flow an item through the queue, and a similar internal representation could be used here. Rewrite stream to use such an implementation.


Simple concurrency

The problem with shared state as in our queue example above comes when multiple concurrent operations attempt to modify the state, rendering its output, or worse, its actual internal state, inconsistent. So, using the familiar thread and lock model for concurrency, we can introduce:

for thread creation, and

to manage locks.
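A sketch of such primitives; the original API choices are not shown here, so System.Threading.Thread and ReaderWriterLockSlim stand in (and, deliberately, with no exception safety -- see the aside further down):

    open System.Threading

    let spawn (f : unit -> unit) =
        let t = Thread (ThreadStart f)
        t.Start ()
        t

    let rwLock = new ReaderWriterLockSlim ()

    let readLocked f =
        rwLock.EnterReadLock ()
        let result = f ()
        rwLock.ExitReadLock ()
        result

    let writeLocked f =
        rwLock.EnterWriteLock ()
        let result = f ()
        rwLock.ExitWriteLock ()
        result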

Aside: we could use other concurrency APIs to initiate execution, e.g. Task; what is important here is that there is the possibility for concurrent access to any shared mutable state.

These operations block indefinitely, so prevent data races while offering the opportunity for deadlock instead.


A Concurrent Queue

With the lock operations, we can modify our queue to use a lock to manage access to the shared state, placing a write lock around the mutations and a read lock around the other accesses:

We also introduce the standard type 'a option which has the form

to signal the empty state in a more uniform way than having to test the previous isEmpty function.
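A sketch of the queue with locking added, and with get reporting exhaustion through option:

    let lockedStream initial =
        let state = ref initial
        let put x =
            writeLocked (fun () -> state := !state @ [ x ])
        let get () =
            writeLocked (fun () ->
                match !state with
                | []        -> None
                | x :: rest -> state := rest; Some x)
        let isEmpty () = readLocked (fun () -> List.isEmpty !state)
        put, get, isEmpty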

Exercise: Add locking to the result of the previous exercise .

Aside: This style of code is of course not safe against exceptions that unwind the stack while a lock is being held.


Asynchronous Composition

We can connect two functions, producer and consumer, by a stream thus:

which will run printing squares until you halt the environment.
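A sketch of the wiring, reusing gen, the locked queue and spawn from above:

    let put, get, _ = lockedStream ([] : int list)

    let producer () =
        let next = gen (fun n -> n * n)
        while true do
            put (next ())
            Thread.Sleep 100          // pace the producer so the queue does not grow without bound

    let consumer () =
        while true do
            match get () with
            | Some v -> printfn "%d" v
            | None   -> Thread.Sleep 10

    spawn producer |> ignore
    spawn consumer |> ignore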


Functional networking

We can develop this idea further with the following concepts:

  • It’s asynchronous composition, but on different computers or between different processes on the same computer.
  • Make byte streams look like lists.
  • Build on the simple idea of null-terminated strings as messages.

Depending how we make the connections, we can either model simple one-way messaging (or asynchronous messaging if each counter-party transmits on its own schedule), or request/response (or synchronous messaging).

Aside: This approach is akin to that used by Erlang, though this is only a very first approximation to the Erlang model.


Developing the network model

Taking advantage of the .net library methods that provide a stream-oriented view on a connection, we can write helper methods

and make the interconversion with lists as


A simple echo server and client

Using the same familiar techniques, we can build up string to byte-list inter-conversions, and put together functions to put a string onto a TCP connection and then read it off again. It's then a simple matter of calling the appropriate network APIs to open a listening socket and make connections for the duration of a request and response.

The example code at the end of the listing below spins the server code on a separate thread, allowing the client to execute in the same interactive session.
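A minimal sketch of the pieces described (illustrative only, and not the listing referred to above):

    open System.Net
    open System.Net.Sockets
    open System.Text

    let sendString (stream : NetworkStream) (s : string) =
        writeBytes stream (s |> Encoding.ASCII.GetBytes |> List.ofArray)

    let receiveString (stream : NetworkStream) =
        readBytes stream |> List.toArray |> Encoding.ASCII.GetString

    let echoServer port =
        let listener = TcpListener (IPAddress.Loopback, port)
        listener.Start ()
        while true do
            use client = listener.AcceptTcpClient ()        // one connection per request/response
            use stream = client.GetStream ()
            receiveString stream |> sendString stream       // echo the message back

    let echoClient port (message : string) =
        use client = new TcpClient ("localhost", port)
        use stream = client.GetStream ()
        sendString stream message
        receiveString stream

    // spin the server on its own thread so the client can run in the same interactive session:
    // spawn (fun () -> echoServer 8091) |> ignore
    // echoClient 8091 "hello" |> printfn "%s"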

Wednesday, December 18, 2013

An introduction to Functional Programming in F# -- Part 3: Interpreters

Last time, we developed a parser for a simple LISP style language; now we develop from the abstract syntax tree the mechanism whereby we can actually evaluate expressions (programs) written in it.

Topics covered in the series


A simple language

We define the behaviour of our language on top of the abstract syntax thus:


Expr                                    | Example
(LET list_of_dotted_pairs EXPR)         | (LET ((x . 1) (y . 2)) (ADD x y))
(IF EXPR EXPR EXPR)                     | (IF (EQ x y) x (CONS x y))
(LAMBDA list_of_variables EXPR)         | (LAMBDA (x y) (ADD x y 1))
(REC variable list_of_variables EXPR)   | (REC F (N) (IF (EQ N 0) 1 (MULT N (F (SUB N 1)))))
(QUOTE EXPR)                            | (QUOTE (1 2 3))
(EVAL EXPR)                             | (EVAL (QUOTE (ADD 1 2)))
(EVAL EXPR environment_expression)      | (EVAL (QUOTE (ADD X 1)) (QUOTE ((X . 2))))
(EXPR EXPR*)                            | (ADD (LENGTH L) (LENGTH J) 3)
SYMBOL                                  | X
NUMBER                                  | 3

where the rows from LET down to the two EVAL forms denote special forms (keyword related constructs) in the language, and the rest are call-by-value expressions. The lower-case descriptive names represent the following expansions

list_of_dotted_pairs = ((EXPR . EXPR)*)
list_of_variables = ( SYMBOL*)
variable = SYMBOL
environment_expression = EXPR

In code, we extend the algebraic type we used last time with the notion of a function

in the form of the COMP type; and modify the value of a SYMBOL to be a string.

Aside: the value of a COMP type is a record, a series of named fields gathered together. In this case there is one field, and that has a value that is a function mapping one SEXPR to another.
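A sketch of the extended type along those lines:

    type SEXPR =
        | NIL
        | NUMBER of int
        | SYMBOL of string                 // the value of a SYMBOL is now a string
        | CONS   of SEXPR * SEXPR
        | COMP   of Computation            // the new notion of a function
    and Computation = { funct : SEXPR -> SEXPR }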

The Structure of the Evaluator

In pseudocode terms, given an expression as above, an interpreter would evaluate it with a recursive function eval that looks like this:

To eval an expression e in an environment (a construct where symbol definitions live) env

  • if e is NIL then NIL
  • else if e is a number n then n
  • else if e is a SYMBOL then look it up in the environment
  • else if e is a special form then evaluate special form
  • else if e is of form (expr1 expr2 ... exprN) then (eval expr1 env) (eval expr2 env) ... (eval exprN env)
  • else ERROR

The Structure of the Environment

We introduced the notion of an environment as where symbol definitions live; we express it as a look-up table from symbol to value like this (where we continue to use the list as our standard functional data-structure):

so that structurally the environment is an SEXPR which itself is a list of pairs of SYMBOL * SEXPR.
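A sketch of a lookup over that representation:

    // walk the list of dotted pairs; return the bound value, or a default when unbound
    let rec lookup name env =
        match env with
        | CONS (CONS (SYMBOL s, value), rest) ->
            if s = name then value else lookup name rest
        | _ -> NIL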

In code, the evaluator looks like this


Note: Unlike our previous examples this doesn't just paste straight into the interpreter and work, as it has dangling references. The full set of new source appears at the end of this post.

The function starts out with the call by value cases from our pseudo-code; their behaviour should be obvious, given a suitable definition of lookup that walks the environment and returns the matching SEXPR or a suitable default if not found.


The Special forms

Our special form IF works like

where we have made the design decision that all values are "truthy" except NIL. We have to match on our predicate because, as an implementation detail, we can't test functions for equality.
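A compressed sketch of the evaluator's shape -- the call-by-value cases, the IF special form just described, and function application -- reusing the SEXPR and lookup sketches above (the other special forms are elided here):

    let rec eval (e : SEXPR) (env : SEXPR) : SEXPR =
        match e with
        | NIL | NUMBER _ | COMP _ -> e
        | SYMBOL s -> lookup s env
        | CONS (SYMBOL "IF", CONS (pred, CONS (thenE, CONS (elseE, NIL)))) ->
            (match eval pred env with
             | NIL -> eval elseE env          // only NIL is false; every other value is "truthy"
             | _   -> eval thenE env)
        | CONS (f, args) ->
            (match eval f env with
             | COMP c -> c.funct (evalList args env)
             | _      -> SYMBOL "ERROR")
    and evalList args env =
        match args with
        | CONS (h, t) -> CONS (eval h env, evalList t env)
        | _           -> NIL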

Our special form LET introduces functions applySnd which takes the list of dotted pairs represented as SEXPR * SEXPR, and recursively evaluates the second of each pair, as in

and APPEND, which takes a list of SEXPR * SEXPR and merges it with the input environment (another list of SEXPR * SEXPR).

Having generated a new environment by merging all the evaluated bindings, i.e. (APPEND pairs e), it then evaluates the body of the expression in that new environment.

Our special form LAMBDA behaves similarly, creating a function that when executed makes local bindings from the formal parameters to the corresponding values (ZIP being the standard operation to map a pair of lists to a list of pairs), and uses that to define a new environment for evaluating the body.

The special form REC returns a self-referencing computation, but is otherwise similar to the LAMBDA form. Note that f is not itself a function; this use emphasises that the F# rec keyword merely brings a variable into scope inside its own definition, so the definition can be recursive.

The special form QUOTE returns the input expression unevaluated.


The "Extras", or "code is data is code"

The evaluator provides access to the environment from within the language; symbol ENV yields the environment as a list of dotted pairs.

The symbol EVAL with one argument evaluates an expression within the current environment for the interpreter, and with two arguments, allows a different environment to be specified. This is how data becomes code -- it is able to cause itself to execute.


Remaining cases -- Functions and errors

Function application is modelled by taking a fully evaluated list, whose first element is expected to be a COMPUTATION and applying it to the remainder of the list of arguments.

In the code

MAP applies the evaluator to each member of the list b to achieve this fully evaluated state; and apply looks like


Built-in functions are represented in this model by providing a standard environment with computations defined.

Anything else that is superficially syntactically valid but matches none of the expected forms results in an error.


Exercises

  • Modify the parser from last time to work with the new definition of SEXPR
  • Add Boolean operators AND, OR, NOT, EQUIV, IMPLIES to evaluator
  • Add built in functions to map between SYMBOLS and List of Characters and back again.
  • Add QUASIQUOTE (QQ), UNQUOTE with the meaning that (QQ (ADD (UNQUOTE X) (UNQUOTE Y) 23)) in an environment where X and Y are 123 and Z respectively evaluates to the S-expression (ADD 123 Z 23)
  • (Advanced): Change the evaluator to deal with interpreted code in the environment e.g. F is bound to (INTERPRET E) in env and (F X) is (eval E env)(eval X env). Alternatively extend the type of SEXPR to include an INTERPRETED constructor and do the same
  • (Advanced) Change the evaluator to allow special forms to be placed in the environment.

Putting it all together

I haven't copied over the parser from last time (that's the first exercise); but all the missing new bits are here.

Wednesday, December 11, 2013

An introduction to Functional Programming in F# -- Part 2: Lexers and Parsers

I hadn't meant this to be a weekly series, but that seems to be the tempo at which I'm transcribing things. So, without further ado, Part Two!

Topics covered in the series

  • Introduction to Functional Programming via circuits
  • Lexing and Parsing (this part)
  • Interpreters
  • Simple ‘Functional’ Distributed/Concurrent Programming
  • Non-Functional bits - Assignment, Exceptions, Objects, Libraries, etc.
  • Advanced stuff - Continuations, Monads, Generative Programming (Language Based programming), etc.

Our example

To illustrate the functional approach to the common problem of interpreting streams of data, this post will cover the lexing (or tokenizing) and parsing (rendering into syntactic form) of a language with a simple grammar. And there are no prizes for guessing that this simple language is LISP, with expressions looking like

  • (mult (add x 5) factor)
  • (bind ((x . 1) (y . 2) (z . 3)) (add x y z))
  • (defun append (x y)
            (if (null x) y
                 (cons (head x) (append (tail x) y))))

Schematically, what we want to build looks like

taking characters to lexemes to syntax tree.


Lexers

The lexer for our simple grammar is a state machine that looks like this.

It operates over a stream of characters, emitting tokens on the transitions marked with "⇒"

To start coding, we assume that the input is supplied to us as a string; and our first requirement is to turn this into a char list, which we can do, following the approach of recursive decomposition as introduced in the previous part:
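A sketch of the first-principles version (the repeated string slicing is the performance point discussed in the digression below):

    let rec toList (s : string) =
        if s = "" then []
        else s.[0] :: toList (s.Substring 1)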

Digression - purity of concept vs pragmatism

We are, in this post, working from first principles throughout, rather than going to library code -- the availability of mature facilities such as fslex/fsyacc or FParsec would reduce this example into just the incantations to initialise those.

This toList implementation is intended as a reminder of the technique used, without diverting into implementation detail. We can see that the repeated creation of temporary strings, for example, would be a potential performance issue in practice. Indeed, what one might write in practice would be more like

but objects and modules come later in the series.


The basic tools


To describe our tokens, we introduce an algebraic type which, as well as being symbolic (as our Bits type before), also includes data
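A sketch of the token type (the constructor names are those that appear later in this post):

    type LEXEME =
        | BRA | KET | DOT                // ( ) .
        | SYMATOM of string
        | NUMATOM of int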


-- this defines our lexemes. Then to interpret the input data, define the list of separators (referred to as stopcases in the state diagram above)


some of which are meaningful tokens in their own right, and digits


We can then define predicates to identify the type of an input character. We build a method to test for membership in a list, and use that to define the tests for each type.

We can also take digit streams and turn them into numeric values
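Sketches of these supporting definitions (names assumed):

    let stopcases = [ '('; ')'; '.'; ' '; '\t'; '\n'; '\r' ]
    let digits    = [ '0' .. '9' ]

    let isMember x xs  = List.exists (fun y -> y = x) xs
    let isStopcase c   = isMember c stopcases
    let isDigit c      = isMember c digits
    let isWhitespace c = isMember c [ ' '; '\t'; '\n'; '\r' ]

    // turn a list of digit characters, most significant first, into an integer
    let toNumber ds =
        List.fold (fun acc d -> 10 * acc + (int d - int '0')) 0 ds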


The component lexers

We can now write the code representing the individual state transitions for symbol and number lexing

which takes a list, and returns the lexeme and the rest of the list. The case for numbers is more complicated, since we have to deal with the case where an embedded letter makes the whole thing a symbol

All that remains is to assemble the pieces:
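A sketch of the component lexers and the assembled lexer (simplified relative to the full state diagram, and with names assumed):

    // consume characters up to the next stopcase, yielding a symbol and the remaining input
    let rec lexSymbol acc input =
        match input with
        | c :: rest when not (isStopcase c) -> lexSymbol (acc @ [ c ]) rest
        | _ -> SYMATOM (System.String (List.toArray acc)), input

    // consume digits; an embedded non-stopcase letter turns the whole thing into a symbol
    let rec lexNumber acc input =
        match input with
        | c :: rest when isDigit c -> lexNumber (acc @ [ c ]) rest
        | c :: _ when not (isStopcase c) -> lexSymbol acc input
        | _ -> NUMATOM (toNumber acc), input

    // the assembled lexer: characters in, lexemes out
    let rec lex input =
        match input with
        | [] -> []
        | '(' :: rest -> BRA :: lex rest
        | ')' :: rest -> KET :: lex rest
        | '.' :: rest -> DOT :: lex rest
        | c :: rest when isWhitespace c -> lex rest
        | c :: _ when isDigit c ->
            let token, rest = lexNumber [] input
            token :: lex rest
        | _ ->
            let token, rest = lexSymbol [] input
            token :: lex rest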


Exercise: Write a lexer for .ini files (see http://en.wikipedia.org/wiki/INI_file).


The component parsers

The syntax we want to develop to distinguish our LISP from arbitrary streams of lexemes looks like this

  • SEXPR = ATOM |
            BRA SEXPR DOT SEXPR KET |
            BRA SEXPR* KET
  • ATOM =  SYMBOL | NUMBER 

The approach we take is to translate the grammar directly into a program by creating elementary parsers for bra, ket, dot, symbols, atoms etc. and combine them by operations of sequencing, alternation and iteration.


In the same way that the lexer worked in terms of a list of lexemes and the input yet to process, we define our parse state as the tuple (remaining input to be parsed, list of results so far); a parsing operation will either transform a parse state to another (moving data from input to output) or explicitly fail

Consider the case (mult 1 2) and assume BRA has been read already. The input parse state looks like ( [SYMATOM mult ; NUMATOM 1; NUMATOM 2; KET] , _)
where we use the _ as a conventional wildcard, and we want our symbol parser to return ( [NUMATOM 1; NUMATOM 2; KET ] , [SYMATOM mult] ) on success.

Rendering this as code

creates a type that captures the idea of a success with some meaningful value or failure with none. Then we define the parser we want to handle the state where we expect the next lexeme to be a symbol as


which we see will do the job (modulo the wrapping in a Return) that we specified above.
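Sketching both the result type and that parser:

    type 'a Return =
        | Return of 'a
        | Fail

    // a parse state is (remaining input, results so far); move one SYMATOM across
    let parseSymbol (input, output) =
        match input with
        | SYMATOM s :: rest -> Return (rest, output @ [ SYMATOM s ])
        | _                 -> Fail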


The parser for a number will be much the same


and for all the simple tokens we can write a general parser


from which we produce the specific parsers
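Sketches (parseToken and the specific parser names are assumed):

    let parseNumber (input, output) =
        match input with
        | NUMATOM n :: rest -> Return (rest, output @ [ NUMATOM n ])
        | _                 -> Fail

    // a general parser for any single, fixed token; structural tokens add nothing to the output
    let parseToken (token : LEXEME) (state : LEXEME list * LEXEME list) =
        let input, output = state
        match input with
        | t :: rest when t = token -> Return (rest, output)
        | _                        -> Fail

    let parseBra = parseToken BRA
    let parseKet = parseToken KET
    let parseDot = parseToken DOT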


Combining the simple parsers

To express the notion that there is more than one valid lexeme at a particular point define

to express the notion of "Try A if it Fails Try B".

Similarly, if we expect two particular results in succession we can define

in words, "Try A if it fails the whole thing fails; if it succeeds then try B on the new parse state; if B fails the whole thing fails."

Then when we expect a number of the same thing in sequence

"Try A to match the longest possible sequence of A’s, including a zero length sequence."


Exercise: Write the operator OPT such that OPT A matches zero or one occurrences of A (note A is a parser!); i.e. A is OPTIONAL.


Exercise: Write the operator PLUS such that PLUS A matches one or more occurrences of A.


Transforms

So far parsers only check the structure of an expression – we also want them to build an abstract structure corresponding to the string, with the following general form.


  • SEXPR = ATOM ⇒ ATOM-ACTION |
            BRA SEXPR DOT SEXPR KET ⇒ PAIR-ACTION |
            BRA SEXPR* KET ⇒ LIST-ACTION 
  • ATOM =  SYMBOL ⇒ SYMBOL-ACTION |
            NUMBER  ⇒ NUMBER-ACTION 

Transforms take a parse state and build a parse state of a different type. Where our old Parse-State was (remaining input to be parsed, list of lexical items so far), we want to construct a new Parse-State of the form (remaining input to be parsed, list of Lisp Trees (i.e. items built from ATOMS, NIL and CONS)).

In code, we define a representation of an S-expression as

and the sort of syntax tree structure we want to build in terms of this type will look like


The general shape of a transform is

which parses input according to f then transforms the parse state according to h.
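The general shape, sketched:

    // parse the input according to f, then rebuild the accumulated results with h
    let TRANSFORM f h state =
        match f state with
        | Fail                      -> Fail
        | Return (remaining, built) -> Return (remaining, h built)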


The grammar we need to make the shape of the figure above is

  • SEXPR = ATOM ⇒ ATOM-ACTION |
            BRA SEXPR DOT SEXPR KET ⇒ CONS-PAIR |
            BRA SEXPR* KET ⇒ CONS-LIST 
  • ATOM =  SYMBOL ⇒ ATOM-ACTION |
            NUMBER  ⇒ ATOM-ACTION 

where the operations are

  • CONSPAIR – produce a singleton list containing a CONS of the items
  • CONSLIST – produce an explicit CONS cell list containing the items
  • ATOMACTION – translate base cases between data types.

or, in code:


The Final Parser

After all the hard work, putting it together is rather anticlimactic. We define a helper

Finally, putting it all into one bundle, of type string -> SEXPR


Exercise: Write a parser for an .ini file

Here's the sort of grammar to use:


  • File = Line*
  • Line = WHITESPACE CMD 'NL'
  • WHITESPACE = ('NL' | 'TAB' | SPACE)*
  • CMD = SectionHeader | Binding | Comment
  • SectionHeader = '[' HeaderString ']'
  • Binding = BinderString '=' BinderString
  • HeaderString = anything but [ or ] or newline
  • BinderString = anything but = or newline | Quote
  • Quote = '"' anything but quotes '"'
  • Comment = ';' ANY 'NL'

Exercise: What real-world performance implications are there in the use of this sort of parser? Re-implement one of the parsers to avoid the limitations of the strictly evaluated list datatype.



Wednesday, December 04, 2013

An introduction to Functional Programming in F# -- Part 1 : Functional Programming via circuits

The other day I came across the notes from a series of informal lunchtime seminars that I organised about five years ago, and thought it would be worth putting them into order and posting them (actually à propos of a recent question about lexing from first principles on the MSDN F# forum, a topic which we will come to in a subsequent post).

Credit where credit is due

My thanks to co-conspirators Will Harwood and Laurence Jordan who did all the heavy lifting to prepare the material for the original set of seminars.

Topics covered in the series

Scope

This series covers functional-programming concepts, as illustrated in F# syntax -- a way for beginners who have started to use the language to avoid just writing C# in F#, in the way old hands are said to be able to write FORTRAN in any language. It is not intended as an introductory guide to the language syntax itself, though I'll put in asides about the notations used where appropriate.

F#'s Background

F# descends from the ML series of languages; the initial ML was originally developed as part of the LCF project (1978) by Milner et al. This split into separate strands (SML, and Cambridge ML, or CAML). CAML development proceeded at INRIA, grew objects in the OCAML variant from the turn of the century, and that in turn provided the basis for F#, which has continued to evolve in what is now more than a decade since Don Syme first brought us that language.

As an ML language, we expect functions to be first-class objects, like any other type (passed as arguments; returned from functions; stored in data-structures; nested inside other functions, where they are lexically scoped, just like any other local variable); and that evaluation is strict (arguments are evaluated before a function call is entered).

Similarly, we expect a static type system that includes polymorphic types, but where type inference will actually lift the burden of pervasive type annotations from the programmer. Most types -- built-in, supplied in libraries or user-defined -- will have immutable value behaviour, though reference types exist that can provide controlled destructive update.

And, while these are standard fare nowadays, we expect the language environment to provide garbage collection and exception handling.

Digital Circuits

Now we actually start to use the language; these examples are intended to execute in the F# interactive console, so are punctuated by the ";;" syntax which is an ML end-of-expression mark, usually optional in project-based code, but needed in the interactive case as a way of signalling that there is now a compilable unit to deal with.

We can use F# to model simple logic gates (1 or 2 signals of either 0 or 1 in, and some Boolean function -- AND, OR, XOR,...).

Here we have defined an algebraic data-type (similar to an enum) and two functions that take a pair of values to return a single result. The functions themselves use match expressions (roughly, what a switch statement wants to be when it grows up) to say "if the input looks like this, then the result must be that".
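A sketch of those definitions (the type is called Bits, as later posts refer to it; the constructor names are assumed):

    type Bits = Zero | One

    let AND = function
        | One, One -> One
        | _        -> Zero

    let XOR = function
        | One, Zero | Zero, One -> One
        | _                     -> Zero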

Exercise: Define the standard Bit functions: OR, NOT, NAND, NOR

Binary Addition

An adder is a circuit that takes two streams of bits and produces a stream of bits as the answer

In standard fashion, we build an adder from two half-adders (which take two inputs and produce the XOR as the sum stream and the AND as the carry stream).

One half-adder takes the two inputs; the sum stream from that is passed into the second half-adder along with the previous carry bit, and the resulting sum forms the final output. Meanwhile the two intermediate carry streams are OR'd together to form the full carry stream.
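A sketch of that construction (with OR as per the exercise above):

    let OR = function
        | Zero, Zero -> Zero
        | _          -> One

    let half_adder (a, b) = XOR (a, b), AND (a, b)            // (sum, carry)

    let full_adder (a, b, carryIn) =
        let s1, c1  = half_adder (a, b)
        let sum, c2 = half_adder (s1, carryIn)
        sum, OR (c1, c2)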

Exercise: Write Full_Adder as a ‘pattern match’ function definition (i.e. substitute and simplify)

Some Basic Tools: Lists and Streams

A well-typed list is a sequence of values where all the values are of the same type, say 'a. This is an 'a list, e.g. Bits list in ML notation (F# allows the more C++-like synonym list<'a> for convenience). A list is either empty or it is a pair composed of a value of type 'a and an 'a list.

We can define functions operating on lists by exploiting the structural recursion -- define a function by what it must do on each clause of the type definition. Example: computing the length of a list as

  • The length of an empty list is 0
  • The length of a non-empty list is 1 + the length of the tail list

Here the rec keyword has the effect of bringing the function's own name into scope inside itself, thus allowing the recursive call.
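Sketched:

    let rec length l =
        match l with
        | []        -> 0                      // the length of an empty list is 0
        | _ :: tail -> 1 + length tail        // one more than the length of the tail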

We can perform more complex operations such as

to create the concatenation of two lists, or a naive reversal of a list.

Observe that in each case, the original lists are left undisturbed (there is no forward pointer pointing towards the head of the list, so in the append case, l2 is not affected by having some other list know about it, whereas the list cells of l1 are copied as they are appended).
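Sketches of the two operations:

    let rec append l1 l2 =
        match l1 with
        | []        -> l2
        | x :: rest -> x :: append rest l2           // cells of l1 are copied; l2 is shared untouched

    let rec reverse l =
        match l with
        | []        -> []
        | x :: rest -> append (reverse rest) [ x ]   // quadratic: an append at every step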

A better implementation of reverse

This is an example of a function, reverse_acc, being defined as a local value inside the enclosing function, just as we might define an integer or string value.
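Sketched:

    let reverse' l =
        // the helper is a local value, visible only inside reverse'
        let rec reverse_acc acc remaining =
            match remaining with
            | []        -> acc
            | x :: rest -> reverse_acc (x :: acc) rest
        reverse_acc [] l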

Higher order functions

These are simply functions that have function arguments, like

And we can generalize twice like


to apply f n times.
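Sketches of both:

    let twice f x = f (f x)

    let rec repeat n f x =
        if n <= 0 then x
        else repeat (n - 1) f (f x)          // apply f n times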

Lists naturally lead us to higher order functions, as that is how we would describe operations that apply some simpler transformation to every element of the list.

where we use the fact that + is just a function with some syntactic sugar that allows us to write it like an infix operator.
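For instance, a sketch:

    let rec map f l =
        match l with
        | []        -> []
        | x :: rest -> f x :: map f rest

    // (+) is just a function, so it can be passed around like any other
    let addOneToAll l = map ((+) 1) l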

Exercise: Define the function zip which, given two lists of the same length, returns a single list of pairs, e.g. zip [1; 2; 3] [zero; one; zero] is [(1, zero); (2, one); (3, zero)]

Exercise: Define itlist which takes a function f, an initial value v, and a non-negative integer n and returns a list of the iterates of f on v, so itlist f v n is [v; f v; f (f v); f (f (f v)); ...; f^n v]

Exercise: Define the function R_reduce f l c which returns f(f(f(f (x1, c), x2), x3), x4) if l is [x1; x2; x3; x4]

All these and more (though not always under the same names) are actually provided in the F# List module; these and the related functions in the other collections modules are your standard tools when working in the language. Not everything is list processing, but a surprising amount can be cast in that form.

The adder revisited

Now we have lists, we can model the way that the adder is supplied by sequences of bits representing a number, not just one at a time:

Numbers may not be same length, so

  • Drop leading zeros
  • Pad with new leading zeros until they are the same length

Numbers are fed to the adder LSB first, so would also need to be reversed from the natural MSB first form before and restored after, giving
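A sketch of that arrangement, reusing full_adder:

    // add two equal-length bit lists, least significant bit first
    let rec addBits carry (xs, ys) =
        match xs, ys with
        | [], [] -> if carry = One then [ One ] else []
        | x :: xt, y :: yt ->
            let sum, carryOut = full_adder (x, y, carry)
            sum :: addBits carryOut (xt, yt)
        | _ -> failwith "inputs must be padded to the same length"

    // pad the shorter number with leading zeros (MSB first)
    let rec pad n l = if n <= 0 then l else Zero :: pad (n - 1) l
    let normalise a b =
        let la, lb = List.length a, List.length b
        pad (lb - la) a, pad (la - lb) b

    // MSB-first inputs: normalise, reverse into the adder, reverse the result back
    let add a b =
        let a', b' = normalise a b
        addBits Zero (List.rev a', List.rev b') |> List.rev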

Generalising: Synchronous State Machine

The adder we saw above is a special case of a general form

which we can represent in code as

and in this form, the LSB-first implementation of the adder looks like
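A sketch of the general form, and of the adder as an instance of it:

    // a synchronous state machine: thread a step function over the inputs, emitting an
    // output for each input and carrying the state forward
    let rec machine step state inputs =
        match inputs with
        | []        -> []
        | x :: rest ->
            let newState, output = step (state, x)
            output :: machine step newState rest

    // the LSB-first adder as an instance: the state is the carry bit
    let adderStep (carry, (a, b)) =
        let sum, carryOut = full_adder (a, b, carry)
        carryOut, sum

    let addLsbFirst pairs = machine adderStep Zero pairs
    // e.g. addLsbFirst [ (One, One); (One, Zero); (Zero, Zero) ]   -- 3 + 1, LSB first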