## Monday, August 20, 2007

Quicksort in Haskell Quicksort is a commonly used example of how succinct Haskell programming can be. It usually looks something likes this:
```qsort :: (Ord a Bool) => [a] -> [a]
qsort [] = []
qsort (x:xs) = qsort (filter (<= x) xs) ++ [x] ++ qsort (filter (> x) xs)
```
The problem with this function is that it's not really Quicksort. Viewed through sufficently blurry glasses (or high abstraction altitude) it's looks like the real Quicksort. What they have in common is overall algorithm: pick a pivot (always the first element), then recursively sort the ones that are smaller, the ones that are bigger, and then stick it all together. But in my opinion the real Quicksort has to be imperative because it relies on destructive update; and it uses a very elegant partitioning algorithm. The partitioning works like this: scan from the left for an element bigger than the pivot, then scan from the right for an element smaller than the pivot, and then swap them. Repeat this until the array has been partitioned. Haskell has a variety of array types with destructive updates (in different monads), so it's perfectly possible to write the imperative Quicksort in Haskell. To make the code look reasonably nice I'm going to use my C-like DSEL to write the code. Here it is:
```qsortM :: forall i a m r arr .
(MonadRef m r, Num i, Ix i, MArray arr a m, Ord a Bool) =>
arr i a -> m (arr i a)
qsortM ma = runE \$ do
(lb, ub) <- embed \$ getBounds ma
let mlb = pure0 lb
mub = pure0 ub
a <- liftArray ma

let qsort' l r =
if1 (r > l) \$ do
i <- auto l
j <- auto (r+1)
let v = a[l] :: E m a
iLTj = i < (j :: E m i)
while iLTj \$ do
while ((i += 1) < mub && a[i] < v)
skip
while (a[j -= 1] > v)
skip
if1 iLTj \$ do
a[i] =:= a[j]
a[l] =:= a[j]
qsort' l (j-1)
qsort' i r

qsort' mlb mub
return ma
```
```qsortA :: forall i a . (Num i, Ix i, Ord a Bool) => Array i a -> Array i a
qsortA a = runSTArray sa
where sa :: ST s (STArray s i a)
sa = thaw a >>= qsortM
```
We're using runSTArray instead of runST, because it provides an efficient way to turn a mutable array on the inside into an immutable array on the outside. The thaw function turns an immutable array into a mutable one, but it has to make a copy to be safe, since we don't want to mutate the original. Finally, if we want to sort lists we can always convert back and forth.
```asList :: (Array Int a -> Array Int a) -> ([a] -> [a])
asList f xs = elems . f . listArray (1, length xs) \$ xs

qsort :: (Prelude.Ord a) => [a] -> [a]
qsort = asList qsortA
```
The final qsort has the normal type signature (I switched to the normal Ord again).

Labels:

## Thursday, August 16, 2007

What about arrays? After doing my little C-like DSEL in Haskell I started wondering if you could do arrays that look like C arrays too. It turns out you can, but I can't figure out a way to make it safe from certain runtime errors. Array indexing in C is written a[i], and this is legal syntax in Haskell too, so we're off to a good start. But to make it work we need a to be a function that takes a list and returns something. What should it return? It must be something that is both an l-value and an r-value. Just as I had a function auto to create a local variable, I'll have a function arr to make a local array. It will take the size of the array and then return a function that takes the index. For simplicity I'll make the index be an Int.
```arr :: forall a . [E Int] -> E (forall v . [E Int] -> E' v a)
```
So now this type checks
```atest = do {
a <- arr[2];
a[1] =: a[0] + 1;
}
```
Now we just need the body of arr.
```arr [s] = do
s' <- runE s
a <- newArray (0, s' - 1) undefined :: IO (IOArray Int a)
let ix [i] = runE i
return (\ is -> V (ix is >>= readArray a)
(\ x -> ix is >>= \ i -> writeArray a i x))
```
The arr function takes a list with one element, the size, and allocates an array (indexed from 0) with this size. It then returns a function that expects a singleton list with an index and returns the V constructor which I used for variables. For multidimensional arrays we can extend the arr function.
```arr ss = do
ss' <- mapM runE ss
let sz = product ss'
ix is = do
is' <- mapM runE is
when (length is' /= length ss') \$ error "wrong number of indicies"
return \$ foldr (\ (i, s) r -> r * s + i) 0 (zip is' ss')
a <- newArray (0, product ss' - 1) undefined :: IO (IOArray Int a)
return (\ is -> V (ix is >>= readArray a) (\ x -> ix is >>= \ i -> writeArray a i x))
```
The problem with both these definitions is that the number of indicies is checked dynamically rather than statically. To do it statically we'd have to be able to overload the syntax for list literals to use a type that keeps track of the number of elements.

Labels:

## Monday, August 13, 2007

Programming in C, ummm, Haskell Here's a small Haskell program for computing factorial.
```fac n = do {
a <- auto 1;
i <- auto n;
while (i >. 0) \$ do {
a *= i;
i -= 1;
};
a;
}
```
It even runs, runE (fac 10) produces 3628800. Let's compare that to the C program doing the same thing.
```int
fac(int n) {
auto a = 1;
auto i = n;
while (i > 0) {
a *= i;
i -= 1;
}
return a;
}
```
They look rather similar, don't they? (BTW, I decided to use the heavily underused C keyword auto. If you don't know C, don't worry, auto doesn't mean anything. A local declaration auto int i; is the same as int i; in C.) I often hear people complain that C-like programming in Haskell is ugly and verbose, but I don't think that has to be the case. What people complain about is when they write code like this:
```fac' n = do
a <- newIORef 1
i <- newIORef n
whileM (do i' <- readIORef i; return \$ i' > 0) \$ do
writeIORef a (a' * i')
writeIORef i (i' + 1)
return a'
```
And I agree, it's ugly, it's verbose, but it's not necessary. One of the reasons it's verbose are the uses of readIORef and writeIORef. In C you just refer to the variable and you'll get an r-value when you need it, and you'll get an l-value when you need it. And you can do the same in Haskell. Let's look at a tiny version of the problem. We want the following operations:
```type V a ...                     -- variables of type a
type E a ...                     -- expressions of type a
(=:) :: V a -> E a -> IO ()      -- assignment
plus :: E Int -> E Int -> E Int  -- addition
one :: E Int                     -- constant 1
```
Here's a tiny sample:
```  x <- auto one
x =: x `plus` x

-- the following should be a type error
(x `plus` x) =: one
```
How can we make this happen? We want x to be able to be the left hand side of an assignment as well as an expression on the right hand side. But the assignment operator has different types on the left and right. So we could possibly make V the same as E. But this would be bad, because we want the left hand side to be a variable and now we'd have no way of knowing, which means that it would no longer be a type error to assign to a non-variable. So, we need a little bit of type trickery. First, we'll make V and E "subtypes" of the same type, so they have something in common.
```data E' v a = ...
data LValue
data RValue
type V a = E' LValue a
type E a = E' RValue a
```
The type E' has an additional type argument that determines if the value is an l-value or an r-value. So what about auto, what type should it have? It needs to return something that is both an l-value and an r-value. So we'll make it polymorphic! First attempt:
```auto :: E a -> IO (E' v a)
```
But this won't work. Saying "x <- auto 2; ..." is the same as "auto 2 >>= \ x -> ...". And when a variable is lambda bound it is not polymorphic. What does this mean? It means that we can use x as either an l-value or as an r-value inside ..., but not both; the v type variable can only have a single type. Are we stuck? Well, we would have been without higher ranked polymorphism. But here's a type that works:
```auto :: E a -> IO (forall v . E' v a)
```
So now x will have type forall v . E' v a which is indeed polymorphic in v just as we wanted. OK, so now we have some type signatures that work (using stub implementations it's easy to check that the types work out). So now we need to implement the types and the operations. Let's start with r-values. To make a constructor that only can construct r-values we'll use a GADT. The E type will simply embed an IO value. So when we see the type E' RValue a it's really isomorphic to IO a. To extract the IO value we have the function runE.
```data E' v a where
E :: IO a -> E' RValue a

runE :: E' v a -> IO a
runE (E t) = t
```
So now we can do plus and one; they are totally straightforward except that we need to unwrap and wrap the E constructor.
```plus :: E Int -> E Int -> E Int
plus x y = E \$ do
x' <- runE x
y' <- runE y
return \$ x' + y'

one :: E Int
one = E \$ return 1
```
And then the hard part, variables. To represent a variable we'll use two fields, one that reads the variable and one that assigns the variable. We need to extend the runE function to use the variable read field to get the value. The auto function allocates a new variable and then packages up the two fields.
```data E' v a where
E :: IO a -> E' RValue a
V :: IO a -> (a -> IO ()) -> E' v a

runE :: E' v a -> IO a
runE (E t) = t
runE (V t _) = t

auto :: E a -> IO (forall v . E' v a)
auto x = do
x' <- runE x
r  <- newIORef x'
return (V (readIORef r) (writeIORef r))
```
We're about done, just assignment left. It's easy, just use the assignment field from the V constructor.
```(=:) :: V a -> E a -> IO ()
(V _ asg) =: e = do
e' <- runE e
asg e'
```
Hmmm, but there's a missing case in that definition. What about the constructor E? Can't we get a runtime failure? No. Look at the type of the first argument, it's V a, i.e., E' LValue a. The E constructor can only construct values of type E' RValue a, so we just can't get a match failure. Now our little example above compiles, and trying the bad expression (x `plus` x) =: one gives this error:
```    Couldn't match expected type `LValue'
against inferred type `RValue'
Expected type: V a
Inferred type: E Int
```
Once we've made the E data type it's totally routine to make the factorial function I first showed work. We just need to create an instance for Num (E a) and a few more functions. Is this the end of the story? Is everything rosy? Is C programming in Haskell as smooth as that? Well, there are some flies in the ointment. While it's true that programming with the E type is easy, it's a little cumbersome if we want to use pure functions. Pute functions will not have the E type and will have to be lifted into this new brave world. Likewise, IO types will have to be lifted. So while we have reduced some verbosity, it has popped up again in a different place. Which is better? I don't know, but I thought I'd share the l-value/r-value trick. Oh, and (of course), there's nothing special about the IO monad. I just used it as an example; any monad with references will work. You can parametrize E' over the underlying monad.

Labels: