## Lost and found

If I write`10^8`in Haskell, how many multiplications will be used to compute the power? A stupid question? Well, for this example, but if I was computing

`x^8`and x has 100000 digits then I'd care. So how can I find out? I can look at the definition of the exponentiation operator. Here it is, from the Haskell report and GHC 6.8:

(^) :: (Num a, Integral b) => a -> b -> a _ ^ 0 = 1 x ^ n | n > 0 = f x (n-1) x where f _ 0 y = y f a d y = g a d where g b i | even i = g (b*b) (i `quot` 2) | otherwise = f b (i-1) (b*y) _ ^ _ = error "Prelude.^: negative exponent"It's a bit involved, but decipherable. Another way would be to insert some kind of debug trace message in the multiplication.

### Traced values

I'd like to show a different way. Here's a ghci session:Prelude> :m +Debug.Traced Prelude Debug.Traced> let x = 10 :: Traced AsValue Integer Prelude Debug.Traced> let y = x ^ 8 Prelude Debug.Traced> y 100000000 Prelude Debug.Traced> :t y y :: Traced AsValue Integer Prelude Debug.Traced> asExp y 10 * 10 * (10 * 10) * (10 * 10 * (10 * 10)) Prelude Debug.Traced> asSharedExp y let _2 = 10 * 10; in _2 * _2 * (_2 * (10 * 10)) Prelude Debug.Traced> :t asSharedExp y asSharedExp y :: Traced AsExp Integer Prelude Debug.Traced>So what's going on? The value of x is

`Traced Integer`, which means that there's some magic going on. The variable can be used as usual, for instance in computing

`x^8`. A traced value can also be shown as an expression, which is what

`showAsExp`does. So a traced value is somewhat like the symbolic values I had in an earlier post, but in addition to having a symbolic representation they also have a normal value. But the output from

`showAsExp`doesn't really help in answering how many multiplications there are, since the shown expression has no sharing; it is totally flattened. The

`showAsShared`function is the black magic here, it recovers the sharing, and we can see what happened. What we see is that there are actually five (5) multiplications involved in computing 10^8. This shows that the definition of exponentiation is suboptimal, since it can be done with three multiplications (three repeated squarings). The

`showAsShared`really does have some

**black**magic. It recovers information that is not part of the Haskell semantics, so from that we can conclude that it must contain the powerful incantation

`unsafePerformIO`somewhere. How does it reveal implementation secrets? Look at this:

Prelude Debug.Traced> asSharedExp $ let a = 1 + 2 in a * a let _1 = 1 + 2; in _1 * _1 Prelude Debug.Traced> asSharedExp $ (1+2) * (1+2) (1 + 2) * (1 + 2) Prelude Debug.Traced>The let expression and the expression where the variable has been expanded are semantically equal in Haskell, so no (pure) function can possibly be able to give different results for them.

OK, so how does it work? I'll show a simplified version of the Traced module here that only deals with one traced type, but it can be extended. The (soon to be available) hackage package contains the extended version.

In the `Traced` type we need to represent expressions. We only need constants and function applications.

data Traced a = Con a | Apply a String [Traced a]The function application contains the value, the name of the function, and the arguments the function was applied to.

In the exported interface from the module we want to be able to convert to and from the Traced type. Nothing exciting here.

traced :: a -> Traced a traced = Con unTraced :: Traced a -> a unTraced (Con x) = x unTraced (Apply x _ _) = xWe want to show a traced value the same way we show the underlying value.

instance (Show a) => Show (Traced a) where show = show . unTracedComparing for equality, we simply compare the underlying values.

instance (Eq a) => Eq (Traced a) where x == y = unTraced x == unTraced yAnd we'll make traced numbers be an instance of

`Num`. All the functions (except

`fromInteger`) build apply nodes.

instance (Num a) => Num (Traced a) where (+) = apply2 "+" (+) (-) = apply2 "-" (-) (*) = apply2 "*" (*) negate = apply1 "-" negate abs = apply1 "abs" abs signum = apply1 "signum" signum fromInteger = traced . fromInteger apply1 s f x = Apply (f (unTraced x)) s [x] apply2 s f x y = Apply (f (unTraced x) (unTraced y)) s [x, y]A fancier version of this module could make the Traced type an applicative functor etc., but that's not really so important.

Finally, we want to be able to show a traced expression as an expression tree instead of a value.

Traced t -> String showAsExp (Con x) = show x showAsExp (Apply _ s [x,y]) | not (isAlpha (head s)) = "(" ++ showAsExp x ++ " " ++ s ++ " " ++ showAsExp y ++ ")" showAsExp (Apply _ s xs) = "(" ++ concat (intersperse " " $ s : map showAsExp xs) ++ ")"We only export what is necessary, so the module header should be

module Traced(Traced, traced, unTraced, showAsExp) whereA quick test:

Traced> putStrLn $ showAsExp $ 10^8 (((10 * 10) * (10 * 10)) * ((10 * 10) * (10 * 10))) Traced>And now we need the black magic to recover the sharing. We would like to have a unique label in each node of the expression tree. If we only had that we could see when two things referred to the same subexpression, and use the label to refer to it instead of the value. If we were doing this in,

*e.g.*., Java we could use object identity for this purpose. If we were doing it in C, we'd just compare pointers to the structs containing the expressions. But we're doing it in Haskell and none of this is available. It's not unavailable because Haskell wants to make our lives difficult, quite the contrary. Languages that allow pointer comparisons (object identity) must introduce an extra level of indirection in the semantics to explain how this is possible. So now it's not enough to know we have the number 5, we need to know that this is the number 5 at location 1000. And that's not the same as the number 5 at location 1010. The numbers contained in the locations might be the same, but the locations are not interchangable since they could,

*e.g.*, be mutated differently in the future.

So is everything lost in Haskell? Not at all. GHC implements a library of stable names, which is (at first approximation) the same as the address of something in memory.
The API to `System.Mem.StableName` is very simple.

data StableName a makeStableName :: a -> IO (StableName a) hashStableName :: StableName a -> IntThe

`makeStableName`function is like the & operator (address of) in C. It returns the "address" of something. So

`StableName`is like a C pointer type. And the

`hashStableName`function converts the "address" to an int,

*i.e.*, like casting it to int in C. (In the simplified code below we'll assume that two stable names never hash to the same Int, although this is not absolutely guaranteed.)

How come this interface is OK? For instance, calling `makeStableName` on semantically equal values can yield different results if the values happen to be stored in different parts of memory. It's ok, because the returned value is in the IO monad. In the IO monad anything can happen, so it's perfectly reasonable that the same argument yields different results in different calls.

Despite the name and the documentation the GHC stable names have a stability flaw. The stable name changes when an unevaluated expression is evaluated. It's annoying, but not a major flaw. But once evaluated the stable names is guaranteed to remain the same. (The implementation of stable names is not as simple as taking the address of an object, since the GC can move objects around.)

So now we have a way to get the identity of each node in the expression tree built by the Traced type. The plan is to traverse the expression tree. For each node we'll use the "address" of it as it's name, and remember that we've seen this node. As we traverse the tree we build a list of nodes we've seen. This list then corresponds to let bindings we'd like to display as the final result. As we traverse the nodes we'll also replace each node with a reference to its name instead, so we can see the sharing in the result.

To be able to represent the expression with rediscovered sharing we need to extend the Traced type. We need variable references and let bindings. In fact, we'll only generate a top level let binding, but we include it in the data type anyway.

data Traced a ... | Var a Name | Let [(Name, Traced a)] (Traced a) type Name = StringWe store the value in the

`Var`constructor to make

`unTraced`possible. New cases:

... unTraced (Var x _) = x unTraced (Let _ e) = unTraced eAnd we want to show the new constructors:

... showAsExp (Var _ n) = n showAsExp (Let bs e) = "let " ++ concatMap bind bs ++ "in " ++ showAsExp e where bind (n, e) = n ++ " = " ++ showAsExp e ++ "; "To rediscover the sharing we need to keep some state. We need a mapping from the node address to the

`Var`that should replace it, and we need to accumulate the bindings,

*i.e.*, the pairs of node name and expression. So we need a state monad to keep track of the state. We also need to be able to call

`makeStableName`in the IO monad, so we need IO as well. We'll do this by using the state transformer monad on top of IO. So the function that discovers sharing will take a traced value and return a new traced value, all in this monad. So we have the type:

type TState = (M.IntMap (Traced a), [(Name, Traced a)]) share :: Traced a -> StateT TState IO (Traced a)Assuming some imports

import Control.Monad.State import qualified Data.IntMap as M import System.Mem.StableNameNow the body:

share e@(Con _) = return e share e@(Apply v s xs) = do (sm, bs) <- get sn <- liftIO $ makeStableName e let h = hashStableName sn case M.lookup h sm of Just ie -> return ie Nothing -> do let n = "_" ++ show h ie = Var v n put (M.insert h ie sm, bs) xs' <- mapM share xs modify $ \ (sm', bs') -> (sm', (n, Apply v s xs') : bs') return ieConstants are easy, we don't bother sharing them (we could, but it's not that interesting). For an apply node, we get its stable name (it's already evaluated, so it won't change) and hash it. We then grab the map (it's an IntMap; a fast map from Int to anything) from the state and look up the node. If the node is found we just return the expression from the map. If it's not in the map, we invent a name (using the hash value) and insert a

`Var`node in the map so we'll never process this node again. We then recursively traverse all the children of the apply node, and rebuild a new apply node with those new children. This constitutes a binding and we stick it in the accumulated list of bindings. Finally we return the new

`Var`node.

At the top level we need to call `share` and then build a let expression. The bindings are put in the list with the top node first in the list, so it looks nicer to reverse it.

shareIO :: Traced a -> IO (Traced a) shareIO e = do (v, (_, bs)) <- runStateT (share e) (M.empty, []) return $ Let (reverse bs) vWe could leave it there, but to make it more convenient to use, we'll do an

`unsafePerformIO`to hide the use of IO. This is not unsafe in the sense that it will make our program crash, but it is unsafe in the sense that it ruins the Haskell semantics. But since this whole exercise is to make a debugging/tracing tool this is a legitimate use, in my opinion.

reShare :: Traced a -> Traced a reShare = unsafePerformIO . shareIO showShared :: (Show a) => Traced a -> String showShared = showAsExp . reShareAnd let's test it:

Traced> putStrLn $ showShared $ 10^8 let _5 = (10 * 10); _7 = (_5 * _5); _11 = (10 * 10); _1 = (_5 * _11); _8 = (_7 * _1); in _8 Traced>In my first example it looked a little prettier since the full implementation only shows bindings for nodes that are actually shared. This is a simple transformation to add. But looking at this, we can still see that there are five multiplications.

### Hacking with types

It's a little annoying that we have both`show`and

`showAsExp`to show traced values. It would be nicer if we could always use

`show`and make the type determine how it is showed. We could invent a

`newtype`to wrap around

`Traced`and print the new type in a different way. But then this new type would not be compatible with the old one, which is a little annoying.

So we're going down a different road instead, we'll have a the `Traced` type have two parameters. The first one being a phantom type; just being used to determine how to show the traced value. We will rename the old type to `TracedT`, and it will only be used internally in the module.

newtype Traced t a = T { unT :: TracedT a } data TracedT a = Con a | Apply a String [TracedT a] | Var a Name | Let [(Name, TracedT a)] (TracedT a)Some minor changes are needed to the code to accomodate for this change.

traced :: a -> Traced t a traced = T . Con unTraced :: Traced t a -> a unTraced = unTracedT . unT unTracedT :: TracedT a -> a ... apply1 s f x = T $ Apply (f (unTraced x)) s [unT x] apply2 s f x y = T $ Apply (f (unTraced x) (unTraced y)) s [unT x, unT y]And now for the fun part, the

`show`function. We could do something like this:

data AsValue data AsExp instance Show (Traced AsValue a) ... instance Show (Traced AsExp a) ...This has problems, first it's not Haskell-98, but more importantly we can't print something of type

`Traced t a`, because t is too general. We'd like to print as value by default, and be able to override this. There's only one defaulting mechanism in Haskell: the numeric defaulting. So as a somewhat disgusting hack, let's use the numeric defaulting to our advantage. We'll print

`Traced Integer a`as values and

`Traced Double a`as expressions.

instance (Num t, Show a) => Show (Traced t a) where show e = if doExp e then showAsExp (unT e) else show (unT e) doExp :: (Num t) => Traced t a -> Bool doExp x = '.' `elem` show (f x) where f :: (Num t) => Traced t a -> t f _ = 0We distinguish between Integer and Double by how 0 is printed; for Double it has a '.' in the string.

A small utility to force printing as an expression.

asExp :: Traced t a -> Traced Double a asExp = T . unTLet's try it in ghci:

Traced> reShare $ 10^8 100000000 Traced> asExp $ reShare $ 10^8 let _8 = (10 * 10); _9 = (_8 * _8); _5 = (10 * 10); _7 = (_8 * _5); _10 = (_9 * _7); in _10 Traced> reShare $ (asExp 10)^8 let _7 = (10 * 10); _8 = (_7 * _7); _1 = (10 * 10); _5 = (_7 * _1); _9 = (_8 * _5); in _9 Traced>In the first expression the numeric default made t be Integer, so we got a value. In the second and third case we forced t to be Double.

Some other examples:

Traced> let fac n = if n == 0 then 1 else n * fac(n-1) Traced> asExp $ reShare $ fac 3 let _19 = (3 - 1); _24 = (_19 - 1); _18 = (_24 * 1); _20 = (_19 * _18); _21 = (3 * _20); in _21 Traced> let slowFib n = if n < 2 then 1 else slowFib(n-1) + slowFib(n-2) Traced> asExp $ reShare $ slowFib 5 let _18 = (1 + 1); _19 = (_18 + 1); _23 = (1 + 1); _20 = (_19 + _23); _25 = (1 + 1); _17 = (_25 + 1); _21 = (_20 + _17); in _21 Traced> let fastFib n = fst $ iterate (\ (x,y) -> (y,x+y)) (1,1) !! n Traced> asExp $ reShare $ fastFib 5 let _20 = (1 + 1); _21 = (1 + _20); _19 = (_20 + _21); _23 = (_21 + _19); in _23 Traced>Well, that's enough abuse of the type system for one day.

### More examples

The full library for traced values contains some more functionality, like naming values and functions that operate on traced booleans.A named value, and a symbolic named value:

Prelude Debug.Traced> asSharedExp $ (named "x" 10)^8 let x = 10; _2 = x * x; in _2 * _2 * (_2 * (x * x)) Prelude Debug.Traced> asSharedExp $ (unknown "x")^8 let _2 = x * x; in _2 * _2 * (_2 * (x * x))For a normal definition of

`fac`we only see the arithmetic:

Prelude Debug.Traced> let fac n = if n == 0 then 1 else n * fac (n-1) Prelude Debug.Traced> fac 5 120 Prelude Debug.Traced> asSharedExp $ fac 5 let _2 = 5 - 1; _4 = _2 - 1; _6 = _4 - 1; in 5 * (_2 * (_4 * (_6 * ((_6 - 1) * 1))))By using traced booleans we can see exactly what's going on:

Prelude Debug.Traced> let facT n = ifT (n %== 0) 1 (n * facT (n-1)) Prelude Debug.Traced> facT 5 120 Prelude Debug.Traced> asSharedExp $ facT 5 let _6 = 5 - 1; _11 = _6 - 1; _16 = _11 - 1; _21 = _16 - 1; in ifT (5 == 0) ... (5 * ifT (_6 == 0) ... (_6 * ifT (_11 == 0) ... (_11 * ifT (_16 == 0) ... (_16 * ifT (_21 == 0) ... (_21 * ifT (_21 - 1 == 0) 1 ...)))))We can also lift a regular function to a traced function:

Prelude Debug.Traced> let fac' = liftFun "fac'" fac :: Traced t Integer -> Traced t Integer Prelude Debug.Traced> let a = fac' 5 + fac' 10 Prelude Debug.Traced> a 3628920 Prelude Debug.Traced> asSharedExp a fac' 5 + fac' 10 Prelude Debug.Traced>

### And what about exponentiation?

The new exponentiation function in GHC looks like this:(^) :: (Num a, Integral b) => a -> b -> a x0 ^ y0 | y0 < 0 = error "Negative exponent" | y0 == 0 = 1 | otherwise = f x0 y0 where f x y | even y = f (x * x) (y `quot` 2) | y == 1 = x | otherwise = g (x * x) ((y - 1) `quot` 2) x g x y z | even y = g (x * x) (y `quot` 2) z | y == 1 = x * z | otherwise = g (x * x) ((y - 1) `quot` 2) (x * z)And if we try this definition we get

Debug.Traced> asSharedExp $ 10 ^ 8 let _2 = 10 * 10; _1 = _2 * _2; in _1 * _1 Debug.Traced>Victory!

Edit: Package available in hackage.

Labels: Haskell