Things that amuse me

Wednesday, December 10, 2008

A somewhat failed adventure in Haskell abstraction

I usually blog about weird and wonderful things you can do in Haskell. Today I'm going to talk about something very plain and not wonderful at all.

If you want to try out the code below, use these Haskell extensions:

{-# LANGUAGE TypeFamilies, MultiParamTypeClasses, OverloadedStrings,
   FlexibleInstances, TypeSynonymInstances, ScopedTypeVariables,
   FunctionalDependencies, RecordWildCards, FlexibleContexts,
   GeneralizedNewtypeDeriving #-}

The simple problem

We want to define a type for a person which has a few fields and operations. Like this

module Person(Person(..), display) where

data Person = Person {
   firstName :: String,
   lastName  :: String,
   height    :: Double
   }

display :: Person -> String
display p = firstName p ++ " " ++ lastName p ++ " " ++ show (height p + 1)

Very simple. To use it we can just import the module and the write something like

 print $ display $ Person { firstName = "Manuel", lastName = "Peyton Jones", height = 255 }

But being efficiancy conscious I'm not happy with using String and Double. I'd like to experiment with different types for these. Maybe I should use ByteString and Int instead?

Simple enough, let's abstract out the types and operations into a different module.

module Ops(XString, XDouble, (+++), xshow) where
import Data.String
newtype XString = XString String deriving (Eq, Show, IsString)
newtype XDouble = XDouble Double deriving (Eq, Show, Num)

(+++) :: XString -> XString -> XString
XString x +++ XString y = XString (x ++ y)

xshow :: XDouble -> XString
xshow (XDouble x) = XString (show x)

module Person(Person(..), display) where
import Ops

data Person = Person {
   firstName :: XString,
   lastName  :: XString,
   height    :: XDouble
   }

display :: Person -> XString
display p = firstName p +++ " " +++ lastName p +++ " " +++ show (height p + 1)

There, problems solved. By changing the import in the Person module you can try out different types for XString and XDouble.

No, this is not problem solved. To try out different implementations I need to edit the Person module. That's not abstraction, that's obstruction. It should be possible to write the code for the Person module once and for all once you decided to abstract, and then never change it again.

I also didn't really want to necessarily have newtype in my module. Maybe I'd want this:

module Ops(XString, XDouble, (+++), xshow) where
type XString = String
type XDouble = Double

(+++) :: XString -> XString -> XString
(+++) = (++)

xshow :: XDouble -> XString
xshow = show

You can define Ops that way, but then the implementation of Ops may leak into the Person module. What you really want is to type check Person against the signature of Ops, like

interface Ops where
type XString
type XDouble
(+++) :: XString -> XString -> XString
xshow :: XDouble -> XString

And later supply the actual implementation. Alas, Haskell doesn't allow this.

In ML (SML or O'Caml) this would be solved by using a functor. The Person module would be a functor that takes the Ops module as an argument and yields a new module. And then you can just plug and play with different Ops implementations. This is where ML shines and Haskell sucks.

Type classes

But the defenders of Haskell superiority say, Haskell has type classes, that's the way to abstract! So let's make Ops into a type class. Let's do old style with multiple parameters first. Since Ops defines two types it will correspond to having two type parameters to the class.

class (IsString xstring, Num xdouble) => Ops xstring xdouble where
   (+++) :: xstring -> xstring -> xstring
   xshow :: xdouble -> xshow

Ok, so how do we have to rewrite the Person module?

data Person xstring xdouble = Person {
   firstName :: xstring,
   lastName  :: xstring,
   height    :: xdouble
   }

display :: (Ops xstring xdouble) => Person xstring xdouble -> xstring
display p = firstName p +++ " " +++ lastName p +++ " " +++ xshow (height p + 1)

An implementation is provided by an instance declaration:

instance Ops String Double where
   (+++) = (++)
   xshow = show

We see the major flaw in this approch at once. The Person data type now has two parameters. This might be bearable, but imagine a more complicated example where Ops contains 15 types. And every time you add a field with a new type to Person you have to update every single place in the program that mentions the Person type. That's not abstraction.

But in fact, it's even worse than that. The definition of display might look plausible, but it's full of ambiguities. Compiling it gives lots of errors of this kind:

   Could not deduce (Ops xstring xdouble)
     from the context (Ops xstring xdouble4)

Well, we can remove the type signature and let GHC figure it out. The we get this

display :: (Ops xstring xdouble,
           Ops xstring xdouble3,
           Ops xstring xdouble2,
           Ops xstring xdouble1,
           Ops xstring xdouble4) =>
          Person xstring xdouble4 -> xstring

And this function can, of course, never be used because most of the type variables do not occur outside the context so they will never be determined. I don't even know how to put explicit types in the function to make it work.

Well, it's common knowledge that multi-parameter type classes without functional dependencies is asking for trouble. So can we add some functional dependencies? Sure, if we use

class (IsString xstring, Num xdouble) => Ops xstring xdouble | xstring -> xdouble where

then things work beautifully. Until we decide that another instance that would be interesting to try is

instance Ops String Int

which is not valid with the FD present.

So we can't have functional dependencies if we want to have flexibilty with the instances. So what is it that goes wrong without the FDs? It's that all the uses (+++) and xshow are not tied together, they could potentially have different types. Let's try and be sneaky and tie them together:

display :: (Ops xstring xdouble) => Person xstring xdouble -> xstring
display p =
   let (++++) = (+++); xxshow = xshow
   in  firstName p ++++ " " ++++ lastName p ++++ " " ++++ xxshow (height p + 1)

This only generates one error message, because there's still nothing that says the the two operations come from the same instance. We need to make the tie even closer.

class (IsString xstring, Num xdouble) => Ops xstring xdouble where
   ops :: (xstring -> xstring -> xstring, xdouble -> xstring)
instance Ops String Double where
   ops = ((++), show)

display :: (Ops xstring xdouble) => Person xstring xdouble -> xstring
display p =
   let ((+++), xshow) = ops
   in  firstName p +++ " " +++ lastName p +++ " " +++ xshow (height p + 1)

This actually works! We can make it neater looking.

class (IsString xstring, Num xdouble) => Ops xstring xdouble where
   ops :: DOps xstring xdouble

data DOps xstring xdouble = DOps {
   (+++) :: xstring -> xstring -> xstring,
   xshow :: xdouble -> xstring
   }

instance Ops String Double where
   ops = DOps (++) show

display :: (Ops xstring xdouble) => Person xstring xdouble -> xstring
display p =
   let DOps{..} = ops
   in  firstName p +++ " " +++ lastName p +++ " " +++ xshow (height p + 1)

We have basically packaged up the dictionary and unpack it ourselves to get access to the operations. It's not pleasent, but it works.

But as I already said, the multiparameter type class version isn't really a good solution to the problem even if it works; it introduces too many parameters to the Person record.

Associated types

The new and shiny way of doing type classes is to use associated types instead of FDs. So let's give that a try. So what should the associated types be in the class. The associated type is supposed to be the one that can be computed from the main one. But we have two types that are on equal footing, so there is no main one. We can remedy that by introducing an artificial third type that is the main one, it can then determine the other two.

class (IsString (XString t), Num (XDouble t)) => Ops t where
   type XString t :: *
   type XDouble t :: *
   (+++) :: XString t -> XString t -> XString t
   xshow :: XDouble t -> XString t

data Person t = Person {
   firstName :: XString t,
   lastName  :: XString t,
   height    :: XDouble t
   }

That looks pretty neat. Note how the Person record has one parameter and no matter how many new associated type we add it will still only have one parameter. One parameter is reasonable, the Person record is after all parameterized over what kind of Ops we are providing.

Let's do an instance. It will need the extra type that is somehow the name of the instance.

data Basic = Basic

instance Ops Basic where
   type XString Basic = String
   type XDouble Basic = Double
   (+++) = (++)
   xshow = show

Now what about the display function? Alas, now it breaks down again. The display function is full of type errors again. And the reason is similar to the multiparameter version; there's nothing that ties the operations together.

We can play the same trick as with DOps above, but for some reason it doesn't work this time. The type comes out as

display :: (XString t ~ XString a,
           XDouble t ~ XDouble a,
           Ops a,
           Num (XDouble t)) =>
          Person t -> XString a

I have no clue why. I find associated types very hard to get a grip on.

OK, multi-parameter type classes made things work, but had too many type parameters. And associated types is the other way around. You can try combining them, but it didn't get me anywhere closer.

Associated data types

OK, I won't admit defeat yet. There's still associated data types. They are easier to deal with than associated types, because the type function is guaranteed to be injective.

class (IsString (XString t), Num (XDouble t)) => Ops t where
   data XString t :: *
   data XDouble t :: *
   (+++) :: XString t -> XString t -> XString t
   xshow :: XDouble t -> XString t

data Basic = Basic

instance Ops Basic where
   newtype XString Basic = XSB String deriving (Eq, Ord, Show)
   newtype XDouble Basic = XDB Double deriving (Eq, Ord, Show)
   XSB x +++ XSB y = XSB (x ++ y)
   xshow (XDB x) = XSB (show x)
instance Num (XDouble Basic) where
   XDB x + XDB y = XDB (x+y)
   fromInteger = XDB . fromInteger
instance IsString (XString Basic) where
   fromString = XSB

At last, this actually works! But it's at a price. We can no longer use the types we want in the instance declaration, instead we are forced to invent new types. Using this approach the original multi-parameter version could have been made to work as well.

Normally the GeneralizedNewtypeDeriving language extension makes it relatively painless to introduce a newtype that has all the instances of the underlying type. But due to a bug in ghc you can't use this extension for associated newtypes. So we have to make manual instance declarations which makes this approach very tedious.

Conclusion

I have found no way of doing what I want. My request is very simple, I want to be able to abstract over the actual implementation of a module, where the module contains types, values, and instances.

Haskell normally excels in abstraction, but here I have found no natural way of doing what I want. Perhaps I'm just not clever enough to figure out how, but that is a failure of Haskell too. It should not take any cleverness to do something as simple as this. In ML this is the most natural thing in the world to do.

Associated types are not a replacement for a proper module system. They let you do some things, but others just don't work.

I'd be happy to see anyone doing this in Haskell in a simple way.

Labels: Haskell, Modules, overloading

13 Comments:

chak said...: Incidentally, Stefan Wehr is presenting a paper that came out of his Master thesis at APLAS 2008 on the same day as your blog post: ML Modules and Haskell Type Classes: A Constructive Comparison. We came to essentially the same conclusion as you do: classes with associated types do not provide the same abstraction as ML functors. We propose a form of "abstract associated types" to work around the problem to some extent.; Wednesday, December 10, 2008 at 1:52:00 AM GMT
單中杰 said...: But the superiority of Haskell defenders say, Haskell has higher-order type constructors, that's the way to abstract! Never mind type classes.

Before you tried associated types above, you came to the conclusion that what you need -- all you need, in fact -- are record kinds, i.e., to be able to bundle up multiple type parameters and pass them as a single type argument. That is where ML-style module systems shine; that is what associated type synonyms let you simulate; that is what gives us first-class structures in Haskell and typed cross-module compilation.

Given that you're such a fan of ML (: I assume that you wouldn't mind losing a bit of implicit overloading resolution. You already have all the tools:

{-# LANGUAGE TypeFamilies, OverloadedStrings, TypeSynonymInstances, RecordWildCards, FlexibleContexts #-}

import Data.String

class (IsString (XString t), Num (XDouble t)) => Ops t where
    type XString t :: *
    type XDouble t :: *

data Ops t => DOps t = DOps {
    (+++) :: XString t -> XString t -> XString t,
    xshow :: XDouble t -> XString t
    }

data Ops t => Person t = Person {
    firstName :: XString t,
    lastName :: XString t,
    height :: XDouble t
    }

display :: (Ops t) => DOps t -> Person t -> XString t
display ops p =
    let DOps{..} = ops
    in firstName p +++ " " +++ lastName p +++ " " +++ xshow (height p + 1); Wednesday, December 10, 2008 at 2:21:00 AM GMT
Unknown said...: In this case, you can actually modify your classes to give a pretty good solution with associated types.

Same as before:

> {-# LANGUAGE TypeFamilies, MultiParamTypeClasses, OverloadedStrings,
> FlexibleInstances, TypeSynonymInstances, ScopedTypeVariables,
> FunctionalDependencies, RecordWildCards, FlexibleContexts,
> GeneralizedNewtypeDeriving #-}
>
> module Person(Person(..), display) where
>
> import Data.String
>
> data Person t = Person {
> firstName :: XString t,
> lastName :: XString t,
> height :: XDouble t
> }
>

Now the important thing is to define the typeclasses
differently. Since (+++) only depends on the string
and not on the double type, we should put it in a class
which doesn't mention the double:

> class IsString s => IsXString s where
> (+++) :: s -> s -> s

Also, the xshow function only depends on the
XDouble and XString types, but not on the "t"
type in Ops. For instance, if we have

type XString Basic1 = String
type XDouble Basic1 = Double

type XString Basic2 = String
type XDouble Basic2 = Double

then the xshow instance for Basic1 and Basic2
must be the same. So we can't mention Basic1/Basic2/t
in the class where xshow is defined. So we write

> class (Num d, IsXString s) => IsXDouble s d where
> xshow :: d -> s

> class (IsXDouble (XString t) (XDouble t)) => Ops t where
> type XString t :: *
> type XDouble t :: *

and we give a Basic instance:

> instance IsXString String where
> (+++) = (++)
> instance IsXDouble String Double where
> xshow = show
>
> data Basic = Basic
>
> instance Ops Basic where
> type XString Basic = String
> type XDouble Basic = Double

and we can write display in the original form as

> display :: Ops t => Person t -> XString t
> display p = firstName p +++ " " +++ lastName p +++ " " +++ xshow (height p + 1)

If we give no type signature for display, then GHC infers
its type as
display :: (IsXDouble (XString t) (XDouble t)) =>
Person t -> XString t

but the (Ops t) context is cleaner.; Wednesday, December 10, 2008 at 3:33:00 AM GMT
Stefan Wehr said...: The APLAS paper written by chak and me (ML Modules and Haskell Type Classes: A Constructive Comparison) gives you a systematic way to derive a Haskell program given a program making use of ML-style modules. It seems that you are after the following ML program (the program is written in OCaml; it ignores the fact that the original display function inserts whitespace between the arguments):

http://www.informatik.uni-freiburg.de/~wehr/download/Lennart.ml

Using the techniques explained in the paper, you would arrive at the following Haskell program:

http://www.informatik.uni-freiburg.de/~wehr/download/Lennart.hs; Wednesday, December 10, 2008 at 6:16:00 AM GMT
augustss said...: To chak: Thanks for the reference, I'll read that. I also thought some kind of abstract associated types would be what I want.

To 單中杰: I do mind adding an extra argument to display. The ML version doesn't have that.

To Reiner Pope: Nice! That works. Now I have to blog about my next problem and see if someone can solve that.

To Stefan Wehr: As I said above, I'm not happy with the extra argument to display. In a real example that is very messy since it involves many changes, and doesn't work with operators.
It's nice that there is a systematic translation to Haskell, but it's too involved for my taste.; Wednesday, December 10, 2008 at 8:54:00 AM GMT
單中杰 said...: Lennart, could you please post the ML code you have in mind that "doesn't have" "an extra argument to display"? Stefan Wehr's definition of display in ML contains "O." all over the place, which corresponds to an extra argument in Haskell.; Wednesday, December 10, 2008 at 11:38:00 AM GMT
augustss said...: To 單中杰: Stick an 'open O;' at the top of the module. My ML is a bit rusty so maybe it needs a bit more syntax than that. Or maybe it's called 'include'. :); Wednesday, December 10, 2008 at 1:51:00 PM GMT
單中杰 said...: Huh? "Stick an 'open O;' at the top" corresponds precisely to "let DOps{..} = ops" in your and my code. The level of implicit overloading resolution that you seem to want in Haskell would correspond to being able to say "open O1; open O2;" without identifier clashes in ML.; Wednesday, December 10, 2008 at 2:54:00 PM GMT
augustss said...: No, it doesn't correspond to 'let DOps{..} = ...', because the open is at the top level in the module and brings the operations into scope for all function in the module. The display function doesn't get an extra argument, nor any other function in the module. It's the module that has an argument, and I'm all right with that.; Wednesday, December 10, 2008 at 3:04:00 PM GMT
單中杰 said...: I see. I was confused because your/Wehr's "MkPerson" functor returns a module with exactly one value component, namely "display". So, please just rename "display" to "mkPerson" in my code:

data Ops t => DPerson t = DPerson { display :: Person t -> XString t }
mkPerson :: (Ops t) => DOps t -> DPerson t
mkPerson ops = let DOps{..} = ops in DPerson { display = \p -> firstName p +++ " " +++ lastName p +++ " " +++ xshow (height p + 1) }

Now the display function (a component of the record returned by mkPerson) takes exactly one argument. (One implementation of the display function can be accessed using the expression "display basicPerson".); Wednesday, December 10, 2008 at 3:24:00 PM GMT
augustss said...: Yes, that works all right for defining a bunch of values. But a module contains values, types, classes, instances. I would want to open dops so that is scopes of all those definitions.

Furthermore, I think your changes are again intrusive. Imagine how you would change a module from not having a parameter to having one. The module contains 1000s of lines and I don't really want to touch most of them.
In ML I change the start of the module to add the functor argument, and then I open the argument.
In your encoding I'd have to group all the values together into a record, do something else again with the types. And I don't even know what to do with classes and instances.; Wednesday, December 10, 2008 at 3:46:00 PM GMT
單中杰 said...: I agree that, just as "let DOps{..} = ops" is nice to have at the value level, it would be nice to have something like "type Ops{..} = Basic" at the type level, once we recognize that this use of associated type synonyms is really getting at a type-record definition "kind Ops = Ops { XString :: *, XDouble :: * }". Classes and instances are global anyway, so it doesn't make sense (in this Haskell, I mean) to simulate functors that return classes or instances.

As for your "intrusive" complaint: it wouldn't be intrusive if you had written your code inside a big record in the first place. (: What Haskell calls modules is purely a namespace mechanism and should not be used for abstraction.; Wednesday, December 10, 2008 at 4:00:00 PM GMT
augustss said...: You are right, Haskell modules is a namespace mechanism, and I want more. :)
And since we're talking about real code here, I'm not going to advocate writing all Haskell code in some obscure style where all values live in some big record, just in case we want to abstract later.
ML modules can be encoded in Haskell in various ways, but I find all of them painful. I wish Haskell had something similar to ML's modules.; Wednesday, December 10, 2008 at 4:08:00 PM GMT