Error reporting in GHC is not always the nicest. For example, I often develop code by using
undefined as a placeholder for code I've not written yet. Here's a simple example:
import System.Environment

main = do
    args <- getArgs
    if null args then
        undefined
    else
        undefined
Running it looks like this:
$ ./H
H: Prelude.undefined
Which undefined caused that? The error message gives us no idea. Wouldn't it be nice to have some location information?
We can actually get location information by using Control.Exception.assert:
import Control.Exception(assert)
import System.Environment

main = do
    args <- getArgs
    if null args then
        assert False undefined
    else
        assert False undefined
Now running it is much more informative:
$ ./H
H: H.hs:7:9-14: Assertion failed
How is assert able to report the location? If we dig deep enough, we discover that it's because GHC contains a special hack that recognizes this function and gives it location information.
A generalized hack
In a Haskell compiler that I've implemented, I've taken this compiler hack and extended it so that it can be used for any function. It comes in two parts: location information and location transparent definitions.
__LOCATION__
The __LOCATION__ identifier is always defined and utterly magical. Its value is a string that describes the location of that very identifier. This is the very opposite of a referentially transparent name. In fact, its value varies with where it is placed in the code! So it's definitely not for purists. But I'm a practical man, so I sometimes have to resort to the ugliness of reality. And in reality we want to report locations in errors.
Enough philosophy, here's an example:
main = do
    print __LOCATION__
    print __LOCATION__
And running it prints:
"test/Test.hs:2:11"
"test/Test.hs:3:13"
And to illustrate the impurity:
main = do
    let loc = __LOCATION__
    print loc
    print loc
And running this:
"test/Test.mu:2:15"
"test/Test.mu:2:15"
Location transparency
The __LOCATION__ identifier gives the location of itself. This is of little use on its own. Imagine the definition we could give for undefined. Somewhere in the Prelude module it could say something like
undefined = error ("undefined: " ++ __LOCATION__)
But if we use this, all it will tell us is where undefined is defined, not where it was used.
To get the point of use instead of the definition I've introduced location transparent definitions. In a location transparent definition the __LOCATION__ identifier will not refer to its own position, but to the position of the reference to the definition. Location transparency is introduced with a pragma.
{-# LOCATIONTRANSPARENT undefined #-}
undefined = error ("undefined: " ++ __LOCATION__)
With this definition our initial example looks like this when we run it:
undefined: test/H.hs:6:9
In fact, the real definition of undefined doesn't look like that. The __LOCATION__ identifier is only used in the definition of error, so it looks something like this:
{-# LOCATIONTRANSPARENT error #-}
error :: String -> a
error s = throw (ErrorCall (__LOCATION__ ++ ": " ++ s))
{-# LOCATIONTRANSPARENT undefined #-}
undefined = error "undefined"
Since both error and undefined are transparent, any use of undefined will be reported with the location of the use.
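To make the rule concrete, here is a toy interpreter in ordinary Haskell. It is only a sketch of the semantics, not how the compiler does it; the Expr and Def types, eval, and the Prelude positions are all invented for illustration. Inside a transparent definition, __LOCATION__ means the location handed in by the caller, and when one transparent definition calls another it passes along the location it was given. That is why the chain from undefined to error ends up at the use site.

```haskell
import Data.Maybe (fromMaybe)

-- A toy model of location-transparent definitions. Everything here
-- (Expr, Def, eval, the made-up Prelude positions) is hypothetical.
type Loc = String

data Expr
  = Str String             -- string literal
  | Location Loc           -- __LOCATION__, tagged with its own position
  | Cat Expr Expr          -- e1 ++ e2
  | Arg Int                -- the i-th parameter of the enclosing definition
  | Call String Loc [Expr] -- use of a named definition at a source position
  | Throw Expr             -- throw (ErrorCall e)

data Def = Def { isTransparent :: Bool, body :: Expr }

-- ctx is Just l inside a transparent definition, where __LOCATION__ means l,
-- and Nothing in ordinary code. Left carries the message of a thrown error.
-- Arguments are evaluated eagerly, a simplification of Haskell's laziness.
eval :: [(String, Def)] -> Maybe Loc -> [String] -> Expr -> Either String String
eval _ _ _ (Str s) = Right s
eval _ ctx _ (Location own) = Right (fromMaybe own ctx)
eval defs ctx args (Cat a b) = (++) <$> eval defs ctx args a <*> eval defs ctx args b
eval _ _ args (Arg i) = Right (args !! i)
eval defs ctx args (Throw e) = eval defs ctx args e >>= Left
eval defs ctx args (Call f pos es) = do
  vs <- mapM (eval defs ctx args) es
  d <- maybe (Left ("unknown definition " ++ f)) Right (lookup f defs)
  -- A transparent callee sees the caller's location: the position of this
  -- call, or, if the caller is itself transparent, the location it was given.
  let ctx' = if isTransparent d then Just (fromMaybe pos ctx) else Nothing
  eval defs ctx' vs (body d)

prelude :: [(String, Def)]
prelude =
  [ -- error s = throw (ErrorCall (__LOCATION__ ++ ": " ++ s)), transparent
    ("error", Def True (Throw (Cat (Location "Prelude.hs:10:26") (Cat (Str ": ") (Arg 0)))))
    -- undefined = error "undefined", transparent
  , ("undefined", Def True (Call "error" "Prelude.hs:13:13" [Str "undefined"]))
    -- the same body without the pragma, for comparison
  , ("undefinedOpaque", Def False (Call "error" "Prelude.hs:16:19" [Str "undefined"]))
  ]

main :: IO ()
main = do
  -- a use of undefined at test/H.hs:6:9, as in the first example
  print (eval prelude Nothing [] (Call "undefined" "test/H.hs:6:9" []))
  -- the opaque version reports the call of error inside the Prelude instead
  print (eval prelude Nothing [] (Call "undefinedOpaque" "test/H.hs:6:9" []))
```

Running it should print Left "test/H.hs:6:9: undefined" followed by Left "Prelude.hs:16:19: undefined"; the second line is the unhelpful definition-site location that transparency avoids.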
Furthermore, we can make a few more functions location transparent, e.g.,
{-# LOCATIONTRANSPARENT head #-}
head :: [a] -> a
head [] = error "Empty list"
head (x:xs) = x
A simple example:
main = putStr (head [])
which will print:
test/Head.hs:1:16: Empty list
which is the location where head was called.
Implementation
There are different ways to implement this feature, and I'm going to sketch two of them.
First: Every function that has the LOCATIONTRANSPARENT pragma will be inlined at the point of use, and the __LOCATION__ identifier in the inlined code will be updated to reflect the call site. The definitions must be processed in a bottom-up fashion for this to work, so that when a transparent function (like undefined) is inlined, its own calls to other transparent functions (like error) have already been inlined into it. It's fairly simple to implement, but will cause some code bloat due to inlining.
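A minimal sketch of the inlining strategy, over a made-up expression type (Expr, substitute, inlineWith, processDefs and inlineProgram are all hypothetical names). Transparent definitions are processed callees first; inside them __LOCATION__ stays symbolic, and only when a transparent call is inlined into ordinary code does it become a string literal holding the call position. Every use gets its own copy of the body, which is where the code bloat comes from.

```haskell
-- A toy inliner for LOCATIONTRANSPARENT definitions. The Expr type,
-- the function names and the Prelude positions are all hypothetical.
type Loc = String

data Expr
  = Str String             -- string literal
  | Location Loc           -- __LOCATION__, tagged with its own position
  | Cat Expr Expr          -- e1 ++ e2
  | Arg Int                -- the i-th parameter of a definition body
  | Call String Loc [Expr] -- use of a named definition at a source position
  | Throw Expr             -- throw (ErrorCall e)
  deriving (Eq, Show)

-- Already-inlined transparent definitions, keyed by name.
type Transparent = [(String, Expr)]

-- Plug the arguments and a location expression into an inlined body.
-- Arguments are inserted as-is, so their own __LOCATION__s are untouched.
substitute :: Expr -> [Expr] -> Expr -> Expr
substitute loc _ (Location _) = loc
substitute _ args (Arg i) = args !! i
substitute loc args (Cat a b) = Cat (substitute loc args a) (substitute loc args b)
substitute loc args (Throw e) = Throw (substitute loc args e)
substitute loc args (Call f p es) = Call f p (map (substitute loc args) es)
substitute _ _ e = e

-- Inline every call of a transparent definition. `here p` is what
-- __LOCATION__ in the inlined body becomes for a call at position p.
inlineWith :: (Loc -> Expr) -> Transparent -> Expr -> Expr
inlineWith here defs = go
  where
    go (Call f p es)
      | Just b <- lookup f defs = substitute (here p) (map go es) b
      | otherwise = Call f p (map go es)
    go (Cat a b) = Cat (go a) (go b)
    go (Throw e) = Throw (go e)
    go e = e

-- Bottom-up: callees come first in the list, so each body is fully inlined
-- before anything uses it. Inside a transparent body the location stays
-- symbolic (a Location node) until the outermost use is known.
processDefs :: [(String, Expr)] -> Transparent
processDefs = foldl step []
  where
    step done (name, b) = done ++ [(name, inlineWith Location done b)]

-- In ordinary code a call site becomes a string literal.
inlineProgram :: Transparent -> Expr -> Expr
inlineProgram = inlineWith Str

preludeDefs :: [(String, Expr)]
preludeDefs =
  [ ("error", Throw (Cat (Location "Prelude.hs:10:26") (Cat (Str ": ") (Arg 0))))
  , ("undefined", Call "error" "Prelude.hs:13:13" [Str "undefined"])
  ]

main :: IO ()
main = print (inlineProgram (processDefs preludeDefs) (Call "undefined" "test/H.hs:6:9" []))
```

This should print Throw (Cat (Str "test/H.hs:6:9") (Cat (Str ": ") (Str "undefined"))). Occurrences of __LOCATION__ written directly in ordinary code are deliberately left alone here; turning those into literals is the separate replacement step.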
Second: Every function that has the LOCATIONTRANSPARENT pragma will be rewritten (by the compiler) to take an extra location argument, and each use of such a function will be rewritten to pass in the current location.
For example (using $$f for the location version of f):
main = putStr ($$head __LOCATION__ [])
$$head __LOCATION__ [] = $$error __LOCATION__ "Empty list"
$$head __LOCATION__ (x:xs) = x
$$error __LOCATION__ s = throw (ErrorCall (__LOCATION__ ++ ": " ++ s))
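The translated program can be written in ordinary GHC Haskell once the $$ names are replaced by legal identifiers. In this sketch errorAt and headAt are hypothetical stand-ins for $$error and $$head, and the location is a plain String argument.

```haskell
import Control.Exception

-- Hypothetical names: errorAt and headAt play the roles of $$error and
-- $$head, which are not legal Haskell identifiers. The extra first argument
-- is the location string the compiler would pass in.
errorAt :: String -> String -> a
errorAt loc s = throw (ErrorCall (loc ++ ": " ++ s))

headAt :: String -> [a] -> a
headAt loc [] = errorAt loc "Empty list"
headAt _ (x : _) = x

-- The compiler would have replaced __LOCATION__ in
--   main = putStr ($$head __LOCATION__ [])
-- by the literal for the position of head in the original source.
main :: IO ()
main = putStr (headAt "test/Head.hs:1:16" [])
```

Run as is, it should stop with an uncaught ErrorCall whose message is test/Head.hs:1:16: Empty list, which GHC's runtime reports roughly after the program name. Note that headAt only forwards the location it receives; it never computes one itself.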
This should be fairly straightforward to implement, but I've not tried it. (It's somewhat like dynamic binding, so maybe GHC could reuse its dynamic binding mechanism, implicit parameters, for locations.)
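For comparison, GHC 8.0 and later provide a mechanism in this spirit: a function whose type carries the HasCallStack constraint from GHC.Stack receives its call site through an implicit parameter (HasCallStack is a synonym for ?callStack :: CallStack). A rough analogue of the transparent head might look like this; headLoc and showCaller are made-up names.

```haskell
import GHC.Stack

-- headLoc is a hypothetical name. The HasCallStack constraint makes GHC pass
-- the call site in as an implicit parameter, much like $$head's extra argument.
headLoc :: HasCallStack => [a] -> a
headLoc (x : _) = x
headLoc [] = errorWithoutStackTrace (showCaller callStack ++ ": Empty list")

-- Render the innermost call site of a CallStack as file:line:col,
-- the same shape as a __LOCATION__ string.
showCaller :: CallStack -> String
showCaller cs = case getCallStack cs of
  ((_, loc) : _) ->
    srcLocFile loc ++ ":" ++ show (srcLocStartLine loc) ++ ":" ++ show (srcLocStartCol loc)
  [] -> "<unknown location>"

main :: IO ()
main = do
  print (headLoc [1, 2, 3 :: Int])
  putStr (headLoc [])
```

The first line of main should print 1; the second call should fail with the file, line and column of that use of headLoc followed by ": Empty list". Unlike the pragma, the constraint is visible in the type, so every caller that wants to forward its own location must also say HasCallStack.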
And, of course, the global __LOCATION__ identifier has to be recognized by the compiler and replaced by a string that is its location.
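A sketch of that replacement pass over a made-up syntax tree (Expr, resolve, run and value are hypothetical). Because the substitution happens on the syntax, before anything is evaluated, the let-bound loc in the second program holds a single string, which is why both prints show the same location.

```haskell
import Data.Maybe (fromMaybe)

-- A toy __LOCATION__ resolution pass; Expr, resolve and run are hypothetical.
data Expr
  = Str String            -- string literal
  | Location (Int, Int)   -- __LOCATION__ with the (line, column) the parser saw
  | Var String            -- variable reference
  | Let String Expr Expr  -- let x = e1 in e2
  | Print Expr            -- print e
  | Do [Expr]             -- do { e1; e2; ... }
  deriving (Eq, Show)

-- Replace every __LOCATION__ by a string literal of its own position.
-- This runs on the syntax tree, before anything is evaluated.
resolve :: FilePath -> Expr -> Expr
resolve file = go
  where
    go (Location (line, col)) = Str (file ++ ":" ++ show line ++ ":" ++ show col)
    go (Let x e1 e2) = Let x (go e1) (go e2)
    go (Print e) = Print (go e)
    go (Do es) = Do (map go es)
    go e = e

-- Run a resolved program, collecting the lines each print would output.
run :: [(String, String)] -> Expr -> [String]
run env (Let x e1 e2) = run ((x, value env e1) : env) e2
run env (Print e) = [show (value env e)]
run env (Do es) = concatMap (run env) es
run _ _ = []

value :: [(String, String)] -> Expr -> String
value _ (Str s) = s
value env (Var x) = fromMaybe (error ("unbound variable " ++ x)) (lookup x env)
value _ e = error ("not a string expression: " ++ show e)

-- The two programs from the text, with the positions given there.
twoOccurrences, oneOccurrence :: Expr
twoOccurrences = Do [Print (Location (2, 11)), Print (Location (3, 13))]
oneOccurrence = Let "loc" (Location (2, 15)) (Do [Print (Var "loc"), Print (Var "loc")])

main :: IO ()
main = do
  mapM_ putStrLn (run [] (resolve "test/Test.hs" twoOccurrences))
  mapM_ putStrLn (run [] (resolve "test/Test.hs" oneOccurrence))
```

This should print "test/Test.hs:2:11", "test/Test.hs:3:13", and then "test/Test.hs:2:15" twice.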
Conclusion
I implemented the __LOCATION__ hack quite a while ago, and I like the much improved reporting of error locations. I hope someone will add it to GHC as well.