Jhc does its own dependency chasing to track down source files, so you need only provide it with the file containing your 'main' function on the command line. For instance, if you had a program 'HelloWorld.hs', the following would compile it to an executable named 'hello'.
; jhc -v HelloWorld.hs -o hello
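For reference, a minimal 'HelloWorld.hs' that such a command could compile might look like the sketch below. The file contents are an assumption, since the text does not show them; the only requirement stated here is that the file passed on the command line defines 'main'.

```haskell
-- HelloWorld.hs (hypothetical contents, for illustration only)

greeting :: String
greeting = "Hello, World!"

-- jhc starts its dependency chasing from the file that defines 'main'.
main :: IO ()
main = putStrLn greeting
```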
Libraries are built by passing jhc a file describing the library via the --build-hl option. The file format is a simplified version of the cabal format. The name of the generated file will be
; jhc -v --build-hl mylibrary.cabal
Jhc libraries are distributed as files with an 'hl' suffix, such as 'base-1.0.hl'. You simply need to drop this file somewhere that jhc can find it, for instance $HOME/lib/jhc. You can then set $JHCLIBPATH to that directory, or specify it on the command line with the '-L' option. Extra libraries are specified on the command line with the '-p' option.
; jhc -v -L/home/john/devel/jhc -pmylibrary MyProgram.hs -o myprogram
Using make to build projects with jhc is straightforward. Simply add a rule like the following to your Makefile (the command line of each rule must be indented with a tab):
% : %.hs
	jhc -v $< -o $@
Or, to build a library, something similar to this will do.
%.hl : %.cabal
	jhc -v --build-hl $< -o $@
Usage: jhc [OPTION...] Main.hs
-V --version print version info and exit
--version-context print version context info and exit
--help print help information and exit
--config show a variety of config info
-v --verbose chatty output on stderr
-z Increase verbosity of statistics
-d [no-]flag dump specified data during compilation
-f [no-]flag set or clear compilation options
-o FILE --output=FILE output to FILE
-i DIR --include=DIR where to look for source files
-I DIR add to preprocessor include path
-D NAME=VALUE add new definitions to set in preprocessor
--optc=option extra options to pass to c compiler
-N --noprelude no implicit prelude
-C Typecheck, compile ho and grin
-c Typecheck and compile ho
-k --keepgoing keep going on errors
--cross enable cross-compilation, choose target with the -m flag
--width=COLUMNS width of screen for debugging output
--main=Main.main main entry point
-m arch --arch=arch target architecture options
--entry=<expr> main entry point, showable expression
-e <statement> run given statement as if on jhci prompt
--debug debugging
--show-ho=file.ho Show ho file
--noauto Don't automatically load base and haskell98 packages
-p file.hl Load given haskell library .hl file
-L path Look for haskell libraries in the given directory
--build-hl=file.cabal Build hakell library from given library description file
--interactive run interactivly
--ignore-ho Ignore existing haskell object files
--nowrite-ho Do not write new haskell object files
--no-ho same as --ignore-ho and --nowrite-ho
--ho-cache=HOCACHEDIR Use a global ho cache located at the argument
--ho-dir=<dir> Where to place and look for ho files
--stale=Module Treat these modules as stale, even if a ho file is present
--dependency Follow import dependencies only then quit
--no-follow-deps Don't follow depencies not listed on command line
--list-libraries List of installed libraries
--print-hsc-options print options to pass to hsc2hs
valid -d arguments: 'help' for more info
all-dcons, all-kind, all-types, aspats, bindgroups, boxy-steps, class, class-summary, core
core-afterlift, core-beforelift, core-initial, core-mangled, core-mini, core-pass, core-steps
datatable, datatable-builtin, dcons, decls, defs, derived, e-alias, e-info, e-size, e-verbose
exports, grin, grin-datalog, grin-final, grin-graph, grin-initial, grin-normalized, grin-pass
grin-posteval, grin-preeval, grin-steps, html, imports, ini, instance, kind, kind-steps
optimization-stats, parsed, preprocessed, program, progress, renamed, rules, rules-spec
scc-modules, sigenv, srcsigs, stats, steps, tags, the, types, tyvar, verbose, veryverbose
valid -f arguments: 'help' for more info
bang-patterns, boehm, controlled, cpp, cpr, debug, default, defaulting, ffi, float-in, full-int
global-optimize, inline-pragmas, jgc, lint, m4, monomorphism-restriction, negate, profile, raw
rules, strictness, type-analysis, unboxed-tuples, unboxed-values, via-ghc, wrapper
You can have jhc print out a variety of things while running, as controlled by the '-d' flag. The following is a list of possible parameters you can pass to '-d'.
Front End | |
---|---|
defs | Show all defined names in a module |
derived | show generated derived instances |
exports | show which names are exported from each module |
imports | show in scope names for each module |
ini | all ini configuration options |
parsed | parsed code |
preprocessed | code after preprocessing and de-literating (removal of literate Haskell markup) |
renamed | code after uniqueness renaming |
scc-modules | show strongly connected modules in dependency order |
Type Checker | |
---|---|
all-dcons | show unified data constructor table |
all-kind | show unified kind table after everything has been typechecked |
all-types | show unified type table, after everything has been typechecked |
aspats | show as patterns |
bindgroups | show bindgroups |
boxy-steps | show step by step what the type inferencer is doing |
class | detailed information on each class |
class-summary | summary of all classes |
dcons | data constructors |
decls | processed declarations |
instance | show instances |
kind | show results of kind inference for each module |
kind-steps | show steps of kind inference |
program | impl expls, the whole shebang. |
sigenv | initial signature environment |
srcsigs | processed signatures from source code |
types | display unified type table containing all defined names |
tyvar | show original tyvars rather than renaming them. |
Intermediate code | |
---|---|
core | show intermediate core code |
core-afterlift | show final core before writing ho file |
core-beforelift | show core before lambda lifting |
core-initial | show core right after E.FromHs conversion |
core-mangled | de-typed core right before it is converted to grin |
core-mini | show details even when optimizing individual functions |
core-pass | show each iteration of code while transforming |
core-steps | show what happens in each pass |
datatable | show data table of constructors |
datatable-builtin | show data table entries for some built in types |
e-alias | show expanded aliases |
e-info | show info tags on all bound variables |
e-size | print the size of E after each pass |
e-verbose | print very verbose version of E code always |
optimization-stats | show combined stats of optimization passes |
rules | show all user rules and catalysts |
rules-spec | show specialization rules |
Grin code | |
---|---|
grin | dump all grin to the screen |
grin-datalog | print out grin information in a format suitable for loading into a database |
grin-final | final grin before conversion to C |
grin-graph | print dot file of final grin code to outputname_grin.dot |
grin-initial | grin right after conversion from core |
grin-normalized | grin right after first normalization |
grin-pass | show each iteration of code while transforming |
grin-posteval | show grin code just after eval/apply inlining |
grin-preeval | show grin code just before eval/apply inlining |
grin-steps | show what happens in each transformation |
steps | show interpreter go |
tags | list of all tags and their types |
General | |
---|---|
html | use html escape codes in output |
progress | show basic progress indicators |
stats | show extra information about stuff |
verbose | progress |
veryverbose | progress stats |
Various options affecting how jhc interprets and compiles code can be controlled with the '-f' flag. The following options are available; you can negate any particular one by prepending 'no-' to it.
Code options | |
---|---|
bang-patterns | support bang pattern strictness annotations |
cpp | pass haskell source through c preprocessor |
ffi | support foreign function declarations |
m4 | pass haskell source through m4 preprocessor |
unboxed-tuples | allow unboxed tuple syntax to be recognized |
unboxed-values | allow unboxed value syntax |
Typechecking | |
---|---|
defaulting | perform defaulting of ambiguous types |
monomorphism-restriction | enforce monomorphism restriction |
Debugging | |
---|---|
lint | perform lots of extra type checks |
Optimization Options | |
---|---|
cpr | do CPR analysis |
float-in | perform float inward transform |
global-optimize | perform whole program E optimization |
inline-pragmas | use inline pragmas |
rules | use rules |
strictness | perform strictness analysis |
type-analysis | perform a basic points-to analysis on types right after method generation |
Code Generation | |
---|---|
boehm | use Boehm garbage collector |
debug | enable debugging code in generated executable |
full-int | extend Int and Word to 32 bits on a 32 bit machine (rather than 30) |
jgc | use the jgc garbage collector |
profile | enable profiling code in generated executable |
raw | just evaluate main to WHNF and nothing else. |
via-ghc | compile via ghc |
wrapper | wrap main in exception handler |
Default settings | |
---|---|
default | inline-pragmas rules wrapper float-in strictness defaulting type-analysis monomorphism-restriction boxy eval-optimize global-optimize full-int |
Unlike many other compilers, jhc is a native cross compiler. This means that every build of jhc is able to create code for all possible target systems, which greatly simplifies cross compiling. To cross compile, you need only pass the flag '--cross' to jhc, along with an appropriate '-m' option to tell jhc what machine you are targeting. An example would be
; jhc --cross -mwin32 test/HelloWorld.hs
The targets list is extensible at run-time via the targets.ini file explained below.
This file determines what targets are available. The format consists of entries as follows.
[targetname]
key1=value
key2=value
key3+=value
merge=targetname2
merge is a special key meaning to merge the contents of another target into the current one. The configuration file is read in order, and the final value set for a given key is the one that is used.
An example describing how to cross compile for windows is as follows:
[win32]
gcc=i386-mingw32-gcc
cflags+=-mwindows -mno-cygwin
executable_extension=.exe
merge=i686
This sets the compiler to use as well as a few other options then jumps to the generic i686 routine. The special target [default] is always read before all other targets. If '--cross' is specified on the command line then this is the only implicitly included configuration, otherwise jhc will assume you are compiling for the current architecture and choose an appropriate target to include in addition to default.
Jhc will attempt to read several targets.ini files in order. They are:
$PREFIX/etc/jhc-$VERSION/targets.ini : this is the targets.ini that is included with jhc and contains the default options.
$PREFIX/etc/jhc-$VERSION/targets-local.ini : jhc will read this if it exists. It is used to specify custom system-wide configuration options, such as the name of local compilers.
$HOME/.jhc/targets.ini : this is where a user's local configuration information goes.
$HOME/etc/jhc/targets.ini : this is simply for people who prefer not to use hidden directories for configuration.
The last value specified for an option is the one used, so a user's local configuration overrides the system local version, which overrides the built-in options.
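The "last value wins" rule can be modeled in a few lines of Haskell. This is only a sketch of the lookup order, not jhc's actual implementation: the names `Config` and `lookupOption` and the sample file contents are hypothetical, and the `+=` and `merge` keys are deliberately left out.

```haskell
-- One parsed targets.ini file, reduced to plain key/value pairs.
type Config = [(String, String)]

-- Look a key up across files given in the order they are read.
-- The last value seen wins, so later files override earlier ones.
lookupOption :: String -> [Config] -> Maybe String
lookupOption key files =
    case [v | file <- files, (k, v) <- file, k == key] of
        [] -> Nothing
        vs -> Just (last vs)

-- Hypothetical contents for three of the files, in read order.
builtin, systemLocal, userLocal :: Config
builtin     = [("cc", "gcc"), ("gc", "static")]
systemLocal = [("cc", "my-local-gcc")]
userLocal   = [("gc", "boehm")]

main :: IO ()
main = do
    let files = [builtin, systemLocal, userLocal]
    print (lookupOption "cc" files)    -- Just "my-local-gcc"
    print (lookupOption "gc" files)    -- Just "boehm"
    print (lookupOption "arch" files)  -- Nothing
```

Here the system-wide file overrides the built-in compiler name, and the user's file overrides the built-in garbage collector choice.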
Option | Meaning |
---|---|
cc | what c compiler to use. generally this will be gcc for local builds and something like ARCH-HOST-gcc for cross compiles |
byteorder | one of le or be for little or big endian |
gc | what garbage collector to use. It should be one of static or boehm. |
cflags | options to pass to the c compiler |
cflags_debug | options to pass to the c compiler only when debugging is enabled |
cflags_nodebug | options to pass to the c compiler only when debugging is disabled |
profile | whether to include profiling code in the generated executable |
autoload | what haskell libraries to autoload, separated by commas. |
executable_extension | specifies an extension that should be appended to executable files (e.g. .EXE on windows) |
merge | a special option that merges the contents of another configuration target into the current one. |
bits | the number of bits a pointer contains on this architecture |
bits_max | the number of bits in the largest integral type. should be the number of bits in the 'intmax_t' C type. |
arch | what to pass to gcc as the architecture |
These must appear in the same file as the definition of a function. To apply one to an instance or class method, you must place it in the where clause of the instance or class declaration.
NOINLINE : Do not inline the given function during core transformations. The function may be inlined during grin transformations.
INLINE : Inline this function whenever possible
SUPERINLINE : Always inline no matter what, even if it means making a local copy of the function's body.
NOETA : When applied to a class method, do not perform eta expansion up to the number of arguments specified by the type.
RULES : rewrite rules. These have the same syntax and behave like GHC's rewrite rules, except 'phase' information is not allowed.
SPECIALIZE : create a version of a function that is specialized for a given type
SUPERSPECIALIZE : has the same effect as SPECIALIZE, but also places a run-time check in the generic version of the function to determine whether to call the specialized version.
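As an illustration, the sketch below places a few of these pragmas next to the definitions they affect, including one inside an instance's where clause. It uses the usual `{-# ... #-}` pragma syntax and restricts itself to pragmas that GHC also accepts (INLINE, NOINLINE, SPECIALIZE); all function and type names are made up for the example.

```haskell
-- INLINE: ask the compiler to inline this small helper wherever possible.
{-# INLINE square #-}
square :: Int -> Int
square x = x * x

-- NOINLINE: keep this function out of line during core transformations.
{-# NOINLINE sumSquares #-}
sumSquares :: [Int] -> Int
sumSquares = sum . map square

-- SPECIALIZE: request a copy of the overloaded function fixed at one type.
{-# SPECIALIZE average :: [Double] -> Double #-}
average :: Fractional a => [a] -> a
average xs = sum xs / fromIntegral (length xs)

class Shape a where
    area :: a -> Double

newtype Circle = Circle Double

instance Shape Circle where
    -- A pragma for a method goes in the where clause of the instance.
    {-# INLINE area #-}
    area (Circle r) = pi * r * r

main :: IO ()
main = do
    print (sumSquares [1, 2, 3])         -- 14
    print (average [1, 2, 3 :: Double])  -- 2.0
    print (area (Circle 1) > 3)          -- True
```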
These pragmas are only valid in the 'head' of a file, meaning they must come before the initial 'module' definition and in the first 4096 bytes of the file and must be preceded by and contain only characters in the ASCII character set.
NOPRELUDE : do not load the 'Prelude' automatically. equivalent to passing --noprelude on the command line.
OPTIONS_JHC : Specify extra options to use when processing this file. The options available are equivalent to the command line options, though, not all may have meaning when applied to a single file.
LANGUAGE : Specify various language options
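A header pragma might be used as in the sketch below. The exact spelling of the option list is an assumption based on the statement that the options are equivalent to the command line options; the two flags shown are taken from the '-f' list, and `message` is a made-up name.

```haskell
{-# OPTIONS_JHC -fffi -fbang-patterns #-}
-- The pragma above sits in the head of the file, before the 'module' line.
-- Assumption: flags are written exactly as they would be on the command line.
module Main where

message :: String
message = "compiled with per-file options"

main :: IO ()
main = putStrLn message
```

Other compilers are expected to ignore a pragma they do not recognize, so a file like this should remain portable.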
In addition to foreign imports of external functions as described in the FFI spec, jhc supports 'primitive' imports that let you communicate primitives directly to the compiler. In general, these should not be used other than in the implementation of the standard libraries. They generally do little error checking, as it is assumed you know what you are doing if you use them. All Haskell visible entities are introduced via foreign declarations in jhc.
They all have the form
foreign import primitive "specification" haskell_name :: type
where "specification" is one of the following
seq : evaluate first argument to WHNF, then return the second argument
zero,one : the values zero and one of any primitive type.
const.C_CONSTANT : the text following const is directly inserted into the resulting C file
peek.TYPE : the peek primitive for raw value TYPE
poke.TYPE : the poke primitive for raw value TYPE
sizeOf.TYPE, alignmentOf.TYPE, minBound.TYPE, maxBound.TYPE, umaxBound.TYPE : various properties of a given internal type.
error.MESSAGE : results in an error with constant message MESSAGE.
constPeekByte : peek of a constant value specialized to bytes, used internally by Jhc.String
box : take an unboxed value and box it, the shape of the box is determined by the type at which this is imported
unbox : take a boxed value and unbox it, the shape of the box is determined by the type at which this is imported
increment, decrement : increment or decrement a numerical integral primitive value
fincrement, fdecrement : increment or decrement a numerical floating point primitive value
exitFailure__ : abort the program immediately
C-- Primitive : any C-- primitive may be imported in this manner.
Jhc supports monadic actions declared at the top level of your module. These can be used to do things such as initialize IORefs or allocate static data. An example of a top level action is the following.
    import Jhc.ACIO
    import Data.IORef

    ref <- newIORefAC 0

    count = do
        modifyIORef ref (1 +)
        readIORef ref >>= print

    main = do
        count
        count
        count
This will print 1, 2, and 3. A special monad ACIO (which stands for Affine Central IO) is provided to restrict what may take place in top level actions. Basically, top level actions can only consist of IO that can be omitted or reordered without changing the meaning of a program. In practice, this means that it does not matter whether such actions are all performed at the beginning or are only computed once on demand.
If you need to use arbitrary IO, a utility function 'runOnce' is provided. Using it, you can ensure arbitrary IO actions are run only once and the return values shared. However, you must access the value inside the IO monad, which ensures program integrity. An example using a hypothetical GUI library is below.
    import Jhc.ACIO

    getWindow <- runOnce $ do
        connection <- newGUIConnection
        window <- createWindow (640,480)
        setTitle window "My Global Window"
        return window

    main = do
        w <- getWindow
        draw w "Hello!"
Note, top level global variables can be indicative of design issues. In general, they should only be used when necessary to interface with an external library, opaque uses inside a library where the shared state can not be externally observed, or inside your Main program as design dictates.
Unboxed values in jhc are specified in a similar fashion to GHC; however, the lexical syntax is not changed to allow # in identifiers. # is still used in the syntax for various unboxed constructs, but normal Haskell rules apply to other Haskell values. The convention is to suffix such types with '_' to indicate their status as unboxed.
Jhc supports unboxed tuples with the same syntax as GHC: (# 2, 4 #) is an unboxed tuple of two numbers. Unboxed tuples are enabled with -funboxed-tuples.
Unboxed strings are enabled with the -funboxed-values flag. They are specified like a normal string but have a '#' at the end. Unboxed strings have type 'Addr_', which is a synonym for 'BitsPtr'.
Unboxed numbers are enabled with the -funboxed-values flag. They are suffixed with a '#', as in 3# or 4#. Jhc supports a limited form of type inference for unboxed numbers: if the type is fully specified by the environment and it is a suitable unboxed numeric type, then that type is used. Otherwise it defaults to Int__.
Class contexts on data types are silently ignored.
Class methods are fully 'eta expanded' out to the argument count specified by the type. This is often beneficial as instances that need to share partial applications are rare. This behavior can be turned off with the NOETA pragma for specific methods.
In addition to a larger set of base libraries roughly modeled on GHC's base, jhc provides a number of extensions and minor modifications to the standard libraries. These are designed to be mostly backwards compatible, and most are to the class system.
There are many other additional libraries provided with jhc; here I list only changes that affect modules that are defined by the Haskell 98 or FFI specifications.
Data.Int and Data.Word provide WordPtr, WordMax, IntPtr, and IntMax that correspond to the C types uintptr_t, uintmax_t, intptr_t, and intmax_t respectively.
fromInt, toInt, fromDouble, and toDouble have been added alongside the Integer and Rational routines in their respective classes.
Floating point truncation and rounding functions have varieties that don't return an integral type, but rather return something of the same type as their argument. These have the same name but end in 'f'.
Jhc differs from GHC in certain ways that are allowed by Haskell 98, but might come as a surprise to some.
An Int may be only 30 bits and may not observe simple binary truncation on overflow. If you need known bit width and binary semantics for your numbers then use the types in Data.Int and Data.Word. Overflow on Int or Word has undefined results.
A Char may only preserve values within the Unicode range. Storing values greater than 0x10FFFF has undefined results.
The Int and Word types are at most 32 bits, even on 64 bit architectures.
All text based IO is performed according to the current locale. This means that Unicode works seamlessly, but older programs that assumed IO was performed by simple truncation of chars down to 8 bits will fail. Use the explicit binary routines if you need binary IO.
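Code that relies on wrap-around arithmetic can avoid the undefined overflow behavior of Int and Word by using the fixed-width types instead. The sketch below uses only the standard Data.Int and Data.Word modules; the names `wrapInt32`, `wrapWord8` and `checksum` are made up for the example.

```haskell
import Data.Int (Int32)
import Data.Word (Word8)

-- Overflow on the fixed-width types wraps around in a well defined way.
wrapInt32 :: Int32
wrapInt32 = maxBound + 1   -- wraps to minBound

wrapWord8 :: Word8
wrapWord8 = 255 + 1        -- wraps to 0

-- Arithmetic carried out at a known width: the sum is taken modulo 256.
checksum :: [Word8] -> Word8
checksum = foldr (+) 0

main :: IO ()
main = do
    print (wrapInt32 == minBound)  -- True
    print wrapWord8                -- 0
    print (checksum [200, 100])    -- 44
```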
These misfeatures will be fixed at some point.
Integer corresponds to IntMax rather than an arbitrary precision type. As soon as a suitable arbitrary precision library emerges, it will be replaced.
Ix is not derivable.
Jhc is very minimalist in that it does not have a precompiled run time system, but rather generates what is needed as part of the compilation process. However, we call whatever conventions and binary layouts used in the generated executable the run time system. Since jhc generates the code anew each time, it can build a different 'run time' based on compiler options, trading things like the garbage collector as needed or changing the closure layout when we know we have done full program optimization. This describes the 'native' layout upon which other conventions are layered.
A basic value in jhc is represented by a 'smart pointer' of C type sptr_t. A smart pointer is the size of a native pointer, but can take on different roles depending on a pair of tag bits.
Smart pointers take on a general form as follows:
    -------------------------
    |    payload        | GL|
    -------------------------
G - if set, then the garbage collector should not treat value as a pointer to be followed
L - lazy, this bit being set means the value is not in WHNF
A raw sptr_t on its own in the wild can only take on one of the following values:
    -------------------------
    |    raw value      | 10|
    -------------------------

    -------------------------
    |   whnf location   | 00|
    -------------------------

    -------------------------
    |   lazy location   | 01|
    -------------------------
A raw value can be anything and is not necessarily a pointer. A WHNF location is a pointer to some value in WHNF. The system places no restrictions on what is actually pointed to by a WHNF pointer; however, the garbage collector in use may. In general, the back end is free to choose what to place in the raw value field or in what a WHNF location points to. If an implementation sees the L bit is clear, it can pass on the smart pointer without examining it, knowing the value is in WHNF.
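The tag-bit scheme can be modeled in Haskell as a small classifier. This is a sketch for illustration only: the generated run time is C code working on sptr_t, the names `Role`, `isLazy`, `isNonPointer` and `classify` are hypothetical, and it assumes that the two tag bits are the lowest-order bits of the word, with L as bit 0 and G as bit 1, as the diagrams suggest.

```haskell
import Data.Bits (testBit)
import Data.Word (Word64)

-- Hypothetical names for the three roles a stand-alone sptr_t can take.
data Role = RawValue | WhnfLocation | LazyLocation
    deriving (Eq, Show)

-- Assumption: L (lazy) is bit 0 and G (not a GC pointer) is bit 1.
isLazy, isNonPointer :: Word64 -> Bool
isLazy w       = testBit w 0
isNonPointer w = testBit w 1

-- Map the two tag bits to a role; the payload bits are ignored.
classify :: Word64 -> Maybe Role
classify w = case (isNonPointer w, isLazy w) of
    (True,  False) -> Just RawValue      -- tag 10
    (False, False) -> Just WhnfLocation  -- tag 00
    (False, True)  -> Just LazyLocation  -- tag 01
    (True,  True)  -> Nothing            -- tag 11 is not a stand-alone form

main :: IO ()
main = mapM_ (print . classify) [0x1002, 0x1000, 0x1001, 0x1003]
```

Checking `isLazy` alone is enough to decide whether a value can be passed on without evaluation, which mirrors the remark about the L bit.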
A lazy location points to a potential closure or an indirection to a WHNF value. The lazy location is an allocated chunk of memory that is at least one pointer long. The very first location in a closure must be one of the following.
    -------------------------
    | raw value or whnf | X0|
    -------------------------
An evaluated value, interpreted exactly as above. One can always replace any occurrence of a lazy location with an evaluated indirection.
-------------------------
| code pointer | 11|
-------------------------
| data ... |
This is something to evaluate. The code pointer is a pointer to a function that takes the memory location as its only argument; the called function is in charge of updating the location if needed.
Note that it is invalid to have a lazy location point to another lazy location. There is only ever one level of indirection allowed, and only from lazy locations.
Note that a partial application is just like any other value in WHNF as far as the above is concerned. It may happen to contain a code pointer.
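The evaluation protocol for lazy locations described above can be sketched as follows. This is an illustration under stated assumptions rather than the code jhc emits: sptr_t is modelled as a pointer-sized unsigned integer, L is bit 0 and G is bit 1, code pointers have their two low bits free (forced here with a GCC/Clang alignment attribute), and raw values are stored shifted past the tag bits. All names other than sptr_t, including the example thunk, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t sptr_t;           /* assumption: pointer-sized integer */
#define TAG_L    ((sptr_t)1)        /* assumption: L is bit 0 */
#define TAG_G    ((sptr_t)2)        /* assumption: G is bit 1 */
#define TAG_MASK (TAG_G | TAG_L)

/* A closure's code takes the memory location as its only argument. */
typedef sptr_t (*eval_fn)(sptr_t *loc);

/* Hypothetical raw value encoding. */
static sptr_t raw_make(uintptr_t v) { return (sptr_t)(v << 2) | TAG_G; }
static uintptr_t raw_get(sptr_t p)  { return p >> 2; }

/* Reduce a smart pointer to a value with the L bit clear. */
static sptr_t sptr_eval(sptr_t p) {
    if ((p & TAG_L) == 0)
        return p;                         /* raw value or WHNF location */
    sptr_t *loc = (sptr_t *)(p & ~TAG_MASK);
    sptr_t head = loc[0];
    if ((head & TAG_L) == 0)
        return head;                      /* evaluated indirection */
    /* Head is tagged 11: strip the tag and call the code pointer. */
    eval_fn code = (eval_fn)(head & ~TAG_MASK);
    return code(loc);
}

/* Build a two-word closure in caller-provided memory. */
static sptr_t thunk_init(sptr_t *loc, eval_fn code, sptr_t arg) {
    assert(((sptr_t)code & TAG_MASK) == 0);  /* low bits must be free */
    loc[0] = (sptr_t)code | TAG_G | TAG_L;   /* code pointer, tag 11  */
    loc[1] = arg;                            /* data ...              */
    return (sptr_t)loc | TAG_L;              /* lazy location, tag 01 */
}

/* Example closure: adds one to the raw value stored in its data word.
   The alignment attribute is a GCC/Clang extension used to keep the
   two low bits of the function address clear. */
static sptr_t thunk_add_one(sptr_t *loc) __attribute__((aligned(8)));
static sptr_t thunk_add_one(sptr_t *loc) {
    sptr_t result = raw_make(raw_get(loc[1]) + 1);
    loc[0] = result;  /* update: the head becomes an evaluated indirection */
    return result;
}
```

After the first call to sptr_eval on a closure built with thunk_init, the head word holds the result with the L bit clear, so later evaluations return it without calling the code again. Because the head is never itself a lazy location, sptr_eval needs no loop.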
Jhc's core is based on a pure type system. A pure type system (also called a PTS) is actually a parameterized set of type systems. Jhc's version is described by the following.
Sorts = (*, !, **, #, (#), ##, □)
Axioms = (*:**, #:##, !:**, **:□, ##:□)
-- sort kind
* is the kind of boxed values
! is the kind of boxed strict values
# is the kind of unboxed values
(#) is the kind of unboxed tuples
-- sort superkind
** is the superkind of all boxed values
## is the superkind of all unboxed values
-- sort box
□ superkinds inhabit this
In addition there exist user defined kinds, which are always of supersort ##.
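The axioms can be read as a partial function from a sort to the sort that classifies it. The sketch below encodes just the five axioms listed above. The enumeration and function names are hypothetical and are not taken from jhc's source.

```c
#include <assert.h>

/* Hypothetical encoding of the sorts listed above. */
typedef enum {
    S_STAR,      /* *   */
    S_BANG,      /* !   */
    S_STARSTAR,  /* **  */
    S_HASH,      /* #   */
    S_UTUPLE,    /* (#) */
    S_HASHHASH,  /* ##  */
    S_BOX,       /* box */
    S_NONE       /* no axiom applies */
} pts_sort;

/* Axioms = (*:**, #:##, !:**, **:box, ##:box) */
static pts_sort axiom_sort_of(pts_sort s) {
    switch (s) {
    case S_STAR:
    case S_BANG:     return S_STARSTAR;
    case S_HASH:     return S_HASHHASH;
    case S_STARSTAR:
    case S_HASHHASH: return S_BOX;
    default:         return S_NONE;  /* (#) and box appear in no axiom */
    }
}
```

Note that (#) has no entry: none of the listed axioms assigns it a supersort.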
The following Rules table shows what sorts of abstractions are allowed. A rule of the form (A,B,C) means you can have functions from things of sort A to things of sort B, and the result is something of sort C. 'Function' in this context subsumes both term-level and type-level abstractions.
Notice that functions are always boxed, but may be strict if they take an unboxed tuple as an argument. When a function is strict, it is represented by a pointer to code directly; it cannot be a suspended value that evaluates to a function.
These type system rules apply to lambda abstractions. It is possible that data constructors exist that cannot be given a type on their own with these rules, even though they have a well-formed type when fully applied. An example would be unboxed tuples. This presents no difficulty, as one correctly concludes that it is a type error for these constructors to ever appear when not fully saturated with arguments.
As a shortcut we will use *# to mean every combination involving * and #, and so forth.
For instance, (*#,*#,*) means the set (*,*,*) (#,*,*) (*,#,*) (#,#,*).
Rules =
(*#!,*#!,*) -- functions from values to values are boxed and lazy
(*#!,(#),*) -- functions from values to unboxed tuples are boxed and lazy
((#),*#!,!) -- functions from unboxed tuples to values are boxed and strict
((#),(#),!) -- functions from unboxed tuples to unboxed tuples are boxed and strict
(**,*,*) -- may have a function from a boxed type to a value
(**,#,*)
(**,!,*)
(**,**,**) -- we have functions from types to types
(**,##,##) -- Array__ a :: #
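Once the shorthand is expanded, the Rules table is a lookup from a pair of sorts (A,B) to the result sort C. The following sketch encodes exactly the rules listed above and nothing more. The enumeration and function names are hypothetical and do not come from jhc's source.

```c
#include <assert.h>

/* Hypothetical encoding of the sorts used in the Rules table. */
typedef enum {
    S_STAR,      /* *   */
    S_BANG,      /* !   */
    S_HASH,      /* #   */
    S_UTUPLE,    /* (#) */
    S_STARSTAR,  /* **  */
    S_HASHHASH,  /* ##  */
    S_BOX,       /* box */
    S_NONE       /* no rule applies */
} pts_sort;

/* The shorthand *#! stands for any one of *, # or !. */
static int is_star_hash_bang(pts_sort s) {
    return s == S_STAR || s == S_HASH || s == S_BANG;
}

/* Return C for the rule (a,b,C), or S_NONE if no rule matches. */
static pts_sort rule_result(pts_sort a, pts_sort b) {
    /* (*#!,*#!,*) and (*#!,(#),*): boxed and lazy */
    if (is_star_hash_bang(a) && (is_star_hash_bang(b) || b == S_UTUPLE))
        return S_STAR;
    /* ((#),*#!,!) and ((#),(#),!): boxed and strict */
    if (a == S_UTUPLE && (is_star_hash_bang(b) || b == S_UTUPLE))
        return S_BANG;
    if (a == S_STARSTAR) {
        if (is_star_hash_bang(b)) return S_STAR;      /* (**,*,*) (**,#,*) (**,!,*) */
        if (b == S_STARSTAR)      return S_STARSTAR;  /* (**,**,**) */
        if (b == S_HASHHASH)      return S_HASHHASH;  /* (**,##,##) */
    }
    return S_NONE;
}
```

Each pair (A,B) maps to at most one C here, so the table can be written as a plain function. Pairs with ## or box in the first position fall through to S_NONE because the table has no rule for them.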
The defining feature of boxed values is
_|_ :: t iff t::*
This PTS is functional but not injective.
The PTS can be considered stratified into the following levels
□ - sort box
**,## - sort superkind
*,#,(#),! - sort kind
Int,Bits32_,Char - sort type
3,True,"bob" - sort value
The boxed kinds (* and !) represent types that have a uniform run time representation. Because of this, functions may be written that are polymorphic in types of these kinds. Hence the rules of the form (**,?,?), which allow taking types of boxed kinds as arguments.
The unboxed kind # is inhabited by types that each have their own specific run time representation. Hence you cannot write functions that are polymorphic in unboxed types.
Although sort box does not appear in the code, it is useful from a theoretical point of view for talking about certain types, such as the types of unboxed tuples. Unboxed tuples may have boxed and unboxed arguments; without sort box it would be impossible to express this, since the type must be superkind polymorphic. Sort box allows one to express this as (in the case of the unboxed 2-tuple)
∀s1:□ ∀s2:□ ∀k1:s1 ∀k2:s2 ∀t1:k1 ∀t2:k2 . (# t1, t2 #)
However, although this is a valid typing of what it would mean if an unboxed tuple were not fully applied, this type does not typecheck, since we do not have any rules of the form (##,?,?) or (□,?,?). This is what enforces the invariant that unboxed tuples are always fully applied, and it is also why we do not need a code representation of sort box.
You will notice that if you look at the axioms involving the sorts, you end up with a disconnected graph:
      □            - the box
     / \
   **   ##         - superkind
   /\     \
  *  !     #  (#)  - kind
This is simply due to the fact that nothing is polymorphic in unboxed tuples of kind (#), so we never need to refer to any super-sorts of them. We can add the sorts (##), (□) and □□ to fill in the gaps, as in the following diagram, but since these sorts will never appear in code or discourse, we will ignore them from now on.
       □□                 - sort superbox
      /  \
     □    (□)             - sort box
    / \      \
  **   ##     (##)        - sort superkind
  /\     \      |
 *  !     #    (#)        - sort kind
Jhc core has a number of 'normalized forms' in which certain invariants are met. Many routines expect code to be in a certain form, and guarantee their output is also in a given form. The type system can also change with each form by adding or removing terms from the PTS axioms and rules.
normalized form alpha : There are basically no restrictions other than that the code is typesafe, but certain constructs that are checked by the type checker are okay when they wouldn't otherwise be. In particular, 'newtype' casts still exist at the data level. 'enum' scrutinizations and creations may be in terms of the virtual constructors rather than the internal representations. 'let' may bind unboxed values, which is normally not allowed.
normalized form beta : This is like alpha except all data type constructors and case scrutinizations are in their final form. That is, newtype coercions are removed, enums are desugared, etc. Also, 'let' bindings of unboxed values are translated to the appropriate 'case' statements. The output of E.FromHs is in this form.
normalized form blue : This is the form that most routines work on.
normalized form larry : post lambda-lifting
normalized form mangled : All polymorphism has been replaced with subtyping