I don't have a problem with all the character encoding infrastructure and
whatnot, but logically a language API should provide access to the lowest
level of IO that is possible. Whether the more advanced operations themselves
are implemented in Haskell or specially by the compiler is an implementation
issue and an optimization. The point is that if I wanted to write my own
character handling code in Haskell, there is no way to, because the base IO
primitives (reading and writing bytes) are hidden below the current IO API and
its 'implementation defined' behavior.
My proposal was meant to satisfy three different needs which are not currently
met by the Haskell IO primitives:
- program determinism (soundness): the goal is to write one program that is
guaranteed to produce the exact same byte output for a given byte stream input,
regardless of the compiler/platform it is run on. If this is impossible on a
given platform, compilation should die with an error.
- IO to/from externally defined byte formats: XDR encoded files, RIFF files,
network protocols, odd character encodings. People need to be able to read and
write these in a standard way, to files as well as to things like sockets and
pipes.
- there is no Byte type for people writing new libraries. Each person who
writes a library which works on byte streams must come up with their own
kludge. Oftentimes this causes needlessly conflicting namespaces, and even more
often it is not strictly portable. A common solution is to assume a Char is a
byte, which is not true: the language specifies that a Char is a Unicode
character, which arbitrary binary data is not. This leads to confusion and
programming errors because the type information is lost. There is nothing to
distinguish a raw UTF-8 encoded byte stream from an actual Haskell string, so
people will be tempted to call things like isLower on the Chars, which will
deceptively work as long as they stick to ASCII, and then they will be
perplexed when their program mysteriously stops working in Japan. There are
other obvious differences too. Bytes should be Integral; Chars should not. On
many platforms Chars will be 32 bits, which results in a 4-fold increase in the
space needed for evaluated byte streams. Often you use byte streams in areas
which have nothing at all to do with character or string encoding, and the use
of String and [Char] in those cases would be confusing to new and experienced
users alike. If a database API has a function lookup :: [Char] -> [Char], can
it only work on strings, or on arbitrary byte streams? What character encoding
is used in the database if I want to access it from C? The answers to all these
questions are lost by using that definition, as opposed to the utterly
unambiguous lookup :: [Byte] -> [Byte].
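To make the Char-as-byte pitfall concrete, here is a minimal sketch. It assumes Byte can be modelled as Word8 from Data.Word; the names Byte, bytesAsChars, kludgeSaysLower and checksum are made up for illustration and are not part of any existing API.

```haskell
import Data.Char (chr, isLower)
import Data.Word (Word8)

-- Assumption: a Byte is modelled as Word8; the name Byte is illustrative.
type Byte = Word8

-- UTF-8 encoding of U+00E9 (e with acute accent), a lowercase letter.
eAcuteUtf8 :: [Byte]
eAcuteUtf8 = [0xC3, 0xA9]

-- The common kludge: pretend every byte is a Char.
bytesAsChars :: [Byte] -> String
bytesAsChars = map (chr . fromIntegral)

-- isLower is applied to U+00C3 (capital A with tilde) and U+00A9 (copyright
-- sign), not to the character the two bytes actually encode.
kludgeSaysLower :: Bool
kludgeSaysLower = all isLower (bytesAsChars eAcuteUtf8)

-- The same question asked of the properly decoded character.
decodedSaysLower :: Bool
decodedSaysLower = isLower '\xE9'

-- Bytes are Integral, so arithmetic on them is well typed.
-- Word8 addition wraps modulo 256.
checksum :: [Byte] -> Byte
checksum = foldr (+) 0
```

Here kludgeSaysLower is False while decodedSaysLower is True, and the type checker cannot tell the two kinds of String apart. With [Byte] in the signature, calling isLower on raw data is a type error instead of a latent bug.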
Without these properties it is almost impossible to write any 'real'
application in Haskell, mainly because most programs need to do at least one
interesting thing, where interesting means interacting with the outside world
in a new/undefined manner. The programmer needs to know his Haskell program or
library will work as expected across platforms.
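As an illustration of an externally defined byte format, the sketch below encodes and decodes a 32-bit unsigned integer the way XDR lays it out: four bytes, most significant first. It assumes Byte is Word8, and the function names encodeWord32 and decodeWord32 are hypothetical. Because the byte order is spelled out with shifts, the result does not depend on the endianness of the machine running the program.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word32, Word8)

-- Assumption: a Byte is modelled as Word8; the name Byte is illustrative.
type Byte = Word8

-- Four bytes, most significant first. fromIntegral truncates to the low
-- 8 bits after each shift.
encodeWord32 :: Word32 -> [Byte]
encodeWord32 w = [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]

-- Consume four bytes and return the value plus the unread remainder,
-- or Nothing if the input is too short.
decodeWord32 :: [Byte] -> Maybe (Word32, [Byte])
decodeWord32 (a : b : c : d : rest) =
  Just (foldl (\acc x -> (acc `shiftL` 8) .|. fromIntegral x) 0 [a, b, c, d], rest)
decodeWord32 _ = Nothing
```

For example, encodeWord32 0x01020304 yields [1, 2, 3, 4] on every platform, which is the kind of guarantee the determinism requirement asks for.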
The solution need not be the best or even the most efficient. It merely needs
to solve these problems in a way that is not inconsistent with the current
language APIs and definition, and that is not difficult for any compiler writer
to implement. The reason is that it is unclear at this time what a completely
efficient API would look like. Different compilers have their own efficient IO
extensions which people can use if they are that concerned, but it needs to be
possible to write portable programs and libraries that need the above
properties independent of those extensions. That is what I designed my API to
be: easily implemented on top of the current compilers' private binary APIs,
yet simple and modeled after the current Prelude functions so as to mesh well
with the existing codebase and mindshare.
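A rough sketch of what byte-level analogues of the Prelude functions could look like follows. This is not the API from the proposal: the names hPutByte, hGetByte, writeBytes and readBytes are hypothetical, Byte is assumed to be Word8, and GHC's hGetBuf/hPutBuf and withBinaryFile from System.IO stand in for whatever private binary API a given compiler provides. It is one byte per call and makes no attempt at efficiency.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (peek, poke)
import System.IO (Handle, IOMode (..), hGetBuf, hPutBuf, withBinaryFile)

-- Assumption: a Byte is modelled as Word8; the name Byte is illustrative.
type Byte = Word8

-- Analogue of hPutChar: writes exactly one byte, with no character
-- encoding applied.
hPutByte :: Handle -> Byte -> IO ()
hPutByte h b = alloca $ \p -> poke p b >> hPutBuf h p 1

-- Analogue of hGetChar: Nothing signals end of input.
hGetByte :: Handle -> IO (Maybe Byte)
hGetByte h = alloca $ \p -> do
  n <- hGetBuf h p 1
  if n == 1 then Just <$> peek p else pure Nothing

-- Analogue of writeFile.
writeBytes :: FilePath -> [Byte] -> IO ()
writeBytes path bs = withBinaryFile path WriteMode $ \h -> mapM_ (hPutByte h) bs

-- Analogue of readFile, but strict: the whole file is read before returning.
readBytes :: FilePath -> IO [Byte]
readBytes path = withBinaryFile path ReadMode go
  where
    go h = hGetByte h >>= maybe (pure []) (\b -> (b :) <$> go h)
```

Writing a list such as [0x00, 0x0A, 0x0D, 0xFF] with writeBytes and reading it back with readBytes should return the same list, since nothing in the path interprets the bytes as characters or line endings.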