Trivial UTF-8

Trivial UTF-8 is a small library for doing UTF-8-based input and output on a Lisp implementation that already supports Unicode, meaning that char-code and code-char deal with Unicode character codes.

The rationale for this library is that, while Unicode-enabled implementations usually do provide some kind of interface for dealing with character encodings, these interfaces are typically neither very flexible nor uniform across implementations.

The Babel library solves a similar problem and understands more encodings. Trivial UTF-8 was written before Babel existed, so for new projects you might be better off going with Babel. The one advantage of Trivial UTF-8 is that it doesn't depend on any other libraries.

Download and installation

Trivial-utf-8 is released under a BSD-style license (see source file). The latest release can be downloaded from http://common-lisp.net/project/trivial-utf-8/trivial-utf-8.tgz, or installed with asdf-install.

A darcs repository with the most recent changes can be checked out with:

> darcs get http://common-lisp.net/project/trivial-utf-8/darcs/trivial-utf-8

Or look at it online.

Support and mailing lists

The trivial-utf-8-devel mailing list can be used for any questions, discussion, bug-reports, patches, or anything else relating to this library. Or mail the author/maintainer directly: Marijn Haverbeke.

Reference

function string-to-utf-8-bytes (string) => array of (unsigned-byte 8)

Convert a string into an array of unsigned bytes containing its utf-8 representation.
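As a small illustration, the sketch below encodes a four-character string. It assumes the library has already been loaded (for example through ASDF) and that its functions are exported from a package named trivial-utf-8. The variable *cafe-bytes* is a name made up for this example.

```lisp
;; Assumption: the library is loaded and exports its functions from
;; the TRIVIAL-UTF-8 package. *CAFE-BYTES* is a made-up example name.
(defparameter *cafe-bytes*
  (trivial-utf-8:string-to-utf-8-bytes
   ;; "caf" followed by U+00E9 (e with acute accent), built with
   ;; code-char so that this source text stays plain ASCII.
   (concatenate 'string "caf" (string (code-char #xE9)))))

;; The three ASCII characters take one byte each and U+00E9 takes two,
;; so the vector holds the five bytes 99 97 102 195 169.
(format t "~a bytes: ~a~%" (length *cafe-bytes*) *cafe-bytes*)
```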

function utf-8-bytes-to-string (bytes) => string

Convert a byte array containing utf-8 encoded characters into the string it encodes.
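Decoding can be sketched the same way. The example assumes the library is loaded and its symbols are reachable through the trivial-utf-8 package; *euro-bytes* and *euro-string* are hypothetical names used only here.

```lisp
;; Assumption: library loaded, symbols exported from TRIVIAL-UTF-8.
;; The bytes E2 82 AC are the UTF-8 encoding of U+20AC (euro sign).
(defparameter *euro-bytes*
  (make-array 3 :element-type '(unsigned-byte 8)
                :initial-contents '(#xE2 #x82 #xAC)))

(defparameter *euro-string*
  (trivial-utf-8:utf-8-bytes-to-string *euro-bytes*))

;; Three bytes decode to a single character with code point #x20AC.
(format t "~a character(s), code point #x~X~%"
        (length *euro-string*)
        (char-code (char *euro-string* 0)))
```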

function write-utf-8-bytes (string output &key null-terminate)

Write a string to a byte-stream, encoding it as utf-8.
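A minimal sketch of writing to a binary file stream follows. It assumes the library is loaded, that UIOP (bundled with ASDF) is available to locate a temporary directory, and that the file name is free to use. All the *write-demo-...* names are made up for the example.

```lisp
;; Assumptions: TRIVIAL-UTF-8 is loaded; UIOP is available; the file
;; name below is hypothetical and only used for this demonstration.
(defparameter *write-demo-path*
  (merge-pathnames "trivial-utf-8-write-demo.bin"
                   (uiop:temporary-directory)))

;; "na", U+00EF (i with diaeresis), "ve": five characters in total.
(defparameter *write-demo-text*
  (concatenate 'string "na" (string (code-char #xEF)) "ve"))

;; The stream must be a byte stream, hence the element type.
(with-open-file (out *write-demo-path*
                     :direction :output
                     :if-exists :supersede
                     :element-type '(unsigned-byte 8))
  (trivial-utf-8:write-utf-8-bytes *write-demo-text* out))

;; Four ASCII characters plus one two-byte character give six bytes.
(defparameter *write-demo-size*
  (with-open-file (in *write-demo-path* :element-type '(unsigned-byte 8))
    (file-length in)))

(delete-file *write-demo-path*)
(format t "~a characters written as ~a bytes~%"
        (length *write-demo-text*) *write-demo-size*)
```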

function read-utf-8-string (input &key null-terminated stop-at-eof char-length byte-length)

Read utf-8 encoded data from a byte stream and construct a string with the characters found. When null-terminated is given, reading stops at a null character. stop-at-eof tells it to stop at the end of the file without raising an error. The char-length and byte-length parameters can be used to specify the maximum number of characters or bytes to read.
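The following round trip is a sketch of reading with stop-at-eof. It assumes the library is loaded and that UIOP (bundled with ASDF) is available. The file name and the *read-demo-...* variables are hypothetical.

```lisp
;; Assumptions: TRIVIAL-UTF-8 is loaded; UIOP is available; the file
;; name is made up for this demonstration.
(defparameter *read-demo-path*
  (merge-pathnames "trivial-utf-8-read-demo.bin"
                   (uiop:temporary-directory)))

;; "gr", U+00FC (u with diaeresis), "n".
(defparameter *read-demo-text*
  (concatenate 'string "gr" (string (code-char #xFC)) "n"))

;; First write the encoded string to a byte stream ...
(with-open-file (out *read-demo-path*
                     :direction :output
                     :if-exists :supersede
                     :element-type '(unsigned-byte 8))
  (trivial-utf-8:write-utf-8-bytes *read-demo-text* out))

;; ... then read it back until end of file, without an EOF error.
(defparameter *read-demo-result*
  (with-open-file (in *read-demo-path* :element-type '(unsigned-byte 8))
    (trivial-utf-8:read-utf-8-string in :stop-at-eof t)))

(delete-file *read-demo-path*)

;; Compare the decoded string with the original.
(format t "round trip preserved the text: ~a~%"
        (string= *read-demo-text* *read-demo-result*))
```

When the amount of data is known in advance, char-length or byte-length can be passed instead of relying on the end of the file.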

function utf-8-byte-length (string) => integer

Calculate the number of bytes needed to encode a string as utf-8.
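For instance, a string mixing one-, two- and three-byte characters can be measured without encoding it first. This sketch assumes the library is loaded; *mixed-text* and *mixed-length* are names invented for the example.

```lisp
;; Assumption: library loaded, symbols exported from TRIVIAL-UTF-8.
;; "a" (1 byte), U+00E9 (2 bytes) and U+20AC (3 bytes).
(defparameter *mixed-text*
  (concatenate 'string
               "a"
               (string (code-char #xE9))
               (string (code-char #x20AC))))

(defparameter *mixed-length*
  (trivial-utf-8:utf-8-byte-length *mixed-text*))

;; 1 + 2 + 3 = 6 bytes for 3 characters, the same as the length of
;; the vector that string-to-utf-8-bytes produces.
(format t "~a characters need ~a bytes~%"
        (length *mixed-text*) *mixed-length*)
```

This is useful for sizing a buffer or writing a length prefix before the encoded data.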

function utf-8-group-size (byte) => integer

Determine the number of bytes that are part of the character starting with a given byte.
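One possible use is stepping through a byte vector one character at a time. The helper count-utf-8-characters below is hypothetical and not part of the library; the sketch assumes the library is loaded and that the input is well-formed utf-8.

```lisp
;; Assumption: library loaded, symbols exported from TRIVIAL-UTF-8.
;; COUNT-UTF-8-CHARACTERS is a made-up helper, not a library function.
(defun count-utf-8-characters (bytes)
  "Count the characters in BYTES by jumping from lead byte to lead byte."
  (loop with pos = 0
        while (< pos (length bytes))
        count t
        ;; Skip over all the bytes belonging to the current character.
        do (incf pos (trivial-utf-8:utf-8-group-size (aref bytes pos)))))

;; "a" (61), U+00E9 (C3 A9) and U+20AC (E2 82 AC): six bytes in total.
(defparameter *group-demo-bytes*
  (make-array 6 :element-type '(unsigned-byte 8)
                :initial-contents '(#x61 #xC3 #xA9 #xE2 #x82 #xAC)))

;; The lead bytes 61, C3 and E2 announce groups of 1, 2 and 3 bytes,
;; so the six bytes hold three characters.
(format t "~a characters~%" (count-utf-8-characters *group-demo-bytes*))
```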

condition utf-8-decoding-error

A condition of this type is signalled whenever an incorrectly encoded character is encountered while decoding.
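Callers that receive untrusted data may want to catch this condition. The sketch below assumes the library is loaded and exports the condition name from the trivial-utf-8 package. decode-or-nil and *bad-bytes* are hypothetical names, and the byte FF is used because it can never occur in valid utf-8.

```lisp
;; Assumption: library loaded, symbols exported from TRIVIAL-UTF-8.
;; DECODE-OR-NIL is a made-up helper, not a library function.
(defun decode-or-nil (bytes)
  "Return the decoded string, or NIL when BYTES is not valid utf-8."
  (handler-case (trivial-utf-8:utf-8-bytes-to-string bytes)
    (trivial-utf-8:utf-8-decoding-error () nil)))

;; An ASCII "a" followed by FF, which is not a legal lead byte.
(defparameter *bad-bytes*
  (make-array 2 :element-type '(unsigned-byte 8)
                :initial-contents '(#x61 #xFF)))

;; The invalid input is expected to yield NIL instead of an
;; unhandled error.
(format t "decoded: ~s~%" (decode-or-nil *bad-bytes*))
```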

