CL-UNICODE - A portable Unicode library for Common Lisp


 

Abstract

CL-UNICODE is a library which provides Common Lisp implementations with knowledge about Unicode characters including their name, their general category, the scripts and blocks they belong to, their numerical value, and several other properties. It also provides the ability to replace the standard syntax for reading Lisp characters with one that is Unicode-aware, and it is used to enhance CL-PPCRE with Unicode properties.

CL-UNICODE is based on Unicode 5.1.

The code comes with a BSD-style license so you can basically do with it whatever you want.

Download shortcut: http://weitz.de/files/cl-unicode.tar.gz.


 

Contents

  1. Download and installation
  2. Support and mailing lists
  3. Function and variable reference
    1. Specific character properties
    2. General character properties
    3. Property symbols and look-up
    4. Character names
    5. Alternative reader syntax
    6. Miscellaneous
  4. Symbol index
  5. Acknowledgements

 

Download and installation

CL-UNICODE together with this documentation can be downloaded from http://weitz.de/files/cl-unicode.tar.gz. The current version is 0.1.1.

The library comes with a system definition for ASDF and you compile and load it in the usual way. It depends on CL-PPCRE.

CL-UNICODE builds parts of its source code automatically the first time it is compiled. This is done by parsing several Unicode data files which are included with the distribution and might take some time. This happens only once. FLEXI-STREAMS is needed for this process, but it is not used anymore once CL-UNICODE has been built.

You can run a test suite which tests most aspects of the library with

(asdf:oos 'asdf:test-op :cl-unicode)

(Some of these tests are expected to fail if your Lisp has a very low CHAR-CODE-LIMIT like, for example, CMUCL.)
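
As a minimal sketch of loading the library "in the usual way", assuming ASDF is already loaded and can find the system definitions of CL-UNICODE and its dependencies, a session might look like this. The CL-UNICODE: package prefix is an assumption based on the exported symbols described in this document.

```lisp
;; Compile (if necessary) and load CL-UNICODE together with its dependencies.
;; The first build parses the bundled Unicode data files and may take a while.
(asdf:oos 'asdf:load-op :cl-unicode)

;; A quick sanity check using a function documented below.
(cl-unicode:unicode-name #\A)
```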
 

Support and mailing lists

For questions, bug reports, feature requests, improvements, or patches please use the cl-ppcre-devel mailing list. If you want to be notified about future releases, subscribe to the cl-ppcre-announce mailing list. These mailing lists were made available thanks to the services of common-lisp.net.

If you want to send patches, please read this first.
 

Function and variable reference

Specific character properties


[Generic function]
general-category c => name, symbol


Returns the general category of a character as a string. c can be the character's code point (a positive integer) or a (Lisp) character assuming its character code is also its Unicode code point. The second return value is the property symbol of the category.
CL-USER 1 > (general-category #\A)
"Lu"
CL-UNICODE-NAMES::LU

CL-USER 2 > (general-category #\-)
"Pd"
CL-UNICODE-NAMES::PD

CL-USER 3 > (general-category #\8)
"Nd"
CL-UNICODE-NAMES::ND

See also GENERAL-CATEGORIES.


[Generic function]
script c => name, symbol


Returns the script of a character as a string or NIL if there is no script for that particular character. c can be the character's code point (a positive integer) or a (Lisp) character assuming its character code is also its Unicode code point. The second return value (if there is one) is the property symbol of the script.
CL-USER 1 > (script #\B)
"Latin"
CL-UNICODE-NAMES::LATIN

CL-USER 2 > (script (code-char #x5d0))
"Hebrew"
CL-UNICODE-NAMES::HEBREW
See also SCRIPTS.


[Generic function]
code-block c => name, symbol


Returns the block of a character as a string or NIL if there is no block for that particular character. c can be the character's code point (a positive integer) or a (Lisp) character assuming its character code is also its Unicode code point. The second return value (if there is one) is the property symbol of the block.
CL-USER 1 > (code-block #\a)
"Basic Latin"
CL-UNICODE-NAMES::BASICLATIN

CL-USER 2 > (code-block #\ä)
"Latin-1 Supplement"
CL-UNICODE-NAMES::LATIN1SUPPLEMENT
See also CODE-BLOCKS.


[Generic function]
has-binary-property c property => generalized-boolean


Checks whether a character has the binary property property. c can be the character's code point (a positive integer) or a (Lisp) character assuming its character code is also its Unicode code point. property can be a string naming the property or the corresponding property symbol. If a true value is returned, it is the property symbol.
CL-USER 1 > (has-binary-property #\Space "White_Space")
CL-UNICODE-NAMES::WHITESPACE

CL-USER 2 > (has-binary-property #\F "ASCII_Hex_Digit")
CL-UNICODE-NAMES::ASCIIHEXDIGIT

CL-USER 3 > (has-binary-property #\- "Dash")
CL-UNICODE-NAMES::DASH

CL-USER 4 > (has-binary-property #\= "Dash")
NIL
See also BINARY-PROPERTIES.


[Generic function]
numeric-type c => name, symbol


Returns the numeric type of a character (one of "Decimal", "Digit", or "Numeric") as a string or NIL if that particular character has no numeric type. c can be the character's code point (a positive integer) or a (Lisp) character assuming its character code is also its Unicode code point. The second return value (if there is one) is the property symbol of the numeric type.
CL-USER 1 > (numeric-type #\3)
"Decimal"
CL-UNICODE-NAMES::DECIMAL

CL-USER 2 > (numeric-type (character-named "VULGAR FRACTION THREE QUARTERS"))
"Numeric"
CL-UNICODE-NAMES::NUMERIC

CL-USER 3 > (numeric-type #\z)
NIL
NIL


[Generic function]
numeric-value c => number-or-nil


Returns the numeric value of a character as a Lisp rational or NIL (for NaN). c can be the character's code point (a positive integer) or a (Lisp) character assuming its character code is also its Unicode code point.
CL-USER 1 > (numeric-value #\3)
3

CL-USER 2 > (numeric-value (character-named "VULGAR FRACTION THREE QUARTERS"))
3/4

CL-USER 3 > (numeric-value #\z)
NIL
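
NUMERIC-TYPE and NUMERIC-VALUE can be combined, for example, to read a run of decimal digits from any script. The following is only a sketch: PARSE-DECIMAL-DIGITS is a hypothetical helper, not part of CL-UNICODE, and it assumes the library is loaded and its symbols are accessible via the CL-UNICODE package.

```lisp
(defun parse-decimal-digits (string)
  "Return the integer denoted by STRING if all of its characters have
the numeric type \"Decimal\", or NIL otherwise.  (Hypothetical helper.)"
  (let ((result 0))
    (loop for char across string
          do (if (equal (cl-unicode:numeric-type char) "Decimal")
                 ;; Decimal digits have integer numeric values from 0 to 9.
                 (setf result (+ (* 10 result)
                                 (cl-unicode:numeric-value char)))
                 ;; Anything else makes the whole string invalid.
                 (return-from parse-decimal-digits nil)))
    result))
```

With this definition, (parse-decimal-digits "123") yields 123, while a string containing #\z yields NIL because that character has no numeric type.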


[Generic function]
bidi-class c => name, symbol


Returns the bidirectional (Bidi) class of a character as a string or NIL if there is no bidirectional class for that particular character. c can be the character's code point (a positive integer) or a (Lisp) character assuming its character code is also its Unicode code point. The second return value (if there is one) is the property symbol of the class.
CL-USER 1 > (bidi-class #\Space)
"WS"
CL-UNICODE-NAMES::WS

CL-USER 2 > (bidi-class #\A)
"L"
CL-UNICODE-NAMES::L

CL-USER 3 > (bidi-class (character-named "HEBREW LETTER ALEF"))
"R"
CL-UNICODE-NAMES::R
See also BIDI-CLASSES.


[Function]
bidi-mirroring-glyph c &key want-code-point-p => char-or-code-point


Returns the Bidi mirroring glyph for a character if the character has the BidiMirrored property and an appropriate mirroring glyph is defined. c can be the character's code point (a positive integer) or a (Lisp) character assuming its character code is also its Unicode code point.

Returns the code point instead of the character if want-code-point-p is true. This can be especially useful for Lisp implementations where CHAR-CODE-LIMIT is smaller than +CODE-POINT-LIMIT+.

CL-USER 1 > (bidi-mirroring-glyph #\[)
#\]

CL-USER 2 > (bidi-mirroring-glyph #\])
#\[

CL-USER 3 > (bidi-mirroring-glyph #\|)
NIL


[Function]
lowercase-mapping c &key want-code-point-p => char-or-code-point


Returns the simple lowercase mapping of a character. c can be the character's code point (a positive integer) or a (Lisp) character assuming its character code is also its Unicode code point. Returns the character itself if no such mapping is explicitly defined. Note that case mapping only makes sense for characters with the LC property.

Returns the code point instead of the character if want-code-point-p is true. This can be especially useful for Lisp implementations where CHAR-CODE-LIMIT is smaller than +CODE-POINT-LIMIT+.

CL-USER 1 > (lowercase-mapping #\Ä)
#\ä

CL-USER 2 > (unicode-name (lowercase-mapping (character-named "GEORGIAN CAPITAL LETTER AN")))
"GEORGIAN SMALL LETTER AN"

CL-USER 3 > (lowercase-mapping (character-named "LATIN CAPITAL LETTER SHARP S"))
#\ß


[Function]
uppercase-mapping c &key want-code-point-p => char-or-code-point


Returns the simple uppercase mapping of a character. c can be the character's code point (a positive integer) or a (Lisp) character assuming its character code is also its Unicode code point. Returns the character itself if no such mapping is explicitly defined. Note that case mapping only makes sense for characters with the LC property.

Returns the code point instead of the character if want-code-point-p is true. This can be especially useful for Lisp implementations where CHAR-CODE-LIMIT is smaller than +CODE-POINT-LIMIT+.

CL-USER 1 > (uppercase-mapping #\s)
#\S

CL-USER 2 > (unicode-name (uppercase-mapping (character-named "GLAGOLITIC SMALL LETTER AZU")))
"GLAGOLITIC CAPITAL LETTER AZU"


[Function]
titlecase-mapping c &key want-code-point-p => char-or-code-point


Returns the simple titlecase mapping of a character. c can be the character's code point (a positive integer) or a (Lisp) character assuming its character code is also its Unicode code point. Returns the character itself if no such mapping is explicitly defined. Note that case mapping only makes sense for characters with the LC property.

Returns the code point instead of the character if want-code-point-p is true. This can be especially useful for Lisp implementations where CHAR-CODE-LIMIT is smaller than +CODE-POINT-LIMIT+.

CL-USER 1 > (unicode-name (titlecase-mapping (char-code (character-named "LATIN SMALL LETTER DZ WITH CARON"))))
"LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON"

CL-USER 2 > (unicode-name (uppercase-mapping (char-code (character-named "LATIN SMALL LETTER DZ WITH CARON"))))
"LATIN CAPITAL LETTER DZ WITH CARON"


[Generic function]
combining-class c => class


Returns the combining class of a character as a non-negative integer. c can be the character's code point (a positive integer) or a (Lisp) character assuming its character code is also its Unicode code point.
CL-USER 1 > (combining-class #\~)
0

CL-USER 2 > (combining-class (character-named "COMBINING TILDE OVERLAY"))
1

CL-USER 3 > (combining-class (character-named "NON-SPACING DOUBLE OVERSCORE"))
230


[Generic function]
age c => age


Returns the age of a character or NIL if there is no age entry for that particular character. The age of a character is a list of two integers denoting the major and minor number of the Unicode version where the character first appeared. c can be the character's code point (a positive integer) or a (Lisp) character assuming its character code is also its Unicode code point.
CL-USER 1 > (age #\K)
(1 1)

CL-USER 2 > (age (character-named "HANGUL SYLLABLE PWILH"))
(2 0)

CL-USER 3 > (age (character-named "LATIN CAPITAL LETTER SHARP S"))
(5 1)


[Function]
general-categories => list


Returns a sorted list of all general categories known to CL-UNICODE. These are the possible return values of GENERAL-CATEGORY.
CL-USER 1 > (general-categories)
("Cc" "Cf" "Cn" "Co" "Cs" "Ll" "Lm" "Lo" "Lt" "Lu" "Mc" "Me" "Mn" "Nd" "Nl" "No"
 "Pc" "Pd" "Pe" "Pf" "Pi" "Po" "Ps" "Sc" "Sk" "Sm" "So" "Zl" "Zp" "Zs")


[Function]
scripts => list


Returns a sorted list of all scripts known to CL-UNICODE. These are the possible return values of SCRIPT.


[Function]
code-blocks => list


Returns a sorted list of all blocks known to CL-UNICODE. These are the possible return values of CODE-BLOCK.


[Function]
binary-properties => list


Returns a sorted list of all binary properties known to CL-UNICODE. These are the allowed second arguments (modulo canonicalization) to HAS-BINARY-PROPERTY.
CL-USER 1 > (binary-properties)
("ASCII_Hex_Digit"
 "BidiMirrored"
 "Bidi_Control"
 "Dash"
 "Deprecated"
 "Diacritic"
 "Extender"
 "Hex_Digit"
 "Hyphen"
 "Ideographic"
 "IDS_Binary_Operator"
 "IDS_Trinary_Operator"
 "Join_Control"
 "Logical_Order_Exception"
 "Other_Alphabetic"
 "Other_Default_Ignorable_Code_Point"
 "Other_Grapheme_Extend"
 "Other_ID_Continue"
 "Other_ID_Start"
 "Other_Lowercase"
 "Other_Math"
 "Other_Uppercase"
 "Pattern_Syntax"
 "Pattern_White_Space"
 "Quotation_Mark"
 "Radical"
 "Soft_Dotted"
 "STerm"
 "Terminal_Punctuation"
 "Unified_Ideograph"
 "Variation_Selector"
 "White_Space")


[Function]
bidi-classes => list


Returns a sorted list of all Bidi classes known to CL-UNICODE. These are the possible return values of BIDI-CLASS.
CL-USER 1 > (bidi-classes)
("AL" "AN" "B" "BN" "CS" "EN" "ES" "ET" "L" "LRE" "LRO" "NSM" "ON" "PDF" "R" "RLE" "RLO" "S" "WS")

General character properties


[Function]
has-property c property => generalized-boolean


Checks whether a character has the named property property. property can be a string naming a property (which will be used for look-up after canonicalization) or it can be a property symbol (see PROPERTY-SYMBOL). c can be the character's code point (a positive integer) or a (Lisp) character assuming its character code is also its Unicode code point.

Properties in the sense of CL-UNICODE can be names of general categories, scripts, blocks, binary properties, or Bidi classes, amongst other things. If there are a block and a script with the same name (like, say, "Cyrillic"), the bare name denotes the script. Prepend "Block:" to the name to refer to the block. (You can also prepend "Script:" to refer to the script unambiguously.) Names of Bidi classes must be prepended with "BidiClass:" if there's a potential for ambiguity.

This function also recognizes several aliases for properties (like "Symbol" for "S") and you can, as in Perl, prepend block names with "In" instead of "Block:" and most other properties with "Is". See RECOGNIZED-PROPERTIES.

Signals an error if no property named property was found.

CL-USER 1 > (has-property #\A "L")
T

CL-USER 2 > (has-property #\A "Letter")
T

CL-USER 3 > (has-property #\A "LC")
T

CL-USER 4 > (has-property #\A "CasedLetter")
T

CL-USER 5 > (has-property #\A "Lu")
T

CL-USER 6 > (has-property #\A "UppercaseLetter")
T

CL-USER 7 > (has-property #\A "IsUppercaseLetter")
T

CL-USER 8 > (has-property #\A "LowercaseLetter")
NIL

CL-USER 9 > (has-property #\A "Latin")
T

CL-USER 10 > (has-property #\A "Script:Latin")
T

CL-USER 11 > (has-property #\A "Script:Hebrew")
NIL

CL-USER 12 > (has-property #\A "Basic Latin")
T

CL-USER 13 > (has-property #\A "Block:BasicLatin")
T

CL-USER 14 > (has-property #\A "InBasicLatin")
T

CL-USER 15 > (has-property #\A "Block:Arabic")
NIL

CL-USER 16 > (has-property #\A "WhiteSpace")
NIL

CL-USER 17 > (has-property #\A "HexDigit")
CL-UNICODE-NAMES::HEXDIGIT

CL-USER 18 > (has-property #\A "BidiClass:L")
T

CL-USER 19 > (has-property #\A "BidiClass:Left-to-Right")
T

CL-USER 20 > (has-property #\A "LeftToRight")
T

CL-USER 21 > (has-property #\A "Any")
T

CL-USER 22 > (has-property #\A "Assigned")
T

CL-USER 23 > (has-property #\A "Unassigned")
NIL

CL-USER 24 > (has-property #\A "ASCII")
T
See also PROPERTY-TEST.


[Generic function]
property-test property &key errorp => function


Returns a unary function which can test code points or Lisp characters for the named property property. property is interpreted as in HAS-PROPERTY. PROPERTY-TEST is used internally by HAS-PROPERTY, but calling it directly might come in handy if you need a faster way to test many characters for the same property, as you're saving the time to look up the property each time.

Returns NIL if no property named property was found or signals an error if errorp is true.

CL-USER 1 > (let ((ascii-tester (property-test "ASCII_Hex_Digit")))
              (count-if 'identity (map 'list ascii-tester "ALEF")))
3
See also CL-PPCRE's CREATE-OPTIMIZED-TEST-FUNCTION.
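
The point about saving look-up time can be sketched as follows: the test function is obtained once and then applied to every character of a string. COUNT-WITH-PROPERTY is a hypothetical helper, not part of CL-UNICODE, and the CL-UNICODE: package prefix assumes the library is loaded.

```lisp
(defun count-with-property (property string)
  "Count the characters of STRING which have PROPERTY.  (Hypothetical helper.)"
  ;; Look the property up once; signal an error if it is unknown.
  (let ((test (cl-unicode:property-test property :errorp t)))
    ;; The returned function accepts Lisp characters directly.
    (count-if test string)))
```

For instance, (count-with-property "ASCII_Hex_Digit" "ALEF") should agree with the result 3 of the example above.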


[Function]
list-all-characters property &key want-code-point-p => list


Lists all characters (ordered by code point) which have the property property where property is interpreted as in HAS-PROPERTY. If want-code-point-p is true, a list of code points instead of a list of characters is returned. (If CHAR-CODE-LIMIT is smaller than +CODE-POINT-LIMIT+ in your Lisp implementation, the list of code points can actually be longer than the list of characters.)
CL-USER 1 > (mapcar 'unicode-name (list-all-characters "Grapheme_Link" :want-code-point-p t))
("DEVANAGARI SIGN VIRAMA"
 "BENGALI SIGN VIRAMA"
 "GURMUKHI SIGN VIRAMA"
 "GUJARATI SIGN VIRAMA"
 "ORIYA SIGN VIRAMA"
 "TAMIL SIGN VIRAMA"
 "TELUGU SIGN VIRAMA"
 "KANNADA SIGN VIRAMA"
 "MALAYALAM SIGN VIRAMA"
 "SINHALA SIGN AL-LAKUNA"
 "THAI CHARACTER PHINTHU"
 "TIBETAN MARK HALANTA"
 "MYANMAR SIGN VIRAMA"
 "MYANMAR SIGN ASAT"
 "TAGALOG SIGN VIRAMA"
 "HANUNOO SIGN PAMUDPOD"
 "KHMER SIGN COENG"
 "BALINESE ADEG ADEG"
 "SUNDANESE SIGN PAMAAEH"
 "SYLOTI NAGRI SIGN HASANTA"
 "SAURASHTRA SIGN VIRAMA"
 "REJANG VIRAMA"
 "KHAROSHTHI VIRAMA")


[Function]
recognized-properties &optional all => list


Returns a list of all property names known to CL-UNICODE. These are the allowed second arguments (modulo canonicalization) to HAS-PROPERTY. If all is true, known aliases (like Letter for L) are also included.
CL-USER 1 > (length (recognized-properties t))
996

Property symbols and look-up


[Function]
property-symbol name => symbol, name


Returns a symbol in the CL-UNICODE-NAMES package (which is only used for this purpose) which can stand in for the string name in look-ups. The symbol's name is the result of canonicalizing and then upcasing name.

A symbol returned by this function is only really useful and only actually a property symbol if the second return value is true.

All exported functions of CL-UNICODE which return strings which are property names return the corresponding property symbol as their second return value. All exported functions of CL-UNICODE which accept property names as arguments will also accept property symbols.

CL-USER 1 > (property-symbol "XID_Start")
CL-UNICODE-NAMES::XIDSTART
"XIDStart"

CL-USER 2 > (property-symbol "Foo")
CL-UNICODE-NAMES::FOO
NIL
See also PROPERTY-NAME.


[Function]
property-name symbol => name-or-nil


Returns a name (not the name) for a property symbol symbol if it is known to CL-UNICODE. Note that
(STRING= (PROPERTY-NAME (PROPERTY-SYMBOL <string>)) <string>)
is not necessarily true even if the property name is not NIL while
(EQ (PROPERTY-SYMBOL (PROPERTY-NAME <symbol>)) <symbol>)
always holds if there is a property name for <symbol>.
CL-USER 1 > (property-name 'cl-unicode-names::asciihexdigit)
"ASCII_Hex_Digit"
See also PROPERTY-SYMBOL.


[Function]
canonicalize-name name => name'


Converts the string name into a canonicalized name which can be used for unambiguous look-ups by removing all whitespace, hyphens, and underline characters.

Tries not to remove hyphens preceded by spaces or underlines if this could lead to ambiguities as described in http://unicode.org/unicode/reports/tr18/#Name_Properties.

All CL-UNICODE functions which accept string names for characters or properties will canonicalize the name first using this function and will then look up the name case-insensitively.

CL-USER 1 > (canonicalize-name "Left-to-Right")
"LefttoRight"

CL-USER 2 > (canonicalize-name "Left_To_Right")
"LeftToRight"

CL-USER 3 > (string-equal * **)
T

CL-USER 4 > (canonicalize-name "TIBETAN LETTER A")
"TIBETANLETTERA"

CL-USER 5 > (canonicalize-name "TIBETAN LETTER -A")
"TIBETANLETTER -A"

CL-USER 6 > (canonicalize-name (canonicalize-name "TIBETAN LETTER A"))
"TIBETANLETTERA"

CL-USER 7 > (canonicalize-name (canonicalize-name "TIBETAN LETTER -A"))
"TIBETANLETTER -A"

CL-USER 8 > (canonicalize-name "Tibetan_Letter_-A")
"TibetanLetter -A"
Note that the preceding character is relevant in the ambiguous cases (but there are only three of them):
CL-USER 9 > (char= (character-named "TibetanLetter A") (character-named "TibetanLetter -A"))
NIL

CL-USER 10 > (char= (character-named "TibetanLetterA") (character-named "TibetanLetter-A"))
T
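
To make the basic idea concrete, here is a deliberately simplified sketch of canonicalization which only strips spaces, hyphens, and underlines. SIMPLE-CANONICALIZE is a hypothetical function, not the library's implementation: unlike CANONICALIZE-NAME it makes no attempt to preserve hyphens in the ambiguous cases described above.

```lisp
(defun simple-canonicalize (name)
  "Return a copy of the string NAME without spaces, hyphens, and underlines.
Simplified illustration only; ambiguous hyphens are NOT preserved."
  (remove-if (lambda (char)
               (member char '(#\Space #\- #\_)))
             name))
```

With this sketch, "Left-to-Right" becomes "LefttoRight" just as in the first example above, but "TIBETAN LETTER -A" would collapse to "TIBETANLETTERA" and thus become indistinguishable from "TIBETAN LETTER A", which is exactly the ambiguity CANONICALIZE-NAME tries to avoid.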

Character names


[Generic function]
unicode-name c => name-or-nil


Returns the Unicode name of a character as a string or NIL if there is no name for that particular character. c can be the character's code point (a positive integer) or a (Lisp) character assuming its character code is also its Unicode code point.
CL-USER 1 > (unicode-name #\ß)
"LATIN SMALL LETTER SHARP S"

CL-USER 2 > (unicode-name #\ü)
"LATIN SMALL LETTER U WITH DIAERESIS"

CL-USER 3 > (unicode-name #xd4db)
"HANGUL SYLLABLE PWILH"


[Generic function]
unicode1-name c => name-or-nil


Returns the Unicode 1.0 name of a character as a string or NIL if there is no name for that particular character. This name is only non-NIL if it is significantly different from the Unicode name (see UNICODE-NAME). For control characters, sometimes the ISO 6429 name is returned instead.

c can be the character's code point (a positive integer) or a (Lisp) character assuming its character code is also its Unicode code point.

CL-USER 1 > (unicode-name (code-char 1))
NIL

CL-USER 2 > (unicode1-name (code-char 1))
"START OF HEADING"

CL-USER 3 > (unicode-name (code-char #x67e))
"ARABIC LETTER PEH"

CL-USER 4 > (unicode1-name (code-char #x67e))
"ARABIC LETTER TAA WITH THREE DOTS BELOW"


[Function]
character-named name &key want-code-point-p try-unicode1-names-p try-abbreviations-p scripts-to-try try-hex-notation-p try-lisp-names-p => char-or-code-point


Returns the character which has the name name (a string) by looking up the Unicode name (see UNICODE-NAME).

If try-unicode1-names-p is true, the Unicode 1.0 name (see UNICODE1-NAME) will be used as a fallback.

If try-abbreviations-p is true, name is treated as an abbreviation as follows: If name contains a colon, it is interpreted as "<script>:<short-name>" and the function tries to look up, in turn, the characters named "<script> <size> LETTER <short-name>", "<script> LETTER <short-name>", and "<script> <short-name>" where <size> is "SMALL" if none of the characters in <short-name> is uppercase, "CAPITAL" otherwise. If name does not contain a colon, the same algorithm as above is tried with name instead of <short-name> and each element of the list of strings scripts-to-try as <script>. (scripts-to-try can also be a single string which is interpreted as a one-element list.)

If try-hex-notation-p is true, name can be of the form "U+<x>" where <x> is a hexadecimal number with four to six digits with the obvious meaning.

If try-lisp-names-p is true, the function returns the character with the character name name (if there is one) or, if name is exactly one character, it returns this character.

All the keyword-governed alternatives are tried in the order they're described above.

See also *TRY-UNICODE1-NAMES-P*, *TRY-ABBREVIATIONS-P*, *SCRIPTS-TO-TRY*, *TRY-HEX-NOTATION-P*, and *TRY-LISP-NAMES-P*.

Returns the code point instead of the character if want-code-point-p is true. This can be especially useful for Lisp implementations where CHAR-CODE-LIMIT is smaller than +CODE-POINT-LIMIT+.

CL-USER 1 > (character-named "LATIN SMALL LETTER SHARP S")
#\ß

CL-USER 2 > (character-named "latin small letter sharp s")
#\ß

CL-USER 3 > (character-named "LatinSmallLetterSharpS")
#\ß

CL-USER 4 > (character-named "Latin:sharps" :try-abbreviations-p t)
#\ß

CL-USER 5 > (character-named "sharps" :try-abbreviations-p t :scripts-to-try "Latin")
#\ß

CL-USER 6 > (character-named "Backspace")
#\Backspace

CL-USER 7 > (character-named "Backspace" :try-unicode1-names-p nil)
NIL

CL-USER 8 > (character-named "Newline")
NIL

CL-USER 9 > (character-named "Newline" :try-lisp-names-p t)
#\Newline

CL-USER 10 > (character-named "U+0020" :try-hex-notation-p t)
#\Space


[Special variable]
*try-unicode1-names-p*


This is the default value for the try-unicode1-names-p keyword argument to CHARACTER-NAMED. Its initial value is T.


[Special variable]
*try-abbreviations-p*


This is the default value for the try-abbreviations-p keyword argument to CHARACTER-NAMED. Its initial value is NIL.


[Special variable]
*scripts-to-try*


This is the default value for the scripts-to-try keyword argument to CHARACTER-NAMED. Its initial value is NIL.


[Special variable]
*try-hex-notation-p*


This is the default value for the try-hex-notation-p keyword argument to CHARACTER-NAMED. Its initial value is NIL.


[Special variable]
*try-lisp-names-p*


This is the default value for the try-lisp-names-p keyword argument to CHARACTER-NAMED. Its initial value is NIL.

Alternative reader syntax


[Macro]
enable-alternative-character-syntax => |


Enables an alternative Lisp character syntax which replaces the usual syntax: After a sharpsign (#\#) and a backslash (#\\) have been read, at least one more character is read. Reading then continues as long as ASCII letters, digits, underlines, hyphens, colons, or plus signs are read. The resulting string is then used as input to CHARACTER-NAMED to produce a character.

This macro expands into an EVAL-WHEN so that if you use it as a top-level form in a file to be loaded and/or compiled it'll do what you expect. Technically, this'll push the current readtable on a stack so that matching calls of this macro and DISABLE-ALTERNATIVE-CHARACTER-SYNTAX can be nested.

Note that by default the alternative character syntax is not enabled after loading CL-UNICODE.

CL-USER 1 > (enable-alternative-character-syntax)

CL-USER 2 > (setq *try-abbreviations-p* t)
T

CL-USER 3 > (setq *scripts-to-try* "Hebrew")
"Hebrew"

CL-USER 4 > (char-code #\Alef)
1488

(It is recommended that you set *TRY-LISP-NAMES-P* to a true value when enabling the alternative syntax, so that you can still use the short syntax (like #\a) for characters.)

For an alternative syntax for strings see CL-INTERPOL.
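
In a source file, the two macros would typically bracket the forms that use the alternative syntax. The following is a sketch under the assumption that CL-UNICODE is already loaded when the file is compiled or loaded; the variable *ALEF* is made up for this example.

```lisp
;; Switch to the Unicode-aware character syntax for the forms below.
(cl-unicode:enable-alternative-character-syntax)

;; The name is canonicalized and looked up case-insensitively,
;; so this denotes the character HEBREW LETTER ALEF.
(defparameter *alef* #\Hebrew_Letter_Alef)

;; Restore the readtable that was active before.
(cl-unicode:disable-alternative-character-syntax)
```

Full Unicode names are used here on purpose so that the sketch does not depend on the values of the special variables described above.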


[Macro]
disable-alternative-character-syntax => |


Restores the readtable which was active before the last call to ENABLE-ALTERNATIVE-CHARACTER-SYNTAX. If there was no such call, the standard readtable is used.

This macro expands into an EVAL-WHEN so that if you use it as a top-level form in a file to be loaded and/or compiled it'll do what you expect. Technically, this'll pop a readtable from the stack described in ENABLE-ALTERNATIVE-CHARACTER-SYNTAX so that matching calls of these macros can be nested.

Miscellaneous


[Constant]
+code-point-limit+


#x110000, the smallest integer which is not a code point in the Unicode codespace.


[Condition type]
unicode-error


All errors signalled by CL-UNICODE are of this type.

 

Symbol index

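Here are all exported symbols of CL-UNICODE in alphabetical order:

  *scripts-to-try*
  *try-abbreviations-p*
  *try-hex-notation-p*
  *try-lisp-names-p*
  *try-unicode1-names-p*
  +code-point-limit+
  age
  bidi-class
  bidi-classes
  bidi-mirroring-glyph
  binary-properties
  canonicalize-name
  character-named
  code-block
  code-blocks
  combining-class
  disable-alternative-character-syntax
  enable-alternative-character-syntax
  general-categories
  general-category
  has-binary-property
  has-property
  list-all-characters
  lowercase-mapping
  numeric-type
  numeric-value
  property-name
  property-symbol
  property-test
  recognized-properties
  script
  scripts
  titlecase-mapping
  unicode-error
  unicode-name
  unicode1-name
  uppercase-mapping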
 

Acknowledgements

This documentation was prepared with DOCUMENTATION-TEMPLATE.

$Header: /usr/local/cvsrep/cl-unicode/doc/index.html,v 1.13 2008/07/24 14:56:33 edi Exp $
