;; Copyright (c) 2011 nklein software ;; MIT License. See included LICENSE.txt file for licensing details.

USerial v0.4.2011.04.11

Patrick Stein mailto:pat@nklein.com

Overview

The USerial library is a general purpose library for serializing items into byte buffers and unserializing items back out of byte buffers. The "Buffer Handling" section below describes the various ways one can manipulate USerial buffers. The "Serializing and Unserializing" section below describes the different tools the USerial library provides for creating serializers and the serializers that the library provides out of the box. The "Sample Application: Game Protocol" section below describes how one might put all of these functions to use in preparing network packets for a simple game.

Buffer Handling

To serialize, one needs a place to put the data. To unserialize, one needs a place from which to fetch the data. Some libraries choose to implement such things as streams. The USerial library serializes to and unserializes from memory buffers because the primary goal for this library is to facilitate assembly and disassembly of datagram packets.

The USerial library uses adjustable arrays of unsigned bytes with fill pointers. The fill pointer is used to track the current position in the buffer for serializing or unserializing. The buffers are automatically resized to accommodate the serialized data.

The basic types and constants used for buffer-related operations are described in the "Buffer-related Types and Constants" section below.

The USerial library provides a function for allocating a new buffer. This function is described in the "Creating Buffers" section below.

Many USerial library routines that use a buffer declare the buffer parameter as a key parameter. There is a macro one can use to execute a body of statements with a particular buffer as the default for calls in which the buffer parameter is omitted. This macro is described in the "Using a Buffer" section below.

The only exported USerial library routines that take a buffer parameter directly (as opposed to with a key parameter) are the with-buffer macro and the unserialize-let* macro.

There are a variety of functions provided to allow one to query and manipulate the size of USerial buffers. These functions are described in the "Manipulating and Querying Buffer Sizes" section below.

There are some basic functions for adding an unsigned byte to a buffer and retrieving an unsigned byte from a buffer. Those functions are described below in the "Adding and Retrieving Bytes" section.

Buffer-related Types and Constants

The USerial buffers are adjustable arrays of unsigned bytes with fill pointers. The array is adjustable so that it can be easily grown as needed to accommodate serialized data. It has a fill pointer that is used to track the current length of serialized data (as distinguished from the current allocated capacity of the array) or the current point from which data will be unserialized.

(deftype buffer () '(array (unsigned-byte 8) (*)))
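The difference between the fill pointer and the allocated capacity can be seen with a plain Common Lisp array. The following is a minimal sketch using only standard functions. The variable *sketch-buf* is a name made up for this illustration and nothing here calls into the USerial library itself.

```lisp
;; A plain adjustable byte vector with a fill pointer (standard CL only).
;; *SKETCH-BUF* is a hypothetical name used for this illustration.
(defparameter *sketch-buf*
  (make-array 4 :element-type '(unsigned-byte 8)
                :adjustable t
                :fill-pointer 0))

;; Append two bytes; the fill pointer advances, the capacity stays at 4.
(vector-push-extend 10 *sketch-buf*)
(vector-push-extend 20 *sketch-buf*)

;; LENGTH honors the fill pointer; ARRAY-DIMENSION reports the capacity.
(list (length *sketch-buf*)
      (array-dimension *sketch-buf* 0))   ; => (2 4)
```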

When one creates a USerial buffer, one can provide the initial capacity for the buffer. If no initial capacity is given for the buffer, a default size is used.

(defconstant +default-buffer-capacity+ 32768)

When one is adding bytes to a buffer, it would be very inefficient to reallocate the buffer each time an additional byte of space is needed. To this end, when the USerial library needs to increase the size of a buffer, it adds at least the smaller of the current buffer size and +default-buffer-expand+.

(defconstant +default-buffer-expand+ 8192)

For example, if the buffer were currently 256 bytes when the buffer needed to grow by a byte, it would be expanded to 512 bytes. If the buffer were currently 10,000 bytes when the buffer needed to grow by a byte, it would be expanded to 18,192 bytes.
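The arithmetic in this example can be sketched as a small function. This is an illustration of the rule as described, not the library's code. The names *sketch-expand* and sketch-next-capacity are hypothetical, and the handling of a request larger than the usual increment is an assumption about what "at least" means.

```lisp
;; Hypothetical stand-in for +default-buffer-expand+.
(defparameter *sketch-expand* 8192)

(defun sketch-next-capacity (current &optional (needed 1))
  "Return a new capacity for a buffer of CURRENT bytes that must grow by
NEEDED bytes.  Taking the MAX with NEEDED is an assumption."
  (+ current (max needed (min current *sketch-expand*))))

(sketch-next-capacity 256)     ; => 512
(sketch-next-capacity 10000)   ; => 18192
```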

Creating Buffers

The buffer allocator itself is the make-buffer function. It takes an optional parameter specifying the initial capacity of the buffer.

(ftype (function (&optional (integer 1 *)) buffer) make-buffer)
(defun make-buffer (&optional initial-capacity)
  ...)

As the buffer will be resized as needed, this parameter need not be set high enough to accommodate any and all serializations. It is provided merely to keep from having to reallocate the buffer several times if one can provide a decent, probable upper bound on the serialized size of the contents.

Using a Buffer

Most of the buffer manipulation and serialization functions declare the buffer as an optional key parameter. The following macro allows one to specify the buffer to use for these functions when the buffer parameter is omitted.

(defmacro with-buffer (buffer &body body))

This macro binds the dynamic variable *buffer* to the given buffer for the duration of the body.

(declaim (special *buffer*))
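A macro of this kind can be written as a dynamic binding around the body. The following is a minimal sketch of the technique, not the library's source. The names *sketch-buffer*, with-sketch-buffer, and sketch-current-length are hypothetical stand-ins.

```lisp
;; Hypothetical stand-in for *buffer*.
(defvar *sketch-buffer* nil)

;; Hypothetical stand-in for with-buffer: rebinds the special variable
;; for the dynamic extent of BODY and restores it afterwards.
(defmacro with-sketch-buffer (buffer &body body)
  `(let ((*sketch-buffer* ,buffer))
     ,@body))

;; A routine whose :buffer argument defaults to the current binding.
(defun sketch-current-length (&key (buffer *sketch-buffer*))
  (length buffer))

(with-sketch-buffer (vector 1 2 3)
  (sketch-current-length))   ; => 3
```

Because the default for the key parameter is evaluated at call time, any routine called within the body sees the buffer bound by the macro.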

Manipulating and Querying Buffer Sizes

When serializing a buffer, the buffer-length function returns the current length of the serialized data within the buffer. When unserializing a buffer, the buffer-length function returns the current length of the serialized data which has already been unserialized from the buffer.

(ftype (function (&key (:buffer buffer)) (integer 0 *)) buffer-length)
(defun buffer-length (&key (buffer *buffer*))
  ...)

The current allocated size of a buffer can be queried with the buffer-capacity function. One can (setf ...) the buffer-capacity if needed to explicitly modify the amount of buffer space allocated.

(ftype (function (&key (:buffer buffer)) (integer 0 *)) buffer-capacity)
(defun buffer-capacity (&key (buffer *buffer*))
  ...)
(setf (buffer-capacity &key (buffer *buffer*)) (integer 0 *))

One can advance the current position within the buffer either to save space for later serialization or to skip over bytes during unserialization.

(ftype (function (&key (:amount (integer 0 *))
                       (:buffer buffer))       buffer)
       buffer-advance)
(defun buffer-advance (&key (amount 1) (buffer *buffer*))
  ...)

If not specified, the buffer-advance function advances by a single byte.

One can reset the current position within the buffer back to the beginning to begin unserializing a serialized buffer, to fill in places that one skipped during the first stage of serialization, or to re-use the same buffer for the next serialization.

(ftype (function (&key (:buffer buffer)) buffer) buffer-rewind)
(defun buffer-rewind (&key (buffer *buffer*))
  ...)
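The idea of skipping space and filling it in later can be illustrated with a plain Common Lisp array. This sketch reserves one byte for a length, writes a payload, then rewinds to fill in the reserved byte. It uses only standard functions. The name sketch-length-prefixed is hypothetical, and restoring the fill pointer to the end afterwards is a choice made for this sketch rather than something the library is known to do.

```lisp
(defun sketch-length-prefixed (payload)
  "Return a byte vector holding a one-byte length followed by PAYLOAD.
PAYLOAD is assumed to be a vector of at most 255 bytes."
  (let ((buf (make-array 16 :element-type '(unsigned-byte 8)
                            :adjustable t
                            :fill-pointer 0)))
    ;; Advance past one byte, reserving it for the length.
    (incf (fill-pointer buf))
    ;; Write the payload after the reserved byte.
    (loop for byte across payload
          do (vector-push-extend byte buf))
    ;; Remember the end, rewind, fill in the reserved byte, restore the end.
    (let ((end (fill-pointer buf)))
      (setf (fill-pointer buf) 0)
      (vector-push-extend (1- end) buf)
      (setf (fill-pointer buf) end))
    buf))

(sketch-length-prefixed #(7 8 9))   ; => #(3 7 8 9)
```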

Adding and Retrieving Bytes

At its base, a buffer is an adjustable array of unsigned bytes. To add a byte to a buffer, one can use the following function. This function will expand the buffer if needed, place the given byte at the current fill pointer and advance the fill pointer.

(ftype (function (uchar &key (:buffer buffer)) buffer) buffer-add-byte)
(defun buffer-add-byte (byte &key (buffer *buffer*))
  ...)

Similarly, to retrieve an unsigned byte from a buffer, one can use the following function. This function will retrieve the byte at the current fill pointer and advance the fill pointer.

(ftype (function (&key (:buffer buffer)) (values uchar buffer))
       buffer-get-byte)
(defun buffer-get-byte (&key (buffer *buffer*))
  ...)
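The behavior described for these two functions can be sketched on a plain array. The functions below are hypothetical stand-ins written with standard Common Lisp only. They take the buffer as an ordinary argument to keep the sketch short.

```lisp
(defun sketch-add-byte (byte buf)
  "Place BYTE at the fill pointer, growing BUF if needed, and return BUF."
  (vector-push-extend byte buf)
  buf)

(defun sketch-get-byte (buf)
  "Return the byte at the fill pointer and advance the fill pointer."
  ;; AREF ignores the fill pointer, so it can read at the current position.
  (prog1 (aref buf (fill-pointer buf))
    (incf (fill-pointer buf))))

(defun sketch-byte-round-trip ()
  "Write two bytes, rewind, and read them back as a list."
  (let ((buf (make-array 8 :element-type '(unsigned-byte 8)
                           :adjustable t
                           :fill-pointer 0)))
    (sketch-add-byte 65 buf)
    (sketch-add-byte 66 buf)
    (setf (fill-pointer buf) 0)          ; rewind before reading
    (list (sketch-get-byte buf) (sketch-get-byte buf))))

(sketch-byte-round-trip)   ; => (65 66)
```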

Serializing and Unserializing

The ultimate purpose of the USerial library is to allow one to serialize and unserialize data. To this end, the library defines two generic functions that dispatch on a keyword parameter. These generic functions are described in the "Serialize and Unserialize Generics" section below.

There are some macros that facilitate serializing and unserializing sequences of items. These macros are described in the "Serializing and Unserializing Multiple Items" section below.

There are other macros which facilitate defining new serialize and unserialize methods for common situations. These macros are described in the "Defining New Serializers" section below.

There are a variety of pre-defined serialize and unserialize methods. These are described in the "Pre-defined Serializers" section below.

Serialize and Unserialize Generics

The generic function used to serialize items takes a keyword as its first parameter, a value as its second parameter, and an optional buffer. The keyword is used to dispatch the appropriate implementation of the function for the given value. The serialize methods serialize the value into the buffer and return the buffer.

(ftype (function (symbol t &key (:buffer buffer) &allow-other-keys)
                 buffer)
       serialize)
(defgeneric serialize (keyword value &key (buffer *buffer*)
                                     &allow-other-keys))

The generic function used to unserialize items takes a keyword as its first parameter and an optional buffer. The keyword is used to dispatch the appropriate implementation of the function. The unserialize methods unserialize a value from the buffer and return the value and buffer.

(ftype (function (symbol &key (:buffer buffer) &allow-other-keys)
                 (values t buffer))
       unserialize)
(defgeneric unserialize (keyword &key (buffer *buffer*)
                                 &allow-other-keys))
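Dispatching on a keyword is done in Common Lisp with eql specializers. The following is a minimal sketch of that technique with hypothetical generic functions that take the buffer as an ordinary argument. It does not reproduce the library's methods. The :flag format is invented here to show one method built on another.

```lisp
(defgeneric sketch-serialize (key value buffer)
  (:documentation "Write VALUE into BUFFER in the format named by KEY."))

(defgeneric sketch-unserialize (key buffer)
  (:documentation "Read a value in the format named by KEY from BUFFER."))

;; One byte, written at the fill pointer.
(defmethod sketch-serialize ((key (eql :uint8)) value buffer)
  (vector-push-extend value buffer)
  buffer)

(defmethod sketch-unserialize ((key (eql :uint8)) buffer)
  (let ((value (aref buffer (fill-pointer buffer))))
    (incf (fill-pointer buffer))
    (values value buffer)))

;; A hypothetical :flag format layered on top of :uint8.
(defmethod sketch-serialize ((key (eql :flag)) value buffer)
  (sketch-serialize :uint8 (if value 1 0) buffer))

(defmethod sketch-unserialize ((key (eql :flag)) buffer)
  (values (= 1 (sketch-unserialize :uint8 buffer)) buffer))
```

Adding a new format then only requires a new pair of methods specialized on a new keyword.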

Serializing and Unserializing Multiple Items

For most purposes, one wants to serialize more than one thing into a given buffer. The USerial library provides some convenience macros so that one is not forced to explicitly call serialize or unserialize for each item. Here is an example of explicitly calling the serialize method for each item.

(with-buffer (make-buffer 1024)
  (serialize :opcode :login)
  (serialize :string login-name)
  (serialize :string password)
  (serialize :login-flags '(:hidden)))

The first such macro is serialize*. With this macro, one specifies a keyword-value list and an optional buffer. With it, the above example could be serialized as follows.

(serialize* (:opcode :login
             :string login-name
             :string password
             :login-flags '(:hidden)) :buffer (make-buffer 1024))
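A macro of this shape can be built by walking the keyword-value list two items at a time and emitting one call per pair. The following sketch shows the expansion technique only. The names *sketch-log*, sketch-emit, and sketch-emit* are hypothetical, and sketch-emit merely records its arguments instead of writing bytes.

```lisp
;; Records each (key value) pair so the expansion order can be inspected.
(defvar *sketch-log* '())

(defun sketch-emit (key value)
  "Stand-in for a serialize call: remember KEY and VALUE."
  (push (list key value) *sketch-log*)
  *sketch-log*)

(defmacro sketch-emit* ((&rest key-value-pairs))
  "Expand into one SKETCH-EMIT call per key/value pair, in order."
  `(progn
     ,@(loop for (key value) on key-value-pairs by #'cddr
             collect `(sketch-emit ,key ,value))))

(sketch-emit* (:opcode :login
               :string "alice"))
;; expands into:
;; (progn (sketch-emit :opcode :login)
;;        (sketch-emit :string "alice"))
```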

To unserialize from the resulting buffer, one could explicitly call unserialize for each item in the buffer, storing each result into a place.

(let (opcode login-name password flags)
  (with-buffer buffer
    (setf opcode     (unserialize :opcode)
          login-name (unserialize :string)
          password   (unserialize :string)
          flags      (unserialize :login-flags)))
  ...)

To do the same sort of thing more directly, one can use the unserialize* macro. This macro allows one to unserialize from a given buffer into given places using given keywords on which to dispatch.

(let (opcode login-name password flags)
  (unserialize* (:opcode      opcode
                 :string      login-name
                 :string      password
                 :login-flags flags)     :buffer buffer)
  ...)

Another way one might have used explicit calls to unserialize is to replace the let construct in the above with a let* and unserialize each variable as it is created.

(with-buffer buffer
  (let* ((opcode     (unserialize :opcode))
         (login-name (unserialize :string))
         (password   (unserialize :string))
         (flags      (unserialize :login-flags)))
    ...))

To condense the above, one can use the unserialize-let* macro. It takes a list of keyword/variable-name pairs, a buffer, and a body of statements to execute while the named variables are in scope. Note: unlike most USerial routines, the buffer argument here is required and is passed directly rather than as a key parameter.

(unserialize-let* (:opcode      opcode
                   :string      login-name
                   :string      password
                   :login-flags flags)      buffer
  ...)

Suppose one wanted to unserialize into a list (as this is Lisp after all). One could explicitly call unserialize for each item in the list.

(with-buffer buffer
  (list (unserialize :opcode)
        (unserialize :string)
        (unserialize :string)
        (unserialize :login-flags)))

To eliminate a great deal of typing the word unserialize, one can use the unserialize-list* macro. The macro takes a list of keywords and an optional buffer. It returns a list as the first value and the buffer as the second value.

(unserialize-list* (:opcode :string :string :login-flags)
                   :buffer buffer)

Defining New Serializers

Almost every protocol requires the encoding and decoding of integer values. To make it easy to create as many of these types as one's application requires, the USerial library defines a macro which creates a serialize and unserialize method for an integer that is a given number of bytes long. The macro takes two arguments: the key used to specify the method and an integer number of bytes.

(defmacro make-int-serializer (key bytes))

For example, to make serialize and unserialize methods for signed bytes and signed quadwords, one could simply call:

(make-int-serializer :signed-byte 1)
(make-int-serializer :signed-quadword 8)

Similarly, if one wanted to create serialize and unserialize methods for unsigned bytes and unsigned doublewords, one could use the following macro:

(defmacro make-uint-serializer (key bytes))
(make-uint-serializer :unsigned-byte 1)
(make-uint-serializer :unsigned-doubleword 4)

Note: the bytes argument to the make-int-serializer and make-uint-serializer macros must be a constant value available at the time the macro is expanded.
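To make the idea of an integer that is a given number of bytes long concrete, here is a sketch of splitting an integer into bytes and reassembling it. The function names are hypothetical. Writing the most significant byte first and using two's complement for signed values are assumptions, as this document does not state the byte order or signed representation the library uses.

```lisp
(defun sketch-integer-to-bytes (value bytes)
  "Split VALUE into a list of BYTES octets, most significant first.
Negative values come out in two's complement because LDB treats
integers as infinitely sign-extended."
  (loop for i from (1- bytes) downto 0
        collect (ldb (byte 8 (* 8 i)) value)))

(defun sketch-bytes-to-uint (octets)
  "Reassemble a list of octets into an unsigned integer."
  (reduce (lambda (acc octet) (+ (* acc 256) octet))
          octets :initial-value 0))

(defun sketch-bytes-to-int (octets)
  "Reassemble a list of octets into a signed (two's complement) integer."
  (let ((bits (* 8 (length octets)))
        (unsigned (sketch-bytes-to-uint octets)))
    (if (logbitp (1- bits) unsigned)
        (- unsigned (ash 1 bits))
        unsigned)))

(sketch-integer-to-bytes #x0102 2)                  ; => (1 2)
(sketch-bytes-to-int (sketch-integer-to-bytes -1 1)) ; => -1
```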

To serialize floating point numbers, one must have a function that encodes floating point numbers into an integer representation and a function that decodes the integer representation back into a floating point number. Then, one can use the make-float-serializer macro which takes a key used to specify the method, a lisp type for the floating point number, a constant number of bytes for the encoded values, an encoder, and a decoder.

(defmacro make-float-serializer (key type bytes encoder decoder))

For example, the following would create serializers that encode rational numbers (technically not floating point, I know) as 48-bit fixed point numbers with 16 bits devoted to the fractional portion and 32 bits devoted to the integer portion.

(make-float-serializer :fixed-32/16 rational 6
                       #'(lambda (rr) (round (* rr 65536)))
                       #'(lambda (ii) (/ ii 65536)))
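The encoder and decoder above can be tried on their own to see what the fixed point encoding does to a value. The names sketch-encode-fixed and sketch-decode-fixed are hypothetical wrappers around the same two lambda bodies.

```lisp
(defun sketch-encode-fixed (rr)
  "Scale RR by 2^16 and round to the nearest integer."
  (values (round (* rr 65536))))

(defun sketch-decode-fixed (ii)
  "Undo the scaling, yielding a rational."
  (/ ii 65536))

;; Values that are multiples of 1/65536 survive the round trip exactly.
(sketch-decode-fixed (sketch-encode-fixed 3/2))   ; => 3/2

;; Other values are rounded to the nearest multiple of 1/65536.
(sketch-encode-fixed 1/3)                         ; => 21845
(sketch-decode-fixed (sketch-encode-fixed 1/3))   ; => 21845/65536
```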

The USerial library defines macros for helping one encode bit fields (to represent choices where more than one possibility at a time is acceptable) and enumerations (to represent choices where only a single selection can be made). These macros take a keyword used to specify the method and a list of choices.

(make-bitfield-serializer :wants (:coffee :tea :sega))
(make-enum-serializer :direction (:left :right :up :down))

With the bit field serializer, one can specify a single option or a list of zero or more options. With the enumeration serializer, one must specify a single option.

(serialize :wants :tea)
(serialize :wants nil)
(serialize :wants '(:tea :sega))
(serialize :direction :up)

When unserializing, the bit field will always return a list even when there is a single item in it as in the :tea example above.
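One way to picture these encodings is that a bit field sets one bit per selected choice while an enumeration stores the position of the single choice. The following sketch works under that assumption. The function names are hypothetical, and assigning bit 0 to the first choice is a guess at a reasonable layout rather than the library's documented format.

```lisp
(defun sketch-encode-bitfield (choices selected)
  "Encode SELECTED (one choice or a list of choices) as an integer
with one bit per entry in CHOICES."
  (let ((selected (if (listp selected) selected (list selected))))
    (loop for choice in choices
          for bit from 0
          when (member choice selected)
            sum (ash 1 bit))))

(defun sketch-decode-bitfield (choices encoded)
  "Return the list of CHOICES whose bits are set in ENCODED."
  (loop for choice in choices
        for bit from 0
        when (logbitp bit encoded)
          collect choice))

(defun sketch-encode-enum (choices selected)
  "Encode SELECTED as its position in CHOICES."
  (or (position selected choices)
      (error "~S is not one of ~S" selected choices)))

(sketch-encode-bitfield '(:coffee :tea :sega) '(:tea :sega))   ; => 6
(sketch-decode-bitfield '(:coffee :tea :sega) 2)               ; => (:TEA)
(sketch-encode-enum '(:left :right :up :down) :up)             ; => 2
```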

To facilitate serializing and unserializing classes and structs, the USerial library provides macros which create serializers and unserializers for items based on slots or accessors. These macros take three arguments: a key used to specify the methods, a factory form used by the unserialize method to create a new instance of the class or struct, and a plist of key/name pairs. Each name is a slot name for the slot serializers or an accessor name for the accessor serializers, and the key paired with each name specifies how to serialize the value in that slot.

An example will help to clarify the previous paragraph. Suppose one had a simple struct listing a person's name, age, and favorite color.

(defstruct person name age color)

One could create the following serialize and unserialize pairs to allow encoding the data for internal use (where all data is available) or for public use (where the age is kept secret).

(make-slot-serializer :person-internal
                      (make-person)
                      (:string name
                       :uint8 age
                       :string color))
(make-accessor-serializer :person-public
                          (make-person :age :unknown)
                          (:string person-name
                           :string person-color))

Here is a simple session showing the above in action. The following code first defines a function which serializes a value using a given key to a new buffer, rewinds the buffer, and unserializes from the buffer using the key.

CL-USER> (defun u-s (key value)
            (with-buffer (make-buffer)
               (serialize key value)
               (buffer-rewind)
               (nth-value 0 (unserialize key))))
U-S

CL-USER> (defvar *p* (make-person :name "Patrick"
                                  :age 40
                                  :color "Green"))
*P*

CL-USER> (u-s :person-internal *p*)
#S(PERSON :NAME "Patrick" :AGE 40 :COLOR "Green")

CL-USER> (u-s :person-public *p*)
#S(PERSON :NAME "Patrick" :AGE :UNKNOWN :COLOR "Green")

Pre-defined Serializers

The USerial library defines some commonly required serializers.

For signed integers, the USerial library defines the following serializers (and unserializers): :int8 for signed bytes, :int16 for signed 16-bit integers, :int32 for signed 32-bit integers, and :int64 for signed 64-bit integers.

For unsigned integers, the USerial library defines the following serializers: :uint8 for unsigned bytes, :uint16 for unsigned 16-bit integers, :uint24 for unsigned 24-bit integers, :uint32 for unsigned 32-bit integers, :uint48 for unsigned 48-bit integers, and :uint64 for unsigned 64-bit integers.

For floating point numbers, the USerial library defines the :float32 serializer for encoding single-float values as 32-bit IEEE floating point numbers and the :float64 serializer for encoding double-float values as 64-bit IEEE floating point numbers. The USerial library uses the ieee-float library to encode and decode floating point numbers.

For arbitrary byte sequences, the USerial library defines the :bytes serializer.

For strings, the USerial library defines the :string serializer for encoding strings as UTF-8 encoded sequences of arbitrary length. The USerial library uses the trivial-utf-8 library to encode and decode UTF-8 strings.

For enumerated types, the USerial library defines the :boolean serializer for encoding an option that will be either nil or t.

Logging

For various applications, it may be useful to log serialized messages. The USerial library provides a simple way to do that with the serialize-log macro. The serialize-log macro takes a category and arguments for serialize* and invokes a logger (if one is available) with the serialized information. [Currently, cl-log is the only supported logging system since it is the only one that I am certain will accept binary messages.]

For example, if one wanted to log an integer and two strings in the logging category :packet one might do the following:

(serialize-log :packet :int32 the-int :string s1 :string s2)

Sample Application: Game Protocol

This example shows how one might use the tools above to serialize the data that would need to be exchanged between a client and server to implement a two-player game similar to Milton-Bradley's Battleship game.

For this game, there will be a server and two clients. Each client will begin the game by placing his ships on a (2K+1)x(2K+1) board. The board will have coordinates ranging from -K through +K along both the X and Y axes. Ships will have to be placed either horizontally or vertically at integer coordinates. All ships are three units in length. It takes only one missile shot to sink a ship.

Once the ships are placed, regular play begins. On his turn, a client can either ping or fire. Each client begins with a defined amount of energy available with which to ping and a defined number of missiles.

If the client chooses to ping, the client chooses the radius of the ping and its center of origin. The server will calculate the distance from the center of origin to each enemy ship within the specified radius from the origin, round those distances to the nearest integer, and reply to the client with that list.

If the client chooses to fire, the client chooses the location upon which to fire. The server will respond to the client to tell him whether the shot was a hit or a miss.

Opcodes

To facilitate handling of received messages, each message will begin with an opcode identifying the message type. Some messages will be sent only from the client to the server. Others will be sent only from the server to the client.

(make-enum-serializer :client-opcodes
                      (:login :place-ship :ping :fire))
(make-enum-serializer :server-opcodes
                      (:welcome :ack :sunk :shot-results))

The message-receiving portion on the server side could then do something like this:

(defun handle-message-from-client (message)
  (ecase (unserialize :client-opcodes :buffer message)
    (:login      (handle-login-message message))
    (:place-ship (handle-place-ship-message message))
    (:ping       (handle-ping-message message))
    (:fire       (handle-fire-message message))))

Logging In

To begin a game, the client sends a message to the server with opcode :login. The message declares the player's name, which board sizes the client will play, and an optional name of an opponent that the client is waiting to play.

(make-bitfield-serializer :playable-board-sizes
                          (:small :medium :large :huge))
(defun make-login-message (name &key opponent small medium large huge)
  (let ((sizes (append (when small  '(:small))
                       (when medium '(:medium))
                       (when large  '(:large))
                       (when huge   '(:huge)))))
    (let ((message (make-buffer)))
      (with-buffer message
         (serialize* (:client-opcodes       :login
                      :string               name
                      :playable-board-sizes sizes
                      :boolean              (if opponent t nil)))
         (when opponent
           (serialize :string opponent)))
      message)))

On the receiving side, the server might do something like the following (given that it already read the opcode from the message as it had in the previous section).

(defun handle-login-message (message)
  (unserialize-let* (:string               name
                     :playable-board-sizes sizes
                     :boolean              has-opponent) message
    (assert (plusp (length name)))
    (assert (plusp (length sizes)))
    (cond
      (has-opponent (unserialize-let* (:string opponent) message
                      (match-or-queue name sizes opponent)))
      (t            (match-or-queue name sizes)))))

When the server finds a match for the requested game, it composes welcome messages to each client. The welcome message contains the size of the board in squares, the number of ships each player has, the amount of ping energy each player has, the number of missiles each player has, and the name of the opponent.

(defun make-welcome-message (squares ships energy missiles opponent)
  (serialize* (:server-opcodes :welcome
               :uint8 squares
               :uint8 ships
               :float32 energy
               :uint16 missiles
               :string opponent)
              :buffer (make-buffer)))

Suppose the client had a class it was using to track the current state of the game. The client could then use a slot-serializer or accessor-serializer to parse the incoming welcome message.

(make-accessor-serializer
  :game-state-from-welcome (make-game-state)
  (:uint8   game-state-board-size
   :uint8   game-state-ships
   :float32 game-state-energy
   :uint16  game-state-missiles
   :string  game-state-opponent))

The client could then handle the welcome message as follows (assuming the opcode has already been unserialized from the message buffer):

(defun handle-welcome-message (message)
  (unserialize-let* (:game-state-from-welcome game-state) message
    ;; do anything with this game state here
    ))

Placing Ships

To place ships, a client specifies the center coordinate of the ship and whether the ship is oriented horizontally or vertically.

(make-enum-serializer :orientation (:horizontal :vertical))
(defun make-place-ship-message (x y orientation)
  (serialize* (:client-opcodes :place-ship
               :int8 x
               :int8 y
               :orientation orientation)
              :buffer (make-buffer)))

The server could read the coordinates and orientation into local variables before calling a method to add the ship to the map.

(defun handle-place-ship-message (message)
   (let (x y orientation)
     (unserialize* (:int8        x
                    :int8        y
                    :orientation orientation)
                   :buffer message)
     (add-ship-to-map x y
                     :is-vertical (eql orientation
                                       :vertical))))

Pinging

To perform a ping move, a client encodes a radius and a center for the ping.

(serialize* (:client-opcodes :ping
             :float32        radius
             :int8           x
             :int8           y)
            :buffer (make-buffer))

Here, the server will decode the ping request into a list to send to its routine to calculate the reply.

(apply #'calculate-ping-response
       (unserialize-list* (:float32 :int8 :int8)
                          :buffer message))

Supposing the return from calculate-ping-response is a list of distances to ships, the ack message could be encoded like this:

(with-buffer ack-message
  (serialize* (:server-opcodes :ack
               :float32 remaining-ping-energy
               :uint16 (length hits)))
  ;; serialize each distance in turn; mapc is used since only the side effect matters
  (mapc #'(lambda (d) (serialize :uint8 d)) hits))

Firing

To send a fire message, the client just sends the coordinates of the location upon which to fire.

(serialize* (:client-opcodes :fire
             :int8           x
             :int8           y)
            :buffer (make-buffer))

If the server determines the shot was a hit, it must send a sunk message to the opponent. Either way, a shot results message must be sent to the client.

(make-enum-serializer :shot-result (:hit :miss))

(defun make-sunk-message (x y)
  (serialize* (:server-opcodes :sunk
               :int8           x
               :int8           y)
              :buffer (make-buffer)))

(defun make-shot-results-message (hit)
  (serialize* (:server-opcodes :shot-results
               :shot-result    (if hit :hit :miss))
              :buffer (make-buffer)))