Function SPLIT-SEQUENCE, SPLIT-SEQUENCE-IF, SPLIT-SEQUENCE-IF-NOT

Syntax:

split-sequence delimiter sequence &key count remove-empty-subseqs from-end start end test test-not key => list, index

split-sequence-if predicate sequence &key count remove-empty-subseqs from-end start end key => list, index

split-sequence-if-not predicate sequence &key count remove-empty-subseqs from-end start end key => list, index

Arguments and Values:

delimiter---an object.

predicate---a designator for a function of one argument that returns a generalized boolean.

sequence---a proper sequence.

count---an integer or nil. The default is nil.

remove-empty-subseqs---a generalized boolean. The default is false.

from-end---a generalized boolean. The default is false.

start, end---bounding index designators of sequence. The defaults for start and end are 0 and nil, respectively.

test---a designator for a function of two arguments that returns a generalized boolean.

test-not---a designator for a function of two arguments that returns a generalized boolean.

key---a designator for a function of one argument, or nil.

list---a proper sequence.

index---an integer greater than or equal to zero, and less than or equal to the length of the sequence.

Description:

Splits sequence into a list of subsequences delimited by objects satisfying the test.

List is a list of sequences of the same kind as sequence that has elements consisting of subsequences of sequence that were delimited in the argument by elements satisfying the test. Index is an index into sequence indicating the unprocessed region, suitable as an argument to subseq to continue processing in the same manner if desired.

The count argument, if supplied, limits the number of subsequences in the first return value; if more than count delimited subsequences exist in sequence, the count leftmost delimited subsequences will be in order in the first return value, and the second return value will be the index into sequence at which processing stopped.

If from-end is non-null, sequence is conceptually processed from right to left, accumulating the subsequences in reverse order; from-end only makes a difference in the case of a non-null count argument. In the presence of from-end, the count rightmost delimited subsequences will be in the order that they are in sequence in the first return value, and the second is the index indicating the end of the unprocessed region.

The start and end keyword arguments permit a certain subsequence of the sequence to be processed without the need for a copying stage; their use is conceptually equivalent to partitioning the subsequence delimited by start and end, only without the need for copying.

If remove-empty-subseqs is null (the default), then empty subsequences will be included in the result.

In all cases, the subsequences in the first return value will be in the order that they appeared in sequence.

Examples:

 (split-sequence:SPLIT-SEQUENCE #\Space "A stitch in time saves nine.")
=>  ("A" "stitch" "in" "time" "saves" "nine.")
    28
 (split-sequence:SPLIT-SEQUENCE #\, "foo,bar ,baz, foobar , barbaz,")
=>  ("foo" "bar " "baz" " foobar " " barbaz" "")
    30

Implementation notes:

This code was written various people, and the license is unknown. Since multiple people worked on it collaboratively and none of them seem interested in keeping their intellectual property rights to it, I'll assume that it is in the public domain (since the process that produced it seems like the very essence of public domain). If this is incorrect, please contact me so we can get it straightened out.

The implementation itself is mature and well tested, and it is widely used. The code should be fast enough for most people, but be warned: it was written with vectors in mind, with list manipulation as an afterthought. It does a lot of things that are quick on vectors but slow on lists, and this can result in many orders of magnitude slowdown in list benchmarks versus code written for lists. If this is a problem for you, it should be straightforward to write your own, such as the (more limited, not API compatible) example function given by Szymon in this mailing list post:

(defun split-list-if (test list &aux (start list) (end list))
  (loop while (and end (setq start (member-if-not test end)))
	collect (ldiff start (setq end (member-if test start)))))

If this is an issue for enough people, I could optimize the code and fix this problem. I'm reluctant to do that, however, since the code works and is tested. It's usually more important to be correct and non-buggy than to be fast, and I have been known to introduce bugs.


Manual Index