URL-REWRITE - Rewrite (X)HTML attributes with Common Lisp


 

Abstract

URL-REWRITE is a small package which can be used to programmatically rewrite (X)HTML documents such that certain attribute values are replaced by others. It was written to rewrite URLs (as in <a href="..."> or <img src="...">) for cookie-less session handling, but maybe you'll find other uses for it.

URL-REWRITE is intended to be portable and should work with all conforming Common Lisp implementations. Let me know if you encounter any problems.

It comes with a BSD-style license so you can basically do with it whatever you want.

URL-REWRITE was originally written as a support library for TBNL.

Download shortcut: http://weitz.de/files/url-rewrite.tar.gz.


 

Contents

  1. Example usage
  2. Download and installation
  3. The URL-REWRITE dictionary
    1. rewrite-urls
    2. *url-rewrite-tags*
    3. *url-rewrite-fill-tags*
    4. starts-with-scheme-p
    5. add-get-param-to-url
    6. url-encode
  4. Known bugs and problems
  5. Acknowledgements

 

Example usage

The canonical usage (the one URL-REWRITE was originally written for) is to scan through an HTML page and extend all non-external links with a GET parameter holding the current session ID - something like this:
* (defvar +session-cookie-name+ "session")
+SESSION-COOKIE-NAME+
* (defun add-session-var (html session-value)
    (with-input-from-string (*standard-input* html)
      (with-output-to-string (*standard-output*)
        (rewrite-urls (lambda (url)
                        (add-get-param-to-url url
                                              +session-cookie-name+
                                              session-value))))))
ADD-SESSION-VAR
* (add-session-var "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\"> 
 <HTML>
   <BODY BGCOLOR=white>
     This is the <A NAME=foo HREF='first.html?foo=bar'>first link</A>, and this is the <A CLASS=NOBORDER HREF=\"http://www.cliki.net/\" TITLE='bar'>second one</A>.
     And here's a picture: <img src='/pics/cool_pic.gif' width=100 height=100>
   </BODY>
 </HTML>" "foo42")
"<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\"> 
 <HTML>
   <BODY BGCOLOR=white>
     This is the <A NAME=foo HREF='first.html?foo=bar&amp;session=foo42'>first link</A>, and this is the <A CLASS=NOBORDER HREF=\"http://www.cliki.net/\" TITLE='bar'>second one</A>.
     And here's a picture: <img src='/pics/cool_pic.gif?session=foo42' width=100 height=100>
   </BODY>
 </HTML>"

 

Download and installation

URL-REWRITE together with this documentation can be downloaded from http://weitz.de/files/url-rewrite.tar.gz. The current version is 0.1.1.

URL-REWRITE comes with simple system definitions for MK:DEFSYSTEM and asdf, so you can either adapt it to your needs or just unpack the archive, start your Lisp image from within the URL-REWRITE directory, and evaluate the form (mk:compile-system "url-rewrite") (or the equivalent one for asdf), which should compile and load the whole system.
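
For asdf, the equivalent form probably looks like the following. This is only a sketch: it assumes that the system definition shipped in the archive is named "url-rewrite" and that asdf can find it, here by pushing a placeholder directory onto asdf:*central-registry*.

;; hypothetical path - replace with the directory you unpacked the
;; archive into (note the trailing slash)
(push #p"/path/to/url-rewrite/" asdf:*central-registry*)
;; compile (if necessary) and load the system
(asdf:operate 'asdf:load-op :url-rewrite)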

If you're on Gentoo Linux you should probably use the cl-url-rewrite package which is available thanks to Matthew Kennedy. Installation via asdf-install should also be possible.

If for some reason you don't want to use MK:DEFSYSTEM or asdf you can just LOAD the file load.lisp or you can also get away with something like this:

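;; compile each source file in dependency order and load the
;; resulting object file (COMPILE-FILE returns its truename)
(loop for name in '("packages" "specials" "primitives" "util" "url-rewrite")
      do (load (compile-file (make-pathname :name name
                                            :type "lisp"))))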
Note that on CL implementations which use the Python compiler (i.e. CMUCL, SBCL, SCL) you can concatenate the compiled object files to create one single object file which you can load afterwards:
cat {packages,specials,primitives,util,url-rewrite}.x86f > url-rewrite.x86f
(Replace ".x86f" with the correct suffix for your platform.)
 

The URL-REWRITE dictionary

URL-REWRITE exports the following symbols:


[Function]
rewrite-urls rewrite-fn &optional test-fn => whatever


Reads an (X)HTML document from *STANDARD-INPUT* and writes it back to *STANDARD-OUTPUT*. Any attribute value which is in one of the positions denoted by *URL-REWRITE-TAGS* is rewritten by rewrite-fn if it passes the test denoted by the optional function test-fn (which defaults to the complement of STARTS-WITH-SCHEME-P).

rewrite-fn must be a function which accepts one argument (a string or NIL) and returns a string. The argument to rewrite-fn is the attribute value which is to be rewritten (or NIL if the corresponding attribute didn't have a value or if it wasn't present at all - see *URL-REWRITE-FILL-TAGS*). The return value is the new attribute value.

test-fn must be a function which accepts one argument (the attribute's value as a string) and returns a generalized boolean denoting whether the value should be rewritten or not.

This function (which should be named rewrite-attribute-values, shouldn't it?) is only called for its side-effect, so its return value doesn't matter.
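
As an illustration of the two function arguments, here is a minimal sketch which prefixes every attribute value that begins with a slash. The name PREFIX-LOCAL-URLS is made up for this example, and the code assumes that the symbols exported by URL-REWRITE are accessible in the current package. Note that rewrite-fn guards against a NIL argument as described above.

(defun prefix-local-urls (html prefix)
  "Returns HTML with PREFIX prepended to all rewritable attribute
values which start with a slash.  (Hypothetical helper, not part of
URL-REWRITE.)"
  (with-input-from-string (*standard-input* html)
    (with-output-to-string (*standard-output*)
      (rewrite-urls
       ;; rewrite-fn: may be called with NIL for tags listed in
       ;; *URL-REWRITE-FILL-TAGS*, in which case we just return
       ;; an empty value
       (lambda (url)
         (if url
             (concatenate 'string prefix url)
             ""))
       ;; test-fn: only rewrite values like "/pics/foo.png"
       (lambda (url)
         (and (plusp (length url))
              (char= (char url 0) #\/)))))))

A call like (prefix-local-urls page "/app") would then be expected to leave relative and external links alone and to touch only values such as "/pics/cool_pic.gif".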


[Special variable]
*url-rewrite-tags*


The value of this variable is initially set to
'(("a" . "href")
  ("area" . "href")
  ("frame" . "src")
  ("img" . "src")
  ("input" . "src")
  ("form" . "action")
  ("iframe" . "src"))
It is an alist in which each car is a string naming an HTML tag which is to be examined and the corresponding cdr is a string naming the attribute whose value will be rewritten (if it passes the test - see above). The case of these strings doesn't matter.
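
Since this is a special variable, it can be rebound around a call to REWRITE-URLS to cover more tags without changing the global value. A sketch, in which the function name and the choice of the SCRIPT and LINK tags are only examples:

(defun rewrite-urls-including-assets (rewrite-fn)
  "Like REWRITE-URLS but also looks at <script src> and <link href>.
(Hypothetical helper, not part of URL-REWRITE.)"
  ;; the new binding is only visible during the call below
  (let ((*url-rewrite-tags*
          (append '(("script" . "src")
                    ("link" . "href"))
                  *url-rewrite-tags*)))
    (rewrite-urls rewrite-fn)))

Like REWRITE-URLS itself, this reads from *STANDARD-INPUT* and writes to *STANDARD-OUTPUT*.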


[Special variable]
*url-rewrite-fill-tags*


The value of this variable is initially set to
'(("form" . "action"))
It is an alist in which each car is a string naming an HTML tag which will get a new attribute (named by the corresponding cdr) if the tag doesn't have an attribute of this name. (Note that in this case the argument to rewrite-fn will be NIL and test-fn won't be invoked - see REWRITE-URLS.) The case of these strings doesn't matter.
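
If FORM tags without an ACTION attribute should be left alone, one option is presumably to bind the variable to the empty list for the duration of the call. The function name below is hypothetical:

(defun rewrite-urls-without-filling (rewrite-fn)
  "Calls REWRITE-URLS with an empty *URL-REWRITE-FILL-TAGS* so that
no missing attributes are added.  (Not part of URL-REWRITE.)"
  (let ((*url-rewrite-fill-tags* '()))
    (rewrite-urls rewrite-fn)))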


[Function]
starts-with-scheme-p string => boolean


This little convenience function will accept a string string and test whether it is a URL starting with a scheme (see RFC 1738) as opposed to a relative URL like, say, "pics/foo.png".
* (starts-with-scheme-p "http://www.alu.org/")
T

* (starts-with-scheme-p "mailto:bill@microsoft.com")
T

* (starts-with-scheme-p "/index.html")
NIL
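
To illustrate the kind of check involved, here is a simplified, self-contained sketch. It is not the code used by URL-REWRITE and the names SCHEME-CHAR-P and SCHEME-PREFIX-P are made up. It treats a string as starting with a scheme if a non-empty run of the characters RFC 1738 allows in scheme names (letters, digits, "+", "-", ".") is followed by a colon.

(defun scheme-char-p (char)
  "Whether CHAR may occur in a scheme name (assumes ASCII-compatible
character codes)."
  (or (and (< (char-code char) 128)
           (alphanumericp char))
      (find char "+-.")))

(defun scheme-prefix-p (string)
  "Whether STRING starts with one or more scheme characters followed
by a colon."
  (let ((colon (position #\: string)))
    (and colon
         (plusp colon)
         (every #'scheme-char-p (subseq string 0 colon))
         t)))

* (scheme-prefix-p "mailto:bill@microsoft.com")
T

* (scheme-prefix-p "pics/foo.png")
NIL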


[Function]
add-get-param-to-url url name value => new-url


Another little convenience function. This one will accept a string url which is supposed to be an HTTP URL. The pair of name and value (both strings) will be added as a GET parameter to this URL. The function silently assumes that there's no other parameter of the same name. To decide how to attach the new parameter to the end of the string, it only checks whether #\? is part of the string; it doesn't check for question marks which are written as, say, &#x3f;. URL-ENCODE is applied to value before it is appended to the resulting string.
* (add-get-param-to-url "http://common-lisp.net/" "foo" "bar")
"http://common-lisp.net/?foo=bar"

* (add-get-param-to-url "http://common-lisp.net/index.html?frob=42" "foo" "bar")
"http://common-lisp.net/index.html?frob=42&amp;foo=bar"

* (add-get-param-to-url "http://common-lisp.net/" "foo" "+")
"http://common-lisp.net/?foo=%2B"

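Because the function does not look for an existing parameter of the same name, callers who cannot rule that out may want to check first. The following wrapper is only a sketch: its name is made up and the substring test is deliberately naive (it does not decode the URL and only looks for the name followed by "=" directly after "?", "&", or "&amp;").

(defun add-get-param-unless-present (url name value)
  "Adds NAME=VALUE to URL via ADD-GET-PARAM-TO-URL unless URL already
seems to contain a parameter called NAME.  (Hypothetical helper, not
part of URL-REWRITE.)"
  (flet ((present-after-p (separator)
           ;; true if URL contains SEPARATOR immediately followed
           ;; by NAME and an equals sign
           (search (concatenate 'string separator name "=") url)))
    (if (or (present-after-p "?")
            (present-after-p "&")
            (present-after-p "&amp;"))
        url
        (add-get-param-to-url url name value))))

* (add-get-param-unless-present "index.html?foo=1" "foo" "2")
"index.html?foo=1"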

[Function]
url-encode string => url-encoded-string


This function will URL-encode the string string. Note that this function uses your Lisp's CHAR-CODE function to determine the hex value for characters which need to be encoded and that it further assumes that all character codes of the characters comprising string are between 0 and 255. If you need support for other characters you'll have to write your own function. (See A.J. Flavell's entertaining article "FORM submission and i18n" for more info on this topic.)
* (url-encode "Fête Sørensen naïve Hühner Straße")
"F%EAte+S%F8rensen+na%EFve+H%FChner+Stra%DFe"

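If you do have to write your own function for characters beyond code 255, one common approach is to percent-encode the UTF-8 octets of each character. The following self-contained sketch does that. It is not part of URL-REWRITE; the function names are made up, the set of characters left unencoded (ASCII letters, digits, "-", "_", ".") is an assumption, and it relies on CHAR-CODE returning Unicode code points.

(defun utf-8-octets (code)
  "Returns the list of UTF-8 octets for the code point CODE."
  (cond ((< code #x80)
         (list code))
        ((< code #x800)
         (list (logior #xC0 (ash code -6))
               (logior #x80 (logand code #x3F))))
        ((< code #x10000)
         (list (logior #xE0 (ash code -12))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))
        (t
         (list (logior #xF0 (ash code -18))
               (logior #x80 (logand (ash code -12) #x3F))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))))

(defun url-encode-utf-8 (string)
  "URL-encodes STRING using the UTF-8 octets of each character."
  (with-output-to-string (out)
    (loop for char across string
          for code = (char-code char)
          do (cond ((char= char #\Space)
                    ;; spaces become plus signs
                    (write-char #\+ out))
                   ((or (and (< code 128) (alphanumericp char))
                        (find char "-_."))
                    ;; safe characters are copied verbatim
                    (write-char char out))
                   (t
                    ;; everything else as %XX, one per octet
                    (dolist (octet (utf-8-octets code))
                      (format out "%~2,'0X" octet)))))))

* (url-encode-utf-8 "Straße")
"Stra%C3%9Fe"

Whether the receiving side expects UTF-8 is a separate question, of course.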
 

Known bugs and problems

URL-REWRITE aims to yield correct results for correct (X)HTML input. It also tries hard to never signal an error, although it may warn if it encounters syntax errors. It will not detect every possible error, nor is there any warranty that it will work correctly with faulty input.
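
If such warnings are unwanted, they can presumably be muffled with the standard condition system. This sketch assumes that URL-REWRITE signals them with the standard function WARN; the function name is made up.

(defun rewrite-urls-quietly (rewrite-fn)
  "Calls REWRITE-URLS with all warnings muffled.  (Hypothetical
helper, not part of URL-REWRITE.)"
  ;; MUFFLE-WARNING invokes the restart established by WARN
  (handler-bind ((warning #'muffle-warning))
    (rewrite-urls rewrite-fn)))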
 

Acknowledgements

Thanks to Jeff Caldwell for his suggestion (and initial code) to rewrite FORM tags without an ACTION attribute.

$Header: /usr/local/cvsrep/url-rewrite/doc/index.html,v 1.10 2006/08/28 19:09:28 edi Exp $
