Our survey of other work in this area has led us to formulate five desiderata for an adaptive web site.
We took the above desiderata as constraints on our approach to creating adaptive web sites. We use transformation rather than customization, both to avoid confronting visitors with questionnaires and to share site improvements across a wide range of visitors. We focus on an access-based approach, as automatic understanding of free text is difficult. We do not assume any annotations on web pages beyond HTML. For safety, we limit ourselves to nondestructive transformations: changes to the site that leave existing structure intact. We may add links but not remove them, create pages but not destroy them, add new structures but not scramble existing ones. Finally, we restrict ourselves to generating candidate adaptations and presenting them to the human webmaster -- any non-trivial changes to the web site are under webmaster control.
24hrlab-214.sfsu.edu - - [21/Nov/1996:00:01:05 -0800] "GET /home/jones/collectors.html HTTP/1.0" 200 13119
24hrlab-214.sfsu.edu - - [21/Nov/1996:00:01:06 -0800] "GET /home/jones/madewithmac.gif HTTP/1.0" 200 855
cs106-14.u.washington.edu - - [21/Nov/1996:00:01:06 -0800] "GET /home/chinn/ HTTP/1.0" 200 1896
24hrlab-214.sfsu.edu - - [21/Nov/1996:00:01:06 -0800] "GET /home/jones/gustop2.gif HTTP/1.0" 200 25460
x67-122.ejack.umn.edu - - [21/Nov/1996:00:01:08 -0800] "GET /home/rich/aircrafts.html HTTP/1.0" 404 617
x67-122.ejack.umn.edu - - [21/Nov/1996:00:01:08 -0800] "GET /general/info.gif HTTP/1.0" 200 331
220.127.116.11 - - [21/Nov/1996:00:01:09 -0800] "GET /home/smith/kitty.html HTTP/1.0" 200 5160
24hrlab-214.sfsu.edu - - [21/Nov/1996:00:01:10 -0800] "GET /home/jones/thumbnails/awing-bo.gif HTTP/1.0" 200 5117
The main source of information we rely on is the site's web server log, which records the pages visited by a user at the site. Our underlying intuition is what we call the visit-coherence assumption: the pages a user visits during one interaction with the site tend to be conceptually related. We do not assume that all pages in a single visit are related. After all, the information we glean from individual visits is noisy; for example, a visitor may pursue multiple distinct tasks in a single visit. However, if a large number of visitors continue to visit and re-visit the same set of pages, that provides strong evidence that the pages in the set are related. Thus, we accumulate statistics over many visits by numerous visitors and search for overall trends.
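As a minimal sketch of how such statistics might be gathered from a log in the format shown above, the following Python groups requests into visits and counts how often two pages occur in the same visit. The 30-minute inactivity cutoff, the rule that only successful non-image requests count as page views, and all function names are our assumptions for illustration; none of them is specified in the text.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from itertools import combinations

# Common Log Format: host ident user [timestamp] "METHOD path PROTO" status bytes
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Assumption: a gap longer than this between requests starts a new visit.
VISIT_GAP = timedelta(minutes=30)


def parse_line(line):
    """Return (host, timestamp, path, status), or None if the line is malformed."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("host"), ts, m.group("path"), int(m.group("status"))


def is_page(path, status):
    """Assumption: only successful, non-image requests count as page views."""
    images = (".gif", ".jpg", ".jpeg", ".png")
    return status == 200 and not path.lower().endswith(images)


def split_into_visits(lines):
    """Group page views by host, then cut each host's stream at long gaps."""
    by_host = defaultdict(list)
    for line in lines:
        rec = parse_line(line)
        if rec and is_page(rec[2], rec[3]):
            by_host[rec[0]].append((rec[1], rec[2]))
    visits = []
    for hits in by_host.values():
        hits.sort()
        current = [hits[0][1]]
        for (prev_ts, _), (ts, path) in zip(hits, hits[1:]):
            if ts - prev_ts > VISIT_GAP:
                visits.append(current)
                current = []
            current.append(path)
        visits.append(current)
    return visits


def cooccurrence_counts(visits):
    """Count, for each unordered pair of pages, the visits containing both."""
    counts = Counter()
    for visit in visits:
        for a, b in combinations(sorted(set(visit)), 2):
            counts[(a, b)] += 1
    return counts
```

Identifying a visitor by host name alone is a simplification, since proxies and shared machines will merge several people into one stream. Because the counts are summed over many visits, occasional mis-grouped visits should matter less than they would for any single visit.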
It is not difficult to devise a number of simple, nondestructive transformations that could improve a site; we describe several elsewhere. Examples include highlighting popular links, promoting popular links to the top of a page or to the site's front page, and linking together pages that seem to be related. We have implemented one such transformation: shortcutting, in which we attempt to provide links on each page to visitors' eventual goals, thus skipping the in-between pages. We found that a significant number of visitors used these automatic shortcuts.
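The idea behind shortcutting can be illustrated with a small sketch. It is not our implementation. It assumes that visits are already available as ordered lists of page paths and that the site's existing links are given as a hypothetical `links` mapping from each page to the set of pages it links to. As a further simplification, every page reached later in a visit is treated as a possible goal.

```python
from collections import Counter


def shortcut_candidates(visits, links, min_count=2):
    """Propose (source, goal) links that visitors had to reach indirectly.

    visits: list of visits, each an ordered list of page paths.
    links: dict mapping a page to the set of pages it already links to
           (hypothetical input; the text does not say how links are stored).
    min_count: assumed threshold on the number of supporting visits.
    """
    counts = Counter()
    for visit in visits:
        seen = set()  # count each (source, goal) pair at most once per visit
        for i, src in enumerate(visit):
            for goal in visit[i + 1:]:
                pair = (src, goal)
                if goal == src or goal in links.get(src, set()) or pair in seen:
                    continue
                seen.add(pair)
                counts[pair] += 1
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]
```

The output is a ranked list of candidate links rather than a change to the site, in keeping with the restriction that adaptations are proposed to the webmaster and only add links.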
However, our long-term goal is to demonstrate that more fundamental adaptations are feasible. One example is a change in view, in which a site offers an alternative organization of its contents based on user access patterns. Consider the Music Machines web site, our primary testbed: one of the authors maintains it, so we have full access to all of its documents and access logs. Music Machines is devoted to information about various kinds of electronic musical instruments. Most of the data at the site is organized by the manufacturer of the instrument and the particular model number. That is, there is a page for the manufacturer Roland and, on that page, links to pages for each instrument Roland produces. However, imagine a visitor to the site who is interested in a comprehensive overview of all the keyboards available from various manufacturers. She would first have to visit the Roland page and look at each of the Roland keyboards, then visit each of the other keyboard manufacturers for its offerings as well. Suppose the site repeatedly observed this kind of behavior and automatically created a new web page containing links to all the keyboards. Our visitor would then need to visit only this new page rather than search for each keyboard. This page represents a change in view, from the former ``manufacturer-centric'' organization to one based on type of instrument. If we can discover these user access patterns and create new web pages to facilitate them, we should in theory be able to create new views of the site.
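One naive way to turn access patterns into a candidate page is sketched below; it is an illustration under stated assumptions, not the method we propose. It takes pair counts of pages seen in the same visit (a hypothetical `cooccur` mapping), keeps pairs that meet an assumed threshold, groups pages by connected components of the resulting graph, and renders each group as a plain HTML list of links for the webmaster to review. The page paths in any example input are invented.

```python
import html
from collections import defaultdict


def candidate_page_sets(cooccur, min_count=2, min_size=3):
    """Group pages linked by frequent co-occurrence (connected components).

    cooccur: dict mapping (page_a, page_b) to the number of visits with both.
    min_count and min_size are assumed thresholds, chosen for illustration.
    """
    graph = defaultdict(set)
    for (a, b), n in cooccur.items():
        if n >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over one component
            page = stack.pop()
            if page in component:
                continue
            component.add(page)
            stack.extend(graph[page] - component)
        seen |= component
        if len(component) >= min_size:
            groups.append(sorted(component))
    return groups


def render_index(title, pages):
    """Render a candidate index page as a heading plus a list of links."""
    items = "\n".join(
        f'  <li><a href="{html.escape(p, quote=True)}">{html.escape(p)}</a></li>'
        for p in pages
    )
    return f"<h1>{html.escape(title)}</h1>\n<ul>\n{items}\n</ul>"
```

Connected components are a blunt grouping: a single frequently co-visited page can chain unrelated pages together, so a practical system would likely need a stricter notion of relatedness. The rendered page is only a candidate; whether it is added to the site remains under webmaster control.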