Do User (Browse and Click) Sessions Relate to Their Questions in a Domain-Specific Collection?
Journal article, Peer reviewed
Postprint version of the published article. The original article is available at www.springerlink.com
Date
2013
Original version
Steinhauer, J., Delcambre, L. M., Lykke, M., & Ådland, M. K. (2013). Do User (Browse and Click) Sessions Relate to Their Questions in a Domain-Specific Collection? In Research and Advanced Technology for Digital Libraries (pp. 96-107). Springer Berlin Heidelberg. http://dx.doi.org/10.1007/978-3-642-40501-3_10
Abstract
We seek to improve information retrieval in a domain-specific collection
by clustering user sessions from a click log and then classifying later user sessions in
real-time. As a preliminary step, we explore the main assumption of this approach:
whether user sessions on such a site are related to the questions that the users are trying to answer.
Since a large class of machine learning algorithms uses a distance measure at its core,
we evaluate the suitability of common machine learning distance measures to distinguish
sessions of users searching for the answer to the same or different questions. We
found that two distance measures work very well for our task and three others do not.
As a further step, we then investigate how effective the distance measures are when
used in clustering. To build our dataset, we conducted a user study in which multiple
users answered the same set of questions. This data, grouped by question, was used as
our gold standard for evaluating the clusters produced by the clustering algorithms.
We found that the observed difference between the two classes of distance measures
affected the quality of the clusterings, as expected. We also found that one of the two
distance measures that worked well to differentiate sessions worked significantly
better than the other when clustering. Finally, we discuss why some distance measures
performed better than others in the two parts of our work.
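
The first evaluation described above, checking whether a distance measure separates sessions that answer the same question from sessions that answer different ones, can be illustrated with a minimal sketch. The abstract does not name the five distance measures studied, so the Jaccard distance used here is an assumption chosen only for illustration, as are the function names, the representation of a session as a list of visited pages, and the toy data.

```python
from itertools import combinations
from statistics import mean


def jaccard_distance(a, b):
    """Jaccard distance between two sessions, each treated as a set of pages.

    Assumption: this measure is illustrative; it is not stated to be one of
    the measures evaluated in the article.
    """
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sessions are treated as identical
    return 1.0 - len(a & b) / len(union)


def mean_distances(sessions):
    """Return (mean same-question distance, mean different-question distance).

    `sessions` is a list of (question_id, pages) pairs.
    """
    same, different = [], []
    for (q1, s1), (q2, s2) in combinations(sessions, 2):
        target = same if q1 == q2 else different
        target.append(jaccard_distance(s1, s2))
    return mean(same), mean(different)


# Hypothetical toy click sessions, labelled with the question being answered.
sessions = [
    ("q1", ["home", "drugs", "aspirin"]),
    ("q1", ["home", "search", "aspirin"]),
    ("q2", ["home", "diseases", "asthma"]),
    ("q2", ["diseases", "asthma", "treatment"]),
]

same, different = mean_distances(sessions)
print(f"same-question mean distance:      {same:.2f}")
print(f"different-question mean distance: {different:.2f}")
```

A measure that suits the task should give a clearly smaller mean distance for same-question pairs than for different-question pairs, which is the kind of separation the abstract reports for two of the five measures.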