
Towards exploratory faceted search systems





Ksikes, Alex 


In this thesis, we cover what we believe would be the main ingredients of an exploratory search system (ESS). In a nutshell, these are textual queries, facets, visual results, social search and query-by-example. The goal of the thesis is to show how all of these elements could readily be integrated into a typical faceted search system that users are already accustomed to. In this respect, we propose that the future of exploratory search might be a traditional faceted search system, but with the added ingredients of information visualizations and query-by-example.

To illustrate our ideas, we have built two freely available web applications. The first one, Biomed Search, has been positively received by the community and offers some novel characteristics. First, in order to improve both precision and recall, Biomed Search indexes not only the caption of each image but also the text that refers to the image. Second, the interface uses the common pattern of zooming in on a particular search result in order to display more information. User feedback on Biomed Search has pointed towards faceted search, visual search results and query-by-example as the next steps.
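The idea of indexing an image by its caption together with the text that refers to it can be sketched as follows. This is only an illustration under stated assumptions, not the Biomed Search implementation: the function name `build_image_document`, the field names, the naive sentence splitting, and the restriction to references spelled "Figure N" are all hypothetical choices made for the example.

```python
import re


def build_image_document(figure_number, caption, body_text):
    """Collect the caption and the sentences that mention the figure.

    Hypothetical helper: only references written as "Figure N"
    (case-insensitive) are detected, and sentences are split naively
    on terminal punctuation followed by whitespace.
    """
    sentences = re.split(r"(?<=[.!?])\s+", body_text.strip())
    # The trailing \b keeps "Figure 2" from matching "Figure 21".
    pattern = re.compile(r"\bFigure\s*%d\b" % figure_number, re.IGNORECASE)
    referring = [s for s in sentences if pattern.search(s)]
    return {
        "caption": caption,
        "referring_text": " ".join(referring),
    }


body = (
    "Cells were stained as described. "
    "Figure 2 shows the stained nuclei. "
    "Figure 21 is unrelated. "
    "As seen in figure 2, nuclei cluster."
)
doc = build_image_document(2, "Stained nuclei at 40x.", body)
print(doc["referring_text"])
```

A search engine could then index both fields, so that a query can match an image through words that appear only in the surrounding discussion and not in the caption itself.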

The second system, Cloud Mining, is an attempt at implementing the vision set forth in this thesis. The system is a framework used to instantiate ESSs. It offers the novel characteristics of facet views as well as searches based on multiple example items combined with textual queries. Cloud Mining paves the way to a completely pluggable search framework, in which every component would be driven by a community of users. The system was tested on large publicly available datasets and all its software components are available under an open source license.

The main contributions of this thesis come as lessons learned, suggestions or recommendations as to how to extend the current paradigm of faceted search into that of exploratory search. The search results and facets should be extended with different views. Query-by-example should be integrated with Bayesian Sets, as this reduces the handling of complex content-based searches to choosing the right plugin. Finally, the system should be thought of as a framework to instantiate ESSs, in which every component is a community-driven plugin. These custom-tailored tools, when applied to a dataset of interest, could offer a collective intelligence approach to information overload.
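To make the query-by-example idea concrete, here is a minimal sketch of Bayesian Sets scoring for items described by binary features, using the Beta-Bernoulli model in which the log score is linear in the item's features. It is an illustration under assumptions, not the Cloud Mining code: the function names, the toy data, the prior scale `c=2.0` and the clamping constant `eps` are choices made for this example.

```python
import math


def bayesian_sets_scores(items, query_ids, c=2.0, eps=1e-6):
    """Log Bayesian Sets score of every item given a set of example items.

    items: dict mapping item id -> set of binary features it has.
    query_ids: ids of the example items (the query set).
    Prior per feature: Beta(c * mean, c * (1 - mean)), where mean is the
    empirical frequency of the feature (an assumption of this sketch).
    """
    n_items = len(items)
    n_query = len(query_ids)
    features = set().union(*items.values())

    const = 0.0
    weights = {}
    for f in features:
        count = sum(1 for feats in items.values() if f in feats)
        # Clamp the mean so that both prior parameters stay positive.
        mean = min(max(count / n_items, eps), 1.0 - eps)
        alpha, beta = c * mean, c * (1.0 - mean)
        s = sum(1 for q in query_ids if f in items[q])
        alpha_post = alpha + s
        beta_post = beta + n_query - s
        const += (math.log(alpha + beta) - math.log(alpha + beta + n_query)
                  + math.log(beta_post) - math.log(beta))
        weights[f] = (math.log(alpha_post) - math.log(alpha)
                      - math.log(beta_post) + math.log(beta))

    # The log score is a constant plus the weights of the features present.
    return {i: const + sum(weights[f] for f in feats)
            for i, feats in items.items()}


def rank(items, query_ids):
    """Ids of the non-query items, best match first."""
    scores = bayesian_sets_scores(items, query_ids)
    candidates = [i for i in items if i not in query_ids]
    return sorted(candidates, key=lambda i: scores[i], reverse=True)


items = {
    "a": {"protein", "cell"},
    "b": {"protein", "cell", "gene"},
    "c": {"protein", "gene"},
    "d": {"galaxy", "star"},
    "e": {"star", "planet"},
}
print(rank(items, ["a", "b"]))
```

Because the score is linear in the features, supporting a new kind of content amounts to supplying a different feature extractor, which is one way to read the claim that content-based search reduces to choosing the right plugin.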






Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge