Some of my recent work has taken me into the realm of web search technology. In particular I've been thinking about the relationship between regular 'Google' style search and social tagging on tools such as Delicious and Scuttle. This post outlines some of my thinking about how these two search services can start to merge into a single complementary service, comparing the two technologies and taking a deeper dive into some of the less obvious strengths of tag-based searching.
Let's start by examining some key facts about traditional and social search.
Traditional Search
| Positives | Negatives |
|---|---|
| Searches entire content of a page | Index entries have no cognitive input at time of creation |
| Searches linked content, not just a single page or object | Content is indexed regardless of its degree of relevance |
Social Search
| Positives | Negatives |
|---|---|
| Only indexes content which is cognitively perceived to be valuable to one or more persons | Creation of each uniquely identified resource in the search index requires an investment of time and energy from one or more persons |
| Tags used to describe information provide a cognitive bias to each index entry | The body of content indexed as a whole reflects a bias toward the socio-political status quo over time (this may also be perceived as a positive) |
Given the differences between these two types of search, it's not difficult to see why they co-exist happily side by side (at least for the present anyway): each offers capabilities to users and to site/resource owners which the other does not. What is less obvious is how to bring them together in a one-stop shop search service. Current offerings on the web tend either to focus on one search method or the other, or to build a site on a mixture of the two without any real integration (i.e. just a basic dump of the top search results for each type of search). Fortunately there is light on the horizon, as there is a great deal of potential for greater integration going forward. This comes from the close relationship between the process of performing a search and choosing a result, and the process of tagging a resource for inclusion in a social search database. Let's take a closer look at these two user-oriented methods of defining relationships:
- Social search - During a Delicious-style tagging process, a user enters a series of keywords which describe a resource. This creates a relationship between the words entered which lives within the context of the link/resource subject itself. In other words, if I create a new bookmark for eBay with the tag words 'auction', 'online', 'ecommerce', 'bargain', 'international', 'sell', and 'search', I am in effect creating a relationship between these words in the context of the subject 'eBay'. Additionally, as more people bookmark eBay, the closest and most universally relevant of these relationships are reinforced, which means the relationships are constantly being refined. Furthermore, and most importantly for our current train of thought, as different resources are tagged en masse it becomes possible to start creating relationships between tags across different resources based on common tag words, a kind of user-generated thesaurus.
- Traditional search - The generation of relationships here is similar to that for social search, and it also applies to searches made within a social search service. During the search process, a user enters words relating to the site/resource they wish to find. The user will then typically click on one or more links returned by the search query, and this information can be, and in the case of traditional search usually is, captured by the search engine. Here we have the reverse of tagging: the user puts the tags in first and then links them to the resource afterwards. Although this may be a little hit and miss initially, over time it should be possible to get a high degree of reliability by comparing a large number of searches for the same, or a similar, combination of words and analysing the most popular resources chosen from the results.
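To make the symmetry between the two processes a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function names, the event format (a resource paired with a list of words) and the sample data are all hypothetical, not taken from Delicious, Scuttle or any real search engine. It shows that a tagging event and a search-then-click event can be stored in the same shape, and how tags that appear together start to form the 'thesaurus' described above.

```python
from collections import Counter, defaultdict
from itertools import combinations


def tag_cooccurrence(bookmarks):
    """Count how often two tags appear together on the same bookmark."""
    pairs = Counter()
    for _resource, tags in bookmarks:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


def word_resource_counts(events):
    """Count word -> resource links from tagging or search-click events."""
    counts = defaultdict(Counter)
    for resource, words in events:
        for word in set(words):
            counts[word][resource] += 1
    return counts


# Hypothetical tagging events: (resource, tags entered by one user).
bookmarks = [
    ("ebay.com", ["auction", "online", "bargain"]),
    ("ebay.com", ["auction", "sell", "online"]),
    ("etsy.com", ["craft", "online", "sell"]),
]

# Hypothetical search events: (resource clicked, words searched for).
clicks = [
    ("ebay.com", ["online", "auction"]),
    ("ebay.com", ["cheap", "auction"]),
]

cooccurrence = tag_cooccurrence(bookmarks)
social = word_resource_counts(bookmarks)
searched = word_resource_counts(clicks)
```

Because both kinds of event reduce to 'these words were linked to this resource', the same counting function serves for both, which is the property the integration ideas below rely on.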
Ok, so there are some synergies, but how can these be leveraged to create a more integrated search service across traditional and social search? There are two possibilities here...
Integrating the results into a single unified list
Given that we effectively have the same process for creating relationships operating in both traditional search and social search (albeit with the process steps in reverse for traditional search), we have a synergy which should be easy to leverage. By combining the social tag data-set with the search and preferred results data-set (both of which relate words to resources) we can create a single unified data-set. An issue presents itself here in that one data-set will likely have a significantly different number of entries than the other. However, this can be overcome by creating an aggregation of the data which represents it in a qualitative rather than quantitative way (i.e. by using ratios to represent relationship strength in each case rather than absolute volumes based on frequency of usage).
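As a rough illustration of the ratio idea, the sketch below converts raw counts into shares per word before blending the two sources. Everything in it is an assumption on my part: the function names, the `social_weight` parameter (an invented tuning knob) and the sample numbers are hypothetical, and a simple weighted average is just one of several ways the blend could be done.

```python
def to_ratios(counts):
    """Convert raw counts per word into shares that sum to 1 for that word."""
    ratios = {}
    for word, per_resource in counts.items():
        total = sum(per_resource.values())
        ratios[word] = {r: n / total for r, n in per_resource.items()}
    return ratios


def merge(social, searched, social_weight=0.5):
    """Blend two ratio tables; social_weight is an assumed tuning knob."""
    merged = {}
    for word in set(social) | set(searched):
        s = social.get(word, {})
        t = searched.get(word, {})
        if not s or not t:
            # Word known to only one source: keep that source's ratios as-is.
            merged[word] = dict(s or t)
            continue
        merged[word] = {
            r: social_weight * s.get(r, 0.0)
            + (1 - social_weight) * t.get(r, 0.0)
            for r in set(s) | set(t)
        }
    return merged


# Hypothetical counts: the search log is far larger than the tag data.
social_counts = {"auction": {"ebay.com": 30, "sothebys.com": 10}}
search_counts = {"auction": {"ebay.com": 9000, "sothebys.com": 1000}}

unified = merge(to_ratios(social_counts), to_ratios(search_counts))
```

Here the social data gives eBay a 0.75 share of 'auction' and the search data gives it 0.9, so the 10,000 search events do not drown out the 40 tagging events; with equal weights the blended strength comes out at roughly 0.825.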
A further, more serious issue with this approach is that new sites which have not yet been actively chosen as the result of a traditional search will not appear in the results, as the relationship has not yet been formed (social search does not suffer from this problem). A method of incorporating these sites, a kind of process to nurture the new, is therefore required.
Using the unified data-set to return search refinement possibilities to the user
Many searches don't point the user in the desired direction in a single iteration. In a typical search process (e.g. using Google) the user is required to figure out the correct words to use to successfully refine the search and remove irrelevant results. Using the unified data-set, it is now possible to send back a tag cloud of the most heavily related words, which the user can then select from to refine the search. This creates a faster and more precise search refinement process, and allows a user to navigate to weaker relationships much more quickly if desired through a continual process of suggestive refinement.
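The refinement step could be sketched along the following lines. This is a hypothetical illustration only: the `refinement_suggestions` function, its scoring rule (multiplying relationship strengths over shared resources) and the sample data are my own assumptions rather than a description of any existing service.

```python
def refinement_suggestions(unified, query_words, limit=5):
    """Rank other words by how strongly they share resources with the query."""
    # Keep only resources linked to every query word, scored by weakest link.
    candidates = None
    for word in query_words:
        links = unified.get(word, {})
        if candidates is None:
            candidates = dict(links)
        else:
            candidates = {
                r: min(score, links[r])
                for r, score in candidates.items()
                if r in links
            }
    candidates = candidates or {}

    scores = {}
    for word, links in unified.items():
        if word in query_words:
            continue
        score = sum(
            strength * candidates[r]
            for r, strength in links.items()
            if r in candidates
        )
        if score > 0:
            scores[word] = score
    # Strongest first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


# Hypothetical unified data-set: word -> {resource: relationship strength}.
unified = {
    "auction": {"ebay.com": 0.8, "sothebys.com": 0.2},
    "bargain": {"ebay.com": 0.6, "groupon.com": 0.4},
    "art": {"sothebys.com": 1.0},
    "coupon": {"groupon.com": 1.0},
}

cloud = refinement_suggestions(unified, ["auction"])
```

With this data a search for 'auction' suggests 'bargain' first and 'art' second, while 'coupon' is left out because it shares no resource with the query. Picking 'art' from the cloud and calling the function again with both words is the 'continual process of suggestive refinement' in miniature.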
These are conceptual ideas based on my own view of the strength of the semantic word relationships represented. As such, I warmly welcome discussion and feedback which will help to develop these ideas.