The term taxonomy crept into the search lexicon by stealth and is now firmly entrenched. The very early search engines, circa 1972-73, presented searchers with the retrieval option of selecting content using controlled vocabularies from a standardized thesaurus of terminology in a particular discipline. With no neat graphical navigation tools, searches were crafted on a typewriter-like device, painfully typed in an arcane syntax. A stray hyphen, period or space would render the query un-computable, so after deciphering the error message, the searcher would try again. Each minute and each result cost money, so errors were a real expense.

We entered the Web search era bundling content into a directory structure, like the “Yellow Pages,” or organizing query results into “folders” labeled with broad topics. The controlled vocabulary that represented directory topics or folder labels became known as a taxonomic structure, with the early ones at NorthernLight and Yahoo crafted by experts with knowledge of the rules of controlled vocabulary, thesaurus development and maintenance. Google derailed that search model with its simple “search box” requiring only a word or phrase to grab heaps of results. Today we are in a new era. Some people like searching by typing keywords in a box, while others prefer the suggestions of a directory or tree structure. Building taxonomic structures is now serious business beyond e-commerce sites, particularly for search within enterprises, where many employees prefer to navigate through the terminology to browse and discover the full scope of what is there.

Navigation is but one purpose for taxonomies in search. Depending on the application domain, the richness of the subject matter, and the scope and depth of topics, these lists can become quite large and complex. The more cross-references (e.g., cell phones USE wireless phones) are embedded in the list, the more likely the searcher’s preferred term will be present. There is a diminishing return, however: if the user has to navigate to a system’s preferred term too often, the entire process of searching becomes unwieldy and is abandoned. On the other hand, if the system automates the smooth transition from one term to another, the richness and complexity of a taxonomy can be an asset.
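
To make the automated transition concrete, here is a minimal sketch in Python of a USE cross-reference lookup. It assumes the taxonomy is held as a simple in-memory mapping; the names USE_REFERENCES, PREFERRED_TERMS and resolve_term are hypothetical, and every term other than the cell phones example is invented for illustration.

```python
from typing import Optional

# Hypothetical USE cross-references: non-preferred term -> preferred term.
# Only "cell phones USE wireless phones" comes from the text above.
USE_REFERENCES = {
    "cell phones": "wireless phones",
    "mobile phones": "wireless phones",
    "cellular phones": "wireless phones",
}

# Hypothetical set of terms the system treats as preferred.
PREFERRED_TERMS = {"wireless phones", "landline phones"}


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so lookups are forgiving."""
    return " ".join(term.lower().split())


def resolve_term(term: str) -> Optional[str]:
    """Return the preferred term for a searcher's entry, or None if unknown."""
    key = normalize(term)
    if key in PREFERRED_TERMS:
        return key
    return USE_REFERENCES.get(key)
```

With a lookup like this, a searcher who types "Cell Phones" is carried to "wireless phones" without ever seeing the cross-reference, while a term that is absent from the list comes back as None so the system can fall back to plain keyword search.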

In more sophisticated applications of taxonomies, the thesaurus model of relationships becomes a necessity. When a search engine has embedded algorithms that can interpret explicit term relationships, it indexes content according to a taxonomy and all its cross-references. Here the taxonomy informs the indexing engine. This use requires substantial maintenance and governance of a much more granular nature than navigation does. To work well, a large corpus of terminology needs to be built to ensure that what the content says and means matches what the searcher expects in the results. If a search returns unsatisfactory results due to a poor taxonomy, trust in the search system fails rapidly and the benefits of whatever effort was put into building the taxonomy are lost.
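
As a rough illustration of a taxonomy informing the index, the Python sketch below files each document under a preferred term whenever its text contains that term or one of its cross-referenced variants. It is a toy under stated assumptions, not how any particular search engine works: TAXONOMY, build_index and the sample terms are hypothetical, and matching is plain whole-phrase matching with no stemming or disambiguation.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Hypothetical taxonomy: preferred term -> non-preferred variants (USE references).
TAXONOMY = {
    "wireless phones": ["cell phones", "mobile phones"],
    "batteries": ["power cells"],
}


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each preferred term to the ids of documents that mention it or a variant."""
    index = defaultdict(set)
    for preferred, variants in TAXONOMY.items():
        # One case-insensitive pattern per concept, matching whole phrases only.
        alternatives = "|".join(re.escape(t) for t in [preferred, *variants])
        pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
        for doc_id, text in documents.items():
            if pattern.search(text):
                index[preferred].add(doc_id)
    return dict(index)
```

A document that says only "mobile phones" is then retrievable under "wireless phones". The sketch also hints at the maintenance burden: every variant missing from the list is content the index silently fails to find.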

I bring this up because clarifying the intent of any taxonomy is the first step in deciding whether to start building one. Either model is an ongoing commitment, but the latter is a much larger investment in sophisticated human resources. The conditions that must be met for any taxonomy to succeed must be articulated when selling the project and its value proposition.