This blog entry on the “Taxonomy Watch” website prompts me to correct the impression that I side with the naysayers who say that taxonomies take too much time and effort to be valuable. Nothing could be further from the truth. I believe in taxonomies and have always been heavily invested in them, because I am convinced that pre-processing enterprise-generated content into meaningfully organized results brings large returns in time savings for a searcher. Otherwise, s/he needs to invest personally in the laborious post-processing activity of sifting through and rejecting piles of non-relevant content. Consider that categorizing content well, and only once, brings benefit repeatedly to all who search an enterprise corpus.
Prime assets of enterprises are people and their knowledge; the resulting captured information can be leveraged as knowledge assets (KA). However, there is a serious problem “herding” KA into a form that yields usable knowledge. Bringing content into a focus that is meaningful to a diverse but specialized audience of users, even within a limited company domain, is tough because the language of the content is so messy.
So, what does this have to do with taxonomies and enterprise search, and how they factor into leveraging KA? Taxonomies have a role as a device to promote and secure the meaningful retrievability of content when we need it most or fastest, just-in-time retrieval. If no taxonomies exist to pre-collocate and contextualize content for an audience, we will be perpetually stuck in a mode of having to do individual human filtering of excessive search results that come from “keyword” queries. If we don’t begin with taxonomies for helping search engines categorize content, we will certainly never get to the holy grail of semantic search. We need every device we can create and sustain to make information more findable and understandable; we just don’t have time to both filter and read, comprehensively, everything a keyword search throws our way to gain the knowledge we need to do our jobs.
Experts recognize that organizing content with pre-defined terminology (aka controlled vocabularies) that can be easily displayed in an expandable taxonomic structure is a useful aid for a certain type of searcher. The audience for navigated search is one that appreciates the clustering of search results into groups that are easily understood. They find value in being able to move easily from broad concepts to narrower ones. They especially like it when the categories and terminology are a close match to the way they view a domain of content in which they are subject experts. It shows respect for their subject area and gives them a level of trust that those maintaining the repository know what they need.
Taxonomies, when properly employed, serve triple duty. Exposing them to search engines that are capable of categorizing content puts them into play as training data. Setting them up within content management systems provides a control mechanism and validation table for human-assigned metadata. Finally, when used in a navigated search environment, they provide a visual map of the content landscape.
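To make two of those duties concrete, here is a minimal sketch in Python of a taxonomy used as a validation table for human-assigned tags and as a broad-to-narrow navigation path. Everything in it is an assumption made for illustration: the `Term` class, the function names, and the tiny finance vocabulary are hypothetical and do not come from any particular content management system or search product.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Term:
    """One node in a hypothetical taxonomy tree."""
    label: str
    children: List["Term"] = field(default_factory=list)


def all_labels(term: Term) -> Set[str]:
    """Collect every label in the tree: the controlled vocabulary."""
    labels = {term.label}
    for child in term.children:
        labels |= all_labels(child)
    return labels


def validate_tags(tags: List[str], root: Term) -> Tuple[List[str], List[str]]:
    """Split human-assigned tags into (accepted, rejected) against the vocabulary."""
    vocab = {label.lower() for label in all_labels(root)}
    accepted = [t for t in tags if t.lower() in vocab]
    rejected = [t for t in tags if t.lower() not in vocab]
    return accepted, rejected


def path_to(term: Term, label: str, trail: Tuple[str, ...] = ()) -> Optional[List[str]]:
    """Return the broad-to-narrow path leading to a label, or None if absent."""
    trail = trail + (term.label,)
    if term.label.lower() == label.lower():
        return list(trail)
    for child in term.children:
        found = path_to(child, label, trail)
        if found:
            return found
    return None


# Illustrative vocabulary only; a real taxonomy would be far larger.
ROOT = Term("Finance", [
    Term("Reporting", [Term("XBRL"), Term("Annual Reports")]),
    Term("Auditing"),
])

accepted, rejected = validate_tags(["xbrl", "Blockchain"], ROOT)
print(accepted, rejected)
print(path_to(ROOT, "XBRL"))
```

The same tree serves both purposes: the rejected list is what a content management system could bounce back to the person tagging, and the path is what a navigated search interface could display as the trail from a broad category to a narrow one.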
U.S. businesses are woefully behind in “getting it”; they need to invest in search and the surrounding infrastructure that supports search. Comments from a recent meeting I attended reflected the belief that the rest of the world is far ahead in this respect. As if to highlight this fact, a colleague forwarded this news item yesterday: “On February 13, 2008, the XBRL-based financial listed company taxonomy formulated by the Shanghai Stock Exchange (SSE) was ‘Acknowledged’ by the XBRL International. The acknowledgment information has been released on the official website of the XBRL International (http://www.xbrl.org/FRTaxonomies/)…”
So, let’s get on with selling the basic business case for taxonomies in the enterprise to ensure that the best of our knowledge assets will be truly findable when we need them.
This week’s thoughts come from the stack of serendipitous reading that routinely piles up on my desk. In this case a short article in Information Week caught my eye because it featured the husband of a former neighbor, Ken Krugler, co-founder of Krugle. I’d set it aside because a fellow in my knowledge management forum group, David Eddy, keeps telling us that we need tools to facilitate searching for old but still useful source code. To do that, he believes, we need an investment in semantic search tools that normalize the voluminous language variants scattered throughout source code. That would enable programmers to find code that could be re-purposed in new applications.
Now, I have taken the position that source code is just one kind of intellectual property (IP) asset that is wasted, abandoned, and warehoused for the technology archaeologists of centuries hence. I just don’t see a solid business case being made for developing tools that would serve as a semantic search engine for proprietary treasure troves of code.
Enter old acquaintance Ken Krugler with what seems to be, at first glance, a Web search system that might be helpful for finding useful code out on the Web, including open source. I have finally visited his Web site and I see language and new offerings that intrigue me. “Krugle Enterprise is a valuable tool for anyone involved in software development. Krugle makes software development assets easily accessible and increases the value of a company’s code base. By providing a normalized view into these assets, wherever they may be stored, Krugle delivers value to stakeholders throughout the enterprise.” They could be onto something big. This is a kind of enterprise search I haven’t really had time to think about, but maybe I will now.
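For readers wondering what “normalizing language variants” in source code might look like, here is a minimal sketch in Python. It is not a description of how Krugle or any other product works; the abbreviation table and the function names are hypothetical, chosen only to show how differently spelled identifiers could be reduced to one searchable key.

```python
import re

# Hypothetical abbreviation table; a real one would be built and
# maintained for a specific code base.
ABBREVIATIONS = {"cust": "customer", "acct": "account", "num": "number"}

# Matches, in order: an acronym followed by a capitalized word,
# a (possibly capitalized) lowercase word, a run of capitals, or digits.
TOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(name):
    """Break camelCase, snake_case, or kebab-case into lowercase tokens."""
    return [part.lower() for part in TOKEN.findall(name)]


def canonical(name):
    """Expand known abbreviations so spelling variants share one key."""
    return tuple(ABBREVIATIONS.get(tok, tok) for tok in split_identifier(name))


for variant in ("custAcctNum", "CUST_ACCT_NUM", "customer-account-number"):
    print(variant, "->", canonical(variant))
```

All three variants reduce to the same tuple, so an index keyed on it would return them together for a single query. The hard part in practice is the abbreviation table itself, which is exactly the kind of controlled vocabulary someone has to build and maintain.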
One thing leading to another, I checked out Ken Krugler’s blog and saw an earlier posting: Is Writing Your Own Search Engine Hard? This is recommended reading for anyone who even dabbles in enterprise search technology but doesn’t want to get her/his hands dirty with the mechanics. It is short, to-the-point and summarizes how and why so many variations of search are battling it out in the marketplace.
I don’t want end-users to struggle too much with the under-the-hood details, but when you are thinking about enterprise search for your organization, it is worth considering how much technology you are getting for the value you want it to deliver, year after year, as your mountains of IP content accrue. Don’t give this idea short shrift, because search is an investment that keeps giving if it is chosen appropriately for the problem you need to solve.
Called to account for the nomenclature “enterprise search,” which is my area of practice for The Gilbane Group, I will confess that the term has become as tiresome as any other category to which the marketplace gives full attention. But what is in a name, anyway? It is just a label and should not be expected to fully express every attribute it embodies. A year ago I defined it to mean any search done within the enterprise with a primary focus on internal content. “Enterprise” can be an entire organization, division, or group with a corpus of content it wants to have searched comprehensively with a single search engine.
A search engine does not need to be exclusive of all other search engines, nor must it be deployed to crawl and index every single repository in its path to be referred to as enterprise search. There are good and justifiable reasons to leave select repositories un-indexed that go beyond even the security concerns implied by the label “search behind the firewall.” I happen to believe that you can deploy enterprise search for enterprises that are quite open with their content and do not keep it behind a firewall (e.g., government agencies or not-for-profits). You may also have enterprise search deployed with one set of content for the public you serve and another for the internal audience. If the content being searched is substantively authored by the members of the organization or procured for their internal use, enterprise search engines are the appropriate class of products to consider. As you will learn from my forthcoming study, Enterprise Search Markets and Applications: Capitalizing on Emerging Demand, and that of Steve Arnold (Beyond Search), there are a great many flavors out there, so you’ll need to move down the food chain of options to get it right for the application or problem you are trying to solve.
OK! Are you convinced yet that Microsoft is pitting itself squarely against Google? The announcement of an offer to purchase Yahoo for something north of $44 billion makes the previous acquisition of FAST for $1.2 billion pale by comparison. But I want to know how this squares with IBM, which has a partnership with Yahoo in the Yahoo edition of IBM’s OmniFind. This keeps the attorneys busy. Or maybe Microsoft will buy IBM, too.
Finally, this dogfight, exposed in the Washington Post, caught my eye. Or did one of the dogs walk away with his tail between his legs? Google slams Autonomy. Now, why would they do that?
I had other plans for this week’s blog, but all the Patriots Super Bowl talk puts me in the mood for looking at other competitions. It is kind of fun.