The gradual upturn from the worst economic conditions in decades is reason for hope. A growing economy, coupled with continued adoption of enterprise software in spite of the tough economic climate, keeps me tuned to what is transpiring in this industry. Rather than being cajoled into believing that “search” has become commodity software, which it hasn’t, I want to comment on the wisdom of Jill Dyché and her Anti-predictions for 2011 in a recent Information Management Blog. There are important lessons here for enterprise search professionals, whether you have already implemented search or plan to soon.
Taking her points out of order, I offer a bit of commentary on those that relate directly to enterprise search. Based on past experience, Ms. Dyché predicts some negative outcomes, but she issues a clear challenge to readers to prove her wrong. As I note below, enterprise search offers some solutions to meet these challenges:
- No one will be willing to shine a bright light on the fact that the data on their enterprise data warehouse isn’t integrated. Integration is lacking not only in the data warehouse but across all the applications that house critical structured and unstructured content. This does not have to be the case. Several state-of-the-art enterprise search products that are not tied to a specific platform or product suite do a fine job of federated indexing across disparate content repositories. In a matter of weeks or a few months, a search solution can be deployed to crawl, index and search multiple content sources. Furthermore, newer search applications are being offered for pre-purchase testing of their out-of-the-box suitability in pilot or proof-of-concept (POC) projects. Organizations that are serious about integrating content silos have no excuse for not taking advantage of these easier-to-deploy search products.
- Even if they are presented with proof of value, management will be reluctant to invest in data governance. Combat this entrenched bias with a deliberate strategy for overcoming the lack of governance; a cost-cutting argument is unlikely to change minds. Risk, however, is an argument that will resonate, particularly when bolstered with examples. Include instances when customers were lost due to poor performance or failure to deliver adequate support services; when sales were lost because qualifying questions could not be answered, or not answered in time; when legal or contract issues could not be defended because critical supporting documents were inaccessible; or when maintenance revenue was lost because incomplete, inaccurate or late renewal information went out to clients. One simple example is the cost of failing to maintain a concordance of customer name, contact, and address changes. If content repositories cannot talk to each other, and a search cannot aggregate related information, because nothing records that the customer labeled Marion University at one address is the same customer labeled University of Marion at another, the result will be embarrassing in communications and, even worse, costly. Governance of processes such as naming conventions and standardized labeling enhances the value and performance of every enterprise system, including search.
- Executives won’t approve new master data management or business intelligence funding without an ROI analysis. This ties in with the first item because many enterprise search applications include excellent tools for business intelligence, analytics, and advanced functions that track and evaluate how content resources are used. The latter are an excellent way to understand who is searching, for what types of data, and with what language. These supporting functions are being built into enterprise search applications and do not add cost to product licenses or implementation. Look for enterprise search applications that are delivered with tools any business manager can use on an ad hoc basis.
- Developers won’t track their time in any meaningful way. This is probably true, because many managers are poorly equipped to evaluate what goes into software development. However, in this era of open source adoption, particularly for enterprise search, organizations that commit to using Lucene or Solr (open source search) must be clear about the cost of building these tools into functioning systems for their specialized purposes. Whether development is done internally or by a third party, it is essential to place strong boundaries around each project and deployment, with specifications that define development stages, milestones and change orders. “Free” open source software is neither free nor even cost-effective when the meter for “time and materials” is left running.
- Companies that don’t characteristically invest in IT infrastructure won’t change any time soon. So, the silo-ed projects will beget more silo-ed data… Because the adoption rate for new content management applications is so high, and they are so easy to deploy that they multiply like rabbits, it is probably futile to try to staunch their proliferation. This is an important area in which to apply governance: to detect redundancy, perform analytics across silos, and call attention to obvious waste and duplication of content and effort. Newer search applications that can crawl and index a multitude of formats and repositories will readily support efforts to monitor and evaluate what is being discovered in search results. Given a little encouragement to report redundant and replicated content, every user becomes a governor over waste. Play on people’s natural inclination to complain when they feel overwhelmed by messy search results by setting up a simple, one-click reporting mechanism that automatically issues a report or sets a flag in a log file when a search reveals a problem.
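The one-click reporting mechanism suggested in the last item could be as simple as the sketch below. It assumes the “report a problem” button on the results page calls a server-side handler; the function names `flag_search_problem` and `count_flags_by_reason`, the JSON-lines log format, and the file name `search_flags.jsonl` are all hypothetical, chosen only for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical log file that the search/governance team reviews periodically.
FLAG_LOG = Path("search_flags.jsonl")


def flag_search_problem(query, user, reason, log_path=FLAG_LOG):
    """Append one flagged search (e.g. 'duplicate results') as a JSON line.

    Intended to be called by a hypothetical one-click 'report a problem'
    button on the search results page.
    """
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "query": query,
        "reason": reason,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def count_flags_by_reason(log_path=FLAG_LOG):
    """Tally flagged searches by reason so reviewers can spot recurring waste."""
    counts = {}
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            reason = json.loads(line)["reason"]
            counts[reason] = counts.get(reason, 0) + 1
    return counts
```

Appending plain text lines keeps the button cheap to implement; the real value comes from someone reading the tallies and acting on the redundancy they reveal.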
It is time to stop treating enterprise search like a failed experiment and instead, leverage it to address some long-standing technology elephants roaming around our enterprises.
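To make the governance point about customer names concrete, here is a minimal sketch of a name concordance, assuming a naive normalization rule: lowercase the name, drop filler words, and sort the remaining tokens so word order no longer matters. The helpers `name_key` and `build_concordance`, the stop-word list, and the record IDs are hypothetical. A real matching rule would need human review, since token sorting can also merge names that should stay distinct.

```python
import re
from collections import defaultdict

# Hypothetical filler words ignored when comparing customer names.
STOPWORDS = {"of", "the", "and", "at"}


def name_key(name: str) -> str:
    """Reduce a customer name to an order-independent comparison key."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(t for t in tokens if t not in STOPWORDS))


def build_concordance(records):
    """Group (record_id, name) pairs whose names share the same key."""
    groups = defaultdict(list)
    for record_id, name in records:
        groups[name_key(name)].append(record_id)
    return dict(groups)


# Hypothetical records drawn from three separate repositories.
records = [
    ("crm-101", "Marion University"),
    ("billing-7", "University of Marion"),
    ("support-42", "Marion Hospital"),
]
concordance = build_concordance(records)
print(concordance)
# {'marion university': ['crm-101', 'billing-7'], 'hospital marion': ['support-42']}
```

With a concordance like this in place, a search for either label can be expanded to pull in records filed under the other.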
To follow other search trends for the coming year, you may want to attend a forthcoming webinar, 11 Trends in Enterprise Search for 2011, which I will be moderating on January 25th. These two blogs also have interesting perspectives on what is in store for enterprise applications: CSI Info-Mgmt: Profiling Predictors 2011, by Jim Ericson and The Hottest BPM Trends You Must Embrace In 2011!, by Clay Richardson. Also, some of Ms. Dyché’s commentary aligns nicely with “best practices” offered in this recent beacon, Establishing a Successful Enterprise Search Program: Five Best Practices.
When search fails me, the reasons may be hard to discover as an outside user, but once inside an enterprise I can learn a lot about what is going on. After listening to scores of business case studies and personal experiences, and reading about rampant dissatisfaction with search, I find it discouraging to recognize how simple the reasons are for most negative outcomes.
Consider this scenario. I was trying to find the address of an office of a major global platform vendor (one of the largest) that sells an entire suite of enterprise search and content management software products. One can usually find business location information through links on the home page of any corporate Web site, or at least on the site representing the division one is visiting. But there was no such link on this corporate site. Then, using the “search” box and later the “advanced search” option, I tried a dozen variations of the division name, the town in which the office is located, and product names, and struck out on every query. All paths led to a page with a single corporate address, or a couple of other remote addresses, and to web pages that contained no address at all. Even the pages with addresses had no link to directions. I followed up with Google queries, and these led me back to the same dead ends. Finally, I found the address through various general-purpose online business directories.
This experience led to a couple of conclusions about why my search failed: 1. The content does not exist; there is no such listing of locations. 2. The search engine is not properly tuned, or the content lacks metadata labels such as “locations,” “directions,” “business offices,” etc. The immediate solution in this case is to ensure that someone with practical business sense and usability competency owns the overall web site experience and makes sure that essential company data is available and easy to find. Or, if the company has made a conscious decision not to publish that information, at the very least it should have a page telling potential visitors how else they can find their destination or to which office they can direct postal mail.
I had two reasons for needing this information: one was a visit to an individual who was not available to give me the address in time for me to reach the office, and the second was a personal follow-up letter after someone from the company had been a speaker at an event I chaired. As things stand, I have been left personally skeptical about this company’s commitment to build, produce and actually use content management or search products that will be truly responsive to the needs of their potential buyers. When you don’t or can’t showcase your own products, I question “why.” This is not a technology problem; it is a human factors and human resource allocation problem.
This brings me to some search fundamentals:
- No content – If the repositories holding content that customers or employees expect to find are not explicitly included in the directives telling the search engine what to crawl and index, that content will never be found.
- No metadata – Any content lacking explicit language likely to be used by a searcher will probably not be found if it also lacks sufficient metadata.
- Poor indexing or search rule base – If the content being searched consists of business documents without many unique contextual “hooks,” such as product names, technical terminology or topics of narrow interest, the search engine must be “smart” enough to glean the searcher’s intent from the context of the query. In my case, I supplied half a dozen terms to layer the context and tried them in different combinations, with and without quotation marks around phrases, but nothing worked.
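The first two fundamentals can be illustrated with a toy inverted index. This is a minimal sketch, not how any particular product works: it assumes a simple AND search over lowercase word tokens, and the document IDs, the `source` and `metadata` fields, and the crawl list are hypothetical. It does not attempt the third fundamental, inferring intent from context, which needs a much smarter engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs, crawl_sources):
    """Build an inverted index over documents from the listed sources only."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        if doc["source"] not in crawl_sources:
            continue  # "No content": never crawled, so never findable
        words = tokenize(doc["body"])
        for label in doc.get("metadata", []):
            words += tokenize(label)  # metadata labels become searchable terms
        for word in words:
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents containing every query term (AND search)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


# Hypothetical content: an address page and an HR memo on an uncrawled share.
docs = {
    "contact-page": {"source": "website", "body": "100 Main Street, Springfield"},
    "hr-memo": {"source": "hr-share",
                "body": "Office locations and directions for new staff"},
}
crawl_sources = {"website"}  # the hr-share repository was never added

before = search(build_index(docs, crawl_sources), "office locations")
# "No metadata": label the address page so the searcher's words match it.
docs["contact-page"]["metadata"] = ["Office locations", "Directions"]
after = search(build_index(docs, crawl_sources), "office locations")
print(before, after)  # set() {'contact-page'}
```

The address was there all along; it only became findable once someone labeled it with the words a visitor would actually type.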
Conclusion: if you really don’t want searchers to find what they are looking for, it is not hard at all to compromise findability. I will not arrive at my destination, and you won’t get any first-class letters from me.
When you look for an e-mail you sent last week, a vendor account rep’s phone number, a PowerPoint presentation you received from a colleague in the Paris office, a URL to an article recommended for reading before the next Board meeting, or background on a company project you have been asked to manage, you are engaged in search in, about, or for your enterprise. Whether you are working inside applications you have used for years or simply perusing the links on a decade-old corporate intranet, whenever you are trying to find something while doing the enterprise’s work, you are engaging with a search interface.
Dissatisfaction comes from the sheer number of these interfaces and the lack of a cohesive roadmap to all there is to be found. You already know what you know and what you need to know. Sometimes you know how to find what you need to know, but more often you don’t, and you stumble through a variety of possibilities, up to and including asking someone else how to find it. That missing roadmap is more than an annoyance; it is a major encumbrance to doing your job, and top management does not get it. They simply won’t accept that one or two content roadmap experts (overhead) could be saving many people-years of company time and lost productivity.
In most cases, the simple notion of creating clear guidelines and signposts to enterprise content is a funding showstopper. It takes human intelligence to design and build that roadmap and put the technology aids in place to reveal it. Management will fund technology but not the content architects, knowledge “mappers” and ongoing gatekeepers to stay on top of organizational change, expansions, contractions, mergers, rule changes and program activities that evolve and shift perpetually. They don’t want infrastructure overhead whose primary focus, day-in and day-out, will be observing, monitoring, communicating, and thinking about how to serve up the information that other workers need to do their jobs. These people need to be in place as the “black-boxes” that keep search tools in tip-top operating form.
Last week I commented on the products that will be featured in the Search Track at Gilbane Boston, Dec. 3rd and 4th. What you will learn about these tools is going to be couched in case studies that reveal the ways in which search technology is leveraged by people who think a lot about what needs to be found and how search needs to work in their enterprises. They will talk about what tools they use, why, and what they are doing to get search to do its job. I’ve asked the speakers to tell their stories and, based on my conversations with them in the past week, that is what we will hear: the reality!
While considering what is most important in selecting the search tools for any given enterprise application, I took a few minutes off to look at the New York Times. This article, He Wrote 200,000 Books (but Computers Did Some of the Work), by Noam Cohen, gave me an idea about how to compare Internet search with enterprise search.
A staple of librarians’ reference and research arsenal has been a category of reference material called “bibliographies of bibliographies.” These works, each specific to a subject domain, are aimed at a primarily scholarly audience and bring a vast amount of content into focus for the researcher. Judging from the article, that is what the artificial intelligence of Philip M. Parker, the subject of the article, is doing for the average person who needs general information about a topic. According to at least one reader, the results are hardly scholarly.
This article points out several things about computerized searching:
- It does a very good job of finding a lot of information easily.
- Generalized Internet searching retrieves only publicly accessible, free-for-consumption content.
- Publicly available content is not universally vetted for accuracy, authoritativeness, trustworthiness, or comprehensiveness, even though it may be all of these things.
- Vast amounts of accurate, authoritative, trustworthy and comprehensive content do exist in electronic formats that the search algorithms used by Mr. Parker, or by the rest of us on the Internet, will never see. That is because the content is behind the firewall or accessible only with permission (e.g. subscription, need-to-know). None of his published books will serve up that content.
Another concept that librarians and scholars understand is that of primary source material. It is original content, developed (written, recorded) by human beings as a result of thought, new analysis of existing content, bench science, or engineering. It is often judged, vetted, approved or otherwise deemed worthy of the primary source label by peers in the workplace, professional societies or professional publishers of scholarly journals. It is often the substance of what gets republished as secondary and tertiary sources (e.g. review articles, bibliographies, books).
We all need secondary and tertiary sources to do our work, learn new things, and understand our work and our world better. However, advances in technology, business operations, and innovation depend on sharing primary source material within thoughtfully constructed domains across our enterprises, whether in business, healthcare, or non-profits. Patients’ laboratory data or mechanical device test data that spark the creation of primary source content need surrounding context to be properly understood and assessed for value and relevancy.
To be valuable, enterprise search needs to deliver context, relevance, opportunities for analysis and evaluation, and retrieval modes that give the best results for any user seeking valid content. There is a lot that computerized enterprise search can do to facilitate this type of research, but that is not the whole story. There must still be real people who select the most appropriate search product for that enterprise and its defined business case. They must also decide what content the search engine should index based on its value, what can be secured with proper authentication, how it should be categorized, and so on. Throwing a computer search application at any retrieval need without human oversight is a waste of capital. It will breed disappointment, cynicism and skepticism about the value of automating search, because the resulting output will be no better than Mr. Parker’s books.