Tag: Enterprise search implementation

Enterprise Search Strategies: Cultivating High Value Domains

At the recent Gilbane Boston Conference I was happy to hear many remarks positioning and defining “Big Data” and the variety of comments. Like so much in the marketing sphere of high tech, answers begin with technology vendors but get refined and parsed by analysts and consultants, who need to set clear expectations about the actual problem domain. It’s a good thing that we have humans to do that defining because even the most advanced semantics would be hard pressed to give you a single useful answer.

I heard Sue Feldman of IDC give a pretty good “working definition” of big data at the Enterprise Search Summit in May, 2012. To paraphrase is was:

  • > 100 TB up to petabytes, OR
  • > 60% growth a year of unstructured and unpredictable content, OR
  • Ultra high streaming content

But we then get into debates about differentiating data from unstructured content when using a phrase like “big data” and applying it to unstructured content, which knowledge strategists like me tend to put into a category of packaged information. But never mind, technology solution providers will continue to come up with catchy buzz phrases to codify the problem they are solving, whether it makes semantic sense or not.

What does this have to do with enterprise search? In short, “findability” is an increasingly heavy lift due to the size and number of content repositories. We want to define quality findability as optimal relevance and recall.

A search technology era ago, publishers, libraries, content management solution providers were focused on human curation of non-database content, and applying controlled vocabulary categories derived from decades of human managed terminology lists. Automated search provided highly structured access interfaces to what we now call unstructured content. Once this model was supplanted by full text retrieval, and new content originated in electronic formats, the proportion of human categorized content to un-categorized content ballooned.

Hundreds of models for automatic categorization have been rolled out to try to stay ahead of the electronic onslaught. The ones that succeed do so mostly because of continued human intervention at some point in the process of making content available to be searched. From human invented search algorithms, to terminology structuring and mapping (taxonomies, thesauri, ontologies, grammar rule bases, etc.), to hybrid machine-human indexing processes, institutions seek ways to find, extract, and deliver value from mountains of content.

This brings me to a pervasive theme from the conferences I have attended this year, the synergies among text mining, text analytics, extractor/transformer/loader (ETL), and search technologies. These are being sought, employed and applied to specific findability issues in select content domains. It appears that the best results are delivered only when these criteria are first met:

  • The business need is well defined, refined and narrowed to a manageable scope. Narrowing scope of information initiatives is the only way to understand results, and gain real insights into what technologies work and don’t work.
  • The domain of content that has high value content is carefully selected. I have long maintained that a significant issue is the amount of redundant information that we pile up across every repository. By demanding that our search tools crawl and index all of it, we are placing an unrealistic burden on search technologies to rank relevance and importance.
  • Apply pre-processing solutions such as text-mining and text analytics to ferret out primary source content and eliminate re-packaged variations that lack added value.
  • Apply pre-processing solutions such as ETL with text mining to assist with content enhancement, by applying consistent metadata that does not have a high semantic threshold but will suffice to answer a large percentage of non-topical inquiries. An example would be to find the “paper” that “Jerry Howe” presented to the “AMA” last year.

Business managers together with IT need to focus on eliminating redundancy by utilizing automation tools to enhance unique and high-value content with consistent metadata, thus creating solutions for special audiences needing information to solve specific business problems. By doing this we save the searcher the most time, while delivering the best answers to make the right business decisions and innovative advances. We need to stop thinking of enterprise search as a “big data,” single engine effort and instead parse it into “right data” solutions for each need.

Enterprise Search is Never Magic

How is it that the blockbuster deals for acquiring software companies that rank highest in their markets spaces seem to end up smelling bad several months into the deals? The latest acquisition to take on taint was written about in the Wall Street Journal today, noting that HP Reports $8.8 Billion Charge on Accounting Misstatement at Autonomy. Not to dispute the fact that enterprise search megastars Fast (acquired by Microsoft) and Autonomy had some terrific search algorithms and huge presence in the enterprise market, there is a lot more to supporting search than the algorithms.

The fact that surrounding support services have always been essential requirements for making these two products successful in deployment has been well documented over the years. Hundreds of system integrators and partner companies to Microsoft and Autonomy do very well making these systems deliver the value that has never been attainable with just out-of-the-box installations. It takes a team of content, search and vocabulary management specialists to deliver excellent results. For any but the largest corporations, the costs and time to achieve full implementation have rarely been justifiable.

Many fine enterprise search products deliver high value at much more reasonable costs, and with much more efficient packaging, shorter deployment times and lower on-going overhead. Never to be ignored is that enterprise search must be accounted for as infrastructure. Without knowing where the accounting irregularities (also true with Fast) actually lay, I suspect that HP bought the brand and the prospective customer relationships only to discover that the real money was being made by partners and integrators, and the software itself was a loss leader. If Autonomy did not bring with it a solid service and integration operation with strong revenues and work in the pipeline, HP could not have gained what it bargained for in the purchase. I “know” nothing but these are my hunches.

Reflecting back on a couple of articles (If a Vendor Spends Enough… and Enterprise Search and Collaboration…) I wrote a couple of years ago, as Autonomy began hyping its enterprise search prowess in Information Week ads, it seems that marketing is all the magic it needed to reel in the biggest fish of all – a sale to HP.

It Takes Work to Get Good-to-Great Enterprise Search

It takes patience, knowledge and analysis to tell when search is really working. For the past few years I have seen a trend away from doing any “dog work” to get search solutions tweaked and tuned to ensure compliance with genuine business needs. People get cut, budgets get sliced and projects dumped because (fill the excuse) and the message gets promoted “enterprise search doesn’t work.” Here’s the secret, when enterprise search doesn’t work the chances are it’s because people aren’t working on what needs to be done. Everyone is looking for a quick fix, short cut, “no thinking required” solution.

This plays out in countless variations but the bottom line is that impatience with human processing time and the assumption that a search engine “ought to be able to” solve this problem without human intervention cripple possibilities for success faster than anything else.

It is time for search implementation teams to get realistic about the tasks that must be executed and milestones to be reached. Teams must know how they are going to measure success and reliability, then to stick with it, demanding that everyone agrees on the requirements before throwing the towel in at the first executive anecdote that the “dang thing doesn’t work.”

There are a lot of steps to getting even an out-of-the-box solution working well. But none is more important than paying attention to these:

  • Know your content
  • Know your search audience
  • Know what needs to be found and how it will be looked for
  • Know what is not being found that should be

The operative verb here is to know and to really know anything takes work, brain work, iterative, analytical and thoughtful work. When I see these reactions from IT upon setting off a search query that returns any results: “we’re done” OR “no error messages, good” OR “all these returns satisfy the query,” my reaction is:

  • How do you know the search engine was really looking in all the places it should?
  • What would your search audience be likely to look for and how would they look?
  • Who is checking to make sure these questions are being answered correctly
  • How do you know if the results are complete and comprehensive?

It is the last question that takes digging and perseverance. It is pretty simple to look at search results and see content that should not have been retrieved and figure out why it was. Then you can tune to make sure it does not happen again.

To make sure you didn’t miss something takes systematic “dog work” and you have to know the content. This means starting with a small body of content that it is possible for you to know, thoroughly. Begin with content representative of what your most valued search audience would want to find. Presumably, you have identified these people through establishing a clear business case for enterprise search. (This is not something for the IT department to do but for the business team that is vested in having search work for their goals.) Get these “alpha worker” searchers to show you how they would go about trying to find the stuff they need to get their work done every day, to share with you some of what they consider some of the most valuable documents they have worked with over the past few years. (Yes, years – you need to work with veterans of the organization whose value is well established, as well as with legacy content that is still valuable.)

Confirm that these seminal documents are in the path of the search engine for the index build; see what is retrieved when they are searched for by the seekers. Keep verifying by looking at both content and results to be sure that nothing is coming back that shouldn’t and that nothing is being missed. Then double the content with documents on similar topics that were not given to you by the searchers, even material that they likely would never have seen that might be formatted very differently, written by different authors, and more variable in type and size but still relevant. Re-run the exact searches that were done originally and see what is retrieved. Repeat in scaling increments and validate at every point. When you reach points where content is missing from results that should have been found using the searcher’s method, analyze, adjust, and repeat.

A recent project revealed to me how willing testers are to accept mediocre results when it became apparent how closely content must be scrutinized and peeled back to determine its relevance. They had no time for that and did not care how bad the results were because they had a pre-defined deadline. Adjustments may call for refinements in the query formulation that might require an API to make it more explicit, or the addition of better category metadata with rich cross-references to cover vocabulary variations. Too often this type of implementation discovery signals a reason to shut down the project because all options require human resources and more time. Before you begin, know that this level of scrutiny will be necessary to deliver good-to-great results; set that expectation for your team and management, so it will be acceptable to them when adjustments are needed for more work to be done to get it right. Just don’t blame it on the search engine – get to work, analyze and fix the problem. Only then can you let search loose on your top target audience.

© 2018 Bluebill Advisors

Theme by Anders NorenUp ↑