Month: July 2009

Searching Email in the Enterprise

Last week I wrote about “personalized search” and then a chance encounter at a meeting triggered a new awareness of business behavior that makes my own personalized search a lot different than might work for others. A fellow introduced himself to me as the founder of a start-up with a product for searching email. He explained that countless nuggets of valuable information reside in email and will never be found without a product like the one his company had developed. I asked if it only retrieved emails that were resident in an email application like Outlook; he looked confused and said “yes.” I commented that I leave very little content in my email application but instead save anything with information of value in the appropriate file folders with other documents of different formats on the same topic. If an attachment is substantive, I may create a record with more metadata in my content management database so that I can use the application search engine to find information germane to projects I work on. He walked away with no comment, so I have no idea what he was thinking.

It did start me thinking about the realities of how individuals dispose of, store, categorize and manage their work related documents. My own process goes like this. My work content falls into four broad categories: products and vendors, client organizations and business contacts, topics of interest, and local infrastructure related materials. When material is not purposed for a particular project or client but may be useful for a future activity, it gets a metadata record in the database and is hyperlinked to the full-text. The same goes for useful content out on the Web.

When it comes to email, I discipline myself to dispose of all email into its appropriate folder as soon as I can. Sometimes this involves two emails, the original and my response. When the format is important I save it in the *.mht format (it used to be *.htm until I switched to Office 2007 and realized that doing so created a folder for every file saved); otherwise, I save content in *.txt format. I rename every email to include a meaningful description including topic, sender and date so that I can identify the appropriate email when viewing a folder. If there is an attachment it also gets an appropriate title and date, is stored in its native format and the associated email has “cover” in the file name; this helps associate the email and attachment. The only email that is saved in Outlook in personal folders is current activity where lots of back and forth is likely to occur until a project is concluded. Then it gets disposed of by deleting, or with the project file folders as described above. This is personal governance that takes work. Sometimes I hit a wall and fall behind on the filtering and disposing but I keep at it because it pays off in the long term.

So, why not relax and leave it all in Outlook, then let a search engine do the retrieval? Experience had revealed that most emails are labeled so poorly by senders and the content is so cryptic that to expect a search engine to retrieve it in a particular context or with the correct relevance would be impossible. I know this from the experience of having to preview dozens of emails stored in folders for projects that are active. I have decided to give myself the peace of mind that when the crunch is on, and I really need to go to that vendor file and retrieve what they sent me in March of last year, I can get it quickly in a way that no search engine could ever do. Do you realize how much correspondence you receive from business contacts using their “gmail” account with no contact information revealing their organization in the body and signed with a nickname like “Bob” and messages “like we’re releasing the new version in four weeks” or that just have a link to an important article on the web with “thought this would interest you?”

I did not have a chance to learn if my new business acquaintance had any sense of the amount of competition he has out there for email search, or what his differentiator is that makes a compelling case for a search product that only searches through email, or what happens to his product when Microsoft finally gets FAST search bundled to work with all Office products. OR, perhaps the rest of the world is storing all content in Outlook. Is this true? If so, he may have a winner.

Personalized Search in the Enterprise

This is an interesting topic for two reasons: there is enormous diversity in the ways we all think and go about finding content; personalizing a search interface without being intrusive is extremely difficult. Any technology that requires us to do activities according to someone else’s design, which bends our natural inclination, is by definition not going to be personal.

This topic comes to mind because of two unrelated pieces of content I read in the past 24 hours. The first was an email asking me about personal information management and automated tagging, and the second was an interview I read with Mike Moran, a thought leader in search and speaker at one of our Gilbane Conferences. In the interview, Mike talks about personalized search. Then Information Week referenced search personalization in an article about a patent suit against Google.

Here is my take on the many personalized search themes that have recently emerged. From dashboards to customizing results, options to focus on particular topics or types of content, socialized search to support interacting with and sharing results, to retrieving content we personally created or received (email), content we used or were named in, all might be referred to as search personalization. Getting each to work well will enhance enterprise search but….

Knowing how transient and transformative our thoughts and behaviors really are, we should focus realistically on the complexity of producing software tools and services that satisfy and enhance personal findability. We are ambiguous beings, seeking structured equilibrium in many of our activities to create efficiency and reduce anxiety, while desiring new, better, quicker and smarter devices to excite and engage us. Once we achieve a level of comfort with a method or mechanism, whether quickly or over time, we evolve and seek change. But, when change is imposed on an unprepared mind, our emotions probably override any real benefit that might be gained in productivity. Then we tend to self-sabotage the potential for operational usefulness when an uncomfortable process intrudes. Mental lack of preparedness undermines our work when a new design demands a behavioral shift that lacks connection to our current state or past experiences. How often are we just not in a frame of mind to take on something totally alien, especially with deadlines looming?

Look at the single most successful aspect of Google, minimalism in its interface. One did not need to wade through massively dense graphics scrambled with text in disordered layouts to figure out what to do when Google first appeared. The focus was immediately obvious.

I am presenting this challenge to vendors; there is a need to satisfy a huge array of personal preferences while introducing a minimal amount of change in any one release. Easy adoption requires that new products be simple. Usefulness must be quickly obvious to multiple audiences.

I am presenting this challenge to technology users; focus your appetite. Decide before shopping or adopting new tools what would bring the most immediate productivity gain and personal adoptability for maximum efficiency. Think about how defeated you feel when approaching a new release of an upgraded product that has added so many new “bells and whistles” that you are consumed with trying to rediscover all the old functions and features that gave your workflow a comfortable structure. Think carefully about how much learning and re-adjusting will be needed if you decide on technology that promises to do everything, with unlimited personalization. It may be possible, but does it really feel personally acceptable.

Semantic Search has Its Best Chance for Successes in the Enterprise

I am expecting significant growth in the semantic search market over the next five years with most of it focused on enterprise search. The reasons are pretty straightforward:

  • Semantic search is very hard and to scale it to the Web compounds the complexity.
  • Because the semantic Web is so elusive and results have been spotty with not much traction, it will be some time before it can be easily monetized.
  • Like many things that are highly complex, a good model will be to break the challenge of semantic search into smaller targeted business problems where focus is on a particular audience seeking content from a narrower domain.

I base this predication on my observation of the on-going struggle for organizations to get a strong framework in place to manage content effectively. By effectively I mean, establishing solid metadata, governance and publishing protocols that ensure that the best information knowledge workers produce is placed in range for indexing and retrieval. Sustained discipline and the people to exercise it just aren’t being employed in many enterprises to make this happen in a cohesive and comprehensive fashion. I have been discouraged by the number of well-intentioned projects I have seen flounder because organizations just can’t commit long-term or permanent human resources to the activity of content governance. Sometimes it is just on-again-off-again. What enterprises need are people with deep knowledge about the organization and how its content fits together in a logical framework for all types of knowledge workers. Instead, organizations tend to assign this job to external consultants or low-level staffers who are not well-grounded in the work of the particular enterprise. The results are predictably disappointing.

Enter semantic search technologies where there are multiple algorithmic tools available to index and retrieve content for complex and multi-faceted queries. Specialized semantic technologies are often well suited to shorter term projects for which domain specific vocabularies can be built more quickly with good results. Maintaining targeted vocabulary ontologies for a focused topic can be done with fewer human resources and a carefully bounded ontology can become an intelligent feed to a semantic search engine, helping it index with better precision and relevance.
This scenario is proposed with one caveat; enterprises must commit to having very smart people with enterprise expertise to build the ontology. Having a consultant coach the subject matter expert in method, process and maintenance guidelines for doing so is not a bad idea but the consultant has to prepare the enterprise for sustainability after exiting the scene.

The wager here is that enterprises can ramp up semantic search with a series of short, targeted projects, each of which establishes a goal of solving one business problem at a time and committing to efficient and accurate content retrieval as part of the solution. By learning what works well in each situation, intranet web retrieval will improve systematically and thoughtfully. The ramp to a better semantic Web will be paved with these interlocking pieces.

Keep an eye on these companies to provide technologies for point solutions in business critical applications: Basis Technology, Cognition Technology, Connotate, Expert Systems, Lexalytics, Linguamatics, Metatomix, Semantra, Sinequa and Temis.

It Takes Work to Get Good-to-Great Enterprise Search

It takes patience, knowledge and analysis to tell when search is really working. For the past few years I have seen a trend away from doing any “dog work” to get search solutions tweaked and tuned to ensure compliance with genuine business needs. People get cut, budgets get sliced and projects dumped because (fill the excuse) and the message gets promoted “enterprise search doesn’t work.” Here’s the secret, when enterprise search doesn’t work the chances are it’s because people aren’t working on what needs to be done. Everyone is looking for a quick fix, short cut, “no thinking required” solution.

This plays out in countless variations but the bottom line is that impatience with human processing time and the assumption that a search engine “ought to be able to” solve this problem without human intervention cripple possibilities for success faster than anything else.

It is time for search implementation teams to get realistic about the tasks that must be executed and milestones to be reached. Teams must know how they are going to measure success and reliability, then to stick with it, demanding that everyone agrees on the requirements before throwing the towel in at the first executive anecdote that the “dang thing doesn’t work.”

There are a lot of steps to getting even an out-of-the-box solution working well. But none is more important than paying attention to these:

  • Know your content
  • Know your search audience
  • Know what needs to be found and how it will be looked for
  • Know what is not being found that should be

The operative verb here is to know and to really know anything takes work, brain work, iterative, analytical and thoughtful work. When I see these reactions from IT upon setting off a search query that returns any results: “we’re done” OR “no error messages, good” OR “all these returns satisfy the query,” my reaction is:

  • How do you know the search engine was really looking in all the places it should?
  • What would your search audience be likely to look for and how would they look?
  • Who is checking to make sure these questions are being answered correctly
  • How do you know if the results are complete and comprehensive?

It is the last question that takes digging and perseverance. It is pretty simple to look at search results and see content that should not have been retrieved and figure out why it was. Then you can tune to make sure it does not happen again.

To make sure you didn’t miss something takes systematic “dog work” and you have to know the content. This means starting with a small body of content that it is possible for you to know, thoroughly. Begin with content representative of what your most valued search audience would want to find. Presumably, you have identified these people through establishing a clear business case for enterprise search. (This is not something for the IT department to do but for the business team that is vested in having search work for their goals.) Get these “alpha worker” searchers to show you how they would go about trying to find the stuff they need to get their work done every day, to share with you some of what they consider some of the most valuable documents they have worked with over the past few years. (Yes, years – you need to work with veterans of the organization whose value is well established, as well as with legacy content that is still valuable.)

Confirm that these seminal documents are in the path of the search engine for the index build; see what is retrieved when they are searched for by the seekers. Keep verifying by looking at both content and results to be sure that nothing is coming back that shouldn’t and that nothing is being missed. Then double the content with documents on similar topics that were not given to you by the searchers, even material that they likely would never have seen that might be formatted very differently, written by different authors, and more variable in type and size but still relevant. Re-run the exact searches that were done originally and see what is retrieved. Repeat in scaling increments and validate at every point. When you reach points where content is missing from results that should have been found using the searcher’s method, analyze, adjust, and repeat.

A recent project revealed to me how willing testers are to accept mediocre results when it became apparent how closely content must be scrutinized and peeled back to determine its relevance. They had no time for that and did not care how bad the results were because they had a pre-defined deadline. Adjustments may call for refinements in the query formulation that might require an API to make it more explicit, or the addition of better category metadata with rich cross-references to cover vocabulary variations. Too often this type of implementation discovery signals a reason to shut down the project because all options require human resources and more time. Before you begin, know that this level of scrutiny will be necessary to deliver good-to-great results; set that expectation for your team and management, so it will be acceptable to them when adjustments are needed for more work to be done to get it right. Just don’t blame it on the search engine – get to work, analyze and fix the problem. Only then can you let search loose on your top target audience.

© 2018 Bluebill Advisors

Theme by Anders NorenUp ↑