Archive for Types of Search

Search: a Term for the Long Haul, But…

There is no question that language influences marketing success; positioning software products has been a game of out-shining competitors with clever slogans and crafty coined terminology. Having been engaged with search technologies since 1974, and as the architect of a software application for enterprise content indexing and retrieval, I’ve observed how product positioning has played out in the enterprise search market over the years. When there is a new call for re-labeling “search,” the noun defining software designed for retrieving electronic content, I reflect on why and whether a different term would suffice.

Here is why a new term is not needed. As the label for the software algorithms that underpin finding and retrieving electronic content, regardless of native format, the noun "search" is efficient, to the point, unambiguous and direct.

We need a term for this category of software that will stand the test of time, as "automobile" has. It originated after terms too numerous to list fully had been tested: horseless buggy, self-contained power plant, car, motor vehicle, motor buggy, road engine, steam-powered wheeled vehicle, electric carriage, and motor wagon, to name a few. Finally a term was coined, "automobile," defined as a self-powered vehicle. It covered all types of self-powered "cars," not just those pulled by another form of locomotion, as a rail car is. Like "search," automobiles are often qualified by modifiers, such as "electric," "hybrid," or "sedan" versus "station wagon." Search may be coupled with "Web" versus "enterprise," or "embedded" versus "stand-alone." In the field of software technology we need and generally understand the distinctions.

So, I continue to be mystified by rhetoric that demands a new label, but I am willing to concede that we need to be more precise, and that may be what the crowd is really saying. When and where the term is applied deserves reconsideration. Technologists who build and customize search software should be able to continue with the long-established lingo, but marketers, and the conferences and meetings that educate the great variety of search users, could probably do a better job of expressing what is available to non-techies. As one speaker at Enterprise Search Europe 2013 (ESEu2013) stated and others affirmed, "search" is not a project; to that I will add, nor is it a single product. Instead it is core to a very large and diverse range of products.

Packaging Software that includes Search Technology

Vendors are obviously aware of where they need to market and of the need to package for their target audience. There are three key elements that have contributed to ambiguity and resulted in a lethargic reaction in the so-called enterprise search marketplace in recent years: overly complex and diffuse categorization, poor product labeling and definition, and usability and product interface design that does not reflect an understanding of the true audience for a product. What can be done to mitigate confusion?

  1. Categorizing what is being offered has to speak to the buyer and potential user. When a single product is pitched to a dozen different market categories (text mining, analytics, content management, metadata management, enterprise search, big data management, etc.), buyers are skeptical and wary of all-in-one claims. While there are software packages that incorporate elements of a variety of software applications, diffusion ends up fracturing the buying audience into such minute numbers that a vendor does not gain real traction across the different types of needs. Recommendation: a product must be categorized according to its greatest technical strengths and the largest audience to which it will appeal. The goal is to be a strong presence in the specific marketplaces where those buyers go to seek products. When a product has outstanding capabilities for that audience, buyers will be delighted to also find additional ancillary functions and features already built in.
  2. Software that is built on search algorithms or that embeds search must be packaged with labeling that pays attention to a functional domain and the target audience. Clear messaging that speaks to the defined audience is the wrapper for the product. It must state what and why you have a presence in this marketplace, the role the product plays and the professional functions that will benefit from its use. Messaging is how you let the audience know that you have created tools for them.
  3. Product design requires a deep understanding of professional users and their modes of pursuing business goals. At ESEu2013 several presentations and one workshop focused on usability and design; the speakers all shared a deep understanding of differences across professional users. They recognized behavioral, cultural, geographic and mode preferences as key considerations, without stating explicitly that different professional groups each work in unique ways. I assert that this is where so many applications break down in design and implementation. Workflow design, look-and-feel, and product features must be very different for someone in accounting or finance versus an engineer or attorney. Highly successful software applications are generally initiated, and their development sustained, by professionals who need these tools to do their work, their way. Without deep professional knowledge embedded in product design teams, products often miss the market's demands. Professionals bring know-how, methods and practices to their jobs, and it is not the role of software developers to change the way they go about their business by forcing new models that are counter to what is intuitive in a market segment.

Attention to better software definition leads to the next topic.

Conference and meeting themes: Search technology versus business problems to be solved

Attention to conference and meeting content was the reason for this post. Having argued for keeping the noun search in our vocabulary, I have also acknowledged that it is probably a failed market strategy to label every software product that includes search, and attach messaging to it, as either enterprise search or web search. Because search is everywhere in almost every software application, we need conferences with exhibits that target more differentiated (and selective) audiences.

The days of generic all-in-one meetings like AIIM, the former National Online Meeting (Information Today's original conference), E2, and so on may have run their course. As a conference attendee I am a failure: my attention span lasts about one hour at most, and I listen to no more than half a dozen exhibitor pitches before I become a wandering zombie, interested in nothing in particular because there is nothing specific to be drawn to at these mega-conferences.

I am proposing a return to professionally oriented programs that focus on audience and business needs. ESEu2013 had, among its largest cohorts, developers and software implementers. There were few potential users, buyers, content or metadata managers, or professional search experts, yet these groups seek a place to learn about products without slides showing snippets of programming code. There is still a need for meetings that include the technologists, but it is difficult to attract them to a meeting that only offers programming sessions for users, the people for whom they will develop products. How do we get them into a dialogue with the very people for whom they are developing and designing products? How can vendors exhibit and communicate their capabilities for solving a professional problem when their target professional audience is not in the room?

At Enterprise Search Europe 2013 the sessions were both diverse and enlightening, but, as I noted at the conference wrap-up, each track spoke to a unique set of enterprise needs and variety of professional interests. The underlying technology, search, was the common thread, and yet each track might have been presented in a totally different meeting environment. One topic, Big Data, presents challenges that need explaining, and information seekers come to learn about products for effectively leveraging it in a number of enterprise environments. These cases need to be understood as business problems, which call for unique software applications, not just some generic search technology. Big data can be, and already is being, offered as the theme for an entire conference in which emphasis on aspects of search technology is included. As previously noted, topics related to big data problems vary: data and text mining, analytics, semantic processing (aka natural language processing), and federation. However, data and text mining for finance has a totally different contextual relevance than it does for scientists engaged in genomics or targeted drug therapy research, and each audience looks for solutions in its own field.

So, let's rethink what each meeting is about, who needs to be in the room for each business category, what products are clearly packaged for the audience and the need, and schedule programs that bring developers, implementers, buyers and users into a forum around specially packaged software applications for meaningful dialogue. All of this is said with sincere respect for my colleagues who have suggested terms ranging from "beyond search" to "discovery" and "findability" as alternatives to "search." Maybe the predominant theme of the next Enterprise Search conference should be Information Seeking: Needs, Behaviors and Applications, with tracks organized accordingly.

[NOTE: Enterprise Search Europe had excellent sessions and practical guidance. Having given a “top of mind” reaction to what we need to gain a more diverse audience in the future, my next post will be a litany of the best observations, recommendations and insights from the speakers.]

Search Engines: They've Been Around Longer Than You Think

It dates me, as well as search technology, to acknowledge that an article in Information Week by Ken North with both Medlars and Twitter in the title would be meaningful to me. Discussing search requires context, especially when trying to convince IT folks that special expertise is required to do search really well in the enterprise, and that it is not something acquired in computer science courses.

The evolution of search systems from the print indexes of the early 1900s, such as Index Medicus (the National Library of Medicine's index to medical literature) and Chemical Abstracts, to the advent of the online Medical Literature Analysis and Retrieval System (Medlars) in the 1960s was slow. The phases of search technology evolution since the launch of Medlars have hardly been warp speed either. This article is highly recommended because it gives historical context to automated search while defining application and technology changes over the past 50 years. The comparison between Medlars and Twitter as search platforms is fascinating, something that would never have occurred to me to explore.

A key point of the article is the difference between a search system designed for archival content with deeply hierarchical categorization of a specialized corpus and one designed for highly transient, terse and topically generalized content. Last month I commented on the need to have search present in your normal work applications, and this article underscores the enormous range of purposes search serves. Information of a short temporal nature and scholarly research each have a place in the enterprise, but it would be a stretch to think of searching for both types via a single search interface. Wanting to know what a colleague is observing or learning at a conference is very different from researching the effects of uranium exposure on the human anatomy.

What have not changed much in the world of applied search technology are the reasons we need to find information and how it becomes accessible. The type of search done in Twitter or on LinkedIn today is for information that we used to pick up from a colleague (in person or on the phone) or in industry daily or weekly news publications. That’s how we found the name of an expert, learned the latest technologies being rolled out at a conference or got breaking news on a new space material being tested. What has changed is the method of retrieval but not by a lot, and the relative efficiency may not be that great. Today, we depend on a lot of pre-processing of information by our friends and professional colleagues to park information where we can pick it up on the spur of the moment – easy for us but someone still spends the time to put it out there where we can grab it.

On the other end of the spectrum is that rich research content that still needs to be codified and revealed to search engines with appropriate terminology so we can pursue in-depth searching to get precisely relevant and comprehensive results. Technology tools are much better at assisting us with content enhancement to get us the right and complete results, but humans still write the rules of indexing and curate the vocabularies needed for classification.
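To make the human role concrete, here is a minimal sketch, assuming an invented two-term vocabulary (both the preferred terms and their variants are hypothetical), of rule-based indexing against a curated vocabulary: a person maintains the terms and rules, and the software merely applies them at indexing time.

```python
# Hypothetical curated vocabulary: preferred terms mapped to variant phrases.
# A human editor maintains this table; the code only applies it.
CONTROLLED_VOCABULARY = {
    "uranium toxicity": ["uranium exposure", "uranium poisoning"],
    "radiation effects": ["radiation damage", "ionizing radiation injury"],
}

def index_document(text: str) -> set:
    """Assign a preferred term when the term itself or any curated variant appears."""
    lowered = text.lower()
    assigned = set()
    for preferred, variants in CONTROLLED_VOCABULARY.items():
        if preferred in lowered or any(v in lowered for v in variants):
            assigned.add(preferred)
    return assigned

print(index_document("Effects of uranium exposure: radiation damage in human tissue"))
# expected: {'uranium toxicity', 'radiation effects'}
```

The point of the sketch is where the intelligence lives: the code is trivial, while the quality of retrieval rests entirely on the curated terms.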

Fifty years is a long time and we are still trying to improve enterprise search. It only takes more human work to make it work better.

Convergence of Enterprise Search and Text Analytics is Not New

The news item about IBM's bid for SPSS, and similar acquisitions by Oracle, SAP and Microsoft, made me think about the predictions of more business intelligence (BI) capabilities being conjoined with enterprise search. But why now, and what is new about pairing search and BI? They have always been complementary, not only for numeric applications but also for text analysis. Another article, by John Harney in KMWorld, referred to the "relatively new technology of text analytics" for analyzing unstructured text. The article is a good summary of some newer tools, but the technology itself has had a long shelf life, too long for reasons I'll explore later.

Like other topics in this blog this one requires a readjustment in thinking by technology users. One of the great things about digitizing text was the promise of ways in which it could be parsed, sorted and analyzed. With heavy adoption of databases that specialized in textual, as well as numeric and date data fields for business applications in the 1960s and 70s, it became much easier for non-technical workers to look at all kinds of data in new ways. Early database applications leveraged their data stores using command languages; the better ones featured statistical analysis and publication quality report builders. Three that I was familiar with were DRS from ADM, Inc., BASIS from Battelle Columbus Labs and INQUIRE from IBM.

Tools that accompanied database back-ends could extract, slice and dice the database content, including very large text fields, to report word counts, phrase counts (breaking on any delimiter), transaction counts, and relationships among data elements across associated record types; to create relationships on the fly; to report expert activity and working documents; and to describe the distribution of resources. These are just a few examples of how new content assets could be created for export in minutes. In particular, a sort command in DRS had histogram controls that were invaluable to my clients managing corporate document and records collections, news clipping files, photographs, patents, etc. They could evaluate their collections by topic, date range, distribution, source, and so on, at any time.
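As a rough modern analogue (the record fields and values below are invented for illustration, not DRS or BASIS syntax), a few lines of Python convey the flavor of those slice-and-dice reports: word counts and histogram-style distributions computed directly from stored records.

```python
from collections import Counter

# Invented sample records standing in for a document collection.
records = [
    {"date": "1978-03", "source": "news", "text": "patent filed for search index"},
    {"date": "1978-03", "source": "memo", "text": "search index tuning notes"},
    {"date": "1979-01", "source": "news", "text": "new patent on retrieval method"},
]

# Word counts across a large text field.
word_counts = Counter(w for r in records for w in r["text"].split())

# Histogram-style distributions by date range and by source.
by_month = Counter(r["date"] for r in records)
by_source = Counter(r["source"] for r in records)

print(word_counts.most_common(3))
print(by_month, by_source)
```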

So, the ability existed years ago to connect data structures and use a command language to formulate new data models that informed and elucidated how information was being used in the organization, or to illustrate where there were holes in topics related to business initiatives. What were the barriers to widespread adoption? Upon reflection, I came to realize that extracting meaningful content from a database in new and innovative formats requires a level of abstract thinking for which most employees are not well trained. Putting descriptive data into a database via a screen form, then performing a transaction on the object of that data on another form, and then adding more data about another similar but different object are isolated steps in the database user's experience and memory. The typical user is not trained to think about how the pieces of data might be connected in the database, and therefore is not likely to form new ideas about how it can all be extracted in a report that carries new information about the content. There is a level of abstraction that eludes most workers whose jobs consist of a lot of compartmentalized tasks.

It was exciting to encounter prospects who really grasped the power of these tools and were eager to push the limits of the command language and reporting applications, but they were scarce. It turned out that our greatest use came in applying text analytics to the extraction of valuable information from our customer support database. A rigorously disciplined staff populated it after every support call with not only demographic information about the nature of the call, linked to a customer record that had been created back at the first contact during the sales process (with appropriate updates along the way in the procurement process), but also a textual description of the entire transaction. Over time this database was linked to a "wish list" database and another "fixes" database, and the entire networked structure provided extremely valuable reports that guided both development work and documentation production. We also issued weekly summary reports to the entire staff so everyone was kept informed about product conditions and customer relationships. The reporting tools provided transparency to all staff about company activity and enabled an early version of "social search collaboration."

Current text analytics products have significantly more algorithmic horsepower than the old command languages. But making the most of their potential, and transforming them into utilities that any knowledge worker can leverage, will remain a challenge for vendors in the face of poor abstract reasoning among much of the work force. The tools have improved, but maybe not in all the ways they need to for widespread adoption. Workers should not have to depend on IT folks to create that unique analysis report that reveals a pattern or uncovers product flaws described by multiple customers. We expect workers to multitask, have many aptitudes and skills, and be self-servicing in so many aspects of their work, but the tools they need to flourish fall short too often. I'm putting in a big plug for text analytics for the masses, soon, so that enterprise search begins to deliver more than personalized lists of results for one person at a time. Give more reporting power to the user.

Searching Email in the Enterprise

Last week I wrote about “personalized search” and then a chance encounter at a meeting triggered a new awareness of business behavior that makes my own personalized search a lot different than might work for others. A fellow introduced himself to me as the founder of a start-up with a product for searching email. He explained that countless nuggets of valuable information reside in email and will never be found without a product like the one his company had developed. I asked if it only retrieved emails that were resident in an email application like Outlook; he looked confused and said “yes.” I commented that I leave very little content in my email application but instead save anything with information of value in the appropriate file folders with other documents of different formats on the same topic. If an attachment is substantive, I may create a record with more metadata in my content management database so that I can use the application search engine to find information germane to projects I work on. He walked away with no comment, so I have no idea what he was thinking.
It did start me thinking about the realities of how individuals dispose of, store, categorize and manage their work-related documents. My own process goes like this. My work content falls into four broad categories: products and vendors, client organizations and business contacts, topics of interest, and local infrastructure-related materials. When material is not purposed for a particular project or client but may be useful for a future activity, it gets a metadata record in the database and is hyperlinked to the full text. The same goes for useful content out on the Web.
When it comes to email, I discipline myself to dispose of all email into its appropriate folder as soon as I can. Sometimes this involves two emails, the original and my response. When the format is important I save it in the *.mht format (it used to be *.htm until I switched to Office 2007 and realized that doing so created a folder for every file saved); otherwise, I save content in *.txt format. I rename every email to include a meaningful description including topic, sender and date so that I can identify the appropriate email when viewing a folder. If there is an attachment it also gets an appropriate title and date, is stored in its native format and the associated email has “cover” in the file name; this helps associate the email and attachment. The only email that is saved in Outlook in personal folders is current activity where lots of back and forth is likely to occur until a project is concluded. Then it gets disposed of by deleting, or with the project file folders as described above. This is personal governance that takes work. Sometimes I hit a wall and fall behind on the filtering and disposing but I keep at it because it pays off in the long term.
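For what it is worth, the convention is mechanical enough to automate. Here is a minimal sketch, using my own naming scheme above as the assumed convention (the function and its parameters are my invention, not any product's API):

```python
from datetime import date

def email_filename(topic: str, sender: str, sent: date, is_cover: bool = False,
                   keep_format: bool = False) -> str:
    """Build a descriptive file name from topic, sender and date; 'cover'
    marks an email that accompanies a separately saved attachment."""
    stem = f"{topic}_{sender}_{sent.isoformat()}"
    if is_cover:
        stem += "_cover"
    # .mht preserves formatting; .txt is the plain default.
    return stem + (".mht" if keep_format else ".txt")

print(email_filename("vendor-quote", "jsmith", date(2009, 3, 12), is_cover=True))
# -> vendor-quote_jsmith_2009-03-12_cover.txt
```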
So, why not relax, leave it all in Outlook, and let a search engine do the retrieval? Experience has revealed that most emails are labeled so poorly by senders, and their content is so cryptic, that expecting a search engine to retrieve them in a particular context or with the correct relevance would be impossible. I know this from having to preview dozens of emails stored in folders for projects that are active. I have decided to give myself the peace of mind that when the crunch is on, and I really need to go to that vendor file and retrieve what they sent me in March of last year, I can get it quickly in a way that no search engine ever could. Do you realize how much correspondence you receive from business contacts using their "gmail" accounts, with no contact information revealing their organization in the body, signed with a nickname like "Bob," carrying messages like "we're releasing the new version in four weeks," or just a link to an important article on the web with "thought this would interest you"?
I did not have a chance to learn whether my new business acquaintance had any sense of the amount of competition he has out there for email search, what differentiator makes a compelling case for a search product that only searches through email, or what happens to his product when Microsoft finally gets FAST search bundled to work with all Office products. Or perhaps the rest of the world is storing all content in Outlook. Is this true? If so, he may have a winner.

Personalized Search in the Enterprise

This is an interesting topic for two reasons: there is enormous diversity in the ways we all think about and go about finding content, and personalizing a search interface without being intrusive is extremely difficult. Any technology that requires us to do activities according to someone else's design, which bends our natural inclination, is by definition not going to be personal.
This topic comes to mind because of two unrelated pieces of content I read in the past 24 hours. The first was an email asking me about personal information management and automated tagging, and the second was an interview I read with Mike Moran, a thought leader in search and speaker at one of our Gilbane Conferences. In the interview, Mike talks about personalized search. Then Information Week referenced search personalization in an article about a patent suit against Google.
Here is my take on the many personalized search themes that have recently emerged. Dashboards, customized results, options to focus on particular topics or types of content, socialized search that supports interacting with and sharing results, and retrieval of content we personally created, received (email), used, or were named in: all might be referred to as search personalization. Getting each to work well will enhance enterprise search but…
Knowing how transient and transformative our thoughts and behaviors really are, we should focus realistically on the complexity of producing software tools and services that satisfy and enhance personal findability. We are ambiguous beings, seeking structured equilibrium in many of our activities to create efficiency and reduce anxiety, while desiring new, better, quicker and smarter devices to excite and engage us. Once we achieve a level of comfort with a method or mechanism, whether quickly or over time, we evolve and seek change. But, when change is imposed on an unprepared mind, our emotions probably override any real benefit that might be gained in productivity. Then we tend to self-sabotage the potential for operational usefulness when an uncomfortable process intrudes. Mental lack of preparedness undermines our work when a new design demands a behavioral shift that lacks connection to our current state or past experiences. How often are we just not in a frame of mind to take on something totally alien, especially with deadlines looming?
Look at the single most successful aspect of Google: minimalism in its interface. One did not need to wade through massively dense graphics scrambled with text in disordered layouts to figure out what to do when Google first appeared. The focus was immediately obvious.
I am presenting this challenge to vendors: there is a need to satisfy a huge array of personal preferences while introducing a minimal amount of change in any one release. Easy adoption requires that new products be simple. Usefulness must be quickly obvious to multiple audiences.
I am presenting this challenge to technology users: focus your appetite. Decide before shopping for or adopting new tools what would bring the most immediate productivity gain and personal adoptability for maximum efficiency. Think about how defeated you feel when approaching a new release of an upgraded product that has added so many new "bells and whistles" that you are consumed with trying to rediscover all the old functions and features that gave your workflow a comfortable structure. Think carefully about how much learning and re-adjusting will be needed if you decide on technology that promises to do everything, with unlimited personalization. It may be possible, but does it really feel personally acceptable?

Semantic Search has Its Best Chance for Successes in the Enterprise

I am expecting significant growth in the semantic search market over the next five years with most of it focused on enterprise search. The reasons are pretty straightforward:
• Semantic search is very hard, and scaling it to the Web compounds the complexity.
• Because the semantic Web is so elusive, and results have been spotty with not much traction, it will be some time before it can be easily monetized.
• Like many things that are highly complex, semantic search is best approached by breaking the challenge into smaller targeted business problems, where the focus is on a particular audience seeking content from a narrower domain.
I base this prediction on my observation of the ongoing struggle for organizations to get a strong framework in place to manage content effectively. By effectively I mean establishing solid metadata, governance and publishing protocols that ensure that the best information knowledge workers produce is placed in range for indexing and retrieval. Sustained discipline, and the people to exercise it, just aren't being employed in many enterprises to make this happen in a cohesive and comprehensive fashion. I have been discouraged by the number of well-intentioned projects I have seen flounder because organizations just can't commit long-term or permanent human resources to the activity of content governance. Sometimes it is just on-again, off-again. What enterprises need are people with deep knowledge about the organization and how its content fits together in a logical framework for all types of knowledge workers. Instead, organizations tend to assign this job to external consultants or low-level staffers who are not well grounded in the work of the particular enterprise. The results are predictably disappointing.
Enter semantic search technologies where there are multiple algorithmic tools available to index and retrieve content for complex and multi-faceted queries. Specialized semantic technologies are often well suited to shorter term projects for which domain specific vocabularies can be built more quickly with good results. Maintaining targeted vocabulary ontologies for a focused topic can be done with fewer human resources and a carefully bounded ontology can become an intelligent feed to a semantic search engine, helping it index with better precision and relevance.
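Here is a minimal sketch of what such an intelligent feed can mean in its simplest form; the mini-ontology below is invented for illustration, and a real one would be curated and bounded by a subject matter expert. A bounded ontology can expand a query with curated synonyms and narrower terms before retrieval:

```python
# Hypothetical bounded ontology for one narrow domain; each concept
# lists synonyms and narrower terms maintained by a domain expert.
ONTOLOGY = {
    "anticoagulant": {
        "synonyms": ["blood thinner"],
        "narrower": ["warfarin", "heparin"],
    },
}

def expand_query(term: str) -> list:
    """Expand a query term with its curated synonyms and narrower terms."""
    entry = ONTOLOGY.get(term.lower(), {})
    return [term] + entry.get("synonyms", []) + entry.get("narrower", [])

print(expand_query("anticoagulant"))
# -> ['anticoagulant', 'blood thinner', 'warfarin', 'heparin']
```

Because the vocabulary is small and focused, the precision gains come quickly, which is exactly why the short targeted project is the right scale for this work.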
This scenario is proposed with one caveat: enterprises must commit to having very smart people with enterprise expertise build the ontology. Having a consultant coach the subject matter expert in method, process and maintenance guidelines is not a bad idea, but the consultant has to prepare the enterprise for sustainability after exiting the scene.
The wager here is that enterprises can ramp up semantic search with a series of short, targeted projects, each of which establishes a goal of solving one business problem at a time and committing to efficient and accurate content retrieval as part of the solution. By learning what works well in each situation, intranet web retrieval will improve systematically and thoughtfully. The ramp to a better semantic Web will be paved with these interlocking pieces.
Keep an eye on these companies to provide technologies for point solutions in business critical applications: Basis Technology, Cognition Technology, Connotate, Expert Systems, Lexalytics, Linguamatics, Metatomix, Semantra, Sinequa and Temis.

Open Source Search & Search Appliances Need Expert Attention

Search in the enterprise suffers from a lack of expert attention to tuning, care and feeding, governance, and a fundamental understanding of what functionality comes with any one of the 100+ products now on the market. This is just as true for search appliances and for open source search tools (Lucene) and applications (Solr). But while companies licensing out-of-the-box search solutions or heavily customized search engines have service, support and upgrades built into their deliverables, the same level of support cannot be assumed for getting started with open source search or even appliances.

Search appliances are sold with licenses that imply some high level of performance without a lot of support, while open source search tools are downloadable for free. As speakers about both open source and appliances made perfectly clear at our recent Gilbane Conference, both come with requirements for human support. When any enterprise search product or tool is selected and procured, there is a presumed business case for acquisition. What acquirers need to understand above all else is the cost of ownership to achieve the expected value. This means people and people with expertise on an ongoing basis.

Particularly when budgets are tight and organizations lay off workers, we discover that those with specialized skills and expertise are often the first to go. The jack-of-all-trades, or those with competencies in maintaining ubiquitous applications, are retained to be "plugged in" wherever needed. So, where does this leave you for support of the search appliance that was presumed to be 100% self-maintaining, or the open source code that still needs bug fixes, API development and interface design work?

This is the time to look to system integrators and service companies with specialists in tools you use. They are immersed in the working innards of these products and will give you better support through service contracts, subscriptions or labor-based hourly or project charges than you would have received from your in-house generalists, anyway.

You may not see specialized system houses or service companies listed by financial publications as a growth business, but I am going to put my confidence in the industry to spawn a whole new category of search service organizations in the short term. Just-in-time development for you and lower overhead for your enterprise will be a growing swell in 2009. This is how outsourcing can really bring benefits to your organization.

Post-post note – Here is a related review on the state of open source in the enterprise: The Open Source Enterprise; its time has come, by Charles Babcock in Information Week, Nov. 17, 2008. Be sure to read the comments, too.

Search Engines Under the Hood

This week's thoughts come from the serendipitous reading that routinely piles up on my desk. In this case a short article in Information Week caught my eye because it featured the husband of a former neighbor, Ken Krugler, co-founder of Krugle. I'd set it aside because a fellow in my knowledge management forum group, David Eddy, keeps telling us that we need tools to facilitate searching for old but still useful source code. To do that, he believes, we need an investment in semantic search tools that normalize the voluminous language variants scattered throughout source code. That would enable programmers to find code that could be re-purposed in new applications.
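To illustrate the kind of normalization he means, here is a minimal sketch (my own simplification, not how Krugle or any other product actually works) that splits differently styled identifiers into common lower-case tokens, so that one query can match all of them:

```python
import re

def normalize_identifier(name: str) -> list:
    """Split camelCase and snake_case identifiers into lower-case tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)  # break camelCase
    return [t.lower() for t in re.split(r"[\s_]+", spaced) if t]

for ident in ["getUserName", "get_user_name", "GetUsername"]:
    print(ident, "->", normalize_identifier(ident))
# getUserName   -> ['get', 'user', 'name']
# get_user_name -> ['get', 'user', 'name']
# GetUsername   -> ['get', 'username']   (joined words stay joined)
```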
Now, I have taken the position that source code is just one class of intellectual property (IP) asset that is wasted, abandoned and warehoused for the technology archaeologists of centuries hence. I just don't see a solid business case for developing search tools that would become a semantic search engine for proprietary treasure troves of code.
Enter old acquaintance Ken Krugler with what seems to be, at first glance, a Web search system that might be helpful for finding useful code out on the Web, including open source. I have finally visited his Web site, and I see language and new offerings that intrigue me: "Krugle Enterprise is a valuable tool for anyone involved in software development. Krugle makes software development assets easily accessible and increases the value of a company's code base. By providing a normalized view into these assets, wherever they may be stored, Krugle delivers value to stakeholders throughout the enterprise." They could be onto something big. This is a kind of enterprise search I haven't really had time to think about, but maybe I will now.
One thing leading to another, I checked out Ken Krugler’s blog and saw an earlier posting: Is Writing Your Own Search Engine Hard? This is recommended reading for anyone who even dabbles in enterprise search technology but doesn’t want to get her/his hands dirty with the mechanics. It is short, to-the-point and summarizes how and why so many variations of search are battling it out in the marketplace.
I don't want end-users to struggle too much with under-the-hood details, but when you are thinking about enterprise search for your organization it is worth considering how much technology you are getting for the value you want it to deliver, year after year, as your mountains of IP content accrue. Don't give this idea short shrift, because search is an investment that keeps giving if it is chosen appropriately for the problem you need to solve.

Search Behind the Firewall aka Enterprise Search

Called to account for the nomenclature "enterprise search," which is my area of practice for The Gilbane Group, I will confess that the term has become as tiresome as any other category to which the marketplace gives full attention. But what is in a name, anyway? It is just a label and should not be expected to express fully every attribute it embodies. A year ago I defined it to mean any search done within the enterprise with a primary focus on internal content. "Enterprise" can be an entire organization, division, or group with a corpus of content it wants to have searched comprehensively with a single search engine.
A search engine does not need to be exclusive of all other search engines, nor must it be deployed to crawl and index every single repository in its path, to be referred to as enterprise search. There are good and justifiable reasons to leave select repositories un-indexed that go beyond even the security concerns implied by the label "search behind the firewall." I happen to believe that you can deploy enterprise search for enterprises that are quite open with their content and do not keep it behind a firewall (e.g., government agencies or not-for-profits). You may also have enterprise search deployed with one set of content for the public you serve and another for the internal audience. If the content being searched is substantively authored by the members of the organization, or procured for their internal use, enterprise search engines are the appropriate class of products to consider. As you will learn from my forthcoming study, Enterprise Search Markets and Applications: Capitalizing on Emerging Demand, and from that of Steve Arnold (Beyond Search), there are a lot of flavors out there, so you'll need to move down the food chain of options to get it right for the application or problem you are trying to solve.
OK! Are you yet convinced that Microsoft is pitting itself squarely against Google? Its announced offer to purchase Yahoo for something north of $44 billion makes the previous acquisition of FAST for $1.2 billion pale. But I want to know how this squares with IBM, which has a partnership with Yahoo in the Yahoo edition of IBM's OmniFind. This will keep the attorneys busy. Or maybe Microsoft will buy IBM, too.
Finally, this dog fight exposed in the Washington Post caught my eye, or did one of the dogs walk away with his tail between his legs? Google slams Autonomy – now, why would they do that?
I had other plans for this week’s blog but all the Patriots Super Bowl talk puts me in the mode for looking at other competitions. It is kind of fun.

Enterprise Search and Its Semantic Evolution

That the Gilbane Group launched its Enterprise Search Practice this year was timely. In 2007 enterprise search became a distinct market force, capped off by Microsoft announcing in November that it had definitively joined the market.
Since Jan. 1, 2007, I have tried to bring attention to those issues that inform buyers and users about search technology. My intent has been to make it easier for those selecting a search tool while helping them get a highly satisfactory result with minimal surprises. Playing coach and lead champion while clarifying options within enterprise search is a role I embrace. It is fitting, then, that I wrap up this year with more insights gained from Gilbane Boston; these were not previously highlighted and relate to semantic search.
The semantic Web is a concept introduced almost ten years ago, reflecting a vision of how the World Wide Web (WWW) would evolve. In the beginning we needed a specific address (URL) to get to an individual Web site. Some sites had their own search engines, while others were just pages of content we scrolled through or jumped through from link to link. Internet search engines like AltaVista and Northern Light searched limited parts of the WWW. Then Yahoo and Google came to provide much broader coverage of all "free" content. While popular search engines provided various categorizing, taxonomy navigation, keyword and advanced searching options, you had to know the terminology the content pages contained to find what you meant to retrieve. If your terms were not explicitly in the content, pages with synonymous or related meaning were not found. The semantic Web vision was to "understand" your inquiry intent and return meaningful results through its semantic algorithms.
The most recent Gilbane Boston conference featured presentations of commercial applications of various semantic search technologies that are contributing to enterprise search solutions. A few high level points gleaned from speakers on analytic and semantic technologies follow.
> Jordan Frank on blogs and wikis in enterprises articulated how they add context by tying content to people and other information like time. Human commentary is a significant content “contextualizer,” my term, not his.
> Steve Cohen and Matt Kodama co-presented an application using technology (interpretive algorithms integrated with search) to elicit meaning from erratic and linguistically difficult (e.g. Arabic, Chinese) text in the global soup of content.
> Gary Carlson gave us an understanding of how subject matter expertise contributes substantively to building terminology frameworks (aka "taxonomies") that are particularly meaningful within a unique knowledge community.
> Mike Moran helped us see how semantically improved search results can really improve the bottom line in the business sense in both his presentation and later in his blog, a follow-up to a question I posed during the session.
> Colin Britton described the value of semantic search to harvest and correlate data from highly disparate data sources needed to do criminal background checks.
> Kate Noerr explained the use of federating technologies to integrate search results in numerous scenarios, all significant and distinct ways to create semantic order (i.e. meaning) out of search results chaos.
> Bruce Molloy energized the late sessions with his description of how non-techies can create intelligent agents to find and feed colleagues relevant information by searching in the background in ways that go far beyond the typical keyword search.
> Finally, Sean Martin and John Stone co-presented an approach to computational data gathering that integrates the results in an analyzed and insightful format, revealing knowledge about the data not previously understood.
The point to take away is that each example represents a building block of the semantic retrieval framework we will encounter on the Web and within the enterprise. The semantic Web will not magically appear as a finished interface or product, but it will become richer in how and what it helps us find. Similar evolutions will happen in the enterprise with a different focus, providing smarter paths for operating within business units.
There is much more to pass along in 2008 and I plan to continue with new topics relating to contextual analysis, the value, use and building of taxonomies, and the variety of applications of enterprise search tools. As for 2007, it’s a wrap.