
Right Fitting Enterprise Search: Content Must Fit Like a Glove

This story brought me up short: “Future of Data: Encoded in DNA” by Robert Lee Hotz in the Wall Street Journal, Aug. 16, 2012. It describes how “…researchers encoded an entire book into the genetic molecules of DNA, the basic building block of life, and then accurately read back the text.” The article went on to quote the project’s senior researcher, Harvard University molecular geneticist George Church: “A device the size of your thumb could store as much information as the whole Internet.” While this concept intrigues and excites me for its innovation and creative thinking, it stimulates another thought as well: stop the madness of content overload first, and force it to be managed responsibly.

While I have been sidelined from blogging for a couple of months, industry pundits have been contributing their comments, reflections and guidance on three major topics. Big Data tops the list, with analytics a close second, rounded out by contextual relevance as an ever-present content findability issue. In November at Gilbane Boston the program features a study conducted by Findwise, the Enterprise Search and Findability Survey 2012, which you can now download. It underscores a disconnect between what enterprise searchers want and how search is implemented (or not) within their organizations. As I work to assemble content, remarks and readings for an upcoming graduate course on “Organizing and Accessing Information and Knowledge,” I keep reminding myself what knowledge managers need to know about content to make it accessible.

So, how would experts for our three dominant topics solve the problems illustrated in the Findwise survey report?

For starters, organizations must be more brutal with content housekeeping, or more specifically housecleaning. As we debate whether our country is as great at innovation as in generations past, consider big data as a big barrier. Human beings, even brilliant ones, can only cope with so much information in their waking working hours. I posit that we have lost the concept of primary source content, in other words content that is original, new or innovative. It is nearly impossible to home in on information that has never been articulated in print or electronically disseminated before, excluding all the stuff we have seen over and over again. Our concept of terrific search is to be able to traverse and aggregate everything “out there” with no regard for what is truly conceptually new. How much of that “big data” is really new and valuable? I am hoping that other speakers at Gilbane Boston 2012 can suggest methods for crunching through the “big” to focus search on the best, most relevant and singular primary source information.

Second, others have commented, and I second the idea, that analytic tools can contribute significantly to cleansing search domains of unwanted and unnecessary detritus. Search tools that auto-categorize and cross-categorize content, whether the domain is large or small, should be employed during any launch of a new search engine to organize content for quick visual examination, showing you where metadata is wrong, mis-characterized, or poorly tagged. Think of a situation where templates are commonly used for enterprise reports and the name of the person who created the template becomes the “author” of every report. Spotting this type of problem, and taking steps to remediate and cleanse metadata before deploying the search system, is a fundamental practice that will contribute to better search outcomes. With thoughtful management, this type of exercise will also lead to corrective actions on the content governance side by pointing to how metadata must be handled. Analytics functions that leverage search to support cleaning up data stores are among the most practical tools now packaged with newer search products.
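
Analytics of this sort need not be exotic. As a minimal sketch of the template-author check just described (the records, field names, and threshold are invented for illustration, not drawn from any particular search product), the following snippet counts how often each author value appears and flags values that dominate the corpus:

```python
from collections import Counter

# Hypothetical metadata records, as they might come back from a crawler
# or repository API; the field names are illustrative assumptions.
docs = [
    {"id": "rpt-001", "author": "Template Admin", "title": "Q1 Sales Report"},
    {"id": "rpt-002", "author": "Template Admin", "title": "Q2 Sales Report"},
    {"id": "rpt-003", "author": "J. Rivera",      "title": "Field Study Notes"},
    # ...thousands more in a real audit
]

def flag_suspect_authors(records, threshold=0.5):
    """Flag author values attached to a large fraction of documents;
    one 'author' dominating a corpus often means a template creator's
    name was indexed instead of each report's real author."""
    counts = Counter(r.get("author", "") for r in records)
    total = len(records)
    return [(author, n) for author, n in counts.most_common()
            if n / total >= threshold]

for author, n in flag_suspect_authors(docs):
    print(f"Review metadata: '{author}' appears on {n} of {len(docs)} documents")
```

Run against a real repository dump, a report like this hands reviewers a short list of metadata values to investigate before the search engine ever indexes them.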

Finally, there is the issue of vocabulary management: assigning terminology that is both accurate and relevant for a specific community that needs to find content quickly, without wading through multiple versions or content that is just a re-hash of earlier findings published by the originator. Original publication dates, source information and proper author attribution are key elements of metadata that must be in place for any content that is targeted for crawling and indexing. When metadata is complete and accurate, a searcher can expect the best and most relevant content to rise to the top of a results page.
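
As a minimal sketch of the pre-indexing audit implied here (the field names are assumptions for illustration, not a standard schema), this snippet reports documents missing any of the metadata elements named above, so they can be repaired before a crawler picks them up:

```python
# Required elements named in the text: original publication date,
# source information, and author attribution.
REQUIRED_FIELDS = ("original_pub_date", "source", "author")

def audit_for_indexing(records):
    """Return a map of document id -> missing metadata fields."""
    problems = {}
    for rec in records:
        missing = [f for f in REQUIRED_FIELDS if not rec.get(f)]
        if missing:
            problems[rec.get("id", "<no id>")] = missing
    return problems

records = [
    {"id": "doc-1", "original_pub_date": "2012-08-16",
     "source": "WSJ", "author": "R. L. Hotz"},
    {"id": "doc-2", "source": "intranet"},  # missing date and author
]

for doc_id, missing in audit_for_indexing(records).items():
    print(f"{doc_id}: missing {', '.join(missing)}")
```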

I hope others in a position to do serious research (perhaps a PhD dissertation) will take up my challenge to codify how much of “big data” is really worthy of being found – again, again, and again. In the meantime, use the tools you have in the search and content management technologies to get brutal. Weed the unwanted and unnecessary content so that you can get down to the essence of what is primary, what is good, and what is needed.

Search Engines: They’ve Been Around Longer Than You Think

It dates me, as well as search technology, to acknowledge that an article in Information Week by Ken North containing Medlars and Twitter in the title would be meaningful. Discussing search requires context, especially when trying to convince IT folks that special expertise is required to do search really well in the enterprise, and it is not something acquired in computer science courses.

The evolution of search systems, from the print indexes of the early 1900s such as Index Medicus (National Library of Medicine’s index to medical literature) and Chemical Abstracts to the advent of the online Medical Literature Analysis and Retrieval System (Medlars) in the 1960s, was slow. The phases of search technology evolution since the launch of Medlars have hardly been warp speed, either. This article is highly recommended because it gives historical context to automated search while defining application and technology changes over the past 50 years. The comparison between Medlars and Twitter as search platforms is fascinating, something that would never have occurred to me to explore.

A key point of the article is the difference between a system of search designed for archival content with deeply hierarchical categorization for a specialized corpus, versus a system of highly transient, terse and topically generalized content. Last month I commented on the need to have search present in your normal work applications, and this article underscores the enormous range of purposes for search. Information of a short temporal nature and scholarly research each have a place in the enterprise, but it would be a stretch to think of searching for both types via a single search interface. Wanting to know what a colleague is observing or learning at a conference is very different from researching the effects of uranium exposure on the human anatomy.

What has not changed much in the world of applied search technology is why we need to find information and how it becomes accessible. The type of search done in Twitter or on LinkedIn today is for information that we used to pick up from a colleague (in person or on the phone) or in industry daily or weekly news publications. That’s how we found the name of an expert, learned the latest technologies being rolled out at a conference or got breaking news on a new space material being tested. What has changed is the method of retrieval, but not by a lot, and the relative efficiency may not be that much greater. Today, we depend on a lot of pre-processing of information by our friends and professional colleagues to park information where we can pick it up on the spur of the moment. That is easy for us, but someone still spends the time to put it out there where we can grab it.

On the other end of the spectrum is that rich research content that still needs to be codified and revealed to search engines with appropriate terminology so we can pursue in-depth searching to get precisely relevant and comprehensive results. Technology tools are much better at assisting us with content enhancement to get us the right and complete results, but humans still write the rules of indexing and curate the vocabularies needed for classification.
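
To make that division of labor concrete, here is a toy sketch: a human-curated vocabulary maps preferred terms to the variant phrases an indexer has decided should trigger them, and the software merely applies those rules at scale. Both the terms and the rules are invented for the example.

```python
# A toy curated vocabulary: preferred terms mapped to variant phrases.
# A human subject expert writes and maintains these rules; the code
# only applies them at indexing time.
VOCABULARY = {
    "radiation exposure": ["uranium exposure", "radiation dose"],
    "enterprise search":  ["intranet search", "internal search engine"],
}

def classify(text):
    """Tag a document with each preferred term whose variant phrases
    appear in the text."""
    lowered = text.lower()
    return sorted(
        term for term, variants in VOCABULARY.items()
        if any(v in lowered for v in variants)
    )

doc = "New findings on uranium exposure surfaced via our intranet search."
print(classify(doc))  # ['enterprise search', 'radiation exposure']
```

Real classification engines are far more sophisticated, but the principle stands: the quality of the results depends on the curated rules and vocabulary, not just the software.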

Fifty years is a long time and we are still trying to improve enterprise search. It only takes more human work to make it work better.

Embedded Search in the Enterprise

We need to make a distinction between “search in the enterprise” and “enterprise-wide search.” The former is any search that exists persistently in view as we go about our primary work activities. The latter commonly assumes aggregation of all enterprise content via a single platform OR enterprise content to which everyone in the organization will have access. So many attempts at enterprise-wide search are reported to be compromised or frustrated before achieving successful outcomes that it is time to pay attention to point-of-need solutions. This is search that will smoothly satisfy routine retrieval requirements as we work.

Most of us work in a small number of applications all day. A writer will be wedded to a content creation application, plus research sources both on the web and internal to the enterprise. Finding information to support writing, whether it is a press release, marketing brochure or technical documentation for a technical product, requires access to content appropriate for the audience the writer must reach. That audience may be a business analyst, a customer’s buyer or a product user with advanced technical expertise. During any one work assignment, the writer will usually be focused on one audience and will only need a limited view of content specific to that task.
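
Under the hood, point-of-need search is often just a query that the host application pre-scopes with filters before the writer’s terms ever reach the index. A minimal sketch, assuming a hypothetical Solr-style endpoint and invented field names:

```python
import requests

# Hypothetical internal search endpoint; any engine that supports
# filter queries would work the same way.
SEARCH_URL = "http://search.example.internal/solr/docs/select"

def point_of_need_search(query, audience, content_type, rows=10):
    """Run a query pre-scoped to the writer's current task, so results
    come only from the slice of content that task requires."""
    params = {
        "q": query,
        "fq": [f"audience:{audience}", f"content_type:{content_type}"],
        "rows": rows,
        "wt": "json",
    }
    return requests.get(SEARCH_URL, params=params, timeout=10).json()

results = point_of_need_search("battery thermal limits",
                               "technical_user", "spec_sheet")
for doc in results["response"]["docs"]:
    print(doc.get("title"))
```

The design point is that the application, not the writer, supplies the scoping filters, so the writer sees only the limited view the assignment calls for.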

When a search takes us on a merry chase through multiple resource repositories, or through a single repository heaped with irrelevant content and no good results, we are being forced into a mental traffic nightmare not of our own making. As this blog post by Tony Schwartz reminds us, we need time to focus and concentrate; it enables us to work smarter and more calmly. For employers seeking to support workers with the best tools, search that works well at the point of doing an assignment is the ultimate perk. I know how frantic and fractionated my mental state becomes as I follow one fruitless web of links after another that I believe will lead me to the piece of information I need. Truthfully, I often become so absorbed in the search, and in the ancillary information I “discover” along the way, that sight of the target becomes secondary.

New wisdom from a host of analysts and writers suggests that embedded search is more than a trend, as is search with a specific focus or purposeful business goal. The fact that FAST is now embedded with and for SharePoint, and that its use is growing principally in that arena, illustrates the trend. But readers should also consider a large array of newer search solutions that are strong on semantic features, APIs, integration options, and connectors to a huge variety of content that exists in other application repositories. This article by James Martin in CIO, “How to Evaluate Enterprise Search,” has helpful comments from Leslie Owens of Forrester Research, and the rise of connectors is highlighted by Alan Pelz-Sharpe in this post.

Right now two rather new search engines are on my radar screen because of their timely entrance to the marketplace. One is Q-Sensei, which has just released version 2.0 of its ontology-based solution, very much focused on efficiently processing big data, quick deployment, and integration with content applications. The second is Cambridge Semantics, with its Anzo semantic solutions for analyzing and retrieving business data. Finally, I am very excited that ISYS has been acquired by Lexmark. It was an unexpected move, but ISYS deserved to be recognized for solid connector/filter technology and a large, satisfied customer base. It will be interesting to see how a hardware vendor noted for print technology integrates ISYS search software into its product offerings. Information retrieval belongs where work is being done.

These are just three vendors poised to change the expectations of searchers by fulfilling search needs, embedded or integrated efficiently in select business application areas. Martin White’s most recent enumeration of search vendors puts the list at about 70; they are primarily vendors with standalone search products, products that support standalone search, or search engines that complement other content applications. You will see many viable options there that are unfamiliar, but be sure to dig down to understand where each might fill a unique need in your enterprise.

When seeking solutions for search problems you need to really understand the purpose before seeking candidate vendors. Then focus on products that have the same clarity of applicability you want. They may be embedded with a product such as Lexmark’s, or a CAD system. The first step is to decide where and for whom you need search to be present.

Lucene Open Source Community Commits to a Future in Search

It has been nearly two years since I commented on an article in Information Week, “Open Source, Its Time Has Come,” Nov. 2008. My main point was the need for deep expertise to execute enterprise search really well. I predicted the growth of service companies with that expertise, particularly for open source search. Not long after that post, Lucid Imagination was launched, with its focus on building and supporting solutions based on Lucene and its more turnkey version, Solr.

It has not taken long for Lucid Imagination (LI) to take charge of the Lucene/Solr community of practice (CoP), and to launch its own platform built on Solr, Lucidworks Enterprise. Open source depends on deep and sustained collaboration; LI stepped into the breach to ensure that the hundreds of contributors, users and committers have a forum. I am pretty committed to CoPs myself and know that nurturing a community for the long haul takes dedicated leadership. In this case it is undoubtedly enlightened self-interest that is driving LI. They are poised to become the strongest presence for driving continuous improvements to open source search, with Apache Lucene as the foundation.
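
For readers who have never tried it, part of what makes Solr the more turnkey of the two is how little code a basic round trip takes. A minimal sketch, assuming a recent Solr instance running locally with a schemaless core named “demo” (the core name and field names are invented for the example):

```python
import requests

SOLR = "http://localhost:8983/solr/demo"  # assumed local core named 'demo'

# Index two documents via Solr's JSON update handler, committing
# immediately so they are searchable at once.
docs = [
    {"id": "1", "title_t": "Lucene Revolution conference notes"},
    {"id": "2", "title_t": "Solr as a turnkey search layer"},
]
requests.post(f"{SOLR}/update?commit=true", json=docs,
              timeout=10).raise_for_status()

# Query them back through the standard select handler.
resp = requests.get(f"{SOLR}/select",
                    params={"q": "title_t:solr", "wt": "json"}, timeout=10)
for doc in resp.json()["response"]["docs"]:
    print(doc["id"], doc["title_t"])
```

Lucene itself, by contrast, is a Java library: using it directly means writing your own indexing and query code around its APIs, which is exactly the gap that Solr and products like Lucidworks Enterprise fill.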

Two weeks ago LI hosted Lucene Revolution, the first such conference in the US. It was attended by over 300 people in Boston, October 7-8, and I can report that this CoP is vibrant and enthusiastic. Moderated by Steve Arnold, the program ran smoothly, with excellent sessions. Those I attended reflected a respectful exchange of opinions and ideas about tools, methods, practices and priorities. While there were allusions to vigorous debate among committers about priorities for code changes and upgrades, the mood was collaborative in spirit and tinged with humor, always a good way to operate when emotions and convictions are on stage.

From my 12 pages of notes come observations about the three principal categories of sessions:

  1. Discussions, debates and showcases for significant changes, or calls for changes, to the code
  2. Case studies based on enterprise search applications and experiences
  3. Case studies based on the use of Lucene and Solr embedded in commercial applications

Since the first category was more technical in nature, I leave the reader with my simplistic conclusions: core Apache Lucene and Solr will continue to evolve in a robust and aggressive progression. There are sufficient committers to make a serious contribution. Many who have decades of search experience are driving the charge and they have cut their teeth on the more difficult problems of implementing enterprise solutions. In announcing Lucidworks Enterprise, LI is clearly bidding to become a new force in the enterprise search market.

New and sustained build-outs of Lucene/Solr will be challenged by developers with ideas for diverging architectures, or “forking” code, on which Eric Gries, LI CEO, commented in the final panel. He predicted that forking will probably be driven by the need to solve specific search problems that current code does not accommodate. This will probably be more of a challenge for the spinoffs than for the core Lucene developers, and forks will likely founder on the difficulty of sustaining separate versions.

The enterprise search cases reflected organizations for which commercial turnkey applications will not or cannot easily be selected; for them, open source makes sense. From LI’s counterpart in the Linux world, Red Hat, come these earlier observations about why enterprises should embrace open source solutions: in short, the sorry state of quality assurance and code control in commercial products. Add to that the cost of services to install, implement and customize commercial search products. For many institutions, the argument is to go with open source when there is an imperative or call for major customization.

This appears to be the case for two types of enterprises that were featured on the program: educational institutions and government agencies. Both have procurement issues when it comes to making large capital expenditures. For them it is easier to begin with something free, like open source software, then make incremental improvements and customize over time. Labor and services are cost variables that can be distributed more creatively using multiple funding options. Featured on the program were the Smithsonian, Adhere Solutions (doing systems integration work for a number of government agencies), MITRE (a federally funded research laboratory), the University of Michigan, and Yale. Cisco also presented, a noteworthy commercial enterprise putting Lucene/Solr to work.

The third category of presenters was, by far, the largest contingent of open source search adopters: producers of applications that build Lucene and Solr (and other open source software) into their offerings. They are solidly entrenched because they are diligent committers who share in this community of like-minded practitioners, an extended enterprise of technical resources that keeps their overhead low. I can imagine the attractiveness of a lean business that can run on an open source foundation and operate in a highly agile mode. This must be enticing and exciting for developers who wilt at the idea of working in a constrained environment with layers of management and political maneuvering.

Among the companies building applications on Lucene that presented were Access Innovations, Twitter, LinkedIn, Acquia, RivetLogic and Salesforce.com. These stand out as relatively mature adopters with traction in the marketplace. Also present were companies that contribute their value through Lucene/Solr partnerships in which their products or tools are complementary, including Basis Technology, Documill, and Loggly.

Links to presentations by the organizations mentioned above will take you to conference highlights. Some will appeal to the technical reader, for there was a lot of code sharing and technical tips in the slides. The diversity and scale of applications being supported by Lucene and Solr were impressive. Lucid Imagination and the speakers did a great job of illustrating why and how open source has a serious future in enterprise search. This was a confidence-building exercise for the community.

Two sentiments at the end summed it up for me. On the technical front, Eric Gries observed that it is usually clear what needs to be core (to the code) and what does not belong; in between lies a lot of gray area, and that will contribute to constant debate in the community. For the user community, Charlie Hull of Flax opined that customers don’t care whether (the code) is in the open source core or in the special “secret sauce” application, as long as the product does what they want.

Search is Not Taking a Summer Break & Call for Papers

Amidst post-Gilbane San Francisco business I have been reading what everyone else has been writing about search the past couple of months. While there continues to be much speculation and gossip about the Microsoft acquisition of FAST, and about which companies may soon be absorbed into larger entities, there also continues to be interesting activity among the mid-tier and start-up search vendors. Meanwhile, I advise those who aspire to acquire a search solution for “behind the firewall”: don’t wait for the “big players” to come up with the definitive solution to all your search needs, because it will never happen. I’m in good company with other analysts who advise moving on with point search solutions for specific business needs. You will save money and time, because many of the new products are optimized for rapid deployment, in weeks or months, not years.

If you check out my new research report, Enterprise Search Markets and Applications: Capitalizing on Emerging Demand, June 2008, you will find a directory of companies offering search solutions with choices for what Steve Arnold refers to as “beyond search.” Deep test drives of many of these products can be found in his report as well. Meanwhile, new releases of listed products, and entirely new products, continue to be announced. ISYS, Coveo and Expert System (Cogito) have brought new offerings to market in the last month, and Collexis, a relative newcomer, is drawing attention to itself by demonstrating its products at numerous meetings this year.

So, keep reading and checking out the possibilities. While you are at it, be sure to put the Gilbane Boston Conference on your calendar for December 3-4. We are all busy rounding out the program right now.

I am particularly interested in hearing from those of you who have participated in the selection of a search product in the past two years, implementing or deploying a system anywhere within your own enterprise. Please consider sending me a brief proposal for a presentation at the conference. For your effort, you will get to attend all the conference sessions, as well as help the audience with needed reality checks on what it takes to conduct a selection process and follow through with implementation. I particularly want you to share your learning experiences: the good, the frustrating, and the lessons you have accrued. Professional speaking experience is not required; we want stories. [You’ll find my email on the “Contacts” page of the Gilbane site, and you should also look at the speaker guidelines for additional information.]
