Archives for

Why is it so Hard to “Get” Semantics Inside the Enterprise?

Semantic Software Technologies: Landscape of High Value Applications for the Enterprise was published just over a year ago. Since then the marketplace has been increasingly active; new products emerge and discussion about what semantics might mean for the enterprise is constant. One thing that continues to strike me is the difficulty of explaining the meaning of, applications for, and context of semantic technologies.

Browsing through the topics in this excellent blog site, http://semanticweb.com , it struck me as the proverbial case of the blind men describing an elephant. A blog, any blog, is linear. While there are tools to give a blog dimension by clustering topics or presenting related information, it is difficult to understand the full relationships of any one blog post to another. Without a photographic memory, an individual does not easily connect ideas across a multi-year domain of blog entries. Semantic technologies can facilitate that process.

Those who embrace some concept of semantics are believers that search will benefit from “semantic technologies.” What is less clear is how evangelists, developers, searchers and the average technology user can coalesce around the applications that will semantically enable enterprise search.

On the Internet content that successfully drives interest, sales, opinion and individual promotion does so through a combination of expert crafting of metadata, search engine technology that “understands” the language of the inquirer and the content that can satisfy the inquiry. Good answers are reached when questions are understood first and then the right content is selected to meet expectations.

In the enterprise, the same care must be given to metadata, search engine “meaning” analysis tools and query interpretation for successful outcomes. Magic does not happen without people behind the scenes to meet these three criteria executing linguistic curation, content enhancement and computational linguistic programming.

Three recent meeting events illustrate various states of semantic development and adoption, even as the next conference, Semantic Tech & Business Conference – Washington, D.C. on November 29 – is upon us:

Event 1 – A relatively new group, the IKS-Community funded by the EU has been supporting open source software developers since 2009. In July they held a workshop in Paris just past the mid-point of their life cycle. Attendees were primarily entrepreneurs and independent open source developers seeking pathways for their semantically “tuned” content management solutions. I was asked to suggest where opportunities and needs exist in US markets. They were an enthusiastic audience and are poised to meet the tough market realities of packaging highly sophisticated software for audiences that will rarely understand how complex the stuff “under the hood” really is. My principal charge to them was to create tools that “make it really easy” to work with vocabulary management and content metadata capture, updates, and enhancements.

Event 2. – On this side of the pond, UK firm Linguamatics hosted its user group meeting in Boston in October. Having interviewed a number of their customers last year to better understand their I2E product line, I was happy to meet people I had spoken with and see the enthusiasm of a user community vested in such complex technology. Most impressive is the respectful tone and thoughtful sharing between Linguamatics principals and their customers. They share the knowledge of how hard it is to continually improve search technology that delivers answers to semantically complex questions using highly specialized language. Content contributors and inquirers are all highly educated specialists seeking answers to questions that have never been asked before. Think about it, search engines designed to deliver results for frequently asked questions or to find content on popular topics is hard enough, but finding the answer to a brand new question is a quantum leap of difficulty in comparison.

To make matters even more complicated, answers to semantic (natural language) questions may be found in internal content, in published licensed content or some combination of both. In the latter case, only the seeker may be able to put the two together to derive or infer an answer.

Publishers of content for licensing play a convoluted game of how they will license their content to enterprises for semantic indexing in combination with internal content. The Linguamatics user community is primarily in life sciences; this is one more hurdle for them to overcome to effectively leverage the vast published repositories of biological and medical literature. Rigorous pricing may be good business strategy, but research using semantic search could make more headway with more reasonable royalties that reflect the need for collaborative use across teams and partners.

Content wants to be found and knowledge requires outlets to enable innovation to flourish. In too many cases technology is impaired by lack of business resources by buyers or arcane pricing models of sellers that hold vital information captive for a well-funded few. Semantically excellent retrieval depends on an engine’s indexing access to all contextually relevant content.

Event 3. – Leslie Owens of Forrester Research, at the Fall 2011 Enterprise Search Summit conducted a very interesting interactive session that further affirms the elephant and blind men metaphor. Leslie is a champion of metadata best practices and writes about the competencies and expertise needed to make valuable content accessible. She engaged the audience with a series of questions about its wants, needs, beliefs and plans for semantic technologies. As described in an earlier paragraph about how well semantics serves us on the Web, most of the audience puts its faith in that model but is doubtful of how or when similar benefits will accrue to enterprise search. Leslie and a couple of others made the point that a lot more work has to be done on the back-end on content in the enterprise to get these high-value outcomes.

We’ll keep making the point until more adopters of semantic technologies get serious and pay attention to content, content enhancement, expert vocabulary management and metadata. If it is automatic understanding of your content that you are seeking, the vocabulary you need is one that you build out and enhance for your enterprise’s relevance. Semantic tools need to know the special language you use to give the answers you need.

Read More

Enterprise Search 2008 Wrap-Up

It would be presumptuous to think that I could adequately summarize a very active year of evolution among a huge inventory of search technologies. This entry is more about what I have learned and what I opine about the state-of-the-market, than an analytical study and forecast.

The weak link in the search market is product selection methods. My first thought is that we are in a state of technological riches without clear guideposts for which search models work best in any given enterprise. Those tasked to select and purchase products are not well-educated about the marketplace but are usually not given budget or latitude to purchase expert analysis when it is available. It is a sad commentary to view how organizations grant travel budgets to attend conferences where only limited information can be gathered about products but will not spend a few hundred dollars on in-depth comparative expert analyses of a large array of products.

My sources for this observation are numerous, confirmed by speakers in our Gilbane conference search track sessions in Boston and San Francisco. As they related their personal case histories for selecting products, speakers shared no tales of actually doing literature searches or in-depth research using resources with a cost associated. This underscores another observation, those procuring search do not know how to search and operate in the belief that they can find “good enough” information using only “free stuff.” Even their review of material gathered is limited to skimming rather than a systematic reading for concrete facts. This does not make for well-reasoned selections. As noted in an earlier entry, a widely published chart stating that product X is a leader does nothing to enlighten your enterprise’s search for search. In one case, product leadership is determined primarily by the total software sales for the “leader” of which search is a miniscule portion.

Don’t expect satisfaction with search products to rise until buyers develop smarter methods for selection and better criteria for making a buy decision that suits a particular business need.

Random Thoughts. It will be a very long time before we see a universally useful, generic search function embedded in Microsoft (MS) product suites as a result of the FAST acquisition. Asked earlier in the year by a major news organization whether I though MS had paid too much for FAST, I responded “no” if what they wanted was market recognition but “yes” if they thought they were getting state-of-the-art-technology. My position holds; the financial and legal mess in Norway only complicates the road to meshing search technology from FAST with Microsoft customer needs.

I’ve wondered what has happened to the OmniFind suite of search offerings from IBM. One source tells me it makes IBM money because none of the various search products in the line-up are standalone, nor do they provide an easy transition path from one level of product to another for upward scaling and enhancements. IBM can embed any search product with any bundled platform of other options and charge for lots of services to bring it on-line with heavy customization.

Three platform vendors seem to be penetrating the market slowly but steadily by offering more cohesive solutions to retrieval. Native search solutions are bundled with complete content capture, publishing and search suites, purposed for various vertical and horizontal applications. These are Oracle, EMC, and OpenText. None of these are out-of-the-box offerings and their approach tends to appeal to larger organizations with staff for administration. At least they recognize the scope and scale of enterprise content and search demands, and customer needs.

On User Presentations at the Boston Gilbane Conference, I was very pleased with all sessions, the work and thought the speakers put into their talks. There were some noteworthy comments in those on Semantic Search and Text Technologies, Open Source and Search Appliances.

On the topic of semantic (contextual query and retrieval) search, text mining and analytics, the speakers covered the range of complexities in text retrieval, leaving the audience with a better understanding of how diverse this domain has become. Different software application solutions need to be employed based on point business problems to be solved. This will not change, and enterprises will need to discriminate about which aspects of their businesses need some form of semantically enabled retrieval and then match expectations to offerings. Large organizations will procure a number of solutions, all worthy and useful. Jeff Catlin of Lexalytics gave a clear set of definitions within this discipline, industry analyst Curt Monash provoked us with where to set expectations for various applications, and Win Carus of Information Extraction Systems illustrated the tasks extraction tools can perform to find meaning in a heap of content. The story has yet to be written on how semantic search is and will impact our use of information within organizations.

Leslie Owens of Forrester and Sid Probstein of Attivio helped to ground the discussion of when and why open source software is appropriate. The major take-way for me was an understanding of the type of organization that benefits most as a contributor and user of open source software. Simply put, you need to be heavily vested and engaged on the technical side to get out of open source what you need, to mold it to your purpose. If you do not have the developers to tackle coding, or the desire to share in a community of development, your enterprise’s expectations will not be met and disappointment is sure to follow.

Finally, several lively discussions about search appliance adoption and application (Google Search Appliance and Thunderstone) strengthen my case for doing homework and making expenditures on careful evaluations before jumping into procurement. While all the speakers seem to be making positive headway with their selected solutions, the path to success has involved more diversions and changes of course than necessary for some because the vetting and selecting process was too “quick and dirty” or dependent on too few information sources. This was revealed: true plug and play is an appliance myth.

What will 2009 bring? I’m looking forward to seeing more applications of products that interest me from companies that have impressed me with thoughtful and realistic approaches to their customers and target audiences. Here is an uncommon clustering of search products.

Multi-repository search across database applications, content collaboration stores document management systems and file shares: Coveo, Autonomy, Dieselpoint, dtSearch, Endeca, Exalead, Funnelback, Intellisearch, ISYS, Oracle, Polyspot, Recommind, Thunderstone, Vivisimo, and X1. In this list is something for every type of enterprise and budget.

Business and analytics focused software with intelligence gathering search: Attensity, Attivio, Basis Technology, ChartSearch, Lexalytics, SAS, and Temis.

Comprehensive solutions for capture, storage, metadata management and search for high quality management of content for targeted audiences: Access Innovations, Cuadra Associates, Inmagic, InQuira, Knova, Nstein, OpenText, ZyLAB.

Search engines with advanced semantic processing or natural language processing for high quality, contextually relevant retrieval when quantity of content makes human metadata indexing prohibitive: Cognition Technologies, Connotate, Expert System, Linguamatics, Semantra, and Sinequa

Content Classifier, thesaurus management, metadata server products have interplay with other search engines and a few have impressed me with their vision and thoughtful approach to the technologies: MarkLogic, MultiTes, Nstein, Schemalogic, Seaglex, and Siderean.

Search with a principal focus on SharePoint repositories: BA-Insight, Interse, Kroll Ontrack, and SurfRay.

Finally, some unique search applications are making serious inroads. These include Documill for visual and image, Eyealike for image and people, Krugle for source code, and Paglo for IT infrastructure search.

This is the list of companies that interest me because I think they are on track to provide good value and technology, many still small but with promise. As always, the proof will be in how they grow and how well they treat their customers.

That’s it for a wrap on Year 2 of the Enterprise Search Practice at the Gilbane Group. Check out our search studies at http://gilbane.com/Research-Reports.html and PLEASE let me hear your thoughts on my thoughts or any other search related topic via the contact information at http://gilbane.com/

Read More