
How Far Does Semantic Software Really Go?

A discussion about semantic software technologies that began in November 2010 with a graduate scholar at George Washington University prompted him to follow up with some questions for me to clarify. With his permission, I am sharing three questions from Evan Faber and the gist of my comments to him. At the heart of the conversation we all need to keep having is this: how far does this technology go, and does it really bring us any gains in retrieving information?

1. Have AI or semantic software demonstrated any capability to ask new and interesting questions about the relationships among information that they process?

In several recent presentations and in the Gilbane Group study on Semantic Software Technologies, I share a simple diagram of the nominal setup for the relationship of content to search and the semantic core, namely a set of terminology rules or terminology with relationships. Semantic search operates best when it focuses on a topical domain of knowledge. The language that defines that domain may range from simple to complex, broad to narrow, deep to shallow. That language may be applied to the task of semantic search from a taxonomy (usually shallow and simple), from a set of language rules (numbering in the thousands to millions), or from an ontology of concepts all the way up to a semantic net with millions of terms and relationships among concepts.

The question Evan asks is a good one with a simple answer: “Not without configuration.” The configuration requires human work in two areas:

  • Management of the linguistic rules or ontology
  • Design of search engine indexing and retrieval mechanisms

When a semantic search engine indexes content for natural language retrieval, it looks to the rules or semantic nets to find concepts that match those in the content. When it finds concepts in the content with no equivalent language in the semantic net, it must find a way to understand where the concepts belong in the ontological framework. This discovery process for clarification, disambiguation, contextual relevance, perspective, meaning or tone is best accompanied by an interface that makes it easy for a human curator or editor to update or expand the ontology. A subject matter expert is required for specialized topics. Through a process of automated indexing that both categorizes and exposes problem areas, the semantic engine becomes both a search engine and a questioning engine.

The entire process is highly iterative. In a sense, the software is asking the questions: “What is this?”, “How does it relate to the things we already know about?”, “How is the language being used in this context?” and so on.
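
To make that iteration concrete, here is a minimal sketch in Python of the match-or-flag loop described above. The toy ontology, the naive phrase extractor, and every name in it are illustrative assumptions of mine, not any vendor’s actual design.

```python
import re

# Toy stand-in for a semantic net: surface terms mapped to preferred concepts.
ontology = {
    "crude oil": "petroleum",
    "photovoltaic cell": "solar energy",
    "electrochemical cell": "battery technology",
}

curation_queue = []  # unplaced concepts awaiting a human curator or editor

def candidate_terms(text):
    """Trivial stand-in for real linguistic parsing: one- and two-word phrases."""
    words = re.findall(r"[a-z]+", text.lower())
    return set(words) | {" ".join(pair) for pair in zip(words, words[1:])}

def index_document(doc_id, text):
    """Categorize what the semantic net recognizes; question what it does not."""
    matched = []
    for term in candidate_terms(text):
        if term in ontology:
            matched.append(ontology[term])
        elif " " in term:  # queue only multi-word phrases to limit noise
            curation_queue.append({"doc": doc_id, "term": term})
    return {"doc": doc_id, "concepts": sorted(set(matched))}

print(index_document("d1", "New photovoltaic cell designs rival crude oil economics."))
print(len(curation_queue), "phrases queued for a subject matter expert to review")
```

The items accumulating in the curation queue are exactly the “What is this?” questions the software poses back to its human experts.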

2. In other words, once they [the software] have established relationships among data, can they use that finding to proceed – without human intervention – to seek new relationships?

Yes, in the manner described for the previous question. It is important to recognize that the original set of rules, ontologies, or semantic nets that are being applied were crafted by human beings with subject matter expertise. It is unrealistic to think that any team of experts would be able to know or anticipate every use of the human language to codify it in advance for total accuracy. The term AI is, for this reason, a misnomer because the algorithms are not thinking; they are only looking up “known-knowns” and applying them. The art of the software is in recognizing when something cannot be discerned or clearly understood; then the concept (in context) is presented for the expert to “teach” the software what to do with the information.

State-of-the-art software will have a back-end process that enables implementers and administrators to use the results of search (directly, through commentary from users, or indirectly, by analyzing search logs) to discover where language has been misunderstood, as evidenced by invalid results. Over time, repeated passes to update linguistic definitions, grammar rules, and concept relationships will continue to refine and improve the accuracy and comprehensiveness of search results.
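
A hedged sketch of that feedback loop, assuming a simplified log format I invented for illustration: failed searches are tallied and their terms surfaced as candidates for new vocabulary entries.

```python
from collections import Counter

# Assumed log shape; a real engine's analytics are far richer than this.
search_log = [
    {"query": "fiche conversion", "results": 0},
    {"query": "microfiche scanning", "results": 42},
    {"query": "fiche conversion", "results": 0},
    {"query": "pv cell efficiency", "results": 1},
]

def misunderstood_terms(log, max_results=1):
    """Rank terms from failed searches as candidates for vocabulary updates."""
    failures = Counter()
    for entry in log:
        if entry["results"] <= max_results:
            failures.update(entry["query"].split())
    return failures.most_common()

# 'fiche' surfacing at the top suggests mapping it to 'microfiche' in the net.
print(misunderstood_terms(search_log))
```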

3. It occurs to me that the key value added of semantic technologies to decision-making is their capacity to link sources by context and meaning, which increases situational awareness and decision space. But can they probe further on their own?

Good point on the value, and in a sense, yes, they can. Through extensive algorithmic operations, instructions can be embedded (and probably are for high-value situations like intelligence work) that tell the software what to do with newly discovered concepts. Those instructions might place new discoveries into categories of relevance, importance, or association. It would not be unreasonable to then pass documents with confounding information off to other semantic tools for further examination. Again, without human analysis along the continuum and at the end point, no certainty about the validity of the software’s decision-making can be asserted.

I can hypothesize a case in which a corpus of content contains random documents in foreign languages. From my research, I know that some of the semantic packages have semantic nets in multiple languages. If the corpus contains material in English, French, German, and Arabic, these materials might be sorted and routed off to four different software applications. Each batch would be subject to further linguistic analysis, followed by indexing, with some middleware applied to the returned results for normalization, and a final consolidation into a unified index. Does this exist in the real world now? Probably there are variants, but it would take more research to find the cases, and they may be subject to restrictions that would require the correct clearances.
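
Here is a minimal sketch of that hypothetical routing pipeline. The language detector, the per-language analyzers, and the normalization step are all stand-ins I invented to show the shape of the flow, not any actual product’s architecture.

```python
# Each analyzer stands in for a separate per-language semantic application;
# normalization is the middleware step that maps their outputs to one schema.

def make_analyzer(lang):
    return lambda text: {"lang": lang, "raw_concepts": text.lower().split()[:3]}

analyzers = {lang: make_analyzer(lang) for lang in ("en", "fr", "de", "ar")}

def detect_language(text):
    # Stand-in: a real system would use a language-identification model.
    return "fr" if "énergie" in text else "en"

unified_index = []

def route_and_index(doc):
    lang = detect_language(doc)             # sort by language
    raw = analyzers[lang](doc)              # per-language linguistic analysis
    normalized = {"language": raw["lang"],  # middleware normalization
                  "concepts": sorted(raw["raw_concepts"])}
    unified_index.append(normalized)        # consolidation into one index

for doc in ("Solar energy storage research", "Recherche sur l'énergie solaire"):
    route_and_index(doc)
print(unified_index)
```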

Discussions with experts who have actually deployed enterprise-specific semantic software underscore the need for subject expertise and some computational linguistics training, coupled with an aptitude for creative inquiry. These scientists informed me that individuals who are highly multi-disciplinary and facile with electronic games and tools did the best job of interacting with the software and getting excellent results. Tuning and configuration over time by the right human players is still a fundamental requirement.

Data Mining for Energy Independence

Mining content for facts and information relationships is a focal point of many semantic technologies. Among the text analytics tools are those for mining content in order to process it for further analysis and understanding, and indexing for semantic search. This will move enterprise search to a new level of research possibilities.

Research for a forthcoming Gilbane report on semantic software technologies turned up numerous applications in the life sciences and publishing. Neither semantic technologies nor text mining is mentioned in the recent New York Times article “Rare Sharing of Data Leads to Progress on Alzheimer’s,” but I am pretty certain that these technologies had some role in enabling scientists to discover new data relationships and synthesize new ideas about Alzheimer’s biomarkers. The sheer volume of data from all the referenced data sources demands computational methods to distill and analyze it.

One vertical industry poised for potential growth of semantic technologies is the energy field. It is a special interest of mine because it is a topical area in which I worked as a subject indexer and searcher early in my career. Beginning with the first energy crisis, the oil embargo of the mid-1970s, I worked in research organizations involved in both fossil fuel exploration and production and alternative energy development.

A hallmark of technical exploratory and discovery work is the time gaps between breakthroughs; there are often significant plateaus between major developments. This happens when research reaches a point where an enabling technology is not available or commercially viable to move to the next milestone of development. I observed that the starting point in the quest for innovative energy technologies often began with decades-old research that stopped before commercialization.

Building on what we have already discovered, invented or learned is one key to success for many “new” breakthroughs. Looking at old research from a new perspective to lower costs or improve efficiency for such things as photovoltaic materials or electrochemical cells (batteries) is what excellent companies do.

How does this relate to semantic software technologies and data mining? We need to begin with content generated by research in the last century; much of it is only now being made electronic. Even so, most of the conversion from paper, or micro formats like fiche, is to image formats. To make the full transition and enable data mining, content must be further enhanced through optical character recognition (OCR). This puts it into a form that can be semantically parsed, analyzed, and explored for facts and new relationships among data elements.
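
As one plausible way to approach the OCR step, here is a short sketch using the open-source Tesseract engine through the pytesseract binding, assuming it and Pillow are installed. The directory names are hypothetical, and real conversion of legacy fiche scans would need image cleanup and quality control well beyond this.

```python
from pathlib import Path

from PIL import Image
import pytesseract  # Python binding for the Tesseract OCR engine

def ocr_directory(image_dir, out_dir):
    """Convert scanned page images into plain text ready for semantic parsing."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for image_path in sorted(Path(image_dir).glob("*.tif")):
        text = pytesseract.image_to_string(Image.open(image_path))
        (out / (image_path.stem + ".txt")).write_text(text, encoding="utf-8")

# Hypothetical paths for a legacy report collection:
# ocr_directory("scans/energy_reports_1978", "text/energy_reports_1978")
```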

Processing of old materials is neither easy nor inexpensive. There are government agencies, consortia, associations, and partnerships of various types of institutions that often serve as a springboard for making legacy knowledge assets electronically available. A great first step would be for DOE and some energy industry leaders to collaborate on this activity.

A future of potential man-made disasters, even when knowledge exists to prevent them, is not a foregone conclusion. Intellectually, we know that energy independence is prudent, economically and socially mandatory for all types of stability. We have decades of information and knowledge assets in energy related fields (e.g. chemistry, materials science, geology, and engineering) that semantic technologies can leverage to move us toward a future of energy independence. Finding nuggets of old information in unexpected relationships to content from previously disconnected sources is a role for semantic search that can stimulate new ideas and technical research.

A good beginning would be a serious program of content conversion capped off with the use of semantic search tools to aid the process of discovery and development. It is high time to put our knowledge to work with state-of-the-art semantic software tools and to commit human and collaborative resources to the effort. Coupling our knowledge assets of the past with the ingenuity of the present, we can achieve energy advances using semantic technologies already embraced by the life sciences.

Leveraging Language in Enterprise Search Deployments

It is not news that enterprise search has been relegated to the long list of failed technologies by some. We are at the point where many analysts and business writers have called for a moratorium on the use of the term. Having worked in a number of markets and functional areas (knowledge management/KM, special libraries, and integrated library software systems) for which the death knell has been sounded, even as they continue to exist, I take these pronouncements as a game of sorts.

Yes, we have seen the demise of vinyl phonograph records, cassette tapes, and probably soon music CDs, but those are explicit devices and formats. When you can’t buy or play them any longer, except in a museum or collector’s garage, they are pretty dead in the marketplace. This is not true of search in the enterprise, behind the firewall, or wherever it needs to function for business purposes. People have always needed to find “stuff” to do their work. KM methods and processes, special libraries, and integrated library systems still exist, even as they were re-labeled for PR and marketing purposes.

What is happening to search in the enterprise is that it is finding its purpose, or more precisely its hundreds of purposes. It is not a monolithic software product, a one-size-fits-all. It comes in dozens of packages, models, and price ranges. It may be embedded in other software or standalone. It may be procured for a point solution to support retrieval of content for one business unit operating in a very narrow topical range, or it may be selected to give access to a broad range of documents that exist in numerous enterprise domains on many subjects.

Large enterprises typically have numerous search solutions in operation, implementation, and testing, all at the same time. They are discovering how to deploy and leverage search systems and they are refining their use cases based on what they learn incrementally through their many implementations. Teams of search experts are typically involved in selecting, deploying and maintaining these applications based on their subject expertise and growing understanding of what various search engines can do and how they operate.

After years of hearing about “the semantic Web,” the long-sought “holy grail” of Web search, there is a serious ramping up of technology solutions. Most of these applications can also make search more semantically relevant behind the firewall. These technologies have been evolving for decades, beginning with so-called artificial intelligence and now supported by categories of computational linguistics such as specific algorithms for parsing content and disambiguating terms. A soon-to-be-released study featuring some noteworthy applications reveals just how much is being done in enterprises for specific business purposes.

With this “teaser” on what is about to be published, I leave you with one important thought: meaningful search technologies depend on rich linguistically based technologies. Without a cornucopia of software tools to build terminology maps and dictionaries, analyze content linguistically in context to elicit meaning, parse and evaluate unstructured text data sources, and manage vocabularies for ever more complex topical domains, semantic search could not exist.
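
One tiny illustration of why those terminology maps matter is query expansion: a domain vocabulary (invented here for the sketch) lets a single user term reach documents that never use that exact word.

```python
# Invented terminology map; real domain vocabularies run to thousands of entries.
terminology_map = {
    "battery": ["electrochemical cell", "storage cell", "accumulator"],
    "solar": ["photovoltaic", "pv"],
}

def expand_query(query):
    """Rewrite each query word as itself plus its mapped domain synonyms."""
    terms = []
    for word in query.lower().split():
        terms.append(word)
        terms.extend(terminology_map.get(word, []))
    return terms

print(expand_query("solar battery research"))
# ['solar', 'photovoltaic', 'pv', 'battery', 'electrochemical cell', ...]
```

Every entry in such a map is the product of exactly the human linguistic work described above.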

Language complexities are challenging and even vexing. Enterprises will be finding solutions to leverage what they know only when they put human resources into play to work with the lingo of their most valuable domains.

Search Engines – Architecture Meets Adoption

Trying to summarize a technology space as varied as the one covered in two days at the Search Engines Meeting in Boston, April 26-27, is both a challenge and an opportunity. Avoiding the challenge of trying to represent the full spectrum, I’ll stick with the opportunity. What matters is this: search is everywhere, in every technology we use, and it has a multitude of cousins and affiliated companion technologies.

The Gilbane Group focuses on content technologies. In its early history this meant Web content management, document management, and content management systems for publishers and enterprises. We now track related technologies as well, including standards like DITA and XML, adoption of social tools, and the rapidly growing drive to localize and globalize content.

My area, search and more specifically “enterprise search” or search “behind the firewall,” was added just over three years ago. It seemed logical to give attention to the principal reason for creating, managing and manipulating content, namely finding it. When I pay attention to search engines, I am also thinking about adjoining content technologies. My recent interest is helping readers see that technology on both the search side and the content management/manipulation side needs better context; that means relating the two.

If one theme ran consistently through all the talks at the Search Engines Meeting, it was the need to define search in relationship to many other content technologies. The speakers, for the most part, did a fine job of making these connections.

Here are just some snippets:

Bipin Patel, CIO of ProQuest, shared the technology challenges of maintaining a 24/7 service while driving improvements to the search usability interface. The goal is to deliver command-line search precision to users who do not have the expertise (or patience) to construct elaborate queries. Balancing the tension between expert searchers (usually librarians) and everyone else who seeks content underscores the importance of human factors. My take-away: underlying algorithms and architecture are worth little if usability is neglected.

Martin Baumgartel spoke on the Theseus project for the semantic search marketplace, a European collaborative initiative. An interesting point for me is their use of SMILA (SeMantic Information Logistics Architecture) from Eclipse. By following some links on the Eclipse site I found this interesting presentation from the International Theseus Convention in 2009. The application of this framework model underscores the interdependency of many semantically related technologies to improve search.

Tamas Doszkocs of the National Library of Medicine told a well-annotated story of the decades of search and content enhancement technologies that are evolving to contribute to semantically richer search experiences. His metaphors in the evolutionary process were fun and spot-on at a very practical level: Libraries as knowledge bases > Librarians as search engines > the Web as the knowledge base > Search engines as librarians > moving toward understanding, content, context, and people to bring us semantic search. A similar presentation is posted on the Web.

David Evans noted that there is currently no rigorous evaluation methodology for mobile search, but it is very different from what we do with desktop search. The slide I found most interesting covered the Human Language Technologies (HLT) that contribute to a richer mobile search experience, essentially numerous semantic tools. Again, this underscores that the challenges of integrating sophisticated hardware, networking, and search engine architectures for mobile search are just a piece of the solution. Adoption will depend on tools that enhance content findability and usability.

Jeff Fried of Microsoft/Fast talked about “social search” and put forth this important theme: that people like to connect to content through other people. He made me recognize how social tools are teaching us that the richness of this experience is a self-reinforcing mechanism toward “the best way to search.” It has lessons for enterprises as they struggle to adopt social tools in mindful ways in tandem with improving search experiences.

Shekhar Pradhan of Docunexus shared a relevant thought about a failure of interface architecture (to paraphrase): the ubiquitous search box fails because it does not demand context or offer mechanisms for resolving ambiguity. Obviously, this undermines adoption of enterprise search when it is the only option offered.

Many more talks from this meeting will get rolled up in future reports and blogs.

I want to learn about your experiences and observations regarding semantic search and semantic technologies as well. Please note that we have posted a brief survey for a short time at: Semantic Technology Survey. If you have any involvement with semantic technologies, please take it.

Layering Technologies to Support the Enterprise with Semantic Search

Semantic search is a composite beast like many enterprise software applications. Most packages are made up of multiple technology components and often from multiple vendors. This raises some interesting thoughts as we prepare for Gilbane Boston 2009 to be held this week.

As part of a panel on semantic search, moderated by Hadley Reynolds of IDC, with Jeff Fried of Microsoft and Chris Lamb of the OpenCalais Initiative at Thomson Reuters, I wanted to give a high level view of semantic technologies currently in the marketplace. I contacted about a dozen vendors and selected six to highlight for the variety of semantic search offerings and business models.

One case study involves three vendors, each with a piece of the ultimate, customer-facing, product. My research took me to one company that I had reviewed a couple of years ago, and they sent me to their “customer” and to the customer’s customer. It took me a couple of conversations and emails to sort out the connections; in the end the relationships made perfect sense.

On one hand we have conglomerate software companies offering “solutions” to every imaginable enterprise business need. On the other, we see very unique, specialized point solutions to universal business problems with multiple dimensions and twists. Teaming by vendors, each with a solution to one dimension of a need, creates compound product offerings that add up to a very large semantic search marketplace.

Consider an example of data gathering by a professional services firm. Let’s assume that my company has tens of thousands of documents collected in the course of research for many clients over many years. Researchers may move on to greater responsibility or other firms, leaving content unorganized except around confidential work for individual clients. We now want to exploit this corpus of content to create new products or services for various vertical markets. To understand what we have, we need to mine the content for themes and concepts.

The product of the mining exercise may have multiple uses: helping us create a taxonomy of controlled terms, preparing a navigation scheme for a content portal, and providing a feed to business or text analytics tools that will help us create visual objects reflecting various configurations of content. A text mining vendor may be great at the mining aspect, while other firms have better tools for analyzing, organizing, and re-shaping the output.
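
As a rough sketch of the mining step, and not of any vendor’s product: counting how many documents share a multi-word phrase yields a first cut of candidate themes, roughly the raw material a taxonomy editor would start from. The three-document corpus here is a toy I made up.

```python
import re
from collections import Counter

corpus = {
    "r1": "Client risk models for retail banking and credit scoring.",
    "r2": "Retail banking channel strategy and credit scoring pilots.",
    "r3": "Credit scoring vendor comparison for retail banking clients.",
}

def phrases(text):
    """Two-word lowercase phrases; a crude stand-in for real phrase extraction."""
    words = re.findall(r"[a-z]+", text.lower())
    return {" ".join(pair) for pair in zip(words, words[1:])}

doc_freq = Counter()
for text in corpus.values():
    doc_freq.update(phrases(text))

# Phrases shared across documents are candidate taxonomy themes.
print([(p, n) for p, n in doc_freq.most_common() if n > 1])
# e.g. [('retail banking', 3), ('credit scoring', 3)]
```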

Doing business with two or three vendors, experts in their own niches, may help us reach a conclusion about what to do with our information-rich pile of documents much faster. A multi-faceted approach can be a good way to bring a product or service to market more quickly than if we struggle with generic products from just one company.

When partners each have something of value to contribute, together they offer the benefits of the best of all options. This results in a new problem for businesses looking for the best in each area, namely, vendor relationship management. But it also saves organizations from dealing with huge firms offering many acquired products that have to be managed through a single point of contact, a generalist in everything and a specialist in nothing. Either way, you have to manage the players and how the components are going to work for you.

I really like what I see: semantic technology companies partnering with each other to give good-to-great solutions for all kinds of innovative applications. By the way, at the conference I am doing a quick snapshot on each: Cogito, Connotate (with Cormine and WorldTech), Lexalytics, Linguamatics, Sinequa and TEMIS.

March Madness in the Search Industry

In keeping with conventional wisdom, it looks like a number of entrepreneurs are using the economic downturn as opportunity time, judging from the larger-than-normal number of announcements in the enterprise search sector. The Microsoft acquisition of FAST, Autonomy’s foray into the document/content management market, and Google’s Search Appliance ramping up its customer base are old news, BUT we have a sweep of changes. Newcomers to the enterprise search marketplace and news of innovative releases of mature products really perked up in March. Here are my favorite announcements and events, in chronological order, and the reasons why I find them interesting:

Travis, Paul. Digital Reef Comes Out of Stealth Mode. 03/02/2009. Byteandswitch.com.

Startup offers content management platform to index unstructured data for use in e-discovery, risk mitigation, and storage optimization. Here is the first evidence that entrepreneurs see opportunity for filling a niche vacuum. In the legal market the options have been limited and pretty costly, especially for small firms. This will be an interesting one to watch. http://www.digitalreefinc.com/

Banking, Finance, and Investment Taxonomy Now Available from the Taxonomy Experts at WAND. 03/02/2009, PR Web (press release), Ferndale, WA, USA

The taxonomy experts at WAND have made this financial taxonomy available now for integration into any enterprise search software. I have been talking with Ross Lehr, CEO at Wand, for over a year about his suite of vertical market taxonomies and how best to leverage them. I am delighted that Wand is now actively engaged with a number of enterprise search and content management firms, enabling them to better support their customers’ need for navigation. The Wand taxonomies offer a launching point from which organizations can customize and enhance the vocabulary to match their internal or customer interests. http://www.wandinc.com/main/default.aspx

Miller, Mark. Lucid Imagination » Add our Lucene Ecosystem Search Engine to Firefox. 03/02/2009

I predicted back in January that open source search and search appliances were going to spawn a whole new industry of service providers and expert integrators, because there are just not enough search experts to staff in-house teams in all the companies that are adopting these two types of search products. Well, it is happening, and these guys at Lucid are some of the smartest search technologists around. Here is an announcement that introduces you to a taste of what they can do. Check it out, and check them out at http://www.lucidimagination.com/

Read on for the full article, with commentary about: social search at NASA, QueSearch, MaxxCat, Aardvark on social search, Attivio, Concept Searching, the Google user group, Simplexo, Endeca, Linguamatics, Coveo, dtSearch, and ISYS.

Microsharing has benefits for NASA. 03/04/2009.

It has been about 18 months since I wrote on social search and this report reveals a program that takes the concept to a new level, integrating content management, expertise locators and search in a nifty model. To learn more about NASAsphere, read this report written by Celeste Merryman. Findings from the NASAsphere Pilot. Jet Propulsion Laboratory, California Institute of Technology Knowledge Arciteture (sic) and Technology Task [Force]. 08/20/2008. The success of the pilot project is underscored in this report recommendation: the NASAsphere pilot team recommends that NASAsphere be implemented as an “official” employee social networking and communication tool. This project is not about enterprise search per se, it just reflects how leveraging content and human expertise using social networks requires a “findability” component to have a successful outcome. Conversely, social tools play a huge role in improving findability.

March 16, 2009. QueSearch: Unlocking the Value of Structured Data with Universal Search really caught my eye with their claim to “universal search” (yes, another) for large and mid-size organizations.

This offering, with a starting price of $19,500, is available immediately, with software and appliance deployment options. I tried without luck to find out more about their founders and origins on their Web site, but did track down a Wikipedia article and a neat YouTube interview with the two founders, Steven Yaskin and Paul Tenberg. It explains how they are leveraging Google tools and open source to deliver solutions.

Stronger, Better, Faster — MaxxCat’s New Search Appliance Aspires to Be Google Search Appliance Killer, by Marketwire. 03/11/2009.

This statement explains why the announcement caught my attention: MaxxCat product developers cite “poor performance and intrinsic limitations of Google Mini and Google Search Appliance” as the impetus to develop the device. The enterprise search appliance, EX-5000, is over seven times faster than the Google Search Appliance (GSA), and the small business search appliance, the XB-250, is 16 times faster than the Google Mini. There is nothing like challenging the leading search appliance company with a statement like that to throw down the gauntlet. OK, I’m watching, and I will be delighted to read or hear from early users.

Just one more take on “social search” as we learn about Aardvark: Answering the Tough Questions, David Hornik on VentureBlog. 03/12/2009

This week the Aardvark team is launching the fruits of that labor at South By Southwest (SXSW). They have built a “social search engine” that lives inside your IM and email. It allows you to ask questions of Aardvark, which then goes about determining who among your friends and friends of friends is most qualified to answer those questions. As the Aardvark team points out in their blog, social search is particularly well suited to answering subjective questions where “context” is important. I am not going to quibble now, but I think I would have put this under my category of “semantic search” and natural language processing. Until we see it in action, who knows?

A new position at Attivio was announced on March 16th, Attivio Promotes John O’Neil to Chief Scientist, which tells me that they are still expanding at the end of their first official year in business.

Getting to the point, 03/18/2009, KMWorld. http://www.kmworld.com/Articles/ReadArticle.aspx?ArticleID=53070

Several announcements about Concept Searching’s release of version 4 of its flagship product, conceptClassifier for SharePoint, highlight the fact that Microsoft’s acquisition of FAST has not slowed the flow of enterprise search companies that continue to partner with or offer independent solutions for SharePoint. In this case the company offers its own standalone concept search applications for other content domains but is continuing to bank on lots of business from the SharePoint user community. This relationship is reflected in these statements: the company says features include a new installer that enables installation in a SharePoint environment in less than 20 minutes and requires no programmatic support, and that all functionality can be turned on or off using standard Microsoft SharePoint controls. Full integration with Microsoft Content Types and greater support for multiple taxonomies are also included in this release. Once the FAST search server becomes a staple for Microsoft SharePoint shops, there will undoubtedly be fallout for some of these partners.

Being invited to the Google Enterprise Search Summit in Cambridge, MA on March 19, 2009 was an opportunity for me to visit Google’s local offices and meet a bunch of customers.

They were a pretty enthusiastic crowd and are enjoying a lot of attention as this division of Google works to join the ranks of other enterprise application software companies. I suspect that it is a whole new venture for them to be entertaining customers in their offices in a “user-group like” forum, but the Google speakers were energetic and clearly love the entrepreneurial aspects of being a newish runaway success within a runaway-successful company. New customer announcements continue to flow from Google, with SITA (the State Information Technology Agency in South Africa) acquiring the GSA to drive an enterprise-wide research project. The solution will also be deployed and implemented by JSE-listed IT solutions and services company Faritec, and RR Donnelly. Several EMC users were represented at the meeting, which made me ask why they aren’t using the search tools being rolled out by the Documentum division… well, don’t ask.

Evans, Steve. Simplexo boosts public sector search options. Computer Business Review – UK. 03/18/2009.

This is interesting as an alternative to the Lucene/Solr scene: UK-based open source enterprise search vendor Simplexo has launched a new search platform aimed at the public sector, which aims to enable central and local government departments to simultaneously search multiple disparate data sources across the organisation on demand. I have wondered when we would see some other open source offerings.

And all of the preceding covers just the startups (plus EMC at Google) and lesser-known company activity. This was not a slow month. I don’t want all my contacts in the “established” search market to think that I am not paying attention, because I am. I’ve exchanged communications with or been briefed by these known companies with news about new releases, advancing market share, or new executive teams. In no particular order, these were the highlights of the month:

Endeca announced three new platforms on Mar 23, 2009: Endeca Announces the Endeca Publishing Suite, Giving Editors Unprecedented Control Over the Online Experience; Endeca Announces the Endeca Commerce Suite, Giving Retailers Continuous Targeted Merchandizing; and Endeca Unveils McKinley Release of the Information Access Platform, Allowing for Faster and Easier Deployment of Search Applications

Linguamatics Agile Text Mining Platform to Be Used by Novo Nordisk. 03/26/2009

I had a fine briefing by Coveo’s CEO Laurent Simoneau and Michel Besmer, new VP of Global Marketing, and see them making great strides capturing market share across numerous verticals where rapid deployment and implementation are a big selling point. They also just announced: Bell Mobility and Coveo Partner to Create Enterprise Search from Bell, an Exclusive Enterprise-Grade Mobile Search Solution.

A new version 7.6 of dtSearch, a mainstay plug-and-play search solution for SMBs since 1991, was just released. 3/24/2009

And finally, ISYS is on a great growth path with a new technology release, ISYS File Readers, new executives, and a new project … completed in conjunction with ArnoldIT.com. Steve Arnold, industry expert and author of the Beyond Search blog, compiled more than a decade of Google patent documents. In ISYS’s words: “To offer a more powerful method for analyzing and mining this content, we produced the Google Patent Search Demonstration Site, powered by our ISYS:web application.”

Weatherwise, March 2009 is out like a lamb, but when it comes to search it has been hot, hot, hot.

Enterprise Search 2008 Wrap-Up

It would be presumptuous to think that I could adequately summarize a very active year of evolution among a huge inventory of search technologies. This entry is more about what I have learned and what I opine about the state-of-the-market, than an analytical study and forecast.

The weak link in the search market is product selection methods. My first thought is that we are in a state of technological riches without clear guideposts for which search models work best in any given enterprise. Those tasked to select and purchase products are not well educated about the marketplace and are usually not given the budget or latitude to purchase expert analysis when it is available. It is a sad commentary that organizations grant travel budgets to attend conferences where only limited information can be gathered about products but will not spend a few hundred dollars on in-depth comparative expert analyses of a large array of products.

My sources for this observation are numerous, confirmed by speakers in our Gilbane conference search track sessions in Boston and San Francisco. As they related their personal case histories for selecting products, speakers shared no tales of actually doing literature searches or in-depth research using resources with a cost associated. This underscores another observation: those procuring search do not know how to search, and they operate in the belief that they can find “good enough” information using only “free stuff.” Even their review of the material gathered is limited to skimming rather than systematic reading for concrete facts. This does not make for well-reasoned selections. As noted in an earlier entry, a widely published chart stating that product X is a leader does nothing to enlighten your enterprise’s search for search. In one case, product leadership is determined primarily by the total software sales of the “leader,” of which search is a minuscule portion.

Don’t expect satisfaction with search products to rise until buyers develop smarter methods for selection and better criteria for making a buy decision that suits a particular business need.

Random Thoughts. It will be a very long time before we see a universally useful, generic search function embedded in Microsoft (MS) product suites as a result of the FAST acquisition. Asked earlier in the year by a major news organization whether I thought MS had paid too much for FAST, I responded “no” if what they wanted was market recognition but “yes” if they thought they were getting state-of-the-art technology. My position holds; the financial and legal mess in Norway only complicates the road to meshing search technology from FAST with Microsoft customer needs.

I’ve wondered what has happened to the OmniFind suite of search offerings from IBM. One source tells me it makes IBM money because none of the various search products in the line-up are standalone, nor do they provide an easy transition path from one level of product to another for upward scaling and enhancements. IBM can embed any search product with any bundled platform of other options and charge for lots of services to bring it on-line with heavy customization.

Three platform vendors seem to be penetrating the market slowly but steadily by offering more cohesive solutions to retrieval. Native search solutions are bundled with complete content capture, publishing and search suites, purposed for various vertical and horizontal applications. These are Oracle, EMC, and OpenText. None of these are out-of-the-box offerings and their approach tends to appeal to larger organizations with staff for administration. At least they recognize the scope and scale of enterprise content and search demands, and customer needs.

On User Presentations at the Boston Gilbane Conference: I was very pleased with all the sessions and with the work and thought the speakers put into their talks. There were some noteworthy comments in the sessions on Semantic Search and Text Technologies, Open Source, and Search Appliances.

On the topic of semantic (contextual query and retrieval) search, text mining, and analytics, the speakers covered the range of complexities in text retrieval, leaving the audience with a better understanding of how diverse this domain has become. Different software application solutions need to be employed depending on the specific business problems to be solved. This will not change, and enterprises will need to discriminate about which aspects of their businesses need some form of semantically enabled retrieval and then match expectations to offerings. Large organizations will procure a number of solutions, all worthy and useful. Jeff Catlin of Lexalytics gave a clear set of definitions within this discipline, industry analyst Curt Monash provoked us with where to set expectations for various applications, and Win Carus of Information Extraction Systems illustrated the tasks extraction tools can perform to find meaning in a heap of content. The story has yet to be written on how semantic search is affecting, and will affect, our use of information within organizations.

Leslie Owens of Forrester and Sid Probstein of Attivio helped to ground the discussion of when and why open source software is appropriate. The major take-away for me was an understanding of the type of organization that benefits most as a contributor to and user of open source software. Simply put, you need to be heavily vested and engaged on the technical side to get out of open source what you need, to mold it to your purpose. If you do not have the developers to tackle coding, or the desire to share in a community of development, your enterprise’s expectations will not be met and disappointment is sure to follow.

Finally, several lively discussions about search appliance adoption and application (Google Search Appliance and Thunderstone) strengthened my case for doing homework and spending on careful evaluations before jumping into procurement. While all the speakers seem to be making positive headway with their selected solutions, the path to success has involved more diversions and changes of course than necessary for some, because the vetting and selection process was too “quick and dirty” or depended on too few information sources. One truth was revealed: true plug-and-play is an appliance myth.

What will 2009 bring? I’m looking forward to seeing more applications of products that interest me from companies that have impressed me with thoughtful and realistic approaches to their customers and target audiences. Here is an uncommon clustering of search products.

Multi-repository search across database applications, content collaboration stores, document management systems, and file shares: Coveo, Autonomy, Dieselpoint, dtSearch, Endeca, Exalead, Funnelback, Intellisearch, ISYS, Oracle, Polyspot, Recommind, Thunderstone, Vivisimo, and X1. In this list is something for every type of enterprise and budget.

Business and analytics focused software with intelligence gathering search: Attensity, Attivio, Basis Technology, ChartSearch, Lexalytics, SAS, and Temis.

Comprehensive solutions for capture, storage, metadata management and search for high quality management of content for targeted audiences: Access Innovations, Cuadra Associates, Inmagic, InQuira, Knova, Nstein, OpenText, ZyLAB.

Search engines with advanced semantic processing or natural language processing for high-quality, contextually relevant retrieval when the quantity of content makes human metadata indexing prohibitive: Cognition Technologies, Connotate, Expert System, Linguamatics, Semantra, and Sinequa.

Content classification, thesaurus management, and metadata server products interplay with other search engines, and a few have impressed me with their vision and thoughtful approach to the technologies: MarkLogic, MultiTes, Nstein, Schemalogic, Seaglex, and Siderean.

Search with a principal focus on SharePoint repositories: BA-Insight, Interse, Kroll Ontrack, and SurfRay.

Finally, some unique search applications are making serious inroads. These include Documill for visual and image, Eyealike for image and people, Krugle for source code, and Paglo for IT infrastructure search.

This is the list of companies that interest me because I think they are on track to provide good value and technology, many still small but with promise. As always, the proof will be in how they grow and how well they treat their customers.

That’s it for a wrap on Year 2 of the Enterprise Search Practice at the Gilbane Group. Check out our search studies at http://gilbane.com/Research-Reports.html and PLEASE let me hear your thoughts on my thoughts or any other search related topic via the contact information at http://gilbane.com/

What is Semantic Technology Anyway?

Meaning is a very large concept in every aspect of search technology, and dozens of search product sites include either “semantic” or “meaning” as a key element of the offered technology. This is not as far-fetched as product claims to “know” what the searcher wants to find, as if “knowing” can be attributed to non-human operations. However, how well a search engine indexes and retrieves content to meet a searcher’s intent is truly in the eyes of the beholder. I can usually understand why, technically speaking, a piece of content turns up in a search result, but that does not mean it was valid for my intent. My intent cannot possibly be discernible by a search engine if, as is most often the case, I don’t explicitly and eloquently express what, why, and other contextual facts when entering a query.

The session we have set aside at Gilbane San Francisco for a discussion of current activity related to semantic technologies will undoubtedly reveal more about these technologies and the art of leveraging tools to elicit semantically relevant content. I suspect that someone will also stipulate that what works requires a defined need and clear intent during the implementation process – but what about all those fuzzy situations? I hope to find out.

This is the last posting before the conference this week, so I hope you will add this enterprise search session (EST-6: Semantic Technology – Breakdown or Breakthrough), moderated by Colin Britton, to your agenda on June 19th. He will be joined by speakers Steve Carton, VP Content Technologies, Retrieval Systems Corp. (“Folksonomies: Just Good Enough for all Kinds of Thing”); Prakesh Govindarajulu, President, RealTech Inc (“Building Enterprise Taxonomies from the Ground Up”); and Jack Jia, Founder & CEO, Baynote.

See you in San Francisco in person or virtually thereafter.

Ontologies and Semantic Search

Recent studies describe the negative effect of media, including video, television, and online content, on attention spans and even comprehension. One such study suggests that the piling on of content from multiple sources throughout our work and leisure hours has saturated us to the point of making us information filterers more than information “comprehenders.” Hold that thought while I present a second one.

Last week’s blog entry reflected on intellectual property (IP) and knowledge assets and the value of taxonomies as aids to organizing and finding these valued resources. The idea of making search engines better, or more precise, at finding relevant content is edging into our enterprises through semantic technologies. These are search tools that are better at finding concepts, synonymous terms, and similar or related topics when we execute a search. You’ll find an in-depth discussion of some of these in the forthcoming publication, Beyond Search by Steve Arnold. However, semantic search requires a more sophisticated concept map than a taxonomy. It requires an ontology: a rich representation of a web of concepts complete with all types of term relationships.

My first comment, about a trend toward just browsing and filtering content for relevance to our work, and the second, about assembling semantically relevant content for better search precision, are two sides of a business problem that hundreds of entrepreneurs are grappling with: semantic technologies.

Two weeks ago, I helped to moderate a meeting on the subject, entitled Semantic Web – Ripe for Commercialization? While the assumed audience was to be a broad business group of VCs, financiers, and legal and business management professionals, it turned out to include a lot of technology types. They had some pretty heavy questions and comments about how search engines handle inference and about their methods for extracting meaning from content. Semantic search engines need to understand both the query and the target content to retrieve contextually relevant results.

Keynote speakers and some of the panelists introduced the concept of ontologies as an essential backbone to semantic search. From that came a lot of discussion about how and where these ontologies originate, how and by whom they are vetted for authoritativeness, and how their development in under-funded subject areas will occur. There were no clear answers.

Here I want to give a quick definition of ontology. It is a concept map of terminology which, when richly populated, reflects all the possible semantic relationships that might be inferred from the different ways terms are assembled in human language. A subject-specific ontology is most easily understood in a graphical representation. Ontologies also inform semantic search engines by contributing to the automated deconstruction of a query (making sense of what the searcher wants to know) and the automated deconstruction of the content to be indexed and searched. Good semantic search, therefore, depends on excellent ontologies.
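
To ground that definition, here is a minimal sketch of an ontology fragment and the query deconstruction it enables, using the “roadway” example mentioned below. The terms and relationship types are invented for illustration and vastly simpler than any production ontology.

```python
# A tiny ontology fragment: each concept carries typed relationships.
ontology = {
    "roadway": {"synonyms": ["road", "carriageway"],
                "narrower": ["highway", "street"],
                "related": ["pavement", "traffic"]},
    "highway": {"synonyms": ["motorway", "freeway"],
                "broader": ["roadway"]},
}

def deconstruct_query(term):
    """Expand a query term into the concept neighborhood an engine would search."""
    node = ontology.get(term, {})
    expansion = {term}
    for relation in ("synonyms", "narrower", "related"):
        expansion.update(node.get(relation, []))
    return sorted(expansion)

print(deconstruct_query("roadway"))
# ['carriageway', 'highway', 'pavement', 'road', 'roadway', 'street', 'traffic']
```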

To see a very simple example of an ontology related to “roadway”, check out this image. Keep in mind that before you aspire to implementing a semantic search engine in your enterprise, you want to be sure that there is a trusted ontology somewhere in the mix of tools to help the search engine retrieve results relevant to your unique audience.
