W3C launched the new Linked Data Platform (LDP) Working Group to promote the use of linked data on the Web. Per its charter, the group will explain how to use a core set of services and technologies to build powerful applications capable of integrating public data, secured enterprise data, and personal data. The platform will be based on proven Web technologies including HTTP for transport, and RDF and other Semantic Web standards for data integration and reuse. The group will produce supporting materials, such as a description of use cases, a list of requirements, and a test suite and/or validation tools to help ensure interoperability and correct implementation.
A rarity these days – an announcement that used ‘data’ instead of ‘big data’! And the co-chairs are even from IBM and EMC.
Over the past three months I have had the pleasure of speaking with Kathleen Dahlgren, founder of Cognition, several times. I first learned about Cognition at the Boston Infonortics Search Engines meeting in 2009. That introduction led me to a closer look several months later when researching auto-categorization software. I was impressed with the comprehensive English language semantic net they had doggedly built over a 20+ year period.
A semantic net is a map of language that explicitly defines the many relationships among words and phrases. It can be as simple as a map of a small geographical locale and the named entities within it, or as complex as the entire base language of English, with every concept mapped to show all the ways any one term relates to other terms. Cognition is among the few companies to have created a comprehensive semantic net for English, which Dr. Dahlgren and her team built.
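To make the idea concrete, a semantic net can be pictured as a graph of terms connected by typed relationships such as synonymy and is-a (hypernym) links. The following is a toy sketch purely for illustration; it is not Cognition's implementation, and the class, relation names, and example terms are all hypothetical:

```python
from collections import defaultdict

class SemanticNet:
    """A toy semantic net: terms linked by labeled relationships.

    Hypothetical illustration only -- real semantic nets (such as
    Cognition's) encode far richer linguistic structure.
    """
    def __init__(self):
        # term -> list of (relation_type, related_term) pairs
        self.relations = defaultdict(list)

    def add(self, term, relation, other):
        self.relations[term].append((relation, other))

    def related(self, term, relation=None):
        """Return terms linked to `term`, optionally filtered by relation type."""
        return [o for r, o in self.relations[term]
                if relation is None or r == relation]

net = SemanticNet()
net.add("dog", "is_a", "mammal")      # hypernym link
net.add("dog", "synonym", "canine")   # synonym link
net.add("mammal", "is_a", "animal")

print(net.related("dog"))             # ['mammal', 'canine']
print(net.related("dog", "is_a"))     # ['mammal']
```

Even this tiny subset shows why building a net for all of English takes decades: every concept must be connected, relation by relation, to every other concept it touches.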
In 2003, Dr. Dahlgren established Cognition as a software company to commercialize her semantic net, designing software to apply it to semantic search applications. When the Gilbane Group launched its new research on Semantic Software Technologies, Cognition signed on as a study co-sponsor, and we engaged in several discussions that rounded out the company's history in this new marketplace. Their story illustrates the experience of pioneering in any new software domain.
Early adopters are key contributors to any software development. It is notable that Cognition has attracted experts in fields as diverse as medical research, legal e-discovery and Web semantic search. This gives the company valuable feedback for their commercial development. In any highly technical discipline, it is challenging and exciting to find subject experts knowledgeable enough to contribute to product evolution, and Cognition is learning from client experts where the best opportunities for growth lie.
Recent interviews with Cognition executives, and those of other sponsors, gave me the opportunity to get their reactions to my conclusions about this industry. These were the more interesting thoughts that came from Cognition after they had reviewed the Gilbane report:
• Feedback from current clients and attendees at 2010 conferences, where Dr. Dahlgren was a featured speaker, confirms escalating awareness of the field; she feels that “This is the year of Semantics.” It is catching the imagination of IT folks who understand the diverse and important business problems to which semantic technology can be applied.
• In addition to a significant upswing in semantics applied in life sciences, publishing, law and energy, Cognition sees specific opportunities for growth in risk assessment and risk management. Using semantics to detect signals, content salience, and measures of relevance is critical where the quantity of data and textual content is too voluminous for human filtering. There is not much evidence that financial services, banking and insurance are embracing semantic technologies yet, but semantics could dramatically improve their business intelligence, and Cognition is well positioned to support them with its already tested tools.
• Enterprise semantic search will begin to overcome the poor reputation that traditional “string search” has suffered. There is growing recognition among IT professionals that in the enterprise 80% of the queries are unique; these cannot be interpreted based on popularity or social commentary. Determining relevance or accuracy of retrieved results depends on the types of software algorithms that apply computational linguistics, not pattern matching or statistical models.
• In Dr. Dahlgren’s view, there is no question that a team approach to deploying semantic enterprise search is required. This means that IT professionals will work side-by-side with subject matter experts, search experts and vocabulary specialists to gain the best advantage from semantic search engines.
• The unique language aspects of an enterprise content domain are as important as the software a company employs. Out of the box, the Cognition baseline semantic net gives more reliable and better results than traditional string search engines. However, it delivers top performance when enhanced with enterprise language, embedding all the ways subject experts talk about their topical domain: jargon, acronyms, code phrases, and so on.
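The enhancement described in the last point above can be pictured as layering an enterprise vocabulary on top of a baseline one so that domain jargon and acronyms expand a query's reach. This is a hypothetical sketch, not Cognition's mechanism; the terms, acronyms, and the `expand_query` helper are all invented for illustration:

```python
# Baseline general-English equivalents (illustrative).
baseline_synonyms = {
    "contract": {"agreement"},
}

# Enterprise layer: how subject experts in one company actually
# talk about the domain -- acronyms, jargon, code phrases.
enterprise_terms = {
    "contract": {"MSA", "SOW"},          # in-house acronyms
    "vendor": {"supplier", "third party"},
}

def expand_query(term, *vocabularies):
    """Collect all known equivalents of a query term across vocabularies."""
    expansions = {term}
    for vocab in vocabularies:
        expansions |= vocab.get(term, set())
    return expansions

print(expand_query("contract", baseline_synonyms, enterprise_terms))
# With only the baseline layer, documents mentioning "MSA" or "SOW"
# would never match a query for "contract".
```

The design point is that the baseline layer ships with the product, while the enterprise layer is what the team of subject matter experts and vocabulary specialists contributes during deployment.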
With elements of its software already embedded in some notable commercial applications like Bing, Cognition is positioned for delivering excellent semantic search for an enterprise. They are taking on opportunities in areas like risk management that have been slow to adopt semantic tools. They will deliver software to these customers together with services and expertise to coach their clients through the implementation, deployment and maintenance essential to successful use. The enthusiasm expressed to me by Kathleen Dahlgren about semantics confirms what I also heard from Cognition clients. They are confident that the technology coupled with thoughtful guidance from their support services will be the true value-added for any enterprise semantic search application using Cognition.
The free download of the Gilbane study and deep-dive on Cognition was announced on their Web site at this page.
Dr. Phil Hastings and Dr. David Milward spoke with me in June, 2010, as I was completing the Gilbane report, Semantic Software Technologies: A Landscape of High Value Applications for the Enterprise. My interest in a conversation was stimulated by several months of discussions with customers of numerous semantic software companies. Having heard perspectives from early adopters of Linguamatics’ I2E and other semantic software applications, I wanted to get some comments from two key officers of Linguamatics about what I heard from the field. Dr. Milward is a founder and CTO, and Dr. Hastings is the Director of Business Development.
A company with sustained profitability for nearly ten years in the enterprise semantic market space has credibility. Reactions from a maturing company to what users have to say are interesting and carry weight in any industry. My lines of inquiry and the commentary from the Linguamatics officers centered around their own view of the market and adoption experiences.
When asked about growth potential for the company outside of pharmaceuticals where Linguamatics already has high adoption and very enthusiastic users, Drs. Milward and Hastings asserted their ongoing principal focus in life sciences. They see a lot more potential in this market space, largely because of the vast amounts of unstructured content being generated, coupled with the very high-value problems that can be solved by text mining and semantically analyzing the data from those documents. Expanding their business further in the life sciences means that they will continue engaging in research projects with the academic community. It also means that Linguamatics semantic technology will be helping organizations solve problems related to healthcare and homeland security.
The wisdom of a measured and consistent approach comes through strongly when speaking with Linguamatics executives. They are highly focused and cite the pitfalls of trying to “do everything at once,” which would be the case if they were to pursue all markets overburdened with tons of unstructured content. While pharmaceutical terminology, a critical component of I2E, is complex and extensive, there are many aids to support it. The language of life sciences is in a constant state of being enriched through refinements to published thesauri and ontologies. However, in other industries with less technical language, Linguamatics can still provide important support to analyze content in the detection of signals and patterns of importance to intelligence and planning.
Much of the remainder of the interview centered on what I refer to as the “team competencies” of individuals who identify the need for any semantic software application; those are the people who select, implement and maintain it. When asked if this presents a challenge for Linguamatics or the market in general, Milward and Hastings acknowledged a learning curve and the need for a larger pool of experts for adoption. This is a professional growth opportunity for informatics and library science people. These professionals are often the first group to identify Linguamatics as a potential solutions provider for semantically challenging problems, leading business stakeholders to the company. They are also good advocates for selling the concept to management and explaining the strong benefits of semantic technology when it is applied to elicit value from otherwise under-leveraged content.
One Linguamatics core operating principle came through clearly when talking about the personnel issues of using I2E, which is the necessity of working closely with their customers. This means making sure that expectations about system requirements are correct, examples of deployments and “what the footprint might look like” are given, and best practices for implementations are shared. They want to be sure that their customers have a sense of being in a community of adopters and are not alone in the use of this pioneering technology. Building and sustaining close customer relationships is very important to Linguamatics, and that means an emphasis on services co-equally with selling licenses.
Linguamatics has come a long way since 2001. Besides a steady effort to improve and enhance their technology through regular product releases of I2E, there have been a lot of “show me” and “prove it” moments to which they have responded. Now, as confidence in and understanding of the technology ramps up, they are getting more complex and sophisticated questions from their customers and prospects. This is the exciting part, as they are able to sell I2E’s ability to “synthesize new information from millions of sources in ways that humans cannot.” This is done by using the technology to keep track of and process the voluminous connections among information resources that exceed human mental limits.
At this stage of growth, with early successes and excellent customer adoption, it was encouraging to hear the enthusiasm of two executives for the evolution of the industry and their opportunities in it.
The Gilbane report and a deep dive on Linguamatics are available through this Press Release on their Web site.
Meaning is a very large concept in every aspect of search technology, and dozens of search product sites include either the word “semantic” or “meaning” as a key element of the offered technology. This is not as far-fetched as product claims to “know” what the searcher wants to find, as if “knowing” can be attributed to non-human operations. However, how well a search engine indexes and retrieves content to meet a searcher’s intent is truly in the eyes of the beholder. I can usually understand why, technically speaking, a piece of content turns up in a search result, but that does not mean it was a valid match for my intent. My intent for a search cannot possibly be discernible by a search engine if, as is most often the case, I don’t explicitly and eloquently express what, why, and other contextual facts when entering a query.
The session we have set aside at Gilbane San Francisco for a discussion on current activity related to semantic technologies will undoubtedly reveal more about these technologies and the art of leveraging tools to elicit semantically relevant content. I suspect that someone will also stipulate that what works requires a defined need and clear intent during the implementation process – but what about all those fuzzy situations? I hope to find out.
This is the last posting before the conference this week so I hope you will add this enterprise search session (EST-6: Semantic Technology – Breakdown or Breakthrough) being moderated by Colin Britton to your agenda on June 19th. He will be joined by speakers: Steve Carton, VP Content Technologies, Retrieval Systems Corp., Folksonomies: Just Good Enough for all Kinds of Thing, Prakesh Govindarajulu, President, RealTech Inc, Building Enterprise Taxonomies from the Ground Up, and Jack Jia, Founder & CEO, Baynote.
See you in San Francisco in person or virtually thereafter.