Category: Semantic technology

Gilbane Conference workshops

In case you missed it last week while on vacation, the Gilbane Conference workshop schedule and descriptions have been posted. The half-day workshops take place at the Intercontinental Boston Waterfront Hotel on Tuesday, November 27, 9:00 am to 4:00 pm.

Save the date and check http://gilbaneboston.com for further information about the main conference schedule and program as they become available.

W3C Launches Linked Data Platform Working Group

W3C launched the new Linked Data Platform (LDP) Working Group to promote the use of linked data on the Web. Per its charter, the group will explain how to use a core set of services and technologies to build powerful applications capable of integrating public data, secured enterprise data, and personal data. The platform will be based on proven Web technologies including HTTP for transport, and RDF and other Semantic Web standards for data integration and reuse. The group will produce supporting materials, such as a description of use cases, a list of requirements, and a test suite and/or validation tools to help ensure interoperability and correct implementation.

A rarity these days – an announcement that used ‘data’ instead of ‘big data’! And the co-chairs are even from IBM and EMC.

First group of Gilbane sponsors posted for Boston conference

Conference planning is starting to ramp up. See our first group of Gilbane sponsors, and don’t forget the call for papers!

Why is it so Hard to “Get” Semantics Inside the Enterprise?

Semantic Software Technologies: Landscape of High Value Applications for the Enterprise was published just over a year ago. Since then the marketplace has been increasingly active; new products emerge and discussion about what semantics might mean for the enterprise is constant. One thing that continues to strike me is the difficulty of explaining the meaning of, applications for, and context of semantic technologies.

Browsing through the topics in this excellent blog site, http://semanticweb.com , it struck me as the proverbial case of the blind men describing an elephant. A blog, any blog, is linear. While there are tools to give a blog dimension by clustering topics or presenting related information, it is difficult to understand the full relationships of any one blog post to another. Without a photographic memory, an individual does not easily connect ideas across a multi-year domain of blog entries. Semantic technologies can facilitate that process.

Those who embrace some concept of semantics are believers that search will benefit from “semantic technologies.” What is less clear is how evangelists, developers, searchers and the average technology user can coalesce around the applications that will semantically enable enterprise search.

On the Internet, content that successfully drives interest, sales, opinion and individual promotion does so through a combination of expert crafting of metadata, search engine technology that “understands” the language of the inquirer, and content that can satisfy the inquiry. Good answers are reached when questions are understood first and then the right content is selected to meet expectations.

In the enterprise, the same care must be given to metadata, search engine “meaning” analysis tools, and query interpretation for successful outcomes. Magic does not happen without people behind the scenes executing linguistic curation, content enhancement, and computational linguistic programming to meet these three criteria.

Three recent events illustrate various states of semantic development and adoption, even as the next conference, the Semantic Tech & Business Conference in Washington, D.C. on November 29, is upon us:

Event 1 – A relatively new group, the IKS-Community, funded by the EU, has been supporting open source software developers since 2009. In July they held a workshop in Paris, just past the mid-point of their life cycle. Attendees were primarily entrepreneurs and independent open source developers seeking pathways for their semantically “tuned” content management solutions. I was asked to suggest where opportunities and needs exist in US markets. They were an enthusiastic audience and are poised to meet the tough market realities of packaging highly sophisticated software for audiences that will rarely understand how complex the stuff “under the hood” really is. My principal charge to them was to create tools that “make it really easy” to work with vocabulary management and content metadata capture, updates, and enhancements.

Event 2 – On this side of the pond, UK firm Linguamatics hosted its user group meeting in Boston in October. Having interviewed a number of their customers last year to better understand the I2E product line, I was happy to meet people I had spoken with and to see the enthusiasm of a user community invested in such complex technology. Most impressive is the respectful tone and thoughtful sharing between Linguamatics principals and their customers. They share the knowledge of how hard it is to continually improve search technology that delivers answers to semantically complex questions expressed in highly specialized language. Content contributors and inquirers are all highly educated specialists seeking answers to questions that have never been asked before. Think about it: building search engines that deliver results for frequently asked questions or find content on popular topics is hard enough, but finding the answer to a brand-new question is a quantum leap in difficulty.

To make matters even more complicated, answers to semantic (natural language) questions may be found in internal content, in published licensed content or some combination of both. In the latter case, only the seeker may be able to put the two together to derive or infer an answer.

Publishers of licensed content play a convoluted game over how they will license their content to enterprises for semantic indexing in combination with internal content. The Linguamatics user community is primarily in the life sciences; this is one more hurdle for them to overcome to effectively leverage the vast published repositories of biological and medical literature. Rigorous pricing may be good business strategy, but research using semantic search could make more headway with more reasonable royalties that reflect the need for collaborative use across teams and partners.

Content wants to be found, and knowledge requires outlets if innovation is to flourish. In too many cases technology is impaired by buyers’ lack of business resources or sellers’ arcane pricing models that hold vital information captive for a well-funded few. Semantically excellent retrieval depends on an engine having indexing access to all contextually relevant content.

Event 3 – At the Fall 2011 Enterprise Search Summit, Leslie Owens of Forrester Research conducted a very interesting interactive session that further affirmed the blind men and the elephant metaphor. Leslie is a champion of metadata best practices and writes about the competencies and expertise needed to make valuable content accessible. She engaged the audience with a series of questions about its wants, needs, beliefs, and plans for semantic technologies. As described in an earlier paragraph about how well semantics serves us on the Web, most of the audience puts its faith in that model but is doubtful about how or when similar benefits will accrue to enterprise search. Leslie and a couple of others made the point that a lot more work has to be done on the back end on enterprise content to get these high-value outcomes.

We’ll keep making the point until more adopters of semantic technologies get serious and pay attention to content, content enhancement, expert vocabulary management and metadata. If it is automatic understanding of your content that you are seeking, the vocabulary you need is one that you build out and enhance for your enterprise’s relevance. Semantic tools need to know the special language you use to give the answers you need.

How Far Does Semantic Software Really Go?

A discussion about semantic software technologies that began with a graduate scholar at George Washington University in November 2010 prompted him to follow up with some questions for clarification. With his permission, I am sharing three questions from Evan Faber and the gist of my comments to him. At the heart of the conversation we all need to keep having is this: how far does this technology go, and does it really bring us any gains in retrieving information?

1. Have AI or semantic software demonstrated any capability to ask new and interesting questions about the relationships among information that they process?

In several recent presentations and the Gilbane Group study on Semantic Software Technologies, I share a simple diagram of the nominal setup for the relationship of content to search and the semantic core, namely a set of terminology rules or terminology with relationships. Semantic search operates best when it focuses on a topical domain of knowledge. The language that defines that domain may range from simple to complex, broad or narrow, deep or shallow. The language may be applied to the task of semantic search from a taxonomy (usually shallow and simple), from a set of language rules (numbering in the thousands to millions), or from an ontology of concepts up to a semantic net with millions of terms and relationships among concepts.

The question Evan asks is a good one with a simple answer: “Not without configuration.” The configuration needs human work in two areas:

  • Management of the linguistic rules or ontology
  • Design of search engine indexing and retrieval mechanisms

When a semantic search engine indexes content for natural language retrieval, it looks to the rules or semantic nets to find concepts that match those in the content. When it finds concepts in the content with no equivalent language in the semantic net, it must find a way to understand where the concepts belong in the ontological framework. This discovery process for clarification, disambiguation, contextual relevance, perspective, meaning, or tone is best accompanied by an interface that makes it easy for a human curator or editor to update or expand the ontology. A subject matter expert is required for specialized topics. Through a process of automated indexing that both categorizes and exposes problem areas, the semantic engine becomes a search engine and a questioning engine.

The entire process is highly iterative. In a sense, the software is asking the questions: “What is this?”, “How does it relate to the things we already know about?”, “How is the language being used in this context?” and so on.
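
To make that loop concrete, here is a minimal sketch of the idea, in Python with invented names and a toy ontology rather than any vendor's actual API: concepts the engine recognizes are indexed, and anything it cannot place is queued for a human curator.

```python
# A minimal sketch (hypothetical names, not any vendor's API) of the
# index-and-question loop: concepts that match the semantic net are indexed,
# unknown concepts are queued for a human curator to classify.

ONTOLOGY = {
    # concept -> broader concept; a real semantic net also carries synonyms,
    # relationship types, and language variants
    "battery": "energy storage",
    "electrolyte": "battery",
    "anode": "battery",
}

def extract_concepts(text):
    # Placeholder for real linguistic parsing; here we just tokenize naively.
    return [tok.strip(".,").lower() for tok in text.split()]

def index_document(doc_id, text, index, curation_queue):
    for concept in extract_concepts(text):
        if concept in ONTOLOGY:
            index.setdefault(concept, set()).add(doc_id)
        else:
            # Unknown term in context: surface it so a subject matter expert
            # can decide where (or whether) it belongs in the ontology.
            curation_queue.append((doc_id, concept))

index, queue = {}, []
index_document("doc-1", "Solid electrolyte chemistry extends anode life.", index, queue)
print(index)   # mapped concepts, e.g. {'electrolyte': {'doc-1'}, 'anode': {'doc-1'}}
print(queue)   # everything else goes to the curator
```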

2. In other words, once they [the software] have established relationships among data, can they use that finding to proceed – without human intervention – to seek new relationships?

Yes, in the manner described for the previous question. It is important to recognize that the original set of rules, ontologies, or semantic nets that are being applied were crafted by human beings with subject matter expertise. It is unrealistic to think that any team of experts would be able to know or anticipate every use of the human language to codify it in advance for total accuracy. The term AI is, for this reason, a misnomer because the algorithms are not thinking; they are only looking up “known-knowns” and applying them. The art of the software is in recognizing when something cannot be discerned or clearly understood; then the concept (in context) is presented for the expert to “teach” the software what to do with the information.

State-of-the-art software will have a back-end process for enabling implementer/administrators to use the results of search (direct commentary from users or indirectly by analyzing search logs) to discover where language has been misunderstood as evidenced by invalid results. Over time, more passes to update linguistic definitions, grammar rules, and concept relationships will continue to refine and improve the accuracy and comprehensiveness of search results.
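
As one illustration of that feedback loop, a small sketch like the following could flag queries whose results users never clicked as candidates for vocabulary review. The log format and the repeat threshold are assumptions for illustration, not a description of any particular product.

```python
# Sketch of the back-end feedback loop described above: mine a query log for
# searches that returned nothing, or whose results nobody clicked, as
# candidates for terminology and rule updates.

from collections import Counter

query_log = [
    {"query": "lithium anode dendrite", "results": 0, "clicks": 0},
    {"query": "lithium anode dendrite", "results": 0, "clicks": 0},
    {"query": "photovoltaic efficiency", "results": 42, "clicks": 3},
    {"query": "perovskite stability",    "results": 17, "clicks": 0},
]

suspect = Counter()
for entry in query_log:
    if entry["results"] == 0 or entry["clicks"] == 0:
        suspect[entry["query"]] += 1

# Queries seen repeatedly with no useful outcome go to the vocabulary team.
for query, seen in suspect.most_common():
    if seen >= 2:
        print(f"review terminology for: {query!r} (seen {seen} times)")
```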

3. It occurs to me that the key value added of semantic technologies to decision-making is their capacity to link sources by context and meaning, which increases situational awareness and decision space. But can they probe further on their own?

Good point on the value, and in a sense, yes, they can. Through extensive algorithmic operations, instructions can be embedded (and probably are for high-value situations like intelligence work) telling the software what to do with newly discovered concepts. Those instructions might place the new discoveries into categories of relevance, importance, or association. It would not be unreasonable to then pass documents with confounding information off to other semantic tools for further examination. Again, without human analysis along the continuum and at the end point, no certainty about the validity of the software’s decision-making can be asserted.

I can hypothesize a case in which a corpus of content contains random documents in foreign languages. From my research, I know that some of the semantic packages have semantic nets in multiple languages. If the corpus contains material in English, French, German and Arabic, these materials might be sorted and routed off to four different software applications. Each batch would be subject to further linguistic analysis, followed by indexing with some middleware applied to the returned results for normalization, and final consolidation into a unified index. Does this exist in the real world now? Probably there are variants but it would take more research to find the cases, and they may be subject to restrictions that would require the correct clearances.
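
A toy sketch of that hypothesized pipeline follows: guess each document's language, route it to a per-language analyzer, then merge everything into one unified index. The language guesser and analyzers are pure placeholders, not real linguistic components or any product's workflow.

```python
# Toy sketch of the hypothesized multilingual pipeline: detect language,
# route to a per-language analyzer, normalize, and consolidate into a
# single unified index.

def guess_language(text):
    tokens = set(text.lower().split())
    if any("\u0600" <= ch <= "\u06FF" for ch in text):
        return "ar"                       # Arabic script detected
    if tokens & {"der", "und", "das"}:
        return "de"
    if tokens & {"le", "les", "des"}:
        return "fr"
    return "en"

def analyze(text, lang):
    # Stand-in for language-specific semantic analysis and normalization.
    return [f"{lang}:{tok.strip('.,').lower()}" for tok in text.split()]

def build_unified_index(docs):
    index = {}
    for doc_id, text in docs.items():
        for concept in analyze(text, guess_language(text)):
            index.setdefault(concept, set()).add(doc_id)
    return index

docs = {"d1": "Les piles au lithium", "d2": "Lithium battery research"}
print(build_unified_index(docs))
```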

Discussions with experts who have actually employed enterprise-specific semantic software underscore the need for subject expertise and some computational linguistics training, coupled with an aptitude for creative inquiry. These scientists informed me that individuals who are highly multidisciplinary and facile with electronic games and tools did the best job of interacting with the software and getting excellent results. Tuning and configuration over time by the right human players is still a fundamental requirement.

Semantically Focused and Building on a Successful Customer Base

Dr. Phil Hastings and Dr. David Milward spoke with me in June, 2010, as I was completing the Gilbane report, Semantic Software Technologies: A Landscape of High Value Applications for the Enterprise. My interest in a conversation was stimulated by several months of discussions with customers of numerous semantic software companies. Having heard perspectives from early adopters of Linguamatics’ I2E and other semantic software applications, I wanted to get some comments from two key officers of Linguamatics about what I heard from the field. Dr. Milward is a founder and CTO, and Dr. Hastings is the Director of Business Development.

A company with sustained profitability for nearly ten years in the enterprise semantic market space has credibility. Reactions from a maturing company to what users have to say are interesting and carry weight in any industry. My lines of inquiry and the commentary from the Linguamatics officers centered around their own view of the market and adoption experiences.

When asked about growth potential for the company outside of pharmaceuticals where Linguamatics already has high adoption and very enthusiastic users, Drs. Milward and Hastings asserted their ongoing principal focus in life sciences. They see a lot more potential in this market space, largely because of the vast amounts of unstructured content being generated, coupled with the very high-value problems that can be solved by text mining and semantically analyzing the data from those documents. Expanding their business further in the life sciences means that they will continue engaging in research projects with the academic community. It also means that Linguamatics semantic technology will be helping organizations solve problems related to healthcare and homeland security.

The wisdom of a measured and consistent approach comes through strongly when speaking with Linguamatics executives. They are highly focused and cite the pitfalls of trying to “do everything at once,” which would be the case if they were to pursue all markets overburdened with tons of unstructured content. While pharmaceutical terminology, a critical component of I2E, is complex and extensive, there are many aids to support it. The language of life sciences is in a constant state of being enriched through refinements to published thesauri and ontologies. However, in other industries with less technical language, Linguamatics can still provide important support for analyzing content to detect signals and patterns of importance to intelligence and planning.

Much of the remainder of the interview centered on what I refer to as the “team competencies” of individuals who identify the need for any semantic software application; those are the people who select, implement and maintain it. When asked if this presents a challenge for Linguamatics or the market in general, Milward and Hastings acknowledged a learning curve and the need for a larger pool of experts for adoption. This is a professional growth opportunity for informatics and library science people. These professionals are often the first group to identify Linguamatics as a potential solutions provider for semantically challenging problems, leading business stakeholders to the company. They are also good advocates for selling the concept to management and explaining the strong benefits of semantic technology when it is applied to elicit value from otherwise under-leveraged content.

One Linguamatics core operating principle came through clearly when talking about the personnel issues of using I2E: the necessity of working closely with their customers. This means making sure that expectations about system requirements are correct, giving examples of deployments and “what the footprint might look like,” and sharing best practices for implementations. They want to be sure that their customers have a sense of being in a community of adopters and are not alone in the use of this pioneering technology. Building and sustaining close customer relationships is very important to Linguamatics, and that means an emphasis on services co-equal with selling licenses.

Linguamatics has come a long way since 2001. Besides a steady effort to improve and enhance their technology through regular product releases of I2E, there have been a lot of “show me” and “prove it” moments to which they have responded. Now, as confidence in and understanding of the technology ramps up, they are getting more complex and sophisticated questions from their customers and prospects. This is the exciting part, as they are able to sell I2E’s ability to “synthesize new information from millions of sources in ways that humans cannot.” This is done by using the technology to keep track of and process the voluminous connections among information resources that exceed human mental limits.

At this stage of growth, with early successes and excellent customer adoption, it was encouraging to hear the enthusiasm of two executives for the evolution of the industry and their opportunities in it.

The Gilbane report and a deep dive on Linguamatics are available through this Press Release on their Web site.

Semantic Technology: Sharing a Large Market Space

It is always interesting to talk shop with the experts in a new technology arena. My interview with Luca Scagliarini, VP of Strategy and Business Development for Expert System, and Brooke Aker, CEO of Expert System USA, was no exception. They had been digesting my research on Semantic Software Technologies, and last week we had a discussion about what is in the Gilbane report.

When asked if they were surprised by anything in my coverage of the market, the simple answer was “not really, nothing we did not already know.” The longer answer related to the presentation of our research illustrating the scope and depth of the marketplace. These two veterans of the semantic industry admitted that the number of players, applications and breadth of semantic software categories is impressive when viewed in one report. Mr. Scagliarini commented on the huge amount of potential still to be explored by vendors and users.

Our conversation then focused on where we think the industry is headed, and they emphasized that this is still an early stage and evolving area. Both acknowledged the need for simplification of products to ease their adoption. It must be straightforward for buyers to understand what they are licensing and the value they can expect for the price they pay; implementation, packaging, and complementary services also need to be easy to understand.
Along the lines of simplicity, they emphasized the specialized nature of most of the successful semantic software applications, noting that these are not coming from the largest software companies. State-of-the-art tools are being commercialized and deployed for highly refined applications out of companies with a small footprint of experienced experts.

Expert System knows about the need for expertise in such areas as ontologies, search, and computational linguistic applications. For years they have been cultivating a team of people for their development and support operations. It has not always been easy to find these competencies, especially right out of academia. Aker and Scagliarini pointed out the need for a lot of pragmatism, coupled with subject expertise, to apply semantic tools for optimal business outcomes. It was hard in the early years for them to find people who could leverage their academic research experiences for a corporate mission.

Human resource barriers have eased in recent years as younger people who have grown up with a variety of computing technologies seem to grasp and understand the potential for semantic software tools more quickly.
Expert System itself is gaining traction in large enterprises that have segmented groups within IT that are dedicated to “learning” applications, and formalized ways of experimenting with, testing and evaluating new technologies. When they become experts in tool use, they are much better at proving value and making the right decisions about how and when to apply the software.

Having made good strides in energy, life sciences, manufacturing and homeland security vertical markets, Expert System is expanding its presence with the Cogito product line in other government agencies and publishing. The executives reminded me that they have semantic nets built out in Italian, Arabic and German, as well as English. This is unique among the community of semantic search companies and will position them for some interesting opportunities where other companies cannot perform.

I enjoyed listening and exchanging commentary about the semantic software technology field. However, Expert System and Gilbane both know that the semantic space is complex and they are sharing a varied landscape with a lot of companies competing for a strong position in a young industry. They have a significant share already.
For more about Expert System and the release of this sponsored research you can view their recent Press Release.

Data Mining for Energy Independence

Mining content for facts and information relationships is a focal point of many semantic technologies. Among the text analytics tools are those for mining content in order to process it for further analysis and understanding, and indexing for semantic search. This will move enterprise search to a new level of research possibilities.

Research for a forthcoming Gilbane report on semantic software technologies turned up numerous applications used in the life sciences and publishing. Neither semantic technologies nor text mining are mentioned in the recent New York Times article Rare Sharing of Data Leads to Progress on Alzheimer’s, but I am pretty certain that these technologies had some role in enabling scientists to discover new data relationships and synthesize new ideas about Alzheimer’s biomarkers. The sheer volume of data from all the referenced data sources demands computational methods to distill and analyze.

One vertical industry poised for potential growth of semantic technologies is the energy field. It is a special interest of mine because it is a topical area in which I worked as a subject indexer and searcher early in my career. Beginning with the first energy crisis, the oil embargo of the mid-1970s, I worked in research organizations involved in both fossil fuel exploration and production and alternative energy development.

A hallmark of technical exploratory and discovery work is the time gaps between breakthroughs; there are often significant plateaus between major developments. This happens when research reaches a point at which an enabling technology is not available or commercially viable to move to the next milestone of development. I observed that the starting point in the quest for innovative energy technologies often began with decades-old research that stopped before commercialization.

Building on what we have already discovered, invented or learned is one key to success for many “new” breakthroughs. Looking at old research from a new perspective to lower costs or improve efficiency for such things as photovoltaic materials or electrochemical cells (batteries) is what excellent companies do.
How does this relate to semantic software technologies and data mining? We need to begin with content that was generated by research in the last century; much of this is only now being made electronic. Even so, most of the conversion from paper, or from microforms like fiche, is to image formats. To make the full transition and enable data mining, content must be further enhanced through optical character recognition (OCR). This will put it into a form that can be semantically parsed, analyzed, and explored for facts and new relationships among data elements.
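
As a minimal sketch of that conversion step, the following assumes the scanned pages are TIFF images and that the open source Tesseract OCR engine (via the pytesseract and Pillow packages) is available; the directory names are illustrative only.

```python
# Minimal sketch: OCR scanned legacy report pages into plain text files that
# a semantic parsing pass could later analyze. Requires the Tesseract binary.

from pathlib import Path

from PIL import Image     # pip install pillow
import pytesseract        # pip install pytesseract

def ocr_directory(scan_dir, out_dir):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for page in sorted(Path(scan_dir).glob("*.tif")):
        text = pytesseract.image_to_string(Image.open(page))
        # Plain text is only the first step; semantic parsing and entity
        # extraction would run over these files in a later pass.
        (out / f"{page.stem}.txt").write_text(text, encoding="utf-8")

if __name__ == "__main__":
    ocr_directory("scans/legacy_energy_reports", "text/legacy_energy_reports")
```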

Processing of old materials is neither easy nor inexpensive. There are government agencies, consortia, associations, and partnerships of various types of institutions that often serve as a springboard for making legacy knowledge assets electronically available. A great first step would be for DOE and some energy industry leaders to collaborate on this activity.

A future of potential man-made disasters, even when knowledge exists to prevent them, is not a foregone conclusion. Intellectually, we know that energy independence is prudent, economically and socially mandatory for all types of stability. We have decades of information and knowledge assets in energy related fields (e.g. chemistry, materials science, geology, and engineering) that semantic technologies can leverage to move us toward a future of energy independence. Finding nuggets of old information in unexpected relationships to content from previously disconnected sources is a role for semantic search that can stimulate new ideas and technical research.

A beginning would be a serious program of content conversion, capped off with the use of semantic search tools to aid the process of discovery and development. It is high time to put our knowledge to work with state-of-the-art semantic software tools and by committing human and collaborative resources to the effort. Coupling our knowledge assets of the past with the ingenuity of the present, we can achieve energy advances using semantic technologies already embraced by the life sciences.

Leveraging Language in Enterprise Search Deployments

It is not news that enterprise search has been relegated to the long list of failed technologies by some. We are at the point where many analysts and business writers have called for a moratorium on the use of the term. Having worked in a number of markets and functional areas (knowledge management/KM, special libraries, and integrated library software systems) that suffered the death knell, even while continuing to exist, I take these pronouncements as a game of sorts.

Yes, we have seen the demise of vinyl phonograph records, cassette tapes and probably soon musical CD albums, but those are explicit devices and formats. When you can’t buy or play them any longer, except in a museum or collector’s garage, they are pretty dead in the marketplace. This is not true of search in the enterprise, behind the firewall, or wherever it needs to function for business purposes. People have always needed to find “stuff” to do their work. KM methods and processes, special libraries and integrated library systems still exist, even as they were re-labeled for PR and marketing purposes.

What is happening to search in the enterprise is that it is finding its purpose, or more precisely its hundreds of purposes. It is not a monolithic software product, a one-size-fits-all. It comes in dozens of packages, models, and price ranges. It may be embedded in other software or standalone. It may be procured for a point solution to support retrieval of content for one business unit operating in a very narrow topical range, or it may be selected to give access to a broad range of documents that exist in numerous enterprise domains on many subjects.

Large enterprises typically have numerous search solutions in operation, implementation, and testing, all at the same time. They are discovering how to deploy and leverage search systems and they are refining their use cases based on what they learn incrementally through their many implementations. Teams of search experts are typically involved in selecting, deploying and maintaining these applications based on their subject expertise and growing understanding of what various search engines can do and how they operate.

After years of hearing about “the semantic Web,” the long sought after “holy grail” of Web search, there is a serious ramping up of technology solutions. Most of these applications can also make search more semantically relevant behind the firewall. These technologies have been evolving for decades, beginning with so-called artificial intelligence and now supported by categories of computational linguistics such as specific algorithms for parsing content and disambiguating terms. A soon-to-be-released study featuring some noteworthy applications reveals just how much is being done in enterprises for specific business purposes.

With this “teaser” on what is about to be published, I leave you with one important thought: meaningful search technologies depend on rich, linguistically based technologies. Without a cornucopia of software tools to build terminology maps and dictionaries, analyze content linguistically in context to elicit meaning, parse and evaluate unstructured text data sources, and manage vocabularies of ever more complex topical domains, semantic search could not exist.

Language complexities are challenging and even vexing. Enterprises will be finding solutions to leverage what they know only when they put human resources into play to work with the lingo of their most valuable domains.

Layering Technologies to Support the Enterprise with Semantic Search

Semantic search is a composite beast like many enterprise software applications. Most packages are made up of multiple technology components and often from multiple vendors. This raises some interesting thoughts as we prepare for Gilbane Boston 2009 to be held this week.

As part of a panel on semantic search, moderated by Hadley Reynolds of IDC, with Jeff Fried of Microsoft and Chris Lamb of the OpenCalais Initiative at Thomson Reuters, I wanted to give a high level view of semantic technologies currently in the marketplace. I contacted about a dozen vendors and selected six to highlight for the variety of semantic search offerings and business models.

One case study involves three vendors, each with a piece of the ultimate, customer-facing, product. My research took me to one company that I had reviewed a couple of years ago, and they sent me to their “customer” and to the customer’s customer. It took me a couple of conversations and emails to sort out the connections; in the end the relationships made perfect sense.

On one hand we have conglomerate software companies offering “solutions” to every imaginable enterprise business need. On the other, we see very unique, specialized point solutions to universal business problems with multiple dimensions and twists. Teaming by vendors, each with a solution to one dimension of a need, creates compound product offerings that add up to a very large semantic search marketplace.

Consider an example of data gathering by a professional services firm. Let’s assume that my company has tens of thousands of documents collected in the course of research for many clients over many years. Researchers may move on to greater responsibility or other firms, leaving content unorganized except around confidential work for individual clients. We now want to exploit this corpus of content to create new products or services for various vertical markets. To understand what we have, we need to mine the content for themes and concepts.

The product of the mining exercise may have multiple uses: helping us create a taxonomy of controlled terms, preparing a navigation scheme for a content portal, or providing a feed to business or text analytics tools that will help us create visual objects reflecting various configurations of content. A text mining vendor may be great at the mining aspect while other firms have better tools for analyzing, organizing, and re-shaping the output.
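
For a rough sense of what the mining step produces, here is a sketch using TF-IDF and non-negative matrix factorization to surface candidate themes from a tiny invented corpus. This is one generic, open source approach (scikit-learn 1.0+ assumed), not the method of any vendor mentioned in this post.

```python
# Sketch: pull candidate themes out of a small document collection so a
# taxonomy team has starting terms to review. Sample documents are invented.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

documents = [
    "battery electrolyte research for grid energy storage",
    "clinical trial data mining for oncology biomarkers",
    "enterprise search deployment and metadata governance",
    "solid state battery anode materials and energy density",
    "biomarker discovery from published oncology literature",
    "taxonomy design and metadata for enterprise content portals",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(documents)

nmf = NMF(n_components=3, random_state=0)
nmf.fit(tfidf)

terms = vectorizer.get_feature_names_out()
for i, component in enumerate(nmf.components_):
    top_terms = [terms[j] for j in component.argsort()[-4:][::-1]]
    print(f"candidate theme {i}: {', '.join(top_terms)}")  # seeds for a taxonomy
```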

Doing business with two or three vendors, experts in their own niches, may help us reach a conclusion about what to do with our information-rich pile of documents much faster. A multi-faceted approach can be a good way to bring a product or service to market more quickly than if we struggle with generic products from just one company.

When partners each have something of value to contribute, together they offer the benefits of the best of all options. This results in a new problem for businesses looking for the best in each area, namely, vendor relationship management. But it also saves organizations from dealing with huge firms offering many acquired products that have to be managed through a single point of contact, a generalist in everything and a specialist in nothing. Either way, you have to manage the players and how the components are going to work for you.

I really like what I see, semantic technology companies partnering with each other to give good-to-great solutions for all kinds of innovative applications. By the way, at the conference I am doing a quick snapshot on each: Cogito, Connotate (with Cormine and WorldTech), Lexalytics, Linguamatics, Sinequa and TEMIS.
