Year: 2010 (page 1 of 2)

Focused on Unifying Content to Reduce Information Overload

A theme running through the sessions I attended at Enterprise Search Summit and KMWorld 2010 in Washington, DC last month was the diversity of ways in which organizations are focused on getting answers to stakeholders more quickly. Enterprises deploying content technologies, all with enterprise search as the end game, seek to narrow search results accurately to retrieve and display the best and most relevant content.

Whether the process is referred to as unified indexing, federating content or information integration, each constitutes a similar focus among the vendors I took time to engage with at the conference. Each is positioned to solve different information retrieval problems, and were selected to underscore what I have tried to express in my recent Gilbane Beacon, Establishing a Successful Enterprise Search Program: Five Best Practices, namely the need to first establish a strategic business need. The best practices include the need for understanding how existing technologies and content structures function is the enterprise before settling on any one product or strategy. The essential activity of conducting a proof of concept (POC) or pilot project to confirm product suitability for the targeted business challenge is clearly mandated.

These products, in alphabetic order, are all notable for their unique solutions tailored to different audiences of users and business requirements. All embody an approach to unifying enterprise content for a particular business function:

Access Innovations (AI) was at KMWorld to demonstrate the aptly named product suite, Data Harmony. AI products cover a continuum of tools to build and maintain controlled vocabularies (AKA taxonomies and thesauri), add content metadata through processes tightly integrated with the corresponding vocabularies, search and navigation. Its vocabulary and content management tools can be layered to integrate with existing CMS and enterprise search systems.

Attivio, a company providing a platform solution known as Active Intelligence Engine (AIE), has developers specializing in open source tools for content retrieval solutions with excellent retrieval as the end point. AIE is a platform for enterprises seeking to unify structured and unstructured content across the enterprise, and from the web. By leveraging open source components they provide their customers with a platform that can be developed to enhance search for a particular solution, including bringing Web 2.0 social content into unity with enterprise content for further business intelligence analysis.

Coveo has steadily marched into a dominant position across all vertical industries with its efficiently packaged and reasonably priced enterprise search solutions, since I was first introduced to them in 2007. Their customers are always enthusiastic presenters at KMWorld, representing a population of implementers who seek to make enterprise search available to users quickly, and with a minimum of fuss. This year, Shelley Norton from Children’s Hospital Boston did not disappoint. She ticked off steps in an efficient selection, implementation and deployment process for getting enterprise search up and running smoothly to deliver trustworthy and accurate results to the hospital’s constituents. I always value and respect customer story-telling.

Darwin Awareness Engine was named the KMWorld Promise Award Winner for 2010. Since their founder is local to our home-base and a frequent participant in the Boston KM Forum (KMF) meetings, we are pretty happy for their official arrival on the scene and the recognition. It was just a year ago that they presented the prototype at the KMF. Our members were excited to see the tool exposing layers of news feeds to hone in on topics of interest to see what was aggregated and connected in really “real-time.” Darwin content presentation is unique in that the display reveals relationships and patterns among topics in the Web 2.0 sphere that are suddenly apparent due to their visual connections in the display architecture. The public views are only an example of what a very large enterprise might reveal about its own internal communications through social tools within the organization.

The newest newcomer, RAMP, was introduced to me by Nate Treloar in the closing hours of KMWorld. Nate came to this start-up from Microsoft and the FAST group and is excited about this new venture. Neither exhibiting, nor presenting, Nate was anxious to reach out to analysts and potential partners to share the RAMP vision for converting speech from audio and video feeds to reliable searchable text. This would enable the unification of audio, video and other content to finally be searched from its “full text” on the Web in a single pass. Now, we depend on the contribution of explicit metadata by contributors of non-text content. Long awaiting excellence in speech to indexing for search, I was “all ears” during our conversation and look forward to seeing more of RAMP at future meetings.

Whatever the strategic business need, the ability to deliver a view of information that is unified, cohesive and contextually understandable will be a winning outcome. With the Beacon as a checklist for your decision process, information integration is attainable by making the right software selection for your enterprise application.

Coherence and Augmentation: KM-Search Connection

This space is not normally used to comment on knowledge management (KM), one of my areas of consulting, but a recent conference gives me an opening to connect the dots between KM and search. Dave Snowden and Tom Stewart always have worthy commentary on KM and as keynote speakers they did not disappoint at KMWorld. It may seem a stretch but by taking a few of their thoughts out of context, I can synthesize a relationship between KM and search.

KMWorld, Enterprise Search Summit, SharePoint Symposium and Taxonomy Boot Camp moved to Washington D.C. for the 2010 Fall Conference earlier this month. I attended to teach a workshop on building a semantic platform, and to participate in a panel discussion to wrap up the conference with two other analysts, Leslie Owen and Tony Byrne with Jane Dysart moderating.

Comments from the first and last keynote speakers of the conference inspired my final panel comments, counseling attendees to lead by thoughtfully leveraging technology only to enhance knowledge. But there were other snippets that prompt me to link search and KM.

Tom Stewart’s talk was entitled, Knowledge Driven Enterprises: Strategies & Future Focus, which he couched in the context of achieving a “coherent” winning organization. He explained that to reach the coherence destination requires understanding of different types of knowledge and how we need to behave for attaining each type (e.g. “knowable complicated “knowledge calls for experts and research; “emergent complex” knowledge calls for leadership and “sense-making.”).

Stewart describes successful organizations as those in which “the opportunities outside line up with the capabilities inside.” He explains that those “companies who do manage to reestablish focus around an aligned set of key capabilities” use their “intellectual capital” to identify their intangible assets,” human capability, structural capital, and customer capital. They build relationship capital from among these capabilities to create a coherent company. Although Stewart does not mention “search,” it is important to note that one means to identify intangible assets is well-executed enterprise search with associated analytical tools.

Dave Snowden also referenced “coherence,” (messy coherence), even as he spoke about how failures tend to be more teachable (memorable) than successes. If you follow Snowden, you know that he founded the Cognitive Edge and has developed a model for applying cognitive learning to help build resilient organizations. He has taught complexity analysis and sense-making for many years and his interest in human learning behaviors is deep.

To follow the entire thread of Snowden’s presentation on the “The Resilient Organization” follow this link. I was particularly impressed with his statement about the talk, “one of the most heart-felt I have given in recent years.” It was one of his best but two particular comments bring me to the connection between KM and search.

Dave talked about technology as “cognitive augmentation,” its only truly useful function. He also puts forth what he calls the “three Golden rules: Use of distributed cognition, wisdom but not foolishness of crowds; finely grained objects, information and organizational; and disintermediation, putting decision makers in direct contact with raw data.”

Taking these fragments of Snowden’s talk, a technique he seems to encourage, I put forth a synthesized view of how knowledge and search technologies need to be married for consequential gain.

We live and work in a highly chaotic information soup, one in which we are fed a steady diet of fragments (links, tweets, analyzed content) from which we are challenged as thinkers to derive coherence. The best knowledge practitioners will leverage this messiness by detecting weak signals and seek out more fragments, coupling them thoughtfully with “raw data” to synthesize new innovations, whether they be practices, inventions or policies. Managing shifting technologies, changing information inputs, and learning from failures (our own, our institution’s and others) contributes to building a resilient organization.

So where does “search” come in? Search is a human operation and begins with the workforce. Going back to Stewart who commented on the need to recognize different kinds of knowledge, I posit that different kinds of knowledge demand different kinds of search. This is precisely what so many “enterprise search” initiatives fail to deliver. Implementers fail to account for all the different kinds of search, search for facts, search for expertise, search for specific artifacts, search for trends, search for missing data, etc.

When Dave Snowden states that “all of your workforce is a human scanner,” this could also imply the need for multiple, co-occurring search initiatives. Just as each workforce member brings a different perspective and capability to sensory information gathering, so too must enterprise search be set up to accommodate all the different kinds of knowledge gathering. And when Snowden notes that “There are limits to semantic technologies: Language is constantly changing so there is a requirement for constant tuning to sustain the same level of good results,” he is reminding us that technology is only good for cognitive augmentation. Technology is not a “plug ‘n play,” install and reap magical cognitive insights. It requires constant tuning to adapt to new kinds of knowledge.

The point is one I have made before; it is the human connection, human scanner and human understanding of all the kinds of knowledge we need in order to bring coherence to an organization. The better we balance these human capabilities, the more resilient we’ll be and the better skilled at figuring out what kinds of search technologies really make sense for today, and tomorrow we had better be ready for another tool for new fragments and new knowledge synthesis.

Understanding the Smart Content Technology Landscape

If you have been following recent XML Technologies blog entries, you will notice we have been talking a lot lately about XML Smart Content, what it is and the benefits it can bring to an organization. These include flexible, dynamic assembly for delivery to different audiences, search optimization to improve customer experience, and improvements for distributed collaboration. Great targets to aim for, but you may ask are we ready to pursue these opportunities? It might help to better understand the technology landscape involved in creating and delivering smart content.

The figure below illustrates the technology landscape for smart content. At the center are fundamental XML technologies for creating modular content, managing it as discrete chunks (with or without a formal content management system), and publishing it in an organized fashion. These are the basic technologies for “one source, one output” applications, sometimes referred to as Singe Source Publishing (SSP) systems.

XML and Smart Content Landscape

The innermost ring contains capabilities that are needed even when using a dedicated word processor or layout tool, including editing, rendering, and some limited content storage capabilities. In the middle ring are the technologies that enable single-sourcing content components for reuse in multiple outputs. They include a more robust content management environment, often with workflow management tools, as well as multi-channel formatting and delivery capabilities and structured editing tools. The outermost ring includes the technologies for smart content applications, which are described below in more detail.

It is good to note that smart content solutions rely on structured editing, component management, and multi-channel delivery as foundational capabilities, augmented with content enrichment, topic component assembly, and social publishing capabilities across a distributed network. Descriptions of the additional capabilities needed for smart content applications follow.

Content Enrichment / Metadata Management: Once a descriptive metadata taxonomy is created or adopted, its use for content enrichment will depend on tools for analyzing and/or applying the metadata. These can be manual dialogs, automated scripts and crawlers, or a combination of approaches. Automated scripts can be created to interrogate the content to determine what it is about and to extract key information for use as metadata. Automated tools are efficient and scalable, but generally do not apply metadata with the same accuracy as manual processes. Manual processes, while ensuring better enrichment, are labor intensive and not scalable for large volumes of content. A combination of manual and automated processes and tools is the most likely approach in a smart content environment. Taxonomies may be extensible over time and can require administrative tools for editorial control and term management.

Component Discovery / Assembly: Once data has been enriched, tools for searching and selecting content based on the enrichment criteria will enable more precise discovery and access. Search mechanisms can use metadata to improve search results compared to full text searching. Information architects and organizers of content can use smart searching to discover what content exists, and what still needs to be developed to proactively manage and curate the content. These same discovery and searching capabilities can be used to automatically create delivery maps and dynamically assemble content organized using them.

Distributed Collaboration / Social Publishing: Componentized information lends itself to a more granular update and maintenance process, enabling several users to simultaneously access topics that may appear in a single deliverable form to reduce schedules. Subject matter experts, both remote and local, may be included in review and content creation processes at key steps. Users of the information may want to “self-organize” the content of greatest interest to them, and even augment or comment upon specific topics. A distributed social publishing capability will enable a broader range of contributors to participate in the creation, review and updating of content in new ways.

Federated Content Management / Access: Smart content solutions can integrate content without duplicating it in multiple places, rather accessing it across the network in the original storage repository. This federated content approach requires the repositories to have integration capabilities to access content stored in other systems, platforms, and environments. A federated system architecture will rely on interoperability standards (such as CMIS), system agnostic expressions of data models (such as XML Schemas), and a robust network infrastructure (such as the Internet).

These capabilities address a broader range of business activity and, therefore, fulfill more business requirements than single-source content solutions. Assessing your ability to implement these capabilities is essential in evaluating your organizations readiness for a smart content solution.

Lucene Open Source Community Commits to a Future in Search

It has been nearly two years since I commented on an article in Information Week, Open Source, Its Time has Come, Nov. 2008. My main point was the need for deep expertise to execute enterprise search really well. I predicted the growth of service companies with that expertise, particularly for open source search. Not long after I announced that, Lucid Imagination was launched, with its focus on building and supporting solutions based on Lucene and, its more turnkey version, Solr.

It has not taken long for Lucid Imagination (LI) to take charge of the Lucene/Solr community of practice (CoP), and to launch its own platform built on Solr, Lucidworks Enterprise. Open source depends on deep and sustained collaboration; LI stepped into the breach to ensure that the hundreds of contributors, users and committers have a forum. I am pretty committed to CoPs myself and know that nurturing a community for the long haul takes dedicated leadership. In this case it is undoubtedly enlightened self-interest that is driving LI. They are poised to become the strongest presence for driving continuous improvements to open source search, with Apache Lucene as the foundation.

Two weeks ago LI hosted Lucene Revolution, the first such conference in the US. It was attended by over 300 in Boston, October 7-8 and I can report that this CoP is vibrant, enthusiastic. Moderated by Steve Arnold, the program ran smoothly and with excellent sessions. Those I attended reflected a respectful exchange of opinions and ideas about tools, methods, practices and priorities. While there were allusions to vigorous debate among committers about priorities for code changes and upgrades, the mood was collaborative in spirit and tinged with humor, always a good way to operate when emotions and convictions are on stage.

From my 12 pages of notes come observations about the three principal categories of sessions:

  1. Discussions, debates and show-cases for significant changes or calls for changes to the code
  2. Case studies based on enterprise search applications and experiences
  3. Case studies based on the use of Lucene and Solr embedded in commercial applications

Since the first category was more technical in nature, I leave the reader with my simplistic conclusions: core Apache Lucene and Solr will continue to evolve in a robust and aggressive progression. There are sufficient committers to make a serious contribution. Many who have decades of search experience are driving the charge and they have cut their teeth on the more difficult problems of implementing enterprise solutions. In announcing Lucidworks Enterprise, LI is clearly bidding to become a new force in the enterprise search market.

New and sustained build-outs of Lucene/Solr will be challenged by developers with ideas for diverging architectures, or “forking” code, on which Eric Gries, LI CEO, commented in the final panel. He predicted that forking will probably be driven by the need to solve specific search problems that current code does not accommodate. This will probably be more of a challenge for the spinoffs than the core Lucene developers, and the difficulty of sustaining separate versions will ultimately fail.

Enterprise search cases reflected those for whom commercial turnkey applications will not or cannot easily be selected; for them open source will make sense. Coming from LI’s counterpart in the Linux world, Red Hat, are these earlier observations about why enterprises should seek to embrace open source solutions, in short the sorry state of quality assurance and code control in commercial products. Add to that the cost of services to install, implement and customize commercial search products. The argument would be to go with open source for many institutions when there is an imperative or call for major customization.

This appears to be the case for two types of enterprises that were featured on the program: educational institutions and government agencies. Both have procurement issues when it comes to making large capital expenditures. For them it is easier to begin with something free, like open source software, then make incremental improvements and customize over time. Labor and services are cost variables that can be distributed more creatively using multiple funding options. Featured on the program were the Smithsonian, Adhere Solutions doing systems integration work for a number of government agencies, MITRE (a federally funded research laboratory), U. of Michigan, and Yale. CISCO also presented, a noteworthy commercial enterprise putting Lucene/Solr to work.

The third category of presenters was, by far, the largest contingent of open source search adopters, producers of applications that leverage Lucene and Solr (and other open source software) into their offerings. They are solidly entrenched because they are diligent committers, and share in this community of like-minded practitioners who serve as an extended enterprise of technical resources that keeps their overhead low. I can imagine the attractiveness of a lean business that can run with an open source foundation, and operates in a highly agile mode. This must be enticing and exciting for developers who wilt at the idea of working in a constrained environment with layers of management and political maneuvering.

Among the companies building applications on Lucene that presented were: Access Innovations, Twitter, LinkedIn, Acquia, RivetLogic and These stand out as relatively mature adopters with traction in the marketplace. There were also companies present that contribute their value through Lucene/Solr partnerships in which their products or tools are complementary including: Basis Technology, Documill, and Loggly.

Links to presentations by organizations mentioned above will take you to conference highlights. Some will appeal to the technical reader for there was a lot of code sharing and technical tips in the slides. The diversity and scale of applications that are being supported by Lucene and Solr was impressive. Lucid Imagination and the speakers did a great job of illustrating why and how open source has a serious future in enterprise search. This was a confidence building exercise for the community.

Two sentiments at the end summed it up for me. On the technical front Eric Gries observed that it is usually clear what needs to be core (to the code) and what does not belong. Then there is a lot of gray area, and that will contribute to constant debate in the community. For the user community, Charlie Hull, of flax opined that customers don’t care whether (the code) is in the open source core or in the special “secret sauce” application, as long as the product does what they want.

What an Analyst Needs to Do What We Do

Semantic Software Technologies: Landscape of High Value Applications for the Enterprise is now posted for you to download for free; please do so. The topic is one I’ve followed for many years and was convinced that the information about it needed to be captured in a single study as the number of players and technologies had expanded beyond my capacity for mental organization.

As a librarian, it was useful to employ a genre of publications known as “bibliography of bibliographies” on any given topic when starting a research project. As an analyst, gathering the baskets of emails, reports, and publications on the industry I follow, serves a similar purpose. Without a filtering and sifting of all this content, it had become overwhelming to understand and comment on the individual components in the semantic landscape.

Relating to the process of report development, it is important for readers to understand how analysts do research and review products and companies. Our first goal is to avoid bias toward one vendor or another. Finding users of products and understanding the basis for their use and experiences is paramount in the research and discovery process. With software as complex as semantic applications, we do not have the luxury of routine hands-on experience, testing real applications of dozens of products for comparison.

The most desirable contacts for learning about any product are customers with direct experience using the application. Sometimes we gain access to customers through vendor introductions but we also try very hard to get users to speak to us through surveys and interviews, often anonymously so that they do not jeopardize their relationship with a vendor. We want these discussions to be frank.

To get a complete picture of any product, I go through numerous iterations of looking at a company through its own printed and online information, published independent reviews and analysis, customer comments and direct interviews with employees, users, former users, etc. Finally, I like to share what I have learned with vendors themselves to validate conclusions and give them an opportunity to correct facts or clarify product usage and market positioning.

One of the most rewarding, interesting and productive aspects of research in a relatively young industry like semantic technologies is having direct access to innovators and seminal thinkers. Communicating with pioneers of new software who are seeking the best way to package, deploy and commercialize their offerings is exciting. There are many more potential products than those that actually find commercial success, but the process for getting from idea to buyer adoption is always a story worth hearing and from which to learn.

I receive direct and indirect comments from readers about this blog. What I don’t see enough of is posted commentary about the content. Perhaps you don’t want to share your thoughts publicly but any experiences or ideas that you want to share with me are welcomed. You’ll find my direct email contact information through and you can reach me on Twitter at lwmtech. My research depends on getting input from all types of users and developers of content software applications, so, please raise your hand and comment or volunteer to talk.

Repurposing Content vs. Creating Multipurpose Content

In our recently completed research on Smart Content in the Enterprise we explored how organizations are taking advantage of benefits from XML throughout the enterprise and not just in the documentation department. Our findings include several key issues that leading edge XML implementers are addressing including new delivery requirements, new ways of creating and managing content, and the use of standards to create rich, interoperable content. In our case studies we examined how some are breaking out of the documentation department silo and enabling others inside or even outside the organization to contribute and collaborate on content. Some are even using crowd sourcing and social publishing to allow consumers of the information to annotate it and participate in its development. We found that expectations for content creation and management have changed significantly and we need to think about how we organize and manage our data to support these new requirements. One key finding of the research is that organizations are taking a different approach to repurposing their content, a more proactive approach that might better be called “multipurposing”.

In the XML world we have been talking about repurposing content for decades. Repurposing content usually means content that is created for one type of use is reorganized, converted, transformed, etc. for another use. Many organizations have successfully deployed XML systems that optimize delivery in multiple formats using what is often referred to as a Single Source Publishing (SSP) process where a single source of content is created and transformed into all desired deliverable formats (e.g., HTML, PDF, etc.).

Traditional delivery of content in the form of documents, whether in HTML or PDF, can be very limiting to users who want to search across multiple documents, reorganize document content into a form that is useful to the particular task at hand, or share portions with collaborators. As the functionality on Web sites and mobile devices becomes more sophisticated, new ways of delivering content are needed to take advantage of these capabilities. Dynamic assembly of content into custom views can be optimized with delivery of content components instead of whole documents. Powerful search features can be enhanced with metadata and other forms of content enrichment.

SSP and repurposing content traditionally focuses on the content creation, authoring, management and workflow steps up to delivery. In order for organizations to keep up with the potential of delivery systems and the emerging expectations of users, it behooves us to take a broader view of requirements for content systems and the underlying data model. Developers need to expand the scope of activities they evaluate and plan for when designing the system and the underlying data model. They should consider what metadata might improve faceted searching or dynamic assembly. In doing so they can identify the multiple purposes the content is destined for throughout the ecosystem in which it is created, managed and consumed.

Multipurpose content is designed with additional functionality in mind including faceted search, distributed collaboration and annotation, localization and translation, indexing, and even provisioning and other supply chain transactions. In short, multipurposing content focuses on the bigger picture to meet a broader set of business drivers throughout the enterprise, and even beyond to the needs of the information consumers.

It is easy to get carried away with data modeling and an overly complex data model usually requires more development, maintenance, and training than would otherwise be required to meet a set of business needs. You definitely want to avoid using specific processing terminology when naming elements (e.g., specific formatting, element names that describe processing actions instead of defining the role of the content). You can still create data models that address the broader range of activities without using specific commands or actions. Knowing a chunk of text is a “definition” instead of an “error message” is useful and far more easy to reinterpret for other uses than an “h2” element name or an attribute for display=’yes’. Breaking chapters into individual topics eases custom, dynamic assembly. Adding keywords and other enrichment can improve search results and the active management of the content. In short, multipurpose data models can and should be comprehensive and remain device agnostic to meet enterprise requirements for the content.

The difference between repurposing content and multipurpose content is a matter of degree and scope, and requires generic, agnostic components and element names. But most of all, multipurposing requires understanding the requirements of all processes in the desired enterprise environment up front when designing a system to make sure the model is sufficient to deliver designed outcomes and capabilities. Otherwise repurposing content will continue to be done as an afterthought process and possibly limit the usefulness of the content for some applications.

Leveraging Two Decades of Computational Linguistics for Semantic Search

Over the past three months I have had the pleasure of speaking with Kathleen Dahlgren, founder of Cognition, several times. I first learned about Cognition at the Boston Infonortics Search Engines meeting in 2009. That introduction led me to a closer look several months later when researching auto-categorization software. I was impressed with the comprehensive English language semantic net they had doggedly built over a 20+ year period.
A semantic net is a map of language that explicitly defines the many relationships among words and phrases. It might be very simple to illustrate something as fundamental as a small geographical locale and all named entities within it, or as complex as the entire base language of English with every concept mapped to illustrate all the ways that any one term is related to other terms, as illustrated in this tiny subset. Dr. Dahlgren and her team are among the few companies that have created a comprehensive semantic net for English.

In 2003, Dr. Dahlgren established Cognition as a software company to commercialize its semantic net, designing software to apply it to semantic search applications. As the Gilbane Group launched its new research on Semantic Software Technologies, Cognition signed on as a study co-sponsor and we engaged in several discussions with them that rounded out their history in this new marketplace. It was illustrative of pioneering in any new software domain.

Early adopters are key contributors to any software development. It is notable that Cognition has attracted experts in fields as diverse as medical research, legal e-discovery and Web semantic search. This gives the company valuable feedback for their commercial development. In any highly technical discipline, it is challenging and exciting to finding subject experts knowledgeable enough to contribute to product evolution and Cognition is learning from client experts where the best opportunities for growth lie.

Recent interviews with Cognition executives, and those of other sponsors, gave me the opportunity to get their reactions to my conclusions about this industry. These were the more interesting thoughts that came from Cognition after they had reviewed the Gilbane report:

  • Feedback from current clients and attendees at 2010 conferences, where Dr. Dahlgren was a featured speaker, confirms escalating awareness of the field; she feels that “This is the year of Semantics.” It is catching the imagination of IT folks who understand the diverse and important business problems to which semantic technology can be applied.
  • In addition to a significant upswing in semantics applied in life sciences, publishing, law and energy, Cognition sees specific opportunities for growth in risk assessment and risk management. Using semantics to detect signals, content salience, and measures of relevance are critical where the quantity of data and textual content is too voluminous for human filtering. There is not much evidence that financial services, banking and insurance are embracing semantic technologies yet, but it could dramatically improve their business intelligence and Cognition is well positioned to give support to leverage their already tested tools.
  • Enterprise semantic search will begin to overcome the poor reputation that traditional “string search” has suffered. There is growing recognition among IT professionals that in the enterprise 80% of the queries are unique; these cannot be interpreted based on popularity or social commentary. Determining relevance or accuracy of retrieved results depends on the types of software algorithms that apply computational linguistics, not pattern matching or statistical models.

In Dr. Dahlgren’s view, there is no question that a team approach to deploying semantic enterprise search is required. This means that IT professionals will work side-by-side with subject matter experts, search experts and vocabulary specialists to gain the best advantage from semantic search engines.

The unique language aspects of an enterprise content domain are as important as the software a company employs. The Cognition baseline semantic net, out-of-the-box, will always give reliable and better results than traditional string search engines. However, it gives top performance when enhanced with enterprise language, embedding all the ways that subject experts talk about their topical domain, jargon, acronyms, code phrases, etc.

With elements of its software already embedded in some notable commercial applications like Bing, Cognition is positioned for delivering excellent semantic search for an enterprise. They are taking on opportunities in areas like risk management that have been slow to adopt semantic tools. They will deliver software to these customers together with services and expertise to coach their clients through the implementation, deployment and maintenance essential to successful use. The enthusiasm expressed to me by Kathleen Dahlgren about semantics confirms what I also heard from Cognition clients. They are confident that the technology coupled with thoughtful guidance from their support services will be the true value-added for any enterprise semantic search application using Cognition.

The free download of the Gilbane study and deep-dive on Cognition was announced on their Web site at this page.

Semantically Focused and Building on a Successful Customer Base

Dr. Phil Hastings and Dr. David Milward spoke with me in June, 2010, as I was completing the Gilbane report, Semantic Software Technologies: A Landscape of High Value Applications for the Enterprise. My interest in a conversation was stimulated by several months of discussions with customers of numerous semantic software companies. Having heard perspectives from early adopters of Linguamatics’ I2E and other semantic software applications, I wanted to get some comments from two key officers of Linguamatics about what I heard from the field. Dr. Milward is a founder and CTO, and Dr. Hastings is the Director of Business Development.

A company with sustained profitability for nearly ten years in the enterprise semantic market space has credibility. Reactions from a maturing company to what users have to say are interesting and carry weight in any industry. My lines of inquiry and the commentary from the Linguamatics officers centered around their own view of the market and adoption experiences.

When asked about growth potential for the company outside of pharmaceuticals where Linguamatics already has high adoption and very enthusiastic users, Drs. Milward and Hastings asserted their ongoing principal focus in life sciences. They see a lot more potential in this market space, largely because of the vast amounts of unstructured content being generated, coupled with the very high-value problems that can be solved by text mining and semantically analyzing the data from those documents. Expanding their business further in the life sciences means that they will continue engaging in research projects with the academic community. It also means that Linguamatics semantic technology will be helping organizations solve problems related to healthcare and homeland security.

The wisdom of a measured and consistent approach comes through strongly when speaking with Linguamatics executives. They are highly focused and cite the pitfalls of trying to “do everything at once,” which would be the case if they were to pursue all markets overburdened with tons of unstructured content. While pharmaceutical terminology, a critical component of I2E, is complex and extensive, there are many aids to support it. The language of life sciences is in a constant state of being enriched through refinements to published thesauri and ontologies. However, in other industries with less technical language, Linguamatics can still provide important support to analyze content in the detection of signals and patterns of importance to intelligence and planning.

Much of the remainder of the interview centered on what I refer to as the “team competencies” of individuals who identify the need for any semantic software application; those are the people who select, implement and maintain it. When asked if this presents a challenge for Linguamatics or the market in general, Milward and Hastings acknowledged a learning curve and the need for a larger pool of experts for adoption. This is a professional growth opportunity for informatics and library science people. These professionals are often the first group to identify Linguamatics as a potential solutions provider for semantically challenging problems, leading business stakeholders to the company. They are also good advocates for selling the concept to management and explaining the strong benefits of semantic technology when it is applied to elicit value from otherwise under-leveraged content.

One Linguamatics core operating principal came through clearly when talking about the personnel issues of using I2E, which is the necessity of working closely with their customers. This means making sure that expectations about system requirements are correct, examples of deployments and “what the footprint might look like” are given, and best practices for implementations are shared. They want to be sure that their customers have a sense of being in a community of adopters and are not alone in the use of this pioneering technology. Building and sustaining close customer relationships is very important to Linguamatics, and that means an emphasis on services co-equally with selling licenses.

Linguamatics has come a long way since 2001. Besides a steady effort to improve and enhance their technology through regular product releases of I2E, there have been a lot of “show me” and “prove it” moments to which they have responded. Now, as confidence in and understanding of the technology ramps up, they are getting more complex and sophisticated questions from their customers and prospects. This is the exciting part as they are able to sell I2E’s ability to “synthesize new information from millions of sources in ways that humans cannot.” This is done by using the technology to keep track of and processing the voluminous connections among information resources that exceed human mental limits.

At this stage of growth, with early successes and excellent customer adoption, it was encouraging to hear the enthusiasm of two executives for the evolution of the industry and their opportunities in it.

The Gilbane report and a deep dive on Linguamatics are available through this Press Release on their Web site.

Semantic Technology: Sharing a Large Market Space

It is always interesting to talk shop with the experts in a new technology arena. My interview with Luca Scagliarini, VP of Strategy and Business Development for Expert System, and Brooke Aker, CEO of Expert System USA was no exception. They had been digesting my research on Semantic Software Technologies and last week we had a discussion about what is in the Gilbane report.

When asked if they were surprised by anything in my coverage of the market, the simple answer was “not really, nothing we did not already know.” The longer answer related to the presentation of our research illustrating the scope and depth of the marketplace. These two veterans of the semantic industry admitted that the number of players, applications and breadth of semantic software categories is impressive when viewed in one report. Mr. Scagliarini commented on the huge amount of potential still to be explored by vendors and users.

Our conversation then focused on where we think the industry is headed and they emphasized that this is still an early stage and evolving area. Both acknowledged the need for simplification of products to ease their adoption. It must be straightforward for buyers to understand what they are licensing, the value they can expect for the price they pay; implementation, packaging and complementary services need to be easily understood.
Along the lines of simplicity, they emphasized the specialized nature of most of the successful semantic software applications, noting that these are not coming from the largest software companies. State-of-the-art tools are being commercialized and deployed for highly refined applications out of companies with a small footprint of experienced experts.

Expert System knows about the need for expertise in such areas as ontologies, search, and computational linguistic applications. For years they have been cultivating a team of people for their development and support operations. It has not always been easy to find these competencies, especially right out of academia. Aker and Scagliarini pointed out the need for a lot of pragmatism, coupled with subject expertise, to apply semantic tools for optimal business outcomes. It was hard in the early years for them to find people who could leverage their academic research experiences for a corporate mission.

Human resource barriers have eased in recent years as younger people who have grown up with a variety of computing technologies seem to grasp and understand the potential for semantic software tools more quickly.
Expert System itself is gaining traction in large enterprises that have segmented groups within IT that are dedicated to “learning” applications, and formalized ways of experimenting with, testing and evaluating new technologies. When they become experts in tool use, they are much better at proving value and making the right decisions about how and when to apply the software.

Having made good strides in energy, life sciences, manufacturing and homeland security vertical markets, Expert System is expanding its presence with the Cogito product line in other government agencies and publishing. The executives reminded me that they have semantic nets built out in Italian, Arabic and German, as well as English. This is unique among the community of semantic search companies and will position them for some interesting opportunities where other companies cannot perform.

I enjoyed listening and exchanging commentary about the semantic software technology field. However, Expert System and Gilbane both know that the semantic space is complex and they are sharing a varied landscape with a lot of companies competing for a strong position in a young industry. They have a significant share already.
For more about Expert System and the release of this sponsored research you can view their recent Press Release.

Data Mining for Energy Independence

Mining content for facts and information relationships is a focal point of many semantic technologies. Among the text analytics tools are those for mining content in order to process it for further analysis and understanding, and indexing for semantic search. This will move enterprise search to a new level of research possibilities.

Research for a forthcoming Gilbane report on semantic software technologies turned up numerous applications used in the life sciences and publishing. Neither semantic technologies nor text mining are mentioned in this recent article Rare Sharing of Data Leads to Progress on Alzheimer’s in the New York Times but I am pretty certain that these technologies had some role in enabling scientists to discover new data relationships and synthesize new ideas about Alzheimer’s biomarkers. The sheer volume of data from all the referenced data sources demands computational methods to distill and analyze.

One vertical industry poised for potential growth of semantic technologies is the energy field. It is a special interest of mine because it is a topical area in which I worked as a subject indexer and searcher early in my career. Beginning with the 1st energy crisis, oil embargo of the mid-1970s, I worked in research organizations that involved both fossil fuel exploration and production, and alternative energy development.

A hallmark of technical exploratory and discovery work is the time gaps between breakthroughs; there are often significant plateaus between major developments. This happens if research reaches a point that an enabling technology is not available or commercially viable to move to the next milestone of development. I observed that the starting point in the quest for innovative energy technologies often began with decades-old research that stopped before commercialization.

Building on what we have already discovered, invented or learned is one key to success for many “new” breakthroughs. Looking at old research from a new perspective to lower costs or improve efficiency for such things as photovoltaic materials or electrochemical cells (batteries) is what excellent companies do.
How does this relate to semantic software technologies and data mining? We need to begin with content that was generated by research in the last century; much of this is just now being made electronic. Even so, most of the conversion from paper, or micro formats like fîche, is to image formats. In order to make the full transition to enable data mining, content must be further enhanced through optical character recognition (OCR). This will put it into a form that can be semantically parsed, analyzed and explored for facts and new relationships among data elements.

Processing of old materials is neither easy nor inexpensive. There are government agencies, consortia, associations, and partnerships of various types of institutions that often serve as a springboard for making legacy knowledge assets electronically available. A great first step would be having DOE and some energy industry leaders collaborating on this activity.

A future of potential man-made disasters, even when knowledge exists to prevent them, is not a foregone conclusion. Intellectually, we know that energy independence is prudent, economically and socially mandatory for all types of stability. We have decades of information and knowledge assets in energy related fields (e.g. chemistry, materials science, geology, and engineering) that semantic technologies can leverage to move us toward a future of energy independence. Finding nuggets of old information in unexpected relationships to content from previously disconnected sources is a role for semantic search that can stimulate new ideas and technical research.

A beginning is a serious program of content conversion capped off with use of semantic search tools to aid the process of discovery and development. It is high time to put our knowledge to work with state-of-the-art semantic software tools and by committing human and collaborative resources to the effort. Coupling our knowledge assets of the past with the ingenuity of the present we can achieve energy advances using semantic technologies already embraced by the life sciences.

Older posts

© 2018 Bluebill Advisors

Theme by Anders NorenUp ↑