Tag: Text mining

Collaboration, Convergence and Adoption

Here we are, halfway through 2011, and on track for a banner year in the adoption of enterprise search, text mining/text analytics, and their integration with collaborative content platforms. You might ask for evidence; what I can offer are anecdotal observations. Others track industry growth in terms of dollars spent, but that makes me leery when, over the past half dozen years, there has been so much disappointment expressed with the failures of legacy software applications to deliver satisfactory results. My antennae tell me we are on the cusp of expectations beginning to match reality as enterprises are finding better ways to select, procure, implement, and deploy applications that meet business needs.

What follows are my happy observations, after attending the 2011 Enterprise Search Summit in New York and the 2011 Text Analytics Summit in Boston. Other inputs for me continue to be a varied reading list of information industry publications, business news, vendor press releases and web presentations, and blogs, plus conversations with clients and software vendors. While this blog is normally focused on enterprise search, experiencing and following content management technologies and system integration tools yields valuable insights into all the applications that contribute to search successes and frustrations.

Collaboration tools and platforms gained early traction in the 1990s as technology offerings to the knowledge management crowd. The idea was that teams and workgroups needed ways to share knowledge through contribution of work products (documents) to “places” for all to view. Document management systems inserted themselves into the landscape for managing the development of work products (creating, editing, collaborative editing, etc.). However, collaboration spaces and document editing and version control activities remained applications more apart than synchronized.

The collaboration space has been redefined largely because SharePoint now dominates current discussions about collaboration platforms and activities. While early collaboration platforms were carefully structured to provide a thoughtfully bounded environment for sharing content, their lack of provision for idiosyncratic and often necessary workflows probably limited market dominance.

SharePoint changed the conversation to one of build-it-to-do-anything-you-want-the-way-you-want (BITDAYWTWYW). What IT clearly wants is a single-vendor architecture that delivers content creation, management, collaboration, and search. What end-users want is workflow efficiency and reliable search results. This introduces another level of collaborative imperative, since the BITDAYWTWYW model requires expertise that few enterprise IT support people carry and fewer end-users would trust to their IT departments. So, third-party developers or software offerings become the collaborative option. SharePoint is not the only collaboration software but, because of its dominance, a large second tier of partner vendors is turning SharePoint adopters on to its potential. Collaboration of this type in the marketplace is ramping up wildly.

Convergence of technologies and companies is on the rise, as well. The non-Microsoft platform companies, OpenText, Oracle, and IBM, are basing their strategies on tightly integrating their solid cache of acquired mature products. These acquisitions have plugged gaps in text mining, analytics, and vocabulary management areas. Google and Autonomy are also entering this territory although they are still short on the maturity model. The convergence of document management, electronic content management, text and data mining, analytics, e-discovery, a variety of semantic tools, and search technologies is shoring up the “big-platform” vendors to deal with “big-data.”

Sitting on the periphery is the open source movement. It is finding ways to alternately collaborate with the dominant commercial players, disrupt select application niches (e.g. WCM), and contribute solutions where neither the SharePoint model nor the big-platform, tightly integrated models can win easy adoption. Lucene/Solr is finding acceptance in the government and non-profit sectors but also appeals to SMBs.

All of these factors were actively on display at the two meetings but the most encouraging outcomes that I observed were:

  • Rise in attendance at both meetings
  • More knowledgeable and experienced attendees
  • Significant increase in end-user presentations

The last of these brings me back to the adoption issue. Enterprises that previously sent people to earlier meetings to learn about technologies and products are now in the implementation and deployment stages. Thus, they are now able to contribute presentations with real experience and commentary about products. Presenters are commenting on adoption issues, usability, governance, successful practices, and pitfalls or unresolved issues.

Adoption is what will drive product improvements in the marketplace because experienced adopters are speaking out on their activities. Public presentations of user experiences can and should establish expectations for better tools, better vendor relationship experiences, more collaboration among products and, ultimately, reduced complexity in the implementation and deployment of products.

Classifying Searchers – What Really Counts?

I continue to be impressed by the new ways in which enterprise search companies differentiate and package their software for specialized uses. This is a good thing because it underscores their understanding of different search audiences. Just as important is recognition that search happens in a context, for example:

  • Personal interest (enlightenment or entertainment)
  • Product selection (evaluations by independent analysts vs. direct purchasing information)
  • Work enhancement (finding data or learning a new system, process or product)
  • High-level professional activities (e-discovery to strategic planning)

Vendors understand that there is a limited market for a product or suite of products that will satisfy every budget, search context and the enterprise’s hierarchy of search requirements. The best of them focus on the technological strengths of their search tools to deliver products packaged for a niche in which they can excel.

However, for any market niche, excellence begins with six basics:

  • Customer relationship cultivation, including good listening
  • Professional customer support and services
  • Ease of system installation, implementation, tuning and administration
  • Out-of-the-box integration with complementary technologies that will improve search
  • Simple pricing for licensing and support packages
  • Ease of doing business, contracting and licensing, deliveries and upgrades

While any mature and worthy company will have continually improved on these attributes, there are contextual differentiators that you should seek in your vertical market:

  • Vendor subject matter expertise
  • Vendor industry expertise
  • Vendor knowledge of how professional specialists perform their work functions
  • Vendor understanding of retrieval and content types that contribute the highest value

At a recent client discussion, the topic was the application of a highly specialized taxonomy. Their target content will be made available on a public-facing web site and also to internal staff. We began by discussing the various categories of terminology already extracted from a pre-existing system.

As we differentiated how internal staff needed to access content for research purposes and how the public is expected to search, patterns emerged for how differently content needs to be packaged for each constituency. For those of you who have specialized collections to be used by highly diverse audiences, this is no surprise. Before proceeding with decisions about term curation and the granularity of their metadata vocabulary, the high priority has become how the search mechanisms will work for different audiences.

For this institution, internal users must have pinpoint precision in retrieval on multiple facets of content to get to exactly the right record. They will be coming to search with knowledge of the collection and more certainty about what they can expect to find. They will also want to find their target(s) quickly. On the other hand, the public-facing audience needs to be guided in a way that leads them on a path of discovery, navigating through a map of terms that takes them from their “key term” query through related possibilities without demanding arcane Boolean operations or lengthy explanations for advanced searching.
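
To make the contrast concrete, here is a minimal sketch in Python of the two search styles. Everything in it is invented for illustration: the records, the facet names, the related-term map, and the function names `precise_search` and `guided_search` are assumptions of mine, not the client's system or any vendor's API. It simply shows exact matching on several facets for staff, next to a key-term query widened through a curated term map for the public.

```python
# Illustrative sketch only: records, facet names, and the term map are invented.
from dataclasses import dataclass, field


@dataclass
class Record:
    title: str
    facets: dict                      # e.g. {"period": "1920s", "format": "map"}
    terms: set = field(default_factory=set)


RECORDS = [
    Record("Harbor survey", {"period": "1920s", "format": "map"}, {"harbor", "shipping"}),
    Record("Dock ledger", {"period": "1920s", "format": "ledger"}, {"shipping", "trade"}),
    Record("Rail timetable", {"period": "1950s", "format": "timetable"}, {"railway"}),
]

# Hypothetical curated vocabulary: each key term points to related terms.
RELATED = {
    "harbor": {"shipping", "trade"},
    "railway": {"transport"},
}


def precise_search(records, **facets):
    """Internal staff: keep only records matching every requested facet value."""
    return [r for r in records
            if all(r.facets.get(name) == value for name, value in facets.items())]


def guided_search(records, key_term, related=RELATED):
    """Public users: match the key term first, then suggest records via related terms."""
    direct = [r for r in records if key_term in r.terms]
    wider = related.get(key_term, set())
    suggested = [r for r in records if r not in direct and r.terms & wider]
    return direct, suggested
```

With this toy data, a staff query such as `precise_search(RECORDS, period="1920s", format="map")` narrows to a single record, while `guided_search(RECORDS, "harbor")` returns the direct hit plus a second list of related records a visitor might browse next. A real product would do far more, but the split between the two functions is the point.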

There is a clear lesson here for seeking enterprise search solutions. Systems that favor one audience over another will always be problematic. Therefore, you need to establish who needs what and how each group goes about searching, and then match those needs to the product that can provide for all target groups.

We are in the season for conferences; there are a few next month that will be featuring various search and content technologies. After many years of walking exhibit halls and formulating strategies for systematic research and avoiding a swamp of technology overload, I try now to have specific questions formulated that will discover the “must have” functions and features for any particular client requirement. If you do the same, describing a search user scenario to each candidate vendor, you can then proceed to ask: Is this a search problem your product will handle? What other technologies (e.g. CMS, vocabulary management) need to be in place to ensure quality search results? Can you demonstrate something similar? What would you estimate the implementation schedule to look like? What integration services are recommended?

These are starting points for a discussion and will enable you to begin to know whether this vendor meets the fundamental criteria laid out earlier in this post. It will also give you a sense of whether the vendor views all searchers and their searches as generic equivalents or knows that different functions and features are needed for special groups.

Look for vendors of enterprise search and search-related technologies to interview at the following upcoming meetings:

Enterprise Search Summit, New York, May 10 – 11: […where you will learn strategies and build the skill sets you need to make your organization’s content not only searchable but “findable” and actionable so that it delivers value to the bottom line.] This is the largest seasonal conference dedicated to enterprise search. The sessions are preceded by separate workshops with in-depth tutorials related to search. During the conference, focus on case studies of enterprises similar to yours for a better understanding of issues that you may need to address.

Text Analytics Summit, Boston, May 18 – 19: I spoke with Seth Grimes, who kicks off the meeting with a keynote, asking whether he sees a change in emphasis this year from straight text mining and text analytics. You’ll have to attend to get his full speech but Seth shared that he sees a newfound recognition that “Big Data” is coming to grips with text source information as an asset that has special requirements (and value). He also noted that unstructured document complexities can benefit from text analytics to create semantic understanding that improves search, and that text analytics products are rising to the challenge of providing dynamic semantic analysis, particularly around massive amounts of social textual content.

Lucene Revolution, San Francisco, May 23 – 24: […hear from … the foremost experts on open source search technology to a broad cross-section of users that have implemented Lucene, Solr, or LucidWorks Enterprise to improve search application performance, scalability, flexibility, and relevance, while lowering their costs.] I attended this new meeting last year when it was in Boston. For any enterprise considering or leaning toward implementing open source search, particularly Lucene or Solr, this meeting will set you on a path for understanding what that journey entails.

Semantically Focused and Building on a Successful Customer Base

Dr. Phil Hastings and Dr. David Milward spoke with me in June 2010, as I was completing the Gilbane report, Semantic Software Technologies: A Landscape of High Value Applications for the Enterprise. My interest in a conversation was stimulated by several months of discussions with customers of numerous semantic software companies. Having heard perspectives from early adopters of Linguamatics’ I2E and other semantic software applications, I wanted to get some comments from two key officers of Linguamatics about what I heard from the field. Dr. Milward is a founder and CTO, and Dr. Hastings is the Director of Business Development.

A company with sustained profitability for nearly ten years in the enterprise semantic market space has credibility. Reactions from a maturing company to what users have to say are interesting and carry weight in any industry. My lines of inquiry and the commentary from the Linguamatics officers centered around their own view of the market and adoption experiences.

When asked about growth potential for the company outside of pharmaceuticals, where Linguamatics already has high adoption and very enthusiastic users, Drs. Milward and Hastings affirmed that their principal focus remains the life sciences. They see a lot more potential in this market space, largely because of the vast amounts of unstructured content being generated, coupled with the very high-value problems that can be solved by text mining and semantically analyzing the data from those documents. Expanding their business further in the life sciences means that they will continue engaging in research projects with the academic community. It also means that Linguamatics semantic technology will be helping organizations solve problems related to healthcare and homeland security.

The wisdom of a measured and consistent approach comes through strongly when speaking with Linguamatics executives. They are highly focused and cite the pitfalls of trying to “do everything at once,” which would be the case if they were to pursue all markets overburdened with tons of unstructured content. While pharmaceutical terminology, a critical component of I2E, is complex and extensive, there are many aids to support it. The language of life sciences is in a constant state of being enriched through refinements to published thesauri and ontologies. However, in other industries with less technical language, Linguamatics can still provide important support to analyze content in the detection of signals and patterns of importance to intelligence and planning.

Much of the remainder of the interview centered on what I refer to as the “team competencies” of individuals who identify the need for any semantic software application; those are the people who select, implement and maintain it. When asked if this presents a challenge for Linguamatics or the market in general, Milward and Hastings acknowledged a learning curve and the need for a larger pool of experts for adoption. This is a professional growth opportunity for informatics and library science people. These professionals are often the first group to identify Linguamatics as a potential solutions provider for semantically challenging problems, leading business stakeholders to the company. They are also good advocates for selling the concept to management and explaining the strong benefits of semantic technology when it is applied to elicit value from otherwise under-leveraged content.

One core Linguamatics operating principle came through clearly when we talked about the personnel issues of using I2E: the necessity of working closely with their customers. This means making sure that expectations about system requirements are correct, examples of deployments and “what the footprint might look like” are given, and best practices for implementations are shared. They want to be sure that their customers have a sense of being in a community of adopters and are not alone in the use of this pioneering technology. Building and sustaining close customer relationships is very important to Linguamatics, and that means an emphasis on services co-equally with selling licenses.

Linguamatics has come a long way since 2001. Besides a steady effort to improve and enhance their technology through regular product releases of I2E, there have been a lot of “show me” and “prove it” moments to which they have responded. Now, as confidence in and understanding of the technology ramps up, they are getting more complex and sophisticated questions from their customers and prospects. This is the exciting part as they are able to sell I2E’s ability to “synthesize new information from millions of sources in ways that humans cannot.” This is done by using the technology to keep track of and process the voluminous connections among information resources that exceed human mental limits.

At this stage of growth, with early successes and excellent customer adoption, it was encouraging to hear the enthusiasm of two executives for the evolution of the industry and their opportunities in it.

The Gilbane report and a deep dive on Linguamatics are available through this Press Release on their Web site.

Turbo Search Engines in Cars; it is not the whole solution.

In my quest to analyze the search tools that are available to the enterprise, I spend a lot of time searching. These searches use conventional on-line search tools, and my own database of citations that link to articles, long forgotten. But true insights about products and markets usually come through the old-fashioned route, the serendipity of routine life. For me search also includes the ordinary things I do every day:
  • Looking up a fact (e.g. phone number, someone’s birthday, woodchuck deterrents), which I may find in an electronic file or hardcopy
  • Retrieving a specific document (e.g. an expense form, policy statement, or ISO standard), which may be on-line or in my file cabinet
  • Finding evidence (e.g. examining search logs to understand how people are using a search engine, looking for a woodchuck hole near my garden, examining my tires for uneven tread wear), which requires viewing electronic files or my physical environment
  • Discovering who the experts are on a topic or what expertise my associates have (e.g. looking up topics to see who has written or spoken, reading resumes or biographies to uncover experience), which is more often done on-line but may be buried in a 20-year old professional directory on the shelf
  • Learning about a subject I want or need to understand (e.g. How are search and text analytics being used together in business enterprises? What is the meaning of the tag line “Turbo Search Engine” on an Acura ad?), which were partially answered with online search but also by attending conferences like the Text Analytics Summit 2007 this week
This list illustrates several things. First, search is about finding facts, evidence, and aggregated information (documents). It is also about discovering, learning and uncovering information that we can then analyze for any number of decisions or potential actions.

Second, search enables us to function more efficiently in all of our worldly activities, execute our jobs, increase our own expertise and generally feed our brains.

Third, search does not require the use of electronic technology, nor sophisticated tools, just our amazing senses: sight, hearing, touch, smell and taste.

Fourth, what Google now defines as “cloud computing” and what MIT geeks began touting as “wearable” technology a few years ago have converged to bring us cars embedded with what Acura defines as “turbo search engines.” On this fourth point, I needed to discover the point. In small print on the full-page ad in Newsweek were phrases like “linked to over 7,000,000 destinations” and “knows where traffic is.” In even tinier print was the statement, “real-time traffic monitoring available in select markets…” I thought I understood that they were promoting the pervasiveness of search potential through the car’s extensive technological features. Then I searched the Internet for the phrase “turbo search engine” coupled with “Acura” only to learn that there was more to it. Notably, there is the “…image-tagging campaign that enables the targeted audience to use their fully-integrated mobile devices to be part of the promotion.” You can read the context yourself.

Well, I am still trying to get my head around this fourth point to understand how important it is to helping companies find solid, practical search solutions to problems they face in business enterprises. I don’t believe that a parking lot full of Acuras is something I will recommend.

Fifth, I experienced some additional thoughts about the place for search technology this week. Technology experts like Sue Feldman of IDC and Fern Halper of Hurwitz & Associates appeared on a panel at the Text Analytics Summit. While making clear the distinctions between search and text analytics, and text analytics and text mining, Sue also made clear that the algorithmic techniques employed by the various tools being demonstrated are distinct, each solving different problems in different business situations. She and others acknowledged that, having finally embraced search, enterprises are now adopting significant applications using text analytic techniques to make better sense of all the found content.

Integration was a recurring theme at the conference, even as it was also obvious that no one product embodies the full range of text search, mining and analytics that any one enterprise might need. When tools and technologies are procured in silos, good integration is a tough proposition, and a costly one. Tacking on one product after another, and trying to retrofit them into a seamless continuum from capturing, storing, and organizing content to retrieving and analyzing the text in it, takes forethought and intelligent human design. Even if you can’t procure the whole solution to all your problems at once, and who can, you do need a vision of where you are going to end up so that each deployment is a building block to the whole architecture.

There is a lot to discover at conferences that can’t be learned through search, like what you absorb in a random mix of presentations, discussions and demos that can lead to new insights or just a confirmation of the optimal path to a cohesive plan.

© 2018 Bluebill Advisors
