Tag: Text analytics

Gilbane Conference workshops

In case you missed it last week while on vacation, the Gilbane Conference workshop schedule and descriptions were posted. The half-day workshops take place at the Intercontinental Boston Waterfront Hotel on Tuesday, November 27, 9:00 am to 4:00 pm.

Save the date and check http://gilbaneboston.com for further information about the conference schedule and program as they become available.

Collaboration, Convergence and Adoption

Here we are, halfway through 2011, and on track for a banner year in the adoption of enterprise search, text mining/text analytics, and their integration with collaborative content platforms. You might ask for evidence; what I can offer are anecdotal observations. Others track industry growth in terms of dollars spent, but that measure makes me leery when, over the past half-dozen years, so much disappointment has been expressed with the failure of legacy software applications to deliver satisfactory results. My antenna tells me we are on the cusp of expectations beginning to match reality, as enterprises find better ways to select, procure, implement, and deploy applications that meet business needs.

What follows are my happy observations after attending the 2011 Enterprise Search Summit in New York and the 2011 Text Analytics Summit in Boston. Other inputs for me continue to be a varied reading list of information industry publications, business news, vendor press releases and web presentations, and blogs, plus conversations with clients and software vendors. While this blog normally focuses on enterprise search, following content management technologies and system integration tools contributes valuable insights into all the applications that shape search successes and frustrations.

Collaboration tools and platforms gained early traction in the 1990s as technology offerings to the knowledge management crowd. The idea was that teams and workgroups needed ways to share knowledge by contributing work products (documents) to “places” for all to view. Document management systems inserted themselves into the landscape for managing the development of work products (creating, editing, collaborative editing, etc.). However, collaboration spaces and document editing and version control remained applications more apart than synchronized.

The collaboration space has been redefined, largely because SharePoint now dominates current discussions about collaboration platforms and activities. While early collaboration platforms were carefully structured to provide a thoughtfully bounded environment for sharing content, their lack of provision for idiosyncratic and often necessary workflows probably limited their market dominance.

SharePoint changed the conversation to one of build-it-to-do-anything-you-want-the-way-you-want (BITDAYWTWYW). What IT clearly wants is a single-vendor architecture that delivers content creation, management, collaboration, and search. What end-users want is workflow efficiency and reliable search results. This introduces another level of collaborative imperative, since the BITDAYWTWYW model requires expertise that few enterprise IT support people possess and that fewer end-users would entrust to their IT departments. So, third-party developers or software offerings become the collaborative option. SharePoint is not the only collaboration software but, because of its dominance, a large second tier of partner vendors is turning SharePoint adopters on to its potential. Collaboration of this type is ramping up wildly in the marketplace.

Convergence of technologies and companies is on the rise as well. The non-Microsoft platform companies, OpenText, Oracle, and IBM, are basing their strategies on tightly integrating their solid caches of acquired mature products. These acquisitions have plugged gaps in text mining, analytics, and vocabulary management. Google and Autonomy are also entering this territory, although they are still short on the maturity model. The convergence of document management, electronic content management, text and data mining, analytics, e-discovery, a variety of semantic tools, and search technologies is shoring up the “big-platform” vendors to deal with “big data.”

Sitting on the periphery is the open source movement. It is finding ways to variously collaborate with the dominant commercial players, disrupt select application niches (e.g., WCM), and contribute solutions where neither the SharePoint model nor the big-platform, tightly integrated models can win easy adoption. Lucene/Solr is finding acceptance in the government and non-profit sectors but also appeals to SMBs.

All of these factors were actively on display at the two meetings but the most encouraging outcomes that I observed were:

  • Rise in attendance at both meetings
  • More knowledgeable and experienced attendees
  • Significant increase in end-user presentations

The latter brings me back to the adoption issue. Enterprises that previously sent people to earlier meetings to learn about technologies and products are now in the implementation and deployment stages. Thus, they are now able to contribute presentations with real experience and commentary about products. Presenters are commenting on adoption issues, usability, governance, successful practices, and pitfalls or unresolved issues.

Adoption is what will drive product improvements in the marketplace because experienced adopters are speaking out on their activities. Public presentations of user experiences can and should establish expectations for better tools, better vendor relationship experiences, more collaboration among products and ultimately, reduced complexity in the implementation and deployment of products.

Classifying Searchers – What Really Counts?

I continue to be impressed by the new ways in which enterprise search companies differentiate and package their software for specialized uses. This is a good thing because it underscores their understanding of different search audiences. Just as important is recognition that search happens in a context, for example:

  • Personal interest (enlightenment or entertainment)
  • Product selection (evaluations by independent analysts vs. direct purchasing information)
  • Work enhancement (finding data or learning a new system, process or product)
  • High-level professional activities (e-discovery to strategic planning)

Vendors understand that there is a limited market for any product or suite of products that tries to satisfy every budget, search context, and enterprise hierarchy of search requirements. The best vendors focus on the technological strengths of their search tools to deliver products packaged for a niche in which they can excel.

However, for any market niche, excellence begins with six basics:

  • Customer relationship cultivation, including good listening
  • Professional customer support and services
  • Ease of system installation, implementation, tuning and administration
  • Out-of-the box integration with complementary technologies that will improve search
  • Simple pricing for licensing and support packages
  • Ease of doing business, contracting and licensing, deliveries and upgrades

While any mature and worthy company will have continually improved on these attributes, there are contextual differentiators that you should seek in your vertical market:

  • Vendor subject matter expertise
  • Vendor industry expertise
  • Vendor knowledge of how professional specialists perform their work functions
  • Vendor understanding of retrieval and content types that contribute the highest value

At a recent client discussion, the topic was the application of a highly specialized taxonomy. The client’s target content will be made available on a public-facing web site and also to internal staff. We began by discussing the various categories of terminology already extracted from a pre-existing system.

As we differentiated between how internal staff need to access content for research purposes and how the public is expected to search, patterns emerged for how differently content needs to be packaged for each constituency. For those of you who have specialized collections used by highly diverse audiences, this is no surprise. Before proceeding with decisions about term curation and the granularity of the metadata vocabulary, the high priority has become how the search mechanisms will work for different audiences.

For this institution, internal users must have pinpoint precision in retrieval on multiple facets of content to get to exactly the right record. They will be coming to search with knowledge of the collection and more certainty about what they can expect to find. They will also want to find their target(s) quickly. On the other hand, the public facing audience needs to be guided in a way that leads them on a path of discovery, navigating through a map of terms that takes them from their “key term” query through related possibilities without demanding arcane Boolean operations or lengthy explanations for advanced searching.
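
To make the contrast concrete, here is a minimal sketch of the two query styles against a search engine such as Solr (a technology mentioned later in this post). The endpoint URL, field names, and terms are my own hypothetical assumptions, not this institution’s actual setup; the point is only the difference between hard fielded filters for expert precision and facet counts that become a guided navigation map.

    # Sketch: two query styles against a hypothetical Solr core.
    # The URL and field names ("material", "period", "subject") are
    # illustrative assumptions, not a real installation.
    import requests

    SOLR = "http://localhost:8983/solr/collection1/select"

    # Internal staff: pinpoint precision via fielded filters.
    staff_params = {
        "q": "textile fragment",
        "fq": ["material:silk", "period:1850-1900"],  # hard narrowing filters
        "rows": 10,
        "wt": "json",
    }

    # Public audience: broad query plus facet counts that serve as a
    # map of related terms to browse, no Boolean syntax required.
    public_params = {
        "q": "silk",
        "facet": "true",
        "facet.field": ["subject", "period", "material"],
        "rows": 10,
        "wt": "json",
    }

    for params in (staff_params, public_params):
        result = requests.get(SOLR, params=params).json()
        print(result["response"]["numFound"])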

There is a clear lesson here for those seeking enterprise search solutions. Systems that favor one audience over another will always be problematic. Therefore, who needs what, and how each group goes about searching, needs to be established and then matched to the product that can provide for all target groups.

We are in the season for conferences; a few next month will feature various search and content technologies. After many years of walking exhibit halls and formulating strategies for systematic research while avoiding a swamp of technology overload, I now try to have specific questions formulated that will uncover the “must have” functions and features for any particular client requirement. If you do the same, describing a search user scenario to each candidate vendor, you can then proceed to ask: Is this a search problem your product will handle? What other technologies (e.g. CMS, vocabulary management) need to be in place to ensure quality search results? Can you demonstrate something similar? What would you estimate the implementation schedule to look like? What integration services are recommended?

These are starting points for a discussion and will enable you to begin to know whether this vendor meets the fundamental criteria laid out earlier in this post. It will also give you a sense of whether the vendor views all searchers and their searches as generic equivalents or knows that different functions and features are needed for special groups.

Look for vendors for enterprise search and search related technologies to interview at the following upcoming meetings:

Enterprise Search Summit, New York, May 10 – 11 […where you will learn strategies and build the skill sets you need to make your organization’s content not only searchable but “findable” and actionable so that it delivers value to the bottom line.] This is the largest seasonal conference dedicated to enterprise search. The sessions are preceded by separate workshops with in-depth tutorials related to search. During the conference, focus on case studies of enterprises similar to yours for a better understanding of issues you may need to address.

Text Analytics Summit, Boston, May 18 – 19 I spoke with Seth Grimes, who kicks off the meeting with a keynote, and asked whether he sees a change in emphasis this year from straight text mining and text analytics. You’ll have to attend to get his full speech, but Seth shared that he sees a newfound recognition that “Big Data” is coming to grips with text source information as an asset with special requirements (and value). He also noted that unstructured document complexities can benefit from text analytics to create semantic understanding that improves search, and that text analytics products are rising to the challenge of providing dynamic semantic analysis, particularly around massive amounts of social textual content.

Lucene Revolution, San Francisco, May 23 – 24 […hear from … the foremost experts on open source search technology to a broad cross-section of users that have implemented Lucene, Solr, or LucidWorks Enterprise to improve search application performance, scalability, flexibility, and relevance, while lowering their costs.] I attended this new meeting last year when it was in Boston. For any enterprise considering or leaning toward implementing open source search, particularly Lucene or Solr, this meeting will set you on a path for understanding what that journey entails.

Data Mining for Energy Independence

Mining content for facts and information relationships is a focal point of many semantic technologies. Among the text analytics tools are those for mining content in order to process it for further analysis and understanding, and indexing for semantic search. This will move enterprise search to a new level of research possibilities.
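
As a concrete, if toy, illustration of mining text and indexing it for semantic search, here is a minimal sketch using the open-source spaCy library. The library choice and the sample sentence are my own assumptions, not tools this post names: it extracts named entities from a passage and turns them into candidate semantic index terms.

    # Sketch: extract entities from text and emit them as semantic
    # index terms. Requires: pip install spacy, then
    # python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    text = ("Battelle Columbus Labs studied photovoltaic materials "
            "for the Department of Energy during the 1970s.")

    doc = nlp(text)
    # Each entity becomes a (term, type) pair a semantic index could store.
    index_terms = {(ent.text, ent.label_) for ent in doc.ents}
    print(index_terms)
    # e.g. {('Battelle Columbus Labs', 'ORG'),
    #       ('the Department of Energy', 'ORG'), ('the 1970s', 'DATE')}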

Research for a forthcoming Gilbane report on semantic software technologies turned up numerous applications in the life sciences and publishing. Neither semantic technologies nor text mining is mentioned in the recent New York Times article “Rare Sharing of Data Leads to Progress on Alzheimer’s,” but I am pretty certain that these technologies had some role in enabling scientists to discover new data relationships and synthesize new ideas about Alzheimer’s biomarkers. The sheer volume of data from all the referenced data sources demands computational methods to distill and analyze.

One vertical industry poised for potential growth in semantic technologies is the energy field. It is a special interest of mine because it is a topical area in which I worked as a subject indexer and searcher early in my career. Beginning with the first energy crisis, the oil embargo of the mid-1970s, I worked in research organizations involved in both fossil fuel exploration and production and alternative energy development.

A hallmark of technical exploratory and discovery work is the time gap between breakthroughs; there are often significant plateaus between major developments. This happens when research reaches a point where an enabling technology is not available or not commercially viable enough to move to the next milestone of development. I observed that the starting point in the quest for innovative energy technologies often began with decades-old research that stopped before commercialization.

Building on what we have already discovered, invented or learned is one key to success for many “new” breakthroughs. Looking at old research from a new perspective to lower costs or improve efficiency for such things as photovoltaic materials or electrochemical cells (batteries) is what excellent companies do.

How does this relate to semantic software technologies and data mining? We need to begin with content that was generated by research in the last century; much of it is just now being made electronic. Even so, most of the conversion from paper, or micro formats like fiche, is to image formats. To make the full transition and enable data mining, content must be further enhanced through optical character recognition (OCR). This will put it into a form that can be semantically parsed, analyzed and explored for facts and new relationships among data elements.
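
A minimal sketch of that conversion step, using the open-source Tesseract engine via pytesseract (one possible tool, not one the post prescribes): OCR a scanned page image into plain text that downstream semantic tools can then parse. The file name is a hypothetical placeholder.

    # Sketch: scanned page image -> plain text via OCR, the step that
    # makes legacy documents minable. Requires the Tesseract binary
    # plus: pip install pytesseract pillow
    from PIL import Image
    import pytesseract

    def ocr_page(image_path: str) -> str:
        """Return the recognized text of one scanned page."""
        return pytesseract.image_to_string(Image.open(image_path))

    if __name__ == "__main__":
        text = ocr_page("legacy_report_page_001.png")  # hypothetical file
        print(text[:500])  # hand this text to parsing/entity extraction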

Processing old materials is neither easy nor inexpensive. There are government agencies, consortia, associations, and partnerships among various types of institutions that often serve as springboards for making legacy knowledge assets electronically available. A great first step would be for DOE and some energy industry leaders to collaborate on this activity.

A future of potential man-made disasters, even when knowledge exists to prevent them, is not a foregone conclusion. Intellectually, we know that energy independence is prudent, economically and socially mandatory for all types of stability. We have decades of information and knowledge assets in energy related fields (e.g. chemistry, materials science, geology, and engineering) that semantic technologies can leverage to move us toward a future of energy independence. Finding nuggets of old information in unexpected relationships to content from previously disconnected sources is a role for semantic search that can stimulate new ideas and technical research.

A good beginning would be a serious program of content conversion, capped off with semantic search tools to aid the process of discovery and development. It is high time to put our knowledge to work with state-of-the-art semantic software tools and to commit human and collaborative resources to the effort. By coupling our knowledge assets of the past with the ingenuity of the present, we can achieve energy advances using semantic technologies already embraced by the life sciences.

Convergence of Enterprise Search and Text Analytics is Not New

The news item about IBM’s bid for SPSS, and similar acquisitions by Oracle, SAP, and Microsoft, made me think about the predictions of more business intelligence (BI) capabilities being conjoined with enterprise search. But why now, and what is new about pairing search and BI? They have always been complementary, not only for numeric applications but also for text analysis. Another article, by John Harney in KMWorld, referred to the “relatively new technology of text analytics” for analyzing unstructured text. The article is a good summary of some newer tools, but the technology itself has had a long shelf life, too long for reasons I’ll explore later.

Like other topics in this blog, this one requires a readjustment in thinking by technology users. One of the great things about digitizing text was the promise of the ways in which it could be parsed, sorted and analyzed. With the heavy adoption of databases specializing in textual, as well as numeric and date, data fields for business applications in the 1960s and 70s, it became much easier for non-technical workers to look at all kinds of data in new ways. Early database applications leveraged their data stores using command languages; the better ones featured statistical analysis and publication-quality report builders. Three that I was familiar with were DRS from ADM, Inc., BASIS from Battelle Columbus Labs and INQUIRE from IBM.

Tools that accompanied database back-ends had the ability to extract, slice and dice the database content, including very large text fields, to report word counts, phrase counts (breaking on any delimiter), transaction counts, and relationships among data elements across associated record types; to create relationships on the fly; to report expert activity and working documents; and to describe the distribution of resources. These are just a few examples of how new content assets could be created for export in minutes. In particular, a sort command in DRS had histogram controls that were invaluable to my clients managing corporate document and records collections, news clipping files, photographs, patents, etc. They could evaluate their collections by topic, date range, distribution, source, and so on, at any time.
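
Those old command-language reports are easy to approximate today. Here is a minimal Python sketch, with invented records rather than DRS syntax, of the word-count and date-distribution reports described above:

    # Sketch: word counts and a date-range histogram over text records,
    # approximating the old command-language reports (records invented).
    from collections import Counter
    import re

    records = [
        {"date": "1978-03", "text": "Patent filing on photovoltaic cells"},
        {"date": "1978-11", "text": "News clipping: photovoltaic research"},
        {"date": "1979-05", "text": "Report on electrochemical cells"},
    ]

    # Word-count report across all text fields.
    words = Counter(w for r in records
                    for w in re.findall(r"[a-z]+", r["text"].lower()))
    print(words.most_common(5))

    # Histogram of records by year, like the DRS sort-with-histogram.
    by_year = Counter(r["date"][:4] for r in records)
    for year, n in sorted(by_year.items()):
        print(year, "#" * n)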

So, there existed years ago the ability to connect data structures and use a command language to formulate new data models that informed and elucidated how information was being used in the organization, or that illustrated where there were holes in topics related to business initiatives. What were the barriers to widespread adoption? Upon reflection, I came to realize that extracting meaningful content from a database in new and innovative formats requires a level of abstract thinking for which most employees are not well trained. Putting descriptive data into a database via a screen form, then performing a transaction on the object of that data on another form, and then adding more data about another similar but different object are isolated steps in the database user’s experience and memory. The typical user is not trained to think about how the pieces of data might be connected in the database, and therefore is not likely to form new ideas about how it can all be extracted in a report that reveals new information about the content. There is a level of abstraction that eludes most workers whose jobs consist of many compartmentalized tasks.

It was exciting to encounter prospects who really grasped the power of these tools and were eager to push the limits of the command language and reporting applications, but they were scarce. Our greatest use turned out to be applying text analytics to extract valuable information from our customer support database. A rigorously disciplined staff populated it after every support call, adding not only demographic information about the nature of the call, linked to a customer record created at first contact during the sales process (and updated along the way through procurement), but also a textual description of the entire transaction. Over time this database was linked to a “wish list” database and another “fixes” database, and the entire networked structure provided extremely valuable reports that guided both development work and documentation production. We also issued weekly summary reports to the entire staff so everyone was kept informed about product conditions and customer relationships. The reporting tools provided transparency to all staff about company activity and enabled an early version of “social search collaboration.”
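
A minimal sketch of that kind of linked-database reporting in modern terms, using SQLite; the schema and rows are invented stand-ins for the support, wish-list, and fixes databases described above:

    # Sketch: join a support-call log to a fixes table to report which
    # product areas generate calls that still lack a fix (data invented).
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE calls (id INTEGER, area TEXT, summary TEXT);
        CREATE TABLE fixes (call_id INTEGER, status TEXT);
        INSERT INTO calls VALUES (1, 'search', 'bad ranking'),
                                 (2, 'search', 'slow indexing'),
                                 (3, 'reports', 'export fails');
        INSERT INTO fixes VALUES (1, 'shipped');
    """)

    report = con.execute("""
        SELECT c.area, COUNT(*) AS open_calls
        FROM calls c LEFT JOIN fixes f ON f.call_id = c.id
        WHERE f.status IS NULL
        GROUP BY c.area ORDER BY open_calls DESC
    """).fetchall()
    print(report)  # e.g. [('search', 1), ('reports', 1)]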

Current text analytics products have significantly more algorithmic horsepower than the old command languages. But making the most of their potential, and transforming them into utilities that any knowledge worker can leverage, will remain a challenge for vendors in the face of poor abstract reasoning across much of the work force. The tools have improved, but maybe not in all the ways they need to for widespread adoption. Workers should not have to depend on IT folks to create that unique analysis report that reveals a pattern or uncovers product flaws described by multiple customers. We expect workers to multitask, have many aptitudes and skills, and be self-servicing in so many aspects of their work, but the tools too often fall short of letting them flourish. I’m putting in a big plug for text analytics for the masses, soon, so that enterprise search begins to deliver more than personalized lists of results for one person at a time. Give more reporting power to the user.

What’s in a Name: Information Access Software vs. Search?

This one almost slipped right past me, but I see we are in another shoot-out in the naming of search market segments. Probably it is because we have too many offerings in the search industry. When any industry reaches a critical mass, players need to find a way to differentiate what they sell. Products have to be positioned as, well, “something else.”

In my consulting practice, “knowledge management” has been hot (1980s and 90s), dead (late ’90s and early 2000s), and relevant again (now). In my analyst role for “enterprise search,” Gilbane has been told by experts that the term is meaningless and should be replaced with “behind the firewall search,” as if that clarifies everything. Of course, marketing directories might struggle with that as a category heading.

For the record, “search” has two definitions in my book. The first is a verb referring to the activity of looking for anything. The second, newer, definition is a noun referring to technologies that support finding “content.” Both are sufficiently broad to cover a lot of activities, technologies and stuff. “Enterprises” are organizations of any type in which business, whether for-profit, not-for-profit, or government, is being conducted. Let us quibble no more.

But I digress; Endeca has broadened its self-classification in any number of press releases, referring to the products that were “search” products last year as “information access software.” This is the major category used by IDC to include “search.” That’s what we called library systems in the 1970s and 80s. New products still aim for accessing content, albeit with richer functions and features, but where are we going to put them in our family of software lists? One could argue that Endeca’s products are really a class of “search”: search on steroids, a specialized form of search. What are the defining differentiators between “search software” and “information access software”? When does a search product become more than it was, or narrower and more refined in scope? (This is a rhetorical question, but I’m sure each vendor in this new category will break it out for me in their own terms.)

Having just finished reviewing the market for enterprise search, I believe that many of the products are reaching for the broader scope of functionality defined by IDC as search and retrieval, text analytics, and BI. But are they really going to claim to be content management and data warehousing software as well? Those are included in IDC’s definition of “information access software.” Maybe we are going back to single-vendor platforms with everything bundled and integrated. Sigh… it makes me tired, trying to keep up with all this categorizing and re-redefining.

Turbo Search Engines in Cars: Not the Whole Solution

In my quest to analyze the search tools that are available to the enterprise, I spend a lot of time searching. These searches use conventional on-line search tools and my own database of citations that link to long-forgotten articles. But true insights about products and markets usually come through the old-fashioned route, the serendipity of routine life. For me, search also includes the ordinary things I do every day:
  • Looking up a fact (e.g. phone number, someone’s birthday, woodchuck deterrents), which I may find in an electronic file or hardcopy
  • Retrieving a specific document (e.g. an expense form, policy statement, or ISO standard), which may be on-line or in my file cabinet
  • Finding evidence (e.g. examining search logs to understand how people are using a search engine, looking for a woodchuck hole near my garden, examining my tires for uneven tread wear), which requires viewing electronic files or my physical environment
  • Discovering who the experts are on a topic or what expertise my associates have (e.g. looking up topics to see who has written or spoken, reading resumes or biographies to uncover experience), which is more often done on-line but may be buried in a 20-year-old professional directory on the shelf
  • Learning about a subject I want or need to understand (e.g. How are search and text analytics being used together in business enterprises? What is the meaning of the tag line “Turbo Search Engine” in an Acura ad?), which was partially answered with online search but also by attending conferences like the Text Analytics Summit 2007 this week
This list illustrates several things. First, search is about finding facts, evidence, and aggregated information (documents). It is also about discovering, learning and uncovering information that we can then analyze for any number of decisions or potential actions.
Second, search enables us to function more efficiently in all of our worldly activities, execute our jobs, increase our own expertise and generally feed our brains.
Third, search does not require the use of electronic technology, nor sophisticated tools, just our amazing senses: sight, hearing, touch, smell and taste.
Fourth, what Google now defines as “cloud computing” and what MIT geeks began touting as “wearable” technology a few years ago have converged to bring us cars embedded with what Acura calls “turbo search engines.” On this fourth point, I needed to do some discovering of my own. In small print on the full-page ad in Newsweek were phrases like “linked to over 7,000,000 destinations” and “knows where traffic is.” In even tinier print was the statement, “real-time traffic monitoring available in select markets…” I thought I understood that they were promoting the pervasiveness of search potential through the car’s extensive technological features. Then I searched the Internet for the phrase “turbo search engine” coupled with “Acura,” only to learn that there was more to it. Notably, there is the “…image-tagging campaign that enables the targeted audience to use their fully-integrated mobile devices to be part of the promotion.” You can read the context yourself.
Well, I am still trying to get my head around this fourth point to understand how important it is to helping companies find solid, practical search solutions to the problems they face in business enterprises. I don’t believe that a parking lot full of Acuras is something I will recommend.
Fifth, this week brought me some additional thoughts about the place for search technology. Technology experts like Sue Feldman of IDC and Fern Halper of Hurwitz & Associates appeared on a panel at the Text Analytics Summit. While making clear the distinctions between search and text analytics, and between text analytics and text mining, Sue also made clear that the algorithmic techniques employed by the various tools being demonstrated are distinct, each solving different problems in different business situations. She and others acknowledged that, having finally embraced search, enterprises are now adopting significant applications that use text analytic techniques to make better sense of all the found content.
Integration was a recurring theme at the conference, even as it was also obvious that no one product embodies the full range of text search, mining and analytics that any one enterprise might need. When tools and technologies are procured in silos, good integration is a tough proposition, and a costly one. Tacking on one product after another and trying to retrofit them into a seamless continuum, from capturing, storing, and organizing content to retrieving and analyzing the text in it, takes forethought and intelligent human design. Even if you can’t procure the whole solution to all your problems at once, and who can, you do need a vision of where you are going to end up so that each deployment is a building block to the whole architecture.
There is a lot to discover at conferences that can’t be learned through search, like what you absorb in a random mix of presentations, discussions and demos that can lead to new insights or just a confirmation of the optimal path to a cohesive plan.
