Here we are, halfway through 2011, and on track for a banner year in the adoption of enterprise search, text mining/text analytics, and their integration with collaborative content platforms. You might ask for evidence; what I can offer are anecdotal observations. Others track industry growth in terms of dollars spent, but that makes me leery when, over the past half dozen years, there has been so much disappointment expressed with the failures of legacy software applications to deliver satisfactory results. My antenna tells me we are on the cusp of expectations beginning to match reality as enterprises are finding better ways to select, procure, implement, and deploy applications that meet business needs.
What follows are my happy observations, after attending the 2011 Enterprise Search Summit in New York and 2011 Text Analytics Summit in Boston. Other inputs for me continue to be a varied reading list of information industry publications, business news, vendor press releases and web presentations, and blogs, plus conversations with clients and software vendors. While this blog is normally focused on enterprise search, experiencing and following content management technologies and system integration tools contributes valuable insights into all the applications that contribute to search successes and frustrations.
Collaboration tools and platforms gained early traction in the 1990s as technology offerings to the knowledge management crowd. The idea was that teams and workgroups needed ways to share knowledge through contribution of work products (documents) to “places” for all to view. Document management systems inserted themselves into the landscape for managing the development of work products (creating, editing, collaborative editing, etc.). However, collaboration spaces and document editing and version control activities remained applications more apart than synchronized.
The collaboration space has been redefined largely because SharePoint now dominates current discussions about collaboration platforms and activities. While early collaboration platforms were carefully structured to provide a thoughtfully bounded environment for sharing content, their lack of provision for idiosyncratic and often necessary workflows probably limited market dominance.
SharePoint changed the conversation to one of build-it-to-do-anything-you-want-the-way-you-want (BITDAYWTWYW). What IT clearly wants is a single-vendor architecture that delivers content creation, management, collaboration, and search. What end-users want is workflow efficiency and reliable search results. This introduces another level of collaborative imperative, since the BITDAYWTWYW model requires expertise that few enterprise IT support people carry and fewer end-users would trust to their IT departments. So, third-party developers or software offerings become the collaborative option. SharePoint is not the only collaboration software but, because of its dominance, a large second tier of partner vendors is turning SharePoint adopters on to its potential. Collaboration of this type in the marketplace is ramping wildly.
Convergence of technologies and companies is on the rise, as well. The non-Microsoft platform companies, OpenText, Oracle, and IBM, are placing their strategies on tightly integrating their solid cache of acquired mature products. These acquisitions have plugged gaps in text mining, analytics, and vocabulary management areas. Google and Autonomy are also entering this territory, although they are still short on the maturity model. The convergence of document management, electronic content management, text and data mining, analytics, e-discovery, a variety of semantic tools, and search technologies is shoring up the “big-platform” vendors to deal with “big-data.”
Sitting on the periphery is the open source movement. It is finding ways to alternately collaborate with the dominant commercial players, disrupt select application niches (e.g., WCM), and contribute solutions where neither the SharePoint model nor the big-platform, tightly integrated models can win easy adoption. Lucene/Solr is finding acceptance in the government and non-profit sectors but also appeals to SMBs.
All of these factors were actively on display at the two meetings but the most encouraging outcomes that I observed were:
- Rise in attendance at both meetings
- More knowledgeable and experienced attendees
- Significant increase in end-user presentations
The last of these brings me back to the adoption issue. Enterprises, which previously sent people to earlier meetings to learn about technologies and products, are now in the implementation and deployment stages. Thus, they are now able to contribute presentations with real experience and commentary about products. Presenters are commenting on adoption issues, usability, governance, successful practices, and pitfalls or unresolved issues.
Adoption is what will drive product improvements in the marketplace because experienced adopters are speaking out on their activities. Public presentations of user experiences can and should establish expectations for better tools, better vendor relationship experiences, more collaboration among products and ultimately, reduced complexity in the implementation and deployment of products.
This space is not normally used to comment on knowledge management (KM), one of my areas of consulting, but a recent conference gives me an opening to connect the dots between KM and search. Dave Snowden and Tom Stewart always have worthy commentary on KM and as keynote speakers they did not disappoint at KMWorld. It may seem a stretch but by taking a few of their thoughts out of context, I can synthesize a relationship between KM and search.
KMWorld, Enterprise Search Summit, SharePoint Symposium and Taxonomy Boot Camp moved to Washington D.C. for the 2010 Fall Conference earlier this month. I attended to teach a workshop on building a semantic platform, and to participate in a panel discussion to wrap up the conference with two other analysts, Leslie Owen and Tony Byrne with Jane Dysart moderating.
Comments from the first and last keynote speakers of the conference inspired my final panel comments, counseling attendees to lead by thoughtfully leveraging technology only to enhance knowledge. But there were other snippets that prompt me to link search and KM.
Tom Stewart’s talk was entitled Knowledge Driven Enterprises: Strategies & Future Focus, which he couched in the context of achieving a “coherent” winning organization. He explained that reaching the coherence destination requires understanding different types of knowledge and how we need to behave to attain each type (e.g., “knowable complicated” knowledge calls for experts and research; “emergent complex” knowledge calls for leadership and “sense-making”).
Stewart describes successful organizations as those in which “the opportunities outside line up with the capabilities inside.” He explains that those “companies who do manage to reestablish focus around an aligned set of key capabilities” use their “intellectual capital” to identify their intangible assets: human capability, structural capital, and customer capital. They build relationship capital from among these capabilities to create a coherent company. Although Stewart does not mention “search,” it is important to note that one means to identify intangible assets is well-executed enterprise search with associated analytical tools.
Dave Snowden also referenced “coherence,” (messy coherence), even as he spoke about how failures tend to be more teachable (memorable) than successes. If you follow Snowden, you know that he founded the Cognitive Edge and has developed a model for applying cognitive learning to help build resilient organizations. He has taught complexity analysis and sense-making for many years and his interest in human learning behaviors is deep.
To follow the entire thread of Snowden’s presentation on “The Resilient Organization,” follow this link. I was particularly impressed with his statement about the talk, “one of the most heart-felt I have given in recent years.” It was one of his best, but two particular comments bring me to the connection between KM and search.
Dave talked about technology as “cognitive augmentation,” its only truly useful function. He also puts forth what he calls the “three Golden rules: Use of distributed cognition, wisdom but not foolishness of crowds; finely grained objects, information and organizational; and disintermediation, putting decision makers in direct contact with raw data.”
Taking these fragments of Snowden’s talk, a technique he seems to encourage, I put forth a synthesized view of how knowledge and search technologies need to be married for consequential gain.
We live and work in a highly chaotic information soup, one in which we are fed a steady diet of fragments (links, tweets, analyzed content) from which we are challenged as thinkers to derive coherence. The best knowledge practitioners will leverage this messiness by detecting weak signals and seeking out more fragments, coupling them thoughtfully with “raw data” to synthesize new innovations, whether they be practices, inventions or policies. Managing shifting technologies, changing information inputs, and learning from failures (our own, our institution’s and others’) all contribute to building a resilient organization.
So where does “search” come in? Search is a human operation and begins with the workforce. Going back to Stewart, who commented on the need to recognize different kinds of knowledge, I posit that different kinds of knowledge demand different kinds of search. This is precisely what so many “enterprise search” initiatives fail to deliver. Implementers fail to account for all the different kinds of search: search for facts, search for expertise, search for specific artifacts, search for trends, search for missing data, etc.
When Dave Snowden states that “all of your workforce is a human scanner,” this could also imply the need for multiple, co-occurring search initiatives. Just as each workforce member brings a different perspective and capability to sensory information gathering, so too must enterprise search be set up to accommodate all the different kinds of knowledge gathering. And when Snowden notes that “There are limits to semantic technologies: Language is constantly changing so there is a requirement for constant tuning to sustain the same level of good results,” he is reminding us that technology is only good for cognitive augmentation. Technology is not a “plug ‘n play,” install and reap magical cognitive insights. It requires constant tuning to adapt to new kinds of knowledge.
The point is one I have made before: it is the human connection, the human scanner, and human understanding of all the kinds of knowledge that we need in order to bring coherence to an organization. The better we balance these human capabilities, the more resilient we’ll be and the better skilled at figuring out what kinds of search technologies really make sense for today. And tomorrow, we had better be ready for another tool for new fragments and new knowledge synthesis.
It has been nearly two years since I commented on an article in Information Week, Open Source, Its Time has Come, Nov. 2008. My main point was the need for deep expertise to execute enterprise search really well. I predicted the growth of service companies with that expertise, particularly for open source search. Not long after I announced that, Lucid Imagination was launched, with its focus on building and supporting solutions based on Lucene and, its more turnkey version, Solr.
It has not taken long for Lucid Imagination (LI) to take charge of the Lucene/Solr community of practice (CoP), and to launch its own platform built on Solr, Lucidworks Enterprise. Open source depends on deep and sustained collaboration; LI stepped into the breach to ensure that the hundreds of contributors, users and committers have a forum. I am pretty committed to CoPs myself and know that nurturing a community for the long haul takes dedicated leadership. In this case it is undoubtedly enlightened self-interest that is driving LI. They are poised to become the strongest presence for driving continuous improvements to open source search, with Apache Lucene as the foundation.
Two weeks ago LI hosted Lucene Revolution, the first such conference in the US. It was attended by over 300 people in Boston, October 7-8, and I can report that this CoP is vibrant and enthusiastic. Moderated by Steve Arnold, the program ran smoothly and offered excellent sessions. Those I attended reflected a respectful exchange of opinions and ideas about tools, methods, practices and priorities. While there were allusions to vigorous debate among committers about priorities for code changes and upgrades, the mood was collaborative in spirit and tinged with humor, always a good way to operate when emotions and convictions are on stage.
From my 12 pages of notes come observations about the three principal categories of sessions:
- Discussions, debates and show-cases for significant changes or calls for changes to the code
- Case studies based on enterprise search applications and experiences
- Case studies based on the use of Lucene and Solr embedded in commercial applications
Since the first category was more technical in nature, I leave the reader with my simplistic conclusions: core Apache Lucene and Solr will continue to evolve in a robust and aggressive progression. There are sufficient committers to make a serious contribution. Many who have decades of search experience are driving the charge and they have cut their teeth on the more difficult problems of implementing enterprise solutions. In announcing Lucidworks Enterprise, LI is clearly bidding to become a new force in the enterprise search market.
New and sustained build-outs of Lucene/Solr will be challenged by developers with ideas for diverging architectures, or “forking” code, on which Eric Gries, LI CEO, commented in the final panel. He predicted that forking will probably be driven by the need to solve specific search problems that current code does not accommodate. This will probably be more of a challenge for the spinoffs than for the core Lucene developers and, given the difficulty of sustaining separate versions, the forks will ultimately fail.
Enterprise search cases reflected those for whom commercial turnkey applications will not or cannot easily be selected; for them open source will make sense. Coming from LI’s counterpart in the Linux world, Red Hat, are these earlier observations about why enterprises should seek to embrace open source solutions: in short, the sorry state of quality assurance and code control in commercial products. Add to that the cost of services to install, implement and customize commercial search products. The argument for many institutions would be to go with open source when there is an imperative or call for major customization.
This appears to be the case for two types of enterprises that were featured on the program: educational institutions and government agencies. Both have procurement issues when it comes to making large capital expenditures. For them it is easier to begin with something free, like open source software, then make incremental improvements and customize over time. Labor and services are cost variables that can be distributed more creatively using multiple funding options. Featured on the program were the Smithsonian, Adhere Solutions doing systems integration work for a number of government agencies, MITRE (a federally funded research laboratory), U. of Michigan, and Yale. CISCO also presented, a noteworthy commercial enterprise putting Lucene/Solr to work.
The third category of presenters was, by far, the largest contingent of open source search adopters, producers of applications that leverage Lucene and Solr (and other open source software) into their offerings. They are solidly entrenched because they are diligent committers, and share in this community of like-minded practitioners who serve as an extended enterprise of technical resources that keeps their overhead low. I can imagine the attractiveness of a lean business that can run with an open source foundation, and operates in a highly agile mode. This must be enticing and exciting for developers who wilt at the idea of working in a constrained environment with layers of management and political maneuvering.
Among the companies building applications on Lucene that presented were: Access Innovations, Twitter, LinkedIn, Acquia, RivetLogic and Salesforce.com. These stand out as relatively mature adopters with traction in the marketplace. There were also companies present that contribute their value through Lucene/Solr partnerships in which their products or tools are complementary including: Basis Technology, Documill, and Loggly.
Links to presentations by organizations mentioned above will take you to conference highlights. Some will appeal to the technical reader for there was a lot of code sharing and technical tips in the slides. The diversity and scale of applications that are being supported by Lucene and Solr was impressive. Lucid Imagination and the speakers did a great job of illustrating why and how open source has a serious future in enterprise search. This was a confidence building exercise for the community.
Two sentiments at the end summed it up for me. On the technical front, Eric Gries observed that it is usually clear what needs to be core (to the code) and what does not belong. Then there is a lot of gray area, and that will contribute to constant debate in the community. For the user community, Charlie Hull of Flax opined that customers don’t care whether (the code) is in the open source core or in the special “secret sauce” application, as long as the product does what they want.
It is not news that enterprise search has been relegated to the long list of failed technologies by some. We are at the point where many analysts and business writers have called for a moratorium on the use of the term. Having worked in a number of markets and functional areas (knowledge management/KM, special libraries, and integrated library software systems) that suffered the death knell, even while continuing to exist, I take these pronouncements as a game of sorts.
Yes, we have seen the demise of vinyl phonograph records, cassette tapes and probably soon musical CD albums, but those are explicit devices and formats. When you can’t buy or play them any longer, except in a museum or collector’s garage, they are pretty dead in the marketplace. This is not true of search in the enterprise, behind the firewall, or wherever it needs to function for business purposes. People have always needed to find “stuff” to do their work. KM methods and processes, special libraries and integrated library systems still exist, even as they were re-labeled for PR and marketing purposes.
What is happening to search in the enterprise is that it is finding its purpose, or more precisely its hundreds of purposes. It is not a monolithic software product, a one-size-fits-all. It comes in dozens of packages, models, and price ranges. It may be embedded in other software or standalone. It may be procured for a point solution to support retrieval of content for one business unit operating in a very narrow topical range, or it may be selected to give access to a broad range of documents that exist in numerous enterprise domains on many subjects.
Large enterprises typically have numerous search solutions in operation, implementation, and testing, all at the same time. They are discovering how to deploy and leverage search systems and they are refining their use cases based on what they learn incrementally through their many implementations. Teams of search experts are typically involved in selecting, deploying and maintaining these applications based on their subject expertise and growing understanding of what various search engines can do and how they operate.
After years of hearing about “the semantic Web,” the long sought-after “holy grail” of Web search, there is a serious ramping of technology solutions. Most of these applications can also make search more semantically relevant behind the firewall. These technologies have been evolving for decades, beginning with so-called artificial intelligence, and are now supported by some categories of computational linguistics such as specific algorithms for parsing content and disambiguating terms. A soon-to-be-released study featuring some noteworthy applications reveals just how much is being done in enterprises for specific business purposes.
With this “teaser” on what is about to be published, I leave you with one important thought: meaningful search technologies depend on rich linguistically-based technologies. Without a cornucopia of software tools to build terminology maps and dictionaries, analyze content linguistically in context to elicit meaning, parse and evaluate unstructured text data sources, and manage vocabularies of ever more complex topical domains, semantic search could not exist.
Language complexities are challenging and even vexing. Enterprises will be finding solutions to leverage what they know only when they put human resources into play to work with the lingo of their most valuable domains.
This one almost slipped right past me but I see we are in another shoot-out in the naming of search market segments. Probably it is because we have too many offerings in the search industry. When any industry reaches a critical mass, players need to find a way to differentiate what they sell. Products have to be positioned as, well, “something else.”
In my consulting practice “knowledge management” has been hot (1980s and 90s), dead (late ’90s and early 2000s), relevant again (now). In my analyst role for “enterprise search” Gilbane has been told by experts that the term is meaningless and should be replaced with “behind the firewall search,” as if that clarifies everything. Of course, marketing directories might struggle with that as a category heading.
For the record, “search” has two definitions in my book. The first is a verb referring to the activity of looking for anything. The second, newer, definition is a noun referring to technologies that support finding “content.” Both are sufficiently broad to cover a lot of activities, technologies and stuff. “Enterprises” are organizations of any type in which business, for-profit, not-for-profit, or government, is being conducted. Let us quibble no more.
But I digress; Endeca has broadened its self-classification in any number of press releases by referring to its products, which were “search” products last year, as “information access software.” This is the major category used by IDC to include “search.” That’s what we called library systems in the 1970s and 80s. New products still aim for accessing content, albeit with richer functions and features, but where are we going to put them in our family of software lists? One could argue that Endeca’s products are really a class of “search,” search on steroids, a specialized form of search. What are the defining differentiators between “search software” and “information access software?” When does a search product become more than it was or narrower, refined in scope? (This is a rhetorical question but I’m sure each vendor in this new category will break it out for me in their own terms.)
Having just finished reviewing the market for enterprise search, I believe that many of the products are reaching for the broader scope of functionality defined by IDC as being: search and retrieval, text analytics, and BI. But are they really going to claim to be content management and data warehousing software, as well? Those are included in IDC’s definition of “information access software.” Maybe we are going back to single-vendor platforms with everything bundled and integrated. Sigh… it makes me tired, trying to keep up with all this categorizing and re-redefining.
Following on my last post, in which I covered the unique value propositions offered by a variety of enterprise search products, this one takes a look at the evolution of enterprise search. The commentary by search company experts, executives, and analysts indicates some evolutionary technologies and the escalation of certain themes in enterprise search. Furthermore, the pursuit by organizations of a stronger link between searching technologies and knowledge enablers has never been more prominently featured, taking search to a whole new level beyond mere retrieval.
The following paraphrased comments from the Enterprise Search Keynote session are timely and revealing. When I asked, Will Web and Internet Search Technologies Drive the Enterprise (Internal) Search Tool Offerings or Will the Markets Diverge?, these were some thoughts from the panelists.
Matt Brown, Principal Analyst from Forrester Research, commented that enterprise search demands much different and richer content interpretation types of search technologies. What Web-based searching does is create such high visibility for search that enterprises are being primed to adopt it, but only when it comes with enhanced capabilities.
Echoing Matt’s remarks, Oracle search solution manager Bob Bocchino commented on the difficulty of making search operate well within the enterprise because it needs to deal with structured database content and unstructured files, while also applying sophisticated security features that let only authorized viewers see restricted content. Furthermore, security must be deployed in a way that does not degrade performance while supporting continuous updates to content and permissions.
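The security requirement described above is often called "security trimming": results are filtered against the searcher's permissions before they are shown. What follows is only a minimal sketch of that idea, not any vendor's implementation. The `Document` class, its `allowed_groups` field, and the sample data are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    # Hypothetical access-control field: groups allowed to view this document.
    allowed_groups: set = field(default_factory=set)


def search(docs, query, user_groups):
    """Return IDs of documents that match the query and that the user may see."""
    terms = query.lower().split()
    hits = []
    for doc in docs:
        # Permission check first: skip anything the user is not authorized to view.
        if not (doc.allowed_groups & user_groups):
            continue
        words = doc.text.lower().split()
        if all(term in words for term in terms):
            hits.append(doc.doc_id)
    return hits


docs = [
    Document("hr-001", "salary review schedule", {"hr"}),
    Document("eng-001", "release schedule for search service", {"engineering", "hr"}),
]

# A user in "engineering" never sees the HR-only document, even though it matches.
print(search(docs, "schedule", {"engineering"}))  # prints ['eng-001']
```

A production engine would more likely store permissions in the index and refresh them as they change, which is exactly the performance and continuous-update burden the panelist describes.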
Hadley Reynolds, VP & Director of the Center for Search Innovation at Fast Search & Transfer, noted that the Web isn’t really making a direct impact on enterprise search innovation but many of the social tools found on the Web are being adopted in enterprises to create new kinds of content (e.g. social networks, blogs and wikis) with which enterprise search engines must cope in richer contextual ways.
Don Dodge, Director of Business Development for the Emerging Business Team at Microsoft, further noted that the Internet’s biggest problem is scale. That is a much easier problem to solve than in the enterprise, where user standards for what qualifies as good and valuable search results are much higher, therefore making the technology to deliver those results more difficult.
Among the other noteworthy comments in this session was a negative about taxonomies. The gist of it was that they require so much discipline that they might work for a while but can’t really be sustained. If this attitude becomes the norm, many of the semantic search engines which depend on some type of classification and categorization according to industry terminologies or locally maintained lists will be challenged to deliver enhanced search results. This is a subject to be taken up in a later blog entry.
A final conclusion about enterprise search was a remark about the evolution of adoption in the marketplace. Simply put, the marketplace is not monolithic in its requirements. The diversity of demands on search technologies has been a disincentive for vendors to focus on distinct niches, leading them to place more effort on areas like e-commerce. This seems to be shifting, especially with all the large software companies now seriously announcing products in the enterprise search market.
It has been a week since the annual Gilbane Boston 2007 Conference closed and I am still searching for the most important message that came out of Enterprise Search and Semantic Web Technology sessions. There were so many interesting case studies that I’ll begin with a search function that illustrates one major enterprise search requirement – aggregation.
Besides illustrating a business case for aggregating disparate content using search, the case studies shared three themes:
> Search is just a starting point for many business processes
> While few very large organizations present all of their organization’s content through a single portal, the technology options to manage such an ideal design are growing and up to supporting entire enterprises
> All systems were implemented and operational for delivering value in less than one year, underscoring the trend toward practical and more out-of-the box solutions
Here is a brief take on what came out of just the first two of seven sessions.
Small-medium solutions:
> Use of ISYS to manipulate search results and function as a back-office data analysis tool for DirectEDGAR, the complete SEC filings, presented by Prof. Burch Kealey of the University of Nebraska. Presentation
> Support for search by serendipity across the shareable content domains of members of a trade association (ARF) by finding results that satisfy the searcher in his pursuit of understanding with Exalead, presented by Alain Heurtebise CEO of Exalead. Presentation
> A knowledge portal enabling rapid and efficient retrieval of the complete technical documentation for field service engineers at Otis Elevator to meet rapid response goals when supporting customers using a customized implementation of dtSearch, presented by project consultant Rob Wiesenberg of Contegra Systems, Inc. Presentation
Large solutions calling for search across multi-million record domains:
> Hosted Vivisimo solution federating over 40 million documents across 22,000 government web sites accessible with search results clustered; it records over a half million page views per day on http://USA.gov and was deployed in 8 weeks, presented by Vivisimo co-founder Jerome Pesenti. Presentation
> Intranet knowledge portal for improving customer services by enabling access to internal knowledge assets (over half a million customer cases with all their associated documents) at USi (an AT&T company) using Endeca, a search product USi had experience deploying and hosting for very large e-commerce catalogs, presented by development leader Toby Ford of USi. With one developer it was running in six months. Presentation
> Within a large law firm (Morrison Foerster) and the legal departments of two multi-national pharmaceutical companies (Pfizer and Novartis), Recommind aggregates and indexes content for numerous internal application repositories, file shares and external content sources for unified search across millions of documents, contributing a direct ROI in saved labor by ensuring that required documents are retrieved in a single search process. Presentation
In each of these cases, content from numerous sources was aggregated through the crawling and indexing algorithms of a particular search engine pointed at a bounded and defined corpus of content, with or without associated metadata to solve a particular business problem. In each case, there were surrounding technologies, human architected design elements, and interfaces to present the search interface and results for a predefined audience. This is what we can expect from search in the coming months and years, deployments to meet specialized enterprise needs, an evolving array of features and tools to leverage search results, and a rapid scaling of capabilities to match the explosion of enterprise content that we all need to find and manipulate to do our jobs.
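The aggregation pattern common to these cases can be illustrated with a toy example: crawl several bounded sources, build one index over them, and answer a query with results drawn from all of them. This is a simplified sketch under stated assumptions, not how any of the products named above work internally. The `SOURCES` data and the `build_index` and `search` functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical sources standing in for file shares, case databases, and web sites.
SOURCES = {
    "file_share": {"manual.txt": "elevator door maintenance procedure"},
    "case_db": {"case-42": "customer reported door sensor fault"},
    "intranet": {"faq.html": "maintenance contact list"},
}


def build_index(sources):
    """Crawl every source and map each term to the (source, doc_id) pairs containing it."""
    index = defaultdict(set)
    for source_name, documents in sources.items():
        for doc_id, text in documents.items():
            for term in text.lower().split():
                index[term].add((source_name, doc_id))
    return index


def search(index, query):
    """Return documents containing every query term, regardless of source."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


index = build_index(SOURCES)
# One query returns matches from two different repositories.
print(search(index, "door"))  # prints [('case_db', 'case-42'), ('file_share', 'manual.txt')]
```

Real deployments add the surrounding pieces the case studies emphasize, such as metadata, relevance ranking, and interfaces designed for a specific audience.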
Next week, I will reconstruct more themes and messages from the conference.



