Sometimes it pays to be behind in reading industry news. The big news last week was Google’s new patents and plans to enhance search results using metadata and taxonomy embedded in content. This was followed by the news that Business Objects plans to acquire Inxight, a Xerox PARC spin-off that has produced a product line with terrific data visualization tools, highly valued in the business intelligence (BI) marketplace. I had planned to write about the convergence of the enterprise search and BI markets this week until I caught up with industry news from April and early May. This triggered a couple of insights into these more recent announcements.
In April an Information Week article noted that Google has, uncharacteristically, contributed two significant enhancements to MySQL: improved replication procedures across multiple systems and expanded mirroring. Writer Babcock also noted that “Google doesn’t use MySQL in search” but YouTube does. I believe Google will come to be more tied to MySQL as they begin to deploy new search algorithms that take advantage of metadata and taxonomies. These need good text database structures to be managed efficiently and leveraged effectively to produce quality results from search on the scale that Google does it. Up to now Google results presentation has been influenced more by transaction processing than semantic and textual context. Look for more Google enhancements to MySQL to help it effectively manage all that meaningful text. The open source question is: will Google release more enhancements for all to use? A lot of enterprises would benefit from being able to depend on continual enhancements to MySQL so they could (continue to) use it instead of Oracle or Microsoft SQL Server as the database back-end for text searching.
The other older news (Information Week, May 7th) was that Business Objects was touting “business intelligence for ‘all individuals’” with some new offerings. BO’s acquisition announcement just last week, that they plan to acquire Inxight, only strengthens their position in this market. Inxight has been on the cusp of BI and enterprise search for several years and this portends more convergence of products in these growing markets. Twenty-five years ago when I was selling text software applications, a key differentiator was strong report building tool sets to support “slicing and dicing” database content in any desired format. It sounds like robust, intuitive reporting tools for all enterprise users of content applications are still a dream but much closer to reality for the high-end market.
With all the offerings and consolidation in BI and search, the next moves will surely begin to push some offerings with search/BI to a price point that small-medium businesses (SMBs) can afford. We know that Microsoft sees the opening (Information Week, May 14th) and let’s hope that others do as well.
Steve Arnold of ArnoldIT struck twice in a big way last week, once as a contributor to the Bear, Stearns & Co. research report on Google and once as a principal speaker at Enterprise Search in New York. I’ve read a copy of the Bear Stearns report, which contains information that should make IT people pay close attention to how they manage searchable enterprise content. I can verify that this blog summary of Steve’s New York speech by Larry Dignan sounds like vintage Arnold, to the point and right on it. Steve, not for the first time, is making points that analysts and other search experts routinely observe about the lack of serious infrastructure vested in making content valuable by enhancing its searchability.
First is the Bear Stearns report, summarized for the benefit of government IT folks with admonitions about how to act on the technical guidance it provides in this article by Joab Jackson in GCN. The report’s appearance in the same week as Microsoft’s acquisition of aQuantive is newsworthy in itself. Google really ups the ante with their plans to change the rules for posting content results for Internet searches. If Webmasters actually begin to do more sophisticated content preparation to leverage what Google is calling its Programmable Search Engine (PSE), then results using Google search will continue to be several steps ahead of what Microsoft is currently rolling out. In other words, while Microsoft is making its most expensive acquisition to tweak Internet searching in one area, Google is investing its capital in its own IP development to make search richer in another. Experience looking at large software companies tells me that IP strategically developed to be totally in sync with existing products has a much better chance of quick success in the marketplace than IP acquired to play catch-up. So, even though Microsoft, in an acquiring mode, may find IP to acquire in the semantic search space (and there is a lot out there that hasn’t been commercialized), its ability to absorb and integrate it in time to head off this Google initiative is a really tough proposition. I’m with Bear Stearns’ guidance on this one.
OK, on to Arnold’s comments at Enterprise Search, in which he continues a theme to jolt IT folks. As already noted, I totally agree that IT in most organizations is loath to call on information search professionals to understand the best ways to exploit search engine adoption for getting good search results. But I am hoping that the economic side of search, Web content management for an organization’s public facing content, may cause a shift. Already, I am experiencing Web content managers who are enlightened about how to make content more findable through good metadata and taxonomy strategies. They have figured out how to make good stuff rise to the top with guidance from outside IT. When sales people complain that their prospects can’t find the company’s products online, it tends to spur marketing folks to adjust their Web content strategies accordingly.
It may take a while, but my observation is that when employees see search working well on their public sites, they begin to push for equal quality search internally. Now that we have Google paying serious attention to metadata for the purpose of giving search results semantic context, maybe the guys in-house will begin to get it, too.
Last week I commented on the richness of the search marketplace. However, diversity presents the enterprise buyer with pressure to be more focused on immediate and critical search needs.
The Enterprise Search Summit is being held in New York this week. Two years ago I found it a great place to see the companies offering search products, where I could easily see them all, and still attend every session in two days. This year, 2007, there are over 40 exhibitors, most offering solutions for highly differentiated enterprise search problems. Few of the offerings will serve the end-to-end needs of a large enterprise but many would be sufficient for medium to small organizations. The two major search engine categories used to be Web content keyword searching and structured searching. Not only is my attention as an analyst being requested by major vendors offering solutions for different types of search but new products are being announced weekly. Newcomers include those describing their products as data mining engines, search and reporting “platforms,” BI engines, and semantic and ontological search engines. This mix challenges me to determine if a product really solves a type of enterprise search problem before I pay attention.
You, on the other hand, need to do another type of analysis before considering specific options. Classifying search categories using a faceted approach will help you narrow down the field. Here is a checklist for categorizing what and how content needs to be found:
> Content types (e.g. HTML pages, PDFs, images)
> Content repositories (e.g. database applications, content management systems, collaboration applications, file locations)
> Types of search interfaces and navigation (e.g. simple search box, metadata, taxonomy)
> Types of search (e.g. keyword, phrase, date, topical navigation)
> Types of results presentation (e.g. aggregated, federated, normalized, citation)
> Platforms (e.g. hosted, intranet, desktop)
> Type of vendor (e.g. search-only, single purpose application with embedded search, software as a service – SaaS)
> Amount of content by type
> Number and type of users by need (personas)
Then use any tools or resources at hand to map these facets against your organization and learn who needs what type of content, in what format, and how critical it is to business requirements. Prioritizing the facets produces a multidimensional view of enterprise search requirements. This will go a long way to narrowing down the vendor list and gives you a tool to keep discussions focused.
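For readers who like to see the idea in concrete form, here is a minimal sketch of how prioritized facets might be turned into a vendor shortlist. Everything in it is an assumption for illustration: the facet names, the 1 to 5 weights, the coverage fractions, and the 0.6 cutoff are hypothetical, and a spreadsheet would do the same job.

```python
from dataclasses import dataclass

# Hypothetical priorities: higher weight means the facet matters more
# to the business. Replace with the facets and weights from your own checklist.
FACET_WEIGHTS = {
    "content_types": 5,
    "repositories": 4,
    "search_types": 3,
    "results_presentation": 2,
    "platform": 1,
}


@dataclass
class Vendor:
    name: str
    coverage: dict  # facet name -> fraction of your requirements met (0.0 to 1.0)


def score(vendor, weights=FACET_WEIGHTS):
    """Weighted average of facet coverage; facets a vendor omits count as 0."""
    total = sum(weights.values())
    return sum(w * vendor.coverage.get(f, 0.0) for f, w in weights.items()) / total


def shortlist(vendors, threshold=0.6):
    """Return vendor names at or above the threshold, best score first."""
    ranked = sorted(vendors, key=score, reverse=True)
    return [v.name for v in ranked if score(v) >= threshold]
```

A vendor that covers every facet fully scores 1.0, while one that only matches your platform scores far lower, which is the point: the weights keep the discussion anchored on the facets you ranked as critical.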
There are terrific options in the marketplace and they will only become richer in features and complexity. Your job is to find the most appropriate solution for the business search problem you need to solve today, at a cost that matches your budget. You also want a product that can be implemented rapidly with immediate benefit linking to a real business proposition.
This week, EMC announced a collaborative research network, with this headline: New EMC Innovation Network to Harness Worldwide Tech Resources, Accelerate Information Infrastructure Innovation. Among the areas that the research network will explore are Semantic Web, search, context, and ontological views.
There is a lot to feed on in this announcement but the most interesting aspect is the juxtaposition with other hardware giants’ forays into the world of document and content search software (e.g. IBM, Cisco), and recent efforts by software leaders Oracle and Microsoft to strengthen their offerings in the area of content and search.
One of the phrases in EMC’s announcement that struck me is the reference to “information infrastructure.” This phrase is used ubiquitously by IT folks to aggregate their hardware and network components with the assumption that because these systems store and transport data, they are information infrastructure. We need to recognize that there are two elements missing from this infrastructure view: skilled knowledge workers (e.g. content structure architects, taxonomists, specialist librarians) and software applications for content authoring, capture, organization, and retrieval. Judging from the language of EMC’s press release this might just be tacit recognition that hardware and networks do not make up an information infrastructure. But those of us in search and content management knew that all along; we don’t need a think tank to show us how the pieces fit together nor even how to innovate to create good infrastructure. Top notch professionals have been doing that for decades. Will this new network really reveal anything new?
EMC does not explicitly announce a plan to make search and information infrastructure product commodities but they do express the desire to build “commercial products” for this market. They have already acquired a few of the software components but have yet to demonstrate a tight integration with the rest of the company. Usually innovation comes from humble roots and grows organically through the sponsorship of a large organization, self-funding or other interested contributors. This effort to lead an innovation community to solutions for information infrastructure has the potential to spawn growth of truly innovative tools, methods and even standards for diverse needs and communities. Alternatively, it may simply be a push to bring a free-wheeling industry of multi-faceted components under central control with the result being tools and products that serve the lowest common denominator users.
From a search point of view, I for one am enjoying the richness of the marketplace and how varied the product offerings are for many specialized needs. For the time being, I remain skeptical that any hardware or software giant can sustain the richness of offerings that get to the heart of particular business search needs in a universal way. Commodity search solutions are a long way off for the community of organizations I encounter.