Month: February 2009

Federated Search: Setting User Expectations

In the past few months, it has been rare for me to be briefed on an enterprise search product that does not claim to provide “federated search.” Having worked with the original ANSI standard, Z39.50, and served on one of its many review committees back in the early 1990s, I find that the topic always catches my attention.

Some of the history of search federation is described in this rather sketchy article at Wikipedia. However, I want to clarify the original call for such a standard. It dates from the days when public access to search technologies came primarily through on-line catalogs in public and academic institutional libraries. A demand arose for the ability to search not only one’s local library system and network (e.g., a university often standardized on one library system to cover the holdings of all of its own libraries), but also the holdings of other universities or major public libraries. The problem was that data structures and protocols varied from one library system product to the next in ways that made it difficult for the search engine of one system to penetrate the database of records in another. Records might have been meta-tagged similarly, but the way the metadata were indexed left them inaccessible to another system’s retrieval algorithms without a translating layer between the systems. Thus the Z39.50 standard was established, originally to let one library system’s users search from that system into the contents of other libraries running different systems.

Ideally, results were presented to the searcher in a uniform citation format, organized to help the user easily recognize duplicate records, each marked with location and availability. In practice, the results presentation was usually esoteric enough that only librarians and research scholars could readily interpret it.

Now we live in a digitized content environment in which the dissimilarities across content management systems, content repositories, publishers’ databases, and library catalogs have increased a hundredfold. The need for federating or translation layers to bring order to this metadata (or metadata-less) chaos has only become stronger. The ANSI standard is largely ignored by content platform vendors, leaving the federating solution to non-embedded search products. As a buyer of search, you must do deep testing to determine whether the enterprise search engine you have acquired actually stands up well under the load of retrieving across numerous disparate repositories. And you need a very astute and experienced searcher, with expert familiarity with the content in all the repositories, to evaluate its suitability for the circumstances in which the engine will be used.

So, let’s just recap what you need to know before you select and license a product claiming to support what you expect from search federation:

  • Federated search is a process for retrieving content either serially or concurrently from multiple targeted sources that are indexed separately, and then presenting results in a unified display. You can imagine that there will be a huge variation in how well those claims might be satisfied.
  • Federation is an expansion of the concept of content aggregation. It comes into play in a multi-domain environment of internal sites only, OR a mix of internal and external sites that might include the deep (hidden) web. Across multiple domains, complete federation supports at least four distinct functions:
    • Integration of the results from a number of targeted searchable domains, each with its own search engine
    • Disambiguation of content results when similar but non-identical pieces of content might be included
    • Normalization of search results so that content from different domains is presented similarly
    • Consolidation of the search operation (standardizing a query to each of the target search engines) and standardizing the results so they appear to be coming from a single search operation

In order to do this effectively and cleanly, the federating layer of software, which probably comes from a third party such as MuseGlobal, must have “connectors” that recognize the structures of all the repositories that will be targeted from the “home” search engine.
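To make those four functions concrete, here is a minimal sketch of such a federating layer in Python. It is only an illustration under stated assumptions: the connector functions, the result fields, and the de-duplication rule are hypothetical placeholders, not any vendor’s actual API, and a real product’s connectors, ranking merge and normalization are far more elaborate.

```python
# Minimal sketch of a federating layer: query several repositories in
# parallel, normalize each engine's results to one record shape,
# de-duplicate, and present a single merged list.
from concurrent.futures import ThreadPoolExecutor


def search_library_catalog(query):
    # Hypothetical connector: would translate the query into the
    # catalog's protocol (e.g., Z39.50) and map its records back.
    return [{"title": "Indexing Concepts", "source": "catalog", "url": "..."}]


def search_document_repository(query):
    # Hypothetical connector for an internal content management system.
    return [{"title": "Indexing Concepts", "source": "cms", "url": "..."}]


CONNECTORS = [search_library_catalog, search_document_repository]


def federated_search(query):
    # Consolidation: send one query, concurrently, to every target engine.
    with ThreadPoolExecutor(max_workers=len(CONNECTORS)) as pool:
        result_lists = pool.map(lambda connector: connector(query), CONNECTORS)

    # Normalization + disambiguation: collapse records that appear to be
    # the same item (here, naively, by title) while noting every source.
    merged = {}
    for record in (r for results in result_lists for r in results):
        key = record["title"].strip().lower()
        entry = merged.setdefault(key, {**record, "sources": []})
        entry["sources"].append(record["source"])

    # Integration: one unified result list for display.
    return list(merged.values())


print(federated_search("indexing concepts"))
```

Even this toy version shows where the hard work lives: every connector has to understand a different repository’s structure, and the merge step has to decide what counts as a duplicate before results can be presented as though they came from a single search operation.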

Why is this relevant? In short, because users expect that when they search, the results they are looking at represent all the content from all the repositories they believed they were searching, presented in a format that makes sense to them. That is a very tall order for any search system, but when enterprise information managers are trying to meet a business manager’s or executive’s lofty expectations, anything less is viewed as a failure of enterprise search. Or else they had better set expectations lower.

From the FastForward Blogger: A Microsoft User Group Meeting

I was at FastForward last week, invited to participate in a panel of bloggers on the last day, tasked with reacting to three days of executive, partner and customer presentations to the FAST search user community. Four of us had more ideas than we could share in a 30-minute panel session. The other three fellows on the panel are regular bloggers on FastForward. Along with them, I had the pleasure of listening to and speaking with numerous other industry analysts and commentators over the three-day period in the “blogger/analyst lounge” where we gathered between sessions.
Before making some observations of my own, I will introduce you to a few of the folks who have had, and will continue to have, a presence in the content and search arena, particularly as it relates to social tools and knowledge management, two tightly connected areas of interest.
Each of us was interviewed for a kind of video blog session during the meeting. Although you can’t view the panel from the final keynote session, I can share these links that will give you an idea of what my cohorts were thinking about the meeting and the state of FastForward in 2009. They are:
Jon Husband, social computing thought leader and architect. He has coined the term “wirearchy,” which aptly describes a flow of connectedness over the wires (and the wireless air waves). I really liked his observations about how social technologies encourage self-organizing around issues and make group action so much easier. His interview is a good listen and his blog is fun, too.
Jevon MacDonald, founder of Firestoker and FASTforward blog contributor, had some helpful comments in his interview about the usefulness of social media in aiding companies to be more responsive to their customers.
Euan Semple, independent advisor on social computing, elevated the discussion in favor of social tools improving the flow of knowledge, which is really the point of all this content and search related technology, as far as I am concerned. You’ll enjoy the interview with Euan in which he also comments on the ratio of men to women and the IT-centric audience at the meeting, something I observed, as well.
I was also interviewed by Josh-Michéle Ross and my thoughts dovetailed with the others in keeping with the social theme of “engage your user,” the conference tag line. My mantra throughout the conference and after is that there was just not enough emphasis on how teams work together to build highly functional and easy-flowing search experiences for users. The process of creating a social platform in which search is present in subtle ways that assist connectedness among experts and their content requires human design; this is an art that can’t be left to “out-of-the-box” installed technologies. It is a task for those with an aptitude for what users really want, need and will use without being force-fed or artificially manipulated. Here are my comments in the interview.
Other interviews of interest can be found at the FastForward Bloggers page where a lot of thought leaders including Rob Paterson, Bill Ives, Clay Shirky, Charlene Li and Jim McGee among many others put forth some thoughtful comments about the state of technology.
Our panel moderator was Perry Solomon, VP Business Development and General Manager, Worldwide Media Solutions – FAST. Although on the big stage we did not get to all the ideas he asked about in our preparatory session, I can bring them to light here. Solomon asked these questions; my thoughts follow, after a few days to digest the meeting:
Q: How was the meeting balance in terms of search technology versus use?
LWM: The use cases were compelling and well presented. They were highly evocative of the best applications we can achieve with technology using all the social tools and content management options now available. This is appropriate in keynote/big-theatre presentations, but what I did not find in the few breakout sessions was much about the “nuts and bolts” of the human design and understanding needed to integrate components.
Among the attendees that I met during meals (system integrator partners from small firms and IT people who were struggling to build applications their internal customers wanted), there was a sense that not enough substantive information was being shared. They had hoped for more “how to” and concrete case studies that described the process of getting from purchasing licenses to deploying solutions. When I suggested to some of these Microsoft customers that it might be helpful to have more of their content managers and search administrators in the audience, they all agreed. None carried an attitude that they were going to design and implement these highly sophisticated content/search solutions with just members of the IT department. Business users were also notably absent from the meeting.
Q: What was the impact of the announcement about product news, FastSearch for Internet Business and FastSearch for SharePoint?
LWM: My own reaction was that it was a logical way to begin to roll out the FAST product with existing and evolving Microsoft products. It was not surprising, revolutionary or exciting. MS is clearly committed to making something of its huge investment in FAST; to align it with the rapidly evolving and highly popular SharePoint is smart business. The sentiment of others I spoke with was pretty much the same, sprinkled with a fair amount of skepticism about schedules for delivery and how well the products will be supported with services and documentation. Cost of ownership is always a big worry; what it will take to get the sizzle and super search results from this technology without a huge amount of human investment and skill on the part of customers or third-party integrators stimulates a deep “wait-and-see” attitude among most.
Q: What was missing or not addressed in the sessions?
LWM: The lack of presentations and involvement of non-IT people. While MS is highly responsive to the IT person’s desire for standardizing on a full-function platform and set of tools from a single supplier, this is not the reality in the marketplace. Content is created, manipulated and re-purposed with hundreds of applications that are used by business owners and content managers who bring a deep understanding of what needs to be applied to get the “social” workflow operational and productive in any given culture. My own bias is that the subtleties of organizational culture are often lost on many in IT but are better understood by those deeply immersed in engagement with both experts and their content. A “user-group” meeting must include these “others” and have sessions that support their professional interests so they come away learning substantive things from others in similar situations.
Although “search” was the nominal reason for the meeting, there was no discussion about what it takes to get to the ultimate “user engagement.” Search remains “smoke and mirrors.” Search behind the firewall was still pretty thin as a concept, and the emphasis was on e-commerce and monetization. There was a lot of talk about business and customer experiences engaging with search but not much substance as to how to actually create rich search experiences.
Q: What are we going to be talking about a year from now?
LWM: I hope the engagement will involve less visionary “hype,” which in heavy doses is not of much real value to the audience. If the meeting becomes more about getting customers to a successful outcome through the engagement of teams, with IT, developers, content and business owners coming to a problem using a thoughtful design approach, attendees will leave with a higher commitment to embrace the technology.
Finally, I believe that, as FastSearch solutions are implemented and tested, customers will come to these meetings with higher expectations for helpful case studies that talk about “how the sausage is made,” the role of connectors and the actual tuning for higher relevancy. Casual references to search federation will give way to discussion of what federation really is and its many tiers of sophistication. Presenting search results in ways that are compelling and trustworthy for users will need to be explained in more substantive sessions. I hope that we will be talking about social team interaction for implementing compelling search technology experiences for users.

New URL for newsfeeds

There is a new newsfeed URL you should use for this blog – http://gilbane.com/search_blog/atom.xml. It is not actually new, but it will be the feed we maintain going forward. The feed some of you are using – http://feeds.gilbane.com/EnterpriseSearchPracticeBlog – is being phased out.

Native Database Search vs. Commercial Search Engines

This topic is a bit random: a short response to a question that popped up recently from a reader seeking technical research on the subject. Since none was available in the Gilbane library of studies, I decided to answer the question with some practical suggestions.

The focus is on an enterprise with a substantive amount of content aggregated from a diverse universe of industry-specific information, and what to do about searching it. If the information has been parsed and stored in an RDBMS, is it not better to leverage the SQL query engine native to the RDBMS? Typical database engines might be: DB2, MS Access, MS SQL, MySQL, Oracle or Progress Software.
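For context, here is roughly what leaning on the native SQL engine looks like. This is a hedged sketch: the table and column names are hypothetical, the connection object is assumed to come from a standard DB-API driver, and MATCH ... AGAINST is MySQL’s full-text syntax; the other engines listed each expose their own (and differently limited) text-search extensions.

```python
# Sketch of "native" RDBMS retrieval: a parameterized full-text query
# issued through a standard DB-API cursor. Table and column names are
# hypothetical; MATCH ... AGAINST is MySQL's full-text syntax, and other
# engines (DB2, Oracle, SQL Server) each have their own equivalents.

FULLTEXT_SQL = """
    SELECT doc_id, title
    FROM documents
    WHERE MATCH(title, abstract) AGAINST (%s IN BOOLEAN MODE)
"""


def native_search(connection, query):
    # BOOLEAN MODE allows +required and -excluded terms and "quoted phrases",
    # which is roughly the ceiling of query sophistication you get for free.
    cursor = connection.cursor()
    cursor.execute(FULLTEXT_SQL, (query,))
    return cursor.fetchall()


# Example usage (assumes a MySQL connection object from, e.g., mysql.connector):
# rows = native_search(conn, '+"federated search" -intranet')
```

Everything beyond this, tunable relevance ranking, taxonomy browsing, incremental narrowing of result sets, is where the custom development discussed below begins.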

To be clear, I am not a developer, but I worked closely with software engineers for 20 years when I owned a software company. We worked with several DBMS products, three of which supported SQL queries, and the application we invented and supported was a forerunner of today’s content management systems, with a variety of retrieval (search) interfaces. The retrievable content our product supported was limited to metadata plus abstracts of up to two or three pages in length; the typical database sizes of our customers ranged from 250,000 to a couple of million records.

This is small potatoes compared to what search engines typically traverse and index today, but scale was always an issue, and we were well aware of the limitations of the SQL engines in supporting contextual searching, phrase searching and complex Boolean queries. It was essential that indexes be built in real time as records were added, whether manually through screen forms or through batch loads. The engine needed to support explicit adjacency (phrase) searching as well as key words anywhere in a field, in a record, or in a set. Saving and re-purposing results, storing search strategies, narrowing large sets incrementally, and browsing indexes of terminology (taxonomy navigation) to select unique terms for a Boolean “and” or “or” query were all part of the application. When our original text-based DBMS vendor went belly-up, we spent a couple of years test-driving numerous RDBMS products to find one that would support the types of searches our customers expected. We settled on Progress Software, primarily because of its support for search and its experience as an OEM supplier to application software vendors like us. Development time was minimized because of good application-building tools and index-building utilities.

So, what does that have to do with the original question, native RDBMS search vs. standalone enterprise search? Based on discussions with, and observations of, developers trying to optimize search for special applications using generic database retrieval tools, I would offer the following. Search is very hard, and advanced search, including concept searching, Boolean operations, and text analytics, is harder still. Developers of enterprise search solutions have grappled with and solved search problems that must be supported in environments where content is dynamically changing and growing, where different user interfaces are needed for diverse audiences and types of queries, and where query results require a variety of display formats. Also, in e-commerce applications, interfaces require routine screen face-lifts that are best supported by specialized tools for that purpose.

Then you need to consider all these development requirements; they do not come out-of-the-box with SQL search:

  • Full text indexes and database field or metadata indexes require independent development efforts for each database application that needs to be queried.
  • Security databases must be developed to match each application where individual access to specific database elements (records or rows) is required; see the sketch following this list.
  • Natural language queries require integration with taxonomies, thesauri, or ontologies; this means software development independent of the native search tools.
  • Interfaces must be developed for search engine administrators to make routine updates to taxonomies and thesauri, retrieval and results ranking algorithms, adjustments to include/exclude target content in the databases. These content management tasks require substantive content knowledge but should not require programming expertise and must be very efficient to execute.
  • Social features that support interaction among users and personalization options must be built.
  • Connectors need to be built to federate search across other content repositories that are non-native and may even be outside the enterprise.
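As one illustration of how quickly these items turn into real development work, here is a minimal, entirely hypothetical sketch of the security item above: trimming results at query time against a per-record access control list. The group model, the ACL data, and the filter rule are invented for the example; production systems usually push this work into the index or the database layer rather than filtering in application code.

```python
# Hypothetical security trimming: filter raw search hits down to the
# records the current user is actually entitled to see.

# Invented ACL data: record id -> set of groups allowed to view it.
RECORD_ACLS = {
    101: {"finance", "executives"},
    102: {"engineering"},
    103: {"all-staff"},
}

# Invented directory lookup: user -> groups.
USER_GROUPS = {
    "pat": {"engineering", "all-staff"},
    "lee": {"finance", "all-staff"},
}


def trim_results(user, hits):
    """Keep only hits whose ACL intersects the user's group memberships."""
    groups = USER_GROUPS.get(user, set())
    return [hit for hit in hits if RECORD_ACLS.get(hit["id"], set()) & groups]


hits = [{"id": 101, "title": "Budget"}, {"id": 102, "title": "Specs"},
        {"id": 103, "title": "Holiday schedule"}]
print(trim_results("pat", hits))   # -> Specs and Holiday schedule only
```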

Any one of these efforts is a multi-person and perpetual activity. The sheer scale of the development tasks militates against trying to sustain state-of-the-art search in-house with the relatively minimalist tools provided in most RDBMS suites. The job is never done, and in-depth search expertise is hard to come by. Software companies that specialize in search for enterprises are also diverse in what they offer and in the vertical markets they support well. Bottom line: identify your business needs and find the search vendor that matches your problem with a solution they will continue to support with regular updates and services. Finally, search performance and speed of processing are another huge factor to consider, and for this you need some serious technical assessment. If the target application is going to be a big revenue generator with heavy loads and huge processing demands, do not overlook this step: do benchmarks to prove the performance and scalability.
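If it helps, a first-pass benchmark does not require elaborate tooling. The sketch below is an assumption-laden starting point: search_fn stands in for whichever engine is under test and queries for a set of representative queries, and it measures only query latency under concurrent load; a serious assessment would also cover indexing throughput, index freshness and result quality.

```python
# Rough load-test sketch: fire representative queries at the engine
# under test from several threads and report latency percentiles.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def benchmark(search_fn, queries, concurrency=8):
    latencies = []

    def timed(query):
        start = time.perf_counter()
        search_fn(query)
        latencies.append(time.perf_counter() - start)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(timed, queries))

    latencies.sort()
    return {
        "queries": len(latencies),
        "median_s": statistics.median(latencies),
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
    }


# Example usage with a stand-in engine that just sleeps 10 ms per query:
# print(benchmark(lambda q: time.sleep(0.01), ["enterprise search"] * 200))
```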

