Bluebill Blog | Content and information technologies

Big data and decision making: data vs intuition

There is certainly hype around ‘big data’, as there always has been and always will be about many important technologies or ideas – remember the hype around the Web? Just as annoying is the anti-big-data backlash, typically built around straw men – does anyone actually claim that big data is useful without analysis?

One unfair characterization both sides indulge in involves the role of intuition, which is viewed either as the last lifeline for data-challenged and threatened managers, or as the way real men and women make the smart, difficult decisions in the face of too many conflicting statistics.

Robert Carraway, a professor who teaches Quantitative Analysis at UVA’s Darden School of Business, has good news for both sides. In a Forbes post on big data and decision making, “Meeting the Big Data challenge: Don’t be objective,” he argues “that the existence of Big Data and more rational, analytical tools and frameworks places more—not less—weight on the role of intuition.”

Carraway first mentions Corporate Executive Board’s finding that, of over 5000 managers, 19% were “Visceral decision makers” relying “almost exclusively on intuition.” The rest were more or less evenly split between “Unquestioning empiricists,” who rely entirely on analysis, and “Informed skeptics … who find some way to balance intuition and analysis.” The assumption of the study, and of Carraway, was that the informed skeptics had the right approach.

A different study, “Frames, Biases, and Rational Decision-Making in the Human Brain,” from the Institute of Neurology at University College London, tested for correlations between the influence of ‘framing bias’ (what it sounds like – making different decisions on the same problem depending on how the problem was framed) and degree of rationality. The study used fMRI to measure which areas of the brain were active. Activity in the most rational subjects (those least influenced by framing) took place in the prefrontal cortex, where reasoning takes place; the least rational (those most influenced by framing / intuition) showed activity in the amygdala (home of emotions); and those in between (“somewhat susceptible to framing, but at times able to overcome it”) showed activity in the cingulate cortex, where conflicts are addressed.

It is this last correlation that is suggestive to Carraway, and it is what he maps to being an informed skeptic. In real life, we have to make decisions without all, or enough, data, and a predilection for relying on either data or intuition alone can easily lead us astray. Our decision making benefits when our brain sees a conflict between what the data says and what our intuition is telling us, a conflict that calls for skeptical analysis. In other words, intuition is a partner in the dance, and the implication is that it is always in the dance – it always has a role.

Big data and all the associated analytical tools provide more ways to find bogus patterns that fit what we are looking for. This makes it easier to find false support for a preconception. So just looking at the facts – just being “objective” – just being “rational” – is less likely to be sufficient.

The way to improve the odds is to introduce conflict – call in the cingulate cortex cavalry. If you have a preconceived belief, acknowledge it and then try to refute it, rather than support it, with the data.

“the choice of how to analyze Big Data should almost never start with “pick a tool, and use it”. It should invariably start with: pick a belief, and then challenge it. The choice of appropriate analytical tool (and data) should be driven by: what could change my mind?…”

Of course conflict isn’t only possible between intuition and data. It can also be created between different data patterns. Carraway has an earlier related post, “Big Data, Small Bets,” that looks at creating multiple small experiments for big data sets, designed to reduce the risk of identifying patterns that are either random or not significant.

Thanks to Professor Carraway for elevating the discussion. Read his full post.


How long does it take to develop a mobile app?

We have covered and written about the issues enterprises need to consider when planning to develop a mobile app, especially on choosing between native apps, mobile web apps (HTML5, etc.), or a hybrid approach that includes elements of each. We have also discussed some of the choices and factors that affect the time required to bring an app to market, but made no attempt to advise or speculate on how long it should take to “develop a mobile app”. This is not a question with a straightforward answer, as any software development manager will tell you.

There are many reasons estimating app development time is difficult, but there are also items outside of actual coding that need to be accounted for. For example, a key factor often not considered in measuring app development is the time involved in training or hiring for skills. Since most organizations already have experience with standards such as HTML and CSS, developing mobile web apps should be, ceteris paribus, less costly and quicker than developing a native app. This is especially true when the app needs to run on multiple devices with different APIs, using different programming languages, on multiple mobile (and possibly forked) operating systems. But there are often appealing device features that require native code expertise, and even using a mobile development framework that deals with most of this complexity requires learning something new.

App development schedules can also be at the mercy of app store approvals and not-always-predictable operating system updates.

As unlikely as it is to come up with a meaningful answer to the catchy (and borrowed) title of this post, executives need good estimates of the time and effort involved in developing specific mobile apps. But experience in developing mobile apps is still slim in many organizations, and more non-technical managers are now involved in approving and paying for app development. So even limited information on length of effort can provide useful data points.

I found the survey that informed the Visual.ly infographic below via ReadWrite (How Long Does It Take To Build A Native Mobile App? [InfoGraphic]). It involved 100 iOS, Android, and HTML5 app developers and was done by market research service AYTM for Kinvey, provider of a cloud backend platform for app developers.

Their finding? Developing an iOS or Android app takes 18 weeks. I didn’t see the survey questions, so I don’t know whether 18 weeks was an average of actual development times, opinions on what it should take, or something else.

Of course there are simple apps that can be created in a few days and some that will take much longer, but in either case the level of effort is almost always underestimated. Even with all the unanswered questions the infographic raises about resources and the like, the 18-week finding may helpfully temper somebody’s overly optimistic expectations.


Launching Your Search for Enterprise Search Fundamentals?

It’s the beginning of a new year and you are tasked with helping your enterprise get top value from the organization’s information and knowledge assets. You are the IT applications specialist assigned to support individual business units with their technology requests. You might encounter situations similar to these:

  • Marketing has a major initiative to re-write all product marketing pieces.
  • Finance is grappling with two newly acquired companies whose financial reports, financial analyses, and forecasts are scattered across a number of repositories.
  • Your Legal department needs to categorize and analyze several thousand “idea records” that came from the acquired companies in order to be prepared for future work patenting new products.
  • Research and development is attempting to categorize, and integrate into a single system, R&D reports from an existing repository with those from the acquisitions.
  • Manufacturing requires access to all schematics for eight new products in order to refine and retool manufacturing processes and equipment in their production area.
  • Customer support demands just-in-time retrieval and accuracy to meet their contractual obligations to tier-one customers, often from field operations, or while in transit to customer sites. The latter case often requires retrieval of a single, unique piece of documentation.

All of these groups have needs which, if not met, present high risk, or even exposure to lawsuits from clients or investors. You have only one specialist on staff who has had two years of experience with a single search engine, but who is currently deployed to field service operations.

Looking at just these few examples, we can see that a number of search related technologies, plus human activities, may be required to meet the needs of these diverse constituents. From finding and assembling all financial materials across a five-year time period for all business units, to recovering scattered and unclassified emails and memos that contain potential product ideas, the initiative may be huge. A sizable quantity of content and business structural complexity may require a large scale effort just to identify all the possible repositories to search. This repository-identification exercise is a problem to be solved before even thinking about the search technologies to adopt for the “finding” activity.

Beginning the development of a categorizing method and terminology to support possible “auto-categorization” might require text mining and text analysis applications to assess the topical nomenclature and entity attributes that would make a good starting point. These tools can be employed before the adoption of enterprise search applications.
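To make this concrete, here is a minimal sketch (my illustration, not from the original post) of the kind of text mining pass that can surface candidate terms for a starting vocabulary; the library choice (scikit-learn) and the sample records are assumptions.

```python
# Hypothetical example: surface candidate terms for a starting taxonomy by
# ranking unigrams and bigrams across a small corpus of records by TF-IDF weight.
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

documents = [
    "Idea record: coating process for corrosion-resistant valve assemblies",
    "R&D report: thermal tolerance testing of valve seals under load",
    "Forecast memo: Q3 revenue projections for an acquired subsidiary",
]

vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2), max_features=1000)
tfidf = vectorizer.fit_transform(documents)

# Rank terms by total weight across the corpus; the top terms are candidates
# for a controlled vocabulary, to be reviewed and refined by a human editor.
scores = np.asarray(tfidf.sum(axis=0)).ravel()
terms = vectorizer.get_feature_names_out()
for term, score in sorted(zip(terms, scores), key=lambda p: -p[1])[:10]:
    print(f"{term:30s} {score:.3f}")
```

The output is only a starting point; as noted above, these tools inform the human work of building the categorization scheme rather than replace it.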

Understanding all the “use-cases” for which engineers may seek schematics in their re-design and re-engineering of a manufacturing plant is essential to selecting the best search technology for them and testing it for deployment.

The bottom line is that there is a lot more to know about content, and supporting its accessibility with search technology, than acquiring the search application. Furthermore, the situations that demand search solutions within the enterprise are far different from Web searching for a product or doing general research on a new topic, and addressing them successfully requires far greater understanding of user search expectations.

To meet the full challenge of providing the technologies and infrastructure that will deliver reliable and high value information and knowledge when and where required, you must become conversant with a boatload of search related topics. So, where do you begin?

A new primer, manageable in length and logical in order, has just been published. It contains the basics you will need to understand the enterprise context for search. A substantive list of reading resources, a glossary, and a vendor URL list round out the book. As the author suggests, and I concur, you should probably begin with Chapter 12, two pages that will ground you quickly in the key elements of your prospective undertaking.

What is the book? Enterprise Search (of course) by Martin White, O’Reilly Media, Inc., Sebastopol, CA. © 2013 Martin White. 192p. ISBN: 978-1-449-33044-6. Also available as an online edition at: http://my.safaribooksonline.com/book/databases/data-warehouses/9781449330439


Enterprise Search Strategies: Cultivating High Value Domains

At the recent Gilbane Boston Conference I was happy to hear the variety of remarks positioning and defining “Big Data”. Like so much in the marketing sphere of high tech, answers begin with technology vendors but get refined and parsed by analysts and consultants, who need to set clear expectations about the actual problem domain. It’s a good thing that we have humans to do that defining, because even the most advanced semantics would be hard pressed to give you a single useful answer.

I heard Sue Feldman of IDC give a pretty good “working definition” of big data at the Enterprise Search Summit in May, 2012. To paraphrase, it was:

  • > 100 TB up to petabytes, OR
  • > 60% growth a year of unstructured and unpredictable content, OR
  • Ultra high streaming content

But we then get into debates about differentiating data from unstructured content when a phrase like “big data” is applied to the latter, which knowledge strategists like me tend to put into a category of packaged information. But never mind; technology solution providers will continue to come up with catchy buzz phrases to codify the problem they are solving, whether it makes semantic sense or not.

What does this have to do with enterprise search? In short, “findability” is an increasingly heavy lift due to the size and number of content repositories. We want to define quality findability as optimal relevance and recall.

A search technology era ago, publishers, libraries, and content management solution providers were focused on human curation of non-database content, applying controlled vocabulary categories derived from decades of human-managed terminology lists. Automated search provided highly structured access interfaces to what we now call unstructured content. Once this model was supplanted by full text retrieval, and new content originated in electronic formats, the proportion of un-categorized to human-categorized content ballooned.

Hundreds of models for automatic categorization have been rolled out to try to stay ahead of the electronic onslaught. The ones that succeed do so mostly because of continued human intervention at some point in the process of making content available to be searched. From human invented search algorithms, to terminology structuring and mapping (taxonomies, thesauri, ontologies, grammar rule bases, etc.), to hybrid machine-human indexing processes, institutions seek ways to find, extract, and deliver value from mountains of content.

This brings me to a pervasive theme from the conferences I have attended this year: the synergies among text mining, text analytics, extract/transform/load (ETL), and search technologies. These are being sought, employed, and applied to specific findability issues in select content domains. It appears that the best results are delivered only when these criteria are first met:

  • The business need is well defined, refined and narrowed to a manageable scope. Narrowing scope of information initiatives is the only way to understand results, and gain real insights into what technologies work and don’t work.
  • The domain of content that has high value content is carefully selected. I have long maintained that a significant issue is the amount of redundant information that we pile up across every repository. By demanding that our search tools crawl and index all of it, we are placing an unrealistic burden on search technologies to rank relevance and importance.
  • Apply pre-processing solutions such as text-mining and text analytics to ferret out primary source content and eliminate re-packaged variations that lack added value.
  • Apply pre-processing solutions such as ETL with text mining to assist with content enhancement, by applying consistent metadata that does not have a high semantic threshold but will suffice to answer a large percentage of non-topical inquiries. An example would be to find the “paper” that “Jerry Howe” presented to the “AMA” last year (a minimal sketch of this kind of enrichment follows this list).
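The sketch below illustrates this kind of metadata enhancement under stated assumptions: the field names, controlled lists, and crude extraction rules are mine, not from the post, and a production ETL pass would use proper entity extraction rather than a handful of regular expressions.

```python
# Hypothetical example: a rule-based enrichment step in an ETL pass that tags
# each record with document type, organizations, year, and person names, so a
# non-topical query like "the paper Jerry Howe presented to the AMA last year"
# can be answered from structured fields.
import re

DOC_TYPES = {"paper", "presentation", "memo", "report"}   # assumed controlled list
KNOWN_ORGS = {"AMA", "IEEE", "IDC"}                       # assumed controlled list

def enrich(record: dict) -> dict:
    text = record["text"]
    year_match = re.search(r"\b(19|20)\d{2}\b", text)
    metadata = {
        "doc_type": next((t for t in DOC_TYPES if t in text.lower()), "unknown"),
        "orgs": sorted(o for o in KNOWN_ORGS if o in text),
        "year": int(year_match.group(0)) if year_match else None,
        "people": re.findall(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", text),  # crude name heuristic
    }
    return {**record, "metadata": metadata}

record = {"id": 17, "text": "Paper presented by Jerry Howe to the AMA, 2012."}
print(enrich(record)["metadata"])
# {'doc_type': 'paper', 'orgs': ['AMA'], 'year': 2012, 'people': ['Jerry Howe']}
```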

Business managers together with IT need to focus on eliminating redundancy by utilizing automation tools to enhance unique and high-value content with consistent metadata, thus creating solutions for special audiences needing information to solve specific business problems. By doing this we save the searcher the most time, while delivering the best answers to make the right business decisions and innovative advances. We need to stop thinking of enterprise search as a “big data,” single engine effort and instead parse it into “right data” solutions for each need.


Integrating External Data & Enhancing Your Prospects

Most companies with IT account teams and account selling strategies have a database in a CRM system, and the company records in that database generally have a wide range of data elements and varying degrees of completeness. Beyond the basic demographic information, some records are more complete than others with regard to providing information that can tell the account team more about the drivers of sales potential. In some cases, this additional data may have been collected by internal staff; in other cases, it may be the result of data purchased from organizations like Harte-Hanks, RainKing, HG Data, or any number of custom resources and projects.

There are some other data elements that can be added to your database from freely available resources. These data elements can enhance the company records by showing which companies will provide better opportunities. One simple example we use in The Global 5000 database is the number of employees that have a LinkedIn profile. This may be an indicator that companies with a high percentage of social media users are more likely to purchase or use certain online services. That data is free to use. Obviously, that indicator does not work for every organization and each company needs to test the data correlation between customers and the attributes, environment or product usage.
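As a concrete (and entirely hypothetical) illustration of testing such a correlation, the sketch below compares an appended attribute against customer status; the sample numbers and field names are assumptions, not Global 5000 data.

```python
# Hypothetical example: does the share of employees with LinkedIn profiles
# correlate with whether a company became a customer?
import numpy as np

# one row per company: (linkedin_share, is_customer)
companies = [
    (0.62, 1), (0.15, 0), (0.48, 1), (0.05, 0), (0.33, 0), (0.71, 1), (0.22, 0),
]
shares = np.array([c[0] for c in companies])
customer = np.array([c[1] for c in companies])

# Point-biserial correlation: a Pearson correlation between a continuous
# attribute and a binary outcome.
r = np.corrcoef(shares, customer)[0, 1]
print(f"correlation between LinkedIn share and customer status: {r:.2f}")

# A simpler lift view: average attribute value among customers vs. non-customers.
print("customers:", shares[customer == 1].mean(), "non-customers:", shares[customer == 0].mean())
```

If the correlation (or lift) is negligible on your own customer base, the attribute probably isn’t worth appending, which is the testing step described above.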

Other free and interesting data can be found in government filings. For example, any firm with benefit and 401k plans must file federal forms, and that filing data is available from the US government. A quick scan of the web site data.gov shows a number of options and data sets available for download and integration into your prospect database. The National Weather Center, for example, provides a number of specific long term forecasts which can be helpful for anyone selling to the agriculture market.

There are a number of things that need to be considered when importing and appending or modeling external data. Some of the key aspects include:

  • A match code or record identifier whereby external records can be matched to your internal company records. Many systems use the DUNS number from D&B rather than trying to match on company names, which can have too many variations to be useful (see the matching sketch after this list).
  • The CRM record level needs to be established so that the organization is focused on companies at a local entity level or at the corporate HQ level. For example, if you are selling multi-national network services, having lots of site records is probably not helpful when you most likely have to sell at the corporate level.
  • De-dupe your existing customers. When acquiring and integrating an external file, remember that those external sources won’t know your customer set and you will likely be importing data about your existing customers. If you are going to turn around and send this new, enhanced data to your team, it makes sense to identify or remove existing clients from that effort so that your organization is not marketing to them all over again.
  • Identifying the key drivers that turn the vast sea of companies into prospects and then into clients will provide a solid list of key data attributes that can be used to append to existing records.  For example, these drivers may include elements such as revenue growth, productivity measures such as revenue per employee, credit ratings, multiple locations or selected industries.
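To illustrate the match-code and de-dupe points above, here is a minimal sketch; the DUNS values, field names, and record structures are assumptions for illustration only.

```python
# Hypothetical example: match external records to internal CRM records on a
# DUNS-style identifier, enrich known prospects, and skip existing customers.
crm_records = [
    {"duns": "123456789", "name": "Acme Corp", "is_customer": True},
    {"duns": "987654321", "name": "Globex", "is_customer": False},
]
external_records = [
    {"duns": "987654321", "employees": 5400, "revenue_musd": 820},
    {"duns": "555000111", "employees": 120, "revenue_musd": 18},
]

crm_by_duns = {r["duns"]: r for r in crm_records}

enriched, net_new = [], []
for ext in external_records:
    match = crm_by_duns.get(ext["duns"])
    if match is None:
        net_new.append(ext)                    # a company we did not know about
    elif not match["is_customer"]:
        enriched.append({**match, **ext})      # append external attributes to the prospect
    # existing customers are skipped (de-duped) so we don't market to them again

print(len(enriched), "enriched prospects;", len(net_new), "net-new companies")
```

Matching on an identifier like the DUNS number avoids the name-variation problem noted above; when no shared identifier exists, fuzzy name-and-address matching is the usual fallback.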

In this era of marketing sophistication, with increasing ‘tons’ of Big Data available and sophisticated analytical tools coming to market, every company has the opportunity to enhance its internal data by integrating external data and to go to market armed with more insight than ever before.

Learn more about the Global 5000 database



Technology and IT Spending Metric Options

When planning for global market growth and sizing up the opportunities in various countries, there is often a lack of data available from industry sources. One could look at GDP figures or population data by country – both of those have some limitations. A better gauge might be to look at the business entities that generate the most revenue in each country, as they will help contribute to other businesses in the geography and, in general, raise the level of B2B activity overall.

Diving into the data of the Global 5000 companies – the 5000 largest companies in the world based on revenue – we find a couple of different ways to help guide your estimates of market size and rank order.

The first list shows the top 10 countries by the number of firms in our Global 5000 database headquartered there.

  • USA – 2148
  • Japan – 334
  • China – 221
  • UK – 183
  • Canada – 124
  • Germany – 98
  • France – 84
  • Australia – 77
  • India – 76
  • Italy – 65

For each company in the database, there is an estimate for the amount spent on IT – both internal and external costs. When we take those amounts for each country and look at the average IT spending for these leading firms, we see a different order of countries which would also prove to be attractive targets.

  • France – $902 million per company
  • Germany
  • Netherlands
  • Spain
  • Venezuela
  • Italy
  • China
  • Switzerland
  • South Korea
  • New Zealand – $545 million per company

Of course, all these companies are the biggest of the big and not all companies in that country will spend at that level — but it is indicative of the relative IT spending on a country basis and again shows some of the potential for attractive markets as you eye global opportunities.
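The roll-up behind these figures is straightforward; here is a minimal sketch (with made-up numbers and assumed field names, not Global 5000 data) of the per-country average.

```python
# Hypothetical example: average estimated IT spend per company, grouped by
# headquarters country.
from collections import defaultdict

# one row per company: (hq_country, estimated_it_spend_in_$M)
companies = [
    ("France", 950), ("France", 860), ("Germany", 780), ("USA", 310), ("USA", 95),
]

totals, counts = defaultdict(float), defaultdict(int)
for country, it_spend in companies:
    totals[country] += it_spend
    counts[country] += 1

for country, total in sorted(totals.items(), key=lambda kv: -kv[1] / counts[kv[0]]):
    print(f"{country:10s} ${total / counts[country]:,.0f}M average IT spend per company")
```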

Learn more about the Global 5000 database


Enterprise Search is Never Magic

How is it that the blockbuster deals for acquiring software companies that rank highest in their market spaces seem to end up smelling bad several months into the deals? The latest acquisition to take on taint was written about in the Wall Street Journal today, noting that HP Reports $8.8 Billion Charge on Accounting Misstatement at Autonomy. Not to dispute the fact that enterprise search megastars Fast (acquired by Microsoft) and Autonomy had some terrific search algorithms and a huge presence in the enterprise market, but there is a lot more to supporting search than the algorithms.

The fact that surrounding support services have always been essential requirements for making these two products successful in deployment has been well documented over the years. Hundreds of system integrators and partner companies of Microsoft and Autonomy do very well making these systems deliver the value that has never been attainable with just out-of-the-box installations. It takes a team of content, search and vocabulary management specialists to deliver excellent results. For any but the largest corporations, the costs and time to achieve full implementation have rarely been justifiable.

Many fine enterprise search products deliver high value at much more reasonable costs, and with much more efficient packaging, shorter deployment times and lower on-going overhead. Never to be ignored is that enterprise search must be accounted for as infrastructure. Without knowing where the accounting irregularities (also true with Fast) actually lay, I suspect that HP bought the brand and the prospective customer relationships only to discover that the real money was being made by partners and integrators, and the software itself was a loss leader. If Autonomy did not bring with it a solid service and integration operation with strong revenues and work in the pipeline, HP could not have gained what it bargained for in the purchase. I “know” nothing but these are my hunches.

Reflecting back on a couple of articles (If a Vendor Spends Enough… and Enterprise Search and Collaboration…) I wrote a couple of years ago, as Autonomy began hyping its enterprise search prowess in Information Week ads, it seems that marketing is all the magic it needed to reel in the biggest fish of all – a sale to HP.


Tablets in the Enterprise and BYOD strategies

A couple of observations about tablets in the enterprise:

  • Tablets of all dimensions have a role in enterprise use, as do all types of personal computing devices.
  • BYOD is certainly a challenge for some organizations, but is a reminder of how we should have been managing data all along.

Tablets and other personal computing devices in the enterprise
One reaction to Apple’s iPad mini last week was that it would change the dynamics of Apple’s tablet market, since a 7″ tablet is more appropriate for consumers, so enterprises would stick to the 10″ versions. The only thing correct about this view is that the tablet market will change. But we don’t know how – use-cases are evolving and there are way too many variables beyond physical size. It seems just as likely that the iPad mini form-factor could grow faster in enterprises than the full size iPad. In any case there are certainly enterprise use cases for a smaller, cheaper iPad, especially since those seem to be the only significant differences, and there is no apparent app development cost or learning curve, further easing enterprise adoption.

But the bigger point is that enterprises need to be able to support not only multiple tablet and smartphone form factors but a large subset of an unpredictably large set of personal device types.

This is not a new challenge; it is simply one that is accelerating because of the decreasing costs and increasing ease of device development. “Personal” devices in enterprises are not new – employees have often used their own personal computers, especially as these shrank in cost and size to BYOD-friendly notebooks. Tablets and phones are the next step, but enterprises will soon be dealing with watches, wearable computing, and implants, which is why…

BYOD strategies need to focus on the data not the devices
The BYOD continuum is also largely additive – employees aren’t just replacing devices, but often using multiple devices to access and process much of the same data – so keeping up with the variety, volume, and versions of personal devices is hopeless. A BYOD management strategy that focuses on device management will at best have a negative impact on productivity, will certainly increase costs, and will most likely fail. There are environments and applications where data security is critical enough to warrant the overhead of a device management strategy that approaches being fail-proof, but even in these cases the focus should be on the data itself, with device control as a backup where it makes sense.

It may not be much easier to manage the data independently but that’s the ball to keep your eye on.


Frank Gilbane interview on Big Data

Big data is something we cover at our conference and this puzzles some given our audience of content managers, digital marketers, and IT, so I posted Why Big Data is important to Gilbane Conference attendees on gilbane.com to explain why. In the post I also included a list of the presentations at Gilbane Boston that address big data. We don’t have a dedicated track for big data at the conference but there are six presentations including a keynote.

Not coincidentally, I was also interviewed on the CMS-Connected internet news program about big data the same week, which gave me an opportunity to answer some additional questions about big data and its relevance to the same kind of  audience. There is still a lot more to say about this, but the post and the interview combined cover the basics.

The CMS-Connected show was an hour long and also included Scott and Tyler interviewing Rob Rose on big data and other topics. You can see the entire show here, or just the twelve-minute interview with me below.


Private Companies and Public Companies – Sizing up IT Spending

One aspect of the Global 5000 company database is that we include all types, shapes and locations of companies including those that are publicly listed as well as private firms. For those who sell to corporations (as opposed to consumers) there is a great deal of interest in private companies. A lot of this can be attributed to the fact that public companies have to disclose so much about their size, shape and all aspects of their organizations – most everyone knows or can find out what they need to. Privates, on the other hand, are less well known and hold the allure that there is great, undiscovered opportunity in there.

To get a sense of the dynamics of the public/private mix, we examined a number of metrics related to companies in the Global 5000 database. It is true that most large companies are publicly traded: of the 5000 companies, nearly 4,000 are public and just over 1,000 are private. That is the inverse of the market as a whole, where most companies in any country or industry are private. Here are a few facts about each group.

  • The average revenue for a public company in the Global 5000 is $10.3 billion while the private companies averaged $10.6 billion
  • Public companies reported an average revenue per employee of $214,000 while private companies were just over $282,000
  • For both 2010 and 2011, revenue for both public and private companies grew by slightly more than 11.5%. Virtually no difference.
  • In both cases, IT spending per company is over $290 million and approximately 2.7% of revenue.
  • Total IT spending for Global 5000 public companies is approximately $1.1 trillion while private Global 5000 companies will spend about $300 billion.

The bottom line here is that big is big. It does not make much difference whether the company is public or private; the big guys will spend a lot on a wide variety of products and services, including IT products and services. The real difference is in how many of these large opportunities there are. Just because we find a few of these nuggets among the privates does not mean all privates look alike. Most are quite a bit smaller.

Learn more about the Global 5000 database
