Unfortunately, IT consultants are often technology-driven and love their buzzwords: cloud, mobile, big data, Internet of things, digital transformation. These are often best used in combination: “digital transformation with mobile big data in the cloud” is a hard thing to disagree with.
Please don’t get me wrong. The intellectual and technical achievements covered by these buzzwords are astounding: growth has been exponential in processing power, storage and algorithmic complexity. However, no single technology is a magic bullet.
I would like to make two cautionary observations:
- Big data means many things to many people. A recent ACM Transactions technical paper on massive parallel storage starts with the statement that “The term big data is vague enough to have lost much of its meaning”. The result is that as soon as you use the term, you have to say what you mean by it. In-memory computing? Text analytics? High-performance computing (HPC)? Massively parallel storage? Streaming analytics? Domain-relevant query languages? The answer is all of the above: it just depends on your perspective and on the technology you are starting from. So, for example, the big data story of the oil service company Baker Hughes focuses on HPC and visualization. SAP talks about in-memory databases. Control vendors focus on streaming data and alarms. SAS Institute talks about statistics and Splunk about log records. Big data is also sometimes made to mean the same as artificial intelligence or machine learning. Big data is a tool box: pick and mix what you need for a specific task, and don’t use a hammer if you need to put in a screw.
- Senior management in companies are being fed balderdash about big data. I will quote two fresh examples, although I could have chosen many from the mid-1990s as well:
  - First, take the November 2014 issue of Harvard Business Review. A side-box, in an otherwise excellent article on GE’s internet of things, tells us that “unlike analogue signals, digital data is perfectly transmitted” and that this drives digital transformation. This is not even true at a technical level, as I experience every week when I have to download a certain newspaper twice to my iPad because of data corruption during the download. More importantly, this statement ignores the analogue or physical things or people at either end of the transmission. What is transmitted perfectly is all too often inaccurate, incomplete, mistaken or a lie. Perfectly transmitted garbage remains garbage.
  - McKinsey Quarterly recently published an article on “Artificial Intelligence meets the C-suite”. The article recommends that the C-suite executive needs to become “data driven” and claims that “domain expertise” is a barrier to this: a prejudice or “survivor bias”. One of the areas of domain analysis that can supposedly be replaced is G&G: “The oil and gas industry, for instance, has incredibly rich data sources.” … We then get a description of drill logs and seismic data sets, followed by: “Now these are incredibly rich and complex data sets and, at the moment, they’ve been mostly manually interpreted. And when you manually interpret what comes off a sensor on a drill bit or a seismic survey, you miss a lot of the richness that a machine-learning algorithm can pick up.” The authors conclude therefore that “the best thing you can possibly do is to get rid of the domain expert who comes with preconceptions about what are the interesting correlations or relationships in the data and to bring in somebody who’s really good at drawing signals out of data.”
This sort of wild triumphalism is scary. It extrapolates from “data about people and what they say, e.g. Google and Facebook” to “data about the physical world.” This attitude ignores physics and the physical, well-founded correlations and relationships in engineering models. It arrogantly overlooks the professional knowledge and skill spent building physically meaningful models of the world and says that a brute-force statistical or pattern-matching algorithm is superior.
We need both physical models and statistical models, working together. This means we also need data scientists and engineers to work with each other rather than against each other. McKinsey’s proposals are dangerous and counterproductive.