Data waves

Yotta lot we got


10 April 2014

All data is Big Data when it comes to analysis. Not that we seem to analyse much anymore: we perform analytics, which now seems to come with everything. Bookkeeping for Beginners—with analytics. Smart meter—with analytics. Family thermometer—with analytics. In truth, that fantasy is probably valid and A Good Thing. Think of your car. It had, and has, the original dashboard, with a few dials showing important information such as how much fuel you have left. And your speed, if that concerns you. Recently the dashboard has become the metaphor for on-screen graphic indicators showing Business Intelligence (BI) information in at least near-real time.

It seems inevitable that they will return to actual vehicles, with smart engine diagnostics, temperature and proximity and other sensors. We have had garage diagnostics for some years, with the mechanic plugging in a laptop to check systems. So why not have the key ones displayed live for the driver or owner? The full vehicle history can be recorded on a tamper-proof hard drive or other black box, not to mention the driver cameras favoured by many people today for insurance or driving offence evidence.

Sounds useful, sounds inevitable. How much data would be generated or stored? We know the answer to that from every shift in ICT over decades: ‘enough’ is never the limit. So what data might a saloon car have after a few years of driving? A terabyte, several of them? But wait a sec: commercial vehicles, like airplanes, surely require even more comprehensive data? Should we omit motorbikes? We are moving towards autonomous vehicles, both as people transports and drones. Surely they will be even more data-intensive? Multiply all of that by a couple of billion vehicles globally and even Google’s data ambitions will be challenged—or even the NSA’s.

“If we could count every grain of sand it would be an astounding feat of meaninglessness. If we could understand the way grains are formed, on the other hand, then we would have an understanding of the dynamics. That could then be applied to the preservation of coastlines or reversal of dune erosion”

But once again the point of having large volumes of data — apart from history and archiving, which requires just a fraction of what we create and store — is to do something with it. These days that is mostly analysis. The crack about the National Security Agency is actually relevant, because right now that slightly tarnished agency is building a new data centre in Utah to handle multiple yottabytes of data. Now a yottabyte, as TechPro readers surely know, is as big as we have so far got in internationally agreed volume measurements. A yottabyte (YB) is 10²⁴ bytes or 1,000 zettabytes.

That zettabyte is equivalent to every grain of sand on the earth’s beaches, according to one not so whimsical calculation. A yottabyte is a trillion terabytes, although a terabyte today is just an unimpressive desktop unit. Our friends in Wikipedia reckon that a yottabyte on 64GB micro SD cards would amount to 2.5 million cubic metres, or about the volume of the Great Pyramid of Giza. But time moves on: 128GB micro cards are now on the market, so it is only a half-pyramid. Built in Utah, where land, sunlight and security are cheap, the NSA centre has a mere $2 billion budget. A pittance, as Intel Ireland might testify.
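That pyramid comparison is easy to sanity-check. A minimal back-of-envelope sketch, assuming a microSD card of roughly 15 × 11 × 1 mm and the commonly quoted figure of about 2.5 million cubic metres for the Great Pyramid:

```python
# Back-of-envelope check: how many 64GB microSD cards hold a yottabyte,
# and what volume would they occupy?
YOTTABYTE = 10**24                   # bytes
CARD_BYTES = 64 * 10**9              # one 64GB card, decimal gigabytes
CARD_VOLUME = 0.015 * 0.011 * 0.001  # m^3 per card (15 x 11 x 1 mm)

cards = YOTTABYTE / CARD_BYTES       # roughly 1.6e13 cards
volume = cards * CARD_VOLUME         # roughly 2.6 million m^3

print(f"{cards:.2e} cards, {volume / 1e6:.1f} million cubic metres")
```

Switching to 128GB cards simply halves both the card count and the volume, which is the half-pyramid mentioned above.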

On the other hand, the impressive thing about the new NSA centre is that it plans not just to store an initial yottabyte or so but to process the data — now that is ambition. Given that in-memory computing is standard today for serious analytics, the idea of yottabytes or even zettabytes in RAM or solid state storage is mind-blowing or boggling or whatever. For the moment. We do love our computer hardware. From the earliest days of fiction and films, the supercomputer has been a big iron beast. That little high-capacity micro SD card is conceptually wonderful, and the capacity will surely climb, but it is the giant Google data centres in the desert that catch our imagination. Handheld and wrist gadgets and wearable kit of all kinds are exceptions, of course, including the Great Oculus Paradigm Rift, but really that is all about the experience rather than the computing or the power. In fact solid state computing is really not conducive to being romanticised. No sound, no moving parts — is it even working?!

Unfortunately, Really Very Big Data is in truth much the same. Yotta and Zetta (and baby sibling Peta) sound like exotic ladies but they are just mathematics, with not much science and no interesting hint of fiction. The grains of sand on the beaches at least give us an image, one that used to be invoked in literature and religion to help explain infinity. Now that we know there are more celestial bodies than those grains of sand, the image seems almost medieval.

What CIOs and everyone else in ICT, from the ‘ordinary user’ to the leaders of nations and corporations, now have to understand and face is that it is actually becoming more like religion and theology: what really counts is the meaning of it all. If we could count every grain of sand it would be an astounding feat of meaninglessness. If we could understand the way grains are formed, on the other hand, or how the billions of silica and rock particles are formed and interact and change formation, then we would have an understanding of the dynamics. That could then be applied to the preservation of coastlines or reversal of dune erosion. The understanding and modelling of the physics might also assist in the manufacture of better cement or food mixes or pharmaceuticals or road surface dressings.

Understanding and modelling will lead to prediction, in the confident expectation of business and the tentative hope of scientists. The foundation is analysis and the converging sciences involved in the emerging field and profession of data analytics. At this stage there is no doubt: that is where the forward attention of any CIO should now be focussed. Business analytics is now by a long stretch the most important and paradigm-shifting area of ICT. It is also, not insignificantly, probably the most promising responsibility that falls within the remit of the CIO and the line of progress most likely to bolster the importance of the role.

“Business analytics is now by a long stretch the most important and paradigm-shifting area of ICT. It is also probably the most promising responsibility that falls within the remit of the CIO and the line of progress most likely to bolster the importance of the role”

‘Actionable analytics’ is the useful term from Gartner, which also stresses that it is now key to competitive survival. The big picture is what we are still calling Big Data. The small but urgent local picture is that we are generally not harvesting as much information as we could from the ever-growing masses of data we generate. A newer twist is that regardless of the scale of the enterprise there is a similarly fast-growing body of external data that could be similarly mined for relevant complementary inputs.

The current cliché about Big Data and Analytics is the Three Vs — Volume, Variety and Velocity. But most experts now agree that we can pretty well dispense with V for Volume because, like the yottabyte, it is not really significant in itself. There is another V — Veracity — that data science points us firmly towards, because you can only meaningfully combine data from different sources when you understand the relative reliability of each stream. Today we can easily draw together information from multiple sources to analyse and model. Traditional in-house data is probably quite well structured, especially in ERP or CRM or similar systems. Other types of data that you own may offer the challenge of variety, for example in applied social media, but you either trust it or at least know the possibilities of error.
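To make the Veracity point concrete: one textbook way to combine the same figure from streams of differing reliability is inverse-variance weighting, so that noisier sources count for less. The sketch below is purely illustrative — the function name and the sample figures are invented, not taken from any particular analytics product:

```python
# Combine one measurement reported by several streams, weighting each
# by its reliability (inverse-variance weighting).

def combine(readings):
    """readings: list of (value, variance) pairs -> weighted estimate."""
    weights = [1.0 / variance for _, variance in readings]
    total = sum(weights)
    return sum(value * w for (value, _), w in zip(readings, weights)) / total

# Hypothetical example: a trusted in-house ERP figure (low variance),
# a public data store, and a noisy social-media-derived estimate.
sources = [(102.0, 1.0), (98.0, 4.0), (120.0, 25.0)]
print(combine(sources))
```

A stream you trust completely gets a small variance and dominates the estimate; a social-media-derived guess with a large variance nudges it only slightly.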

Trusting third party sources and social media is a whole new field in itself. There is an increasing number of public data stores that are close to unimpeachable, especially where there is historic or factual information — the weather in Cork in the first week of June or the locations of traffic lights in Dublin or the maps of underground utilities (mostly). There are other possibilities that are less empirical, such as your own analyses of social media, for example, or subscription analytics services that have evolved from traditional market research.

Back to the key point: enterprises are discovering and demonstrating that there are usable and valuable insights to be gained. The strategic employment of high-end analytics is now clearly recognised as a key element of the relative market performance of major corporations. There is actually no reason why smaller organisations might not benefit as much or more, other than perhaps the investment level for the systems or services and the skills to interpret and apply the fruits of the analysis.

Some types of business will benefit most from very fast or real-time analytics. But for many others, no more than occasional in-depth exercises and a bit of ‘what if…?’ modelling for annual or strategic planning can be enormously valuable. Very often even the required speed is no more than what might be called business realistic, such as the following-morning reports that managers have relied on for decades — but with much more depth and dimension.

Aligning the business and the technology is the long term mantra ritually applied to the CIO job. Data analytics epitomises all of the elements of that aspiration. It is time for all CIOs to take ownership of the analytics function, as the leaders and the masters of the data sciences. Everything else is digital machinery.


TechCentral.ie