CIO Folder: The secrets are all in the past
An unaccustomed burst into topicality: the RBS and Ulster Bank Great Technical Incident last month has triggered the notion that the quiet byways of computing history are in fact littered with the forgotten debris of similar Technical Incidents. Unkind media commentators call them Computer Disasters. But Ulster Bank at least has sufficient Irish cultural awareness to know that the historical sanction of terms like The Emergency, and indeed The Troubles, ensures that the sheer blandness of Technical Incident keeps to the polite observances required by Board and Stakeholders without actually provoking the victims beyond their natural level of anger.
The gesture of a 0.25% additional deposit interest bonus for three months has a similarly light touch. Assuming the customer has a deposit in the first place, modest 0.25% interest changes have been the order of the day for some years. So the tone and style match the Technical Incident, even if the harsh arithmetic comes out at an even more modest 0.0625% for the year: €6.25 for every €10,000 invested. In traditional budget arithmetic that makes a pint and a glass, or less than a packet of fags.
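For anyone checking the working (a back-of-the-envelope reading, assuming the bonus simply lapses after the quarter): 0.25% paid for three months is 0.25% × 3/12 = 0.0625% as an effective annual rate, and €10,000 × 0.0625% = €6.25.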
But that is sufficient sarcasm for the day. The key point is that the mighty RBS, like many big outfits in the past, proved more hapless in dealing with the aftermath than in whatever failures of technology or its management actually caused the disaster. Failure is always a lonely orphan, so it is not too surprising that it does not figure on the agenda when planning any business project or change. You could ring the changes on Roy Keane's famously quoted 'Failing to plan is planning to fail', perhaps along the lines of 'Total failure is failing to plan for what happens in the event of failure'.
But then again anyone in ICT is well aware of the constant threat of compound failure, or so we would all like to think. Murphy's Law is far more powerful than Moore's. Here in Ireland we have had our fair share of computer disasters. A gloom-monger might even reckon that the Ulster Bank 'incident' was an imported substitute for an overdue indigenous disaster of proportions significant enough to compare with the Irish League of Credit Unions' ISIS project (€35 million, abandoned), the long shadow of PPARS (still No. 1 in both the total and overrun charts at €200 million) or the Iarnród Éireann signalling system (€45 million, 150% cost overrun). The Garda PULSE system ran more than €20 million over the original budget and has been operational for nearly a decade, but it is still incomplete by any standards other than perhaps those of the Department of Justice procurement pedants.
The Irish Blood Transfusion Board integrated IT project incurred much criticism for its €9 million cost on a €4.26 million budget, but it worked and was credited with repaying its costs in just over two years. So it deserves honourable mention alongside the actual failures with which it has all too often been bracketed.
It is noticeable that our casual recall is dominated by public sector ICT screw-ups of various kinds and levels. It would be, and is, very tempting to list the eminent consulting firms that were involved in every single public sector project that overran, missed its objectives or failed in whole or in part. We will spare the blushes of the firm that took home €26 million for one of the projects mentioned above. Suffice it to say, as per the cliché, that every one of them is still in business, still a major consultancy brand and still not losing money. The Comptroller and Auditor General's report on PPARS firmly identified the balance of risk as resting entirely on the state (i.e. client) side.
Of course the same or other expert consultants are usually involved with the bigger projects in the private sector, which has an even worse record, informed experts internationally tend to agree, at least in part because in the past impatient boardrooms tended to think governance was for wimps and those darned computer guys could do it if they were motivated or threatened sufficiently. Some projects just cumulatively failed (compound failures fail better), such as the £260 million Sainsbury's supply chain project write-off in 2005. One of the doyens of Irish IT consultancy insists that every large Irish organisation has experienced a significant failed project in the course of the last two decades. Banks and big private sector businesses can cover up or fudge the consequences of their mistakes in ways not open to the public sector.
All of which brings the appalling thought (or dreadful vista, if you prefer the British law lord report style) that failure may be an inevitable element of all ICT projects, programmes and programs, endeavours, jobs, changes and upgrades (hi there RBS). Hang on: we know that, don’t we? Disks grind to a halt, chips spark, back-ups get screwed up. We’ve been hard-wiring contingency provisions into IT systems for generations; redundancy and failover are as basic to IT engineers as prudent over-specification to their civil and mechanical colleagues.
Right, so the ICT collective community takes random or even inevitable (MTBF) failure as a given in hardware and networks. What about the actual stuff that counts, the software we use to do the work we want computing for in the first place? (Apologies for the baby talk, fellow techies, but this just may be read by senior management.)
Well yes, we aim to optimise all of our programming and testing and development processes, learn good practices and safeguards from past experience and constantly refine our development and project management models for success.
That still leaves the melancholy suspicion that once away from the tangible realities of hardware and electricity we consistently lapse into the original mistaken endeavour: we try to design and engineer failure out of the solution. Logically, that amounts to assuming that, by definition, when it is all finished failure will not occur. We do not think failure is impossible, we do not plan to fail, but we sure as hell do not plan for failure. Yet we should.
Software systems only do what's programmed into them. So we have been trying since the dawn of computer time to anticipate what is needed, what might be needed and what might go wrong. Our constant objective is to be comprehensive. Enterprise application systems are the great territory for examples. Clients and managements want everything joined-up, integrated, single-view or whatever the marketing flavour of the month is. We end up federating enormous databases in real time so that senior users can see a single summary figure, graphically, at any time. Add the damn sums up when the report is needed, we old begrudgers say; federate the results, not the bloody systems themselves, we add.
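To make the begrudgers' point concrete, here is a minimal sketch of the 'federate the results' idea. Every name and figure in it is hypothetical; the only assumption is that each line-of-business system can be asked for its own pre-aggregated total when, and only when, the report is actually requested.

```python
# A minimal sketch of 'federate the results, not the systems': each
# source system is asked for its own pre-aggregated total only when
# the summary report is requested, rather than replicating every
# record into one real-time federated database. All names and figures
# here are hypothetical.

def sales_total() -> float:
    # In practice this might run something like
    # "SELECT SUM(amount) FROM orders" on the sales system's own database.
    return 1_250_000.00

def payroll_total() -> float:
    return -840_000.00

def treasury_total() -> float:
    return 312_500.00

def summary_figure() -> float:
    """Add the sums up when the report is needed."""
    sources = (sales_total, payroll_total, treasury_total)
    return sum(fetch() for fetch in sources)

if __name__ == "__main__":
    print(f"Single summary figure: €{summary_figure():,.2f}")
```

The design point is that the aggregation happens at report time; nothing obliges the underlying systems to share a schema, a database or a real-time pipeline.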
Risk management is the key, the conventional wisdom argues. Hard to argue with, of course, and certainly thinking backward from the possibility of failure has to be enlightening in trying to get a good handle on the likely triggers or causes of failure as well as the consequences. But that nagging voice comes back with the rejoinder that 'risk' in software, as in business generally, is just not amenable to measurement: it is simply not of the same kind as mechanical MTBF. Risk management tries to 'measure', or at least foresee, probability, but it cannot even estimate proportionality, or the lack of it. Knock-on effects of a systems failure are all too often excellent exemplars of the Butterfly Theory in action: wildly disproportionate. (Incidentally, Butterfly Theory, together with the law of unintended consequences, should be recognised as a subset of Murphy's Law, the unified field theory of chance.)
Another dark suspicion that arises is that all IT failures other than simple hardware breakdown (and perhaps even much of that) are all too human in origin. We take human aspirations (alright, just the need for a reliable SCM or CRM or financial administration solution) and attempt to code them into software. We go further than chess grandmasters in trying to foresee every possible circumstance and contingency and to build all of that in. Then we forge ahead, mesmerised by our own road maps and critical paths and project terms.
Major IT projects, including new software versions and products, conform to the old adage about advertising: more than half of the expenditure is going to be wasted, and unfortunately it is impossible to predict where the sinkholes will be. Serious research internationally suggests that a staggering four out of five major IT projects never deliver what was expected or required. To be fair, lots of projects, large and small, run into snags and delays, go over budget, cause disruption, tension and rows, and then the stuff slips into place, the system works, it does most of what it set out to do (and perhaps other things that were added along the way) and everybody moves on.
But then there are the other ones: slippage from the first deadline, endless briefings and discussions and revisions, costs escalating, people working all the hours the gods made as the end draws near, and so on. The results of all of this sometimes work out OK but may also be put on hold or abandoned altogether. This type is often followed by a new team of consultants or experts brought in to examine what went wrong, or to prepare evidence for the court case. Australian expert Rob Thomsett, in what he terms 'project pathology', says with weary cynicism that there is one clear and easy test for a failing project in your organisation: try to stop it. "If a project is failing, a request to stop the project for planning sessions and time-outs will be met with the clearest indicator of a fatal disease: 'We can't stop the project for planning, we have a deadline to meet!'"
People: they screw up. Maybe it's a learning process? Naah. It's failure. Pick yourself up, dust yourself off…and start all over again.