Friday 29 August 2014

The magic of open data #1

"Sharing information across government databases
will dramatically increase governmental powers –
otherwise the UK government wouldn't have proposed it."
Professor Sir Nigel Shadbolt, Chairman, Open Data Institute

Lough Erne in County Fermanagh, Northern Ireland, "is a particularly scenic waterway, renowned for its beautiful setting. The area is popular for angling and watersports, with waterskiing, Rowing and wakeboarding being amongst the most popular; the stretch of water alongside the Broadmeadow, Enniskillen, has hosted stages of the World Waterski Championships annually since 2005, and in 2007, a pro-wakeboard competition, 'Wakejam' was hosted by the Erne Wakeboard Club (EWC) after successful national wakeboard competitions in the previous years. Canoeing is also a popular recreational sport on the Erne".

That's what it says in Wikipedia and that's where, on 18 June 2013, after a hard day's fishing and wakeboarding, the G8 canoed back to shore and issued their famous Declaration (para.7):
We, the G8, agree that open data are an untapped resource with huge potential to encourage the building of stronger, more interconnected societies that better meet the needs of our citizens and allow innovation and prosperity to flourish.
It is this Declaration that caused David Gauke MP, Exchequer Secretary to the Treasury, blithely to recommend standing the British Constitution on its head so that whereas we used to imagine that personal data submitted to the government would by default be treated as confidential, in future it will instead be treated as open, public and available to all:
... the UK helped secure the G8’s Open Data Charter, which presumes that the data held by Governments will be publicly available unless there is good reason to withhold it. (p.4)
[Image: Queen Méabh (Maev) by Joseph Christian Leyendecker]
Is it something in the water? "In Irish mythology and folklore, there are three tales about how the lake was formed and got its name. One says that it is named after a mythical woman named Erne, Queen Méabh's lady-in-waiting at Cruachan. Erne and her maidens were frightened away from Cruachan when a fearsome giant emerged from the cave of Oweynagat. They fled northward and drowned in a river or lake, their bodies dissolving to become Lough Erne ...".

It may not have been a fearsome giant that emerged from the cave of Oweynagat, of course. It may actually have been Rt Hon Francis Maude MP, Cabinet Office Minister, "JFDI", as he's known, who frightened poor Erne and her maidens to death. Whitehall folklore has it that he once told the Information Commissioner's Conference:
I want to bust the myths around the complexities of data sharing ... we aim to find effective ways of using and sharing data for the good of everyone ...
And it may not have been water that they were drinking when the G8 were "helped" – to use Mr Gauke's word – to agree to this inversion of the settled order (para.11-13):
Principle 1: Open Data by Default

We recognise that free access to, and subsequent re-use of, open data are of significant value to society and the economy.

We agree to orient our governments towards open data by default.

We recognise that the term government data is meant in the widest sense possible. This could apply to data owned by national, federal, local, or international government bodies, or by the wider public sector ...
The UK held the Presidency at the time and within limits they could launder their own policy through the G8 but what on earth possessed them to dream up open data by default?

Mr Gauke tries to blame Shakespeare:
Getting to this stage:
The Government published its response to the Shakespeare Review of Public Sector Information on 14 June 2013 ... (p.2)
That's Stephan, by the way, not William.

Stephan Shakespeare is the CEO of YouGov, the polling organisation, and he wrote An Independent Review of Public Sector Information [PSI].

We need to familiarise ourselves at this point with some of the lyrical vocabulary of our ancient and magical land. Here in the UK, Ordnance Survey, the Met Office, the Land Registry and Companies House are the four Trading Funds that together constitute the Public Data Group (PDG). The PDG brings in £143 million a year in revenue for the Exchequer by selling maps and weather forecasts and such like.

And Mr Shakespeare thinks that that's ridiculous. He wants to break the antique spell we live under in the UK and drag the country into the information age by giving away PSI for free to entrepreneurs. The eruption of innovation that results will expand the economy. That's the idea, at least:
It seems a straightforward decision to invest £143m to make Trading Fund data widely available is a relatively small price to pay to leverage wider economic benefits far exceeding this by orders of magnitude. (p.30)
But just when you think you've found a convincing prophet, he goes and spoils it by saying that:
Forecasting future benefits is also hard to predict. How businesses and individuals might use datasets in the future to generate new products and services and by implication impact economic growth, is equally unknown. (p.30)
In other words there isn't the slightest justification he can advance for saying that the unspecified wider economic benefits of giving away this PSI for free would exceed £143 million by uncounted orders of magnitude.

It can't be Shakespeare under whose influence the wetland sprites (Maude and Gauke?) were acting at Lough Erne. Who then?

Perhaps Tim Kelsey.

Mr Kelsey was for a while the Executive Director of Transparency and Open Data in Mr Maude's fiefdom, the Cabinet Office. A magnificent job title, and a doughty champion of open data he is and has been for years – this, for example, is a pronouncement he made in an article published in July 2009, Long live the database state:
If the next government, of whichever party, wants a better public sector it must encourage more use of personal data; not less. What should be done? Data sharing must be made easier, first by removing the legislative obstacles to sharing government databases.

... no one who uses a public service should be allowed to opt out of sharing their records ...

Nor can people rely on their record being anonymised ...
Unfortunately for Mr Kelsey, his so-called "care.data" plan to collect all our previously confidential medical records and give them away to researchers fell apart in February 2014 when patients and doctors lost confidence in him. It can't have been him casting the open data spells four months later at Lough Erne and intoxicating the G8.

Which suggests that the guiding light may have been the charming Professor Sir Nigel Shadbolt, chairman and co-founder of the Open Data Institute, and the author of The spy in the coffee machine – the end of privacy as we know it (2008):

... sharing information across government databases will dramatically increase governmental powers – otherwise the UK government wouldn't have proposed it. (p.95)

... we should never forget that bureaucracies are information-thirsty, and will never stop consuming. Indeed, they will never even cut down. They will break or bend their own rules, and any prior specification of how information use will be limited, or data not shared, is not worth the paper it is printed on. (p.212)
Actually, he isn't the author, he's the co-author of that book, with Kieron O'Hara, his sometime PhD student. And Dr O'Hara is the sole author of Transparent Government, Not Transparent Citizens: A Report on Privacy and Transparency for the Cabinet Office, a work referred to by Stephan Shakespeare in his PSI report (p.34). And Professor Sir Nigel appeared in front of the Public Administration Select Committee (PASC) to give evidence jointly with Stephan Shakespeare. And Tim Kelsey and Professor Sir Nigel are or were both members of the Data Strategy Board and, as such, assisted with the production of Stephan Shakespeare's PSI report (p.4). And so it goes cabalistically on.

When Professor Sir Nigel and Stephan Shakespeare appeared in front of PASC they were bemoaning the fact that Royal Mail had just been privatised and had taken the PAF with them onto their balance sheet. The PAF is the Postcode Address File and would have been given away to entrepreneurs for free if our two witnesses had had their way.

The Hon Bernard Jenkin MP, chairman of PASC, agreed.

Somewhat surprisingly.

After all, Professor Sir Nigel and Stephan Shakespeare gave no indication whatever how giving away the PAF for nothing would have made the economy grow.

And the PAF generates income. Naturally the government wanted to realise the best price possible for Royal Mail. That's the coffee in the coffee machine you can smell. Wake up.

They've had more luck with Companies House:
Free Companies House data to boost UK economy

Companies House is to make all of its digital data available free of charge. This will make the UK the first country to establish a truly open register of business information ...

This is a considerable step forward in improving corporate transparency; a key strand of the G8 declaration at the Lough Erne summit in 2013.

It will also open up opportunities for entrepreneurs to come up with innovative ways of using the information ...
The "digital data available" from Companies House includes the title, name, address, date of birth, nationality and profession of every director and every company secretary of every company in the UK – "the end of privacy as we know it".

Do they imagine that thousands of very bright people haven't been thinking of "innovative ways of using the information" for several decades now? What are they supposed to have missed? Companies House don't say. Just like Stephan Shakespeare, who couldn't tell us how many orders of magnitude his leveraged wider economic benefits would exceed the PDG income by.

There's one obvious application of the Companies House data. Suppose you're a 50 year-old female Hungarian surveyor living in Kent and suppose that you want to establish a false identity for some entrepreneurial purpose. Not so easy in the past but now, with the Companies House data available to you for free, you can search for suitable matches in the comfort of your own home. Thank you Messrs Maude, Gauke, Jenkin, Shadbolt, O'Hara, Shakespeare and Kelsey – the answer to Queen Méabh's prayer.

Apart from that, there's no telling what sort of innovation these people are talking about. It just looks like hope. Or guesswork. Will giving away an entire country's personal data inspire innovation? How? Why? Are there any examples? If it's that easy to create innovation, are the universities wasting their time doing research? Are we wasting our money funding it? Why bother granting corporate tax relief on R&D? Is there no downside? Can nothing go wrong? Which economy will benefit? Suppose the innovators are all Estonian – how does that help the UK economy?

You may not be able to answer those questions and all the other related questions that occur to you. We know that Tim Kelsey can't. Neither can Stephan Shakespeare – he just says that anyone standing in the way of open data wants people to die of cancer and wants children to be unhappy. Shroud-waving. Blackmail.

But Professor Sir Nigel is a different kettle of fish. Very different. Can he answer the questions? Can he move the debate on from the enchanted world of Lough Erne, out of the twilight and into the open?

That is the subject of a future post which if it is ever written will be based on this talk he gave:

Prof. Sir Nigel Shadbolt - The Fifth Paradigm: From Open Data to Social Machines


Updated 26.3.15

God but Tuesday was an odd day.

Tuesday 24 March 2015, out of the blue, inattendu by any of us proles, came the surprising announcement that Public Servant of the Year ex-Guardian man Mike Bracken CBE CDO had been appointed the UK government's Chief Data Officer thereby making him Public Servant of the Year ex-Guardian man Mike Bracken CBE CDO CDO given that he was already the Chief Digital Officer.

Not that that need concern normal people.

But Twitter went wild, as hundreds of breathless congratulations poured in from all over the world, the second best one being:

It's a good question. Along with how do you fit it in with being executive director of the Government Digital Service and senior responsible owner of the pan-government identity assurance programme, GOV.UK Verify (RIP)?

The best tweet was:

Rarely can 94 characters have been freighted with quite so much meaning.

Updated 8.10.15

Tim Kelsey's care.data was meant to start operations 18 months ago in April 2014. That's when, for the first time, the medical records maintained by our GPs (general practitioners/family doctors/the primary care providers) were supposed to be collected centrally by NHS England. There was vociferous opposition from patients and GPs centred on the absence of any thought in care.data about the confidentiality of medical records. NHS England postponed the start by about six months to the autumn of 2014, NHS England acts in response to concerns about information sharing – statement from Tim Kelsey, National Director for Patients and Information.

In October 2014 we learned that there wouldn't be a national roll-out, just a regional pilot, "GP-led clinical commissioning groups in four areas of the country are to help develop the programme as it moves into a ‘pathfinder stage’ ...".

And when would this pilot start?

Clearly not in autumn 2014. In December 2014, the Independent Information Governance Oversight Panel said that the pilot could start just as soon as 27 currently outstanding questions were satisfactorily answered and seven tests were passed.

In written evidence to the Health Committee dated 9 February 2015, the chairman of the Health & Social Care Information Centre revealed that over 700,000 people had opted out of care.data and that "the HSCIC does not currently have the resources or processes to handle such a significant level of objections".

In March 2015 Tim Kelsey told us care.data on hold until election.

By June 2015, after the election, you could take your pick. Either those 700,000 people could opt out of care.data but they could forget about receiving any healthcare. Or their opt-outs would be ignored and their data would be sold to insurance companies anyway. Will Jeremy Hunt ensure that “700,000” patient opt-outs are respected?, medConfidential wanted to know, while the Telegraph newspaper warned us that Nearly 1million patients could be having confidential data shared against their wishes.

And now?
Tim Kelsey to leave NHS England

17 September 2015 - 12:00

He has been appointed commercial director at Telstra Health, a division of Australia’s leading telecommunications provider where he will lead development of new digital and mobile solutions for patients, professionals and citizens around the world ...

Updated 3.11.15

There's something odd about a tweet of Nigel Shadbolt's today.

We know that he and Stephan Shakespeare were hacked off at Royal Mail keeping control of the Postcode Address File (PAF) when it was privatised.

Maybe so, but that's the law, the PAF was and is Royal Mail's intellectual property (IP) and, if there's some value to be derived from it, it would have been remiss of the management to give it up.

Entrepreneurs can still access the PAF, it's not lost to them. They just have to pay for the privilege.

Did the UK economy lose anything thereby? There is this assumption in some versions of the case for open data that free access to data causes innovation and/or that paid access inhibits innovation.

It is questionable whether that assumption is true. There has not yet been an explosion of innovation caused by Companies House's data becoming freely available. The considerable innovations of DueDil, on the other hand, all took place while the data had to be paid for.

Which makes Sir Nigel's use of the word "contaminated" sound more like something coming out of the mouth of a fundamentalist zealot than the urbane academic we are used to.

Flicking through your copy of Volume 38 of the Journal of Contemporary Asia, you will have come across this on p.546 ...
While the bourgeoisie was relatively small, its representative ideology none the less penetrated other classes. Members of the proletariat could be corrupted by modes of thinking characteristic of the bourgeoisie and take up the "stand" of this class (Mao, 1974a: 73; The Polemic, 1965: 33, 421-2). The proletariat was therefore compelled to wage an ideological struggle to divest members of its own class of bourgeois contamination, and to remould the thought patterns of the bourgeoisie (Ch'en, 1970: 107, 117, 123; Mao, 1977b: 409-10, 504).
... and you may agree that we can do without any recurrence of Maoist "ideological struggle", otherwise the speech given today by the eminently bourgeois Matt Hancock, Cabinet Office Minister, on the topic of data-driven government takes on a sinister, minatory hue.

Mr Hancock has established a "Steering Group of digital and data visionaries" who will drive the agenda on data-sharing and data-driven policy-making. That Steering Group includes Sir Nigel, among others. Here's hoping that none of these stoutly proletarian visionaries becomes contaminated.

Updated St Patrick's Day 2016

The G8 fell for it, see above, or at least pretended to fall for it:
We, the G8, agree that open data are an untapped resource with huge potential to encourage the building of stronger, more interconnected societies that better meet the needs of our citizens and allow innovation and prosperity to flourish.
That was back in June 2013.

Matt Hancock, Cabinet Office Minister, fell for it, or so he said, when he launched the current consultation on data-sharing:
There is huge potential for improving citizens’ lives through data sharing in the UK. The consultation we launch today will help make sure we get data right and bolster security whilst making people’s lives better.
The Chancellor fell in with falling for it in yesterday's Budget:
1.251 This Budget sets out steps to ensure the benefits of digital technology are felt by all businesses and individuals. The government will ... provide up to £5 million to develop options for an authoritative address register that is open and freely available – making wider use of more precise address data and ensuring it is frequently updated will unlock opportunities for innovation ...
The Government Digital Service plan to declare the new national identity assurance scheme to be live in a few weeks' time. It's nothing but a machine for collecting your personal information and sharing it widely in the UK and abroad, out of your control.

The pretence that these initiatives are intended to expand the economy is just that, a pretence. Opening up data to all and sundry does not cause innovation.

The G8 Declaration is quite clear. The intention is to invert the Constitution (p.4) ...

... the UK helped secure the G8’s Open Data Charter, which presumes that the data held by Governments will be publicly available unless there is good reason to withhold it.

... all in the name of the bone-headed plan to compile the registers which, together, will constitute a "single source of truth" for Government as a Platform.

Updated 22.3.16

The information-sharing paradox

The G8, we were saying on St Patrick's Day, and the Cabinet Office and the Treasury all want to make more data open. Once it is available to all at no cost, innovation will be the inevitable result and the economy will expand by orders of magnitude, according to Stephan Shakespeare of YouGov, although he can't explain how – where there should be a coherent argument, there's just a hole.

It's not just the G8, the Cabinet Office and the Treasury. Companies House have fallen for it, too. And now they're having to confront the obvious problem, Our register: advice on protecting your personal information. (Don't get your hopes up. Their advice is useless.)

And it's not just the UK. The sensible Australians have got the bug. In the name of their "National Innovation and Science Agenda", Oz government wants much more personal data sharing, hat tip Kat Hall.

This is a global epidemic. With no solution. Because how do you combat a global epidemic? With massive information-sharing ...

Updated five days after St George's Day 2016

As noted on St Patrick's Day, please see above, there is a national log-rolling exercise being conducted in respect of open data. That exercise was assisted yesterday with the publication by Sir Jeremy Heywood, the Cabinet Secretary, of Open data - the revolution is here.

"Open data is data that anyone can access, use or share ... It really has changed people's lives for the better: the value is well proven", says Sir Jeremy.

"The value is well proven"? Follow Sir Jeremy's link and you will find reference to a PwC report, research from Lateral Economics, a McKinsey estimate, a CapGemini study, a Transport for London claim, the results of some Open Data Institute research and the findings of the Landsat Advisory Group that the 42 year-old Landsat mission has not been a waste of public money.

The value of open data remains well questionable. When Sir Jeremy follows up with "I believe the Civil Service can play a central role in harnessing these benefits", the question arises what benefits?

It gets worse. "At both national and local level, government holds huge amounts of data and more is generated every day. People rightly expect us to protect their personal data. But with general and anonymised data we can now achieve things that would have been considered impossible only a decade ago" – what impossible things can Sir Jeremy do now?

"The Government has already published over 27,000 datasets, covering almost £200 billion of public spending, since launching data.gov.uk in 2010. We have done this to be open and transparent about the information we hold ...". OK so far, Sir Jeremy's data.gov.uk is a Good Thing, to the extent that it helps the public to hold the government to account ...

... but then we get "but also so that others, inside and outside of government, can take that data and use it to build new and exciting products and services". What exciting products and services? Sir Jeremy's Companies House, for example, has opened up the personal information of hundreds of thousands of company directors (none of it anonymised by the way) and there's not a single exciting product or service to be seen as a result.

It is only prudent to be well sceptical about Sir Jeremy's assumption that open data inspires innovation and causes the economy to expand and improves people's lives.

Also about his claim that personal information will be treated with respect by his Government Digital Service (GDS), the bizarrely-chosen seat of his Government Data Programme, "... we are aiming to transform the way government stores, manages and uses data. The data team at Government Digital Service is ...".

The Privacy and Consumer Advisory Group have devised a set of nine principles for identity assurance. While claiming to abide by all nine, GDS have flouted the lot with their GOV.UK Verify (RIP) identity assurance scheme which is due to go live tomorrow. Sir Jeremy's revolution is here, indeed.

Updated 1.5.16 1

There was a documentary about the Queen on telly the other day. The programme covered the abdication of the disgraceful wrong 'un Edward VIII and the accession of George VI. There is evidence that, on the King's untimely death, Lord Mountbatten sought to bring back the wrong 'un and have him re-enthroned. At which point in the programme one transcendently magnificent lady, a cousin of the Queen's, delivered herself of the following: "it was always said of Lord Louis that if he swallowed a nail he would shit a corkscrew".

Let's call that property mountBatten(). It's a relation between any number of nails of any sort and any number of corkscrews of any sort. It's obviously not a very pleasant property. But in politics it can be occasionally necessary, all hewn of crooked timber as we most unfortunately are, it has its place.

Not least, we expect our cabinet secretaries to possess it. They must have thousands of other properties as well but it must be the case that cabinetSecretary.mountBatten(). Otherwise they can't do the job and they're no use to us.

When Sir Jeremy Heywood sort of promises that we can all enjoy the imprecise benefits of open data while nevertheless retaining our anonymity, as he did the other day, as noted above, it must be clear to everyone that this is more at the corkscrew end of the body politic's digestive tract than the nail end.

You can forget anonymity if the open data initiative that we are promised for this month, May 2016, proceeds. Professor Ross Anderson says so. So does Professor Martyn Thomas. Professor Sir Nigel Shadbolt looks as though he may agree.

They're just professors, of course, what do they know, you may ask. But it's not just them. GDS also have warned about the risk posed to anonymity by open data:
Update, 29 March 2016: We are now able to publish a CSV file (663 kb) containing the data used for the web tool for 7 of the 9 demographic variables provided by the ONS omnibus survey. This is combined with our model's estimate of the individual's probability of being verified by certified companies over time. This is the maximum number of variables we could make public, whilst preserving the anonymity of respondents.

Updated 1.5.16 2

The Electoral Commission once engaged the Government Digital Service (GDS) to do some data-mining work. It didn't go well. It was a painful experience. It ended in failure. The Commission's July 2013 report on the exercise includes:
• There were considerable delays to the original timetable for establishing this pilot. A significant cause of the delays was the lack of capacity and resources within Cabinet Office (and the Government Digital Service (GDS), which is part of Cabinet Office) due to their workload related to the transition to IER ...

• For the national data mining, Cabinet Office’s original intention was that pilot areas should adopt a fairly standardised approach to checking the data received and contacting the individuals identified, to ensure that results were comparable. In practice, however, the nature and extent of follow up work varied widely.

• Much of this variation was caused by practical difficulties, for example the need to spend more time than expected in ensuring the accuracy of the data received. However, some of the variation could have been avoided if there had been fewer delays and a greater level of support provided by Cabinet Office to pilot areas. In particular, a few areas told us they felt unsupported and were unclear about what to do ...

• It is not possible to produce an overall figure for the cost of this pilot. This is because we do not have final costs for all pilot areas or any costs for Cabinet Office (including GDS), who conducted much of the work.

• We are also therefore unable to estimate the cost per new elector registered or the likely cost of any national rollout. Any estimates of these would need to include the cost of coordinating and managing the pilot (the role taken by Cabinet Office in this pilot), as any future work with data mining would require some form of central coordination ...

• The reasons that so many existing electors and ineligible individuals were returned on the data include poor data specifications from Cabinet Office ...

• Inconsistent address formatting and incomplete addresses are likely to have contributed to the significant numbers of existing electors returned in the data (Cabinet Office could not provide the data which would have allowed for a definitive assessment) ...

• In order to answer this question [Is data mining a cost effective way of registering new electors?], we would need to assess the cost benefit of data mining by, for example, calculating the cost per new elector registered. However, we are unable to do this as Cabinet Office could not provide details of their expenditure on the pilot. As they managed the process and conducted much of the matching and data processing, their costs could be significant and are crucial in reaching any realistic assessment of cost effectiveness ...

– The addresses appeared to be more complete than those held in other national databases but a poor data specification from Cabinet Office meant that the format was inconsistent ...

The findings from this pilot do not justify the national roll out of data mining ...

In addition, there were numerous issues in this pilot with the communication and support provided by Cabinet Office ...

Cabinet Office need to ensure that they maintain good communication between themselves, the data holding organisations and EROs [electoral registration officers] throughout the process, including after data from the national databases has been returned to EROs ...
It's a long time ago, of course, well before yesterday, but there is no evidence of GDS making any more successful trips into the world of data science.

Later that year, on 16 October 2013, Mike Bracken, who was executive director of GDS at the time, gave a speech to the Code For America Summit. "The Efficiency and Reform Group have saved about £10 billion of Whitehall costs", he told delegates, and "this figure represents about 4% of the UK's gross domestic product".

No. £10 billion was about 0.6% of GDP at the time, not 4%. That sort of a mistake must bring tears to the eyes of the Office for National Statistics (ONS). Mr Bracken nevertheless became the government's chief data officer in March 2015.
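For anyone who wants to check the sum, here is a minimal sketch in Python. The GDP figure is an assumption, a round £1.7 trillion for UK nominal GDP in 2013, and is not taken from the speech. The function name share_of_gdp is made up for illustration.

```python
# Rough check of the share of GDP represented by the claimed saving.
# ASSUMPTION: UK nominal GDP in 2013 is taken as roughly £1.7 trillion.
# That round figure is illustrative and is not quoted from the speech.
SAVINGS_GBP = 10e9          # "about £10 billion", as claimed in the speech
ASSUMED_GDP_GBP = 1.7e12    # assumed value, see the note above


def share_of_gdp(amount: float, gdp: float) -> float:
    """Return amount expressed as a percentage of gdp (hypothetical helper)."""
    return 100.0 * amount / gdp


share = share_of_gdp(SAVINGS_GBP, ASSUMED_GDP_GBP)
four_percent = 0.04 * ASSUMED_GDP_GBP

print(f"£10bn is {share:.1f}% of the assumed GDP")
print(f"4% of the assumed GDP would be £{four_percent / 1e9:.0f}bn")
```

On that assumption the saving comes out at roughly 0.6% of GDP, and a saving worth 4% of GDP would have had to be nearer £68 billion than £10 billion.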

He left GDS last September, 2015. It's not obvious that GDS have since developed any greater respect for numbers. They say that their forlorn identity management scheme, GOV.UK Verify (RIP), can only go live if the account creation success rate reaches a minimum of 90%. It's currently 70%. GDS want GOV.UK Verify (RIP) to go live anyway.

"In May we will be publishing the latest instalment of our next National Action Plan as part of the Open Government Partnership". That's what the Cabinet Secretary said the other day.

The ONS are in on that plan. So are the ODI, the Open Data Institute. And ...

... and GDS. How did they get in there? Their inclusion can't be based on their record. It's not exactly an example of data science in action, is it.
