Thursday, 19 December 2013

The peculiar art of the Whitehall press release

Date confirmed for Individual Electoral Registration (IER), says yesterday's Cabinet Office press release: "The government has today confirmed its intention to move to IER on 10 June 2014 in England and Wales and 19 September 2014 in Scotland".

We are moving in Great Britain from household registration to individual electoral registration. That is the will of Parliament as enshrined in the Electoral Registration and Administration Act 2013.

How will local Electoral Registration Officers (EROs) make sure that the electoral roll includes all those eligible to vote and only those eligible to vote? It's an old question. With old answers – we've been voting for several centuries now.

There was one new answer.

How about comparing the electoral rolls with other databases like the National Insurance number database? That way EROs could be given a list of people to follow up who should be on the electoral roll but aren't, and try to prevail on them to register.
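As a rough illustration only, and not a description of what GDS actually did, the idea can be sketched in a few lines of Python. The `Person` record, the names and addresses, and the crude normalisation rule in `match_key` are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout; real databases hold far more fields.
    name: str
    address: str


def match_key(person):
    # Assumed normalisation: lower-case and collapse repeated whitespace.
    name = " ".join(person.name.lower().split())
    address = " ".join(person.address.lower().split())
    return (name, address)


def follow_up_candidates(electoral_roll, other_database):
    # Anyone in the other database whose key is not on the roll
    # is handed to the ERO as a possible unregistered elector.
    registered = {match_key(p) for p in electoral_roll}
    return [p for p in other_database if match_key(p) not in registered]


# Invented sample data.
roll = [Person("Ann Jones", "1 High Street, Aberystwyth")]
other = [
    Person("ann jones", "1  High Street,  Aberystwyth"),  # same person, same key
    Person("Bob Evans", "2 High Street, Aberystwyth"),    # genuinely not on the roll
    Person("Ann Jones", "1 High St, Aberystwyth"),        # same person, different address format
]

candidates = follow_up_candidates(roll, other)
for person in candidates:
    print(person.name, "|", person.address)
```

In this toy data the second record is a genuine candidate for follow-up. The third is someone already on the roll who is returned anyway, simply because the address is written differently.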

Worth trying. The Electoral Commission drafted in the Government Digital Service (GDS) to do a "data mining" or "data matching" exercise.

Whatever you want to call it, the exercise was an unmitigated failure. "The findings from this pilot do not justify the national roll out of data mining", said the Commission in their July 2013 Data mining pilot – evaluation report, first recommendation, p.8, in bold.

The Commission gave several reasons for their conclusion, including the fact that GDS put forward not only foreign people ineligible to vote as candidates for EROs to follow up but also people who were already registered and didn't need any follow-up.

They had other reasons in addition. The delays caused by GDS. GDS's procedural changes mid-stream which meant results weren't comparable. The refusal by GDS to say how much their work had cost, with the result that the Commission don't know what the pilot cost and can't estimate the cost of live running.

And that's just the second pilot. In the first pilot, GDS made it look as though 82% of residents on the electoral roll in Ceredigion were impostors. EROs need reliable data. This is the election of governments we're talking about here, both local and national.

One way and another, the Commission's conclusion seems unimpeachable. The findings from this pilot do not justify the national roll out of data mining.

And how is this matter dealt with in yesterday's press release?

The Rt Hon Greg Clark MP, the Cabinet Office Minister responsible, is quoted as saying: "Following the successful dry run of the data matching process over the summer, and the Electoral Commission’s assessment that there is no reason to delay implementation, this confirms progress towards a more modern, secure system of electoral registration".

Somehow the unmitigated failure of the second pilot has become a "successful dry run". Please see comment below, 21 December 2013, 1:19 a.m. Please see also Whitehall press release – an apology.

11 comments:

Leonig Mig said...

Your conclusion is false as the phrases 'data mining exercise' and 'data matching process' refer to different things, albeit they both have the word 'data' and are three words long.

I'm not sure if you are poor at doing the analysis or you are intentionally trying to muddy the waters to smear these initiatives? If it is the latter you are doing a good job; if it is the former then you need to tighten up on the quality if you want this to be a respected blog.

David Moss said...

If it was up to me I would call it a "data matching exercise". The Electoral Commission choose to call it a "data mining exercise". That's up to them.

Either way, we're talking about the same exercise. Two names for one thing. Which makes my conclusion true, and not false as you assert.

Let's be clear, and not "muddy the waters" – the GDS initiative has not been "smeared" by me. It has been severely criticised by the Electoral Commission. With good reason. Several good reasons. None of them rebutted by GDS. Nor by you, ...

... whoever you are. Wikipedia tells me that: "Leonig Mig is an excellent pseudonym for someone to use online. It is such a stupid name nobody would have it for real, so it allows google.com to keep track of everything. I starting using wikipedia at university to get general overviews on subjects when starting specific bits of research. After a time I found myself editing inaccuracies, but more frequently improving on little bits of style here and there. I have written a series of articles about the area I grew up and contribute a bit here and there".

And as for respected blogs, your Things I've Learned About I.T. is a must-read.

David Moss said...

Leonig Mig raises the interesting question about the distinction between "data mining" and "data matching". Let's add to that and introduce "data sharing".

Cf. Whitehall, the Guardian newspaper and Lord Leveson – darkness at noon et seq. and Is data-sharing between consenting adults now legal?

Leonig Mig said...

Notwithstanding the ad hominem (well done for Googling me), you are incorrect: they are in fact different activities and the conclusion of your blog entry is therefore un-founded.

David Moss said...

Leonig (if I may)

Who said:

• There were considerable delays to the original timetable for establishing this pilot. A significant cause of the delays was the lack of capacity and resources within Cabinet Office (and the Government Digital Service (GDS), which is part of Cabinet Office) due to their workload related to the transition to IER ...

• For the national data mining, Cabinet Office’s original intention was that pilot areas should adopt a fairly standardised approach to checking the data received and contacting the individuals identified, to ensure that results were comparable. In practice, however, the nature and extent of follow up work varied widely.

• Much of this variation was caused by practical difficulties, for example the need to spend more time than expected in ensuring the accuracy of the data received. However, some of the variation could have been avoided if there had been fewer delays and a greater level of support provided by Cabinet Office to pilot areas. In particular, a few areas told us they felt unsupported and were unclear about what to do ...

• It is not possible to produce an overall figure for the cost of this pilot. This is because we do not have final costs for all pilot areas or any costs for Cabinet Office (including GDS), who conducted much of the work.

• We are also therefore unable to estimate the cost per new elector registered or the likely cost of any national rollout. Any estimates of these would need to include the cost of coordinating and managing the pilot (the role taken by Cabinet Office in this pilot), as any future work with data mining would require some form of central coordination ...

• The reasons that so many existing electors and ineligible individuals were returned on the data include poor data specifications from Cabinet Office ...

• Inconsistent address formatting and incomplete addresses are likely to have contributed to the significant numbers of existing electors returned in the data (Cabinet Office could not provide the data which would have allowed for a definitive assessment) ...

• In order to answer this question [Is data mining a cost effective way of registering new electors?], we would need to assess the cost benefit of data mining by, for example, calculating the cost per new elector registered. However, we are unable to do this as Cabinet Office could not provide details of their expenditure on the pilot. As they managed the process and conducted much of the matching and data processing, their costs could be significant and are crucial in reaching any realistic assessment of cost effectiveness ...

– The addresses appeared to be more complete than those held in other national databases but a poor data specification from Cabinet Office meant that the format was inconsistent ...

The findings from this pilot do not justify the national roll out of data mining ...

In addition, there were numerous issues in this pilot with the communication and support provided by Cabinet Office ...

• Cabinet Office need to ensure that they maintain good communication between themselves, the data holding organisations and EROs [electoral registration officers] throughout the process, including after data from the national databases has been returned to EROs ...


Answer: the Electoral Commission.

That is the foundation of the blog post. It is not "un-founded".

Are you suggesting that it is wrong in some way to Google you? Does one need a licence?

Why do you post on Wikipedia if you don't want people to know? Why do you have a blog if you don't want people to read it?

Why do you comment anonymously?

Leonig Mig said...

Seriously, you should look at the detail of this a little more closely. Data mining and data matching are two different aspects of IER. Data mining came to nothing but the trial of matching worked, which is why they are going ahead.

David Moss said...

The Zimbabwean cousins who emigrated to Australia are over for dinner and, halfway down the bottle of grappa, something about Leonig Mig's tenacity penetrates the brain.

My cards are on the table, he can see my pair of kings, he doesn't dispute the failure of what the Electoral Commission call a "data mining exercise", we can bank that, but he's still in the game.

He hasn't shown his hand. Is he holding three eights?

Check the Electoral Commission website.

There's the data mining evaluation report, with a date of 5 December 2013 against it, which was somehow commented on in these parts seven weeks earlier, on 20 October 2013, and which bears the date "July 2013" on its front cover.

There is also a never-seen-before report, Confirmation dry-run 2013: Results, with a date of 17 October 2013 against it, although the PDF properties date is 20 days later, 6 November 2013.

We must assume that this is an Electoral Commission report, it's on their website, but there's no Electoral Commission cover on it. On the other hand, the author name in the PDF properties is Davide Tiberti and he is an Electoral Commission person.

So presumably that's who the "we" are in the Conclusions on p.19 saying:

We are currently analysing the data from the local data matching activities conducted by electoral administrators during the dry run alongside the responses we received to our survey questions on local data matching. We intend to publish a further analysis on this later in the year.

Which falls some way short of Rt Hon Greg Clark MP's "successful dry run" claim and some way short of Leonig Mig's "the trial of matching worked" claim. But it does look as though there are two things going on here and not just one.

A two-pipe problem. To be resumed later on.

Anonymous said...

Leonig Mig -> Gim Gino Le -> Jim Gumbley (of GDS) -> http://www.jgumbley.com
https://www.ohloh.net/p/alphagov_github_com/commits?page=5 etc

David Moss said...

Thanks for that, Anonymous, you may well be right but the name "Leonig Mig" has grown on me and I'm going to stick to it.

Leonig Mig said...

Glad we got the chance to chew it over David.

http://xkcd.com/386/

David Moss said...

Leonig, agreed – good chew.
xkcd ok in small quantities.
Venting vaguely à propos.
