GKB: Geodemographics Knowledge Base

Administrative data are data collected routinely at source for purposes other than research or statistics, eg by government for registrations, transactions and record-keeping, usually during service delivery and operational activities. They are also collected by commercial and private sector organisations, universities and health services.

Administrative data can fall into several of the categories of public, personal, big and social media data at the same time.

Examples of sources of administrative data include: birth, marriage, civil partnership, divorce and death records; the GP Patient Register, Hospital Episode Statistics and Cancer Registry; the School Census, National Pupil Database, Higher Education Statistics Agency and Student Loans; Driver and Vehicle Licensing Agency data; passport information; welfare and pensions data including Universal Credit, Jobseeker’s Allowance, Carer’s Allowance and Pension Credit; tax data such as Real-time PAYE and VAT; police records; the Electoral Register and Land Registry; National Insurance; Council Tax; mobile phone records; website cookies; IP addresses; smartphone location services; and company customer details.

Benefits of using administrative data alongside, or instead of, survey data for research include the exploration of new research questions, improved population and geographical coverage, increased timeliness and frequency, savings in survey data collection, a way of addressing falling survey response rates, and reduced respondent burden. Administrative data can also be used in sampling frame design, editing and imputation, survey evaluation and longitudinal analyses.

Examples of successful use of administrative data in research, particularly linked administrative data, can be found on the Centre for Longitudinal Studies website and on the website of the Avon Longitudinal Study of Parents and Children. Another example is this study combining survey data, paradata (data about the process by which a dataset has been collected) and administrative data to investigate the optimisation of government survey collection given survey non-response.

There are some excellent administrative data sources and some really good examples of re-use of administrative data for research and statistics. There is a greater ability to process data than ever before (and lots more data to process), legal gateways for data-sharing are now in place, and the appetite to share data is increasing, as successful projects demonstrate value.

However, there are many challenges. Firstly, obtaining data is complex and time-consuming: it can take a number of years to agree a data-share. To improve data flows, the Administrative Data Research Network, formed in 2014, transitioned to the Administrative Data Research Partnership in July 2018, working closely with the Office for National Statistics. Existing data-sharing agreements have been further complicated by the General Data Protection Regulation (GDPR), which came into force in 2018.

There are costs in re-purposing datasets: specialist skills are required, alongside suitable software and processing capacity. There is no single catalogue of administrative data sources, so researchers may not even know what data are available.

Public perception is very important in the re-use of data. Recent examples where data-sharing has led to negative headlines include:

Other challenges include the difficulty of establishing the impact of collection mode, eg whether the data were collected face-to-face, on a mobile telephone or on a laptop. Self-reporting is subjective and can therefore be biased, and systematic bias can lead to specific sub-populations being under-represented in the data. Understanding the datasets also raises important questions: what assumptions have been made, what disclosure control has been or will be applied (especially after linkage), and what unintended consequences might ensue from using the data in this way?

There are also considerations around the data themselves: quality and accuracy, non-response, handling of missing data, changes to variables over time, bias in supplied information, population and geographical coverage, and the data lifecycle, including the over-writing of variables and the deletion of data.

Data linkage is not easy, especially in the absence of a UK-wide population register. The roles of the data controller and data processor need careful thought, ensuring compliance with data protection law and a legal basis for processing. Matching errors can involve both false matches and missed matches. There can be problems with the linkage itself, although work is ongoing to improve these processes and some guidance is available eg GUILD.
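To make the two kinds of matching error concrete, here is a minimal sketch of deterministic linkage in Python. The records, the field names and the rule of exact agreement on surname plus date of birth are all hypothetical, chosen only for illustration; real linkage often uses probabilistic or more tolerant rules. Each toy record carries a hypothetical `true_id` (ground truth a real linker would not have) so that the errors can be counted.

```python
from collections import defaultdict

# Hypothetical toy records; 'true_id' is ground truth used only to score the linkage.
register_a = [
    {"true_id": 1, "surname": "Smith", "dob": "1980-02-14"},
    {"true_id": 2, "surname": "Smith", "dob": "1975-07-01"},
    {"true_id": 3, "surname": "O'Brien", "dob": "1990-11-30"},
]
register_b = [
    {"true_id": 1, "surname": "Smith", "dob": "1980-02-14"},
    {"true_id": 4, "surname": "Smith", "dob": "1975-07-01"},  # different person, same key
    {"true_id": 3, "surname": "OBrien", "dob": "1990-11-30"},  # same person, spelling differs
]


def link_exact(a_records, b_records, keys=("surname", "dob")):
    """Deterministic linkage: pair every A and B record whose key fields agree exactly."""
    index = defaultdict(list)
    for b in b_records:
        index[tuple(b[k] for k in keys)].append(b)
    return [(a, b) for a in a_records for b in index[tuple(a[k] for k in keys)]]


def matching_errors(links, a_records, b_records):
    """Return (false matches, missed matches).

    A false match links records belonging to different people; a missed match
    is a person present in both sources whose records were not linked.
    """
    linked_ids = {(a["true_id"], b["true_id"]) for a, b in links}
    false_matches = sum(1 for ta, tb in linked_ids if ta != tb)
    b_ids = {b["true_id"] for b in b_records}
    in_both = {a["true_id"] for a in a_records if a["true_id"] in b_ids}
    missed_matches = sum(1 for t in in_both if (t, t) not in linked_ids)
    return false_matches, missed_matches


links = link_exact(register_a, register_b)
print(matching_errors(links, register_a, register_b))
```

In this toy case the exact rule produces one false match (two different people called Smith sharing a birth date) and one missed match (an apostrophe dropped from a surname). Loosening the rule tends to reduce missed matches at the cost of more false matches, which is one reason linkage quality needs careful assessment.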

Re-identification risk is clearly an issue, for both linked and unlinked data.

Looking ahead, it seems clear that there will be much more use of private sector data for government and statistical purposes, and more creative linkages: but for what purpose, and to whose benefit? The UK Statistics Authority acknowledges both the value of administrative data and the issues surrounding it, and wants to make things better. Meanwhile, the increasingly well-known Five Safes model, emphasising safe people, projects, settings, outputs and data, is an excellent framework for considering confidentiality.

For my part, I would like to see research potential and quality assessment becoming integral to administrative data systems design and upgrade; clear standardised metadata for existing administrative data; a positional shift to “how” rather than “if” administrative data sources can be used; transparency as the norm, and recognition that from analysts to policy-makers, people are intrinsic to success.

So, are administrative data the answer? Well, they might be. It depends on the question! Once that is posed, consider what data could be used to inform the results. Using administrative data may indeed be the correct approach, but it is unlikely to be a quick or an easy option.

Dr Emma White leads GDPR Technical Implementation at the University of Southampton, where she previously managed the Administrative Data Research Centre for England. She was Head of Administrative Data for NatCen Social Research, and was Head of Policy and Analysis for 2011 Census Outputs with the Office for National Statistics (ONS). She holds a PhD in Mathematics.

As part of Emma’s role with ONS, she provided regular census updates to the MRS Census and Geodemographic Group (CGG) over a number of years, and was delighted to be invited to re-join the group in 2016 to be on the receiving end of said updates. She finds the collegiate working and responsiveness of the group very rewarding, especially when collaborating on responses to consultations and inquiries. She also enjoys the vast amount of knowledge and experience that the group brings to the table. Emma believes that CGG continues to make a significant contribution in representing members’ views to government, and in sharing their expertise in census and geodemographics activities.

Geodemographics - blogs and resources

Visit the Geodemographics Knowledge Base (GKB) for expert blogs and links to useful sources of geodemographic data and knowledge.

