What is statistical data integration?

Statistical data integration involves combining data from different administrative and/or survey sources, at the unit level (i.e. for an individual person or organisation) or micro level (e.g. information for a small geographic area), to produce new datasets for statistical and research purposes. Combining the datasets in this way yields more information than is available from any of the individual datasets on its own.

In this guide, data integration refers to the full range of management and governance practices around the process, including project approval, data transfer, linking and merging of the data, and dissemination.

Linking (also referred to as ‘data linkage’ or ‘data matching’) is that part of the process that involves creating links between records from different sources based on common features in those sources.

For records that can be linked, data merging is the process of combining individual records (or information in those records) into an integrated dataset specific to the purpose of the analysis. It is recommended that the integrated dataset be de-identified, unless the use of identified data is required and approved for the purpose of the project.
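
The linking and merging steps described above can be sketched in a few lines of code. The Python below is a minimal, hypothetical illustration, not a method prescribed by this guide. It assumes two small sources held as lists of dictionaries, uses exact (deterministic) agreement on name and date of birth as the common features for linking, and de-identifies the merged output by dropping those fields and assigning a new project-specific identifier. The field names, functions and sample records are all invented for the example; real linkage projects often use probabilistic methods and clerical review to cope with errors and variation in the linking fields.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical direct identifiers: used to create links, then removed when merging.
IDENTIFIERS = ("name", "dob")


def link_key(record: Record) -> Tuple[str, str]:
    """Build a linkage key from features common to both sources (normalised name, date of birth)."""
    return (record["name"].strip().lower(), record["dob"])


def link(source_a: List[Record], source_b: List[Record]) -> List[Tuple[int, int]]:
    """Deterministic linkage: return (index_in_a, index_in_b) pairs whose keys agree exactly."""
    index_b: Dict[Tuple[str, str], List[int]] = {}
    for j, rec in enumerate(source_b):
        index_b.setdefault(link_key(rec), []).append(j)
    links: List[Tuple[int, int]] = []
    for i, rec in enumerate(source_a):
        candidates = index_b.get(link_key(rec), [])
        if len(candidates) == 1:  # skip records with no candidate or an ambiguous one
            links.append((i, candidates[0]))
    return links


def merge(source_a: List[Record], source_b: List[Record],
          links: List[Tuple[int, int]]) -> List[Record]:
    """Combine each linked pair into one record and strip direct identifiers."""
    integrated: List[Record] = []
    for project_id, (i, j) in enumerate(links, start=1):
        combined = {**source_a[i], **source_b[j]}  # source_b wins on any shared field
        for field in IDENTIFIERS:
            combined.pop(field, None)
        combined["project_id"] = str(project_id)  # new identifier specific to this project
        integrated.append(combined)
    return integrated


# Invented sample data: a survey source and an administrative source.
survey = [
    {"name": "Ann Lee", "dob": "1980-02-01", "income": "52000"},
    {"name": "Bo Chan", "dob": "1975-07-12", "income": "61000"},
]
admin = [
    {"name": "ann lee ", "dob": "1980-02-01", "benefit": "yes"},
    {"name": "Cy Ray", "dob": "1990-03-03", "benefit": "no"},
]

links = link(survey, admin)
print(links)                        # [(0, 0)]
print(merge(survey, admin, links))  # [{'income': '52000', 'benefit': 'yes', 'project_id': '1'}]
```

Here the second survey record has no counterpart in the administrative source and is left out of the integrated dataset, while the first links despite differences in capitalisation and spacing. Keeping only links with a single candidate is one simple way to avoid false matches; it is a design choice for this sketch rather than a general rule.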

Why integrate data?

Integrated datasets provide public benefits by improving research and by supporting good government policy making, program management and service delivery.

A major advantage of data integration is that it makes better use of data that is already available, so it is a cost-effective and timely way of gathering more information to help improve social, economic and environmental wellbeing. It also reduces duplicated collection of information from people and businesses, because integration projects reuse information that was already collected from them for other purposes.

For more information about the Commonwealth arrangements for statistical data integration see: