The separation principle

The separation principle is a mechanism to protect the identities of individuals and organisations in datasets, applied as part of the linking and merging process used to form the integrated dataset. Under the separation principle, no one working with the data can view the linking (identifying) information (such as name, address, date of birth or ABN) together with the merged analysis (content) data (such as clinical information, benefit details or company profits) in an integrated dataset.

Under the separation principle, individuals only have access to the information needed to perform their role. Those involved in linking the datasets only see the identifying information needed to create the links between different datasets (such as name and address), while those involved in analysing the integrated data only have access to de-identified data specific to the project requirements. For example, rather than someone being able to see that John Brown has a rare medical condition, the person creating the links will only see John Brown’s name and address and limited other demographic information common to each dataset that will assist in identifying the link. The analyst will only see a record that shows a person has a rare medical condition, together with other variables needed for analysis, such as broad age group, gender and hospital admissions.
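The split of roles described above can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure of any particular integrating authority. The field names, the sample address and date of birth, and the `separate` helper are all assumptions made for the example.

```python
import secrets

# Assumed field names; a real project would agree these with the data custodians.
IDENTIFYING_FIELDS = {"name", "address", "date_of_birth"}


def separate(records):
    """Split each record into a linking part and a content part.

    The two parts share only a random record key that carries no
    personal information.
    """
    linking, content = [], []
    for record in records:
        key = secrets.token_hex(8)
        linking.append(
            {"record_key": key,
             **{f: v for f, v in record.items() if f in IDENTIFYING_FIELDS}}
        )
        content.append(
            {"record_key": key,
             **{f: v for f, v in record.items() if f not in IDENTIFYING_FIELDS}}
        )
    return linking, content


# Illustrative record only; the address and date of birth are invented.
records = [
    {
        "name": "John Brown",
        "address": "1 Example Street",
        "date_of_birth": "1970-01-01",
        "age_group": "50-59",
        "condition": "rare medical condition",
    }
]

linking_view, content_view = separate(records)
print(linking_view[0])  # what the person creating the links would see
print(content_view[0])  # what the analyst would see
```

In this sketch the linker's view contains the name and address but no clinical information, and the analyst's view contains the condition and age group but no name or address.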

The separation principle is one way to protect the identities of individuals and organisations in datasets during the linking and merging process used to form the integrated dataset. However, analysis data in an integrated dataset can still provide enough information to identify a person or business, even after direct identifiers (such as name and address) are removed. The integrating authority is therefore also responsible for appropriately confidentialising the data before it is made available to researchers, in accordance with the requirements of data custodians [1]. Protecting the confidentiality of the data is a key element in maintaining the ongoing trust of the Australian public. For more information about confidentiality, including popular techniques for confidentialising data, see the Confidentiality Information Series available on the National Statistical Service website, http://www.nss.gov.au/.
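To illustrate why de-identified analysis data can still point to an individual, the sketch below counts how many records share each combination of analysis variables and flags combinations that occur fewer than a chosen number of times. This is one simple check, not a technique prescribed by the source. The `rare_combinations` helper, the sample rows and the threshold of 2 are assumptions for illustration.

```python
from collections import Counter


def rare_combinations(rows, variables, threshold):
    """Return the combinations of `variables` shared by fewer than `threshold` rows."""
    counts = Counter(tuple(row[v] for v in variables) for row in rows)
    return {combo: n for combo, n in counts.items() if n < threshold}


# Invented de-identified rows: no names or addresses are present.
rows = [
    {"age_group": "50-59", "gender": "M", "condition": "rare medical condition"},
    {"age_group": "50-59", "gender": "F", "condition": "common condition"},
    {"age_group": "50-59", "gender": "F", "condition": "common condition"},
]

flagged = rare_combinations(rows, ["age_group", "gender"], threshold=2)
print(flagged)
```

Here only one record has the combination of age group 50-59 and gender M, so it is flagged as potentially identifying even though no direct identifier is present.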

There are different ways the separation principle can be applied, but in all cases the result is that no one accesses an integrated dataset containing both linking and analysis data. The separation principle is broadly illustrated in the diagram below.

 

 

Applying the separation principle

Once an integrating authority has been selected, the data custodians prepare their respective datasets for linkage. How this happens varies according to the project requirements and the linkage models being used. It is generally better practice for the data custodians to separate the identifying information (for example, name and address) from the content data (for example, administrative or clinical information) before it is transferred. Alternatively, where legislation and other requirements permit, data custodians may choose to submit the entire encrypted dataset (containing both identifiers and project-specific content information in one dataset) to the integrating authority. The integrating authority is then responsible for applying the separation principle, unless access to the identified integrated data is required and approved for the purpose of the project and permitted by legislation. The links below provide more information about how the separation principle could be applied in each of these approaches.

Approach 1 (external separation): data custodians apply the separation principle before providing data to the integrating authority

 

Approach 2 (internal separation): data custodians provide the full (encrypted) dataset to the integrating authority and the integrating authority applies the separation principle
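As a rough sketch of how linking and merging can stay separate once the identifying and content data have been split (under either approach), the Python example below has one function that sees only identifiers and another that sees only content. It assumes exact matching on name and date of birth, which is a simplification; real projects may use more sophisticated matching. The function names, record keys and sample values are hypothetical.

```python
import secrets


def build_link_map(linking_a, linking_b):
    """Linking step: sees identifiers and record keys only, never content data."""
    index_b = {
        (r["name"].lower(), r["date_of_birth"]): r["record_key"] for r in linking_b
    }
    link_map = []
    for r in linking_a:
        match = index_b.get((r["name"].lower(), r["date_of_birth"]))
        if match is not None:
            link_map.append(
                {"link_id": secrets.token_hex(8), "key_a": r["record_key"], "key_b": match}
            )
    return link_map


def merge_content(link_map, content_a, content_b):
    """Merging step: sees content data and keys only, never names or addresses."""
    by_key_a = {r["record_key"]: r for r in content_a}
    by_key_b = {r["record_key"]: r for r in content_b}
    merged = []
    for link in link_map:
        row = {"link_id": link["link_id"]}
        row.update({k: v for k, v in by_key_a[link["key_a"]].items() if k != "record_key"})
        row.update({k: v for k, v in by_key_b[link["key_b"]].items() if k != "record_key"})
        merged.append(row)
    return merged


# Invented example data from two hypothetical custodians, already separated.
linking_a = [{"record_key": "a1", "name": "John Brown", "date_of_birth": "1970-01-01"}]
linking_b = [{"record_key": "b7", "name": "JOHN BROWN", "date_of_birth": "1970-01-01"}]
content_a = [{"record_key": "a1", "condition": "rare medical condition"}]
content_b = [{"record_key": "b7", "benefit_type": "example benefit"}]

link_map = build_link_map(linking_a, linking_b)
integrated = merge_content(link_map, content_a, content_b)
print(integrated)
```

The integrated record carries a new random link identifier plus the content variables from both sources. The original record keys are dropped, so the analysis data cannot be joined back to the linking files without the link map.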


1. “Access to potentially identifiable integrated data for statistical and research purposes, outside secure and trusted institutional environments should only occur where: legislation allows; it is necessary to achieve the approved purposes; and meets agreements with source data agencies.” (High Level Principle 6)