The Western Australian Data Linkage System is one of the few comprehensive, population-based data linkage systems worldwide. It creates links between information from different sources that relates to the same individual, family, place or event, while maintaining privacy. The Raine Study is an established cohort study with more than 2000 currently active participants.
Individual consent was obtained from participants for information in publicly held databases to be linked to their study data. A waiver of consent was granted where it was impracticable to obtain consent. Approvals to link the datasets were obtained from relevant ethics committees and data custodians. The Raine Study dataset was subsequently linked to academic testing data collected by the Western Australian Department of Education.
Examination of diet and academic performance showed that children who were predominantly breastfed for at least 6 months scored higher academically at age 10 than children who were breastfed for less than 6 months. A further study found that better diet quality at ages 1, 2 and 3 years was associated with higher academic scores at ages 10 and 12 years. Examination of nutritional intake at 14 years of age found that a better dietary pattern was associated with higher academic performance. The detailed longitudinal data collected in the Raine Study allowed for adjustment for multiple covariates and confounders.
Data linkage reduces the burden on cohort participants by providing additional information without the need to contact them. It can give information on participants who have been lost to follow-up; provide or complement missing data; allow validation studies that compare participants' recall with administrative records; increase sample size by adding control participants from the general population; and allow for adjustment for multiple covariates and confounders. The Raine Study dataset is extensive and detailed, and its value can be further increased by linking it to external data sources. By linking educational outcomes to the Raine Study database, it was shown across three age groups that a healthy diet was consistently associated with higher academic performance.
The Western Australian Pregnancy Cohort (Raine) Study was established with the enrolment of 2900 pregnant women at 18 weeks' gestation attending a public antenatal clinic and nearby private clinics in Perth, Australia, between May 1989 and November 1991.1 The study's purpose was to investigate whether intensive use of ultrasound imaging and Doppler flow studies would improve pregnancy outcomes, and to develop a long-term cohort to examine the role of early life events in later health.2 Extensive data were collected during pregnancy, and 2868 offspring were assessed at birth and followed up at 1, 2, 3, 5, 8, 10, 14, 17, 18, 20 and 22 years of age.3 Questionnaire data, physical measurements and biological samples were collected during pregnancy and infancy, resulting in a database containing a broad range of prospective demographic data, detailed phenotype data, and measures of the antenatal and postnatal environment. Data collection was expanded during childhood, adolescence and young adulthood to fit a lifecourse framework, including phenotype, environmental, behavioural, occupational and genetic information. Information routinely collected in administrative databases (such as hospital admissions, medication use and education outcomes) has the potential to be linked to the existing Raine Study data. The purpose of this article is to outline the potential benefit to longitudinal cohorts of linkage to administrative datasets, using as an example the first Raine Study data linkage, which involved diet (collected by the Raine Study at follow-up) and educational outcomes (collected by the local education authority).
Begun in 1995 to link local health datasets, the current Western Australian Data Linkage System (WADLS) is one of the few comprehensive, population-based data linkage systems worldwide.3 Data linkage is a technique for creating links between information from different sources that relate to the same individual, family, place or event.3 Data custodians are the organisations or agencies (or their representatives) responsible for the collection and use of datasets, and have access to identifying demographic and phenotypic information. Data custodians are responsible for protecting the privacy of individuals, according to both legislation and public interest in the right to privacy of personal information.4 The Raine Study Executive Committee is the custodian for Raine Study data.
Data are separated into two distinct types: identifying information (e.g. name, address, date of birth) and clinical or service information (e.g. hospital admissions or educational records). Data custodians provide identifying information to the WADLS, which links records using a probabilistic matching approach, with clerical review when necessary. Linkage keys, unique to each individual, are generated across the different data collections.5 The linkage keys are encrypted and replace all identifying information. The encrypted keys are returned to the data custodians, who then provide the researcher with service information carrying the encrypted keys but no identifying information.
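The paper does not specify how the WADLS scores candidate record pairs. As an illustration only, the sketch below assumes a Fellegi-Sunter-style probabilistic model: each compared field contributes a log2 agreement or disagreement weight, and pairs whose total falls between two thresholds are sent to clerical review. The field names, m/u probabilities and thresholds are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldParams:
    m: float  # P(field agrees | records belong to the same person), hypothetical
    u: float  # P(field agrees | records belong to different people), hypothetical


# Hypothetical m/u probabilities; a real linkage unit estimates these from its own data.
PARAMS = {
    "surname": FieldParams(m=0.95, u=0.01),
    "given_name": FieldParams(m=0.90, u=0.02),
    "date_of_birth": FieldParams(m=0.98, u=0.003),
    "postcode": FieldParams(m=0.80, u=0.05),
}


def match_weight(rec_a: dict, rec_b: dict, params=PARAMS) -> float:
    """Sum log2 agreement/disagreement weights over the comparison fields."""
    total = 0.0
    for field, p in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # missing value: field contributes no evidence either way
        if a == b:
            total += math.log2(p.m / p.u)
        else:
            total += math.log2((1 - p.m) / (1 - p.u))
    return total


def classify(weight: float, upper: float = 15.0, lower: float = 5.0) -> str:
    """Hypothetical thresholds: pairs scoring between them go to clerical review."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"


# Fictional records: same surname and date of birth, nickname and new postcode.
a = {"surname": "SMITH", "given_name": "ALEXANDRA", "date_of_birth": "1990-03-14", "postcode": "6009"}
b = {"surname": "SMITH", "given_name": "ALEX", "date_of_birth": "1990-03-14", "postcode": "6050"}
print(classify(match_weight(a, a)))  # link
print(classify(match_weight(a, b)))  # clerical review
```

Exact string comparison is a simplification; production systems typically use phonetic codes or string-similarity measures so that near-misses such as nicknames earn partial credit.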
The WADLS acts as an intermediary between data custodians and researchers. Western Australian Department of Health (WADoH) Human Research Ethics Committee (HREC) approval is required for creating new linkages and using the linkage infrastructure. Separate approvals and agreements are obtained from all agencies and data custodians before linkage can be conducted. Researchers receiving linked data agree, in writing, to a strict set of data security conditions.3
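The separation of identifying and service information described above can be sketched as three roles passing data to one another. This is a hypothetical model, not the WADLS implementation: the field names are invented, and HMAC-SHA256 with a project secret stands in for the unspecified encryption step, so that the same person receives the same project key across custodians.

```python
import hashlib
import hmac

IDENTIFIERS = {"record_id", "name", "address", "dob"}  # hypothetical identifying fields


def project_key(linkage_key: str, secret: bytes) -> str:
    """Project-specific pseudonymous key; HMAC-SHA256 stands in for the encryption step."""
    return hmac.new(secret, linkage_key.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def linkage_unit_keys(id_map: dict, secret: bytes) -> dict:
    """Linkage unit: map (custodian, record_id) -> encrypted project key."""
    return {rec: project_key(lk, secret) for rec, lk in id_map.items()}


def custodian_release(rows: list, custodian: str, keys: dict) -> list:
    """Custodian: attach the encrypted key and drop every identifying field."""
    released = []
    for row in rows:
        key = keys.get((custodian, row["record_id"]))
        if key is not None:
            service = {k: v for k, v in row.items() if k not in IDENTIFIERS}
            released.append({"key": key, **service})
    return released


def researcher_join(left: list, right: list) -> list:
    """Researcher: merge de-identified extracts on the encrypted key only."""
    by_key = {r["key"]: r for r in right}
    return [{**row, **by_key[row["key"]]} for row in left if row["key"] in by_key]


# Linkage keys from the matching step (hypothetical values).
id_map = {("cohort", "R001"): "LK42", ("education", "E900"): "LK42", ("education", "E901"): "LK77"}
keys = linkage_unit_keys(id_map, secret=b"hypothetical-project-secret")
cohort = custodian_release(
    [{"record_id": "R001", "name": "A. Smith", "dob": "1990-03-14", "diet_score": 12}], "cohort", keys
)
education = custodian_release(
    [
        {"record_id": "E900", "name": "A. Smith", "dob": "1990-03-14", "reading": 410},
        {"record_id": "E901", "name": "B. Jones", "dob": "1990-07-02", "reading": 395},
    ],
    "education",
    keys,
)
linked = researcher_join(cohort, education)
print(linked)  # one merged record with key, diet_score and reading, but no name or dob
```

Using a project-specific secret means keys released for one project cannot be joined with keys released for another, which limits re-identification risk if extracts are combined.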
Data linkage is generally used for large, population-based projects, facilitating the merging of large sources of information on individuals. For example, Silva et al. examined 12 831 individuals aged 10–21 years with a diagnosis of attention deficit hyperactivity disorder (ADHD) who were matched by age, sex and socio-economic status with a further 29 722 individuals with no ADHD diagnosis.6 Data linkage also enables linking of trial data to other datasets. For example, Kelty et al. linked self-reported adverse events from a clinical trial with hospital mortality data and emergency data collections.7 This also allowed the validation of self-reported events against hospital admissions data. Longitudinal cohort data can also be linked to other datasets – for example, Knuiman et al. linked the Busselton Cohort Study to death records to ascertain mortality rates over a 26-year period.8
The potential to link Raine Study information with data from more than 30 collections – including local and national health and welfare datasets, genealogical links and spatial references3 – can significantly complement and improve the research potential of the Raine Study. As a first step and proof of principle, Raine Study data have been successfully linked with administrative records managed by the WADLS. The linkage succeeded both logistically, in that the data were linked, and scientifically, in that it yielded useful findings. It enabled researchers to examine the associations between Raine Study diet data and educational outcome data collected by the Western Australian Department of Education.9 Here, we outline the steps necessary to conduct the linkage, its value and the challenges faced.
For child participation in the Raine Study, written informed consent was obtained from parents or guardians at pregnancy, birth, and at 1, 2, 3, 5, 8, 10, 14 and 17 years of age. Assent was obtained from the participants at ages 14 and 17. In 2008, when the cohort participants started to turn 18 years old, approval was granted by the University of Western Australia HREC to contact all Raine Study 18-year-old participants (including participants whose parents had previously withdrawn), and obtain consent for the Raine Study to:
Where the Raine Study failed to contact the participant, or the participant did not respond, a waiver of consent (sought to protect, maintain, maximise and improve the value of the data previously collected on the cohort) was granted by the University of Western Australia Human Research Ethics Committee.
Information and consent forms were sent to more than 2500 cohort members who were not lost to follow-up, withdrawn or deceased (Figure 1). Over 12 months, 1127 participants returned a signed consent form; of these, five declined permission for data linkage and were not included when linkage keys were created. A further 949 active participants did not return a signed form. Under the waiver of consent, participants lost to follow-up and nonresponders, including withdrawn participants, were included for data linkage.
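The inclusion rule above amounts to a simple filter: only an explicit refusal removes a participant from linkage-key creation, while consenters, nonresponders, and participants lost to follow-up or withdrawn remain eligible under the waiver. The status labels below are hypothetical; the counts are the ones reported in this paragraph (1127 returned forms, five of which declined, and 949 nonresponders), so the total covers only those groups and not the full set of participants linked.

```python
from collections import Counter

# Hypothetical status labels; the study's own coding is not described in the text.
EXCLUDED = {"declined"}


def eligible_for_linkage(status: str) -> bool:
    """Only an explicit refusal excludes a participant; others fall under consent or waiver."""
    return status not in EXCLUDED


def tally(statuses: Counter) -> tuple:
    """Return (number included, number excluded) for linkage-key creation."""
    included = sum(n for s, n in statuses.items() if eligible_for_linkage(s))
    excluded = sum(n for s, n in statuses.items() if not eligible_for_linkage(s))
    return included, excluded


# Counts reported in the text for the mail-out respondents and nonresponders.
responses = Counter({"consented": 1127 - 5, "declined": 5, "no_response": 949})
print(tally(responses))  # (2071, 5)
```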
Figure 1. Flowchart of Raine Study participant consent for data linkage at age 18
In 2012, the Raine Study made a successful application to the WADoH HREC (#2012/70) and obtained approval for linkage to the WADLS.
Specific project applications to link the Raine Study longitudinal diet data with educational results were made to the Raine Study Executive Committee, the WADLS (ref #2013/75) and the WA Developmental Pathways Project (WADPP). The WADPP includes nonhealth custodians, which enables linkage of a number of nonhealth datasets, including education, to the WADLS.10 The participant consent and approval processes, together with data manipulation and labelling, took 1 year to complete.
Raine Study data, with information on diet in infancy, childhood and adolescence, were linked to Western Australian Literacy and Numeracy Assessment (WALNA) records for participants in Years 5 (age 10), 7 (age 12) and 9 (age 14).
The first study examined the relationship between duration of breastfeeding (available from the cohort at birth, 1, 2 and 3 years, based on maternal report) and WALNA scores in Year 5 (age 10 years, n = 1038).11 In the multivariable model, after adjusting for marital status, maternal age and education, early reading with the child and household income, children who had been predominantly breastfed for at least 6 months attained better educational outcome scores (based on WALNA results at mean age 10.4 years) than those breastfed for less than 6 months in mathematics (β 15.79; 95% confidence interval [CI] 1.04, 30.55; p = 0.036) and reading (β 18.28; 95% CI 3.92, 32.64; p = 0.021). The effect was gender-specific, with only boys who had been breastfed for at least 6 months attaining significantly better scores in mathematics (β 34.48; 95% CI 13.66, 55.27; p = 0.001), reading (β 24.72; 95% CI 4.87, 44.58; p = 0.015), writing (β 37.41; 95% CI 9.43, 65.40; p = 0.009) and spelling (β 28.23; 95% CI 6.38, 50.07; p = 0.011).
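Readers who want to check how a reported β, 95% CI and p value fit together can back-calculate the standard error from the interval width. The sketch below assumes a symmetric Wald-type interval under a normal approximation (SE = (upper − lower) / (2 × 1.96)); if the original models used t-based intervals, the recovered p values will differ slightly from those reported.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def se_from_ci(lower: float, upper: float, z: float = Z_975) -> float:
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def approx_p(beta: float, lower: float, upper: float) -> float:
    """Two-sided p value from the Wald statistic beta / SE (normal approximation)."""
    z = beta / se_from_ci(lower, upper)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided tail probability.
    return math.erfc(abs(z) / math.sqrt(2))


# Mathematics, all children: beta 15.79; 95% CI 1.04, 30.55 (reported p = 0.036)
print(round(approx_p(15.79, 1.04, 30.55), 3))  # 0.036
# Mathematics, boys only: beta 34.48; 95% CI 13.66, 55.27 (reported p = 0.001)
print(round(approx_p(34.48, 13.66, 55.27), 3))  # 0.001
```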
The second study examined the relationship between early diet and WALNA scores in Year 5 (age 10, n = 2247) and Year 7 (age 12, n = 2287).12 A modified 24-hour dietary recall was completed by the parent or guardian at ages 1, 2 and 3. An overall dietary score was developed, with higher scores representing greater intakes of fruits, vegetables, wholegrains and protein sources (excluding processed and red meats), and lower intakes of sweetened drinks and snack foods.12,13 The extensive Raine Study phenotype dataset allowed adjustment for gender; maternal age, race and education; family income; the presence of the biological father; breastfeeding duration; and parental language stimulation (reading to the child) from birth to age 3. A better-quality diet at 1 year of age was associated with significantly higher WALNA scores in Year 5 (mathematics [β 0.47; 95% CI 0.09, 0.84; p = 0.015], reading [β 0.63; 95% CI 0.24, 1.02; p = 0.002], writing [β 0.94; 95% CI 0.37, 1.50; p = 0.001] and spelling [β 0.89; 95% CI 0.31, 1.47; p = 0.003]) and Year 7 (mathematics [β 0.71; 95% CI 0.25, 1.16; p = 0.002], reading [β 0.75; 95% CI 0.39, 1.10; p < 0.001] and spelling [β 0.90; 95% CI 0.35, 1.45; p = 0.001]).
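The published dietary score is described in references 12 and 13; the sketch below is only a hypothetical illustration of its direction of scoring. It assumes equal weights, adds daily serves from favourable groups, subtracts serves from unfavourable groups, and ignores processed and red meats, which the text excludes from the protein sources. The group names and weighting are assumptions, not the study's index.

```python
from typing import Mapping

# Hypothetical food-group labels; equal weights are an assumption for illustration.
FAVOURABLE = {"fruit", "vegetables", "wholegrains", "lean_protein"}  # protein excludes processed/red meat
UNFAVOURABLE = {"sweetened_drinks", "snack_foods"}


def diet_score(serves: Mapping[str, float]) -> float:
    """Higher for more favourable serves, lower for more sweetened drinks and snacks."""
    score = 0.0
    for group, n in serves.items():
        if group in FAVOURABLE:
            score += n
        elif group in UNFAVOURABLE:
            score -= n
        # any other group (e.g. red or processed meat) does not contribute
    return score


# One child's hypothetical 24-hour recall, in serves per day.
recall = {"fruit": 2, "vegetables": 3, "wholegrains": 1, "lean_protein": 1,
          "sweetened_drinks": 1, "snack_foods": 2, "red_meat": 1}
print(diet_score(recall))  # 4.0
```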
The third study examined the association between dietary patterns in adolescence and WALNA results in Year 9 (age 14; n = 779 for mathematics, n = 741 for reading and n = 470 for writing).7 At age 14, dietary data were collected using a parent-completed food frequency questionnaire.14 Dietary patterns were derived from the major food groups by factor analysis.15 Two major dietary patterns were identified: ‘healthy’ (high intake of fruits, vegetables, wholegrains, legumes and fish) and ‘nonhealthy’ (high intake of takeaway foods, red and processed meat, soft drinks, and fried and refined foods). After adjustment for sociodemographic and family characteristics, higher scores for the nonhealthy dietary pattern were significantly associated with poorer WALNA scores for mathematics (β −13.14; 95% CI −24.57, −1.76; p = 0.024) and reading (β −19.16; 95% CI −29.85, −8.47; p ≤ 0.001), with a similar trend for writing (β −17.28; 95% CI −35.74, 1.18; p = 0.066).
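Factor analysis produces a loading for each food group on each pattern; each participant then needs a score on each pattern. The sketch below assumes a simple scoring method (a loading-weighted sum of standardised food-group intakes) rather than the regression-based factor scores a statistical package would typically produce, and the loadings and intakes are hypothetical.

```python
import statistics

# Hypothetical loadings for a 'nonhealthy' pattern (not the study's estimates).
LOADINGS = {"takeaway": 0.7, "processed_meat": 0.6, "soft_drinks": 0.6, "fruit": -0.2}


def pattern_scores(cohort: list, loadings: dict = LOADINGS) -> list:
    """Score = sum over groups of loading * z-score of that participant's intake."""
    stats = {
        g: (statistics.mean(p[g] for p in cohort), statistics.stdev(p[g] for p in cohort))
        for g in loadings
    }
    return [
        sum(w * (person[g] - stats[g][0]) / stats[g][1] for g, w in loadings.items())
        for person in cohort
    ]


# Hypothetical weekly serves for three adolescents.
cohort = [
    {"takeaway": 1, "processed_meat": 1, "soft_drinks": 0, "fruit": 3},
    {"takeaway": 3, "processed_meat": 2, "soft_drinks": 4, "fruit": 1},
    {"takeaway": 2, "processed_meat": 0, "soft_drinks": 2, "fruit": 2},
]
scores = pattern_scores(cohort)
print([round(s, 2) for s in scores])  # [-1.5, 2.1, -0.6]
```

Because every intake is centred on the cohort mean, the scores sum to zero, and a positive score means the participant's diet leans towards the pattern more than the average participant's does.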
Linking the Raine Study data with other datasets was a laborious and lengthy – but ultimately worthwhile – process. Findings from linking the education outcomes to the Raine Study diet data provided consistent evidence of an association between healthy dietary intake and academic achievement across three lifecourse age periods. Our initial work to obtain HREC approvals, individual consent from cohort participants, a waiver of consent for linkage, and data custodian permission to establish the linkage keys will pave the way for future data linkage project applications.
Data linkage with other administrative data reduces the burden on participants. Data collection methods within the Raine Study predominantly require participant involvement, including self-report questionnaires, diaries, interviews, physical examination, clinical testing and biological sampling. The current linkage allowed accurate educational outcomes to be ascertained without burdening the participants, and without introducing recall error or the missing data that arise when participants fail to respond or cannot recall the information. Where participants have been lost to follow-up, linkage with existing datasets can provide missing information on outcomes such as hospital admissions. Data not collected at previous follow-ups can also be obtained through linkage. If the measurement of an exposure or outcome is available from more than one source, data linkage can help achieve optimal accuracy by comparing and validating the measures used.16 Linkage to total population data also allows the cohort to be compared with population patterns to assess its representativeness.
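When the same item is available from participant self-report and from a linked administrative record, agreement between the two sources can be summarised with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The paper does not say which agreement measure it would use; kappa is offered here as one standard choice, and the data below (whether each participant reported, or was recorded with, a hospital admission) are hypothetical.

```python
def cohens_kappa(a: list, b: list) -> float:
    """Cohen's kappa for two raters/sources classifying the same subjects."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    categories = set(a) | set(b)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each source's marginal proportion per category.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if expected == 1:
        raise ValueError("kappa undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical: 1 = any hospital admission, 0 = none, for ten participants.
self_report = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
admin_record = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0]
print(round(cohens_kappa(self_report, admin_record), 3))  # 0.583
```

Here the two sources agree for 8 of 10 participants, but chance alone would produce 52% agreement, so kappa is (0.80 − 0.52) / (1 − 0.52) ≈ 0.58.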
Linking the Raine Study data with other datasets, including those available through the WADLS (from collections concerning birth notifications, mental health, child protection, disability, hospital admissions, education, case registries [diabetes, cancer, autism], deaths, medication use and primary care records), can add information to an existing rich database and expand the use of the Raine Study data for cross-disciplinary research projects. The high-quality findings provided by longitudinal studies with linked data, as outlined in this paper, are important sources of evidence for health and social policy guidelines.
We acknowledge the Raine Study participants and their families. The following institutions are acknowledged for core funding: the University of Western Australia (UWA); Curtin University; the Telethon Kids Institute; the Raine Medical Research Foundation; the UWA Faculty of Medicine, Dentistry and Health Sciences; the Women’s and Infant’s Research Foundation; and Edith Cowan University. We acknowledge the WADLS for linkage support, the WADPP for enabling linking to a number of nonhealth datasets (supported by Australian Research Council Linkage Project 100200507) and the Western Australian Department of Education for providing the educational data.
© 2016 Mountain et al. This article is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence, which allows others to redistribute, adapt and share this work non-commercially provided they attribute the work and any adapted version of it is distributed under the same Creative Commons licence terms.