Objectives: We sought to determine the ease with which breast cancer pathology data could be ascertained for a large cohort of Australian women, to support epidemiological research.
Method: We assessed a range of options for accessing breast cancer pathology data. Manual review of the pathology reports provided to the New South Wales Cancer Registry (NSWCR) was considered the most feasible, complete and reliable option. Incident breast cancers (ICD–10 C50) in female 45 and Up Study participants resident in NSW were identified from linked NSWCR data for the period 2006–2012. Data not routinely available in the NSWCR, including hormone receptor status, were extracted from the pathology reports provided to the registry.
Results: Among 143 079 eligible women, 2051 had a first registration of breast cancer following cohort recruitment. The mean age at cancer diagnosis was 64.5 years. Based on cancer registry data, the cancers were predominantly ductal (74.1%), 54.4% were localised to the breast at diagnosis and 24.2% were >50 mm in size. Based on data manually extracted from pathology reports, 23.9% of cancers were histological grade 1, 79.6% were oestrogen receptor positive and 71.2% were progesterone receptor positive. These data were mostly complete (<10% missing). HER2 receptor status was less well reported: 11.3% of cancers were reported as positive, while 31.9% had equivocal, indeterminate or missing data. Data on lymph node status were missing in 16.1% of breast cancer reports, and 33.7% of cancers were node positive. Overall, 8.0% of breast cancers had involved surgical margins; these data were missing for 14.1% of cases.
Conclusion: Pathology information beyond that available from routine registry data is required both for breast cancer research and for monitoring trends in the types of breast cancer occurring over time in Australia. All the important additional data items are recorded on the pathology report, which is provided to the NSWCR as part of cancer notification but is not routinely coded, and these items are generally fairly complete. However, accessing these data for large-scale studies requires substantial effort. Coding the pathology data and making them routinely available would substantially improve cancer research and enable proper monitoring of breast cancer trends in Australia.
Breast cancer is one of the most commonly diagnosed cancers in Australia and worldwide.1-3 For robust epidemiological research into breast cancer, the clinically important characteristics of invasive breast cancer include stage at diagnosis (tumour size and nodal status), tumour grade, histological subtype, receptor status (including oestrogen receptor and human epidermal growth factor receptor 2 [HER2] status) and other potentially important biomarkers (such as the cell proliferation index Ki-67).4 These factors have predictive and prognostic value that informs clinical decision making, and they are also needed to better understand breast cancer epidemiology.
All Australian jurisdictions have legislation requiring notification of cancers (excluding non-melanoma skin cancers) to the jurisdiction’s cancer registry. Although information on breast cancer stage and morphology is routinely reported in New South Wales Cancer Registry (NSWCR) data, other key data, such as hormone receptor status, are not. To inform the feasibility of future large-scale epidemiological and health service use research into breast cancer, we conducted a study to determine the ease with which more detailed breast cancer pathology data could be ascertained for a large cohort of Australian women. We report on the process of obtaining the data and on their completeness.
The Sax Institute’s 45 and Up Study (‘Study’) is a population-based cohort established in NSW, Australia, to examine factors related to healthy ageing.5 Potential participants resident in NSW were randomly sampled from the Department of Human Services (DHS) (formerly Medicare Australia) enrolment database, which provides near-complete coverage of the population. People aged 80 years and older and residents of rural and remote areas were oversampled. In total, 267 153 adults (mean age 62 years; 53.6% female) joined the Study between 2006 and 2009. Participants completed a questionnaire on study entry and consented to follow-up and record linkage to their health records, including cancer registrations.5
For this report, the NSW Centre for Health Record Linkage (CHeReL) linked Study participants to the NSWCR, which is managed by the NSW Cancer Institute and collects information on people in NSW diagnosed with cancer. Information collected on each notified invasive breast cancer case includes sex, country of birth, Indigenous status, date of birth, date of diagnosis, cancer group, cancer topography and morphology, diagnostic method, degree of spread at diagnosis, laterality, tumour size, number of primary sites, cause and date of death, and measures of residential geography and socio-economic advantage. At the time of this study, NSWCR data were available for the period 1 January 1991 to 31 December 2012.
Several options were considered for accessing additional data that were not routinely available through the NSWCR, such as hormone receptor status. These options included legacy Clinical Cancer Registries held by the NSWCR; a specific breast cancer extension dataset collected at the local health district level; pathology reports direct from pathology laboratories; and the scanned pathology reports held at NSWCR that were originally used to make the cancer notification. Manually accessing the scanned copies of cancer pathology reports provided to the NSWCR was considered to be the most feasible and reliable option for collecting surgical pathology data. The other options were rejected because they did not provide complete coverage of the population, were missing key data items or, in the case of obtaining reports directly from pathology laboratories, would not be feasible for such a large, geographically dispersed study population.
We asked the NSWCR to extract its pathology records for manual review for all eligible 45 and Up Study participants, i.e. women linked to an NSWCR record for an invasive breast cancer (ICD-10 code ‘C50’) following recruitment to the 45 and Up Study. We developed a data extraction form for information not routinely reported in the NSWCR, including hormone receptor status (oestrogen receptor, progesterone receptor, HER2 receptor), type of specimen (e.g. biopsy, surgical) and tissue type (e.g. breast, lymph node). We also extracted data that could be clinically useful, including surgical margin clearance (clear or involved), lymph node status and mode of cancer detection (i.e. screen detected or not). The data extraction form was refined in consultation with NSWCR staff. Data were extracted by NSWCR staff, a 10% sample was re-extracted to check agreement, and the full extract was provided in anonymised format in July 2018. Extensive cleaning and manipulation of the data extract were required before analysis.
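The text above says a 10% sample was re-extracted to check agreement but does not state which statistic was used. As a minimal sketch, assuming agreement was summarised separately for each data item, simple percent agreement and Cohen's kappa (agreement corrected for chance) could be computed as follows. The function names and example codes are hypothetical, and the study's own analyses were run in SAS.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(first: Sequence[str], second: Sequence[str]) -> float:
    """Proportion of records coded identically in the original and repeat extraction."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty extractions of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)


def cohens_kappa(first: Sequence[str], second: Sequence[str]) -> float:
    """Cohen's kappa for one categorical item: agreement corrected for chance."""
    n = len(first)
    p_observed = percent_agreement(first, second)
    counts_first, counts_second = Counter(first), Counter(second)
    # Chance agreement: sum over categories of the product of marginal proportions
    p_expected = sum(counts_first[c] * counts_second[c] for c in counts_first) / (n * n)
    if p_expected == 1:  # both extractions used one identical category throughout
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical oestrogen receptor codes from the original and repeat extraction
original = ["pos", "pos", "neg", "neg"]
repeat = ["pos", "neg", "neg", "neg"]
print(percent_agreement(original, repeat), cohens_kappa(original, repeat))  # 0.75 0.5
```

Kappa is lower than raw agreement here because, with few categories, some matches are expected by chance alone.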
The 45 and Up Study receives institutional ethical oversight from the UNSW Human Research Ethics Committee (reference HC15408). This study was approved by the NSW Population and Health Services Research Ethics Committee (reference HREC/17/CIPHS/10).
Descriptive analyses were undertaken, with counts, percentages and incidence rates. All analyses were performed using SAS (NC, US: SAS Institute; version 9.3).
Among 143 079 eligible women, 2068 were linked to a breast cancer registration dated after recruitment to the Study (earliest year 2006). For these 2068 women, 4315 related pathology reports were reviewed. The whole process, from ethics approval of the project through linkage to the NSWCR data to receipt of the extracted pathology data for the 2068 women, took 16 months.
The major data cleaning issues encountered were: resolving discrepancies between the date of breast cancer diagnosis reported in the NSWCR and the date of the pathology report(s); interpreting shorthand and sometimes conflicting reports; consolidating multiple pathology records into a single incident breast cancer record; and deriving HER2 status from the immunohistochemistry (IHC) score and, where the IHC result was equivocal, the in-situ hybridisation (ISH) result.
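The HER2 derivation described above can be expressed as a small decision rule. This is an illustration only, assuming a conventional scheme in which an IHC score of 0 or 1+ is negative, 3+ is positive and 2+ is equivocal and resolved by ISH; the category labels and the fallback to ISH alone when no IHC score is recorded are assumptions, not the study's documented coding rules.

```python
from typing import Optional


def _status_from_ish(ish_result: Optional[str]) -> Optional[str]:
    """Map a hypothetical ISH result label to a HER2 status, or None if absent."""
    return {
        "amplified": "positive",
        "not amplified": "negative",
        "equivocal": "equivocal",
    }.get(ish_result)


def derive_her2_status(ihc_score: Optional[str], ish_result: Optional[str]) -> str:
    """Combine an IHC score and, where needed, an ISH result into one HER2 status.

    Assumed (hypothetical) codings: ihc_score in {"0", "1+", "2+", "3+"} or None;
    ish_result in {"amplified", "not amplified", "equivocal"} or None.
    """
    if ihc_score in ("0", "1+"):
        return "negative"
    if ihc_score == "3+":
        return "positive"
    if ihc_score == "2+":
        # Equivocal IHC: the ISH result decides; without it the case stays equivocal
        return _status_from_ish(ish_result) or "equivocal"
    # No usable IHC score: fall back to an ISH result if one was reported
    return _status_from_ish(ish_result) or "missing"


print(derive_her2_status("2+", "amplified"))  # positive
```

Keeping "equivocal" distinct from "missing" mirrors the separate reporting of equivocal and indeterminate/missing HER2 results in the Results.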
After excluding women whose first NSWCR invasive breast cancer record was dated before recruitment, 2051 women with an incident breast cancer remained in the cohort. The incidence rate was 318 per 100 000 person-years, and the mean age at diagnosis was 64.5 years (standard deviation [SD]: 10.6).
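A crude incidence rate is the number of incident cancers divided by the person-time at risk. The sketch below assumes follow-up runs from recruitment to the earliest of diagnosis, death or 31 December 2012 (the end of the available NSWCR data); the study's exact censoring rules are not stated, and the example figures are invented rather than taken from the cohort.

```python
from datetime import date
from typing import Optional

# Assumed end of follow-up: the last date of available NSWCR data
STUDY_END = date(2012, 12, 31)


def follow_up_years(recruited: date, diagnosed: Optional[date] = None,
                    died: Optional[date] = None, study_end: date = STUDY_END) -> float:
    """Person-years from recruitment to the earliest of diagnosis, death or study end."""
    exit_date = min(d for d in (diagnosed, died, study_end) if d is not None)
    # Guard against exit dates that precede recruitment (e.g. data errors)
    return max((exit_date - recruited).days, 0) / 365.25


def incidence_rate_per_100k(cases: int, person_years: float) -> float:
    """Crude incidence rate per 100 000 person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return cases * 100_000 / person_years


# Invented example: 30 incident cancers over 10 000 person-years
print(incidence_rate_per_100k(30, 10_000))  # 300.0
```

Summing `follow_up_years` over all eligible women gives the denominator; women diagnosed before recruitment would be excluded first, as described above.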
Table 1 describes the characteristics of women with incident breast cancer, based on survey information provided at recruitment into the 45 and Up Study. Most women with incident breast cancer were post-menopausal (85.1%), had given birth to at least one child before age 30 years (73.5%) and had breastfed (74.0%); 83.9% had no family history of breast cancer. Less than 3% of data were missing for most of these characteristics; however, body mass index was missing for 7.8% of women (Table 1).
Table 1. Characteristics at recruitment of women in the 45 and Up Study with incident invasive breast cancer, N = 2051
| Characteristic | Category | n (%) |
|---|---|---|
| Menopause and age at menopause (years) | No menopause | 251 (12.2) |
| | Been through menopause – age unknown | 463 (22.6) |
| | 2 or more | 1587 (77.4) |
| Age at first birth (years) | <25 | 883 (43.1) |
| | No children | 265 (12.9) |
| | No children | 265 (12.9) |
| Body mass index (kg/m²) | <25 | 752 (36.7) |
| Use of hormone replacement therapy | Never | 1143 (55.7) |
| Smoking status | Never | 1269 (61.9) |
| Alcohol intake (standard drinks per week) | Non-drinker/past drinker | 757 (36.9) |
| Mother/father or brother/sister with breast cancer | No | 1720 (83.9) |
Table 2 summarises the routinely available NSWCR data for the incident breast cancer diagnosis. Of the breast cancers notified, 52.3% were on the left side. Cancers were predominantly ductal (74.1%), 54.4% were localised to the breast at diagnosis and 24.2% were >50 mm in size. Table 3 shows the additional data we extracted by reviewing the pathology reports provided to the NSWCR. About one-fifth of cancers (22.3%) were reported as screen detected, 27.4% were histological grade 3 (i.e. likely to grow and spread faster), 79.6% were oestrogen receptor positive and 71.2% were progesterone receptor positive. These data were mostly complete, with only about 8% missing for each descriptor. Although 11.3% of cancers were reported as HER2 positive, HER2 receptor status was recorded less completely than oestrogen and progesterone receptor status: 20.7% of cancers had indeterminate or missing data in the pathology records and a further 11.2% had an equivocal result. We did not find that the proportion of cases with missing or equivocal HER2 data improved over time or differed by age at diagnosis when comparing women aged <70 years with those aged ≥70 years (data not shown). About a third (33.7%) of women were lymph node positive, and lymph node status was missing in 16.1% of reports; this differed substantially by age, with 11.8% missing for women aged <70 years and 26.9% for those aged ≥70 years. Based on the pathology reports, 8.0% of breast cancer specimens had involved surgical margins; these data were missing for 14.1% of women.
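Comparing missingness by age group, as done for lymph node status above, amounts to computing the percentage of missing values within each age stratum. A minimal sketch, with the 70-year cutoff taken from the text and invented example records; the study's analyses were run in SAS, so this is an illustration rather than the original code.

```python
from typing import Dict, Iterable, Optional, Tuple


def percent_missing_by_age(records: Iterable[Tuple[int, Optional[str]]],
                           cutoff: int = 70) -> Dict[str, float]:
    """Percentage of records with a missing value (None) below and at/above an age cutoff."""
    young, old = f"<{cutoff}", f">={cutoff}"
    totals = {young: 0, old: 0}
    missing = {young: 0, old: 0}
    for age, value in records:
        group = young if age < cutoff else old
        totals[group] += 1
        if value is None:
            missing[group] += 1
    # Return NaN for an empty stratum rather than dividing by zero
    return {g: 100 * missing[g] / totals[g] if totals[g] else float("nan")
            for g in totals}


# Invented records: (age at diagnosis, nodal status or None if missing)
example = [(55, "0"), (62, None), (68, "4+"), (71, None), (75, None), (80, "1-3")]
print(percent_missing_by_age(example))
```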
Table 2. Tumour characteristics for women in the 45 and Up Study with incident breast cancer, based on routinely available data from the NSWCR, N = 2051
| Characteristic | Category | n (%) |
|---|---|---|
| Age (years) at incident breast cancer | 45–54 | 397 (19.4) |
| Best basis of diagnosisᵃ | Cytology | 9 (0.4) |
| | Found postmortem | 0 (0.0) |
| | Death certificate | 1 (0.1) |
| | Histopathology viewed by central cancer registry | 2009 (98.0) |
| Laterality | Unilateral – left | 1072 (52.3) |
| | Unilateral – right | 966 (47.1) |
| | Not reported | 13 (0.6) |
| Morphological type | Ductal | 1519 (74.1) |
| Invasiveness | Localised to tissue of origin | 1116 (54.4) |
| | Regional spread, adjacent organs and/or regional lymph nodes | 751 (36.6) |
| | Distant metastases | 107 (5.2) |
| Size of breast cancer (mm) | <20 | 760 (37.1) |
a ‘Best basis of diagnosis’ is defined in the NSW Cancer Registry – data dictionary, available from: www.cherel.org.au/media/38825/e20-21538-nswcr-data-dictionary-2017-cim-release.pdf
Table 3. Tumour and surgical characteristics for women in the 45 and Up Study with incident breast cancer, based on data extracted from pathology reports, N = 2051
| Characteristic | Category | n (%) |
|---|---|---|
| Best available specimen typeᵃ | Biopsy | 141 (6.9) |
| | Not reported/missing | 38 (1.9) |
| Best available tissue typeᵃ | Breast | 2001 (97.6) |
| | Lymph node | 10 (0.5) |
| | Not reported/missing | 38 (1.9) |
| Screen detected | Yes | 457 (22.3) |
| Histological grade | Grade 1 | 491 (23.9) |
| | Grade 2 | 831 (40.5) |
| | Grade 3 | 561 (27.4) |
| | Not reported/missing | 168 (8.2) |
| Oestrogen receptor status | Positive | 1632 (79.6) |
| | Awaiting result/not reported/missing | 156 (7.6) |
| Progesterone receptor status | Positive | 1460 (71.2) |
| | Awaiting result/not reported/missing | 159 (7.8) |
| Human epidermal growth factor receptor 2 (HER2) status | Positive | 231 (11.3) |
| | Awaiting result/not reported | 424 (20.7) |
| Nodal status | 0 | 1030 (50.2) |
| | 1–3 positive | 496 (24.2) |
| | 4+ positive | 195 (9.5) |
| Invasive carcinoma at resection margin | Yes | 165 (8.0) |
| | Unclear/not known/missing | 289 (14.1) |
a Where multiple ‘specimen types’ or ‘tissue types’ were available for a cancer record a ‘best available’ item was created by applying a pre-specified hierarchy.
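The ‘best available’ rule in the footnote above can be sketched as collapsing all reports for a woman to the highest-ranked value under a fixed hierarchy. The ranking used here (surgical over biopsy over cytology) is a hypothetical example; the study's pre-specified hierarchy is not reported.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical hierarchy (lower rank = more informative specimen); the study's
# actual pre-specified hierarchy is not reported.
SPECIMEN_RANK = {"surgical": 0, "biopsy": 1, "cytology": 2}
MISSING = "not reported/missing"


def best_available(reports: Iterable[Tuple[str, Optional[str]]],
                   rank: Dict[str, int] = SPECIMEN_RANK) -> Dict[str, str]:
    """Collapse several (person_id, specimen_type) reports into one value per person."""
    best: Dict[str, str] = {}
    for person_id, specimen in reports:
        if specimen not in rank:
            # Unrecognised or missing value: record it only if nothing better exists
            best.setdefault(person_id, MISSING)
            continue
        current = best.get(person_id)
        if current not in rank or rank[specimen] < rank[current]:
            best[person_id] = specimen
    return best


reports = [("A", "biopsy"), ("A", "surgical"), ("B", None),
           ("C", "cytology"), ("C", "biopsy")]
print(best_available(reports))
```

The same pattern applies to tissue type with a different ranking dictionary.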
In the state of NSW, where one-third of the Australian population resides, we were able to access, on a large scale (more than 2000 breast cancer cases), detailed breast cancer pathology data that are not routinely reported by the state’s cancer registry. The additional data ascertained included mode of cancer detection (i.e. screen detected), histological grade, hormone receptor status and nodal status; these are all vitally important characteristics for breast cancer research. Information in the pathology record for key breast cancer pathology data items, such as oestrogen and progesterone receptor status, was relatively complete (less than 8% missing). There were substantially higher proportions of missing data for HER2 status (21%) and lymph node status (16%).
The incidence of breast cancer in our cohort was broadly in line with all-age incidence rates reported from large cohort studies and registry data in developed countries.6-8 In Australia, age-standardised breast cancer incidence among those aged 50–74 years has been reported to be about 300 per 100 000 population,9 in line with the incidence rate found in this study. The proportions of incident breast cancers classified by laterality, morphological type, oestrogen receptor status, HER2 status and nodal involvement in our data are also within the ranges reported for women in the international literature.10-15 However, the proportion of women testing progesterone receptor positive in our study (71.2%) appears greater than in other published reports, which found that 50–65% of breast cancers tested progesterone receptor positive.10-12,15 Also, the proportion of screen-detected breast cancers found in this study (22.3%) is substantially lower than that reported nationally in age-equivalent cohorts, suggesting possible incomplete recording on pathology reports.9,16-18
Although the data we extracted have great potential to support breast cancer research, we identified a number of limitations. Firstly, the routinely available registry data were not contemporary. The most current data available when data linkage was undertaken in May 2017 were for breast cancer diagnoses made to December 2012, and at the time of writing this report (2019), data were available only to December 2015. The time from receipt of a cancer notification by the NSWCR to its availability to researchers can significantly limit the ability to explore contemporary clinical practice and evaluate changes in policy and practice.
Secondly, the lack of structured and consistent reporting in the pathology reports affected both the effort required to extract the data and the uniformity of the data. Although the only data quality elements assessed in this study were accessibility, timeliness, completeness and consistency, considerable manipulation and cleaning of the data were required to resolve multiple, and sometimes conflicting, test results (e.g. HER2 testing), to create a single reference record of test results, and to interpret apparent shorthand recording of test results.

Structured reporting was endorsed by the Royal College of Pathologists of Australasia (RCPA) in 2007, and protocols for structured reporting of breast cancer pathology were published in 2010 and revised in 2012.19 The RCPA has acknowledged the contribution of structured reporting to cancer control through better clinical management, registration and research.19,20 However, this study found that structured reporting was not widely adopted, so considerable effort and time were required to extract the data, with added potential for misinterpretation and data error. This study had access to pathology data only to 2012; structured reports may now be more widely used. Nevertheless, based on our findings, we recommend action by the RCPA and other influential bodies to increase the adoption of structured reports by pathologists. There are undoubtedly resource implications for the NSWCR in cleaning and making available additional cancer pathology data. However, factors such as receipt of high-quality, electronic pathology reports can reduce the cost per case and make collection of these data more cost-effective.21
Also, although data completeness was generally good, some key data items (specifically HER2 results and lymph node status) still had substantial proportions of missing values. Medicare Benefits Schedule (MBS) services data can provide some insight into HER2 testing. While the use of IHC examination had stabilised in Australia by 2005, ISH testing was not included on the MBS until 2012 and its use did not stabilise until 2013; also, the annual volume of services for ISH is substantially greater than that for IHC.22 As a combination of these tests may be required to determine HER2 status, the non-contemporary nature of the data available for this study may explain the relatively large proportion of missing HER2 data. We did not find a trend towards lower proportions of missing HER2 data over time in our sample; however, as most incident breast cancers in the cohort occurred between 2008 and 2012, a more contemporary sample would have been more informative about improvements in reporting over time. We also explored whether the proportion of HER2 results that were equivocal or missing differed by age at cancer diagnosis, which might have been the case if clinicians were less likely to request ISH testing in older women because of different approaches to breast cancer management at older ages.23 However, our findings do not support this. For nodal status, we did find that the proportion of cases with missing data was greater in women aged 70 years and older. This could be explained by older women being less likely to undergo both breast and axillary surgery, leaving nodal status unknown.23
Finally, we observed that while the routinely available NSWCR data showed complete reporting of tumour size (i.e. no missing values), this was inconsistent with the pathology reports available for manual extraction. In this sample, 6.9% of cancers had a biopsy as the best available specimen, and tumour size cannot be ascertained from such a specimen. This suggests that we may not have been able to extract and review all the pathology reports related to the 2051 notified breast cancers, or that there may have been some inconsistency between the date of the report and the date of diagnosis on the breast cancer notification. This loss of data may also explain some of the missing data for other variables of interest.
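One way a date mismatch can lose reports is when reports are attached to a notification only if they fall within a fixed window around the registry diagnosis date. The sketch below assumes a hypothetical ±90-day window purely for illustration; the report does not state how, or whether, such a window was applied.

```python
from datetime import date
from typing import List

# Hypothetical matching window; the report does not state one
WINDOW_DAYS = 90


def reports_near_diagnosis(diagnosis: date, report_dates: List[date],
                           window_days: int = WINDOW_DAYS) -> List[date]:
    """Keep pathology report dates within +/- window_days of the registry diagnosis date."""
    return sorted(d for d in report_dates if abs((d - diagnosis).days) <= window_days)


# A report dated 136 days after diagnosis falls outside the window and is dropped
kept = reports_near_diagnosis(date(2010, 3, 1),
                              [date(2010, 2, 20), date(2010, 7, 15), date(2009, 12, 15)])
print(kept)
```

A wider window recovers more reports, such as a later surgical resection, but risks attaching reports that belong to a different cancer episode.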
To accurately monitor progress on breast cancer in Australia, the routinely reported data available from the NSWCR need to be supplemented with additional cancer pathology data, such as hormone receptor status. We showed that the additional breast cancer pathology data are generally accessible through reports held by the NSWCR, and that all the key data items are recorded on the pathology report, with varying degrees of completeness. To enable more efficient large-scale research and monitoring of breast cancer in Australia, the NSWCR and other Australian registries should include a more extensive range of data items from the pathology notification in their routinely available data. Within the recognised limits of registry resourcing for data processing, we also recommend improving the timeliness of data availability.
The research was funded by a grant provided by the National Breast Cancer Foundation (NBCF) and was completed using data collected through the 45 and Up Study. The 45 and Up Study is managed by the Sax Institute (which also publishes Public Health Research & Practice) in collaboration with major partner Cancer Council NSW; and partners: the National Heart Foundation of Australia (NSW Division); NSW Ministry of Health; NSW Government Family & Community Services – Ageing, Carers and the Disability Council NSW; and the Australian Red Cross Blood Service. We thank the many thousands of people participating in the 45 and Up Study. We also thank the NSW CHeReL for conducting the linkage, the NSWCR for facilitating the linkage and Sue Edwards and Denise Bradfield for extracting the data for this study, and Dianne O’Connell from the NSW Cancer Council for her advice regarding data extraction.
Externally peer reviewed, not commissioned. Study author BL is a member of the Editorial Board and Associate Editor of Public Health Research & Practice but had no input in the peer review process for this manuscript.
© 2021 Bartlett et al. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence, which allows others to redistribute, adapt and share this work non-commercially provided they attribute the work and any adapted version of it is distributed under the same Creative Commons licence terms.