Research Article
Cancer Research
Q1

Machine learning and natural language processing (NLP) approach to predict early progression to first-line treatment in real-world hormone receptor-positive (HR+)/HER2-negative advanced breast cancer patients

European Journal of Cancer, 2021, Vol. 144: 224-231
Published: 17 December 2020
Authors
Nuria Ribelles

Corresponding author

Hospital Universitario Virgen del Rocío, Seville, Spain

Pablo Rodriguez-Brazarola

University of Málaga, Spain

Begoña Jimenez

Hospital Universitario Virgen del Rocío, Seville, Spain

Tamara Diaz-Redondo

Hospital Universitario Virgen del Rocío, Seville, Spain

Antonia Marquez

Hospital Universitario Virgen del Rocío, Seville, Spain

Alfonso Sanchez-Muñoz

Hospital Universitario Virgen del Rocío, Seville, Spain

Bella Pajares

Hospital Universitario Virgen del Rocío, Seville, Spain

Francisco Carabantes

Hospital Universitario Virgen del Rocío, Seville, Spain

Maria J. Bermejo

Hospital Universitario Virgen del Rocío, Seville, Spain

Ester Villar

Hospital Universitario Virgen del Rocío, Seville, Spain

Maria E. Dominguez-Recio

Hospital Universitario Virgen del Rocío, Seville, Spain

Enrique Saez

Hospital Universitario Virgen del Rocío, Seville, Spain

Laura Galvez

Hospital Universitario Virgen del Rocío, Seville, Spain

Ana Godoy

Hospital Universitario Virgen del Rocío, Seville, Spain

Sofia Ruiz-Medina

Hospital Universitario Virgen del Rocío, Seville, Spain

Irene Lopez

Hospital Universitario Virgen del Rocío, Seville, Spain

Emilio Alba

Hospital Universitario Virgen del Rocío, Seville, Spain

Abstract

Background: CDK4/6 inhibitors plus endocrine therapies are the current standard of care in the first-line treatment of HR+/HER2-negative metastatic breast cancer, but there are no well-established clinical or molecular predictive factors for patient response. In the era of personalised oncology, new approaches for developing predictive models of response are needed.

Materials and methods: Data derived from the electronic health records (EHRs) of real-world patients with HR+/HER2-negative advanced breast cancer were used to develop predictive models for early and late progression to first-line treatment. Two machine learning approaches were used: a classic approach using a data set of features manually extracted from the reviewed patient EHRs, and a second approach using natural language processing (NLP) of free-text clinical notes recorded during medical visits.
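To make the second approach more concrete, the sketch below shows one common way to turn free-text notes into numeric features that a classifier could consume: a bag-of-words count vector. This is only an illustration under assumptions. The abstract does not state which NLP representation or model the authors used, and the function names (tokenize, build_vocabulary, vectorize) and the example notes are hypothetical.

```python
import re
from collections import Counter


def tokenize(note: str) -> list[str]:
    """Lowercase a note and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", note.lower())


def build_vocabulary(notes: list[str], min_count: int = 1) -> dict[str, int]:
    """Map each token found in at least `min_count` notes to a column index."""
    doc_freq = Counter()
    for note in notes:
        # Count each token once per note (document frequency).
        doc_freq.update(set(tokenize(note)))
    kept = sorted(tok for tok, n in doc_freq.items() if n >= min_count)
    return {tok: i for i, tok in enumerate(kept)}


def vectorize(note: str, vocab: dict[str, int]) -> list[int]:
    """Return token counts for one note, ordered by vocabulary index."""
    counts = Counter(tokenize(note))
    # `vocab` was built in index order, so iterating it yields columns in order.
    return [counts.get(tok, 0) for tok in vocab]


# Hypothetical notes, invented for illustration only (not from the study).
notes = [
    "Bone pain, liver metastases on CT",
    "No pain. Stable bone disease",
]
vocab = build_vocabulary(notes, min_count=2)
features = [vectorize(note, vocab) for note in notes]
```

In practice the resulting feature rows would be passed, together with the outcome labels, to whichever classifier is being evaluated; the manually extracted approach would supply hand-coded clinical variables in place of these token counts.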

Results: Of the 610 patients included, there were 473 (77.5%) progressions to first-line treatment, of which 126 (20.6%) occurred within the first 6 months. There were 152 patients (24.9%) who showed no disease progression before 28 months from the onset of first-line treatment. The best predictive model for early progression using the manually extracted dataset achieved an area under the curve (AUC) of 0.734 (95% CI 0.687–0.782). Using the NLP free-text processing approach, the best model obtained an AUC of 0.758 (95% CI 0.714–0.800). The best model to predict long responders using manually extracted data obtained an AUC of 0.669 (95% CI 0.608–0.730). With NLP free-text processing, the best model attained an AUC of 0.752 (95% CI 0.705–0.799).
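For readers less familiar with the metric reported above, the following sketch computes an AUC and a 95% confidence interval from binary labels and model scores. It is an assumption-laden illustration: the abstract does not say how the intervals were obtained, so the percentile bootstrap used here is one possible method, not necessarily the authors'. The function names and the toy data are hypothetical.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Return (point estimate, lower, upper) using a percentile bootstrap."""
    point = auc(labels, scores)  # also validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lower, upper


# Toy data, invented for illustration: 1 = early progression, 0 = otherwise.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_point, toy_lower, toy_upper = bootstrap_auc_ci(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the values between roughly 0.67 and 0.76 reported here.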

Conclusions: Using machine learning methods, we developed predictive models for early and late progression to first-line treatment of HR+/HER2-negative metastatic breast cancer. NLP-based machine learning models performed slightly better than models based on manually extracted data (AUC 0.758 vs 0.734 for early progression; 0.752 vs 0.669 for long responders).

Keywords
Breast cancer
Hormone receptor positive
CDK4/6-inhibitors
Machine learning
Natural language processing
Electronic health records