JCR Journal
Natural Language Processing
IF: 0

182P Evaluating large (LLM) versus small language models (SLM) in summarizing real-life oncology clinical narratives in Spanish

Beatriz Villaescusa-Gonzalez, Francisco J. Moreno-Barea, Alberto T. Girona, Alejandro Silva, Nuria Ribelles, José M. Jerez

ESMO Real World Data and Digital Oncology, 2025, Vol. 10: 100379
Citations: 0
Views: 12
Downloads: 4
Altmetric Score: 2
Published: 1/11/2025
Authors
Beatriz Villaescusa-Gonzalez (Correspondence)

Unidad de Gestión Clínica Intercentros de Oncología, Hospitales Universitarios Regional y Virgen de la Victoria, Málaga, Spain

Alberto T. Girona

Unidad de Gestión Clínica Intercentros de Oncología, Hospitales Universitarios Regional y Virgen de la Victoria, Málaga, Spain

Alejandro Silva

Departamento de Lenguajes y Ciencias de la Computación, Escuela Técnica Superior de Ingeniería Informática, Universidad de Málaga, Málaga, Spain

Nuria Ribelles

Unidad de Gestión Clínica Intercentros de Oncología, Hospitales Universitarios Regional y Virgen de la Victoria, Málaga, Spain

Abstract

Background: The digitalization of healthcare processes has generated an unprecedented volume of data, and there is substantial interest in extracting meaningful clinical variables from Electronic Health Records (EHRs). Stringent privacy regulations and limited computational resources in public healthcare hinder the deployment of advanced large language models (LLMs), creating a critical need for small language models (SLMs) that can perform local and secure EHR summarization.

Methods: We explored the use of language models for the automatic generation of semi-structured summaries from oncology EHRs, adopting two classes of models: GPT-4.1 (via the OpenAI API) and an open-source SLM deployed locally using the Ollama framework. To evaluate summary quality, an assessment protocol was devised comprising three principal dimensions: the presence of errors or fabricated statements (hallucinations), omissions of relevant information, and the readability of the text, each rated on a scale of 1 to 3. The Final Score was computed as a weighted sum (0.4 × Hallucinations + 0.4 × Omissions + 0.2 × Readability), and summaries were classified as Adequate (≥0.8), Fair (0.6–0.8), or Insufficient (<0.6). Two expert medical evaluators independently reviewed the generated texts.

Results: A total of 50 summaries were evaluated; the results for our SLM are presented. The mean Final Score was 71.3% (SD = 16.9). Eighteen summaries (36.0%) were rated as Adequate, 17 (34.0%) as Fair, and 15 (30.0%) as Insufficient. Most summaries were free from hallucinations, but clinically relevant omissions were frequent and constituted the main limiting factor in overall quality scores. Readability was primarily scored at levels 2 and 3, indicating that while most summaries were understandable, only a subset achieved the highest level of clarity and structural coherence.

Conclusions: This study demonstrates that SLMs can provide a feasible pathway towards automatic summarisation that respects the privacy and computational constraints of public healthcare systems, representing a promising solution for the safe, local implementation of AI-driven summarisation in oncology. A comparison between the two models' summarisations will be presented at the meeting.
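The weighted scoring rule in the abstract can be sketched as a short computation. The abstract does not state how the 1–3 ratings are mapped onto the 0–1 scale implied by the thresholds, so the linear normalization below is an assumption, not the authors' published method:

```python
def normalize(rating: int) -> float:
    """Map a 1-3 rating onto a 0-1 scale (assumed linear normalization)."""
    return (rating - 1) / 2

def final_score(hallucinations: int, omissions: int, readability: int) -> float:
    """Weighted sum from the abstract: 0.4*H + 0.4*O + 0.2*R."""
    return (0.4 * normalize(hallucinations)
            + 0.4 * normalize(omissions)
            + 0.2 * normalize(readability))

def classify(score: float) -> str:
    """Thresholds from the abstract: Adequate (>=0.8), Fair (0.6-0.8), Insufficient (<0.6)."""
    if score >= 0.8:
        return "Adequate"
    if score >= 0.6:
        return "Fair"
    return "Insufficient"

# Example: perfect ratings on all three dimensions -> score 1.0, "Adequate";
# top hallucination rating but mid omissions/readability -> 0.7, "Fair".
print(classify(final_score(3, 3, 3)))  # Adequate
print(classify(final_score(3, 2, 2)))  # Fair
```

Under this assumed normalization, the 0.4/0.4/0.2 weights make hallucinations and omissions each twice as influential as readability, which matches the abstract's finding that omissions were the main limiting factor on overall scores.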

Keywords
Natural language processing
Large language models
Small language models
Summarization
Electronic health records
Publication Information
Volume: 10
Pages: 100379
Published: 1/11/2025
Received: 31/7/2025
Accepted: 19/9/2025