Beatriz Villaescusa-Gonzalez, Francisco J. Moreno-Barea, Alberto T. Girona, Alejandro Silva, Nuria Ribelles, José M. Jerez

Unidad de Gestión Clínica Intercentros de Oncología, Hospitales Universitarios Regional y Virgen de la Victoria, Málaga, Spain

Departamento de Lenguajes y Ciencias de la Computación, Escuela Técnica Superior de Ingeniería Informática, Universidad de Málaga, Málaga, Spain

Background: The digitalization of healthcare processes has generated an unprecedented volume of data, and substantial interest has arisen in extracting meaningful clinical variables from Electronic Health Records (EHRs). However, stringent privacy regulations and limited computational resources in public healthcare hinder the deployment of advanced large language models (LLMs). There is a critical need for small language models (SLMs) that can perform EHR summarization locally and securely.
Methods: We explored the use of language models for the automatic generation of semi-structured summaries from oncology EHRs, using two classes of model: GPT-4.1 (accessed via the OpenAI API) and an open-source SLM that we deployed ourselves using the Ollama software framework. To evaluate the quality of the summaries, we devised an assessment protocol comprising three principal dimensions: the presence of errors or fabricated statements (hallucinations), omissions of relevant information, and the readability of the text, rated on a scale of 1 to 3. The Final Score was computed as a weighted sum (0.4 × Hallucinations + 0.4 × Omissions + 0.2 × Readability), and summaries were classified as Adequate (≥0.8), Fair (0.6 to <0.8), or Insufficient (<0.6). Two expert medical evaluators independently reviewed the generated texts.
Results: A total of 50 summaries were evaluated; the results reported here are those obtained with the SLM. The mean Final Score was 71.3% (SD = 16.9). Eighteen summaries (36.0%) were rated Adequate, 17 (34.0%) Fair, and 15 (30.0%) Insufficient. Most summaries were free from hallucinations, but clinically relevant omissions were frequent and were the main factor limiting overall quality scores. Readability was scored mostly at levels 2 and 3, indicating that although the majority of summaries were understandable, only a subset achieved the highest level of clarity and structural coherence.
Conclusions: This study demonstrates that SLMs can provide a feasible pathway towards automatic summarization that respects the privacy and computational constraints of public healthcare systems, and that they represent a promising option for the safe, local implementation of AI-driven summarization in oncology. A comparison of the summaries produced by the two models will be presented at the meeting.
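
The scoring rule described in Methods (a weighted sum with weights 0.4, 0.4 and 0.2, followed by classification at the 0.8 and 0.6 cut-offs) can be illustrated with a minimal Python sketch. The abstract does not state how the 1 to 3 ratings are mapped onto the 0 to 1 range used by the cut-offs, so the sketch assumes that all three dimensions use the 1 to 3 scale, that a higher rating is better, and that each rating is divided by the maximum of 3. The function and constant names are hypothetical, and this is not necessarily the procedure the evaluators followed.

```python
from fractions import Fraction

# Weights taken from the Methods text.
WEIGHTS = {
    "hallucinations": Fraction(4, 10),
    "omissions": Fraction(4, 10),
    "readability": Fraction(2, 10),
}

# ASSUMPTION: every dimension is rated 1..3 and higher means better.
MAX_RATING = 3


def final_score(hallucinations: int, omissions: int, readability: int) -> Fraction:
    """Weighted sum of the three ratings, normalized to the range 0..1.

    ASSUMPTION: normalization is rating / MAX_RATING; the source does not
    specify the mapping. Fractions avoid floating-point error at the cut-offs.
    """
    ratings = {
        "hallucinations": hallucinations,
        "omissions": omissions,
        "readability": readability,
    }
    for name, rating in ratings.items():
        if not 1 <= rating <= MAX_RATING:
            raise ValueError(f"{name} rating must be between 1 and {MAX_RATING}")
    return sum(
        (WEIGHTS[name] * Fraction(rating, MAX_RATING) for name, rating in ratings.items()),
        Fraction(0),
    )


def classify(score: Fraction) -> str:
    """Map a Final Score onto the three categories named in the abstract."""
    if score >= Fraction(8, 10):
        return "Adequate"
    if score >= Fraction(6, 10):
        return "Fair"
    return "Insufficient"


if __name__ == "__main__":
    # Hypothetical ratings, not data from the study.
    score = final_score(hallucinations=3, omissions=2, readability=2)
    print(f"{float(score):.1%}", classify(score))
```

Under these assumptions the lowest attainable score is one third rather than zero; a different normalization, such as (rating − 1) / 2, would shift which rating combinations fall into each category.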