Publications

A Unified Review of Deep Learning for Automated Medical Coding

Published in Preprint, 2022

Automated medical coding, an essential task for healthcare operation and delivery, makes unstructured data manageable by predicting medical codes from clinical documents. Recent advances in deep learning for natural language processing have been widely applied to this task. However, the field lacks a unified view of the design of neural network architectures for medical coding. This review proposes a unified framework that provides a general understanding of the building blocks of medical coding models and summarizes recent advanced models under that framework. The framework decomposes medical coding into four main components: encoder modules for text feature extraction, mechanisms for building deep encoder architectures, decoder modules for transforming hidden representations into medical codes, and the use of auxiliary information. Finally, we discuss key research challenges and future directions.

Recommended citation: Ji, Shaoxiong, et al. "A Unified Review of Deep Learning for Automated Medical Coding." arXiv preprint arXiv:2201.02797 (2022). https://arxiv.org/abs/2201.02797

Multitask Balanced and Recalibrated Network for Medical Code Prediction

Published in Preprint, 2021

Human coders assign standardized medical codes to clinical documents generated during patients’ hospitalization, a process that is error-prone and labor-intensive. Automated medical coding approaches have been developed using machine learning methods such as deep neural networks. Nevertheless, automated medical coding remains challenging because of class imbalance, complex code associations, and noise in lengthy documents. To address these issues, we propose a novel neural network called the Multitask Balanced and Recalibrated Neural Network. Specifically, the multitask learning scheme shares relationship knowledge between different code branches to capture code associations. A recalibrated aggregation module, built by cascading convolutional blocks, extracts high-level semantic features that mitigate the impact of noise in documents. The cascaded structure of the recalibrated module also benefits learning from lengthy notes. To address class imbalance, we use the focal loss to rebalance the attention given to low- and high-frequency medical codes. Experimental results show that our proposed model outperforms competitive baselines on the real-world clinical dataset MIMIC-III.

Recommended citation: Sun, Wei, et al. "Multitask Balanced and Recalibrated Network for Medical Code Prediction." arXiv preprint arXiv:2109.02418 (2021). https://arxiv.org/pdf/2109.02418.pdf
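
For readers unfamiliar with the focal loss mentioned in the abstract above, the following is a minimal sketch of its standard binary form applied per medical code in a multi-label setting. It is an illustration only, not the paper's implementation: the function name `binary_focal_loss` is hypothetical, and the defaults `gamma=2.0` and `alpha=0.25` are commonly used values, not settings taken from the paper.

```python
import math


def binary_focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-12):
    """Mean binary focal loss over per-code predicted probabilities.

    probs   -- predicted probability that each code applies (floats in [0, 1])
    targets -- ground-truth labels, 1 if the code applies, else 0
    gamma   -- focusing parameter; larger values down-weight easy examples more
    alpha   -- weight for positive labels (1 - alpha is used for negatives)

    Hypothetical helper for illustration; defaults are assumptions.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        # p_t is the probability assigned to the true label.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # Clamp to avoid log(0).
        p_t = min(max(p_t, eps), 1.0)
        # The (1 - p_t)^gamma factor shrinks the loss of well-classified examples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

With `gamma=0` the modulating factor disappears and the expression reduces to an alpha-weighted binary cross-entropy. Raising `gamma` shifts the training signal toward hard, typically rare, codes.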

Multitask Recalibrated Aggregation Network for Medical Code Prediction

Published in ECML 2021, 2021

Medical coding translates professionally written medical reports into standardized codes and is an essential part of medical information systems and health insurance reimbursement. Manual coding by trained human coders is time-consuming and error-prone. Thus, automated coding algorithms have been developed, building especially on recent advances in machine learning and deep neural networks. To address the challenges of encoding lengthy and noisy clinical documents and capturing code associations, we propose a multitask recalibrated aggregation network. In particular, multitask learning shares information across different coding schemes and captures the dependencies between different medical codes. Feature recalibration and aggregation in shared modules enhance representation learning for lengthy notes. Experiments with the real-world MIMIC-III dataset show significantly improved predictive performance.

Recommended citation: Sun, Wei, et al. "Multitask Recalibrated Aggregation Network for Medical Code Prediction." arXiv preprint arXiv:2104.00952 (2021). https://link.springer.com/chapter/10.1007/978-3-030-86514-6_23