Multimodal Electronic Medical Record Disease Classification Combining Image and Text Data
Keywords:
Multimodal Fusion, Modality Alignment, Self-Attention Mechanism, Tumor Segmentation, Electronic Medical Records

Abstract
Multimodal electronic health records (EHRs) in the healthcare field contain rich text and image data, providing crucial information for the diagnosis and classification of complex diseases. However, most existing research focuses on single-modality data analysis, overlooking the potential complementarity between different modalities. Furthermore, even the few proposed multimodal fusion methods still suffer from issues such as imprecise modality alignment, poor feature fusion, and high module design complexity. To overcome these challenges, this study proposes a multimodal fusion-based EHR disease classification method applied to the task of tumor segmentation. The main contribution of this work is a novel multimodal fusion model that integrates modality alignment, self-attention, residual connections, and dynamic feature weighting in a single architecture. This combination not only addresses the problems of feature misalignment and information loss, but also makes the fusion flexible, allowing the contribution of each modality to change dynamically. The model effectively fuses text and image data, and comprehensive experiments on the BraTS 2015 and BraTS 2018 datasets demonstrate that the proposed method significantly outperforms existing methods on multiple metrics, including the Dice coefficient, positive predictive value (PPV), and sensitivity.
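
For concreteness, the following is a minimal sketch of the fusion idea outlined above, assuming a PyTorch setting; the class name FusionBlock and all dimensions and hyperparameters are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class FusionBlock(nn.Module):
    """Aligns text and image features, fuses them with self-attention plus a
    residual connection, and weights each modality with a learned gate."""

    def __init__(self, d_text: int, d_image: int, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        # Modality alignment: project both modalities into a shared space.
        self.align_text = nn.Linear(d_text, d_model)
        self.align_image = nn.Linear(d_image, d_model)
        # Self-attention over the concatenated token sequence.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        # Dynamic weighting: a gate that scores each modality's contribution.
        self.gate = nn.Linear(2 * d_model, 2)

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # text_feats: (B, T_t, d_text), image_feats: (B, T_i, d_image)
        t = self.align_text(text_feats)
        v = self.align_image(image_feats)
        tokens = torch.cat([t, v], dim=1)                # (B, T_t + T_i, d_model)
        attended, _ = self.attn(tokens, tokens, tokens)  # joint self-attention
        tokens = self.norm(tokens + attended)            # residual connection
        # Pool each modality and compute dynamic fusion weights.
        t_pool = tokens[:, : t.size(1)].mean(dim=1)
        v_pool = tokens[:, t.size(1):].mean(dim=1)
        w = torch.softmax(self.gate(torch.cat([t_pool, v_pool], dim=-1)), dim=-1)
        return w[:, :1] * t_pool + w[:, 1:] * v_pool     # fused representation


if __name__ == "__main__":
    block = FusionBlock(d_text=768, d_image=512)
    fused = block(torch.randn(2, 16, 768), torch.randn(2, 49, 512))
    print(fused.shape)  # torch.Size([2, 256])
```

In this sketch, the learned gate reweights the pooled text and image representations per sample, which is one way the contribution of each modality could be made to change dynamically as the abstract describes.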
License
Copyright (c) 2025 Journal of Management Science and Operations

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.