After applying the inclusion and exclusion criteria (Figure 1), a total of 39 studies were selected for this review. Findings are presented in a representative table of 12 studies (Table 1) and in a complete harmonized supplement (Table S1) covering all 39 included studies. Key performance gains, such as the median change in the area under the curve (AUC), were derived by synthesizing data from the subset of studies in Table S1 that directly reported performance metrics for both a single-modality baseline model and the fused multimodal model.
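For transparency, this synthesis step can be expressed as a short script. The following is a minimal sketch assuming Table S1 has been exported to a CSV file; the file name and column names (auc_single_modality, auc_multimodal) are hypothetical.

```python
import pandas as pd

# Sketch of the synthesis step: compute the median change in AUC across the
# subset of included studies that report both a single-modality baseline and
# a fused multimodal AUC. File and column names are illustrative only.
studies = pd.read_csv("table_s1_harmonized.csv")  # assumed export of Table S1

paired = studies.dropna(subset=["auc_single_modality", "auc_multimodal"]).copy()
paired["delta_auc"] = paired["auc_multimodal"] - paired["auc_single_modality"]

print(f"Studies with paired AUCs: {len(paired)}")
print(f"Median change in AUC: {paired['delta_auc'].median():.3f}")
```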
Rationale for Multimodal Data Integration in CAD Risk Assessment
Current risk stratification largely focuses on a narrow set of variables, failing to exploit the “wealth of insights lying at various intersections of patient data.”4 For instance, a standard risk calculator might consider a patient’s age, sex, smoking status, blood pressure, and cholesterol—but not their coronary calcium score, genetic predisposition, or daily exercise patterns. In reality, CAD risk is influenced by a confluence of factors spanning biological, clinical, and lifestyle domains. Multimodal data fusion refers to the integration of multiple heterogeneous data types into a unified predictive model.2 From a methodological standpoint, the premise is that each data modality provides complementary information, capturing potentially orthogonal aspects of the disease process, and their combination can lead to richer feature representations and more robust model performance than any single modality alone. The informatics task is therefore to develop fusion techniques that can effectively leverage this complementarity. This entire process, from heterogeneous data collection through the methodological core to an actionable clinical prediction, is conceptually illustrated in Figure 2. Indeed, a 2022 scoping review found that in studies comparing multimodal models to single-modality models, the multimodal approach achieved on average a 6.4% improvement in predictive accuracy.2 While seemingly modest, this highlights a consistent methodological observation: the synergistic potential of integrated data. Such gains, often achieved through sophisticated ML approaches, can translate into significantly better risk stratification at the population level by reclassifying many patients into correct risk categories.12

Figure 2. Conceptual Framework for Multimodal Data Fusion in Precision CAD Risk Prediction.
There are several compelling reasons, rooted in informatics principles, to pursue multimodal risk models:
- Complementary data sources: Different modalities capture different aspects of CAD risk, presenting both an opportunity and a methodological challenge for integration. Imaging can quantify atherosclerotic burden (e.g. plaque volume or calcium) and ventricular function; genomics captures inherent genetic susceptibility; EHRs provide a longitudinal record of risk factors, comorbidities, and treatments; and wearables record real-time physiology and lifestyle indicators. Individually, each is an imperfect predictor, but together they provide a richer feature set for risk assessment.18–22 The methodological challenge lies in creating a unified model that can meaningfully combine these disparate data types, which vary in structure, temporality, and scale. For example, coronary calcium on a CT scan directly measures atherosclerosis, while a PRS reflects lifelong genetic risk; integrating the two could identify an individual with high genetic risk who has not yet developed calcified plaque, or vice versa.
- Improved discrimination and reclassification: Multimodal models have demonstrated higher discrimination (C-statistic/AUC) and better patient risk reclassification than traditional tools, representing a key methodological advance. Early fusion modeling in cardiology, which methodologically combined clinical variables with imaging features, yielded superior prognostic performance compared to clinical scores alone.5–7 These improvements, while sometimes moderate, can be clinically meaningful—especially for borderline-risk patients where decisions (to start a statin, refer for further testing, etc.) are sensitive to risk estimates.1,12 From an informatics perspective, the ability of fused models to refine risk categories highlights their potential to enhance clinical decision support.
- Capturing disease complexity and dynamics: CAD is a complex, multifactorial disease with non-linear interactions (e.g. diabetes exacerbating the effect of cholesterol, or genetics modulating response to lifestyle). Multimodal models, especially those based on AI, are methodologically better equipped to capture these interactions that traditional linear models often miss.3,23–26 They can also incorporate temporal data—for example, trends in blood pressure or cholesterol over time, or changes in plaque volume on serial scans—to reflect the evolving risk profile of a patient, a capability often lacking in static models.4,27 Li et al. demonstrated this by using repeated longitudinal EHR measurements (vitals, labs) in an ML model that outperformed a single-time-point risk score for predicting 5-year atherosclerotic cardiovascular disease events.12 The ML model had a C-statistic of ~0.79 and showed improved calibration and decision curve utility over the guideline-recommended China-PAR risk equation. This study illustrates the methodological advantage conferred by leveraging temporal EHR data, where the trajectory and variability of risk factors can significantly enhance prediction beyond single snapshot assessments.12
Therefore, combining modalities is a logical step toward precision risk prediction—ensuring that each patient’s risk assessment leverages all available data about them, rather than only population-derived proxies. Below, we discuss each major data modality and the methodological implications of its integration into CAD risk models.
Key Data Modalities for CAD Risk Prediction
Imaging Biomarkers (CT, MRI, and Others).
Cardiovascular Imaging. Cardiovascular imaging provides direct visualization of structural and functional disease, making it a powerful tool for risk stratification. Methodologically, imaging biomarkers often represent quantitative or semi-quantitative features that offer a direct measure of the underlying pathology. In CAD, two non-invasive imaging approaches are prominent from an informatics integration perspective: CAC scoring and CCTA.
Coronary artery calcium scoring by non-contrast CT quantifies calcified plaque in the coronaries; decades of evidence have established CAC as one of the strongest predictors of future coronary events.28–30 An elevated CAC (Agatston) score reclassifies risk beyond traditional factors and has been incorporated into prevention guidelines (e.g. as a tiebreaker for statin decisions).31 From an informatics standpoint, CAC scores are relatively standardized numerical values that can be readily incorporated into statistical or ML models. In asymptomatic individuals, CAC can identify those at high risk even if clinical risk is moderate, and, vice versa, CAC=0 can downgrade risk (the so-called “power of zero”).5 By methodologically integrating CAC with clinical data, the Multi-Ethnic Study of Atherosclerosis (MESA) risk score was developed, demonstrating improved risk discrimination over clinical variables alone. As one study summarized, “Agatston calcium and MESA score are a powerful cardiovascular risk predictor” for future events.32
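As a schematic illustration of this kind of integration (and not the published MESA equation), a CAC score can enter a conventional risk model as one additional, log-transformed covariate. The variable and file names below are hypothetical, and categorical variables are assumed to be numerically encoded.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Illustrative only: clinical risk factors plus log-transformed Agatston CAC
# as one additional numeric feature (not the published MESA equation).
df = pd.read_csv("cohort.csv")  # assumed columns listed below
clinical = ["age", "sex", "sbp", "total_chol", "hdl", "smoker", "diabetes"]
df["log_cac"] = np.log1p(df["cac_agatston"])  # log(1 + CAC) handles CAC = 0

X_train, X_test, y_train, y_test = train_test_split(
    df[clinical + ["log_cac"]], df["chd_event_10y"], random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUC with CAC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```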
Coronary computed tomography angiography visualizes both calcified and non-calcified plaque and any luminal stenoses. Traditionally used diagnostically, CCTA also possesses significant prognostic value.5 Beyond stenosis, plaque characteristics seen on CCTA (often termed “high-risk plaque” features, such as positive remodeling, low attenuation core, napkin-ring sign) confer incremental risk information.33,34 For example, patients with high-risk plaque features on CCTA have higher rates of future acute coronary syndromes independent of stenosis severity.33 Coronary computed tomography angiography can thus identify individuals with vulnerable plaque who might benefit from aggressive therapy even if no severe stenosis is present.5 A key informatics advancement is the use of AI-driven tools to automatically quantify plaque burden and subtype on CCTA, enabling the extraction of rich, quantitative imaging biomarkers for large-scale use in fusion models.35,36 For instance, an AI prototype can now output stenosis measurements and a Coronary Artery Disease Reporting and Data System classification from CCTA images, and others can measure plaque volumes and detect features like low-attenuation plaque.5 Such quantitative imaging biomarkers, when combined with clinical and lab data, hold promise for refined, methodologically sound risk models.
Echocardiography and cardiac MRI (CMR) provide additional functional biomarkers relevant to risk,5 particularly for heart failure and cardiomyopathies, which often coexist or contribute to CAD outcomes. Left ventricular ejection fraction is a well-known prognostic marker.37,38 Left ventricular ejection fraction and other measures (global longitudinal strain from echo, or late gadolinium enhancement from CMR indicating scar) can thus enhance risk prediction beyond atherosclerotic burden alone.35 For example, in patients with dilated cardiomyopathy, methodologically combining multiparametric CMR (fibrosis, function) with clinical data improved prediction of sudden cardiac death.39 Automated CMR analysis using AI, which can rapidly derive ventricular volumes and function, is an important informatics development for supplying these metrics into risk models.5 Nuclear imaging (SPECT/positron emission tomography perfusion) also provides ischemia and viability information; one study showed that fusing clinical variables with SPECT data yielded an AUC of 0.81 for predicting major adverse cardiovascular events, slightly better than 0.78 with imaging alone, illustrating the additive value from a methodological fusion perspective.7,40
Integration of Imaging with Other Modalities: Methodological Considerations. The additive value of imaging has been demonstrated in several fusion studies, highlighting a core principle in biomedical informatics: integrating direct phenotypic assessments with other data types enhances predictive power. As noted, Motwani et al. showed significant gains by adding CCTA features to clinical risk factors.6 Likewise, Betancur et al. improved major adverse cardiovascular events prediction by integrating SPECT findings with patient data.7 Al’Aref et al. combined clinical factors with the CAC score to predict obstructive CAD on CCTA, achieving a fusion model AUC of 0.88, outperforming the clinical model (0.77) and slightly exceeding imaging alone (0.87).40 These results underscore that while imaging biomarkers are often strong predictors, their optimal use, methodologically, is in concert with other patient information. In general, imaging adds a personalized “phenotypic” layer on top of clinical risk profiles—essentially measuring the disease process directly—and thus can substantially refine risk estimates when integrated appropriately within a robust informatics framework.
Genomic and Molecular Data (PRS and Beyond). Genetic predisposition plays a significant role in CAD risk. Polygenic risk scores (PRS) aggregate the effect of many common genetic variants into a single score representing an individual’s inherited risk for CAD.41 Methodologically, PRS provide a static, lifelong estimate of genetic susceptibility. Over the past decade, researchers have developed and validated PRS for CAD that can stratify individuals by their genetic risk. For example, one analysis found that about 8% of the population have a polygenic profile conferring a ≥3-fold increased risk of CAD.42 Another study reported that people in the top quintile of a CAD PRS had ~90% higher relative risk of coronary events.43 These findings underscore that genetics can identify a subset of individuals with substantially elevated baseline risk from birth. Unlike most risk factors, the genome is fixed—making PRS a potentially powerful tool for early risk prediction, even before traditional risk factors manifest, a unique characteristic from an informatics integration perspective.44
The clinical utility of PRS is an area of active research and methodological refinement. A comprehensive review by Klarin and Natarajan concluded that PRS predict incident CAD and can modulate the expected benefit from preventive therapies.41 For instance, individuals with high PRS derived greater absolute benefit from statin therapy, suggesting PRS might help personalize preventive interventions. Polygenic risk scores are also being studied for guiding decisions such as earlier screening.41 However, PRS are not deterministic; they interact with environment and behavior. Notably, even those with high genetic risk can substantially reduce their risk through healthy lifestyle changes.43 This interaction highlights the methodological imperative to integrate genetics with other data modalities.
Integrating Genomics with Other Data: Methodological Approaches. The most straightforward fusion method involves adding PRS to established clinical risk models. Several studies have shown that incorporating PRS into clinical risk equations improves discrimination and net reclassification, demonstrating its incremental methodological value.41 For example, Inouye et al. demonstrated that genome-wide PRS added to traditional risk factors significantly reclassified individuals’ 10-year CAD risk categories.44 Another study found that combining a PRS with a person’s CAC score provides complementary risk information: the PRS captures lifelong predisposition, while CAC reflects accumulated disease.45 Methodologically, this combines a static genetic marker with a dynamic phenotypic marker. In middle-aged adults, a high PRS can identify those at risk before they develop detectable coronary calcium, whereas CAC scoring can capture risk not explained by genetics.45 Indeed, recent work reported that both PRS and CAC were independent predictors of coronary events, and using them together yielded better risk discrimination than either alone.46 This type of multimodal genetic-imaging approach could be particularly useful for risk stratification in individuals with intermediate clinical risk.
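A simplified way to quantify such reclassification is the categorical net reclassification improvement (NRI). The sketch below assumes two sets of predicted 10-year risks, one from a clinical-only model and one from a clinical-plus-PRS model, and a single 7.5% treatment threshold; the data are synthetic and purely illustrative.

```python
import numpy as np

def categorical_nri(p_clinical, p_clinical_prs, events, threshold=0.075):
    """Categorical NRI across a single treatment threshold (illustrative)."""
    old_high = p_clinical >= threshold
    new_high = p_clinical_prs >= threshold
    moved_up = new_high & ~old_high
    moved_down = ~new_high & old_high
    # Upward moves should be enriched for events, downward moves for non-events.
    nri_events = moved_up[events == 1].mean() - moved_down[events == 1].mean()
    nri_nonevents = moved_down[events == 0].mean() - moved_up[events == 0].mean()
    return nri_events + nri_nonevents

# Synthetic example: predicted risks from two hypothetical models and outcomes.
rng = np.random.default_rng(0)
p_clin = rng.uniform(0.0, 0.20, size=1000)
p_clin_prs = np.clip(p_clin + rng.normal(0.0, 0.02, size=1000), 0.0, 1.0)
outcomes = rng.binomial(1, p_clin_prs)
print("Approximate NRI:", round(categorical_nri(p_clin, p_clin_prs, outcomes), 3))
```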
Beyond polygenic scores, other “omics” data are emerging, presenting new methodological opportunities and challenges for informatics. Plasma proteomics and metabolomics can provide molecular fingerprints of disease activity.47 These have been used to generate proteomic risk scores, which, when combined with genomics and clinical data, might further refine risk stratification.48 However, such multi-omic integration is methodologically less mature compared to genomics and imaging.49 Gene–environment interactions are also relevant: integrating data on lifestyle with genetic risk can identify individuals whose genetic risk is being modulated by their behaviors.43 Overall, genomics adds a “baseline risk” anchor—stratifying individuals by inherent risk from an early age—which can be methodologically layered with dynamic clinical and imaging data that accumulate over time.50 As informatics tools for genomic data mature and costs fall, genomic data will likely be increasingly integrated into routine CAD risk assessments.
Electronic Health Records and Clinical Data. The EHR contains a trove of longitudinal patient information, including demographics, medical history, diagnoses, medications, vital signs, laboratory results, and physician notes. Traditionally, risk models only utilize a few selected variables from this rich source. Multimodal EHR-based modeling, as an informatics endeavor, aims to harness a much broader swath of EHR data, often longitudinally, for risk prediction.12 Recent advances in data mining and ML have made it feasible to methodologically incorporate dozens or even hundreds of EHR features simultaneously into a predictive model.51 For example, algorithms can be fed a patient’s entire history of lab values, vital signs over time, and medication records.12
A prime example is the study by Li et al. involving over 200,000 Chinese adults.12 They extracted 25 repeated clinical measurements per person over time and used ML (eXtreme Gradient Boosting and Least Absolute Shrinkage and Selection Operator regression) to predict 5-year atherosclerotic cardiovascular disease events. The model achieved a C-statistic of ~0.79 and showed significantly improved calibration and decision-curve utility compared to the guideline-based China-PAR risk score. Although AUC gains were modest (~0.03–0.04), the improvement in risk classification is impactful. This study methodologically illustrates how mining temporal EHR data (trajectories and variability of risk factors) can enhance prediction beyond static models.
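A simplified sketch of this kind of longitudinal feature engineering is shown below; it is not the authors' pipeline, and the file layouts, column names, and hyperparameters are illustrative assumptions.

```python
import pandas as pd
from xgboost import XGBClassifier  # assumes the xgboost package is installed
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Simplified sketch (not the published pipeline): summarize each patient's
# repeated measurements into trajectory features (most recent value, mean,
# variability), then fit a gradient-boosted classifier on the summaries.
long = pd.read_csv("repeated_measurements.csv")  # patient_id, time, sbp, ldl, hba1c
measures = ["sbp", "ldl", "hba1c"]

feats = long.sort_values("time").groupby("patient_id")[measures].agg(
    ["last", "mean", "std"]
)
feats.columns = ["_".join(c) for c in feats.columns]
labels = pd.read_csv("outcomes.csv").set_index("patient_id")["ascvd_5y"]  # 0/1

X_train, X_test, y_train, y_test = train_test_split(
    feats, labels.loc[feats.index], random_state=0
)
model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_train, y_train)
print("C-statistic:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```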
Another dimension of EHR data for informatics exploration is unstructured text, such as clinical notes and reports.1 These often contain valuable insights not captured in structured fields. Natural language processing algorithms can convert free text into features for risk models, representing a significant methodological tool.52 For instance, a natural language processing pipeline might identify mentions of “angina” as additional risk indicators. The integration of such unstructured data with structured data is a frontier of multimodal fusion, with early work suggesting modest improvements in risk prediction and the potential to uncover novel risk factors.18–22,52
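As a toy illustration of this kind of feature extraction, the snippet below flags notes that mention angina while skipping simple negations; a production pipeline would rely on dedicated clinical NLP tools (negation detection, concept normalization) rather than regular expressions, and the file and column names are hypothetical.

```python
import re
import pandas as pd

# Minimal illustration of turning free text into a structured feature:
# flag notes that mention angina while ignoring simple negations.
notes = pd.read_csv("clinical_notes.csv")  # hypothetical: patient_id, note_text

NEGATION = re.compile(r"\b(no|denies|without)\s+(\w+\s+){0,2}angina\b", re.I)
MENTION = re.compile(r"\bangina\b", re.I)

def angina_flag(text: str) -> int:
    return int(bool(MENTION.search(text)) and not NEGATION.search(text))

notes["angina_mention"] = notes["note_text"].fillna("").apply(angina_flag)
patient_flags = notes.groupby("patient_id")["angina_mention"].max()
```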
Electronic health record data fusion is central to the concept of a “learning health system,” where routine clinical data continuously feeds into risk models that update and improve methodologically over time.1 A key informatics challenge, however, is standardizing and cleaning EHR data, as it can be fragmented and suffer from missingness. Methodologies like data imputation and generative models (e.g. generative adversarial networks to fill missing lab values) have been explored to address this.53–55
Integration of EHR with Other Modalities. In most multimodal models, clinical/EHR data serve as the foundational layer. Methodologically, this integration occurs across several dimensions. First is the use of baseline structured data (demographics, diagnoses, baseline labs) which provide essential context; for example, the presence of diabetes or hypertension profoundly influences the interpretation of a given CAC score or gene variant.
Second, and more powerfully, is the methodological strength of using longitudinal EHR data. Static, single-time-point models are being outperformed by ML models that integrate repeated measurements over time. A prime example is the study by Li et al. which integrated demographics, medications, and irregularly repeated laboratory and physiological measurements from over 200,000 adults.12 Their ML model demonstrated improved 5-year atherosclerotic cardiovascular disease prediction over the guideline-recommended Cox model (C-statistic ~0.79), primarily by capturing the trajectory and variability of risk factors.12
Third is the exploration of unstructured data using natural language processing to extract features from clinical notes (e.g. mentions of “angina”), which may offer modest improvements.
Finally, EHR data are commonly used in late-fusion strategies with other modalities. For example, Zhao et al. demonstrated an EHR-genetic late fusion model for predicting CAD events, which outperformed using EHR data alone, illustrating one methodological approach to merge these data types.56
Wearable and Sensor Data. The proliferation of wearable devices has introduced a new modality for risk assessment: continuous or high-frequency monitoring of physiological and behavioral markers. From an informatics perspective, data from wearable devices represent high-velocity, high-volume time-series data that can capture aspects of health and lifestyle difficult to measure in clinic visits—e.g. daily step count, heart rate variability, sleep patterns, and arrhythmias. These factors can modulate CAD risk and may serve as early warning signals. For instance, wearables provide a quantifiable window into parameters like physical activity and sleep, which are linked to cardiovascular risk.
Several studies and prototypes have explored methodologically integrating wearable sensor data into cardiovascular risk models. Ali et al. proposed a comprehensive smart healthcare monitoring system for CVD prediction that fuses electronic medical record data with wearable sensor data.57 Their conceptual framework outlines how vital signs and biosignals from wearables (ECG, blood pressure, etc.) are continuously collected and combined with medical records to generate dynamic risk alerts, highlighting the informatics challenge of real-time data integration and analysis. Zhang et al. developed a tool to triage acute chest pain by early fusion of multimodal signals—ECG, heart sounds, echocardiography, Holter data, and biomarkers—demonstrating the feasibility of merging wearable-device data with imaging and labs for acute risk stratification.58 Similarly, Li et al. combined ECG and phonocardiogram features, showing that this dual-sensor approach methodologically improved prediction over single-sensor models.59
In terms of outcomes, some studies have linked wearable-derived metrics to hard events. Persistent tachycardia or reduced heart rate variability can signal higher risk. Large-scale projects like the Apple Heart Study hint at how wearables could identify at-risk individuals. Future integration may include data from continuous blood pressure and glucose monitors. One study showed wearable sensor data could predict certain lab test abnormalities, suggesting it reflects underlying physiology relevant to cardiovascular stress, an interesting avenue for informatics exploration.60
Methodological Challenges and Opportunities with Wearables. Data from wearable devices are inherently noisy and highly individualized, posing significant informatics challenges in ensuring data quality, handling missing periods, and minimizing false alarms. However, AI models, especially deep learning, are methodologically well-suited for finding signals in noisy time-series data. Recurrent neural networks or transformers can ingest long sequences of sensor readings to detect subtle patterns indicative of risk. Integrating wearable-device data with EHR data is a new methodological frontier; an AI model could potentially flag patients for higher near-term risk based on anomalous trends in wearable-device data. In summary, wearable devices provide a continuous, lifestyle-integrated data modality that complements traditional data sources. When fused, wearables could help capture the impact of daily behaviors and early physiological changes on CAD risk, making risk prediction more dynamic and personalized—potentially evolving into a living risk score. While direct outcome prediction evidence is still emerging, the incorporation of wearables into risk models is a promising area for future informatics research.
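A minimal sketch of such a sequence model is shown below, assuming fixed-length windows of two daily wearable channels (e.g. resting heart rate and step count); handling gaps, device noise, and irregular sampling would require considerably more engineering than this illustration suggests.

```python
import torch
import torch.nn as nn

# Sketch of a sequence model for wearable streams, assuming fixed-length
# daily windows; a real system would need to handle gaps, noise, and
# irregular sampling.
class WearableRiskLSTM(nn.Module):
    def __init__(self, n_channels=2, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (batch, days, channels)
        _, (h, _) = self.lstm(x)                # h: (1, batch, hidden)
        return torch.sigmoid(self.head(h[-1]))  # near-term risk probability

model = WearableRiskLSTM()
window = torch.randn(8, 90, 2)   # 8 patients, 90 days, 2 channels
risk = model(window)             # shape (8, 1)
```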
AI and ML Techniques for Multimodal Fusion
Integrating diverse data types into a cohesive predictive model is a complex informatics task. Machine learning and AI methods are the linchpin enabling effective multimodal data fusion for CAD risk prediction. Unlike traditional regression techniques, which often struggle with high-dimensional, heterogeneous inputs, modern ML, especially deep learning, can handle large multimodal feature spaces and uncover complex non-linear relationships.61 These capabilities are crucial for advancing beyond simplistic models to those that truly reflect the multifaceted nature of CAD. Here, we outline key methodological approaches and advancements in this domain.
Early versus Late versus Intermediate Fusion: Methodological Considerations. In ML parlance, early fusion involves concatenating all input data (after appropriate preprocessing) and feeding it into a single model. Late fusion entails building separate models for each modality and then combining their predictions.5 Intermediate (mid-level) fusion involves merging data at an intermediate layer, for example, by combining learned features from separate sub-networks dedicated to each modality.62 Each strategy presents distinct methodological advantages and disadvantages. Early fusion, by concatenating inputs, methodologically allows the model to learn cross-modal interactions from the raw (or minimally processed) data but can lead to very high-dimensional feature spaces. This poses optimization challenges and increases the risk of overfitting if not managed with appropriate regularization techniques or sufficiently large datasets. Conversely, late fusion is architecturally simpler and preserves modality-specific performance as each sub-model optimizes on its data; however, it methodologically risks missing synergistic feature interactions that might only be apparent when features are combined at earlier stages. Intermediate fusion offers a methodological compromise, aiming to learn modality-specific representations in initial layers before merging them in deeper layers, thus enabling both specialized feature extraction and joint interaction modeling.2 The choice of fusion strategy is therefore a critical methodological decision, contingent on dataset characteristics, the nature of inter-modal relationships, computational resources, and the specific research question. In practice, many CAD fusion studies have utilized late fusion, often combining outputs or risk scores via a meta-classifier.5 However, there is an evident trend toward more integrated approaches like intermediate fusion, particularly with the rise of deep learning architectures.
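The contrast between the first two strategies can be sketched in a few lines. The feature matrices below (X_clin, X_img) stand for hypothetical, preprocessed clinical and imaging features; a real implementation would train the late-fusion meta-classifier on out-of-fold predictions (e.g. via stacking) to avoid leakage.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Schematic contrast of early vs. late fusion with two tabular "modalities".
# X_clin, X_img: hypothetical preprocessed feature matrices; y: binary outcome.

def early_fusion(X_clin, X_img, y):
    # Early fusion: concatenate features, train one model on the joint space.
    X = np.hstack([X_clin, X_img])
    return LogisticRegression(max_iter=1000).fit(X, y)

def late_fusion(X_clin, X_img, y):
    # Late fusion: one model per modality, then a meta-classifier on their
    # predicted probabilities. In practice, use out-of-fold predictions here.
    m_clin = LogisticRegression(max_iter=1000).fit(X_clin, y)
    m_img = RandomForestClassifier(n_estimators=200).fit(X_img, y)
    stacked = np.column_stack([
        m_clin.predict_proba(X_clin)[:, 1],
        m_img.predict_proba(X_img)[:, 1],
    ])
    meta = LogisticRegression().fit(stacked, y)
    return m_clin, m_img, meta
```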
Deep Learning Architectures: A Methodological Paradigm for Fusion. Deep learning has revolutionized data analysis in many fields, and its application to multimodal fusion in healthcare is a significant methodological advancement. Convolutional neural networks (CNNs) excel at imaging analysis, while recurrent neural networks or transformers are well-suited for sequential data like time-stamped EHR entries or wearable-device time series. For multimodal fusion, researchers often construct multi-branch neural networks. This architecture represents a powerful methodological paradigm, allowing for tailored processing of each data type (e.g. a CNN branch for CT/MRI data, a multilayer perceptron or transformer branch for tabular EHR data, and another for genomic data). These branches then merge (concatenate their learned feature representations) at some point to produce a unified prediction, inherently supporting intermediate fusion.5 Such architectures have shown success; one model combining clinical variables and CCTA images through deep learning improved risk prediction of mortality over models using either clinical or imaging data alone. Another deep learning model fused fundus photography with patient demographics to predict CAD, employing a graph convolutional neural network to handle the multimodal data structure, showcasing the flexibility of these advanced methods.5
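A schematic multi-branch architecture of this kind, written here in PyTorch with arbitrary layer sizes, might look as follows; the small convolutional branch stands in for a CNN over CT/MRI inputs and the tabular branch for EHR or genomic features.

```python
import torch
import torch.nn as nn

# Sketch of a multi-branch (intermediate-fusion) network: a small CNN branch
# for an imaging input and an MLP branch for tabular clinical/genomic features,
# with learned representations concatenated before the prediction head.
class MultiBranchFusion(nn.Module):
    def __init__(self, n_tabular, img_channels=1):
        super().__init__()
        self.img_branch = nn.Sequential(
            nn.Conv2d(img_channels, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),          # -> (batch, 32)
        )
        self.tab_branch = nn.Sequential(
            nn.Linear(n_tabular, 64), nn.ReLU(), nn.Linear(64, 32), nn.ReLU()
        )
        self.head = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, image, tabular):
        fused = torch.cat([self.img_branch(image), self.tab_branch(tabular)], dim=1)
        return self.head(fused)  # raw logit; apply sigmoid for a risk probability

model = MultiBranchFusion(n_tabular=20)
logit = model(torch.randn(4, 1, 64, 64), torch.randn(4, 20))  # (4, 1)
```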
Graph-Based Fusion: An Emerging Methodological Frontier. An emerging technique is representing multimodal data within a graph structure, where nodes can represent patients or data elements (e.g. specific biomarkers, genetic variants, clinical events) and edges represent relationships or similarities between them. Graph convolutional networks (GCNs) can then learn representations from this graph, effectively fusing information in the process.35 This approach offers a natural way to represent and learn from complex relationships within and between different data modalities and patient entities. Huang et al. used a GCN to combine vascular biomarkers from retinal images with clinical characteristics to predict CAD, treating different data sources as interconnected nodes.62 Methodologically, graph-based approaches are especially useful when data elements have inherent network structures (e.g. genes in pathways, patients in social networks) or when one wants to integrate knowledge graphs with patient data. In CAD, one could envision a graph where a patient node connects to nodes representing their risk factors, imaging findings, genetic variants, etc., and a graph neural network learns which connections are most predictive of outcomes.35 This is still a cutting-edge approach but holds promise for integrating disparate data while preserving and leveraging complex relationships, a distinct methodological advantage over traditional feature vector-based methods.
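For intuition, a single Kipf-style GCN layer can be written directly in PyTorch. In this toy sketch, nodes represent patients, edges encode hypothetical similarity links, and node features are already-fused multimodal variables; published studies typically use dedicated libraries such as PyTorch Geometric.

```python
import torch
import torch.nn as nn

# Minimal GCN layer (Kipf-style propagation) as an illustration of graph-based
# fusion. Nodes: patients; edges: similarity links; node features: a mix of
# clinical, imaging, and genetic variables (all illustrative).
class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # Symmetrically normalize the adjacency with self-loops:
        # D^{-1/2} (A + I) D^{-1/2}
        a_hat = adj + torch.eye(adj.size(0))
        d_inv_sqrt = torch.diag(a_hat.sum(1).pow(-0.5))
        return torch.relu(self.linear(d_inv_sqrt @ a_hat @ d_inv_sqrt @ x))

# Tiny example: 5 patient nodes, 8 fused features each, a sparse similarity graph.
x = torch.randn(5, 8)
adj = torch.tensor([[0, 1, 0, 0, 1],
                    [1, 0, 1, 0, 0],
                    [0, 1, 0, 1, 0],
                    [0, 0, 1, 0, 1],
                    [1, 0, 0, 1, 0]], dtype=torch.float)
layer = GCNLayer(8, 4)
node_embeddings = layer(x, adj)  # (5, 4) fused node representations
```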
Handling Missing Data and Heterogeneity: A Core Informatics Challenge. A ubiquitous methodological challenge in real-world multimodal datasets is that not every patient will have every data type (e.g. not all patients undergo MRI or genetic testing). Machine learning models must handle such missing modalities gracefully, and robust informatics solutions are crucial. Solutions include imputation techniques, which range from simple statistical methods to sophisticated ML-based approaches for filling in missing values. Generative models, such as generative adversarial networks and variational autoencoders, can be trained to generate one modality from another—for example, to predict what a patient’s imaging might look like given their clinical profile. Methodologically, these generative approaches can learn the underlying data distributions and relationships between modalities to create plausible synthetic data, thereby allowing a full feature vector for every patient, though their use requires careful validation to avoid introducing bias.39 While not yet common in CAD risk modeling, these techniques could help utilize partial data more effectively. Another approach is to design models that can accept variable inputs, outputting a prediction even if one modality is absent, perhaps with an associated uncertainty penalty. This flexibility will be crucial for real-world deployment, as complete data availability is rare outside curated research cohorts.
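A common starting point is model-based imputation of missing values before fusion. The sketch below uses scikit-learn's IterativeImputer on a toy matrix in which one patient is missing a CAC measurement; generative approaches (GANs, variational autoencoders) follow the same predict-the-missing-modality idea at larger scale.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Sketch of model-based imputation for a partially observed feature matrix,
# e.g. patients missing CAC or lab values. Values are toy examples.
X = np.array([
    [63.0, 142.0, np.nan],   # age, SBP, CAC (missing)
    [55.0, np.nan, 12.0],
    [71.0, 150.0, 310.0],
    [48.0, 128.0, 0.0],
])
imputer = IterativeImputer(random_state=0)
X_complete = imputer.fit_transform(X)  # missing entries filled by regression
```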
Automated Feature Extraction: A Methodological Shift. A barrier in earlier fusion studies was the need for manual feature extraction—e.g. a human or separate software had to quantify plaque from images or curate EHR variables, a labor-intensive process.39 New AI tools automate this, representing a significant methodological advancement. Computer vision can extract dozens of imaging features (volumes, textures, etc.) from CT/MRI, and natural language processing can pull key concepts from text records.5 This automation greatly expands the feasible feature set. As noted, CNNs can process raw images directly, eliminating manual selection of imaging biomarkers. Similarly, raw lab time-series can be input into a recurrent neural network without manual summarization. This means multimodal models can consider “thousands of different parameters” to potentially identify novel predictive patterns.5 The downside is an increased risk of overfitting or learning spurious correlations when so many features are considered, necessitating larger training datasets and rigorous validation strategies.5
Explainability and Model Interpretation: A Paramount Methodological Concern. Given the “black box” nature of many advanced ML models, ensuring model interpretability is a paramount methodological concern, especially for clinical acceptance and trust. Techniques like SHapley Additive exPlanations or integrated gradients can help interpret which features (or even modalities) are driving a specific prediction for an individual patient. For example, an explainable multimodal model might indicate that a high CAC score combined with a high LDL level was the top contributor to a patient’s high-risk prediction, while for another, it might be a high PRS coupled with blood pressure variability. Such insights not only build trust by showing that the model’s predictions align with medical reasoning, but can also reveal new risk factors or interactions. From an informatics perspective, developing and validating robust explainability methods for complex multimodal models is essential for facilitating clinical translation, ensuring responsible AI deployment, and potentially uncovering new scientific insights.
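A brief sketch of this workflow with SHAP on a gradient-boosted tabular model is shown below; the data are synthetic and the feature names (log_cac, prs, ldl, sbp_variability, age) are purely illustrative.

```python
import numpy as np
import xgboost
import shap  # assumes the shap package is installed

# Sketch of post-hoc explanation for a fused tabular risk model: SHAP values
# attribute each prediction to individual features, which can then be
# aggregated per modality. Data and feature names are synthetic/illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)
feature_names = ["log_cac", "prs", "ldl", "sbp_variability", "age"]

model = xgboost.XGBClassifier(n_estimators=100, max_depth=3).fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # per-patient, per-feature attributions

# Top contributor for the first patient in this synthetic example.
top = feature_names[int(np.argmax(np.abs(shap_values[0])))]
print("Dominant feature for patient 0:", top)
```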
To recapitulate, AI and ML techniques form the engine of multimodal data fusion, providing the methodological toolkit to handle complex, high-dimensional, and heterogeneous data that traditional statistical models often cannot. The choice of fusion strategy (e.g. early, late, intermediate) and model architecture (e.g. multi-branch neural networks, GCNs) is a critical methodological decision, often tailored to the specific dataset characteristics, the nature of the data modalities, and the prediction task at hand. One survey indicated that early fusion was a common strategy in health ML literature and that multimodal models generally outperformed single-modality models. However, these advanced models also present challenges, such as the need for large training datasets and ensuring generalizability and interpretability, which are active areas of methodological research.