2020 Oct 27;20(21):6100. doi: 10.3390/s20216100

Table 3.

Summary of challenges in applying ML techniques in HSCT.

Challenges | Reasons | Potential Solution
Limited Data Capture
Reasons:
  • Complex HSCT procedure with numerous post-transplant complications

  • Lack of continuous, real-time capture of the various data streams involved

  • Mix of automated and manual data capture

Potential solution:
  • Utilizing wearable sensor devices or leveraging mHealth platforms for robust data collection

Data Quality Issues
Reasons:
  • Substantial missingness and inconsistencies caused by complex data collection procedures

  • Loss of important variables leads to loss of relevant information

Potential solution:
  • Developing autonomous, adaptive, online preprocessing algorithms that automatically detect data quality issues and resolve them with appropriate techniques in real time
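
As one illustration of online preprocessing, the sketch below fills a missing value with the running mean of the same variable as records arrive. It is a minimal example under stated assumptions, not a method from the source: the class name `OnlineMeanImputer` and the variable name `temperature` are hypothetical, and a real pipeline would need imputation rules chosen per variable.

```python
from typing import Dict, Optional


class OnlineMeanImputer:
    """Fill missing values with the running mean of each variable seen so far."""

    def __init__(self) -> None:
        self._count: Dict[str, int] = {}
        self._mean: Dict[str, float] = {}

    def process(self, record: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
        cleaned: Dict[str, Optional[float]] = {}
        for name, value in record.items():
            if value is None:
                # Impute from history; stays None if the variable was never observed.
                cleaned[name] = self._mean.get(name)
            else:
                # Incremental mean update, so no past records need to be stored.
                n = self._count.get(name, 0) + 1
                mean = self._mean.get(name, 0.0)
                self._count[name] = n
                self._mean[name] = mean + (value - mean) / n
                cleaned[name] = value
        return cleaned
```

Each call to `process` handles one incoming record, so the imputer can sit directly on a data stream.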

High-Dimensional Data
Reasons:
  • Large number of clinical and/or genomic variables associated with the HSCT outcome

Potential solution:
  • Developing novel streaming dimension reduction techniques that efficiently process the large number of features associated with the HSCT outcome
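
The source does not name a specific streaming technique. As one well-known example, the sketch below uses Oja's rule to estimate the first principal component one sample at a time. The class name `OjaComponent` is hypothetical, the features are assumed to be centered already, and the fixed starting vector is chosen only for reproducibility.

```python
import math
from typing import List, Sequence


class OjaComponent:
    """Streaming estimate of the first principal component via Oja's rule."""

    def __init__(self, n_features: int, learning_rate: float = 0.05) -> None:
        # Fixed unit-length start for reproducibility; random starts are common.
        self.w: List[float] = [1.0 / math.sqrt(n_features)] * n_features
        self.lr = learning_rate

    def project(self, x: Sequence[float]) -> float:
        """Reduce one (centered) sample to a single score."""
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x: Sequence[float]) -> float:
        """Consume one sample, adjust the component, return the sample's score."""
        y = self.project(x)
        # Oja's rule: w <- w + lr * y * (x - y * w)
        self.w = [wi + self.lr * y * (xi - y * wi) for wi, xi in zip(self.w, x)]
        return y
```

Only the weight vector is kept in memory, so the cost per sample is linear in the number of features.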

Data Privacy Issues
Reasons:
  • Building predictive models requires a large amount of sensitive patient data because of the numerous factors involved

  • Combining multiple data streams from dispersed data stores leads to potential data privacy issues

Potential solutions:
  • Developing appropriate privacy measures, such as data anonymization techniques, to ensure complete privacy of patients’ data

  • Using techniques such as “federated learning” [47], which trains a shared global model via a centralized aggregation server while keeping sensitive data at the local institutions of origin

  • Enabling some form of privacy access control over the different data streams, so that only those with proper authorization can access a patient’s data streams
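
The federated learning idea above can be sketched with a toy one-parameter linear model. This is an illustrative simplification in the spirit of federated averaging, not the method of [47]: the function names, learning rate, and number of rounds are assumptions. Each site trains on its own data, and only the resulting weight and sample count reach the aggregation step.

```python
from typing import List, Sequence, Tuple

Site = List[Tuple[float, float]]  # (x, y) pairs that never leave the site


def local_update(weight: float, data: Site, lr: float = 0.05, epochs: int = 5) -> float:
    """Run a few gradient steps for y ~ weight * x on one site's private data."""
    w = weight
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(sites: Sequence[Site], rounds: int = 20) -> float:
    """Server loop: broadcast the global weight, then average the local results."""
    global_w = 0.0
    total = sum(len(site) for site in sites)
    for _ in range(rounds):
        # Only (weight, sample count) is shared with the server, never raw records.
        local = [(local_update(global_w, site), len(site)) for site in sites]
        global_w = sum(w * n for w, n in local) / total
    return global_w
```

Weighting each local result by its sample count keeps a large site from being treated the same as a small one.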

Obsolete Predictive Models
Reasons:
  • Dynamic evolution of disease states in patients undergoing HSCT

Potential solution:
  • Developing adaptive ML techniques capable of detecting changes in the data over time and adapting accordingly
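
One standard way to detect such changes is the Page-Hinkley test, sketched below for upward shifts in a monitored value (for example, a model's prediction error). The source does not prescribe this test; the class name and the `delta` and `threshold` defaults are assumptions that would need tuning on real data.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta: float = 0.005, threshold: float = 10.0) -> None:
        self.delta = delta          # tolerated drift magnitude
        self.threshold = threshold  # alarm level for the test statistic
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.cumulative = 0.0
        self.minimum = 0.0

    def update(self, value: float) -> bool:
        """Consume one value; return True when a change is signalled."""
        self.n += 1
        self.mean += (value - self.mean) / self.n
        # Accumulate deviations from the running mean, minus the tolerance.
        self.cumulative += value - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        if self.cumulative - self.minimum > self.threshold:
            # Start fresh so the detector adapts to the new regime.
            self.reset()
            return True
        return False
```

A `True` return would be the point at which a predictive model is retrained or updated on recent data.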

Diverse Data Types
Reasons:
  • Captured data are of different modalities and sampled at different rates

Potential solution:
  • Multi-modal data integration techniques using deep learning have to be developed for effective integration
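
Before any model can combine streams sampled at different rates, they usually have to be placed on a common time grid. The sketch below shows only that preprocessing step, using last-observation-carried-forward alignment; it is not the deep learning integration itself. The function name and the stream names `heart_rate` and `temperature` are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Sequence, Tuple

Samples = List[Tuple[float, float]]  # (timestamp, value), sorted by timestamp


def align_streams(
    streams: Dict[str, Samples], grid: Sequence[float]
) -> List[Dict[str, Optional[float]]]:
    """For each grid time, take the latest value observed at or before it."""
    times = {name: [t for t, _ in samples] for name, samples in streams.items()}
    rows: List[Dict[str, Optional[float]]] = []
    for t in grid:
        row: Dict[str, Optional[float]] = {"time": t}
        for name, samples in streams.items():
            i = bisect_right(times[name], t)
            # None marks grid times before the stream's first sample.
            row[name] = samples[i - 1][1] if i > 0 else None
        rows.append(row)
    return rows


streams = {
    "heart_rate": [(0.0, 70.0), (1.0, 72.0), (2.0, 75.0)],
    "temperature": [(0.5, 37.0)],
}
rows = align_streams(streams, [0.0, 1.0, 2.0])
```

Each resulting row holds one value per stream at a shared timestamp, which is the shape most downstream models expect.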

Data Integration Issues
Reasons:
  • Most of the captured data are typically dispersed among various data stores (e.g., cloud storage, EHR, individually managed databases)

Potential solution:
  • Using mHealth platforms could be a potential solution