Aim: To assess machine-learning models for colorectal cancer risk prediction, appraise their methodological quality, compare their performance, and highlight their limitations.

Methods: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) recommendations were followed. The electronic databases ScienceDirect, MEDLINE (via PubMed), Google Scholar, EBSCO, ERIC, and CINAHL were searched for the period January 2016 to September 2023. Data were extracted using a pre-designed data extraction sheet. The main search terms were big data, risk assessment, colorectal cancer, and artificial intelligence.

Results: Fifteen studies were included. A total of 3,057,329 colorectal cancer (CRC) health records, including those of adult patients older than 18, were used to generate the results. The area under the curve (AUC) ranged from 0.704 to 0.976. Logistic regression, random forests, and ColonFlag were frequently employed techniques. Overall, these studies demonstrate substantial and accurate CRC risk prediction.

Conclusion: This review gives an up-to-date summary of recent research on the use of big data in CRC prediction. The gaps in the literature that it identifies can guide future research. The current obstacles are missing data, a lack of external validation, and the diversity of machine-learning algorithms. Although the area under the curve has a sound mathematical definition, its application depends on the modelling context.
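For readers less familiar with the metric reported above, the following is a minimal sketch of one standard way to read the area under the ROC curve: the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case, with ties counted as one half. The function name `auc` and the toy labels and scores are hypothetical illustrations and are not taken from any of the reviewed studies.

```python
from itertools import product


def auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for a case (e.g. CRC diagnosed), 0 for a control.
    scores: predicted risk, higher meaning more likely to be a case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count every positive/negative pair: 1 if ranked correctly, 0.5 if tied.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
print(round(auc(labels, scores), 3))  # 10 of 12 pairs ranked correctly -> 0.833
```

Because this quantity measures ranking only, two models with the same AUC can still differ in calibration or in usefulness at a given screening threshold, which is one reason its interpretation depends on the modelling context.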
Keyphrases
- big data
- machine learning
- artificial intelligence
- meta analyses
- systematic review
- deep learning
- risk assessment
- public health
- healthcare
- case control
- human health
- high resolution
- randomized controlled trial
- electronic health record
- physical activity
- mass spectrometry
- neural network
- current status
- drug induced
- clinical practice
- health promotion