Methodological Issues in Predicting Pediatric Epilepsy Surgery Candidates Through Natural Language Processing and Machine Learning.
Kevin Bretonnel Cohen, Benjamin Glass, Hansel M Greiner, Katherine Holland-Bouley, Shannon Standridge, Ravindra Arya, Robert Faist, Diego Morita, Francesco Mangano, Brian Connolly, Tracy Glauser, John Pestian
Published in: Biomedical informatics insights (2016)
We describe the development and evaluation of a system that uses machine learning and natural language processing techniques to identify potential candidates for surgical intervention for drug-resistant pediatric epilepsy. The data consist of free-text clinical notes extracted from the electronic health record (EHR). Both known clinical outcomes from the EHR and manual chart annotations provide gold standards for the patient's status. Two hypotheses are tested: (1) machine learning methods can identify epilepsy surgery candidates as well as physicians do, and (2) machine learning methods can identify candidates earlier than physicians do. These hypotheses are tested by systematically evaluating the effects of the data source, amount of training data, class balance, classification algorithm, and feature set on classifier performance. The results support both hypotheses, with F-measures ranging from 0.71 to 0.82. The feature set, classification algorithm, amount of training data, class balance, and gold standard all significantly affected classification performance. Classification performance was also better than the highest agreement between two annotators, even at one year before documented surgery referral. The results demonstrate that such machine learning methods can contribute to predicting pediatric epilepsy surgery candidates and reducing lag time to surgery referral.
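For readers unfamiliar with the F-measure reported above, the following is a minimal sketch of how it is commonly computed for a binary classifier from gold-standard and predicted labels. It is an illustration only and not the authors' evaluation code; the function name `f_measure`, the label encoding (1 = surgery candidate, 0 = non-candidate), and the toy label lists are all assumptions made for this example.

```python
from typing import Sequence


def f_measure(gold: Sequence[int], predicted: Sequence[int], beta: float = 1.0) -> float:
    """Return the F-measure for the positive class (label 1).

    Hypothetical helper for illustration; beta = 1.0 gives the
    balanced F1 score (harmonic mean of precision and recall).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # false negatives

    if tp == 0:
        # No correct positive predictions: score is defined as 0 here.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy labels invented for this example (not data from the study).
gold = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 0, 1, 0]
score = f_measure(gold, predicted)
print(f"F-measure: {score:.2f}")
```

Because the F-measure ignores true negatives, it is sensitive to class balance, which is one reason a study of this kind would report the effect of class balance on performance separately.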
Keyphrases
- machine learning
- electronic health record
- big data
- minimally invasive
- coronary artery bypass
- deep learning
- artificial intelligence
- drug resistant
- primary care
- clinical decision support
- surgical site infection
- autism spectrum disorder
- adverse drug
- randomized controlled trial
- risk assessment
- pseudomonas aeruginosa
- acinetobacter baumannii
- cystic fibrosis
- virtual reality
- human health
- data analysis