Exploring automatic inconsistency detection for literature-based gene ontology annotation.
Jiyu Chen, Benjamin Goudey, Justin Zobel, Nicholas Geard, Karin M Verspoor
Published in: Bioinformatics (Oxford, England) (2022)
We have created a reliable synthetic dataset to simulate four realistic types of gene ontology annotation (GOA) inconsistency in biological databases. We propose three automatic approaches, which provide reasonable performance on the task of distinguishing the four types of inconsistency and are directly applicable to detecting inconsistencies in real-world GOA database records. We report the major challenges resulting from such inconsistencies in several specific application settings. This is the first study to introduce automatic approaches designed to address the challenges in current GOA quality assurance workflows. The data underlying this article are available on GitHub at https://github.com/jiyuc/AutoGOAConsistency.