Fuzzy Logic as a Strategy for Combining Marker Statistics to Optimize Preselection of High-Density and Sequence Genotype Data.
Ashley S LingEl Hamidi HaySamuel E AggreyRomdhane RekayaPublished in: Genes (2022)
The high dimensionality of genotype data available for genomic evaluations has presented a motivation for developing strategies to identify subsets of markers capable of increasing the accuracy of predictions compared to the current commercial single nucleotide polymorphism (SNP) chips. In this simulation study, an algorithm for combining statistics used in the preselection and prioritization of SNP markers from a high-density panel (1.3 million SNPs) into a composite "fuzzy" ranking score based on a Sugeno-type fuzzy inference system (FIS) was developed and evaluated for performance in preselection for genomic predictions. FST scores, and p -values were evaluated as inputs for the FIS. The accuracy of genomic predictions for fuzzy-score-preselected panel sizes of 1-50 k SNPs ranged from -0.4-11.7 and -0.3-3.8% higher than FST and p -value preselection, respectively. Though gains in prediction accuracies using only two inputs to the FIS were modest, preselection based on fuzzy scores yielded more accurate predictions than both FST scores and p -values for the majority of evaluated panel sizes under all genetic architectures. FIS have the potential to aggregate information from multiple criteria that reflect SNP-trait associations and biological relevance in a flexible and efficient way to yield higher quality genomic predictions.