Machine Learning Directed Aptamer Search from Conserved Primary Sequences and Secondary Structures.
Javier Perez Tobia, Po-Jung Jimmy Huang, Yuzhe Ding, Runjhun Saran Narayan, Apurva Narayan, Jiaying Xie
Published in: ACS Synthetic Biology (2023)
Computer-aided prediction of aptamer sequences has focused on primary sequence alignment and motif comparison. We observed that many aptamers have a conserved hairpin, yet the sequence of the hairpin can be highly variable. To take such secondary structure information into account, we developed a new algorithm that combines conserved primary sequences and secondary structures through three scores based on sequence abundance, stability, and structure, respectively. The algorithm was used to predict aptamers from the caffeine and theophylline selections. In the late rounds of the selections, when the libraries had converged, the predicted sequences matched well with the most abundant sequences. When the libraries were far from convergence and the sequences were deemed challenging for traditional analysis methods, the algorithm still predicted aptamer sequences that were experimentally verified by isothermal titration calorimetry. This algorithm offers a new way to look for patterns in aptamer selection libraries and mimics the sequence evolution process. It will help shorten the aptamer selection time and promote the biosensor and chemical biology applications of aptamers.
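The idea of ranking library sequences by three combined scores can be illustrated with a minimal Python sketch. It is not the published algorithm: the abstract does not define the scores, so every function name, the perfect-stem hairpin search, the hydrogen-bond count used as a stability proxy, and the equal weights are assumptions made for illustration only. A real analysis would likely use a dedicated folding tool to estimate stability.

```python
from collections import Counter

# Watson-Crick partners for DNA; other characters never pair.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def best_hairpin(seq, min_stem=3, min_loop=3):
    """Return (stem_len, loop_len, gc_pairs) for the longest perfect stem.

    Brute-force search (assumption: no bulges or mismatches are allowed).
    Returns (0, 0, 0) when no stem of at least min_stem pairs exists.
    """
    best = (0, 0, 0)
    n = len(seq)
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            a, b, stem, gc = i, j, 0, 0
            # Extend the stem inward while bases pair and the loop stays long enough.
            while b - a - 1 >= min_loop and PAIR.get(seq[a]) == seq[b]:
                gc += seq[a] in "GC"
                stem += 1
                a += 1
                b -= 1
            if stem >= min_stem and stem > best[0]:
                best = (stem, b - a + 1, gc)
    return best


def rank_candidates(reads, weights=(1.0, 1.0, 1.0)):
    """Rank unique sequences by a weighted sum of three illustrative scores.

    abundance: read count relative to the most abundant sequence.
    stability: hydrogen-bond count of the stem (A-T = 2, G-C = 3), normalized.
    structure: fraction of all reads sharing the same (stem_len, loop_len) shape,
               so a conserved hairpin scores well even if its sequence varies.
    """
    counts = Counter(reads)
    hairpins = {s: best_hairpin(s) for s in counts}
    total_reads = sum(counts.values())

    shape_reads = Counter()
    for s, c in counts.items():
        stem, loop, _ = hairpins[s]
        shape_reads[(stem, loop)] += c

    max_count = max(counts.values())
    max_bonds = max(2 * h[0] + h[2] for h in hairpins.values()) or 1
    w_abund, w_stab, w_struct = weights

    ranked = []
    for s, c in counts.items():
        stem, loop, gc = hairpins[s]
        abundance = c / max_count
        stability = (2 * stem + gc) / max_bonds
        structure = shape_reads[(stem, loop)] / total_reads if stem else 0.0
        score = w_abund * abundance + w_stab * stability + w_struct * structure
        ranked.append((score, s))
    return sorted(ranked, reverse=True)
```

Calling rank_candidates on a list of reads returns (score, sequence) pairs, highest score first. Because the structure score counts reads by hairpin shape instead of by exact sequence, two sequences with different bases but the same stem and loop lengths reinforce each other, which is the kind of signal the abstract describes for libraries that have not yet converged.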