One Among Millions: The Chemical Space of Nucleic Acid-Like Molecules.
Henderson James Cleaves, Christopher J. Butch, Pieter Buys Burger, Jay Goodwin, Markus Meringer. Published in: Journal of Chemical Information and Modeling (2019)
Biology encodes hereditary information in DNA and RNA, which are finely tuned to their biological functions and modes of biological production. The central role of nucleic acids in biological information flow makes them key targets of pharmaceutical research. Indeed, other nucleic acid-like polymers can play roles similar to those of natural nucleic acids both in vivo and in vitro; yet despite remarkable advances over the last few decades, much remains unknown about which structures are compatible with molecular information storage. Chemical space describes the structures and properties of molecules that could exist within a given molecular formula or other classification system. Using structure generation methods, we explore nucleic acid analogues within the formula ranges BC₃₋₇H₅₋₁₅O₂₋₄ and BC₃₋₆H₅₋₁₅N₁₋₂O₀₋₄, where B is a recognition element (e.g., a nucleobase). Additional restrictions required two points of attachment for incorporation into a linear polymer and imposed substructure criteria predictive of chemical stability. These sets contain 86,007 (CHO) and 75,309 (CHNO) compositionally isomeric structures, corresponding to 706,568 CHO and 454,422 CHNO stereoisomers, which occupy this space both diversely and densely. These libraries suggest that large regions of unexplored chemistry exist that are relevant to pharmacology, biochemistry, and efforts to understand the origins of life.
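To make the formula-range notation concrete, the short sketch below enumerates every sum formula covered by the two ranges quoted above. It is only an illustration of what the ranges span, not the authors' structure generation procedure: it treats B as a fixed placeholder symbol for the recognition element (not boron), applies no valence, attachment-point, or stability constraints, and the names `CHO_RANGES`, `CHNO_RANGES`, and `formulas` are hypothetical. The resulting formula counts therefore say nothing about the numbers of structures reported in the abstract; many of these formulas may admit no valid scaffold at all.

```python
from itertools import product

# Hypothetical encoding of the formula ranges quoted in the abstract.
# "B" (the recognition element) is held fixed and is not enumerated.
CHO_RANGES = {"C": range(3, 8), "H": range(5, 16), "O": range(2, 5)}
CHNO_RANGES = {"C": range(3, 7), "H": range(5, 16), "N": range(1, 3), "O": range(0, 5)}


def formulas(ranges):
    """Yield every element-count combination in `ranges` as a formula string."""
    elements = list(ranges)  # keeps the C, H, (N), O order used in the abstract
    for counts in product(*(ranges[e] for e in elements)):
        parts = []
        for element, n in zip(elements, counts):
            if n == 0:
                continue  # an element with count 0 (e.g. O0) is omitted
            parts.append(element if n == 1 else f"{element}{n}")  # count 1 is implicit
        yield "B" + "".join(parts)


cho = list(formulas(CHO_RANGES))
chno = list(formulas(CHNO_RANGES))
print(len(cho), len(chno))  # 165 440
print(cho[0], chno[0])      # BC3H5O2 BC3H5N
```

The counts follow directly from the range widths: 5 × 11 × 3 = 165 CHO formulas and 4 × 11 × 2 × 5 = 440 CHNO formulas. A structure generator would then have to be run on each formula, under the structural restrictions described above, to obtain actual isomers.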