QuantifyPoly(A): reshaping alternative polyadenylation landscapes of eukaryotes with weighted density peak clustering.
Congting Ye, Danhui Zhao, Wenbin Ye, Xiaohui Wu, Guoli Ji, Qingshun Q Li, Juncheng Lin
Published in: Briefings in Bioinformatics (2022)
The dynamic choice among different polyadenylation sites in a gene is referred to as alternative polyadenylation, which functions in many important biological processes. Large-scale messenger RNA 3' end sequencing has revealed that cleavage sites for polyadenylation exhibit microheterogeneity. To date, the conventional determination of polyadenylation site clusters has been subjective and arbitrary, leading to inaccurate annotations. Here, we present QuantifyPoly(A), a weighted density peak clustering method, to accurately quantify genome-wide polyadenylation choices. Applying QuantifyPoly(A) to published 3' end sequencing datasets from both animals and plants reshapes their polyadenylation profiles into myriad novel polyadenylation site clusters. Most of these novel clusters show significantly dynamic usage across different biological samples or are associated with binding sites of trans-acting factors. Upstream sequences of these clusters are enriched with the polyadenylation signals UGUA, UAAA and/or AAUAAA in a species-dependent manner. Polyadenylation site clusters also exhibit species specificity, with plant clusters generally showing higher microheterogeneity than animal clusters. QuantifyPoly(A) is broadly applicable to any type of 3' end sequencing data and any species for accurate quantification and construction of the complex, dynamic polyadenylation landscape, and it enables alternative polyadenylation events invisible to conventional methods to be decoded at much higher resolution.
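The abstract does not spell out how QuantifyPoly(A) weights reads or selects cluster centers, so the following is only a minimal, hypothetical sketch of the generic density peak clustering idea (Rodriguez and Laio, 2014) applied to cleavage sites on one strand. It assumes that per-site read counts act as weights in the local density, that a site becomes a cluster center when its distance to any denser site is large, and that every other site joins the cluster of its nearest denser neighbor. The function name `weighted_density_peaks` and the parameters `d_c`, `min_delta` and `min_rho` are illustrative assumptions, not part of QuantifyPoly(A).

```python
from typing import List, Sequence


def weighted_density_peaks(
    positions: Sequence[int],
    counts: Sequence[float],
    d_c: int = 10,
    min_delta: int = 20,
    min_rho: float = 1.0,
) -> List[int]:
    """Hypothetical sketch: assign a cluster label to each cleavage site.

    positions: unique cleavage-site coordinates on one strand (assumed input).
    counts: read support per site, used here as density weights (assumption).
    """
    n = len(positions)
    if n == 0:
        return []
    # Weighted local density: total reads within d_c nt of each site, itself included.
    rho = [
        sum(counts[j] for j in range(n) if abs(positions[i] - positions[j]) <= d_c)
        for i in range(n)
    ]
    # Rank sites by decreasing density; ties broken by higher own count, then lower position.
    order = sorted(range(n), key=lambda i: (-rho[i], -counts[i], positions[i]))
    delta: List[int] = [0] * n
    nearest_higher = [-1] * n
    span = max(positions) - min(positions)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = span  # densest site gets the largest possible distance
            continue
        # Nearest site among those ranked denser than site i.
        best = min(order[:rank], key=lambda j: abs(positions[i] - positions[j]))
        delta[i] = abs(positions[i] - positions[best])
        nearest_higher[i] = best
    labels = [-1] * n
    next_label = 0
    # Visit sites densest first, so each nearest denser neighbor is already labeled.
    for rank, i in enumerate(order):
        if rank == 0 or (delta[i] >= min_delta and rho[i] >= min_rho):
            labels[i] = next_label  # new cluster center (density peak)
            next_label += 1
        else:
            labels[i] = labels[nearest_higher[i]]
    return labels


if __name__ == "__main__":
    sites = [100, 102, 103, 105, 160, 161, 163]
    reads = [2, 10, 3, 1, 5, 8, 2]
    print(weighted_density_peaks(sites, reads))  # prints [0, 0, 0, 0, 1, 1, 1]
```

On this toy input, the microheterogeneous sites from 100 to 105 collapse into one cluster peaked at 102 and the sites from 160 to 163 into a second cluster peaked at 161, without a fixed window size deciding the boundary. The pairwise scans are O(n^2), which keeps the sketch readable but would need sorting-based neighbor queries for genome-scale data.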