Trends in the characteristics of human functional genomic data on the gene expression omnibus, 2001-2017.

Daniel D Liu, Lanjing Zhang
Published in: Laboratory investigation; a journal of technical methods and pathology (2018)
The Gene Expression Omnibus (GEO) is the world's largest public repository of functional genomic data. Despite its broad use in secondary genomic analyses, temporal trends in the characteristics of genomic data on GEO, including experimental procedures, geographic origin, funder(s), and related disease, have not been examined. We identified 75,376 Series deposited to GEO during 2001-2017 and built a database of all human genomic data (39,076 Series, 51.8% of all Series). Using the associated publications, we obtained funding information and identified the related disease area. Of the Series with classified disease areas, the two most common were cancer (n = 12,688, 32.5%) and immunologic diseases (n = 2,393, 6.1%), while the percentages of all other disease areas were below 5%, including neurological diseases (n = 1,733, 4.4%), infectious diseases (n = 1,225, 3.1%), diabetes (n = 828, 2.1%), and cardiovascular diseases (n = 299, 0.8%). In recent years, there has been a significant increase in the use of high-throughput sequencing (HTS), protein array, and multiple-platform technologies, as well as in the proportion of North American deposits. Compared with those from other regions, North American deposits appeared to lead the shift from array-based to HTS technologies (odds ratio [OR] = 3.39, 95% confidence interval [CI]: 3.23-3.55, P = 9.40E-323) and were less likely to focus on a major disease area (OR = 0.64, 95% CI: 0.61-0.67, P = 5.02E-107), suggesting a greater emphasis on basic science in North America. Furthermore, Series utilizing HTS were less likely to be disease-classified than those using other technologies (OR = 0.39, 95% CI: 0.37-0.41, P = 1.00E-322), suggesting preferential use or adoption of HTS in basic science settings. Finally, funding from the NHGRI, NCI, NIEHS, and NCCR resulted in a higher number of GEO Series per grant than funding from other NIH institutes, indicating differing preferences for genomic studies among awardees of different NIH institutes.
Our findings demonstrate geographic, technological, and funding disparities in the trends of GEO deposit characteristics.
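The odds ratios and 95% confidence intervals reported above compare two groups of Series on a binary characteristic (for example, North American versus other deposits, HTS versus array-based). The abstract does not state how the authors computed them, so the following is only a minimal sketch of one common approach: an unadjusted odds ratio from a 2x2 table with a Wald interval on the log scale. The function name `odds_ratio_wald_ci` and the counts in the usage line are hypothetical and are not taken from the paper.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.959964):
    """Unadjusted odds ratio with a Wald confidence interval.

    Hypothetical 2x2 layout (not from the paper):
        a = group 1 with the characteristic    (e.g. North American, HTS)
        b = group 1 without it                 (e.g. North American, array)
        c = group 2 with the characteristic    (e.g. other regions, HTS)
        d = group 2 without it                 (e.g. other regions, array)
    z is the standard normal quantile; 1.959964 gives a 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Illustrative counts only; these are not the study's data.
or_, lo, hi = odds_ratio_wald_ci(20, 10, 10, 20)
print(f"OR = {or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1, as all three reported intervals do, indicates an association at the chosen confidence level. A regression-based analysis with adjustment for covariates would generally give different estimates from this unadjusted sketch.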