A detailed open access model of the PubMed literature.

Kevin W Boyack, Caleb Smith, Richard Klavans
Published in: Scientific Data (2020)
Portfolio analysis is a fundamental practice of organizational leadership and is a necessary precursor of strategic planning. Successful application requires a highly detailed model of research options. We have constructed a model, the first of its kind, that accurately characterizes these options for the biomedical literature. The model comprises over 18 million PubMed documents from 1996-2019. Document relatedness was measured using a hybrid citation analysis + text similarity approach. The resulting 606.6 million document-to-document links were used to create 28,743 document clusters and an associated visual map. Clusters are characterized using metadata (e.g., phrases, MeSH) and over 20 indicators (e.g., funding, patent activity). The map and cluster-level data are embedded in Tableau to provide an interactive model enabling in-depth exploration of a research portfolio. Two example usage cases are provided, one to identify specific research opportunities related to coronavirus, and the second to identify research strengths of a large cohort of African American and Native American researchers at the University of Michigan Medical School.
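As a rough illustration of what a hybrid citation + text relatedness score could look like, the sketch below blends a citation-overlap measure with a text-similarity measure. It is not the authors' method: the abstract does not specify the formulas, so the use of Jaccard overlap on reference lists, bag-of-words cosine similarity, the `citation_weight` parameter, and the `refs`/`text` field names are all assumptions made for this example.

```python
import math
from collections import Counter


def jaccard(refs_a, refs_b):
    """Citation overlap: shared references divided by all references (assumed measure)."""
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(text_a, text_b):
    """Text similarity: cosine of simple word-count vectors (assumed measure)."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def hybrid_relatedness(doc_a, doc_b, citation_weight=0.5):
    """Weighted blend of the two signals; the weight is a hypothetical parameter."""
    c = jaccard(doc_a["refs"], doc_b["refs"])
    t = cosine(doc_a["text"], doc_b["text"])
    return citation_weight * c + (1 - citation_weight) * t


# Toy documents with made-up reference IDs and titles.
doc1 = {"refs": [1, 2, 3], "text": "coronavirus spike protein structure"}
doc2 = {"refs": [2, 3, 4], "text": "coronavirus spike protein vaccine"}
score = hybrid_relatedness(doc1, doc2)
```

In a pipeline like the one described, pairwise scores of this kind would serve as the weighted document-to-document links that a clustering step then groups into clusters.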
Keyphrases
  • african american
  • systematic review
  • healthcare
  • primary care
  • sars cov
  • machine learning
  • optical coherence tomography
  • electronic health record
  • big data
  • deep learning
  • high density
  • drug induced