Results and student perspectives on a web-scraping assignment from Utah State University's data technologies course to evaluate the African activity in the statistical computing community.

Adelyn Fleming, Joanna D Coltrin, Jhonatan Medri, Cody Hilyard, Rigoberto Tellez, Jürgen Symanzik
Published in: Computational Statistics (2022)
In 2019, members of the Executive Committee of the International Association for Statistical Computing (IASC) were contacted by IASC members from Africa asking whether it would be feasible to establish a new regional IASC section in Africa. Establishing a new regional section requires several steps, which are outlined in the IASC Statutes at https://iasc-isi.org/statutes/. Approval likely depends on whether the proposed section has the potential to conduct typical section activities, such as organizing regional conferences, workshops, and short courses. To judge whether adding a regional section in Africa is feasible, the IASC must know whether there is currently enough high-level activity in computational statistics within African countries. To answer this question, we looked at author affiliations of articles published in the Springer journal Computational Statistics (COST) and the Elsevier journal Computational Statistics & Data Analysis (CSDA) from 2015 to 2020, and used these data as a proxy to compare the productivity of authors with an affiliation in Africa in 2019 and 2020 with that of authors with an affiliation in Latin America in 2015 and 2016. This article presents quantitative results for the questions above, describes how students in Utah State University's STAT 5080/6080 "Data Technologies" course in the Fall 2019 semester used web scraping techniques in a homework assignment to gather author affiliations from COST and CSDA, and evaluates student feedback obtained after the end of the course.
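The affiliation-counting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the students' or the authors' code. The HTML snippet, the `affiliation` class name, the placeholder institutions, and the tiny `REGION_BY_COUNTRY` lookup are all assumptions made for illustration, and real COST or CSDA pages are structured differently. The sketch also assumes that the country is the last comma-separated part of each affiliation string.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical markup and placeholder institutions; real journal pages differ.
SAMPLE_HTML = """
<ul>
  <li class="affiliation">Department of Statistics, Example University, Nigeria</li>
  <li class="affiliation">Institute of Mathematics, Example University, Brazil</li>
  <li class="affiliation">School of Computing, Example University, South Africa</li>
  <li class="affiliation">Department of Mathematics, Example University, USA</li>
</ul>
"""

# Illustrative and deliberately incomplete country-to-region lookup (assumption).
REGION_BY_COUNTRY = {
    "Nigeria": "Africa",
    "South Africa": "Africa",
    "Brazil": "Latin America",
}


class AffiliationParser(HTMLParser):
    """Collect the text of every element whose class attribute is 'affiliation'.

    Assumes affiliation elements contain plain text only (no nested tags).
    """

    def __init__(self):
        super().__init__()
        self._inside = False
        self.affiliations = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("class") == "affiliation":
            self._inside = True
            self.affiliations.append("")

    def handle_endtag(self, tag):
        # Any closing tag ends the current affiliation (flat markup assumed).
        self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.affiliations[-1] += data


def count_by_region(html):
    """Tally affiliations by region, using the last comma-separated field as country."""
    parser = AffiliationParser()
    parser.feed(html)
    counts = Counter()
    for affiliation in parser.affiliations:
        country = affiliation.rsplit(",", 1)[-1].strip()
        counts[REGION_BY_COUNTRY.get(country, "Other")] += 1
    return counts


region_counts = count_by_region(SAMPLE_HTML)
print(region_counts)
```

In practice, the parsing step would have to be adapted to each publisher's page layout, and the country lookup would need to cover spelling variants and affiliations that omit the country altogether.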
Keyphrases
  • data analysis
  • electronic health record
  • big data
  • healthcare
  • mental health
  • randomized controlled trial
  • high school
  • machine learning
  • artificial intelligence