Integrating and analyzing medical and environmental data using ETL and Business Intelligence tools.

Alejandro Villar, María T Zarrabeitia, Pablo Fdez-Arroyabe, Ana Santurtún
Published in: International journal of biometeorology (2018)
Processing data that originates from different sources (such as environmental and medical data) can prove to be a difficult task, due to the heterogeneity of variables, storage systems, and file formats that can be used. Moreover, once the amount of data reaches a certain threshold, conventional mining methods (based on spreadsheets or statistical software) become cumbersome or even impossible to apply. Data Extract, Transform, and Load (ETL) solutions provide a framework to normalize and integrate heterogeneous data into a local data store. Additionally, the application of Online Analytical Processing (OLAP), a set of Business Intelligence (BI) methodologies and practices for multidimensional data analysis, can be an invaluable tool for examining and mining that data. In this article, we describe a solution based on an ETL + OLAP tandem used for the on-the-fly analysis of tens of millions of individual medical, meteorological, and air quality observations from 16 provinces in Spain, provided by 20 different national and regional entities in a diverse array of file types and formats, with the intention of evaluating the effect of several environmental variables on human health in future studies. Our work shows how a sizable amount of data, spread across a wide range of file formats and structures, and originating from a number of different sources belonging to various business domains, can be integrated in a single system that researchers can use for global data analysis and mining.
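To make the ETL + OLAP idea concrete, the following is a minimal sketch and not the system described in the article. It assumes two hypothetical sources (a semicolon-delimited weather export with dd/mm/yyyy dates and a list of admission records with ISO dates), normalizes both into one observation table in an in-memory SQLite database, and then performs an OLAP-style roll-up from day to month. All column names, sample values, and function names are invented for illustration.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical source 1: semicolon-delimited weather export, dd/mm/yyyy dates.
WEATHER_CSV = """fecha;provincia;tmax
01/01/2010;Cantabria;12.5
02/01/2010;Cantabria;10.0
01/01/2010;Madrid;8.0
"""

# Hypothetical source 2: hospital admission records with ISO dates.
ADMISSIONS = [
    {"date": "2010-01-01", "province": "Cantabria", "admissions": 40},
    {"date": "2010-01-02", "province": "Cantabria", "admissions": 44},
    {"date": "2010-01-01", "province": "Madrid", "admissions": 120},
]


def extract_weather(text):
    """Extract and transform: yield (province, iso_day, variable, value)."""
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        day = datetime.strptime(row["fecha"], "%d/%m/%Y").date().isoformat()
        yield (row["provincia"], day, "tmax_c", float(row["tmax"]))


def extract_admissions(records):
    """Map admission records onto the same four-field schema."""
    for rec in records:
        yield (rec["province"], rec["date"], "admissions", float(rec["admissions"]))


def load(rows):
    """Load normalized rows into a single local store (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation "
        "(province TEXT, day TEXT, variable TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)", rows)
    return conn


def monthly_rollup(conn):
    """OLAP-style roll-up: aggregate daily values to province x month x variable."""
    return conn.execute(
        "SELECT province, substr(day, 1, 7) AS month, variable, AVG(value) "
        "FROM observation "
        "GROUP BY province, month, variable "
        "ORDER BY province, month, variable"
    ).fetchall()


rows = list(extract_weather(WEATHER_CSV)) + list(extract_admissions(ADMISSIONS))
conn = load(rows)
cube = monthly_rollup(conn)
for cell in cube:
    print(cell)
```

Storing every observation as a (province, day, variable, value) row is one possible design choice: it lets new sources be added without changing the table, at the cost of pivoting when variables need to be compared side by side. A dedicated ETL tool and OLAP server would replace these hand-written functions at the data volumes the abstract mentions.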
Keyphrases
  • data analysis
  • electronic health record
  • human health
  • big data
  • healthcare
  • risk assessment
  • machine learning
  • drinking water
  • air pollution
  • health information
  • quality improvement
  • liquid chromatography