A Decentralized Architecture for Trusted Dataset Sharing Using Smart Contracts and Distributed Storage.

Miguel Pincheira, Elena Donini, Massimo Vecchio, Salil S. Kanhere
Published in: Sensors (Basel, Switzerland) (2022)
The data economy is built on data and information sharing, and it has a significant impact on society because it facilitates innovative collaborations and decision-making strategies. Nonetheless, most dataset-sharing solutions rely on a centralized authority that governs data ownership, availability, and accessibility. Recent works have explored integrating distributed storage and blockchain to enhance decentralization and data access, and using smart contracts to automate the interactions between actors and data. However, current solutions propose smart contract designs that limit the system's scalability in terms of actors and shared datasets. Furthermore, little is known about the performance of these architectures when they use distributed storage instead of centralized storage. This paper proposes a scalable architecture called DeBlock for trusted data sharing among unreliable actors. The architecture integrates a public blockchain, which provides a transparent record of datasets and interactions, with distributed storage that holds the data in a completely decentralized way. Furthermore, the architecture provides a smart-contract design for a transparent catalog of datasets, actors, and interactions with efficient search and retrieval capabilities. To assess the system's feasibility, robustness, and scalability, we implement a prototype using the Ethereum blockchain and two decentralized storage protocols, Swarm and IPFS. We evaluate the performance of our proposed system in different scenarios (e.g., varying the number and size of the shared datasets). Our results demonstrate that our proposal outperforms the benchmarks in gas consumption, latency, and resource requirements, especially when the number of actors and shared datasets increases.
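The split described in the abstract, a small transparent catalog on the blockchain and the dataset bytes in content-addressed storage, can be illustrated with a minimal in-memory sketch. This is not the DeBlock contract or its API: the class names `ContentStore` and `Catalog`, the method names, and the use of plain SHA-256 as the content address are all assumptions made for illustration (Swarm and IPFS each use their own addressing schemes, and a real catalog would be a smart contract rather than a Python object).

```python
import hashlib
from dataclasses import dataclass, field


class ContentStore:
    """Hypothetical stand-in for content-addressed storage (the role Swarm or IPFS plays)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself (SHA-256 is an assumption here).
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hash on retrieval so altered content is detected.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


@dataclass
class Catalog:
    """Hypothetical stand-in for the on-chain catalog: small records only, never the data."""
    datasets: dict = field(default_factory=dict)  # address -> (owner, title)
    by_owner: dict = field(default_factory=dict)  # owner -> [address, ...]
    log: list = field(default_factory=list)       # append-only record of interactions

    def register(self, owner: str, title: str, address: str) -> None:
        if address in self.datasets:
            raise ValueError("dataset already registered")
        self.datasets[address] = (owner, title)
        self.by_owner.setdefault(owner, []).append(address)
        self.log.append(("register", owner, address))

    def request(self, actor: str, address: str) -> str:
        if address not in self.datasets:
            raise KeyError("unknown dataset")
        self.log.append(("request", actor, address))
        return address


store = ContentStore()
catalog = Catalog()
addr = store.put(b"temperature,humidity\n21.5,40\n")
catalog.register("alice", "sensor readings", addr)
payload = store.get(catalog.request("bob", addr))
```

In this sketch the catalog keeps key-based lookups (by address and by owner) rather than scanning a list, which is one plausible way to keep search cost independent of the number of registered datasets; whether the paper's contract uses the same layout is not stated in the abstract.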
Keyphrases
  • electronic health record
  • big data
  • health information
  • social media
  • decision making
  • rna seq
  • healthcare
  • machine learning
  • data analysis
  • climate change
  • deep learning
  • single cell
  • carbon dioxide