PolyDAT: A Generic Data Schema for Polymer Characterization.
Tzyy-Shyang Lin, Nathan J Rebello, Haley K Beech, Zi Wang, Bassil El-Zaatari, David J Lundberg, Jeremiah A Johnson, Julia A Kalow, Stephen L Craig, Bradley D Olsen
Published in: Journal of Chemical Information and Modeling (2021)
Polymers are stochastic materials that represent distributions of different molecules. In general, to quantify the distribution, polymer researchers rely on a series of chemical characterizations that each reveal partial information on the distribution. However, in practice, the exact set of characterizations that are carried out, as well as how the characterization data are aggregated and reported, is largely nonstandard across the polymer community. This scenario makes polymer characterization data highly disparate, thereby significantly slowing down the development of polymer informatics. In this work, a proposal on how structural characterization data can be organized is presented. To ensure that the system can apply universally across the entire polymer community, the proposed schema, PolyDAT, is designed to embody a minimal congruent set of vocabulary that is common across different domains. Unlike most chemical schemas, where only data pertinent to the species of interest are included, PolyDAT deploys a multi-species reaction network construct, in which every characterization on relevant species is collected to provide the most comprehensive profile on the polymer species of interest. Instead of maintaining a comprehensive list of available characterization techniques, PolyDAT provides a handful of generic templates, which align closely with experimental conventions and cover most types of common characterization techniques. This allows flexibility for the development and inclusion of new measurement methods. By providing a standard format to digitalize data, PolyDAT serves not only as an extension to BigSMILES that provides the necessary quantitative information but also as a standard channel for researchers to share polymer characterization data.
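The abstract describes two structural ideas: a multi-species reaction network in which characterizations on related species are collected alongside those on the polymer of interest, and a small number of generic characterization templates rather than a fixed list of techniques. The Python sketch below is only an illustration of how such a structure might be modeled. It is not the PolyDAT format itself; every class name, field name, and method here (`Characterization`, `Species`, `ReactionNetwork`, `profile`) is a hypothetical choice made for this example, and the numeric values are placeholders, not measured data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Characterization:
    """Hypothetical generic template: one measurement reported on one species."""
    technique: str  # free-text label, so new methods need no schema change
    quantity: str   # what was measured
    value: float
    unit: str


@dataclass
class Species:
    """A node in the network: the polymer of interest or a related species."""
    name: str
    bigsmiles: str = ""  # optional structure string; left empty in this sketch
    characterizations: List[Characterization] = field(default_factory=list)


@dataclass
class ReactionNetwork:
    """Species plus directed edges (reactant -> product) connecting them."""
    species: Dict[str, Species] = field(default_factory=dict)
    reactions: List[Tuple[str, str]] = field(default_factory=list)

    def add_species(self, sp: Species) -> None:
        self.species[sp.name] = sp

    def add_reaction(self, reactant: str, product: str) -> None:
        # Both endpoints must already be registered as species.
        for name in (reactant, product):
            if name not in self.species:
                raise KeyError(f"unknown species: {name}")
        self.reactions.append((reactant, product))

    def profile(self, target: str) -> List[Tuple[str, Characterization]]:
        """Collect characterizations on the target and all upstream species."""
        seen, stack = set(), [target]
        while stack:
            name = stack.pop()
            if name in seen:
                continue
            seen.add(name)
            # Walk edges backwards: anything that reacts to form `name`.
            stack.extend(r for r, p in self.reactions if p == name)
        return [(n, c) for n in sorted(seen)
                for c in self.species[n].characterizations]


# Illustrative usage with placeholder values (not real measurements).
network = ReactionNetwork()
network.add_species(Species("monomer", characterizations=[
    Characterization("NMR", "purity", 1.0, "fraction")]))
network.add_species(Species("polymer", characterizations=[
    Characterization("SEC", "Mn", 1.0, "kg/mol")]))
network.add_species(Species("unrelated"))
network.add_reaction("monomer", "polymer")

for species_name, c in network.profile("polymer"):
    print(species_name, c.technique, c.quantity, c.value, c.unit)
```

In this sketch, asking for the profile of `polymer` also returns the measurement recorded on `monomer`, because the monomer is upstream of the polymer in the network, while `unrelated` is left out. Keeping `technique` as free text stands in for the abstract's point that the schema does not maintain a closed list of characterization methods.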