The Block Copolymer Phase Behavior Database.
Nathan J Rebello, Akash Arora, Hidenobu Mochigase, Tzyy-Shyang Lin, Jiale Shi, Debra J Audus, Eric S Muckley, Ardiana Osmani, Bradley D Olsen
Published in: Journal of Chemical Information and Modeling (2024)
The Block Copolymer Database (BCDB) is a platform that allows users to search, submit, visualize, benchmark, and download experimental phase measurements and their associated characterization information for di- and multiblock copolymers. To the best of our knowledge, there is no widely accepted data model for publishing experimental and simulation data on block copolymer self-assembly. The proposed data schema carries traceable information and can accommodate any number of blocks. At the time of publication, it contains over 5400 melt phase measurements of block copolymers, mined from the literature and manually curated, together with simulated phase diagram data points generated from self-consistent field theory, which can be rapidly augmented. The database can be accessed via the Community Resource for Innovation in Polymer Technology (CRIPT) web application and the Materials Data Facility. The chemical structure of each polymer is encoded in BigSMILES, an extension of the Simplified Molecular-Input Line-Entry System (SMILES) to the macromolecular domain, and the user can search repeat units and functional groups using the SMILES Arbitrary Target Specification (SMARTS) search syntax. The user can also query characterization and phase information using Structured Query Language (SQL) and download custom sets of block copolymer data to train machine learning models. Finally, a protocol is presented in which GPT-4, a large language model, rapidly screens block copolymer papers from the literature using only the abstract text and determines whether they contain BCDB data, allowing the database to grow as the number of published papers increases. This model achieves an F1 score of 0.74. The platform is an important step toward making polymer data more accessible to the broader community.
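To make the SQL querying described in the abstract more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `phase_measurement`, its columns, the helper `query_phase`, and every row value are hypothetical placeholders invented for illustration; they do not reflect the actual BCDB schema or any measured data.

```python
import sqlite3

# Hypothetical schema: the real BCDB tables and column names may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE phase_measurement (
        polymer_label TEXT,   -- placeholder identifier, not a real entry
        n_blocks      INTEGER,
        f_a           REAL,   -- volume fraction of block A
        temperature_c REAL,
        phase         TEXT
    )
    """
)

# Toy rows, invented purely to exercise the query (not experimental data).
toy_rows = [
    ("toy-diblock-1", 2, 0.50, 120.0, "lamellar"),
    ("toy-diblock-2", 2, 0.30, 120.0, "cylinder"),
    ("toy-diblock-3", 2, 0.52, 180.0, "disordered"),
    ("toy-triblock-1", 3, 0.48, 140.0, "lamellar"),
]
conn.executemany("INSERT INTO phase_measurement VALUES (?, ?, ?, ?, ?)", toy_rows)


def query_phase(connection, phase, n_blocks, f_min, f_max):
    """Return (label, f_a, temperature) for rows matching the given filters."""
    cursor = connection.execute(
        """
        SELECT polymer_label, f_a, temperature_c
        FROM phase_measurement
        WHERE phase = ? AND n_blocks = ? AND f_a BETWEEN ? AND ?
        ORDER BY polymer_label
        """,
        (phase, n_blocks, f_min, f_max),
    )
    return cursor.fetchall()


# Select diblock entries in the lamellar phase near a symmetric composition.
rows = query_phase(conn, "lamellar", 2, 0.4, 0.6)
print(rows)
```

A filtered result like this could then be written to a file and used as a training set, in the spirit of the custom downloads the abstract mentions. The parameterized `?` placeholders keep user-supplied filter values out of the SQL string itself.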