Introduction

Polymers are a vital class of materials, with applications ranging from everyday plastic packaging to cutting-edge technologies. However, the field faces significant challenges because monomer structures are vastly diverse, chain arrangements are complex, and synthetic processes vary widely. As a result, polymer research often relies on inefficient trial-and-error methods, which slows innovation.

The rapid advancement of machine learning (ML) technology offers a promising solution to these challenges. Although polymer informatics, which encompasses datasets, feature engineering, and ML models, is still in its infancy, ML shows great potential for uncovering complex relationships within high-dimensional data. This capability could substantially accelerate the discovery of new polymer materials.

To fully leverage ML's potential, polymer researchers should learn to integrate this tool into their work effectively. By doing so, they can:

  • Deepen the understanding of polymer science 
  • Predict the properties of polymers
  • Help to design new polymers more quickly
  • Accelerate the characterization of polymers

As the field of polymer informatics matures, it will likely revolutionize polymer science, enabling more efficient and innovative approaches to material discovery.

1. Deepen the understanding of polymer science 

Molecular dynamics (MD) simulations and density functional theory (DFT) calculations provide crucial insights that help explain experimental phenomena in polymer science. As supercomputers have grown more powerful, all-atom MD simulations have emerged as powerful tools for analyzing complex physical phenomena in polymers. However, the complexity of polymer systems poses significant challenges for computational methods. An all-atom MD model of a typical polymer system would contain billions of atoms and require billions of time steps. Computation at this scale is beyond even the most sophisticated supercomputers available today, which highlights the current limitations of these techniques.
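To make this scale concrete, the following back-of-envelope sketch estimates the wall-clock time of such a run. It takes "billions" literally as 1e9 and assumes that cost grows linearly with the number of atoms times the number of steps, which is a simplification. The throughput value is an assumed placeholder, not a benchmark of any particular MD code or machine.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def md_wallclock_years(n_atoms: float, n_steps: float,
                       atom_steps_per_second: float) -> float:
    """Estimate wall-clock years, assuming cost is linear in atoms x steps."""
    total_atom_steps = n_atoms * n_steps
    return total_atom_steps / atom_steps_per_second / SECONDS_PER_YEAR


# Assumed, illustrative inputs (not measurements).
n_atoms = 1e9              # "billions of atoms"
n_steps = 1e9              # "billions of time steps"
assumed_throughput = 1e10  # atom-steps per second, hypothetical

years = md_wallclock_years(n_atoms, n_steps, assumed_throughput)
print(f"Estimated wall-clock time: {years:.1f} years")
```

Under these assumptions a single trajectory would occupy the machine for about three years. Changing the assumed throughput by a factor of ten shifts the estimate by the same factor, so the conclusion depends strongly on that input.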

Challenges

Plastic products in real-world applications are far from being "pure polymers," as they typically contain additive chemicals. While MD and DFT simulations have demonstrated significant potential in studying pure polymer systems, their application to complex plastic formulations remains limited. MD simulations have proven valuable in investigating specific phenomena, such as polymer blends and solvent interactions within pure polymers. Yet MD simulation methods for studying plastic additives remain notably scarce. This gap makes it difficult to understand the complete behavior of commercial plastic materials.

Our solutions

In MI-6, we employ multiple computational methods (MD and DFT) to study polymer-additive interactions and polymer chain behavior. We use these methods in two main ways:

  1. 3D Data Analysis
    Feature calculation in 3D modeling
    Prediction of copolymer phase diagrams
    Usage as 3D chemical descriptors
  2. Customized MD Simulation
    Specific application to mixture systems
    Tensile deformation behavior
    Compression deformation behavior
    Molecular infrared spectra
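As one illustration of feature calculation from 3D data, the sketch below computes two standard chain-level quantities from a set of coordinates: the mass-weighted radius of gyration and the end-to-end distance. The function names and the toy coordinates are hypothetical and chosen for this example; they are not taken from the MI-6 workflow.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Coord], masses: Sequence[float]) -> float:
    """Mass-weighted radius of gyration of a single chain conformation."""
    if not coords or len(coords) != len(masses):
        raise ValueError("coords and masses must be non-empty and the same length")
    total_mass = sum(masses)
    # Center of mass, one component at a time.
    center = [
        sum(m * c[k] for c, m in zip(coords, masses)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted sum of squared distances from the center of mass.
    weighted_sq = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for c, m in zip(coords, masses)
    )
    return math.sqrt(weighted_sq / total_mass)


def end_to_end_distance(coords: Sequence[Coord]) -> float:
    """Distance between the first and last site of the chain."""
    return math.dist(coords[0], coords[-1])


# Toy conformation (made up): four equal-mass beads on a line, 1.0 apart.
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
masses = [1.0] * 4

print(radius_of_gyration(chain, masses))
print(end_to_end_distance(chain))
```

Averaging such quantities over the frames of an MD trajectory gives descriptors that depend on conformation, which 2D structure alone cannot provide.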

2. Predict the properties of polymers

The prediction and design of key physical and mechanical properties of polymeric materials based on their structural information has become an area of great interest in polymer science. One of the main approaches in this field is the application of quantitative structure-property relationship (QSPR) methods to polymers. By generating rapid predictions, these QSPR models allow researchers to explore unknown polymer systems more effectively, potentially leading to the discovery of novel polymers with desired properties. 
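In its simplest form, a QSPR model is a regression from a numerical descriptor to a property. The sketch below fits a one-descriptor linear model by ordinary least squares. The descriptor and property values are synthetic numbers made up for illustration, not measurements, and a practical QSPR model would typically use many descriptors and a held-out test set.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(x_new: float, slope: float, intercept: float) -> float:
    """Apply the fitted linear model to a new descriptor value."""
    return slope * x_new + intercept


# Synthetic data (hypothetical descriptor, property in arbitrary units).
descriptor = [0.10, 0.25, 0.40, 0.55, 0.70]
target = [1.2, 1.5, 1.8, 2.1, 2.4]

slope, intercept = fit_line(descriptor, target)
print(slope, intercept)
print(predict(0.85, slope, intercept))
```

Once fitted, the model returns a prediction for an unseen structure almost instantly, which is what makes rapid screening of unknown polymer systems possible.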

Challenges

Despite the promise of ML in polymer science, several challenges persist. The primary difficulty in QSPR-related studies of polymeric materials lies in the encoding of chemical structures. Polymers typically feature long chain lengths and exhibit polydispersity, making a complete characterization of their structure challenging. This complexity poses obstacles in translating polymer structures into accurate molecular graphs, which are essential for calculating the features used to describe these materials in ML models.
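Polydispersity means that a sample is a distribution of chain lengths rather than a single molecule. The sketch below computes the standard number-average molar mass (Mn), weight-average molar mass (Mw), and dispersity (Mw/Mn) for a discrete distribution. The function name and the toy distribution are hypothetical, and end-group masses are ignored for simplicity.

```python
from typing import Dict, Tuple


def molar_mass_averages(chain_counts: Dict[int, int],
                        repeat_unit_mass: float) -> Tuple[float, float, float]:
    """Return (Mn, Mw, dispersity) for a discrete chain-length distribution.

    chain_counts maps degree of polymerization -> number of chains.
    End groups are ignored, so chain mass = degree * repeat_unit_mass.
    """
    n_chains = sum(chain_counts.values())
    # First and second moments of the chain mass distribution.
    m1 = sum(n * dp * repeat_unit_mass for dp, n in chain_counts.items())
    m2 = sum(n * (dp * repeat_unit_mass) ** 2 for dp, n in chain_counts.items())
    mn = m1 / n_chains
    mw = m2 / m1
    return mn, mw, mw / mn


# Toy distribution (made up): one chain of 100 units and one of 200 units,
# with an ethylene repeat unit of about 28.05 g/mol.
mn, mw, dispersity = molar_mass_averages({100: 1, 200: 1}, 28.05)
print(mn, mw, dispersity)
```

Even this two-chain sample has a dispersity above 1, which shows why a single molecular graph of one chain cannot fully describe a real polymer sample. Descriptors are therefore often computed on the repeat unit and supplemented with distribution-level quantities such as these.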

Our solutions

To address these challenges, researchers are actively developing novel molecular descriptors that better capture the structural complexity of polymers. We emphasize integrating domain knowledge from polymer physics and chemistry into ML models to improve their accuracy and interpretability.

3. Help to design new polymers more quickly

In polymer design, significant progress has been made in developing generative models that create novel monomer structures. Most molecular generative models are string-based and use SMILES (Simplified Molecular Input Line Entry System) notation. These models include recurrent neural networks (RNNs), BERT (Bidirectional Encoder Representations from Transformers), and variational autoencoders (VAEs). However, macromolecules such as polymers have complex chemical structures that string-based models struggle to represent. Graph-based decoders have been developed to overcome these limitations.
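String-based models first split a SMILES string into tokens. The sketch below shows one way to do this with a regular expression. The pattern is a simplified, hypothetical one that covers only common organic SMILES syntax, and the use of `*` to mark the attachment points of a repeat unit is a common convention assumed here rather than something specified in the text above.

```python
import re
from typing import List

# Simplified token pattern: bracket atoms, two-letter halogens, two-digit
# ring closures, single-letter atoms, bond/branch symbols, and ring digits.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFI]|[bcnops]|[=#$/\\+\-().:~*]|\d)"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; raise if any character is skipped."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


# Polystyrene repeat unit with '*' as assumed attachment points.
print(tokenize_smiles("*CC(*)c1ccccc1"))
```

The resulting token sequence is what an RNN or transformer consumes and emits one element at a time. Because ring closures and branches are encoded as paired symbols that may sit far apart in the string, large and heavily branched structures become harder for such models to generate validly.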

Challenges

Generating an extensive pool of candidate molecules is essential to ensure the inclusion of desirable compounds. However, searching through an overly large candidate space can become prohibitively time-consuming. To address this challenge, optimization techniques play a crucial role in efficiently identifying high-scoring candidates with desired properties.

Our solutions

We use a curiosity-driven learning method that integrates generative models to discover novel monomer candidates that are underrepresented in existing datasets. This approach has proven valuable in identifying new monomers that combine high degradability with strong mechanical properties.
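One generic way to favor underrepresented candidates is to add a novelty bonus to the predicted property score. The sketch below illustrates that idea only; it is not the curiosity-driven method described above. Fingerprints are represented as plain sets of feature labels, and all names, scores, and the weight are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

Fingerprint = FrozenSet[str]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def novelty(candidate: Fingerprint, known: Sequence[Fingerprint]) -> float:
    """One minus the highest similarity to any known structure."""
    return 1.0 - max((tanimoto(candidate, k) for k in known), default=0.0)


def rank_candidates(candidates: Sequence[Tuple[str, Fingerprint, float]],
                    known: Sequence[Fingerprint],
                    novelty_weight: float = 0.5) -> List[Tuple[str, float]]:
    """Sort candidates by predicted property plus a weighted novelty bonus."""
    scored = [
        (name, prop + novelty_weight * novelty(fp, known))
        for name, fp, prop in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up example: one known structure and two candidates.
known = [frozenset({"ester", "aromatic"})]
candidates = [
    ("A", frozenset({"ester", "aromatic"}), 0.8),       # familiar, high score
    ("B", frozenset({"carbonate", "aliphatic"}), 0.6),  # novel, lower score
]
print(rank_candidates(candidates, known, novelty_weight=0.5))
```

With the bonus switched off, the familiar candidate A ranks first. With a weight of 0.5, the novel candidate B overtakes it, which is the behavior a search needs in order to move into sparsely sampled regions of chemical space.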

4. Accelerate the characterization of polymers

Polymer structures exhibit inherent complexity, requiring sophisticated characterization methods such as mass spectrometry (MS), nuclear magnetic resonance (NMR), and infrared spectroscopy (IR) to determine their composition and structure. These insights significantly accelerate research and development processes while enhancing our understanding and optimization of polymer production. For example, accurate predictions of melt viscosity enable better control of extrusion processes, ensuring consistent, high-quality polymer production. In recent years, machine learning methods have emerged as powerful tools for expediting the analysis of these complex characterization data, offering new possibilities for polymer science advancement.

Challenges

The interpretation of complex spectra requires substantial domain expertise, with applications like IR spectroscopy being limited to functional groups that produce identifiable signals. Traditionally, researchers have relied on manual peak detection processes, which are both time-consuming and demand significant expertise. This limitation has highlighted the pressing need for automated peak detection tools to enhance spectral analysis efficiency. However, as peak detection capabilities have improved, a new challenge has emerged: the need to effectively analyze and interpret the extensive peak data generated from multiple spectra. 

Our solutions

To address this need in spectral data analysis, automated systems for peak detection, peak grouping, and IR-to-structure prediction have been developed. These systems use ML techniques to analyze spectral peaks accurately and efficiently across diverse datasets.
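The two non-ML building blocks, peak detection and peak grouping, can be sketched in a few lines. The version below is a minimal rule-based illustration under simple assumptions (strict local maxima above a height threshold, grouping by a fixed tolerance); it is not the ML-based system described above. The spectra are synthetic values made up for this example.

```python
from typing import List, Sequence


def find_peaks(x: Sequence[float], y: Sequence[float],
               min_height: float) -> List[float]:
    """Return x positions of strict local maxima with y >= min_height."""
    peaks = []
    for i in range(1, len(y) - 1):
        if y[i] >= min_height and y[i] > y[i - 1] and y[i] > y[i + 1]:
            peaks.append(x[i])
    return peaks


def group_peaks(positions: Sequence[float], tolerance: float) -> List[List[float]]:
    """Group sorted positions whose neighbors differ by at most tolerance."""
    groups: List[List[float]] = []
    for p in sorted(positions):
        if groups and p - groups[-1][-1] <= tolerance:
            groups[-1].append(p)
        else:
            groups.append([p])
    return groups


# Synthetic spectra on a shared wavenumber axis (cm^-1), made-up intensities.
wavenumber = [1700, 1710, 1720, 1730, 1740, 1750]
spectrum_a = [0.1, 0.2, 0.9, 0.3, 0.1, 0.1]
spectrum_b = [0.1, 0.1, 0.3, 0.8, 0.2, 0.1]

all_peaks = (find_peaks(wavenumber, spectrum_a, min_height=0.5)
             + find_peaks(wavenumber, spectrum_b, min_height=0.5))
print(group_peaks(all_peaks, tolerance=15))
```

Grouping places slightly shifted peaks from different spectra in the same bin, so that they can be compared as one feature across samples. For real data, an established routine such as `scipy.signal.find_peaks`, which supports height and prominence criteria, is likely a better starting point than this hand-written loop.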

References

(1) Everaers, R.; Karimi-Varzaneh, H. A.; Fleck, F.; Hojdis, N.; Svaneborg, C. Kremer–Grest Models for Commodity Polymer Melts: Linking Theory, Experiment, and Simulation at the Kuhn Scale. Macromolecules 2020, 53 (6), 1901–1916. https://doi.org/10.1021/acs.macromol.9b02428.
(2) Li, B.; Wang, Z.-W.; Lin, Q.-B.; Hu, C.-Y. Molecular Dynamics Simulation of Three Plastic Additives’ Diffusion in Polyethylene Terephthalate. Food Additives & Contaminants Part A 2017, 34 (6), 1086–1099. https://doi.org/10.1080/19440049.2017.1310398.
(3) Hayashi, Y.; Shiomi, J.; Morikawa, J.; Yoshida, R. RadonPy: Automated Physical Property Calculation Using All-Atom Classical Molecular Dynamics Simulations for Polymer Informatics. npj Computational Materials 2022, 8 (1). https://doi.org/10.1038/s41524-022-00906-4.
(4) Rasulev, B.; Casanola-Martin, G. QSAR/QSPR in Polymers. International Journal of Quantitative Structure-Property Relationships 2020, 5 (1), 80–88. https://doi.org/10.4018/ijqspr.2020010105.
(5) Jin, W.; Barzilay, R.; Jaakkola, T. Junction Tree Variational Autoencoder for Molecular Graph Generation. arXiv 2018, arXiv:1802.04364. https://doi.org/10.48550/arXiv.1802.04364.
(6) Yang, X.; Zhang, J.; Yoshizoe, K.; Terayama, K.; Tsuda, K. ChemTS: An Efficient Python Library for de Novo Molecular Generation. Science and Technology of Advanced Materials 2017, 18 (1), 972–976. https://doi.org/10.1080/14686996.2017.1401424.
(7) Jin, W.; Barzilay, R.; Jaakkola, T. Hierarchical Generation of Molecular Graphs Using Structural Motifs. arXiv 2020, arXiv:2002.03230. https://doi.org/10.48550/arXiv.2002.03230.
(8) Sha, W.; Li, Y.; Tang, S.; Tian, J.; Zhao, Y.; Guo, Y.; Zhang, W.; Zhang, X.; Lu, S.; Cao, Y.; Cheng, S. Machine Learning in Polymer Informatics. InfoMat 2021, 3 (4), 353–361. https://doi.org/10.1002/inf2.12167.
(9) Cuthbertson, A. A.; Lincoln, C.; Miscall, J.; Stanley, L. M.; Maurya, A. K.; Asundi, A. S.; Tassone, C. J.; Rorrer, N. A.; Beckham, G. T. Characterization of Polymer Properties and Identification of Additives in Commercially Available Research Plastics. Green Chemistry 2024. https://doi.org/10.1039/D4GC00659C.