AI-driven Elucidation of Structure–Property Relationships in Polymers

Polymers are ubiquitous in modern life, with applications ranging from plastics and textiles to advanced materials for electronics and medicine. Their versatility arises from the diverse physical, chemical, and electrical properties they possess. These properties are determined by a complex interplay of factors, including the chemical composition of the constituent monomers, the architecture of the polymer chains, and the synthesis methods employed. This complexity offers immense possibilities for tailoring polymers to specific applications, but it also makes polymer design and selection difficult.

In recent years, rapid advances in computing power and artificial intelligence (AI) algorithms, particularly machine learning (ML), have opened up new avenues for tackling these challenges. ML techniques have proven highly effective in classification and regression tasks, enabling researchers to uncover intricate relationships between a polymer's structure and its properties. In the context of polymer design, ML models can be trained to predict polymer properties from a numerical representation of the polymer. These trained models can then be used to guide the development of new polymers with desired characteristics.

Polymer Descriptors: The Key to Unlocking ML's Potential

The numerical representation of a polymer, known as a polymer descriptor, is the crucial link between a polymer's structure and its use in ML models. Polymer descriptors aim to capture the essential structural information that determines a polymer's properties. The quality and relevance of these descriptors directly impact the accuracy and reliability of ML predictions. Therefore, the selection and development of appropriate polymer descriptors are important for the success of ML-driven polymer design.

Characteristic structural information is extracted from polymer molecular structures and converted into numerical data. Machine learning then analyses and predicts the relationship between the generated polymer descriptors and the target physical properties. By specifying desired property targets, inverse design methods identify optimal molecular structures that achieve these desired properties.

Fig.1 Overview of how polymer descriptors and machine learning are used

By combining polymer descriptors—generated from polymer structures—with machine learning techniques, it becomes possible to link complex structures and their properties via predictive models. Polymer informatics thus holds promise for inverse-designing polymer structures with specific desired physical properties using these models (Figure 1).
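The predict-then-search loop described above can be sketched in a few lines of Python. This is only a minimal illustration and not a method taken from this article: it assumes a single descriptor and a linear model, the function names `fit_line` and `screen_candidates` are hypothetical, and every number is a synthetic placeholder (the training data follow y = 2x + 1 exactly) rather than measured polymer data.

```python
from typing import Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (one descriptor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def screen_candidates(
    candidates: Dict[str, float],
    model: Tuple[float, float],
    target: float,
    top_k: int = 1,
) -> List[str]:
    """Rank candidates by how close their predicted property is to the target."""
    slope, intercept = model
    ranked = sorted(
        candidates.items(),
        key=lambda item: abs(slope * item[1] + intercept - target),
    )
    return [name for name, _ in ranked[:top_k]]


# Synthetic placeholder data: descriptor values and a "property" with y = 2x + 1.
train_descriptors = [1.0, 2.0, 3.0, 4.0]
train_properties = [3.0, 5.0, 7.0, 9.0]
model = fit_line(train_descriptors, train_properties)

# Hypothetical candidate structures, each reduced to one descriptor value.
candidates = {"candidate_A": 0.5, "candidate_B": 2.5, "candidate_C": 5.0}
best = screen_candidates(candidates, model, target=6.0)
print(best)
```

Screening a fixed candidate list is the simplest form of inverse design. Generative approaches propose new structures instead of ranking existing ones, but they rely on the same forward model from descriptors to properties.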

Polymer descriptors can be broadly classified into two main categories:

  • Monomer-level descriptors: These descriptors focus on the characteristics of individual monomers, such as the number of carbon atoms in the backbone, molecular weight, ring or linear structure, and the presence of functional groups. Monomer-level descriptors provide detailed information about the building blocks of the polymer.
  • Polymer-level descriptors: These descriptors capture larger-scale features of the polymer, such as overall physical and chemical properties. Polymer-level descriptors provide a more holistic view of the polymer structure.
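As a minimal sketch of the monomer-level category, the snippet below derives three simple descriptors from the elemental composition of a repeat unit. The function name `monomer_descriptors` and the choice of descriptors are illustrative assumptions. It counts all carbon atoms rather than only backbone carbons, because composition alone carries no connectivity. In practice a cheminformatics toolkit would be used to compute structure-aware descriptors.

```python
from typing import Dict

# Standard atomic masses in g/mol (rounded) for the few elements used here.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "Cl": 35.45}


def monomer_descriptors(composition: Dict[str, int]) -> Dict[str, float]:
    """Compute simple monomer-level descriptors from a repeat-unit composition."""
    molar_mass = sum(ATOMIC_MASS[el] * n for el, n in composition.items())
    heavy_atoms = sum(n for el, n in composition.items() if el != "H")
    hetero_atoms = sum(
        n for el, n in composition.items() if el not in ("H", "C")
    )
    return {
        "molar_mass": molar_mass,
        "n_carbon": float(composition.get("C", 0)),
        # Fraction of non-hydrogen atoms that are not carbon.
        "heteroatom_fraction": hetero_atoms / heavy_atoms if heavy_atoms else 0.0,
    }


# Repeat units of polyethylene (C2H4) and poly(vinyl chloride) (C2H3Cl).
pe = monomer_descriptors({"C": 2, "H": 4})
pvc = monomer_descriptors({"C": 2, "H": 3, "Cl": 1})
print(pe)
print(pvc)
```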

Monomer-Level Descriptors

Molecular descriptors, which are also applicable to monomers, are essential tools for converting complex chemical information into numerical values that computers can process. For details, please refer to previous articles.

However, polymers present a unique challenge. While small molecules have well-defined structures, polymers exhibit structural variability and polydispersity (a distribution of chain lengths). Therefore, instead of a single complete structure, a "pseudopolymer structure" (a simplified representation built from the polymer's monomeric units) is used as input for descriptor calculations. As illustrated in Figure 2, these pseudopolymer structures enable the calculation of relevant molecular descriptors.

Polymer informatics involves generating polymer units, known as pseudo-polymers, from monomer structures, converting them into machine-readable representations (such as SMILES strings or fingerprints), and finally inputting these into machine learning models. This enables polymer property prediction and novel material design.
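The pipeline above can be illustrated with a deliberately simplified sketch. It assumes the repeat unit is written as a linear SMILES string with `*` marking the two attachment points at the very start and end (for example `*CC(C)*`). The helper names `build_pseudo_polymer` and `hashed_ngram_fingerprint` are hypothetical. The hashed character n-gram vector is only a toy stand-in for a real structural fingerprint such as a Morgan fingerprint: it sees text patterns, not chemistry.

```python
import zlib
from typing import List


def build_pseudo_polymer(repeat_unit: str, n_units: int) -> str:
    """Join n copies of a '*...*' repeat unit into a linear oligomer SMILES."""
    if len(repeat_unit) < 3 or not (
        repeat_unit.startswith("*") and repeat_unit.endswith("*")
    ):
        raise ValueError("repeat unit must look like '*...*'")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    core = repeat_unit[1:-1]  # drop the two attachment-point markers
    return core * n_units     # chain ends are implicitly capped with hydrogen


def hashed_ngram_fingerprint(smiles: str, n: int = 2, n_bits: int = 16) -> List[int]:
    """Toy binary fingerprint: hash each character n-gram onto one of n_bits."""
    bits = [0] * n_bits
    for i in range(len(smiles) - n + 1):
        fragment = smiles[i:i + n]
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        bits[zlib.crc32(fragment.encode("utf-8")) % n_bits] = 1
    return bits


# Polypropylene-like repeat unit, expanded to a three-unit pseudo-polymer.
oligomer = build_pseudo_polymer("*CC(C)*", 3)
fingerprint = hashed_ngram_fingerprint(oligomer)
print(oligomer)
print(fingerprint)
```

The resulting fixed-length vector is the kind of input a regression or classification model expects. A production workflow would replace both helpers with a cheminformatics toolkit that parses the structure properly.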

Fig.2 Monomer-level descriptors using pseudopolymer structures

Polymer-Level Descriptors

Polymer properties depend significantly on structural features that are challenging to quantify numerically.  Incorporating polymer-level physicochemical descriptors can greatly improve the predictive accuracy of machine learning models.  Techniques like molecular dynamics (MD) simulations and density functional theory (DFT) provide features at various scales, enhancing both the accuracy and interpretability of ML-based physical property prediction.  Figure 3 illustrates an example of using DFT and MD simulations to derive such polymer properties.

Methods to derive polymer properties using DFT and MD as polymer descriptors. Optical and electrical properties are obtained from pseudo-polymer structures through DFT calculations. After setting MD force fields, the polymer chains' 3D structures are generated and simulated using MD to obtain thermal and mechanical properties.

Fig.3 An example of polymer-level descriptors derived from DFT and MD simulations

(1) Quantum chemical descriptors derived from DFT calculations
Starting from pseudo-polymers represented by SMILES, optical and electrical properties are calculated using Density Functional Theory (DFT) after conformation searches and related procedures. These properties can then serve as polymer-level descriptors.

(2) Descriptors based on MD force-field parameters
Next, polymer chains are generated, and their initial three-dimensional structures are prepared. Force fields (interaction potentials) used in Molecular Dynamics (MD) simulations are established at this stage. Force-field parameters—including bonded terms such as bond lengths and angles, as well as non-bonded terms such as van der Waals and electrostatic interactions—can themselves be employed as descriptors. These parameters have clearly defined physical meanings, offering high interpretability, and are reflective of microscopic polymer characteristics.

(3) Descriptors of macroscopic physical properties obtained from MD simulations
Subsequently, polymer cells are constructed and subjected to MD simulations. Various physical properties, such as thermal conductivity, specific heat capacity, and bulk modulus, can be extracted directly from the simulation trajectories.
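The article does not state how each property is computed from the simulation. As one standard route (an assumption here, not necessarily the procedure behind Figure 3), the isothermal bulk modulus can be estimated from volume fluctuations in a constant-pressure, constant-temperature (NPT) run using K = k_B T ⟨V⟩ / (⟨V²⟩ − ⟨V⟩²). The sketch below applies that relation to a list of sampled cell volumes. The function name and the two-sample volume series are hypothetical placeholders.

```python
from statistics import mean, pvariance
from typing import Sequence

K_B = 1.380649e-23  # Boltzmann constant in J/K


def bulk_modulus_from_volumes(volumes_m3: Sequence[float], temperature_k: float) -> float:
    """Isothermal bulk modulus (Pa) from NPT volume fluctuations.

    K = k_B * T * <V> / Var(V), with volumes given in cubic metres.
    """
    v_mean = mean(volumes_m3)
    v_var = pvariance(volumes_m3, mu=v_mean)
    if v_var == 0:
        raise ValueError("volume series has no fluctuations")
    return K_B * temperature_k * v_mean / v_var


# Synthetic placeholder series; a real estimate needs many equilibrated samples.
volumes = [0.99e-25, 1.01e-25]
modulus_pa = bulk_modulus_from_volumes(volumes, temperature_k=300.0)
print(f"{modulus_pa:.3e} Pa")
```

Fluctuation-based estimates converge slowly, so in practice they are averaged over long trajectories after discarding the equilibration phase.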

By using descriptors obtained at multiple scales from polymer simulations as inputs to machine-learning-based property prediction models, it becomes possible not only to improve the accuracy of property predictions but also to enhance the interpretability of the results. Consequently, this approach is expected to yield new insights into the relationship between molecular behavior and macroscopic properties.
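At the microscopic end of this range, the force-field parameters from step (2) have to be turned into a vector of fixed length before a model can use them, because chains of different sizes carry different numbers of parameters. One simple option, shown here as an assumption and not as the scheme used by any particular tool, is to summarize each parameter type with a few statistics. The parameter names and values below are hypothetical placeholders.

```python
from statistics import mean, pstdev
from typing import Dict, List


def summarize(values: List[float]) -> List[float]:
    """Fixed-length summary of one parameter type: min, max, mean, std."""
    return [min(values), max(values), mean(values), pstdev(values)]


def force_field_descriptor(params: Dict[str, List[float]]) -> List[float]:
    """Concatenate per-type summaries in sorted key order for a stable layout."""
    vector: List[float] = []
    for name in sorted(params):
        vector.extend(summarize(params[name]))
    return vector


# Hypothetical per-atom parameters (placeholder values, arbitrary units).
example_params = {
    "charge": [-0.1, -0.1, 0.1, 0.1],
    "epsilon": [0.10, 0.10, 0.02, 0.02],
    "sigma": [3.4, 3.4, 2.6, 2.6],
}
descriptor = force_field_descriptor(example_params)
print(len(descriptor), descriptor)
```

Because each entry maps back to a named physical parameter, a model built on such a vector stays easier to interpret than one built on an opaque fingerprint.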

The Challenges and Ongoing Research

Despite the significant progress made in using ML for polymer design, several challenges remain. The vastness of chemical space, the complex relationship between polymer structure and properties, and the difficulty in accurately representing polymer structures numerically are all factors that complicate the application of ML in this field.

Ongoing research efforts are focused on developing new and more effective polymer descriptors, exploring novel ML algorithms, and creating larger and more comprehensive polymer databases. These efforts hold great promise for accelerating the development of new and improved polymers for a wide range of applications.

Here are some additional points to consider:

  • Descriptor Selection: Choosing the right descriptors is crucial. Researchers often generate a long list of potential descriptors and then use feature selection techniques to identify the most relevant ones.
  • Data Quality: The accuracy of ML models heavily depends on the quality of the data used for training. Polymer databases need to be accurate, comprehensive, and well-curated.
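  • Interpretability: While ML models can make accurate predictions, understanding why a particular polymer has certain properties is also important. Descriptors with a clear physical meaning, such as the force-field parameters discussed above, make this easier.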
  • Integration with Experiments: ML can be used to guide experiments, and experimental data can be used to refine ML models. This iterative process can significantly accelerate the pace of polymer research.
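The descriptor-selection step in the list above can be sketched with a simple filter method: rank every candidate descriptor by the absolute Pearson correlation with the target property and keep the top few. This is one basic technique among many, chosen here for brevity. The function names and the descriptor table are hypothetical, and all values are synthetic.

```python
from math import sqrt
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def rank_descriptors(
    table: Dict[str, List[float]], target: List[float], top_k: int
) -> List[str]:
    """Return the top_k descriptor names by absolute correlation with the target."""
    scores = {name: abs(pearson(column, target)) for name, column in table.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:top_k]


# Synthetic placeholder table: one column per descriptor, one row per polymer.
target_property = [1.0, 2.0, 3.0, 4.0, 5.0]
descriptor_table = {
    "descriptor_a": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated
    "descriptor_b": [5.0, 5.0, 5.0, 5.0, 5.0],   # constant, carries no signal
    "descriptor_c": [1.0, 3.0, 2.0, 5.0, 4.0],   # partially correlated
}
selected = rank_descriptors(descriptor_table, target_property, top_k=2)
print(selected)
```

A correlation filter only captures linear, one-at-a-time relationships. Descriptors that matter only in combination need wrapper or embedded selection methods instead.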

By addressing these challenges and continuing to push the boundaries of both descriptor development and ML applications, the field of polymer science is poised for significant breakthroughs. ML-driven polymer design has the potential to revolutionize the way we create and utilize these essential materials, leading to more sustainable, efficient, and high-performing polymers for a wide range of applications.

This article provided only a general overview of polymer descriptors. In future articles, miLab will offer more in-depth technical introductions, so please stay tuned.

References

  • Zhao, Y.; Mulder, R. J.; Houshyar, S.; Le, T. C. A Review on the Application of Molecular Descriptors and Machine Learning in Polymer Design. Polymer Chemistry 2023, 14 (29), 3325–3346. https://doi.org/10.1039/d3py00395g.
  • Gurnani, R.; Kamal, D.; Tran, H.; Sahu, H.; Scharm, K.; Ashraf, U.; Ramprasad, R. PolyG2G: A Novel Machine Learning Algorithm Applied to the Generative Design of Polymer Dielectrics. Chemistry of Materials 2021, 33 (17), 7008–7016. https://doi.org/10.1021/acs.chemmater.1c02061.
  • Hayashi, Y.; Shiomi, J.; Morikawa, J.; Yoshida, R. RadonPy: Automated Physical Property Calculation Using All-Atom Classical Molecular Dynamics Simulations for Polymer Informatics. npj Computational Materials 2022, 8 (1). https://doi.org/10.1038/s41524-022-00906-4.