Optimization Algorithms Inspired by Nature
Nature has always been a source of inspiration for humanity in solving complex problems innovatively. Designs inspired by birds have led to advances in aircraft, and those inspired by fish have influenced marine engineering. Nature’s designs often provide simple yet effective solutions to human challenges.
Genetic Algorithms (GAs) are a branch of evolutionary algorithms that draw inspiration from the powerful optimization process of evolution in nature. In evolution, species adapt and improve over generations, and the fittest individuals pass on their traits to the next generation. GAs mimic this natural selection mechanism to search for optimal solutions.
In the fields of chemistry and materials science, GAs have been applied to optimize molecular structures, design new materials, and develop catalysts, among other uses. They are also frequently employed for optimizing hyperparameters in machine learning algorithms and model parameters in computational chemistry simulations—benefiting scientists even if they are not directly aware of it.
This article primarily introduces the basic elements and characteristics of Genetic Algorithms and touches on their applicability in chemistry and materials science.
Problems Solved by Genetic Algorithms
There are many problems for which traditional methods struggle to find a solution. For example:
- Problems with a large number of parameters where exploring the entire search space is difficult.
- Problems that are analytically intractable.
- Problems where the search space is discontinuous and prone to trapping in local optima.
In such cases, Genetic Algorithms are effectively used because:
(1) They handle a large number of candidate solutions simultaneously, efficiently covering the entire search space.
(2) When analytical solutions cannot be obtained, candidate solutions are progressively improved through adaptive operations to obtain an approximate optimum.
(3) In rugged search spaces, the random elements of mutation and crossover allow the algorithm to escape local optima.
The following sections explain, through basic concepts and procedures, how GAs achieve these capabilities.
Overview of Genetic Algorithms
Genetic Algorithms (GAs) come in various forms, but this article focuses on their common core components. After briefly introducing the main concepts, the computational procedure is explained visually.
Key Concepts
- Initialization:
Before starting a GA, you determine the problem settings and necessary parameters (population size, crossover rate, mutation rate, stopping criteria, etc.), then generate the initial population.
- Population:
This is the set of candidate solutions for the optimization problem. The initial population is generated using statistical methods or domain knowledge.
- Individual:
Each candidate solution within the population is called an individual. The variables of each individual are referred to as genes. For example, in Figures 1 and 2, each individual consists of six genes (six variables) with values ranging from 0 to 4.
- Fitness Function:
A function that quantifies the "goodness" of a solution for the optimization problem. For example, in a maximization problem, a solution with a higher objective function value is considered to have higher fitness and is closer to the optimum.
- Selection:
This is the process of choosing individuals for generating offspring in the next generation. For example, selecting based on fitness increases the likelihood that individuals with high fitness (the elite) are chosen. However, to avoid a loss of diversity, techniques like roulette wheel selection or tournament selection are used.
- Genetic Operations:
To generate diverse yet adaptive offspring for the next generation, the following main operations are applied (a minimal code sketch of both appears after the figure captions below):
Crossover: Combines the genes of two parent individuals to produce new offspring (see Figure 1).
Mutation: Introduces random changes to an individual's genes to generate new candidate solutions (see Figure 2).
- Generation:
One cycle of genetic operations (selection, crossover, mutation) forms one generation. With each generation, a new population of candidate solutions is formed, and the process is repeated to find better solutions.
- Convergence:
When there is no significant improvement over a certain number of generations, the candidate solutions stabilize, and the process is considered to have converged. The individual with the highest fitness is then adopted as the optimal solution.
Fig.1 Example of the "Crossover" Operation
Fig.2 Example of the "Mutation" Operation
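To make these two operations concrete, below is a minimal Python sketch matching the representation in Figures 1 and 2 (six integer genes, each between 0 and 4). The one-point crossover and random-reset mutation shown here are common illustrative choices rather than the only options; the function names and the mutation rate are assumptions for this example.

```python
import random

GENE_MIN, GENE_MAX, N_GENES = 0, 4, 6  # matches the six-gene, 0-4 example in Figs. 1-2

def crossover(parent_a, parent_b):
    """One-point crossover: swap the tails of two parents at a random cut point."""
    point = random.randint(1, N_GENES - 1)
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(individual, rate=0.1):
    """Random-reset mutation: each gene is redrawn with probability `rate`."""
    return [random.randint(GENE_MIN, GENE_MAX) if random.random() < rate else gene
            for gene in individual]

parent_a = [random.randint(GENE_MIN, GENE_MAX) for _ in range(N_GENES)]
parent_b = [random.randint(GENE_MIN, GENE_MAX) for _ in range(N_GENES)]
print(crossover(parent_a, parent_b))
print(mutate(parent_a))
```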
Optimization Process
Using the concepts above, the optimization process of a Genetic Algorithm is visualized in Figure 3; a compact code sketch of the full loop follows the figure caption. The process is as follows:
1. Initialization: Set up the problem and generate the initial population.
2. Initial Population Selection: Select the initial candidate solutions from the available search space.
3. Evaluation of the Initial Population: Compute the fitness for each individual in the initial population.
4. Selection: Choose individuals based on their fitness.
5. Crossover: Combine the genes of selected individuals to create offspring.
6. Mutation: Apply random changes to the offspring's genes.
7. Evaluation of the New Generation: Compute the fitness for the newly generated individuals.
8. Generation Replacement: Replace the old population with the new one; steps 4 through 7 are repeated for each generation.
9. Convergence: When the population shows no significant improvement over a set number of generations, the process is considered to have converged.
Fig.3 The Basic Procedure of a Genetic Algorithm
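As a complement to Figure 3, here is a compact, self-contained sketch of the whole loop, reusing the toy integer representation from above. Tournament selection, the sum-of-genes objective, and all the rates are illustrative assumptions; the objective merely stands in for whatever problem-specific evaluation a real application would use.

```python
import random

GENE_MIN, GENE_MAX, N_GENES = 0, 4, 6
POP_SIZE, CX_RATE, MUT_RATE, PATIENCE = 30, 0.9, 0.05, 20

def fitness(ind):
    return sum(ind)  # toy objective: stands in for a problem-specific evaluation

def tournament(pop, k=3):
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    p = random.randint(1, N_GENES - 1)
    return a[:p] + b[p:], b[:p] + a[p:]

def mutate(ind):
    return [random.randint(GENE_MIN, GENE_MAX) if random.random() < MUT_RATE else g
            for g in ind]

pop = [[random.randint(GENE_MIN, GENE_MAX) for _ in range(N_GENES)]
       for _ in range(POP_SIZE)]                      # steps 1-2: initial population
best, stall = max(pop, key=fitness), 0                # step 3: initial evaluation

while stall < PATIENCE:                               # step 9: convergence check
    nxt = []
    while len(nxt) < POP_SIZE:
        a, b = tournament(pop), tournament(pop)       # step 4: selection
        if random.random() < CX_RATE:
            a, b = crossover(a, b)                    # step 5: crossover
        nxt += [mutate(a), mutate(b)]                 # step 6: mutation
    pop = nxt[:POP_SIZE]                              # step 8: generation replacement
    gen_best = max(pop, key=fitness)                  # step 7: evaluation
    if fitness(gen_best) > fitness(best):
        best, stall = gen_best, 0
    else:
        stall += 1

print(best, fitness(best))
```

Note that steps 4 through 7 all happen inside each pass of the outer loop, mirroring the repetition described in step 8.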
In this way, Genetic Algorithms effectively leverage high-scoring local optima obtained through genetic operations while simultaneously exploring for the overall optimum (global optimum). The process—where selection and crossover refine good solutions and mutation enables exploration into uncharted regions—is visually demonstrated in Video 1.
Video 1 illustrates how the initial points are set in a localized region (the bright area at the bottom left) and gradually expand outward toward the global optimum (the dark area at the center). During this process, elite individuals are selected through genetic operations, and mutation drives exploration beyond the current boundaries, enabling parallel pursuit of optimal solutions in each generation.
How does the algorithm's behavior change when its parameters are adjusted? For instance, increasing the mutation rate causes each individual's genes to change more frequently, making it easier to escape local optima. However, this also destabilizes the search for good solutions, and the exploration may become so broad that achieving sufficient precision near the optimum becomes difficult (Video 2).
Video 2 shows that although a population initially spreads rapidly from a localized area and reaches near the global optimum, it fails to converge and instead wanders across the entire field. This visual demonstration highlights how parameter settings can significantly alter the search strategy of a GA.
Next, we summarize the characteristics of Genetic Algorithms that give rise to such behaviors. The discussion that follows is slightly more technical.
Characteristics of Genetic Algorithms
1. Derivative-Free Optimization
Traditional optimization methods (e.g., gradient descent) rely on derivative information—that is, the slope or rate of change of the function. However, many real-world problems involve objective functions that are not smooth or continuous, making derivatives difficult or even meaningless to compute. A major strength of Genetic Algorithms (GAs) is that they require no derivative information, relying solely on evaluation values (fitness) to guide the search for optimal solutions; a small code sketch after the following list illustrates this.
- Flexible Application: Even when functions are discontinuous or contain noise, decisions for the next generation are made solely based on evaluation outcomes, enabling GA to handle a wide range of problems.
- Black-Box Optimization: When the internal structure of the objective function is unknown, as is common in experimental or simulation data, GAs can still drive the search using only numerical evaluations.
- Escaping Local Optima: By evaluating a population of diverse candidate solutions in parallel, GAs are more likely to overcome local optima.
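To illustrate the black-box point concretely, the sketch below defines a discontinuous, noisy objective that would defeat gradient-based methods; the function itself is an invented stand-in for experimental or simulation output. Because the GA loop above only compares returned scores, it can be dropped in as `fitness` unchanged.

```python
import math
import random

def black_box_fitness(individual):
    """Staircase objective with noise: the gradient is zero or undefined everywhere,
    so only the returned score (never a derivative) can guide the search."""
    x = sum(individual)
    score = math.floor(x / 3) * 3     # discontinuous steps
    score += random.gauss(0, 0.1)     # measurement-like noise
    return score

print(black_box_fitness([random.randint(0, 4) for _ in range(6)]))
```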
2. Parallel Exploration Ability
GAs simultaneously evaluate and evolve an entire population of candidate solutions. Since each individual is assessed independently, multiple candidates can be explored in parallel. This capability allows the efficient use of multiple computational resources (e.g., clusters, GPUs) to cover vast search spaces, significantly enhancing the speed at which global optima are approached, even for computationally expensive problems.
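As a minimal illustration of this point, the sketch below farms fitness evaluations out to multiple processes with Python's standard multiprocessing module. The sleep call is a stand-in for an expensive simulation; in a real GA loop, this parallel map would replace the serial evaluation step.

```python
import random
import time
from multiprocessing import Pool

def fitness(individual):
    time.sleep(0.1)  # stand-in for an expensive simulation or experiment
    return sum(individual)

if __name__ == "__main__":
    population = [[random.randint(0, 4) for _ in range(6)] for _ in range(20)]
    with Pool() as pool:                        # one worker per CPU core by default
        scores = pool.map(fitness, population)  # evaluate all individuals at once
    print(max(zip(scores, population)))
```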
Cases Where Genetic Algorithms Are Not Suitable
While Genetic Algorithms (GAs) have many strengths, there are situations in which they are not the optimal choice:
- High Evaluation Cost
GAs evaluate multiple candidate solutions simultaneously over many generations. If each evaluation (fitness computation) is extremely time-consuming or costly—as in expensive simulations or experiments in molecular design and material exploration—it may be impractical to run a GA for many generations. In such cases, methods like Bayesian Optimization, which update a probabilistic model (e.g., Gaussian processes) to minimize the number of evaluations, can be more effective.
- Smooth and Differentiable Objective Functions
Since GAs do not utilize derivative information, they are especially useful for non-smooth or noisy problems. However, if the objective function is smooth and easily differentiable, gradient-based methods (such as gradient descent or quasi-Newton methods) will typically converge faster to the optimal solution.
- Requirement for Exact or Theoretically Guaranteed Solutions
GAs are heuristic methods that seek approximate solutions. In problems where an exact solution or a theoretical convergence guarantee is required (for instance, in certain mathematical programming problems), alternative methods such as integer programming or branch-and-bound techniques may be preferable.
- Limited Parallel Computing Resources
GAs benefit greatly from parallel evaluation of individuals. In environments where parallel computing resources are insufficient, the inherent computational load of evaluating a large population can make GAs less effective.
- Rapidly Changing Objective Functions
If the objective function changes frequently (as in online optimization scenarios), the static nature of the population's evaluation in each generation may prevent a GA from fully exploiting its strengths. In such dynamic settings, online learning methods or adaptive optimization techniques might be more suitable.
In practical applications, it is essential to consider the problem’s characteristics and constraints when choosing between GA and other optimization methods, sometimes even employing hybrid approaches for the best results.
Applications in Materials Science
After explaining the basics of Genetic Algorithms (GAs), we now briefly discuss their applications in materials science. Although GAs are a general-purpose algorithm—often running behind the scenes in software used daily—here we highlight three direct examples of their use:
- Molecular Design
In AI-driven molecule generation, a Graph-Based Genetic Algorithm (GBGA) is employed to optimize molecular properties. GAs are widely applied in molecular design because their ability to explore a vast chemical space in parallel, guided by simulations, makes them particularly well suited to the task.
- Structural Optimization / Stable Structure Search
GAs are not limited to organic molecules but are also applied to the structural exploration of inorganic materials. By employing genetic operations to generate candidate structures and evaluating them through first-principles calculations, a GA efficiently searches for low-energy, stable structures. Software such as USPEX is an example of this application.
- Optimization of Descriptors and Force Field Parameters
For accurate property predictions via machine learning or for obtaining reliable simulation results, selecting the right descriptors and force field parameters is critical. For instance, parameters for SOAP (Smooth Overlap of Atomic Positions) descriptors or empirical potentials used in molecular dynamics can be effectively optimized using Genetic Algorithms (e.g., SOAP_GAS, EZFF). A hedged sketch of this kind of parameter fit follows the list.
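To make the last item concrete, here is a sketch of fitting two empirical-potential parameters with the same GA machinery as before. The Lennard-Jones form, the reference data, the bounds, and the selection scheme are all invented for illustration and are not taken from SOAP_GAS or EZFF.

```python
import random

# Invented reference (distance, energy) pairs, roughly consistent with a
# Lennard-Jones curve at eps ~ 1.0 and sigma ~ 0.9; a real workflow would take
# these from experiment or first-principles calculations.
REF = [(0.9, 0.00), (1.0, -1.00), (1.2, -0.59), (1.5, -0.18), (2.0, -0.03)]

def lj_energy(r, eps, sigma):
    """12-6 Lennard-Jones pair energy."""
    s6 = (sigma / r) ** 6
    return 4.0 * eps * (s6 * s6 - s6)

def fitness(params):
    """Negative squared error against the reference curve (higher is better)."""
    eps, sigma = params
    return -sum((lj_energy(r, eps, sigma) - e) ** 2 for r, e in REF)

def crossover(a, b):
    """Arithmetic crossover: a random blend of two parents."""
    w = random.random()
    return [w * x + (1.0 - w) * y for x, y in zip(a, b)]

def mutate(params, scale=0.05):
    """Gaussian perturbation, clipped to illustrative bounds."""
    eps, sigma = (p + random.gauss(0, scale) for p in params)
    return [min(max(eps, 0.1), 5.0), min(max(sigma, 0.5), 2.0)]

pop = [[random.uniform(0.1, 5.0), random.uniform(0.5, 2.0)] for _ in range(40)]
for _ in range(100):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                               # truncation selection
    pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                     for _ in range(30)]
print(max(pop, key=fitness))
```

In a real application, `fitness` would call out to a simulation or property predictor, and the same pattern extends to fitting many parameters at once.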
Conclusion
In this article, we explained the fundamental concepts and key components of Genetic Algorithms (GAs), along with their applications in fields such as Materials Informatics, machine learning, and simulation. GAs evolve a population of candidate solutions by performing operations like selection, crossover, and mutation, gradually guiding the population toward the optimal solution. This process mirrors the trial-and-error approach seen in experimental science, where intuition and experience play crucial roles. Although experimental scientists rarely program GAs directly, understanding that such techniques underpin many machine learning and simulation tools in Materials Informatics can provide valuable insights into modern research methodologies.
Additionally, our discussion highlighted both the strengths and limitations of GAs, emphasizing that while they are robust for black-box, discontinuous problems with vast search spaces, alternative methods like Bayesian optimization or gradient-based approaches may be more suitable in other scenarios. The visual explanations using figures and videos further help to intuitively grasp the behavior and underlying principles of Genetic Algorithms.
Supplementary
The miHub® software provided by MI-6 utilizes Genetic Algorithms in multiple internal processes for deriving experimental candidate points via Bayesian optimization. By employing an advanced Genetic Algorithm, miHub® contributes to more robust analysis.
References
- Holland, J. H. Adaptation in Natural and Artificial Systems; MIT Press: Cambridge, MA, 1992. https://mitpress.mit.edu/9780262581110/adaptation-in-natural-and-artificial-systems/
A classic work on the theory behind Genetic Algorithms.
- Unger, R.; Moult, J. Genetic Algorithms for Protein Folding Simulations. J. Mol. Biol. 1993, 231, 75–81. https://doi.org/10.1006/jmbi.1993.1258
Demonstrates the application of Genetic Algorithms in protein folding simulations.
- Neto, J. C.; Meyer, G. E.; Jones, D. D. Individual Leaf Extractions from Young Canopy Images Using Gustafson–Kessel Clustering and a Genetic Algorithm. Comput. Electron. Agric. 2006, 53 (1), 81–92. https://doi.org/10.1016/j.compag.2005.11.002
Combines clustering and GA for image-based leaf extraction.
- Tang, L. L.; Tian, X.; Steward, B. L. Color Image Segmentation with Genetic Algorithm for In-Field Weed Sensing. Trans. ASAE 2000, 43 (4), 1443–1448. https://doi.org/10.13031/2013.2970
Uses GA in agricultural image segmentation for weed detection.
- Lee, C. K. H. A Review of Applications of Genetic Algorithms in Operations Management. Eng. Appl. Artif. Intell. 2018, 75, 66–75. https://doi.org/10.1016/j.engappai.2018.08.011
Reviews GA applications in operations management.
- Chen, R.; Liang, C.-Y.; Hong, W.-C.; Gu, D.-X. Forecasting Holiday Daily Tourist Flow Based on Seasonal Support Vector Regression with Adaptive Genetic Algorithm. Appl. Soft Comput. 2015, 30, 133–140. https://doi.org/10.1016/j.asoc.2014.10.022
Combines adaptive GA with SVR for forecasting tourist flow.
- Kavitha, A. R.; Chellamuthu, C. Brain Tumour Segmentation from MRI Image Using Genetic Algorithm with Fuzzy Initialisation and Seeded Modified Region Growing (GFSMRG) Method. J. Med. Eng. Technol. 2016, 40 (3), 125–133. https://doi.org/10.1080/13682199.2016.1178412
Proposes a GA-based method for effective brain tumor segmentation.
- Ghaheri, A.; Shoar, S.; Naderan, M.; Hoseini, S. S. The Applications of Genetic Algorithms in Medicine. Oman Med. J. 2015, 30 (4), 269–274. https://doi.org/10.5001/omj.2015.82
Summarizes diverse GA applications in medicine.
- Jennings, P. C.; Lysgaard, S.; Hummelshøj, J. S.; Vegge, T.; Bligaard, T. Genetic Algorithms for Computational Materials Discovery Accelerated by Machine Learning. npj Comput. Mater. 2019, 5, 46. https://doi.org/10.1038/s41524-019-0181-4
Reports on accelerating materials discovery through GA coupled with machine learning.
- Kim, C.; Batra, R.; Chen, L.; Tran, H.; Ramprasad, R. Polymer Design Using Genetic Algorithm and Machine Learning. Comput. Mater. Sci. 2021, 186, 110067. https://doi.org/10.1016/j.commatsci.2020.110067
Proposes a hybrid approach combining GA and machine learning for polymer design.
- Ishikawa, T.; Miyake, T.; Shimizu, K. Materials Informatics Based on Evolutionary Algorithms: Application to Search for Superconducting Hydrogen Compounds. arXiv 2019, arXiv:1908.00746. https://arxiv.org/abs/1908.00746
Explores the use of evolutionary algorithms in materials informatics for superconducting hydrogen compounds.
- Bartók, A. P.; Kondor, R.; Csányi, G. On Representing Chemical Environments. Phys. Rev. B 2013, 87 (18), 184115. https://doi.org/10.1103/PhysRevB.87.184115
Introduces SOAP descriptors, relevant for parameter optimization using GA.