Infrared (IR) spectroscopy is a powerful analytical technique widely used for identifying molecular structures and analyzing chemical compositions across various scientific and industrial fields, including pharmaceuticals, environmental science, and materials research.
However, noise contamination remains a major challenge in IR spectral analysis, as it can distort spectral features, obscure critical absorption bands, and reduce measurement accuracy. This issue is particularly critical in high-throughput applications, where a large number of spectra must be processed rapidly without manual optimization. If not properly managed, noise can lead to misinterpretations of spectral data, compromised chemical identifications, and errors in quantitative analysis.
To ensure high-quality spectral data, effective noise reduction techniques are essential. These techniques range from hardware-based optimizations to advanced computational algorithms.
Problem of Noise in IR Analysis
Noise in IR spectroscopy originates from various sources, including instrumental limitations, environmental interferences, and sample-related variability. The most significant consequence of noise is the reduction in the signal-to-noise ratio (S/N), where S represents the true signal (typically the peak height of an absorbance feature), and N denotes the background noise, commonly quantified using RMS (root mean square) measurements. A lower S/N ratio implies that genuine absorbance peaks from chemical compounds are increasingly masked by baseline fluctuations, making it challenging to resolve subtle or low-intensity signals. As a result, this can lead to:
- Obscured spectral features, hindering accurate identification of functional groups or compounds.
- Loss of resolution and sensitivity, especially for weak peaks that may otherwise provide critical information.
- Compromised quantification, since absorbance measurements may fluctuate due to noise, affecting the calibration and accuracy of concentration calculations.
- Increased analysis time, as more scans or signal averaging may be needed to compensate for poor data quality.
Understanding these noise sources is critical for developing effective noise reduction strategies to enhance spectral interpretation and ensure precise analytical results.
Fig.1 Signal-to-noise ratio
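The S/N definition above, and the link between signal averaging and data quality, can be illustrated with a minimal sketch. The synthetic band, the noise level, the region boundaries, and the function names `signal_to_noise` and `acquire` are assumptions chosen for illustration only. For uncorrelated noise, averaging n scans is expected to reduce the RMS noise by roughly a factor of √n.

```python
import numpy as np

def signal_to_noise(spectrum, peak_region, baseline_region):
    """S/N = peak height above the mean baseline / RMS noise of the baseline."""
    baseline = spectrum[baseline_region]
    peak_height = spectrum[peak_region].max() - baseline.mean()
    rms_noise = np.sqrt(np.mean((baseline - baseline.mean()) ** 2))
    return peak_height / rms_noise

rng = np.random.default_rng(0)
wavenumber = np.linspace(1000.0, 2000.0, 1001)  # synthetic axis, 1 cm^-1 steps
# Hypothetical absorbance band centred at 1700 cm^-1
clean = 0.05 * np.exp(-0.5 * ((wavenumber - 1700.0) / 8.0) ** 2)

def acquire(n_scans, sigma=0.01):
    """Average n_scans synthetic scans, each with independent white noise."""
    scans = clean + rng.normal(0.0, sigma, size=(n_scans, wavenumber.size))
    return scans.mean(axis=0)

peak_region = slice(650, 751)    # 1650 to 1750 cm^-1, contains the band
baseline_region = slice(0, 500)  # 1000 to 1499 cm^-1, band-free

snr_1 = signal_to_noise(acquire(1), peak_region, baseline_region)
snr_64 = signal_to_noise(acquire(64), peak_region, baseline_region)
print(f"S/N with 1 scan:   {snr_1:.1f}")
print(f"S/N with 64 scans: {snr_64:.1f}")
```

The measured gain is usually somewhat below the ideal factor of 8 here, because taking the maximum inside the peak region also picks up a positive noise excursion, which matters most for the single scan.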
Types of Noise in IR Spectroscopy
Noise in IR spectroscopy distorts spectral data, reduces resolution, and complicates quantitative analysis. The four primary types of noise—white noise, 1/f noise, shot noise, and humidity-induced noise—each impact spectra differently.
White Noise (Thermal or Johnson-Nyquist Noise)
White noise results from random thermal fluctuations in electronic components, such as the detector and amplifier. This type of noise is evenly distributed across all frequencies and appears as a random scattering of points around the baseline. It causes random variations in absorbance or transmittance values, making it difficult to detect weak absorption bands.
Example: In a weakly absorbing sample, such as a dilute gas-phase species, white noise can obscure small peaks, making spectral interpretation challenging.
1/f Noise (Flicker Noise or Baseline Drift Noise)
1/f noise, also called flicker noise, originates from slow variations in the IR light source or detector sensitivity. It is more pronounced at lower frequencies and appears as a drifting or sloping baseline in spectra. Baseline drift alters the relative intensities of peaks and can create artificial trends, leading to errors in quantitative analysis.
Example: In protein secondary structure analysis using IR, baseline drift can mimic amide I and II band shifts, misrepresenting structural changes.
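The difference between white noise and 1/f noise described above can be made concrete with synthetic data. The sketch below shapes Gaussian white noise in the Fourier domain so that its power falls off as 1/f, then compares how much of the total power sits in the lowest frequencies. The function names, the 0.05 cutoff, and the record length are illustrative assumptions, not properties of any particular instrument.

```python
import numpy as np

def shaped_noise(n_points, exponent, rng):
    """Noise whose power spectral density is proportional to 1 / f**exponent.

    exponent = 0 gives white noise, exponent = 1 gives 1/f (flicker) noise.
    """
    white = rng.normal(size=n_points)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_points)
    scale = np.zeros_like(freqs)                # DC term stays zero
    scale[1:] = freqs[1:] ** (-exponent / 2.0)  # amplitude ~ f**(-exponent/2)
    shaped = np.fft.irfft(spectrum * scale, n=n_points)
    return shaped / shaped.std()                # normalise to unit variance

def low_frequency_fraction(x, cutoff=0.05):
    """Fraction of the (non-DC) power found at frequencies up to cutoff."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size)
    nonzero = freqs > 0
    return power[nonzero & (freqs <= cutoff)].sum() / power[nonzero].sum()

rng = np.random.default_rng(1)
white = shaped_noise(4096, 0.0, rng)
pink = shaped_noise(4096, 1.0, rng)

white_frac = low_frequency_fraction(white)
pink_frac = low_frequency_fraction(pink)
print(f"white noise: {white_frac:.2f} of power below the cutoff")
print(f"1/f noise:   {pink_frac:.2f} of power below the cutoff")
```

White noise spreads its power evenly, so only about a tenth of it falls below this cutoff, while the 1/f trace concentrates most of its power there, which is why it shows up as slow baseline drift rather than point-to-point scatter.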
Shot Noise (Photon Noise or Poisson Noise)
Shot noise arises from statistical fluctuations in photon detection. Photon counts follow a Poisson distribution, so for an average of N detected photons the absolute fluctuation grows as √N while the relative fluctuation falls as 1/√N. Its impact is therefore greatest when few photons reach the detector, such as in low-light conditions. This type of noise introduces erratic intensity variations, particularly in weak signal regions, making small absorption bands difficult to distinguish.
Example: In near-infrared (NIR) measurements of pharmaceutical formulations, shot noise can distort low-intensity overtones and combination bands, complicating component identification.
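The Poisson behaviour of shot noise can be checked numerically. The sketch below draws photon counts at three hypothetical light levels and compares the measured relative noise with 1/√N. The photon numbers and the helper name `relative_shot_noise` are assumptions for illustration.

```python
import numpy as np

def relative_shot_noise(mean_photons, rng, n_samples=50_000):
    """Standard deviation of Poisson photon counts divided by their mean."""
    counts = rng.poisson(mean_photons, size=n_samples)
    return counts.std() / counts.mean()

rng = np.random.default_rng(2)
results = {n: relative_shot_noise(n, rng) for n in (100, 10_000, 1_000_000)}

for n, measured in results.items():
    expected = 1.0 / np.sqrt(n)
    print(f"N = {n:>9}: measured {measured:.5f}, 1/sqrt(N) = {expected:.5f}")
```

Each hundredfold increase in detected photons lowers the relative noise about tenfold, which is why weak-signal regions are affected most.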
Humidity-Induced Noise (Water Vapor Absorption Noise)
Humidity-induced noise, often called water vapor absorption noise, arises due to the presence of atmospheric water vapor in the sample chamber or optical path. Water molecules have strong absorption bands in the mid-infrared (MIR) region, particularly near 1600 cm⁻¹ and 3700 cm⁻¹, which can overlap with sample peaks, causing signal distortion. The intensity of this noise varies with ambient humidity levels and can lead to misinterpretation of spectral features.
Example: In FTIR spectroscopy of biological tissues, water vapor absorption can overlap with critical protein and lipid bands, making it difficult to accurately analyze biomolecular compositions. Using nitrogen purging or desiccated sample chambers can help minimize this effect.
For the white (thermal) noise, shot noise, and pink (1/f) noise, the spectra represent random fluctuations commonly encountered in instrumental and electronic systems. These fluctuations do not arise from molecular absorption phenomena but instead reflect random variations in detector signals or electronic components. In contrast, the humidity-induced noise spectrum contains features that originate from the true IR absorption of atmospheric water vapor. The peaks observed in this case reflect the vibrational transitions of water molecules in the gas phase, which can appear superimposed on sample spectra during data collection. The absorbance values have physical meaning, as they are the result of actual interactions between IR radiation and interfering molecular species in the environment.
Fig.2 Types of Noise in IR Spectroscopy
Noise Reduction Techniques in IR Spectroscopy
Noise reduction techniques can be classified into hardware-based approaches, mathematical filtering, statistical methods, and machine learning-based approaches. The following sections discuss several of these, including Savitzky-Golay (SG) smoothing, wavelet denoising (WD), Hilbert-Huang Transform (HHT) denoising, Principal Component Analysis (PCA) denoising, and deep learning-based methods.
Savitzky-Golay (SG) Smoothing
The Savitzky-Golay (SG) smoothing filter is one of the most widely used denoising algorithms in IR spectroscopy because it preserves spectral features while effectively reducing noise. Unlike simple moving average filters, which can blur spectral peaks, SG smoothing fits a low-order polynomial over a sliding window and takes the fitted value at the window centre, so key absorption bands remain sharp and well-defined.
In IR spectroscopy, SG smoothing is particularly useful when analyzing spectra with overlapping peaks or weak signals. It enhances signal clarity by reducing random noise fluctuations without significantly distorting the shape of the spectral bands. However, excessive smoothing, caused by selecting a window that is too large relative to the peak width, can distort fine spectral structures, leading to peak broadening or loss of weak absorptions. When applied correctly, SG smoothing improves the signal-to-noise ratio while keeping band positions and shapes largely intact, which supports more accurate chemical identification.
SG smoothing is commonly used in pharmaceutical IR spectroscopy, where small variations in spectral peaks indicate the presence of impurities or polymorphs in a drug formulation. If excessive noise obscures these variations, SG filtering can clarify subtle spectral differences, making it easier to distinguish different crystal forms or contaminants in a sample.
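The contrast between SG smoothing and a moving average can be sketched with NumPy alone. The code below derives the SG weights from a least-squares polynomial fit over the window; with polynomial order 0 the same routine reduces to a plain moving average, which makes the comparison direct. The window length, polynomial order, edge padding, and the synthetic peak are illustrative assumptions. In practice a library routine such as `scipy.signal.savgol_filter` would normally be used instead.

```python
import numpy as np

def savgol_coefficients(window_length, polyorder):
    """Weights that return the fitted polynomial's value at the window centre."""
    half = window_length // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse maps the window samples to the constant
    # term of the least-squares polynomial, i.e. its value at offset 0.
    return np.linalg.pinv(design)[0]

def savgol_smooth(y, window_length, polyorder):
    if window_length % 2 == 0 or polyorder >= window_length:
        raise ValueError("window_length must be odd and larger than polyorder")
    coeffs = savgol_coefficients(window_length, polyorder)
    padded = np.pad(y, window_length // 2, mode="edge")  # simple edge handling
    return np.convolve(padded, coeffs[::-1], mode="valid")

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

x = np.arange(501)
clean = np.exp(-0.5 * ((x - 250) / 5.0) ** 2)  # narrow synthetic band, height 1
rng = np.random.default_rng(3)
noisy = clean + rng.normal(0.0, 0.05, x.size)

# Peak height retained on the noise-free band, same 21-point window
sg_height = savgol_smooth(clean, 21, 3)[250]
ma_height = savgol_smooth(clean, 21, 0)[250]   # polyorder 0 = moving average
print(f"peak height after SG:             {sg_height:.2f}")
print(f"peak height after moving average: {ma_height:.2f}")

rmse_noisy = rmse(noisy, clean)
rmse_sg = rmse(savgol_smooth(noisy, 21, 3), clean)
print(f"RMSE before smoothing: {rmse_noisy:.4f}, after SG: {rmse_sg:.4f}")
```

With the same window, the moving average flattens the narrow band much more than the cubic SG fit does. Widening the window further would eventually broaden the band under SG as well, which is the over-smoothing risk noted above.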
Wavelet Denoising (WD)
Wavelet denoising (WD) is a powerful multi-resolution signal processing technique that decomposes spectra into components at multiple scales. Unlike Fourier filtering, whose basis functions extend over the entire spectrum so that the same frequency cutoff is applied everywhere, wavelet denoising uses localized basis functions and can adaptively filter out high-frequency noise while preserving spectral structures at different scales.
In IR spectroscopy, WD is particularly advantageous for handling non-stationary noise, such as environmental fluctuations, baseline drifts, and detector instabilities. It provides a more localized filtering approach, making it superior for datasets where noise characteristics change across different spectral regions. By selecting appropriate wavelet families and thresholding methods, WD can retain fine details while eliminating random noise.
Wavelet denoising is widely applied in environmental IR spectroscopy, where air pollutants, such as CO₂ and SO₂, introduce unwanted absorption bands in spectra. Wavelet-based processing can remove these interferences, allowing for accurate quantification of target compounds in complex atmospheric samples.
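The decompose, threshold, and reconstruct steps behind wavelet denoising can be sketched with a Haar wavelet written in plain NumPy. The Haar wavelet, the four decomposition levels, the universal threshold with a median-based noise estimate, and soft thresholding are all illustrative choices, not a recommendation. Smoother wavelet families from a dedicated library such as PyWavelets are usually preferred for real spectra.

```python
import numpy as np

def haar_forward(x, levels):
    """Multi-level Haar transform: returns (approximation, [details, finest first])."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def haar_inverse(approx, details):
    """Invert haar_forward, starting from the coarsest level."""
    for detail in reversed(details):
        even = (approx + detail) / np.sqrt(2.0)
        odd = (approx - detail) / np.sqrt(2.0)
        approx = np.empty(2 * even.size)
        approx[0::2], approx[1::2] = even, odd
    return approx

def haar_denoise(x, levels=4):
    x = np.asarray(x, dtype=float)
    if x.size % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels")
    approx, details = haar_forward(x, levels)
    # Noise estimate from the finest detail coefficients (median absolute deviation)
    sigma = np.median(np.abs(details[0])) / 0.6745
    threshold = sigma * np.sqrt(2.0 * np.log(x.size))  # universal threshold
    shrunk = [np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0) for d in details]
    return haar_inverse(approx, shrunk)

x = np.arange(1024)
clean = np.exp(-0.5 * ((x - 512) / 20.0) ** 2)  # synthetic band
rng = np.random.default_rng(4)
noisy = clean + rng.normal(0.0, 0.1, x.size)
denoised = haar_denoise(noisy, levels=4)

rmse_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
rmse_denoised = np.sqrt(np.mean((denoised - clean) ** 2))
print(f"RMSE before: {rmse_noisy:.3f}, after: {rmse_denoised:.3f}")
```

The Haar result looks blocky because the wavelet itself is a step function. The choice of wavelet family, decomposition depth, and threshold rule is the parameter tuning that this method depends on.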
Hilbert-Huang Transform (HHT) Denoising
The Hilbert-Huang Transform (HHT) applies Empirical Mode Decomposition (EMD) to break a spectral signal down into Intrinsic Mode Functions (IMFs). Unlike Fourier-based techniques, which rely on predefined basis functions, HHT adaptively extracts signal components based on their intrinsic characteristics.
HHT is especially powerful for analyzing complex, nonlinear, and non-stationary noise in IR spectra. It allows for dynamic noise removal, meaning that different portions of a spectrum can be processed independently. This makes it ideal for cases where traditional filtering methods fail to differentiate between spectral features and noise artifacts. However, HHT is computationally expensive and requires careful selection of IMFs to avoid removing useful spectral information.
HHT is particularly useful in biomedical IR spectroscopy, such as analyzing blood plasma samples for disease biomarkers. IR spectra of biological fluids often contain overlapping peaks and strong baseline fluctuations, making conventional filtering ineffective. HHT can isolate the biomarker-related spectral features while suppressing noise from background proteins and lipids, leading to improved diagnostic accuracy.
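The EMD step at the core of HHT can be sketched in a deliberately simplified form. The code below repeatedly subtracts the mean of the upper and lower envelopes (sifting) to peel off IMFs. It is a rough illustration only: the envelopes use linear interpolation rather than the cubic splines of standard EMD, the number of sifting passes is fixed rather than decided by a stopping criterion, and the Hilbert spectral analysis stage is omitted. All function names and parameters are assumptions.

```python
import numpy as np

def local_extrema(x):
    """Indices of interior local maxima and minima."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def sift_once(x):
    """Subtract the mean envelope; return None if x has too few extrema."""
    maxima, minima = local_extrema(x)
    if maxima.size < 2 or minima.size < 2:
        return None
    t = np.arange(x.size)
    upper = np.interp(t, maxima, x[maxima])  # linear envelope (simplification)
    lower = np.interp(t, minima, x[minima])
    return x - 0.5 * (upper + lower)

def emd(signal, max_imfs=6, n_sifts=10):
    """Return (list of IMFs, residual); the IMFs plus the residual sum to signal."""
    residual = np.asarray(signal, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if sift_once(residual) is None:
            break  # what is left is a trend with too few oscillations
        candidate = residual.copy()
        for _ in range(n_sifts):
            sifted = sift_once(candidate)
            if sifted is None:
                break
            candidate = sifted
        imfs.append(candidate)
        residual = residual - candidate
    return imfs, residual

x = np.arange(512)
rng = np.random.default_rng(5)
noisy = np.exp(-0.5 * ((x - 256) / 30.0) ** 2) + rng.normal(0.0, 0.05, x.size)

imfs, residual = emd(noisy)
reconstructed = np.sum(imfs, axis=0) + residual
# One possible partial reconstruction: discard the fastest-oscillating IMF
partial = noisy - imfs[0]
print(f"{len(imfs)} IMFs extracted")
```

Which IMFs to discard is the selection problem mentioned above. Dropping the first IMF is only one option, and whether it removes noise rather than signal depends on the data.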
Principal Component Analysis (PCA) Denoising
Principal Component Analysis (PCA) is a statistical method used for dimensionality reduction and noise suppression. PCA decomposes spectral data into principal components, identifying the most significant patterns while removing low-variance components, which are typically noise.
PCA denoising is particularly effective for large datasets with repetitive spectral patterns, such as high-throughput IR measurements in industrial quality control. By reconstructing spectra using only the most significant principal components, PCA can significantly enhance signal clarity without distorting key absorption features. However, because the components are estimated from the variance across many spectra, PCA is of little use for a single isolated spectrum, and it can be less reliable when the signal-to-noise ratio varies significantly across the dataset.
PCA denoising is frequently used in food quality assessment using IR spectroscopy. For example, in adulteration detection in edible oils, PCA can separate authentic oil spectra from noise and contamination effects, allowing for more precise classification of fraudulent samples.
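Reconstruction from the leading principal components can be sketched with a singular value decomposition. The synthetic dataset below, 200 spectra that are mixtures of two bands, and the choice of two retained components are assumptions made for illustration. In real data the number of components has to be chosen, for example from the singular value spectrum.

```python
import numpy as np

def pca_denoise(spectra, n_components):
    """Rebuild each spectrum (one per row) from the leading principal components."""
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    return mean + scores @ vt[:n_components]

rng = np.random.default_rng(6)
x = np.arange(500)
band_a = np.exp(-0.5 * ((x - 150) / 15.0) ** 2)
band_b = np.exp(-0.5 * ((x - 350) / 15.0) ** 2)

# 200 hypothetical samples with varying amounts of the two components
concentrations = rng.uniform(0.2, 1.0, size=(200, 2))
clean = concentrations @ np.vstack([band_a, band_b])
noisy = clean + rng.normal(0.0, 0.05, clean.shape)

denoised = pca_denoise(noisy, n_components=2)

rmse_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
rmse_denoised = np.sqrt(np.mean((denoised - clean) ** 2))
print(f"RMSE before: {rmse_noisy:.4f}, after: {rmse_denoised:.4f}")
```

The noise is suppressed because it is spread across all components, while the chemical variation in this example lives in only two. Keeping too few components would discard real spectral variation, and keeping too many would let the noise back in.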
Convolutional Neural Networks (CNN) for Denoising
CNN-based denoising leverages deep learning to extract and remove complex noise patterns from IR spectra. Unlike traditional filtering methods, CNNs learn spectral features directly from large datasets, making them highly effective for real-time spectral analysis and automated high-throughput measurements.
CNNs can outperform traditional techniques in handling overlapping peaks, non-uniform baseline distortions, and spectral artifacts, provided that representative training data are available. By training on large spectral databases, CNN-based models can generalize across different types of noise, making them adaptive to diverse experimental conditions.
CNN denoising has been successfully applied in microplastic analysis using IR spectroscopy. In this field, spectral contamination from organic matter and environmental particulates can obscure the identification of plastic particles. CNN-based denoising models can accurately reconstruct microplastic spectra, allowing for more precise classification and quantification in environmental monitoring studies.
Method | Advantages | Disadvantages | Best Use Cases |
---|---|---|---|
Savitzky-Golay (SG) Smoothing | Retains spectral features; easy to implement. | Can distort peaks if window size is too large. | General IR spectral smoothing. |
Wavelet Denoising (WD) | Adaptive to various noise types; preserves spectral structure. | Requires careful parameter tuning. | Non-stationary noise removal in IR spectra. |
Hilbert-Huang Transform (HHT) | Effective for non-linear and non-stationary noise. | Computationally expensive. | Complex IR spectra with strong distortions. |
Principal Component Analysis (PCA) Denoising | Reduces random noise while retaining spectral features shared across the dataset. | Needs a set of related spectra; not suited to single spectra. | Large datasets with repetitive spectral patterns. |
Convolutional Neural Networks (CNN) for Denoising | Superior noise removal for complex and overlapping peaks. | Requires large training datasets; training typically benefits from a GPU. | High-resolution IR spectral analysis. |
MI-6 noise reduction technology
We have designed an automated denoising process for spectral data in which the denoising method is selected according to the characteristics of each spectrum. Figure 3 shows this workflow, which enhances spectral data through automated denoising and ultimately improves the accuracy of peak detection. The process begins with raw spectral data, which is analyzed by a model that determines the most appropriate denoising method for each spectrum. The characteristics considered include the noise level and the full width at half maximum (FWHM) of the peaks. As shown in the leftmost heatmap, denoising techniques such as PCA denoising, Savitzky-Golay filtering, the Hilbert-Huang transform, and wavelet transforms are selected dynamically depending on the spectrum’s profile. Notably, the model may also choose to apply no denoising if that is optimal. This adaptive approach tailors the denoising strategy to the data, which is essential for preserving meaningful spectral features while minimizing noise.
The central part of the figure compares peak detection performance before and after denoising using F1 scores, a metric that balances precision and recall. Before denoising, the F1 scores are relatively low across the range of noise levels and FWHMs, indicating less reliable peak detection. After the automated denoising process, the scores improve significantly, particularly under moderate to high noise. This improvement is visible in the heatmaps as a shift from yellowish tones (low scores) to bluish tones (high scores), and it is accompanied by a reduction in noise level from about 2 to 0.2.
On the right, two example spectra further illustrate the benefits of the process. These plots compare peak detection results before and after denoising. Green triangles indicate correctly predicted peaks (true positives), while magenta triangles show false predictions. The upper spectrum, which originally had a high noise level, exhibits cleaner and more accurate peak detection after denoising. According to the annotation, the average noise level in the dataset was reduced to 0.2, underscoring the effectiveness of the model-driven denoising approach.
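The F1 score used to evaluate peak detection is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A minimal sketch of how it could be computed for peak positions is given below. The matching rule (a predicted peak counts as a true positive if it lies within a tolerance of a not-yet-matched true peak), the tolerance, and the example positions are assumptions for illustration and are not taken from the workflow in Fig. 3.

```python
def peak_f1(predicted, true, tolerance):
    """F1 score for peak positions with one-to-one matching within a tolerance."""
    unmatched = list(true)
    true_positives = 0
    for p in predicted:
        candidates = [t for t in unmatched if abs(t - p) <= tolerance]
        if candidates:
            # Match the closest remaining true peak and use it only once
            unmatched.remove(min(candidates, key=lambda t: abs(t - p)))
            true_positives += 1
    false_positives = len(predicted) - true_positives
    false_negatives = len(unmatched)
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Hypothetical peak positions in cm^-1
true_peaks = [1050.0, 1450.0, 1720.0]
predicted_peaks = [1052.0, 1300.0, 1719.0]

score = peak_f1(predicted_peaks, true_peaks, tolerance=5.0)
print(f"F1 = {score:.3f}")
```

In this example two of the three predictions match a true peak, one prediction is spurious, and one true peak is missed, so precision and recall are both 2/3 and so is F1.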
Overall, this workflow shows how machine learning can be integrated with signal processing to automatically enhance data quality in spectral analysis. By intelligently choosing the right denoising method for each dataset, it ensures that critical spectral information is retained while minimizing noise, leading to more reliable and accurate downstream analysis like peak detection.
Finally, we are developing hybrid methods. One combines PCA with deep learning, integrating dimensionality reduction with neural networks to enhance spectral clarity. Another combines the wavelet transform with a CNN, using wavelet preprocessing to improve the accuracy of CNN-based denoising.
Fig.3 MI-6 automated denoising process