Machine learning predicts the shape of diamond nanocrystals
27-02-2026
Researchers from the National Centre for Nuclear Research and the Institute of High Pressure Physics of the Polish Academy of Sciences have published an article on this topic in Scientific Reports. They used machine learning algorithms to determine the shape and surface structure of nanodiamonds from diffraction data, achieving an accuracy of up to 99%.
Machine learning (ML) has become a common method of processing information across many fields of science. ML methods can uncover relationships that other methods miss, offering a new perspective on many research problems. Characterising nanocrystalline materials from diffraction studies requires specialised software. Because of the specific nature of nanocrystals, the precise information about their size, shape and atomic structure that diffraction data contain is not readily accessible. Complex computational procedures and molecular dynamics models can extract this information, but because they are complex and time-consuming, they have not been adopted as a standard method for characterising nanomaterials.
A breakthrough in this field came with the widespread use of artificial intelligence, especially machine learning methods. ML algorithms have revolutionised information processing and statistical analysis. Experimental data, as well as well-validated computer models, can be used to train classifiers that reveal previously unknown relationships between material properties. Early studies showed that ML techniques such as neural networks can successfully determine the size and shape of nanocrystals. In a new paper published in Scientific Reports, researchers from the National Centre for Nuclear Research (NCBJ) and the Institute of High Pressure Physics of the Polish Academy of Sciences took a closer look at this issue and presented results for several different machine learning algorithms.
Three algorithms, Random Forest (RF), Neural Network (NN) and eXtreme Gradient Boosting (XGB), were trained on data from molecular dynamics simulations. Real experimental data could not be used for training because no data sets exist for nanomaterials with precisely known shapes and sizes. However, previous studies have shown that molecular dynamics can accurately reproduce the actual structure of nanograins. The study focused on diamond nanocrystals because their structure is simple and closely resembles that of many commonly used nanomaterials, such as CdSe, ZnO, SiC and GaN. The work also served to verify experimental research previously conducted on real nanodiamonds.
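To make the training step more concrete, here is a toy random forest in plain Python: each tree is grown on a bootstrap sample, considers a random subset of features at every split, chooses splits by Gini impurity, and the forest predicts by majority vote. This is a minimal sketch, not the authors' code. The three-number feature vectors and the labels `rod`, `plate` and `sphere` are hypothetical stand-ins for S(Q) values computed from simulated grains; in practice one would use an established library such as scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(X, y, features):
    """Return (weighted impurity, feature index, threshold) of the best split, or None."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring distinct values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth, max_depth, n_feat, rng):
    """Grow a tree; leaves are labels, inner nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y, rng.sample(range(len(X[0])), n_feat))
    if split is None:  # the sampled features are constant in this node
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, n_feat, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, n_feat, rng))


def predict_tree(node, row):
    """Follow thresholds down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=4, seed=0):
    """Train n_trees trees, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n_feat = max(1, int(len(X[0]) ** 0.5))  # features tried per split (sqrt heuristic)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], 0, max_depth, n_feat, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(1)
    X, y = [], []
    for label, centre in (("rod", 0.0), ("plate", 5.0), ("sphere", 10.0)):
        for _ in range(30):  # hypothetical, well-separated toy features
            X.append([centre + rng.uniform(-1, 1) for _ in range(3)])
            y.append(label)
    forest = fit_forest(X, y)
    print(predict_forest(forest, [5.1, 4.8, 5.3]))  # plate
```

Bootstrapping plus random feature subsets is what decorrelates the trees, so their vote is more stable than any single tree trained on the same data.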
The nanodiamond models were divided into three categories according to their shape: one-dimensional rods, two-dimensional plates, and three-dimensional superspheres/superellipsoids. Molecular dynamics simulations were then performed on these models to obtain crystal structures close to those found in real nanocrystals. X-ray diffraction patterns were calculated for the resulting structures, and the structure function S(Q), which carries information about the crystal structure, was used for further analysis. – During data preparation, it turned out that there was no need to examine all values of this function, as the diffraction data contain largely redundant information. Accurate predictions can be obtained from almost any selected range of values. This is important for experimental data, which may differ significantly from theory and require restricting the range of the structure function – explains Dr Kazimierz Skrobas from the Synthesis and Characterisation of Materials Division at NCBJ, the first author of the publication.
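The model-building and pattern-calculation steps can be illustrated with a small sketch. It cuts an ideal diamond lattice (lattice constant 3.567 Å) with the simplified superellipsoid condition |x/a|^p + |y/b|^p + |z/c|^p ≤ 1, which can describe elongated rods, flattened plates and sphere-like grains, and evaluates a powder pattern with the Debye scattering equation using unit atomic form factors. These are assumptions made for illustration: the sketch skips the molecular dynamics relaxation described above, and the per-atom intensity I(Q)/N is a simple proxy, not necessarily the S(Q) definition used in the paper. The all-pairs distance list also shows why such calculations get expensive: the cost grows as N² with the number of atoms.

```python
import math

A_DIAMOND = 3.567  # diamond lattice constant in angstrom

# Conventional cubic diamond cell: fcc sites plus the same sites shifted by (1/4, 1/4, 1/4).
BASIS = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.5), (0.5, 0.0, 0.5), (0.5, 0.5, 0.0),
         (0.25, 0.25, 0.25), (0.25, 0.75, 0.75), (0.75, 0.25, 0.75), (0.75, 0.75, 0.25)]


def superellipsoid_cluster(a, b, c, p):
    """Diamond-lattice sites inside |x/a|^p + |y/b|^p + |z/c|^p <= 1 (semi-axes in angstrom).

    Rods: c >> a = b; plates: c << a = b; p = 2 gives an ellipsoid, larger p is more box-like.
    """
    n = math.ceil(max(a, b, c) / A_DIAMOND) + 1  # enough unit cells to enclose the shape
    atoms = []
    for i in range(-n, n):
        for j in range(-n, n):
            for k in range(-n, n):
                for bx, by, bz in BASIS:
                    x, y, z = (i + bx) * A_DIAMOND, (j + by) * A_DIAMOND, (k + bz) * A_DIAMOND
                    if abs(x / a) ** p + abs(y / b) ** p + abs(z / c) ** p <= 1.0:
                        atoms.append((x, y, z))
    return atoms


def pair_distances(atoms):
    """All interatomic distances r_ij for i < j; this O(N^2) list dominates the cost."""
    return [math.dist(atoms[i], atoms[j])
            for i in range(len(atoms)) for j in range(i + 1, len(atoms))]


def debye_intensity(q, n_atoms, distances):
    """Debye equation with unit form factors, per atom:
    I(Q)/N = 1 + (2/N) * sum_{i<j} sin(Q r_ij) / (Q r_ij), with Q in 1/angstrom."""
    return 1.0 + 2.0 * sum(math.sin(q * r) / (q * r) for r in distances) / n_atoms


if __name__ == "__main__":
    grain = superellipsoid_cluster(8.0, 8.0, 8.0, 2.0)  # sphere-like grain about 1.6 nm across
    dists = pair_distances(grain)
    for q in (2.4, 3.05, 4.98):  # 3.05 and 4.98 lie near the diamond 111 and 220 reflections
        print(f"Q = {q:.2f}  I/N = {debye_intensity(q, len(grain), dists):.3f}")
```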
The machine learning algorithms trained on simulated data were then applied to real nanograins of various sizes. – All algorithms recognised the shape of the grains with high accuracy (96–99%), although the accuracy depended on which diffraction peaks were included in the X-ray pattern. These results allowed us to conduct in-depth research on the surface of plate-shaped nanodiamonds – adds Dr Kazimierz Skrobas. The algorithms were used to characterise the surface topography in terms of the presence of atoms with one or three dangling bonds. The machine learning methods correctly determined the surface type when the diffraction data did not include the 111 peak. Including this peak led to errors and incorrect predictions, possibly because the experimental data were not precise enough.
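Leaving out a reflection such as 111 amounts to masking a window of the S(Q) curve before it is passed to a classifier. A minimal sketch, assuming an ideal cubic diamond lattice with a = 3.567 Å, for which the 111 reflection sits at Q = 2π√3/a ≈ 3.05 Å⁻¹. The helper names, the window half-width and the S(Q) values are hypothetical.

```python
import math

A_DIAMOND = 3.567  # diamond lattice constant in angstrom (ideal, unstrained)


def q_hkl(h, k, l, a=A_DIAMOND):
    """Q of the (hkl) reflection of a cubic lattice: 2*pi*sqrt(h^2 + k^2 + l^2) / a."""
    return 2.0 * math.pi * math.sqrt(h * h + k * k + l * l) / a


def exclude_peak(q_values, s_values, q_centre, half_width):
    """Drop the (Q, S(Q)) samples within +/- half_width of q_centre, preserving order."""
    kept = [(q, s) for q, s in zip(q_values, s_values) if abs(q - q_centre) > half_width]
    return [q for q, _ in kept], [s for _, s in kept]


if __name__ == "__main__":
    q111 = q_hkl(1, 1, 1)  # about 3.05 1/angstrom
    # Hypothetical S(Q) samples; a half-width of 0.3 1/angstrom is an arbitrary choice.
    qs, ss = exclude_peak([2.0, 2.5, 3.0, 3.5, 4.0], [0.9, 1.1, 2.7, 1.0, 1.2], q111, 0.3)
    print(qs)  # [2.0, 2.5, 3.5, 4.0]
```

The same window would have to be removed from the simulated training patterns as well, so that the feature vectors seen during training and prediction cover the same Q points.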
The research group's work has shown that machine learning algorithms such as Random Forest, Artificial Neural Network and XGB can be used successfully to characterise nanograins. Given suitable reference data to train the models, these methods are more effective than conventional analytical tools. The consistency of the results obtained with different algorithms further supports their validity. Moreover, the research has shown that diffraction data carry a high degree of redundant information. This allows the range of values used to be restricted, potentially excluding regions prone to errors or lower precision, without compromising the quality of the prediction.
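The cross-check between algorithms mentioned above can be quantified as the fraction of samples on which each pair of models gives the same label, with a majority vote as a combined prediction. This is a minimal sketch; the model names and predictions are hypothetical and are not results from the paper.

```python
from collections import Counter
from itertools import combinations


def pairwise_agreement(predictions):
    """Fraction of samples on which each pair of models predicts the same label.

    predictions maps a model name to its list of labels (same sample order for all models).
    """
    names = sorted(predictions)
    result = {}
    for a, b in combinations(names, 2):
        pa, pb = predictions[a], predictions[b]
        if len(pa) != len(pb):
            raise ValueError(f"{a} and {b} labelled different numbers of samples")
        result[(a, b)] = sum(x == y for x, y in zip(pa, pb)) / len(pa)
    return result


def majority_vote(predictions):
    """Most common label per sample; ties favour the label met first in alphabetical model order."""
    names = sorted(predictions)
    n_samples = len(predictions[names[0]])
    return [Counter(predictions[m][i] for m in names).most_common(1)[0][0]
            for i in range(n_samples)]


if __name__ == "__main__":
    preds = {  # hypothetical predictions, not results from the paper
        "RF": ["rod", "plate", "sphere", "plate"],
        "NN": ["rod", "plate", "sphere", "rod"],
        "XGB": ["rod", "plate", "plate", "plate"],
    }
    print(pairwise_agreement(preds))
    print(majority_vote(preds))  # ['rod', 'plate', 'sphere', 'plate']
```

High pairwise agreement between models built on different principles (bagged trees, boosted trees, a neural network) is a stronger sign that the signal is in the data than a high accuracy from any single model.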
The full results of the research are available in the publication:
Skrobas, K., Stefańska-Skrobas, K., Stelmakh, S. et al. Application of machine learning for nanodiamonds shape and surface classification based on X-ray pattern analysis. Sci Rep 15, 40304 (2025). https://doi.org/10.1038/s41598-025-24143-z