Groundbreaking Machine Learning Model Predicts Regiochemical Outcomes of Complex Molecules

Date: 2024-01-15

A new study uses machine learning (ML) to predict the regiochemical outcomes of reactions on complex molecules, that is, which site on a molecule reacts. The approach combines open-source 13C nuclear magnetic resonance (NMR) data with late-stage functionalization (LSF) data. The resulting model outperforms previous ML models and the Fukui function-based index in predicting reactivity, and it requires neither pre-computed molecular properties nor 3D molecular information, which makes it a significant step forward for the field.

Training a Robust Model

The model was trained on Pfizer’s internal dataset, which comprises 2,600 reactions involving 647 unique molecules and 823 LSF conditions. The dataset covers a wide range of reactions, including Minisci-type functionalizations, P450-catalyzed oxidations, electrochemical methylations, and photoredox alkylations. This breadth contributed to the model’s robustness and versatility.

The Power of MPNN

The ML framework used in this study is a message passing neural network (MPNN), a type of graph neural network. The MPNN represents each molecule as a graph, takes basic atomic information as input, and is trained to classify each atom in the molecule as reactive or not. Because the prediction is made atom by atom, the output points directly to the sites where a reaction is expected to occur.
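To make the per-atom classification idea concrete, here is a minimal sketch of message passing on a molecular graph in plain Python. It is not the study's architecture: the two-element atom features, the mixing weights `w_self` and `w_nbr`, and the `readout` weights are hand-picked, untrained values chosen purely for illustration, and all function names are hypothetical.

```python
import math

def message_passing_step(states, neighbors, w_self, w_nbr):
    """One round: every atom mixes its own state with the sum of its neighbors' states."""
    updated = []
    for atom, state in enumerate(states):
        # Message: element-wise sum of the states of all bonded neighbors.
        message = [0.0] * len(state)
        for nbr in neighbors[atom]:
            for k, value in enumerate(states[nbr]):
                message[k] += value
        # Update: weighted mix followed by a ReLU nonlinearity.
        updated.append([max(0.0, w_self * s + w_nbr * m)
                        for s, m in zip(state, message)])
    return updated

def atom_reactivity(features, neighbors, readout, bias, rounds=2,
                    w_self=1.0, w_nbr=0.5):
    """Return one probability-like score per atom (sigmoid of a linear readout)."""
    states = features
    for _ in range(rounds):
        states = message_passing_step(states, neighbors, w_self, w_nbr)
    scores = []
    for state in states:
        logit = sum(w * s for w, s in zip(readout, state)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores

# Toy heavy-atom graph of ethanol: C0 - C1 - O2.
# Assumed features per atom: [is_carbon, is_oxygen].
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = [[1], [0, 2], [1]]

# Illustrative, untrained readout weights (an assumption, not from the study).
scores = atom_reactivity(features, neighbors, readout=[0.4, -1.0], bias=-0.5)
for atom, score in enumerate(scores):
    print(atom, round(score, 3), "reactive" if score > 0.5 else "not reactive")
```

After two rounds, each atom's state reflects its neighborhood up to two bonds away, so the two carbon atoms receive different scores even though they start with identical features. In a trained model, the weights would be learned from labeled reactions rather than set by hand.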

Automating Reactive Site Identification

An essential part of the study was automatically identifying the reactive sites in molecules, a step that is crucial for regioselectivity prediction. The Glasgow Subgraph Solver was employed to automate this task, which significantly enhanced the model’s utility and efficiency. The MPNN model and code have been made available on GitHub, making this study the first to disclose predictive LSF models trained on such a large-scale dataset in drug-like chemical space.
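One way to see how subgraph matching can locate a reactive site is the toy sketch below. It does not call the Glasgow Subgraph Solver and is not the study's pipeline: a small brute-force backtracking matcher stands in for the solver, molecules are reduced to heavy-atom graphs, and the sketch assumes that a reactive site is a substrate atom whose counterpart in the product is bonded to an atom that the substrate does not have. All function names are hypothetical.

```python
def adjacency(n_atoms, bonds):
    """Build an undirected adjacency map from a list of (atom, atom) bonds."""
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def find_embedding(sub_labels, sub_bonds, prod_labels, prod_bonds):
    """Backtracking search for a map substrate atom -> product atom that
    preserves element labels and every substrate bond. Returns None if absent."""
    sub_adj = adjacency(len(sub_labels), sub_bonds)
    prod_adj = adjacency(len(prod_labels), prod_bonds)
    mapping = {}

    def extend(p):
        if p == len(sub_labels):
            return True
        for t in range(len(prod_labels)):
            if t in mapping.values() or prod_labels[t] != sub_labels[p]:
                continue
            # Every already-mapped neighbor of p must be bonded to t in the product.
            if all(mapping[q] in prod_adj[t] for q in sub_adj[p] if q in mapping):
                mapping[p] = t
                if extend(p + 1):
                    return True
                del mapping[p]
        return False

    return dict(mapping) if extend(0) else None

def reactive_sites(sub_labels, sub_bonds, prod_labels, prod_bonds):
    """Substrate atoms whose product counterpart gained a bond to a new atom."""
    mapping = find_embedding(sub_labels, sub_bonds, prod_labels, prod_bonds)
    if mapping is None:
        return []
    prod_adj = adjacency(len(prod_labels), prod_bonds)
    matched = set(mapping.values())
    return sorted(p for p, t in mapping.items() if prod_adj[t] - matched)

# Substrate: ethanol heavy atoms C0 - C1 - O2.
substrate = (["C", "C", "O"], [(0, 1), (1, 2)])
# Product: isopropanol with shuffled atom order; atom 2 is the central carbon.
product = (["C", "O", "C", "C"], [(2, 1), (2, 0), (2, 3)])

print(reactive_sites(*substrate, *product))
```

The product atoms are deliberately listed in a different order from the substrate, so the site is recovered from graph structure rather than from matching indices. For symmetric molecules, the first embedding found is one of several equivalent ones. Brute-force search also scales poorly, which is one reason a dedicated solver is used on drug-like molecules.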

Facilitating Drug Synthesis

The ultimate goal of the study was a rapid and accurate method that facilitates the synthesis of drug-like compounds and expands the chemical space available for exploration. Along the way, the study addressed challenges such as establishing reactive site identification and selecting appropriate loss functions for the ML model. Both are crucial for accurately predicting regioselective outcomes in chemical reactions, which makes the study a significant step toward better drug synthesis methodologies.
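The article does not say which loss function the study settled on. As an illustration of why the choice matters, the sketch below shows one common option for per-atom labels in which reactive atoms are rare: a class-weighted binary cross-entropy. The function name and the `pos_weight` value are assumptions made for this example only.

```python
import math

def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy where errors on positive (reactive) atoms
    are scaled by pos_weight. With pos_weight = 1 this is the standard loss."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Four atoms, only the last one is reactive; the model predicts 0.1 everywhere.
labels = [0, 0, 0, 1]
probs = [0.1, 0.1, 0.1, 0.1]

plain = weighted_bce(probs, labels, pos_weight=1.0)
weighted = weighted_bce(probs, labels, pos_weight=3.0)
print(plain < weighted)
```

With the unweighted loss, a model that labels every atom unreactive is penalized only mildly because most atoms really are unreactive. Raising `pos_weight` makes the missed reactive atom cost more.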