Chemical space travel on your desktop - Towards molecular discovery

The benzene ring (C6H6), consisting of six carbon atoms and six hydrogen atoms, is one of the most familiar chemical structures, and not only to chemists. You may remember building it with plastic molecular model kits in chemistry class. Now guess: how many structures can be built from six carbon atoms and six hydrogen atoms? The answer is 217 if only the topological connectivity of the atoms is considered. However, advanced quantum chemical calculations have shown that the actual number is far larger than this topological count. The calculations are still in progress, and more than 6,000 structures have been found so far. The QM (quantum mechanics)-based Molecular Discovery project in Data Centric Chemistry has been launched to discover new molecules by building a highly accurate QM-based database of new global chemical reaction networks, which are explored using this advanced theoretical method. Its potential applications range from basic to applied science, including drug design, biochemistry, and materials science. The project's leading researcher, Associate Professor Hiroko Satoh (National Institute of Informatics), introduces the project below.

Big data for seeking new molecules

The number of known chemical compounds reported so far is around 90 million, including artificially synthesized compounds, and it continues to grow by roughly 100,000 to 1 million every year. However, as we search for molecules using an advanced theoretical method based on quantum mechanics (QM), we have realized that the number of molecules that can theoretically exist is far larger. For example, even for the composition of benzene (C6H6), which is rather small, QM-based calculations yield thousands of molecules. This means that QM-based enumeration has the potential to produce theoretically grounded data on a much larger scale: so-called QM-based big data. Until now, chemical databases have been built almost exclusively from experimental data, such as physical properties (e.g., boiling and melting points), molecular structure data measured by spectrometry and related techniques (e.g., mass spectrometry and X-ray crystallography), and chemical reaction data observed in bench chemistry. In this project we are building another type of database, namely a database of theoretical data explored through computational chemistry. Whereas conventional databases have been built by manual input, our database is designed to acquire its data automatically.

Quantum chemistry: essential to simulate chemical reactions

Chemical computations are usually based on either molecular mechanical (MM) or quantum mechanical (QM) methods. MM calculations use parameters describing several kinds of forces between atoms, which are determined from experimental measurements or quantum mechanical calculations. Because parameter-based calculations are fast, this approach is often used for structural analysis of large molecular systems, such as proteins and other biomolecules. The QM method calculates electronic structures based on quantum mechanical theory. Since chemical reactions usually proceed through bond breaking and formation, which involve the transfer of electrons, QM-based calculations are essential for simulating chemical reactions. However, the QM method requires far more computational resources than MM. In practice, QM and MM methods are sometimes combined to obtain good results for a large system within an acceptable computation time. A problem common to QM and MM calculations is how to explore the potential energy surface, and developing an automatic method for this exploration is one of the most important topics in computational chemistry. In 2004, Satoshi Maeda and Koichi Ohno, chemists at Tohoku University in Japan, reached an important milestone on this problem by developing a new method for the automatic exploration of chemical reaction pathways on a potential energy surface, called the Global Reaction Route Mapping (GRRM) method. Whereas earlier methods could handle at most four atoms, the GRRM method places no restriction on the number of atoms. In the Data Centric Chemistry project, we collaborate with the GRRM developers and aim to build a database of chemical reaction route maps using the GRRM method. We will also develop tools for discovering new molecules and synthetic routes with the aid of chemoinformatics and data mining technologies.
The database and tools, together with visualization software, will be opened to the public, especially for chemists and researchers in related fields who study chemical structures and reactions.
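To make the contrast between MM and QM described above more concrete, here is a minimal sketch in Python of one typical kind of MM term, a harmonic bond-stretching energy. The function name and the parameter values are assumptions chosen for illustration; they are not taken from any real force field or from the project.

```python
def harmonic_bond_energy(r: float, r0: float, k: float) -> float:
    """Harmonic bond-stretching energy for a bond of length r.

    r0 is the equilibrium bond length and k the force constant.
    Both are fitted parameters in an MM force field.
    """
    return 0.5 * k * (r - r0) ** 2


# Illustrative parameters only (assumed, not from a real force field).
R0 = 1.09   # equilibrium bond length in angstroms
K = 700.0   # force constant in kcal/(mol * angstrom^2)

if __name__ == "__main__":
    for r in (1.09, 1.19, 2.00, 5.00):
        energy = harmonic_bond_energy(r, R0, K)
        print(f"r = {r:.2f} A  ->  E = {energy:10.2f} kcal/mol")
```

Evaluating such a term takes only a few arithmetic operations, which is why MM is fast. The energy, however, keeps rising as the bond is stretched, so a term of this form cannot describe a bond actually breaking. This is one way to see why QM-based calculations are needed to simulate reactions.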

Trail maps for making a guide to the galaxy of chemical reactions

We call the GRRM method a potential method, in contrast to a topological method, which counts molecular structures by considering only the connectivity between atoms, like a graph whose dots and lines correspond to atoms and bonds, respectively. Molecules located in a valley of the potential energy surface are stable and are called equilibrium structures. When a stable molecule changes its structure through a chemical reaction, it has to be activated to get over an energy barrier. For example, when bond breaking and formation take place in molecule A to form molecule B, activation energy is needed to cross the barrier. Reactions that need a higher activation energy are considered less likely to occur. Although the potential energy surface is often drawn in two or three dimensions, it is in fact multidimensional, with chemical structures spread widely across this multidimensional space. The GRRM method automatically explores routes along the potential energy surface, like walking along hiking trails in the multidimensional chemical space, and draws a global route map of chemical reactions. Furthermore, we plan to characterize the explored global reaction route maps so that we can develop detailed trail maps, a guide to the galaxy of chemical reactions for so-called chemical space travel. In addition to this development using chemoinformatics techniques, we will use data mining and informatics technologies to discover and design new molecules and reactions. Speeding up the exploration with GRRM is also one of the important goals of our project.
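The ideas of valleys, barriers, and activation energy can be illustrated with a toy calculation. The Python sketch below scans a hypothetical one-dimensional double-well potential, finds the two valleys (equilibrium structures A and B) and the barrier top between them, and computes the activation energy in each direction. The potential, its units, and all names are assumptions made for illustration. A simple grid scan like this is not how GRRM works, and it would not scale to the multidimensional surfaces of real molecules.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (coordinate, energy)


def model_potential(x: float) -> float:
    """Hypothetical tilted double-well potential in arbitrary units."""
    return (x * x - 1.0) ** 2 + 0.2 * x


def stationary_points(
    v: Callable[[float], float], xs: List[float]
) -> Tuple[List[Point], List[Point]]:
    """Return grid points that are local minima and local maxima of v."""
    minima: List[Point] = []
    maxima: List[Point] = []
    for i in range(1, len(xs) - 1):
        left, mid, right = v(xs[i - 1]), v(xs[i]), v(xs[i + 1])
        if mid < left and mid < right:
            minima.append((xs[i], mid))   # valley: equilibrium structure
        elif mid > left and mid > right:
            maxima.append((xs[i], mid))   # barrier top: transition state
    return minima, maxima


# Scan the reaction coordinate from -2.00 to 2.00 in steps of 0.01.
GRID = [(i - 200) / 100 for i in range(401)]
MINIMA, MAXIMA = stationary_points(model_potential, GRID)

(X_A, E_A), (X_B, E_B) = MINIMA   # this model has exactly two valleys
X_TS, E_TS = MAXIMA[0]            # and one barrier between them

BARRIER_A_TO_B = E_TS - E_A       # activation energy for A -> B
BARRIER_B_TO_A = E_TS - E_B       # activation energy for B -> A

if __name__ == "__main__":
    print(f"A at x = {X_A:.2f}, B at x = {X_B:.2f}, barrier top at x = {X_TS:.2f}")
    print(f"activation energy A -> B: {BARRIER_A_TO_B:.3f}")
    print(f"activation energy B -> A: {BARRIER_B_TO_A:.3f}")
```

Because A lies in the deeper valley in this model, the barrier seen from A is higher than the barrier seen from B, so the step from A to B is the less likely of the two.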

RMapViewer, a chemical reaction route map analysis tool

In July 2014, we released RMapViewer to the public on the web as the first result of this project. It is software for visualizing and analyzing global chemical reaction route maps. For example, when one chooses CH2O2, a network-style map appears on the screen, corresponding to the global map explored with GRRM for that chemical composition. The network-style map clearly shows the details of the global map, including the trails between molecules, the stability of the molecules, and the height of the activation energies between them. When the molecules with the lowest and second-lowest energies are displayed as molecular models, one can see that they are two different rotational isomers of formic acid. If one chooses one of the two molecules as the reactant and the other as the product, the software searches for all possible pathways between them and sorts the pathways by how likely they are to occur from an energy point of view. By choosing a reaction movie, one can watch an animated molecular model showing how the molecule changes along the reaction between the two. The GRRM method has so far been applied to several kinds of chemical research, such as retrosynthetic design, analysis of reaction intermediates and cluster structures, and mechanistic studies of catalytic reactions. We hope RMapViewer will help chemists carry out and advance their science efficiently. The global reaction route maps suggest a variety of molecular skeletons, including unknown but theoretically possible structures. We therefore expect the outcomes of the project to become useful resources for a wide variety of research, such as finding motifs for new drug candidates or designing new materials with desired properties, also with reference to reported experimental data.
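The pathway search described above can be pictured as a graph problem: molecules are nodes, and each direct trail between two molecules is an edge labeled with the energy of its barrier top. The Python sketch below enumerates all pathways that visit no molecule twice and ranks them by the highest barrier top along the way. The article does not state the criterion RMapViewer actually uses, so this ranking rule is an assumption, and the map, the labels EQ0 to EQ3, and the energies are invented for illustration.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical toy map: energy of the barrier top on each direct trail
# (invented values, e.g. in kJ/mol relative to the most stable molecule).
TS_ENERGY: Dict[Tuple[str, str], float] = {
    ("EQ0", "EQ1"): 50.0,
    ("EQ0", "EQ2"): 300.0,
    ("EQ2", "EQ1"): 280.0,
    ("EQ0", "EQ3"): 350.0,
    ("EQ3", "EQ1"): 330.0,
}


def build_adjacency(ts: Dict[Tuple[str, str], float]) -> Dict[str, List[str]]:
    """Build an undirected neighbor list from the trail table."""
    adjacency: Dict[str, List[str]] = {}
    for a, b in ts:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def barrier_top(a: str, b: str, ts: Dict[Tuple[str, str], float]) -> float:
    """Look up the barrier-top energy of a trail in either direction."""
    return ts[(a, b)] if (a, b) in ts else ts[(b, a)]


def simple_paths(
    adjacency: Dict[str, List[str]],
    start: str,
    goal: str,
    visited: Optional[List[str]] = None,
) -> Iterator[List[str]]:
    """Depth-first enumeration of paths that visit no molecule twice."""
    path = (visited or []) + [start]
    if start == goal:
        yield path
        return
    for neighbor in adjacency.get(start, []):
        if neighbor not in path:
            yield from simple_paths(adjacency, neighbor, goal, path)


def rank_pathways(
    ts: Dict[Tuple[str, str], float], reactant: str, product: str
) -> List[Tuple[float, List[str]]]:
    """Rank pathways by their highest barrier top, then by length."""
    if reactant == product:
        raise ValueError("reactant and product must differ")
    adjacency = build_adjacency(ts)
    scored = []
    for path in simple_paths(adjacency, reactant, product):
        highest = max(barrier_top(a, b, ts) for a, b in zip(path, path[1:]))
        scored.append((highest, path))
    return sorted(scored, key=lambda item: (item[0], len(item[1])))


if __name__ == "__main__":
    for highest, path in rank_pathways(TS_ENERGY, "EQ0", "EQ1"):
        print(f"{' -> '.join(path)}  (highest barrier top: {highest})")
```

In this toy map the direct trail from EQ0 to EQ1 ranks first because its barrier top is the lowest. Real global maps contain many more molecules, and exhaustive enumeration of this kind grows quickly with map size.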

RMapViewer is available for download. Although the current version implements only limited functions, it offers hands-on experience with chemical reaction route maps. We will keep releasing various new items, such as a program to convert a GRRM output file into an RMapViewer input file and new analysis and visualization functions. We plan to make the software open source by next spring. The movies from a reactant to a product give a lot of important information about a reaction, and using them together with a textbook should also be effective in chemical education. In the future, we will prepare a package of reaction movies extracted from the global reaction maps, which can be downloaded by anyone interested in chemical reactions.

(Text in Japanese: Hiroko Satoh, Rue Ikeya. Photographs: Mitsuru Mizutani. Published: September 10, 2014)