Analysis of the Results of Metadynamics Simulations by metadynminer and metadynminer3d

Molecular simulations solve the equation of motion of molecular systems, making the 3D shapes of molecules four-dimensional by adding the time coordinate. These methods have great potential in drug discovery because they can realistically model the structures of protein molecules targeted by drugs, as well as the process of binding of a potential drug to its molecular target. However, routine application of biomolecular simulations is hampered by the very high computational costs of this method. Several methods have been developed to address this problem. One of them, metadynamics, disfavors states of the simulated system that have been already visited and thus forces the system to explore new states. Here we present the package metadynminer and metadynminer3d to analyze and visualize results from metadynamics, in particular those produced by a popular metadynamics package Plumed.

Dalibor Trapl (Department of Biochemistry and Microbiology, University of Chemistry and Technology, Prague) , Vojtech Spiwok (Department of Biochemistry and Microbiology, University of Chemistry and Technology, Prague)

1 Introduction

Molecular simulations and their pioneers Martin Karplus, Michael Levitt, and Arieh Warshel have been awarded the Nobel Prize in 2013 (Karplus 2013). Their methods, in particular the method of molecular dynamics simulation, computationally simulate the motions of atoms in a molecular system. A simulation starts from a molecular system defined by positions (Cartesian coordinates) of the individual atoms. The heart of the method is in a calculation of forces acting on individual atoms and their numerical integration in the spirit of Newtonian dynamics, i.e., the conversion of a force vector to an acceleration vector, then velocity vector and, finally, to a new position of an atom. By repeating these steps, it is possible to reconstruct a record of atomic motions known as a trajectory.

Molecular simulations have great potential in drug discovery. A molecule of drug influences (enhances or blocks) the function of some biomolecule in the patient’s body, typically a receptor, enzyme or other protein. These molecules are called drug targets. The process of design for a new drug can be significantly accelerated with knowledge of the 3D structure (Cartesian coordinates of atoms) of the target. With such knowledge, it is possible to find a “druggable” cavity in the target and a molecule that fits and favorably binds to this cavity to influence its function. Strong binding implies that the drug influences the target even in low doses, hence does not cause side effects by interacting with unwanted targets.

Experimental determination of the 3D structures of proteins and other biomolecules is a very expensive and laborious process. Molecular simulations can, at least in principle, replace such expensive and laborious experiments by computing. In principle, a molecular simulation starting from virtually any 3D shape of a molecule would end up in energetically the most favorable shape. This is analogous with water flowing from mountains to valleys and not in the opposite way.

Unfortunately, this approach is extremely computationally expensive. The integration step of a simulation must be small enough to comprise the fastest motions in the molecular system. In practical simulations, it is necessary to use femtosecond integration steps. This means that it is necessary to carry out thousands of steps to simulate picoseconds, millions of steps to simulate nanoseconds, and so forth. In each step, it is necessary to evaluate a substantial number of interactions between atoms. As a result, it is possible to routinely simulate nano- to microseconds. Longer simulations require special high-performance computing resources.

Protein folding, i.e., the transition from a quasi-random to the biologically relevant 3D structure, takes place in microseconds for very small proteins and in much longer time scales for pharmaceutically interesting proteins. For this reason, prediction of a 3D structure by molecular simulations is limited to few small and fast folding proteins. For large proteins, it is currently impossible or at least far from being routine.

Several methods have been developed to address this problem. Metadynamics (Laio and Parrinello 2002) uses artificial forces to force the system to explore states that have not been previously explored in the simulation. At the beginning of the simulation, it is necessary to chose some parameters of the system referred to as collective variables. For example, numerically expressed compactness of the protein can be used as a collective variable to accelerate its folding from a noncompact to a compact 3D structure. Metadynamics starts as a usual simulation. After a certain number of steps (typically 500), the values of the collective variables are calculated and from this moment this state becomes slightly energetically disfavored due to the addition of an artificial bias potential in the shape of a Gaussian hill. After another 500 steps, another hill is added to the bias potential and so forth. These Gaussian hills accumulate until they “flood” some energy minimum and help the system to escape this minimum and explore various other states (Figure 1). In the analogy of water floating from mountains to valleys, metadynamics adds “sand” to fill valleys to make water flow from valleys back to mountains. This makes the simulation significantly more efficient compared to a conventional simulation because the “water” does not get stuck anywhere.

Using the application of metadynamics, it is possible to significantly accelerate the process of folding. Hopefully, by the end of metadynamics we can see folded, unfolded, and many other states of the protein. However, the interpretation of the trajectory is not straightforward. In standard molecular dynamics simulation (without metadynamics), the state which is the most populated is the most populated in reality. This is not true anymore with metadynamics.

Packages metadynminer and metadynminer3d use the results of metadynamics simulations to calculate the free energy surface of the molecular system. The most favored states (states most populated in reality) correspond to minima on the free energy surface. The state with the lowest free energy is the most populated state in the reality, i.e., the folded 3D structure of the protein.

As an example to illustrate metadynamics and our package, we use an ultrasimple molecule of “alanine dipeptide” (Figure 1). This molecule can be viewed as a “protein” with just one amino acid residue (real proteins have hundreds or thousands of amino acid residues). As a collective variable it is possible to use an angle \(\phi\) defined by four atoms. Biasing of this collective variable accelerates a slow rotation around the corresponding bond. Figure 1 shows the free energy surface of alanine dipeptide as the black thick line. It is not known before the simulation. The simulation starts from the state B. After 500 simulation steps, the hill is added (the hill is depicted as the red line, the flooding potential (“sand”) at the top, the free energy surface with added flooding potential at the bottom). The sum of 1, 10, 100, 200, 500, and 700 hills are depicted as red to blue lines.

At the end of simulation the free energy surface is relatively well flattened (blue line in Fig. 1 bottom). Therefore, the free energy surface can be estimated as a negative imprint of added “sand”:

\[\begin{equation} G(s) = -kT \log(P(s)) = -V(s) = \sum_i w_i \exp(-(s-S_i)^2/{2 \sigma^2}), \tag{1} \end{equation}\]

where \(G\), \(V\), and \(P\) are free energy, metadynamics bias (flooding) potential, and probability, respectively, of a state with a collective variable \(s\), \(k\) is Boltzmann constant, \(T\) is temperature in Kelvins, \(w_i\) is height, \(S_i\) is position and \(\sigma_i\) is width of each hill. The equation can be easily generalized for two or more collective variables.