GROMACS Tutorial

Step Two: Prepare the Ligand Topology

We must now deal with the ligand. But how does one come up with parameters for some species that the force field does not automatically recognize? Proper treatment of ligands is one of the most challenging tasks in molecular simulation. Force field authors spend years of their lives deriving self-consistent force fields, and it is no small task to introduce a new species into this framework. Force field parameters for any new species must be derived and validated in a manner that is consistent with the original force field.

For the OPLS, AMBER, and CHARMM force fields, this derivation often takes the form of various quantum mechanical calculations. The primary literature for these force fields describes the required procedure. For GROMOS force fields, parameterization methodology is less clear, relying on empirical fitting of condensed-phase behavior. That is, some initial charges and Lennard-Jones parameters are calculated for each atom type, evaluated for their accuracy, and refined. While the end result is very satisfactory, i.e. fluids resemble their real-world counterparts, the derivation process can be laborious and frustrating.

For this reason, automated tools are greatly preferred. For each force field, there are methodologies or software programs that purport to give parameters compatible with various force fields. Not all of them are equally accurate. For a few examples, refer to the following table:


AMBER Antechamber Parametrizes molecules using GAFF
acpype A Python interface to Antechamber, writes GROMACS topologies

CHARMM CGenFF The official CHARMM General Force Field server

GROMOS87/GROMOS96 PRODRG 2.5 An automated server for topology generation
ATB A newer server for topology generation, uses GROMOS96 54A7

OPLS-AA LigParGen A server from the Jorgensen group to produce OPLS topologies

Add Hydrogen Atoms to JZ4

In this tutorial, we will be generating the JZ4 topology with the CGenFF server. Click the link in the table above to visit the site. Register for a (free) account and activate it. CGenFF requires a Sybyl .mol2 file as its input, to collect rudimentary atom type information and bonded connectivity. CHARMM is also an all-atom force field, meaning all H atoms are explicitly represented. Crystal structures do not typically assign H coordinates, so they have to be built in. To produce a .mol2 file and add H atoms, use the Avogadro program. Open jz4.pdb in Avogadro, and from the "Build" menu, choose "Add Hydrogens." Avogadro will build all of the H atoms onto the JZ4 ligand. Save a .mol2 file (File -> Save As... and choose Sybyl Mol2 from the drop-down menu) named "jz4.mol2."

Several corrections must be made to jz4.mol2 before it can be used. Open jz4.mol2 in your favorite plain-text editor and you will find:

@<TRIPOS>MOLECULE
*****
 22 22 0 0 0
SMALL
GASTEIGER

@<TRIPOS>ATOM
      1 C4         24.2940  -24.1240   -0.0710 C.3   167  JZ4167     -0.0650
      2 C7         21.5530  -27.2140   -4.1120 C.ar  167  JZ4167     -0.0613
      3 C8         22.0680  -26.7470   -5.3310 C.ar  167  JZ4167     -0.0583
      4 C9         22.6710  -25.5120   -5.4480 C.ar  167  JZ4167     -0.0199
      5 C10        22.7690  -24.7300   -4.2950 C.ar  167  JZ4167      0.1200
      6 C11        21.6930  -26.4590   -2.9540 C.ar  167  JZ4167     -0.0551
      7 C12        22.2940  -25.1870   -3.0750 C.ar  167  JZ4167     -0.0060
      8 C13        22.4630  -24.4140   -1.8080 C.3   167  JZ4167     -0.0245
      9 C14        23.9250  -24.7040   -1.3940 C.3   167  JZ4167     -0.0518
     10 OAB        23.4120  -23.5360   -4.3420 O.3   167  JZ4167     -0.5065
     11 H          25.3133  -24.3619    0.1509 H       1  UNL1        0.0230
     12 H          23.6591  -24.5327    0.6872 H       1  UNL1        0.0230
     13 H          24.1744  -23.0611   -0.1016 H       1  UNL1        0.0230
     14 H          21.0673  -28.1238   -4.0754 H       1  UNL1        0.0618
     15 H          21.9931  -27.3472   -6.1672 H       1  UNL1        0.0619
     16 H          23.0361  -25.1783   -6.3537 H       1  UNL1        0.0654
     17 H          21.3701  -26.8143   -2.0405 H       1  UNL1        0.0621
     18 H          21.7794  -24.7551   -1.0588 H       1  UNL1        0.0314
     19 H          22.2659  -23.3694   -1.9301 H       1  UNL1        0.0314
     20 H          24.5755  -24.2929   -2.1375 H       1  UNL1        0.0266
     21 H          24.0241  -25.7662   -1.3110 H       1  UNL1        0.0266
     22 H          23.7394  -23.2120   -5.1580 H       1  UNL1        0.2921
@<TRIPOS>BOND
     1     4     3   ar
     2     4     5   ar
     3     3     2   ar
     4    10     5    1
     5     5     7   ar
     6     2     6   ar
     7     7     6   ar
     8     7     8    1
     9     8     9    1
    10     9     1    1
    11     1    11    1
    12     1    12    1
    13     1    13    1
    14     2    14    1
    15     3    15    1
    16     4    16    1
    17     6    17    1
    18     8    18    1
    19     8    19    1
    20     9    20    1
    21     9    21    1
    22    10    22    1

The first change that needs to be made is in the MOLECULE heading. Replace "*****" with "JZ4," e.g.:

@<TRIPOS>MOLECULE
JZ4

Next, fix the residue names and numbers such that they are all the same. This .mol2 file only contains one molecule, therefore there should only be one residue name and number specified. After editing, the corrected ATOM section of jz4.mol2 should read:

@<TRIPOS>ATOM
      1 C4         24.2940  -24.1240   -0.0710 C.3     1  JZ4        -0.0650
      2 C7         21.5530  -27.2140   -4.1120 C.ar    1  JZ4        -0.0613
      3 C8         22.0680  -26.7470   -5.3310 C.ar    1  JZ4        -0.0583
      4 C9         22.6710  -25.5120   -5.4480 C.ar    1  JZ4        -0.0199
      5 C10        22.7690  -24.7300   -4.2950 C.ar    1  JZ4         0.1200
      6 C11        21.6930  -26.4590   -2.9540 C.ar    1  JZ4        -0.0551
      7 C12        22.2940  -25.1870   -3.0750 C.ar    1  JZ4        -0.0060
      8 C13        22.4630  -24.4140   -1.8080 C.3     1  JZ4        -0.0245
      9 C14        23.9250  -24.7040   -1.3940 C.3     1  JZ4        -0.0518
     10 OAB        23.4120  -23.5360   -4.3420 O.3     1  JZ4        -0.5065
     11 H          25.3133  -24.3619    0.1509 H       1  JZ4         0.0230
     12 H          23.6591  -24.5327    0.6872 H       1  JZ4         0.0230
     13 H          24.1744  -23.0611   -0.1016 H       1  JZ4         0.0230
     14 H          21.0673  -28.1238   -4.0754 H       1  JZ4         0.0618
     15 H          21.9931  -27.3472   -6.1672 H       1  JZ4         0.0619
     16 H          23.0361  -25.1783   -6.3537 H       1  JZ4         0.0654
     17 H          21.3701  -26.8143   -2.0405 H       1  JZ4         0.0621
     18 H          21.7794  -24.7551   -1.0588 H       1  JZ4         0.0314
     19 H          22.2659  -23.3694   -1.9301 H       1  JZ4         0.0314
     20 H          24.5755  -24.2929   -2.1375 H       1  JZ4         0.0266
     21 H          24.0241  -25.7662   -1.3110 H       1  JZ4         0.0266
     22 H          23.7394  -23.2120   -5.1580 H       1  JZ4         0.2921

Last, notice the strange bond order in the @<TRIPOS>BOND section. All programs seem to have their own method for generating this list, but not all are created equal. There will be issues in constructing a correct topology with matching coordinates if the bonds are not listed in ascending order. To fix this problem, download the sort_mol2_bonds.pl script I have written and execute it:

perl sort_mol2_bonds.pl jz4.mol2 jz4_fix.mol2

Use "jz4_fix.mol2" in the next step.

Generate the JZ4 Topology with CGenFF

The jz4_fix.mol2 file is now ready for use to produce a topology. Visit the CGenFF server, log into your account, and and click "Upload molecule" at the top of the page. Upload jz4_fix.mol2 and the CGenFF server will quickly return a topology in the form of a CHARMM "stream" file (extension .str). Save its contents from your web browser into a file called "jz4.str." You can also download a copy of this file here.

The CHARMM stream file contains all of the topology information - atom types, charges, and bonded connectivity. It also has sections for additional bonded parameters that were generated by analogy for any internal interactions not covered by the force field. CGenFF also provides penalty scores for each parameter, that is, an assessment of how reliable the assigned parameter is. Anything below 10 is considered acceptable for immediate use. Values from 10 - 50 imply that some validation of the topology is warranted, and any penalties larger than 50 generally require manual reparametrization. This penalty scoring is one of the most important features of the CGenFF server. Many other servers generate topologies and are "black boxes," which users are simply left to trust implicitly. Staking your entire research project on an automatic program without verification is very dangerous. A poor ligand topology can lead to significant wasted time and unreliable results. Always validate the topologies of newly parametrized species! At minimum, check the magnitudes of charges and the atom types assigned to the ligand against existing moieties in the force field.

Examine the contents of jz4.str and look at the penalties for the charges and the new dihedral parameters. All of them are very low, suggesting that this topology is of very good quality and can be used directly for our simulation.

The JZ4 topology in CHARMM format is all well and good, but it's not useful if we are trying to run our simulation in GROMACS. Download a suitable version of the cgenff_charmm2gmx.py script from this GitHub site. The script name will include the version, _py2 or _py3, but here for simplicity this information is omitted but you will need to use the actual name of the script that you downloaded for your Python version. Perform the conversion with:

python cgenff_charmm2gmx.py JZ4 jz4_fix.mol2 jz4.str charmm36-jul2022.ff

PLEASE NOTE: This script requires NetworkX package, and there are very specific versions that have been tested. Please see the above-linked GitHub site for version recommendations and the combinations of Python and NetworkX that have been tested. Deviations from these versions may lead to syntax issues that may cause the scripts to fail.

NOTE the following screen output at the end of the successful conversion:

============ DONE ============
Conversion complete.
The molecule topology has been written to jz4.itp
Additional parameters needed by the molecule are written to jz4.prm, which needs to be included in the system .top
============ DONE ============

This ligand introduces new bonded parameters that are not part of the existing force field, and these parameters are written to a file called "jz4.prm," which is in the format of a GROMACS .itp file. We will deal with this file shortly, but it is important to note its existence. The ligand topology is now written to "jz4.itp," which contains the ligand [ moleculetype ] definition.

Build the Complex

From pdb2gmx, we have a file called "3HTB_processed.gro" that contains the processed, force field-compliant structure of our protein. We also have "jz4_ini.pdb" from cgenff_charmm2gmx.py that has all of the necessary H atoms and matches the atom names in the JZ4 topology. Convert this .pdb file to .gro format with editconf:

gmx editconf -f jz4_ini.pdb -o jz4.gro

Copy 3HTB_processed.gro to a new file to be manipulated, for instance, call it "complex.gro," as the addition of JZ4 to the protein will generate our protein-ligand complex. Next, simply copy the coordinate section of jz4.gro and paste it into complex.gro, below the last line of the protein atoms, and before the box vectors, like so:

  163ASN      C 1691   0.621  -0.740  -0.126
  163ASN     O1 1692   0.624  -0.616  -0.140
  163ASN     O2 1693   0.683  -0.703  -0.011
   5.99500   5.19182   9.66100   0.00000   0.00000  -2.99750   0.00000   0.00000   0.00000

becomes (added text in bold green)...

  163ASN      C 1691   0.621  -0.740  -0.126
  163ASN     O1 1692   0.624  -0.616  -0.140
  163ASN     O2 1693   0.683  -0.703  -0.011

    1JZ4     C4    1   2.429  -2.412  -0.007
    1JZ4     C7    2   2.155  -2.721  -0.411
    1JZ4     C8    3   2.207  -2.675  -0.533
    1JZ4     C9    4   2.267  -2.551  -0.545
    1JZ4    C10    5   2.277  -2.473  -0.430
    1JZ4    C11    6   2.169  -2.646  -0.295
    1JZ4    C12    7   2.229  -2.519  -0.308
    1JZ4    C13    8   2.246  -2.441  -0.181
    1JZ4    C14    9   2.392  -2.470  -0.139
    1JZ4    OAB   10   2.341  -2.354  -0.434
    1JZ4     H1   11   2.531  -2.436   0.015
    1JZ4     H2   12   2.366  -2.453   0.069
    1JZ4     H3   13   2.417  -2.306  -0.010
    1JZ4     H4   14   2.107  -2.812  -0.407
    1JZ4     H5   15   2.199  -2.735  -0.617
    1JZ4     H6   16   2.304  -2.518  -0.635
    1JZ4     H7   17   2.137  -2.681  -0.204
    1JZ4     H8   18   2.178  -2.476  -0.106
    1JZ4     H9   19   2.227  -2.337  -0.193
    1JZ4    H10   20   2.458  -2.429  -0.214
    1JZ4    H11   21   2.402  -2.577  -0.131
    1JZ4    H12   22   2.374  -2.321  -0.516

   5.99500   5.19182   9.66100   0.00000   0.00000  -2.99750   0.00000   0.00000   0.00000

Since we have added 22 more atoms into the .gro file, increment the second line of complex.gro to reflect this change. There should be 2636 atoms in the coordinate file now.

Build the Topology

Including the parameters for the JZ4 ligand in the system topology is very easy. Just insert a line that says #include "jz4.itp" into topol.top after the position restraint file is included. The inclusion of position restraints indicates the end of the "Protein" moleculetype section.

; Include Position restraint file
#ifdef POSRES
#include "posre.itp"
#endif

; Include water topology
#include "./charmm36-jul2022.ff/tip3p.itp"

becomes...

; Include Position restraint file
#ifdef POSRES
#include "posre.itp"
#endif

; Include ligand topology
#include "jz4.itp"

; Include water topology
#include "./charmm36-jul2022.ff/tip3p.itp"

The ligand introduces new dihedral parameters, which were written to "jz4.prm" by the cgenff_charmm2gmx.py script. At the TOP of topol.top, insert an #include statement to add these parameters:

; Include forcefield parameters
#include "./charmm36-jul2022.ff/forcefield.itp"

; Include ligand parameters
#include "jz4.prm"

[ moleculetype ]
; Name            nrexcl
Protein_chain_A     3

The placement of this #include statement is critical - it must appear before any [ moleculetype ] entry because all parameters have to be defined before any molecules can be constructed. It must also appear AFTER the #include statement for the parent force field, because all atom types have to be known before bonded parameters can be introduced that make use of them.

The last adjustment to be made is in the [ molecules ] directive. To account for the fact that there is a new molecule in complex.gro, we have to add it here, like so:

[ molecules ]
; Compound        #mols
Protein_chain_A     1
JZ4                 1

The topology and coordinate file are now in agreement with respect to the contents of the system.

Back: pdb2gmx Next: Solvation

Site design and content copyright Justin Lemkul
Problems with the site? Send them to the Webmaster