GROMACS Tutorial

Step Two: Prepare the Ligand Topology

pdb2gmx solv ions minim equil md analysis

We must now deal with the ligand. But how does one come up with parameters for some species that the force field does not automatically recognize? Proper treatment of ligands is one of the most challenging tasks in molecular simulation. Force field authors spend years of their lives deriving self-consistent force fields, and it is no small task to introduce a new species into this framework. Force field parameters for any new species must be derived and validated in a manner that is consistent with the original force field.

For the OPLS, AMBER, and CHARMM force fields, this derivation often takes the form of various quantum mechanical calculations. The primary literature for these force fields describes the required procedure. For GROMOS force fields, parameterization methodology is less clear, relying on empirical fitting of condensed-phase behavior. That is, some initial charges and Lennard-Jones parameters are calculated for each atom type, evaluated for their accuracy, and refined. While the end result is very satisfactory, i.e. fluids resemble their real-world counterparts, the derivation process can be laborious and frustrating.

For this reason, automated tools are greatly preferred. For each force field, there are methodologies or software programs that purport to give parameters compatible with various force fields. Not all of them are equally accurate. For a few examples, refer to the following table:


AMBER Antechamber Parametrizes molecules using GAFF
acpype A Python interface to Antechamber, writes GROMACS topologies

CHARMM CGenFF The official CHARMM General Force Field server

GROMOS96 ATB A newer server for topology generation, uses GROMOS96 54A7

OPLS-AA LigParGen A server from the Jorgensen group to produce OPLS topologies

Add Hydrogen Atoms to JZ4

In this tutorial, we will be generating the JZ4 topology with the CGenFF server. Click the link in the table above to visit the site. Register for a (free) account and activate it. CGenFF requires a Sybyl .mol2 file as its input, to collect rudimentary atom type information and bonded connectivity. CHARMM is also an all-atom force field, meaning all H atoms are explicitly represented. Crystal structures do not typically assign H coordinates, so they have to be built in. To produce a .mol2 file and add H atoms, use the Avogadro program. Open jz4.pdb in Avogadro, and from the "Build" menu, choose "Add Hydrogens." Avogadro will build all of the H atoms onto the JZ4 ligand. Save a .mol2 file (File -> Save As... and choose Sybyl Mol2 from the drop-down menu) named "jz4.mol2."

Several corrections must be made to jz4.mol2 before it can be used. Open jz4.mol2 in your favorite plain-text editor and you will find:

@<TRIPOS>MOLECULE
*****
 22 22 0 0 0
SMALL
GASTEIGER

@<TRIPOS>ATOM
      1 C4         24.2940  -24.1240   -0.0710 C.3   167  JZ4167     -0.0650
      2 C7         21.5530  -27.2140   -4.1120 C.ar  167  JZ4167     -0.0613
      3 C8         22.0680  -26.7470   -5.3310 C.ar  167  JZ4167     -0.0583
      4 C9         22.6710  -25.5120   -5.4480 C.ar  167  JZ4167     -0.0199
      5 C10        22.7690  -24.7300   -4.2950 C.ar  167  JZ4167      0.1200
      6 C11        21.6930  -26.4590   -2.9540 C.ar  167  JZ4167     -0.0551
      7 C12        22.2940  -25.1870   -3.0750 C.ar  167  JZ4167     -0.0060
      8 C13        22.4630  -24.4140   -1.8080 C.3   167  JZ4167     -0.0245
      9 C14        23.9250  -24.7040   -1.3940 C.3   167  JZ4167     -0.0518
     10 OAB        23.4120  -23.5360   -4.3420 O.3   167  JZ4167     -0.5065
     11 H          25.3133  -24.3619    0.1509 H       1  UNL1        0.0230
     12 H          23.6591  -24.5327    0.6872 H       1  UNL1        0.0230
     13 H          24.1744  -23.0611   -0.1016 H       1  UNL1        0.0230
     14 H          21.0673  -28.1238   -4.0754 H       1  UNL1        0.0618
     15 H          21.9931  -27.3472   -6.1672 H       1  UNL1        0.0619
     16 H          23.0361  -25.1783   -6.3537 H       1  UNL1        0.0654
     17 H          21.3701  -26.8143   -2.0405 H       1  UNL1        0.0621
     18 H          21.7794  -24.7551   -1.0588 H       1  UNL1        0.0314
     19 H          22.2659  -23.3694   -1.9301 H       1  UNL1        0.0314
     20 H          24.5755  -24.2929   -2.1375 H       1  UNL1        0.0266
     21 H          24.0241  -25.7662   -1.3110 H       1  UNL1        0.0266
     22 H          23.7394  -23.2120   -5.1580 H       1  UNL1        0.2921
@<TRIPOS>BOND
     1     4     3   ar
     2     4     5   ar
     3     3     2   ar
     4    10     5    1
     5     5     7   ar
     6     2     6   ar
     7     7     6   ar
     8     7     8    1
     9     8     9    1
    10     9     1    1
    11     1    11    1
    12     1    12    1
    13     1    13    1
    14     2    14    1
    15     3    15    1
    16     4    16    1
    17     6    17    1
    18     8    18    1
    19     8    19    1
    20     9    20    1
    21     9    21    1
    22    10    22    1

The first change that needs to be made is in the MOLECULE heading. Replace "*****" with "JZ4," e.g.:

@<TRIPOS>MOLECULE
JZ4

Next, fix the residue names and numbers such that they are all the same. This .mol2 file only contains one molecule, therefore there should only be one residue name and number specified. After editing, the corrected ATOM section of jz4.mol2 should read:

@<TRIPOS>ATOM
      1 C4         24.2940  -24.1240   -0.0710 C.3     1  JZ4        -0.0650
      2 C7         21.5530  -27.2140   -4.1120 C.ar    1  JZ4        -0.0613
      3 C8         22.0680  -26.7470   -5.3310 C.ar    1  JZ4        -0.0583
      4 C9         22.6710  -25.5120   -5.4480 C.ar    1  JZ4        -0.0199
      5 C10        22.7690  -24.7300   -4.2950 C.ar    1  JZ4         0.1200
      6 C11        21.6930  -26.4590   -2.9540 C.ar    1  JZ4        -0.0551
      7 C12        22.2940  -25.1870   -3.0750 C.ar    1  JZ4        -0.0060
      8 C13        22.4630  -24.4140   -1.8080 C.3     1  JZ4        -0.0245
      9 C14        23.9250  -24.7040   -1.3940 C.3     1  JZ4        -0.0518
     10 OAB        23.4120  -23.5360   -4.3420 O.3     1  JZ4        -0.5065
     11 H          25.3133  -24.3619    0.1509 H       1  JZ4         0.0230
     12 H          23.6591  -24.5327    0.6872 H       1  JZ4         0.0230
     13 H          24.1744  -23.0611   -0.1016 H       1  JZ4         0.0230
     14 H          21.0673  -28.1238   -4.0754 H       1  JZ4         0.0618
     15 H          21.9931  -27.3472   -6.1672 H       1  JZ4         0.0619
     16 H          23.0361  -25.1783   -6.3537 H       1  JZ4         0.0654
     17 H          21.3701  -26.8143   -2.0405 H       1  JZ4         0.0621
     18 H          21.7794  -24.7551   -1.0588 H       1  JZ4         0.0314
     19 H          22.2659  -23.3694   -1.9301 H       1  JZ4         0.0314
     20 H          24.5755  -24.2929   -2.1375 H       1  JZ4         0.0266
     21 H          24.0241  -25.7662   -1.3110 H       1  JZ4         0.0266
     22 H          23.7394  -23.2120   -5.1580 H       1  JZ4         0.2921

Last, notice the strange bond order in the @<TRIPOS>BOND section. All programs seem to have their own method for generating this list, but not all are created equal. There will be issues in constructing a correct topology with matching coordinates if the bonds are not listed in ascending order. To fix this problem, download the sort_mol2_bonds.pl script I have written (change the .txt extension to .pl) and execute it:

perl sort_mol2_bonds.pl jz4.mol2 jz4_fix.mol2

Use "jz4_fix.mol2" in the next step.

Generate the JZ4 Topology with CGenFF

The jz4_fix.mol2 file is now ready for use to produce a topology. Visit the CGenFF server, log into your account, and and click "Upload molecule" at the top of the page. Upload jz4_fix.mol2 and the CGenFF server will quickly return a topology in the form of a CHARMM "stream" file (extension .str). Save its contents from your web browser into a file called "jz4.str."

The CHARMM stream file contains all of the topology information - atom types, charges, and bonded connectivity. It also has sections for additional bonded parameters that were generated by analogy for any internal interactions not covered by the force field. CGenFF also provides penalty scores for each parameter, that is, an assessment of how reliable the assigned parameter is. Anything below 10 is considered acceptable for immediate use. Values from 10 - 50 imply that some validation of the topology is warranted, and any penalties larger than 50 generally require manual reparametrization. This penalty scoring is one of the most important features of the CGenFF server. Many other servers generate topologies and are "black boxes," which users are simply left to trust implicitly. Staking your entire research project on an automatic program without verification is very dangerous. A poor ligand topology can lead to significant wasted time and unreliable results. Always validate the topologies of newly parametrized species! At minimum, check the magnitudes of charges and the atom types assigned to the ligand against existing moieties in the force field.

Examine the contents of jz4.str and look at the penalties for the charges and the new dihedral parameters. All of them are very low, suggesting that this topology is of very good quality and can be used directly for our simulation. Download the JZ4 topology in GROMACS format, which is provided by the CGenFF server (click "Convert to GROMACS" and then "Download GROMACS format" to obtain a ZIP archive with the coordinates and topology).

The files produced by the CGenFF server will require some modification to be used directly. The server will provide a topology called "jz4_fix_gmx.top," which is a GROMACS system topology, meaning it is intended for standalone use. We will need to adapt it for our purpose. First, copy it to a file called "jz4_gmx.itp":

cp jz4_fix_gmx.top jz4_gmx.itp

Make the following modifications to "jz4_gmx.itp":

  1. Remove the #include "charmm36.ff/forcefield.itp" line
  2. Replace the #include "charmm36.ff/jz4_ffbonded.itp" line with the actual content of that file
  3. Remove all lines from "; Include water topology" to the end of the file
  4. Rename the [moleculetype] from Other to JZ4
  5. Change #include "posre.itp" to #include "posre_jz4.itp"

A summary of the changes made in "jz4_gmx.itp" are shown below, with added/modified lines in blue and deleted lines shown in red.

; Topology generated using molcal v1.0
; For use with CGenFF force-field version 4.6
;
; Include forcefield parameters
#include "./charmm36.ff/forcefield.itp"
;
; Include ligand specific parameters
# include "./charmm36.ff/jz4_ffbonded.itp"
[ dihedraltypes ]
;      i        j        k        l  func         phi0         kphi  mult
  CG2R61    CG321    CG321    CG331     9     0.000000     0.167360     3
;
[ moleculetype ]
; Name            nrexcl
JZ4                 3
;
[ atoms ]
; Include Position restraint file
#ifdef POSRES
#include "posre_jz4.itp"       
#endif    

; Include water topology
#include "./charmm36.ff/tip3p.itp"
    
#ifdef POSRES_WATER
; Position restraint for each water oxygen
[ position_restraints ]
;  i funct       fcx        fcy        fcz
   1    1       1000       1000       1000
#endif   
    
; Include topology for ions
#include "./charmm36.ff/ions.itp"
    
[ system ]
; Name    
    Ligand
    
[ molecules ]   
; Compound        #mols
Other           1

Build the Complex

From pdb2gmx, we have a file called "3HTB_processed.gro" that contains the processed, force field-compliant structure of our protein. We also have "jz4_fix_gmx.pdb" from the CGenFF server that has all of the necessary H atoms and matches the atom names in the JZ4 topology. Convert this .pdb file to .gro format with editconf:

gmx editconf -f jz4_ini.pdb -o jz4.gro

Copy 3HTB_processed.gro to a new file to be manipulated, for instance, call it "complex.gro," as the addition of JZ4 to the protein will generate our protein-ligand complex. Next, simply copy the coordinate section of jz4.gro and paste it into complex.gro, below the last line of the protein atoms, and before the box vectors, like so:

  163ASN      C 1691   0.621  -0.740  -0.126
  163ASN     O1 1692   0.624  -0.616  -0.140
  163ASN     O2 1693   0.683  -0.703  -0.011
   5.99500   5.19182   9.66100   0.00000   0.00000  -2.99750   0.00000   0.00000   0.00000

becomes (added text in bold green)...

  163ASN      C 1691   0.621  -0.740  -0.126
  163ASN     O1 1692   0.624  -0.616  -0.140
  163ASN     O2 1693   0.683  -0.703  -0.011
    1jz4     C1    1   2.429  -2.412  -0.007
    1jz4     C2    2   2.155  -2.721  -0.411
    1jz4     C3    3   2.207  -2.675  -0.533
    1jz4     C4    4   2.267  -2.551  -0.545
    1jz4     C5    5   2.277  -2.473  -0.430
    1jz4     C6    6   2.169  -2.646  -0.295
    1jz4     C7    7   2.229  -2.519  -0.308
    1jz4     C8    8   2.246  -2.441  -0.181
    1jz4     C9    9   2.392  -2.470  -0.139
    1jz4    O10   10   2.341  -2.354  -0.434
    1jz4    H11   11   2.531  -2.436   0.015
    1jz4    H12   12   2.366  -2.453   0.069
    1jz4    H13   13   2.417  -2.306  -0.010
    1jz4    H14   14   2.107  -2.812  -0.407
    1jz4    H15   15   2.199  -2.735  -0.617
    1jz4    H16   16   2.304  -2.518  -0.635
    1jz4    H17   17   2.137  -2.681  -0.204
    1jz4    H18   18   2.178  -2.476  -0.106
    1jz4    H19   19   2.227  -2.337  -0.193
    1jz4    H20   20   2.458  -2.429  -0.214
    1jz4    H21   21   2.402  -2.577  -0.131
    1jz4    H22   22   2.374  -2.321  -0.516
   5.99500   5.19182   9.66100   0.00000   0.00000  -2.99750   0.00000   0.00000   0.00000

Since we have added 22 more atoms into the .gro file, increment the second line of complex.gro to reflect this change. There should be 2636 atoms in the coordinate file now.

Build the Topology

Including the parameters for the JZ4 ligand in the system topology is very easy. The ligand introduces new dihedral parameters, which means it is essential that we include the ligand topology at the TOP of topol.top, thus insert an #include statement to add these parameters:

; Include forcefield parameters
#include "./charmm36-jul2022.ff/forcefield.itp"

; Include ligand parameters
#include "jz4_gmx.itp"

[ moleculetype ]
; Name            nrexcl
Protein_chain_A     3

The placement of this #include statement is critical - it must appear before any [ moleculetype ] entry because all parameters have to be defined before any molecules can be constructed. It must also appear AFTER the #include statement for the parent force field, because all atom types have to be known before bonded parameters can be introduced that make use of them.

The last adjustment to be made is in the [ molecules ] directive. To account for the fact that there is a new molecule in complex.gro, we have to add it here, like so:

[ molecules ]
; Compound        #mols
Protein_chain_A     1
JZ4                 1

The topology and coordinate file are now in agreement with respect to the contents of the system.

Back: pdb2gmx Next: Solvation

Site design and content copyright Justin Lemkul
Problems with the site? Send them to the Webmaster