PubChem generates [1-3] a theoretical 3D description of each compound in the
PubChem Compound database that is not too large (<= 50 non-hydrogen atoms), is
not too flexible (<= 15 rotatable bonds), consists of only organic elements (H,
C, N, O, F, P, S, Cl, Br, and I), has only a single covalent unit (i.e., not a
salt or a mixture), and contains only atom types recognized by the MMFF94s force
field [4-5]. At the time of launch, this includes more than 17 million of the
19.5 million records (more than 87%) in the PubChem Compound database. Because
only the parent forms of salts are considered for 3D (e.g., acetic acid, not
sodium acetate), and nearly 0.5 million compounds have a parent with 3D
information, ~90% of PubChem Compound has 3D information.
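The eligibility criteria above can be read as a simple filter. The following is
a minimal sketch in Python, not PubChem's actual implementation. The record
type CompoundSummary and its field names are hypothetical, and the MMFF94s
atom-typing check is assumed to be computed elsewhere by a force-field toolkit
and passed in as a boolean.

```python
from dataclasses import dataclass

# Limits and element set as stated in the announcement text.
ORGANIC_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}
MAX_HEAVY_ATOMS = 50
MAX_ROTATABLE_BONDS = 15


@dataclass(frozen=True)
class CompoundSummary:
    """Hypothetical per-compound summary; field names are illustrative only."""
    heavy_atoms: int          # number of non-hydrogen atoms
    rotatable_bonds: int
    elements: frozenset       # element symbols present in the record
    covalent_units: int       # 1 for a single molecule, >1 for salts or mixtures
    mmff94s_typed: bool       # assumed result of an external MMFF94s typing step


def eligible_for_3d(c: CompoundSummary) -> bool:
    """Return True if the record satisfies every stated criterion."""
    return (
        c.heavy_atoms <= MAX_HEAVY_ATOMS
        and c.rotatable_bonds <= MAX_ROTATABLE_BONDS
        and c.elements <= ORGANIC_ELEMENTS   # subset test
        and c.covalent_units == 1
        and c.mmff94s_typed
    )


# Acetic acid (parent form) versus sodium acetate (salt, two covalent units).
acetic_acid = CompoundSummary(4, 0, frozenset({"C", "H", "O"}), 1, True)
sodium_acetate = CompoundSummary(5, 0, frozenset({"C", "H", "O", "Na"}), 2, False)

print(eligible_for_3d(acetic_acid))     # parent form passes
print(eligible_for_3d(sodium_acetate))  # salt fails on elements and unit count
```
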
Each provided theoretical 3D conformer is not at an energy minimum and may not
represent the lowest-energy form in vacuum, solvent, or a binding pocket.
Rather, the theoretical 3D description is a low-energy conformer selected from a
conformer model describing energetically accessible and (potentially)
biologically relevant conformations of a chemical structure. A conformer model
is a theoretical description of the conformational flexibility of a chemical
structure, consisting of multiple 3D representations or poses sampled using an
RMSD (root-mean-square distance) threshold. More details on aspects of the
methodology used may be found here:
ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound_3D/presentations/PubChem3D_pt1.pdf
ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound_3D/presentations/PubChem3D_pt2.pdf
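To illustrate what sampling with an RMSD threshold means, here is a minimal
sketch in Python. It is not the procedure used by PubChem or by Omega [1-3].
It assumes all conformers list the same atoms in the same order, it computes
the RMSD without first superimposing the structures (a real pipeline would
align them), and the threshold value is hypothetical.

```python
import math


def rmsd(a, b):
    """Root-mean-square distance between two conformers.

    Each conformer is a list of (x, y, z) tuples with identical atom
    ordering. No superposition is performed (simplifying assumption).
    """
    if len(a) != len(b) or not a:
        raise ValueError("conformers must be non-empty and the same length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(total / len(a))


def prune_by_rmsd(conformers, threshold):
    """Greedy selection: keep a conformer only if it lies at least
    `threshold` away from every conformer already kept."""
    kept = []
    for conf in conformers:
        if all(rmsd(conf, k) >= threshold for k in kept):
            kept.append(conf)
    return kept


# Three toy two-atom "conformers"; conf_b is nearly identical to conf_a.
conf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conf_b = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0)]
conf_c = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

model = prune_by_rmsd([conf_a, conf_b, conf_c], threshold=0.5)
print(len(model))  # conf_b is dropped as too close to conf_a
```

A larger threshold yields fewer, more diverse conformers; a smaller one yields a
denser description of the conformational space.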
While a conformer model consisting of up to 500 conformers per compound may
be created, only a single conformer per compound is being provided at this time.
There are over 1.7 billion conformers for the 17 million compound records with
3D information (more than 1.5 TB in size). Disseminating such a large amount of
information in bulk download format is not currently feasible. Additionally,
neighboring and clustering using multiple conformers per compound make
visualization and interpretation per compound difficult at best. Therefore, only
a single conformer per compound is being provided or utilized in the initial
PubChem3D release. Bulk downloads of the 3D conformers included in this release
are available on the PubChem FTP site:
ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound_3D/
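For a sense of scale, the figures quoted above can be turned into rough
per-compound and per-conformer averages. This is only back-of-the-envelope
arithmetic: the counts are the lower bounds given in the text ("over 1.7
billion", "over 1.5 TB"), and a decimal terabyte (10**12 bytes) is assumed.

```python
# Lower-bound figures taken from the announcement text.
total_conformers = 1_700_000_000
compounds_with_3d = 17_000_000
total_bytes = 1_500_000_000_000  # assumes 1 TB = 10**12 bytes

avg_conformers_per_compound = total_conformers / compounds_with_3d
avg_bytes_per_conformer = total_bytes / total_conformers

print(avg_conformers_per_compound)     # about 100 conformers per compound
print(round(avg_bytes_per_conformer))  # under 1 kB per conformer on average
```
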
There are four Entrez indexes available to query compound records based on 3D
information. These are “Volume3D”, “XStericQuadrupole3D”, “YStericQuadrupole3D”,
and “ZStericQuadrupole3D”. The index “Volume3D” provides the ability to query
the single 3D conformer per compound selected as a part of this release by
volume or volume range, e.g., “0:200[volume3d]” will return those compounds with
a 3D volume in the range 0-200 cubic Angstroms. The steric quadrupoles essentially
correspond to the extents of the compound, where X, Y, and Z correspond to the
length, width, and height. For example, to find very long, near-linear
compounds, one may give the PubChem Compound Entrez query “50:100[x3d] AND
0:1[y3d] AND 0:1[z3d]”.
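The range queries above follow the pattern "low:high[index]" and can be combined
with AND. The sketch below, in Python, only builds the query strings shown in
this announcement and wraps them in an NCBI E-utilities ESearch URL. The helper
names are hypothetical, the use of ESearch with db=pccompound is an assumption
about how one might submit the query programmatically, and no request is sent.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumed access route, not from the text).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def range_term(index, low, high):
    """Format one Entrez range term, e.g. 0:200[volume3d]."""
    return f"{low}:{high}[{index}]"


def all_of(*terms):
    """Join several terms so that every one of them must match."""
    return " AND ".join(terms)


def esearch_url(term, retmax=20):
    """Build (but do not fetch) an ESearch URL for PubChem Compound."""
    params = {"db": "pccompound", "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


# The two example queries from the announcement.
volume_query = range_term("volume3d", 0, 200)
linear_query = all_of(
    range_term("x3d", 50, 100),
    range_term("y3d", 0, 1),
    range_term("z3d", 0, 1),
)

print(volume_query)
print(linear_query)
print(esearch_url(linear_query))
```
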
The integrated 3D information may be viewed for each compound record by clicking
on the “3D” tab now available in the Compound Summary page. This displays an
image of the 3D structure of a compound conformer. There is a helper application
that may be downloaded and installed on your PC, Mac, or Linux computer.
Alternatively, there is a web-based viewer that animates (via a series of
images) the 3D structure of the molecule:
http://pubchem.ncbi.nlm.nih.gov/vw3d/vw3d.cgi
PubChem neighbors conformers by similarity [6], taking into account shape and
features. This is indicated by “Similar Conformers” for each compound record
with a 3D description. The neighboring relationships may be visualized in the
form of an overlay of a compound (known as the reference conformer) with its
similar conformer neighbor (known as the fit conformer). The shape-aligned
overlay of neighbored compounds may be downloaded and visualized using the
aforementioned web-based viewer or helper application.
Downloads of 3D information from the PubChem Download Facility, the Compound
Summary page, or the FTP site include new 3D properties: MMFF partial charges,
volume, steric quadrupole moments, steric octopole moments, and MMFF94 energy
(with coulombic terms removed). 3D conformer data are downloaded separately
from the traditional 2D information provided by PubChem. In the PubChem
Download Facility, there is now a 3D check box to indicate that 3D information
is desired. Similarly, downloads from the Compound Summary page offer a choice
between 2D and 3D information.
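If the 3D download is taken in SD file format, properties such as these would
typically appear as tagged data items after the structure block. The following
Python sketch reads such items from one record. It is a generic SD data-item
reader, not a PubChem tool. The tag names PUBCHEM_MMFF94_ENERGY and
PUBCHEM_SHAPE_VOLUME are assumptions about the download, and the values in the
sample are placeholders, not real data.

```python
def parse_sdf_data_items(record):
    """Collect '> <TAG>' data items from one SD record.

    Returns a dict mapping each tag to the list of its value lines.
    Lines before the first data header (the structure block) are ignored.
    """
    items = {}
    tag = None
    for line in record.splitlines():
        if line.startswith("$$$$"):          # end-of-record marker
            break
        if line.startswith(">"):             # data header line
            start, end = line.find("<"), line.rfind(">")
            tag = line[start + 1:end] if 0 <= start < end else None
            if tag is not None:
                items[tag] = []
        elif tag is not None:
            if line.strip():
                items[tag].append(line.strip())
            else:                            # blank line closes the item
                tag = None
    return items


# Placeholder record: structure block omitted, tag names assumed,
# numeric values invented purely for illustration.
SAMPLE = """\
> <PUBCHEM_MMFF94_ENERGY>
12.3

> <PUBCHEM_SHAPE_VOLUME>
45.6

$$$$
"""

props = parse_sdf_data_items(SAMPLE)
print(props)
```
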
For further assistance, please contact info@ncbi.nlm.nih.gov.
Thank you,
- the PubChem team
:-= Bibliography =-:
[1] Omega, version 2.1. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA:
2006.
[2] Omega, version 2.2. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA:
2007.
[3] Omega, version 2.3. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA:
2008.
[4] Halgren TA. Merck Molecular Force Field: I. Basis, Form, Scope,
Parameterization and Performance of MMFF94. J. Comp. Chem. 1996;17:490-519.
[5] Halgren TA. Merck Molecular Force Field: VI. MMFF94s Option for Energy
Minimization Studies. J. Comp. Chem. 1999;20:720-729.
[6] OEShape, version 1.7.0. OpenEye Scientific Software, Inc.; Santa Fe, NM,
USA: 2008.