PubChem3D Release Notes

March 2011

Release Notes February 2009

The PubChem3D project computes a theoretical 3-D description of PubChem Compound records. For more details, see our JChemInf thematic series:

PubChem generates [1-3] a theoretical 3-D description of each compound in the PubChem Compound database that is not too large (<= 50 non-hydrogen atoms), is not too flexible (<= 15 rotatable bonds), consists of only organic elements (H, C, N, O, F, Si, P, S, Cl, Br, and I), is a single covalent unit (i.e., not a salt or a mixture), and contains only atom types recognized by the MMFF94s force field [4-5].  Currently, this includes more than 28.5 million of the 32.0 million records (>89%) in the PubChem Compound database.  Given that only the parent forms of salts are considered for 3-D (e.g., acetic acid, not sodium acetate) and that more than 0.8 million compounds have a parent with 3-D information, 92% of PubChem Compound may be considered to have 3-D information.
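The eligibility criteria above can be expressed as a simple filter.  The following is a minimal sketch only, not the PubChem implementation: the CompoundSummary record and the eligible_for_3d function are hypothetical names, and the MMFF94s atom typing result is assumed to be supplied by an external toolkit.

```python
from dataclasses import dataclass

# Limits and element list as stated in these release notes.
MAX_HEAVY_ATOMS = 50
MAX_ROTATABLE_BONDS = 15
ALLOWED_ELEMENTS = frozenset(
    {"H", "C", "N", "O", "F", "Si", "P", "S", "Cl", "Br", "I"}
)


@dataclass
class CompoundSummary:
    """Hypothetical record; a cheminformatics toolkit would fill it in."""
    elements: list        # element symbol of every atom, hydrogens included
    rotatable_bonds: int
    covalent_units: int   # 1 for a single connected molecule
    mmff94s_typed: bool   # assumed output of an external MMFF94s atom typer


def eligible_for_3d(c: CompoundSummary) -> bool:
    """Return True when the compound meets every stated criterion."""
    heavy_atoms = sum(1 for e in c.elements if e != "H")
    return (
        heavy_atoms <= MAX_HEAVY_ATOMS
        and c.rotatable_bonds <= MAX_ROTATABLE_BONDS
        and set(c.elements) <= ALLOWED_ELEMENTS
        and c.covalent_units == 1
        and c.mmff94s_typed
    )


# Acetic acid (the parent) passes; sodium acetate (a salt) does not.
acetic_acid = CompoundSummary(
    ["C", "C", "O", "O", "H", "H", "H", "H"], 0, 1, True
)
sodium_acetate = CompoundSummary(
    ["C", "C", "O", "O", "H", "H", "H", "Na"], 0, 2, True
)
```

Here sodium acetate fails on two counts: sodium is not in the allowed element list, and the record is not a single covalent unit.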

Each provided theoretical 3-D conformer is not at an energy minimum and may not represent the lowest energy form in vacuum, solvent, or a binding pocket.  Rather, the theoretical 3-D description consists of low-energy conformers selected from a conformer model describing energetically accessible and (potentially) biologically relevant conformations of a chemical structure.  A conformer model is a description of the conformational flexibility of a chemical structure, consisting of multiple 3-D representations or poses sampled using an average atom pairwise RMSD (root mean squared distance) threshold.
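For readers unfamiliar with RMSD, the quantity can be sketched as follows.  This is an illustration under stated assumptions, not the procedure used to build the conformer models: both conformers are assumed to list the same atoms in the same order and to be already superimposed, and the rmsd function name is hypothetical.

```python
import math


def rmsd(conf_a, conf_b):
    """Root mean squared distance between matched atoms of two conformers.

    Each conformer is a list of (x, y, z) tuples in Angstroms.  No
    alignment is performed here; that is assumed to be done beforehand.
    """
    if len(conf_a) != len(conf_b) or not conf_a:
        raise ValueError("conformers must be non-empty and the same length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(conf_a, conf_b)
    )
    return math.sqrt(squared / len(conf_a))
```

Two conformers whose RMSD falls below the sampling threshold would be treated as the same pose under this simplified view.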

A conformer model may contain up to 500 conformers per compound; the average is currently ~110.  This count is much too large to handle routinely.  As such, a diverse conformer ordering is now available.  The diverse ordering is such that the first "N" conformers selected represent the overall diversity of the conformer model for a compound.  This allows one to select the degree of coverage that is computationally feasible while ensuring maximal coverage of the shape and feature diversity present for the compound.
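One common way to produce an ordering with this "first N are representative" property is greedy max-min (farthest-point) selection.  The sketch below illustrates that general technique only; these notes do not state which algorithm PubChem uses, and the diverse_order function, its distance-matrix input, and the choice of conformer 0 as the starting point are all assumptions.

```python
def diverse_order(dist, first=0):
    """Order items so that each prefix is spread out as widely as possible.

    dist is a symmetric matrix (list of lists) of pairwise dissimilarities,
    e.g. RMSD values between conformers.  Starting from `first`, the next
    item is always the one farthest from everything already chosen.
    """
    n = len(dist)
    if n == 0:
        return []
    order = [first]
    # min_dist[i]: distance from item i to its nearest already-chosen item.
    min_dist = list(dist[first])
    remaining = set(range(n)) - {first}
    while remaining:
        # Ties are broken in favour of the lowest index.
        nxt = max(remaining, key=lambda i: (min_dist[i], -i))
        order.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            min_dist[i] = min(min_dist[i], dist[nxt][i])
    return order


# Toy example: four "conformers" placed on a line at 0, 1, 5, and 11.
positions = [0, 1, 5, 11]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
toy_order = diverse_order(toy_dist)
```

In the toy example the two extremes are chosen first, then the midpoint, and the near-duplicate of the starting point comes last, so taking only the first two or three entries still spans the whole range.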

Available 3-D aware tools, including structure clustering, the download facility, the score matrix service, and the PubChem 3-D viewer, allow a range of diverse conformers to be used.  The PubChem FTP site provides either one or ten diverse conformers per compound:

An available "Similar Conformers" neighboring relationship uses multiple conformers.  At this time, only the three most diverse conformers are being used; however, additional diverse conformers per compound may be added.

There are multiple Entrez indexes available to query compound records based on 3-D information.  These include the compound level properties:

ConformerCount3D - Count of conformers per compound
ConformerModelRmsd3D - Minimum RMSD difference of conformers per compound
EffectiveRotorCount3D - Rotatable bond count considering ring flexibility
FeatureCount3D - Total count of features per compound [6-8]
FeatureAcceptorCount3D - Count of acceptor features per compound [6-8]
FeatureAnionCount3D - Count of anion features per compound [6-8]
FeatureCationCount3D - Count of cation features per compound [6-8]
FeatureDonorCount3D - Count of donor features per compound [6-8]
FeatureHydrophobeCount3D - Count of hydrophobe features per compound [6-8]
FeatureRingCount3D - Count of ring features per compound [6-8]

And the conformer level properties [6-8] (using the default conformer only):

Volume3D - Conformer analytic volume
XStericQuadrupole3D - Steric quadrupole roughly corresponding to length
YStericQuadrupole3D - Steric quadrupole roughly corresponding to width
ZStericQuadrupole3D - Steric quadrupole roughly corresponding to height

E.g., to find compounds with a default conformer volume in the range 0-200 Angstroms**3: "0:200[volume3d]"

E.g., to find very long, near-linear compounds: "50:100[x3d] AND 0:1[y3d] AND 0:1[z3d]" (equivalently, using the full index names: "50[XStericQuadrupole3D] : 100[XStericQuadrupole3D] AND 0[YStericQuadrupole3D] : 1[YStericQuadrupole3D] AND 0[ZStericQuadrupole3D] : 1[ZStericQuadrupole3D]")
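Range queries like the two examples above can also be assembled programmatically.  The following sketch only builds the query strings and a URL; it makes no network request.  The use of the NCBI E-utilities ESearch endpoint with the "pccompound" database is an assumption not covered by these notes, and the range_term and esearch_url helpers are hypothetical names.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not described in these notes).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def range_term(low, high, index):
    """Build an Entrez range term such as 0:200[Volume3D]."""
    return f"{low}:{high}[{index}]"


def esearch_url(term, retmax=20):
    """URL-encode the term and attach it to the assumed ESearch endpoint."""
    params = {"db": "pccompound", "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


# Default conformer volume between 0 and 200 cubic Angstroms.
volume_query = range_term(0, 200, "Volume3D")

# Very long, near-linear compounds.
linear_query = " AND ".join([
    range_term(50, 100, "XStericQuadrupole3D"),
    range_term(0, 1, "YStericQuadrupole3D"),
    range_term(0, 1, "ZStericQuadrupole3D"),
])

volume_url = esearch_url(volume_query)
```

Letting urlencode handle the brackets and colons avoids hand-written percent escapes, which are easy to get wrong.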

The integrated 3-D information may be viewed for each compound record by clicking on the "3D" tab available in the Compound Summary page.  This displays an image of the 3-D structure of a compound conformer.  If you click this 3-D image, you are prompted to view it in the PubChem3D helper application or the PubChem3D web-based viewer.

The helper application for your PC, Mac, or Linux computer can be found here:

The web-based viewer that animates (via a series of images) the 3-D structure of the molecule can be found here:

PubChem neighbors conformers by similarity [6-8], taking into account shape and features. This is indicated by Similar Conformers for each compound record with a 3-D description. The neighboring relationships may be visualized in the form of an overlay of a compound (known as the reference conformer) with its similar conformer neighbor (known as the fit conformer). The shape-aligned overlay of neighbored compounds may be downloaded and visualized using the aforementioned web-based viewer or helper application. It is also possible to download superposition information in bulk (CSV format or stored as a property of the reference conformer); however, one must also download the PubChem3D conformers corresponding to the superposition and apply the provided rotation matrix and translation vector (in that order) to the fit (second) conformer to yield the resulting superposition.
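Applying the rotation and then the translation to the fit conformer can be sketched as follows.  This is an illustration under assumptions: the rotation is taken to be a 3x3 matrix given as three rows and applied to column vectors (x' = R x + t), and the apply_superposition name is hypothetical.  The actual layout of the matrix and vector in the downloaded data should be checked before use.

```python
def apply_superposition(coords, rotation, translation):
    """Rotate each atom position, then translate it.

    coords      -- list of (x, y, z) positions of the fit conformer
    rotation    -- 3x3 matrix as a list of three rows (assumed layout)
    translation -- (tx, ty, tz) vector added after the rotation
    """
    moved = []
    for x, y, z in coords:
        rotated = [row[0] * x + row[1] * y + row[2] * z for row in rotation]
        moved.append(tuple(r + t for r, t in zip(rotated, translation)))
    return moved


# Toy check: a 90 degree rotation about the z axis, then a shift.
rot_z_90 = [
    [0, -1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
shift = (1, 2, 3)
fit_atoms = [(1, 0, 0)]
superposed = apply_superposition(fit_atoms, rot_z_90, shift)
```

The order matters: translating first and rotating second would generally place the fit conformer somewhere else, so the reference conformer is left untouched and only the fit conformer is moved.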

Download of the 3-D information from the PubChem Download Facility, Compound Summary page, 3-D web-based viewer, or FTP site includes the 3-D properties, including: MMFF partial charges; volume; steric monopole, quadrupole, and octopole moments; MMFF94 energy (with coulombic terms removed); shape fingerprint; self-overlap volumes used in ST (shape Tanimoto) and CT (color Tanimoto) similarity computation; conformer model RMSD; the conformer model diverse ordering; and pharmacophore features.

Download of 3-D conformer data is separate from the traditional 2-D information provided by PubChem. In the PubChem Download Facility, there is now a 3-D check box to indicate that 3-D information is desired. Similarly, download of information from the Compound Summary provides a choice between 2-D and 3-D information.

For further assistance, please contact

Thank you,
- the PubChem team

:-= Bibliography =-:

[1] OEOmega, version 2.2. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA: 2007.

[2] OEOmega, version 2.3. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA: 2008.

[3] OEOmega, version 2.4. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA: 2009.

[4] Halgren TA. Merck Molecular Force Field: I. Basis, Form, Scope, Parameterization and Performance of MMFF94. J. Comp. Chem. 1996;17:490-519.

[5] Halgren TA. Merck Molecular Force Field: VI. MMFF94s Option for Energy Minimization Studies.  J. Comp. Chem. 1999;20:720-729.

[6] OEShape, version 1.7.0. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA: 2008.

[7] OEShape, version 1.7.2. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA: 2009.

[8] OEShape, version 1.8.0. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA: 2010.

:-= History =-:
2011 Mar 06 - PubChem3D version 2.0 release. Major rewrite of release notes.
2010 Jan 05 - Added link to a third presentation. Added missing "Fair Use Disclaimer".

:-= Fair Use Disclaimer =-:

Databases of molecular data on the NCBI FTP site include such examples as nucleotide sequences (GenBank), protein sequences, macromolecular structures, molecular variation, gene expression, and mapping data. They are designed to provide and encourage access within the scientific community to sources of current and comprehensive information. Therefore, NCBI itself places no restrictions on the use or distribution of the data contained therein. However, some submitters of the original data may claim patent, copyright, or other intellectual property rights in all or a portion of the data they have submitted. NCBI is not in a position to assess the validity of such claims and, therefore, cannot provide comment or unrestricted permission concerning the use, copying, or distribution of the information contained in the molecular databases.