PubChem3D Release Notes

February 2009

Release Notes March 2011

PubChem generates [1-3] a theoretical 3D description of each compound in the PubChem Compound database that is not too large (<= 50 non-hydrogen atoms), is not too flexible (<= 15 rotatable bonds), consists of only organic elements (H, C, N, O, F, P, S, Cl, Br, and I), has only a single covalent unit (i.e., not a salt or a mixture), and contains only atom types recognized by the MMFF94s force field [4-5]. At the time of launch, this includes more than 17 million of the 19.5 million records (over 87%) in the PubChem Compound database. Because only the parent forms of salts are considered for 3D (e.g., acetic acid, not sodium acetate) and nearly 0.5 million additional compounds have a parent with 3D information, ~90% of PubChem Compound has 3D information.
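
The eligibility criteria above can be read as a simple filter. The following is a minimal sketch in Python, not PubChem's actual pipeline: the CompoundSummary type and its fields are hypothetical, and the descriptors (rotatable bond count, covalent unit count, MMFF94s typing) are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass
from typing import List

# Limits and element list as stated in the release notes.
ALLOWED_ELEMENTS = frozenset({"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"})
MAX_HEAVY_ATOMS = 50
MAX_ROTATABLE_BONDS = 15


@dataclass
class CompoundSummary:
    """Hypothetical precomputed descriptors for one compound record."""
    elements: List[str]     # element symbol of every atom, hydrogens included
    rotatable_bonds: int    # assumed to be computed by some other tool
    covalent_units: int     # 1 for a single connected molecule
    mmff94s_typed: bool     # True if every atom received an MMFF94s atom type


def is_3d_eligible(c: CompoundSummary) -> bool:
    """Apply the five criteria listed in the release notes."""
    heavy_atoms = sum(1 for e in c.elements if e != "H")
    return (
        heavy_atoms <= MAX_HEAVY_ATOMS
        and c.rotatable_bonds <= MAX_ROTATABLE_BONDS
        and set(c.elements) <= ALLOWED_ELEMENTS
        and c.covalent_units == 1
        and c.mmff94s_typed
    )


# Illustrative records; the descriptor values are assumptions for the example.
acetic_acid = CompoundSummary(["C", "C", "O", "O", "H", "H", "H", "H"], 0, 1, True)
sodium_acetate = CompoundSummary(["C", "C", "O", "O", "H", "H", "H", "Na"], 0, 2, True)
```

Under these assumptions, is_3d_eligible(acetic_acid) is true, while sodium_acetate is rejected both for containing Na and for having two covalent units.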

Each provided theoretical 3D conformer is not at an energy minimum and may not represent the lowest energetic form in vacuum, solvent, or a binding pocket. Rather, the theoretical 3D description is a low-energy conformer selected from a conformer model (a theoretical description of the conformational flexibility of a chemical structure, consisting of multiple 3D representations or poses sampled using an RMSD, or root-mean-square distance, threshold) describing energetically accessible and (potentially) biologically relevant conformations of a chemical structure. More details on aspects of the methodology used may be found here:

ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound_3D/presentations/PubChem3D_pt1.pdf
ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound_3D/presentations/PubChem3D_pt2.pdf
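
To illustrate what sampling with an RMSD threshold means, here is a generic sketch in Python. It is not the algorithm used by OMEGA or PubChem: it assumes the conformers are already superimposed and share the same atom ordering, and it uses a simple greedy rule chosen only for illustration.

```python
import math


def rmsd(a, b):
    """Root-mean-square distance between two equal-length lists of (x, y, z).

    Assumes both conformers are pre-aligned and list atoms in the same order.
    """
    if len(a) != len(b) or not a:
        raise ValueError("conformers must be non-empty and the same length")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(a, b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(a))


def sample_by_rmsd(conformers, threshold):
    """Greedy sampling: keep a conformer only if its RMSD to every
    previously kept conformer is at least `threshold`."""
    kept = []
    for conf in conformers:
        if all(rmsd(conf, k) >= threshold for k in kept):
            kept.append(conf)
    return kept
```

With a larger threshold, fewer conformers are kept and each one stands for a wider region of conformational space.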

While a conformer model consisting of up to 500 conformers per compound may be created, only a single conformer per compound is being provided at this time. There are over 1.7 billion conformers for the 17 million compound records with 3D information (over 1.5 TB in size). Disseminating such a large amount of information in bulk download format is not currently feasible. Additionally, neighboring and clustering with multiple conformers per compound make visualization and interpretation per compound difficult at best. Therefore, only a single conformer per compound is being provided or utilized in the initial PubChem3D release. Bulk downloads of the 3D conformers included in this release are available on the PubChem FTP site:

ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound_3D/
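
For a sense of scale, the conformer counts quoted above imply the rough averages worked out in the short Python sketch below. The inputs are the rounded "over" figures from the text, and 1 TB is assumed to mean 10^12 bytes, so the results are order-of-magnitude estimates only.

```python
# Rounded figures from the release notes (each is stated as "over" or "more than").
total_conformers = 1.7e9      # conformers across all conformer models
compounds_with_3d = 17e6      # compound records with 3D information
total_bytes = 1.5e12          # 1.5 TB, assuming 1 TB = 10**12 bytes

# Average size of a conformer model, against a stated maximum of 500.
avg_conformers_per_compound = total_conformers / compounds_with_3d

# Average storage cost of one conformer.
avg_bytes_per_conformer = total_bytes / total_conformers

print(f"about {avg_conformers_per_compound:.0f} conformers per compound")
print(f"about {avg_bytes_per_conformer:.0f} bytes per conformer")
```

That works out to roughly 100 conformers per compound on average, at somewhat under 1 kB each.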

There are four Entrez indexes available to query compound records based on 3D information. These are "Volume3D", "XStericQuadrupole3D", "YStericQuadrupole3D", and "ZStericQuadrupole3D". The index "Volume3D" provides the ability to query the single 3D conformer per compound selected as a part of this release by volume or volume range, e.g., "0:200[volume3d]" will return those compounds with a 3D volume in the range 0-200 cubic Angstroms. The steric quadrupoles essentially correspond to the extents of the compound, where X, Y, and Z correspond to the length, width, and height. For example, to find very long, near-linear compounds, one may give the PubChem Compound Entrez query "50:100[x3d] AND 0:1[y3d] AND 0:1[z3d]".
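
The range queries above can also be assembled programmatically. The sketch below, in Python, builds the query strings shown in the text and wraps them in an NCBI E-utilities ESearch URL. The helper names are hypothetical, and sending these queries through E-utilities with the "pccompound" database is an assumption, since these notes describe only the Entrez query syntax itself. No network request is made.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint; using it for these indexes is an assumption.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def range_term(low, high, index):
    """Format one Entrez range term, e.g. 0:200[volume3d]."""
    return f"{low}:{high}[{index}]"


def build_query(*terms):
    """Combine terms so that all of them must match."""
    return " AND ".join(terms)


def esearch_url(query, retmax=20):
    """Build (but do not fetch) an ESearch URL for PubChem Compound."""
    params = {"db": "pccompound", "term": query, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# The two example queries from the text.
small_volume = range_term(0, 200, "volume3d")
long_linear = build_query(
    range_term(50, 100, "x3d"),
    range_term(0, 1, "y3d"),
    range_term(0, 1, "z3d"),
)
```

Here long_linear reproduces the near-linear query given in the text, and esearch_url(long_linear) yields a URL that could be fetched with any HTTP client.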

The integrated 3D information may be viewed for each compound record by clicking on the "3D" tab now available in the Compound Summary page. This displays an image of the 3D structure of a compound conformer. There is a helper application that may be downloaded and installed on your PC, Mac, or Linux computer. Alternatively, there is a web-based viewer that animates (via a series of images) the 3D structure of the molecule:

http://pubchem.ncbi.nlm.nih.gov/vw3d/vw3d.cgi

PubChem neighbors conformers by similarity [6], taking into account shape and features. This is indicated by "Similar Conformers" for each compound record with a 3D description. The neighboring relationships may be visualized in the form of an overlay of a compound (known as the reference conformer) with its similar conformer neighbor (known as the fit conformer). The shape-aligned overlay of neighbored compounds may be downloaded and visualized using the aforementioned web-based viewer or helper application.

Download of the 3D information from the PubChem Download Facility, the Compound Summary page, or the FTP site includes new 3D properties, including MMFF partial charges, volume, steric quadrupole moments, steric octopole moments, and MMFF94 energy (with coulombic terms removed). All 3D conformer data downloads are kept separate from the traditional 2D information provided by PubChem. In the PubChem Download Facility, there is now a 3D check box to indicate that 3D information is desired. Similarly, download of information from the Compound Summary page provides a choice between 2D and 3D information.

For further assistance, please contact info@ncbi.nlm.nih.gov.

Thank you,
- the PubChem team

:-= Bibliography =-:
[1] Omega, version 2.1. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA: 2006.
[2] Omega, version 2.2. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA: 2007.
[3] Omega, version 2.3. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA: 2008.
[4] Halgren TA. Merck Molecular Force Field: I. Basis, Form, Scope, Parameterization and Performance of MMFF94. J. Comp. Chem. 1996;17:490-519.
[5] Halgren TA. Merck Molecular Force Field: VI. MMFF94s Option for Energy Minimization Studies. J. Comp. Chem. 1999;20:720-729.
[6] OEShape, version 1.7.0. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA: 2008.