Substance/Compound Summary Page Help
 
 
 
BRIEF TABLE OF CONTENTS
FOR THIS HELP DOCUMENT:
 
  Molecule overview  
  Substance vs. Compound summary page
(illustrated example)
 
 
  Categories of information displayed per molecule  
 
  Tool buttons (BioActivity Summary, Chemical Structure Search, PubChem3D viewer, and ASN.1, XML, and SDF downloads)  
 
  Links and related information  
  Properties  
  BioActivity data  
  Related compounds  
  Related substances  
  Other links  
  Chemical vendors  
  LinkOut  
 
  Data Processing  
  Depositor supplied data (PubChem Substance)  
  NCBI-generated data (PubChem Compound)  
  Reporting errors  
 
  References  
  Citing PubChem resources  
  Data usage and citation guidelines  
 
 
 

 
 

2D & 3D STRUCTURES
 
3D conformer and 2D structure of Ibuprofen (CID 3672).
 
 


 
Molecule overview

molecule name | unique identifier | folder tabs | table of contents | structure snapshots | properties

Name of molecule

  • Substance name

  • Compound name
    • The molecule name that appears at the top of a PubChem Compound page is generally the highest scoring term from the filtered list of depositor-supplied synonyms. A synonym that matches a MeSH term, however, is given priority. (The data processing section of this document provides more details about synonyms.)

Unique identifier (UID)

Each record within a PubChem database receives a unique identifier (UID), which is an integer that remains stable even if the content of the record is modified or enhanced over time. While "UID" is the generic term that applies to all PubChem (and other NCBI database) records, the UID is sometimes referred to by other names, depending upon the specific database being used. For example:
  • Substance identifier (SID)
    • A substance identifier (SID) is the permanent identifier for a depositor-supplied molecule. These are found in the PubChem Substance database. Each SID corresponds to a unique external registry ID provided by a PubChem data source. (The data processing section of this document provides more details about SIDs.)

  • Compound identifier (CID)
    • A compound identifier (CID) is the permanent identifier for a unique chemical structure. These are found in the PubChem Compound database. Each stereoisomer of a compound has its own CID. It is also possible for different tautomeric forms of the same compound to have different CIDs. (The data processing section of this document provides more details about CIDs.)

Note: Although identifiers are unique within a PubChem database, the same integer can be used as an identifier in two or more different databases. For example, "2244" is a valid identifier in both the PubChem Substance and PubChem Compound databases, where:
SID: 2244 is the PubChem Substance database record for cytidylate, and
CID: 2244 is the PubChem Compound database record for aspirin.
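
Because the same integer can identify unrelated records, any programmatic lookup needs to carry the database along with the number. The following is a minimal sketch that builds record URLs in the style of PubChem's PUG REST service. PUG REST is not described in this help document, so the base URL, the path layout, and the helper name `record_url` are assumptions to check against the current PubChem documentation.

```python
# Base of the PUG REST service (an assumption; not described in this help document).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Map each identifier type to the (domain, namespace) pair used in the URL path.
DOMAINS = {"SID": ("substance", "sid"), "CID": ("compound", "cid")}


def record_url(id_type: str, uid: int, output: str = "JSON") -> str:
    """Build a record URL for a SID or CID (hypothetical helper)."""
    domain, namespace = DOMAINS[id_type.upper()]
    return f"{PUG_REST}/{domain}/{namespace}/{uid}/{output}"


# The same integer points at two unrelated records.
print(record_url("SID", 2244))  # substance record (cytidylate in the example above)
print(record_url("CID", 2244))  # compound record (aspirin in the example above)
```

Keeping the identifier type as an explicit argument, rather than passing a bare integer, avoids confusing SID 2244 with CID 2244.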

Folder tabs are present on PubChem Substance summary pages:

Note: A PubChem summary page will not display the "Chemical Structure" and "Deposited Record" folder tabs if you retrieve a record directly from the PubChem Compound database. (Those folder tabs appear only when you are looking at a PubChem Substance record.) A PubChem Compound summary page will, however, contain a "Related Substances: Same Structure" link, which retrieves all of the corresponding depositor-supplied records from the PubChem Substance database. The substance records will contain both folder tabs, allowing you to return to the compound record simply by clicking the "Chemical Structure" folder tab.

Table of Contents for a Substance or Compound

  • The table of contents on a PubChem summary page lists the categories of information that are available for the particular substance or compound. A PubChem Substance summary page is based on the data submitted by an individual depositor. A PubChem Compound summary page, on the other hand, displays data organized by NCBI automated data processing, serving as a hub of information for each unique chemical structure. PubChem Compound summary pages therefore tend to list more topics in their table of contents.

    (As a convenience to users, some of the annotations present on a Compound summary page are also displayed on a corresponding Substance summary page. In such cases, the source of any information block added by NCBI data processing is noted in the lower right hand corner of the grey-bordered box that surrounds it. That allows data from external sources to be differentiated from data that was submitted by the depositor.)


  • The "Show subcontent titles" option lists the topics that are available in each information category.


  • The default display of a compound/substance summary page shows only the first topic in each information category. Use the "show more" link at the bottom of an information category to see all of the topics it contains. After an information category has been expanded to show all of its contents, a "show first sub-section only" link appears and allows you to contract the section back to its default display.


  • The "Expand all contents" button expands all sections of the page at once.


Structure Snapshots

    2D Structure
2D structure of Ibuprofen (CID 3672).

  • The chemical structure graphic on a PubChem Substance summary page displays the structure that was provided by the depositor.


  • The 2D structure folder tab on a PubChem Compound summary page (and in the "Chemical Structure" folder tab of a PubChem Substance summary page) displays the standardized chemical structure for the molecule, as generated by NCBI data processing.

 
    3D Conformer
3D conformer of Ibuprofen (CID 3672).


 
 

TIP:  Find compounds that are similar in 3D but not 2D structure

3D conformers of Zoliprofen (CID 68758) and Ibuprofen (CID 3672).

Each PubChem Compound record has links to other compounds that are similar in 2D structure ("similar compounds") or similar in 3D structure ("similar conformers").

It is also possible to find compounds that are similar in 3D structure but not in 2D structure to a given PubChem Compound, as is true for Ibuprofen (CID 3672) and Zoliprofen (CID 68758) illustrated here.

To find compounds that are similar in 3D but not 2D structure, follow these steps:

  1. Open a PubChem Compound record of interest. (For example, open CID 3672: ibuprofen.)

  2. Follow the link for "Related Compounds:Similar Compounds" to retrieve the 2D-similar compounds. (For example, view the list of ibuprofen's similar compounds.)
    Then use the browser's back button to go to the previous display (showing the original compound of interest: CID 3672: ibuprofen).

  3. Follow the link for "Related Compounds:Similar Conformers" to retrieve the 3D-similar compounds. (For example, view the list of ibuprofen's similar conformers.)

  4. Click on the "Advanced" link at the top of the PubChem Compound search results page. The "History" section will list your recent searches, for example:
    -----------------------------------------------------------------------------------------
           Add to
    Search Builder Query                                                 Items found Time
    -----------------------------------------------------------------------------------------
    #07    Add     Similar Conformers for PubChem Compound (Select 3672)     521     17:07:23
    #06    Add     Similar Compounds for PubChem Compound (Select 3672)     2245     16:52:02
    -----------------------------------------------------------------------------------------           
  5. Find the difference between the data sets retrieved by search #07 and #06. To do that, you can either:

    (a) Click on "Add" beside search #07, then do the same for search #06. That will add the searches to the query box at the top of the advanced search page using the default Boolean operator "AND." Highlight the "AND" in the query box and change it to "NOT", so the search now appears as: #07 NOT #06. Then press the "Search" button to view the list of records that are similar in 3D shape but not in 2D shape to your original compound.

    OR

    (b) Click on the "Edit" link beneath the query box on the Advanced search page and simply enter the following string:

    #07 NOT #06

    Then press the "Search" button to view the list of records that are similar in 3D shape but not in 2D shape to your original compound.

    Note that your search numbers might be different from those shown here, if you did earlier searches in the Entrez system before trying these examples. Use the search numbers that appear on your Advanced search page.
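
The "#07 NOT #06" query is a set difference: it keeps the records returned by the first search that do not appear in the second. A minimal sketch of that logic follows. Apart from CIDs 3672 and 68758, which are named above, the identifiers are invented for illustration and are not real search results.

```python
# Hypothetical CID lists standing in for the two saved searches.
similar_conformers = {3672, 68758, 1001, 1002}  # search #07 (3D-similar)
similar_compounds = {3672, 1001, 2001}          # search #06 (2D-similar)

# "#07 NOT #06" keeps records found by #07 that are absent from #06.
only_3d_similar = similar_conformers - similar_compounds

print(sorted(only_3d_similar))  # [1002, 68758]
```

The order of the operands matters: "#06 NOT #07" would instead return compounds that are similar in 2D but not in 3D.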

 

Properties (subset of computed properties)

  • A subset of a compound's chemical and physical properties is shown in the "Links and Related Data" column of a PubChem Compound summary page. These properties have been computed during NCBI data processing and are shown here as part of the molecule overview information. The complete set of properties is shown in the "chemical and physical properties" section of a compound summary page.


  • Properties are not computed for substances. However, each PubChem Substance summary page has a "chemical structure" folder tab, which displays the corresponding PubChem Compound and its computed properties.

Comparison of Substance and Compound Summary Page


 

As illustrated below, the PubChem Substance database is a repository of depositor-supplied data. Although PubChem does not curate submitted records, it does generate a corresponding PubChem Compound record for every unique chemical structure submitted to the Substance database, using the automated data processing procedures described below. The depositor-supplied substance record is then linked to the corresponding NCBI-generated compound record, and vice versa.

If a depositor submits a substance that has a chemical structure already present in an existing PubChem Compound record, a reciprocal link between the new substance record and the existing compound record is created, and the compound record is updated to reflect the newly available substance.

In this way, the PubChem Compound database serves as a: (1) non-redundant, standardized set of molecules that are unique in their chemical 2D structure; (2) hub for accessing all of the depositor-supplied substance records that have the same chemical 2D structure, and (3) easy mechanism for finding the substance records that are likely to contain specific types of information about the molecule, based on their substance categorization classification.
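
The relationship between the two databases can be pictured as grouping deposited records by their standardized structure. The sketch below is only an illustration of that idea, not PubChem's actual processing: the SIDs, the structure keys, and the way compound IDs are assigned are all hypothetical.

```python
from collections import defaultdict

# Hypothetical deposited records: (SID, depositor's name, standardized structure key).
substances = [
    (101, "ibuprofen", "STRUCT-A"),
    (102, "Motrin", "STRUCT-A"),
    (103, "aspirin", "STRUCT-B"),
]

compound_for_structure = {}                   # structure key -> compound ID
substances_for_compound = defaultdict(list)   # compound ID -> linked SIDs
next_cid = 1

for sid, _name, key in substances:
    # A new compound record is created only for a structure not seen before.
    if key not in compound_for_structure:
        compound_for_structure[key] = next_cid
        next_cid += 1
    # Every substance is linked to the compound that shares its structure.
    substances_for_compound[compound_for_structure[key]].append(sid)

print(dict(substances_for_compound))  # {1: [101, 102], 2: [103]}
```

Two depositors using different names for the same structure end up linked to a single compound record, which is what makes the compound record a hub.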

 
PubChem Substance database PubChem Compound database

Repository of depositor-supplied data
Example: subset of deposited records for ibuprofen:

Illustration of four sample PubChem Substance records that contain depositor-supplied data for ibuprofen.


Non-redundant database, information hub
Example: NCBI-generated record for ibuprofen:

Illustration of the PubChem Compound record for ibuprofen, generated by NCBI data processing procedures to serve as a hub of information for the compound.


The illustration above shows a subset of PubChem Substance records for ibuprofen.

If desired, you can also view all PubChem Substance records for ibuprofen.

Note that the name given to the molecule (and the content of the records) may vary, because individual depositors may provide different types of information for the same chemical.

The illustration above shows the PubChem Compound record for ibuprofen, which:

- contains a standardized chemical structure that serves as a non-redundant representative of the molecule in the corresponding "Deposited records" (PubChem Substance records);

- contains information about the molecule added by NCBI automated data processing, such as chemical and physical properties, computed 3D conformers, biomedical annotations, and more;

- links to all PubChem Substance records that contain the same chemical structure, classifying them by "substance categorization."

 
Categories of information, as available, for a molecule

The categories of information listed on a molecule's summary page reflect the data that are available for the particular substance or compound and may include:
identification | related records | use and manufacturing | pharmacology | biomedical effects and toxicity | safety and handling | environmental fate and exposure potential | exposure standards and regulations | monitoring and analysis methods | literature | patents | biomolecular interactions and pathways | biological test results | classification (ontologies, substance categorization) | chemical and physical properties
Some information on the molecule's summary page comes from the data depositor and some is added at NCBI by automated data processing. The source of any information block added by NCBI data processing is noted in the lower right hand corner of the grey-bordered box that surrounds it. The table below provides examples of the types of information that may be present in each category and the information source.
Molecule Information Category Description
Identification

The amount and type of information present in the "Identification" section of a PubChem summary page depends on whether you are viewing a substance or compound record, as noted below:

PubChem Substance record: The "Identification" section of a PubChem Substance summary page displays information provided by the data depositor:

  • Depositor-Supplied Synonyms - The list of molecule names that was supplied by the depositor in their PubChem Substance record. (The data processing section of this document provides more details about synonyms.)


  • Substance Information
    • SID - The Substance Identifier (SID) is the permanent identifier, assigned by PubChem, for a depositor-supplied molecule. Each SID corresponds to a unique external registry ID provided by a PubChem data source. (The data processing section of this document provides more details about SIDs.)
      • Deposit Date - The date on which the data were first deposited into PubChem Substance.
      • Modify Date - The date on which the PubChem Substance record was last revised.
      • Substance Version - A "substance version" dropdown menu appears only if a substance record has been modified by the depositor since it was first submitted to PubChem. It allows you to view earlier version(s) of the substance, as available.

    • Data Source
      • Depositor - The name of the individual or organization that deposited the record into the PubChem Substance database, with a link to the depositor's web site.
      • External ID - The depositor's external identifier for their PubChem Substance record, with a link to the original record on the depositor's web site.

PubChem Compound record: The "Identification" section of a PubChem Compound summary page displays information derived from NCBI data processing and from PubChem Substance records that contain the same chemical structure:

  • Depositor-Supplied Synonyms - A PubChem Compound record displays a filtered list of synonyms that have been used consistently and only for the given chemical structure and its isotopes, stereoisomers, and tautomers. Any synonym that has been used for two or more different chemical structures in the PubChem Substance database is filtered out of the list. If all synonyms have been filtered out for this reason, the PubChem Compound record will not display any depositor-supplied synonyms. (The data processing section of this document provides more details about synonyms.)


  • Compound Information
    • CID - The Compound Identifier (CID) is the permanent identifier for a unique chemical structure. Each stereoisomer of a compound has its own CID. It is also possible for different tautomeric forms of the same compound to have different CIDs. (The data processing section of this document provides more details about CIDs.)
      • Create Date - The date on which the PubChem Compound record was created.

    • Descriptors
      • IUPAC Name - The standard name for the chemical, based on nomenclature rules from the International Union of Pure and Applied Chemistry (IUPAC).
      • InChI - InChI is short for IUPAC International Chemical Identifier. It is a linear notation that uses ASCII characters to represent the 2D shape of a molecule. (More information about InChI is available from the InChI web page and from a Jan-Feb 2009 publication.)
      • InChI Key - A fixed-length condensed digital representation of the InChI identifier, 27 characters long (25 letters plus two hyphens). (More information about InChIKey is available from the InChI web page.)
      • Canonical SMILES - The Simplified Molecular-Input Line-Entry System, or SMILES, is a specification in the form of a line notation for describing the structure of chemical molecules using short ASCII strings (more about SMILES...).
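
Because the InChI Key has a fixed length, its layout is easy to check mechanically. The sketch below assumes the standard InChIKey layout of 14, 10, and 1 uppercase letters separated by hyphens (25 letters, 27 characters in total). The helper name is hypothetical, and the sample key is the one assumed here to belong to ibuprofen, so verify it against the live record before relying on it.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text: str) -> bool:
    """Check only the layout of the string, not whether the key exists (hypothetical helper)."""
    return INCHIKEY_PATTERN.fullmatch(text) is not None


key = "HEFNNWSXXWATRW-UHFFFAOYSA-N"  # assumed to be the InChIKey for ibuprofen
print(len(key), looks_like_inchikey(key))  # 27 True
```

A layout check of this kind catches truncated or mistyped keys, but it cannot confirm that a key corresponds to any particular structure.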
Related Records

The amount and type of information present in the "Related Records" section of a PubChem summary page depends on whether you are viewing a substance or compound record, as noted below:

PubChem Substance record: The "Related Records" section of a PubChem Substance summary page provides links to the following categories of substances:

  • Related Substances
    • Same - The molecules in this group are exactly the same, including connectivity, isotopes and stereochemistry.
    • Same Connectivity - The molecules in this group have the same regular chemical connectivity, ignoring isotopes and stereochemistry.
    • Similar Substances - All substances shown have a similarity score [Tanimoto] >90%. If you want to find substances with different scores, you can visit the PubChem structure search page.
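
The Tanimoto score mentioned above compares two structural fingerprints by dividing the number of features they share by the number of features present in either. The following is a minimal sketch with toy fingerprints represented as sets of "on" bit positions. These are invented for illustration and are not PubChem's actual fingerprints.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


query = set(range(20))         # 20 bits on
candidate = set(range(1, 20))  # 19 bits on, all shared with the query

score = tanimoto(query, candidate)
print(f"{score:.2f}", score > 0.90)  # 0.95 True
```

Here 19 shared bits out of 20 distinct bits give a score of 0.95, which would pass a 90% threshold.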
PubChem Compound record: The "Related Records" section of a PubChem Compound summary page provides links to the following categories of records in the PubChem Substance and PubChem Compound databases, as available:

  • Related Compounds with Annotation - the subset of compounds that are related to the one currently displayed AND that have biomedical annotations such as:
    • Medication information - the subset of related compounds that are associated with medication information from DailyMed.
    • Cited literature - the subset of related compounds that have literature references in PubMed. These references were cited by the depositors of the compounds, associated with the compounds by NCBI automated data processing, or both.
    • 3D Structure - the subset of related compounds associated with an experimentally resolved structure in NCBI's Molecular Modeling Database (MMDB), which contains 3D structures of proteins, RNA, and DNA, many of which are bound to small molecules. The compounds are associated with MMDB either because they were derived from a 3D structure record (i.e., they were the small molecules bound to the protein, DNA, or RNA), or because they have the same chemical structure, including same connectivity, isotopes and stereochemistry, as the compound in the 3D structure record.
    • Tested in BioAssays - the subset of related compounds that have associated biological test results in the PubChem BioAssay database, many with available molecular target data.

      NOTE: Within each category above, the related compounds are sorted according to the degree of annotation available. For example, in the case of bioassay tested compounds, those that are more efficacious/active/probes are shown first.

  • Related Compounds -- all of the compounds (whether or not they have biomedical annotation) that are related in the following ways to the compound currently displayed:
    • Same, Connectivity - Molecules that have the same regular chemical connectivity, ignoring isotopes and stereochemistry.
    • Same, Stereochemistry - Molecules that have the same connectivity and stereochemistry, ignoring isotopes.
    • Same, Isotopes - Molecules that have the same connectivity and isotopes, ignoring stereochemistry.
    • Similar Compounds - All compounds shown have a similarity score [Tanimoto] >90%. If you want to find compounds with different scores, you can visit the PubChem structure search page. (The data processing section of this document describes how the chemical structures are validated and standardized and how similar compounds are identified.)
    • Similar Conformers - Compounds that are similar in 3D shape, including shape/feature and pharmacophore complementarity.
      The data processing section of this document describes how the 3D conformers are computationally generated and the method by which similar conformers are identified.
      If you want to view the similar conformers in a particular sort order (e.g., by shape then feature similarity, or by feature similarity then shape), or if you want to see only a subset of similar conformers that have specific properties, you can visit the PubChem structure search page and use the "3D Conformer" tab.

  • Related Substances
    • All - All PubChem Substance records that contain this chemical structure, either as an independent molecule or as a component of a mixture.
    • Same Structure - PubChem Substance records that have exactly the same chemical structure, including connectivity, isotopes and stereochemistry.
    • Mixture - PubChem Substance records that contain this molecule as a component of a mixture.
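
The "Same, Connectivity", "Same, Stereochemistry", and "Same, Isotopes" groups described above differ only in which layers of a structure are compared. The sketch below illustrates that with a hypothetical three-field representation of a structure. The field values are placeholder labels, not real chemical encodings.

```python
from typing import NamedTuple


class Structure(NamedTuple):
    connectivity: str  # atoms and bonds
    stereo: str        # stereochemistry label
    isotopes: str      # isotope label


def relationships(a: Structure, b: Structure) -> list[str]:
    """Return which 'Same, ...' groups apply to two structures."""
    found = []
    if a.connectivity == b.connectivity:
        found.append("Same, Connectivity")
        # The next two groups also require identical connectivity.
        if a.stereo == b.stereo:
            found.append("Same, Stereochemistry")
        if a.isotopes == b.isotopes:
            found.append("Same, Isotopes")
    return found


racemic = Structure("ibuprofen-skeleton", "unspecified", "natural")
s_form = Structure("ibuprofen-skeleton", "S", "natural")
print(relationships(racemic, s_form))  # ['Same, Connectivity', 'Same, Isotopes']
```

Two structures that match on all three layers would fall into every group, which corresponds to the "Same Structure" case.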
Use and Manufacturing

A PubChem summary page displays use and manufacturing information, if/as available for a given molecule. Examples of information in this section include:

  • Medication information from DailyMed


  • Manufacturing information from HSDB, such as manufacturing methods, formulations, production and distribution patterns, consumption patterns, etc.

Whenever information on a PubChem summary page has been derived from an external data source (such as those mentioned here), the source is noted in the lower right hand corner of the grey box that surrounds the information block. The data processing section of this document describes how the association is made between the PubChem compound/substance record and the data source.

Pharmacology

A PubChem summary page displays pharmacology information, if/as available for a given molecule. Examples of information in this section include:

  • Pharmacological action information from MeSH


  • Therapeutic information from HSDB, such as therapeutic uses, drug warnings, dosing information, etc.

Biomedical Effects and Toxicity

A PubChem summary page displays biomedical effects and toxicity information, if/as available for a given molecule. Examples of information in this section include:

  • Metabolic information from HSDB, such as absorption, distribution and excretion, metabolism/metabolites, biological half-life, mechanism of action, etc.


  • Toxicology references from ChemIDplus

Safety and Handling

A PubChem summary page displays safety and handling information, if/as available for a given molecule. Examples of information in this section include:

  • Safety references from ChemIDplus


  • Safety and handling information from HSDB, such as fire potential, reactivities and incompatibilities, skin/eye/respiratory irritations, protective equipment and clothing, stability/shelf life, storage conditions, disposal methods, decomposition, etc.

Environmental Fate and Exposure Potential

A PubChem summary page displays environmental fate and exposure potential information, if/as available for a given molecule. Examples of information in this section include:

  • An environmental fate/exposure summary from HSDB, as well as additional information from that resource on specific topics such as biodegradation, abiotic degradation, bioconcentration, soil adsorption/mobility, volatilization from water/soil, probable routes of human exposure, etc.

Exposure Standards and Regulations

A PubChem summary page displays exposure standards and regulations information, if/as available for a given molecule. Examples of information in this section include:

  • OSHA standards, NIOSH recommendations, and threshold limit values from HSDB

Monitoring and Analysis Methods

A PubChem summary page displays monitoring and analysis methods information, if/as available for a given molecule. Examples of information in this section include:

  • Clinical and analytic laboratory methods from HSDB

Literature

A PubChem summary page displays literature references, if/as available for a given molecule. Examples of information in this section include:
  • Scrollable table that lists all PubMed Citations associated with the chemical structure:

    • The table is a unified list that includes depositor provided PubMed citations plus NLM curated PubMed Citations. (If you'd like to see the two sets of PubMed citations separately, click on the link for "show all 2 sub-sections (Depositor Provided PubMed Citations, NLM Curated PubMed Citations)", which appears beneath the scrollable table.)

    • The disk icon in the upper right hand corner of the table enables you to save the list of PubMed citations as a comma separated value (*.csv) file.

    • If you want to link to, or display, the table of PubMed citations in another web page, you can use the share button that appears in the upper right corner of the table.


  • Depositor Provided PubMed Citations (from depositor)

    • In a PubChem Substance record, the "Depositor Provided PubMed Citations" section lists only the PubMed citations that have been provided by the depositor of that particular substance.

    • In a PubChem Compound record, the "Depositor Provided PubMed Citations" section displays a concatenated list of all PubMed records that have been cited by the depositors of all PubChem Substance records that contain the same chemical structure as the compound (including same connectivity, isotopes and stereochemistry).

  • NLM Curated PubMed Citations (from MeSH)

    • The "NLM Curated PubMed Citations" section links to all PubMed records that are tagged with the same MeSH term that has been associated with a particular compound.

    Note: The "NLM Curated PubMed Citations" and "Depositor Provided PubMed Citations" might have some -- though not necessarily complete -- overlap. Each set of PubMed records might contain items that are not present in the other set.
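
The way the citation lists relate to one another can be sketched with set operations. In the illustration below, the substance labels and PubMed IDs are hypothetical. The compound-level depositor list is modeled as the union of the citations from all matching substance records, and the overlap with the NLM-curated set as an intersection.

```python
# Hypothetical PubMed IDs cited by depositors of three substance records
# that share one chemical structure.
citations_by_substance = {
    "SID-A": {111, 222},
    "SID-B": {222, 333},
    "SID-C": set(),
}
mesh_citations = {333, 444}  # hypothetical NLM-curated (MeSH) set

# Compound-level depositor list: every PMID cited by any matching substance.
depositor_citations = set().union(*citations_by_substance.values())

print(sorted(depositor_citations))                   # [111, 222, 333]
print(sorted(depositor_citations & mesh_citations))  # [333]
print(sorted(depositor_citations | mesh_citations))  # [111, 222, 333, 444]
```

The last line corresponds to the unified table, and the middle line shows that the two sources can overlap without either one containing the other.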
Whenever information on a PubChem summary page has been derived from an external data source, the source is noted in the lower right hand corner of the grey box that surrounds the information block. (In the Literature section, the "Depositor Provided PubMed Citations" are internal to PubChem, because they were explicitly noted by submitters of PubChem Substance records. However, citations that were derived from MeSH are external to PubChem, so MeSH is noted as the source of the papers listed in the "NLM Curated PubMed Citations" box.) The data processing section of this document describes how the association is made between the PubChem compound/substance record and the data source.

Patents

A PubChem summary page displays patent information, if/as available for a given molecule. Examples of information in this section include:
  • Scrollable table that lists patents associated with the chemical structure:

    • In a PubChem Substance record, the patents table lists only the patent identifiers that have been provided by the depositor of that particular substance.

    • In a PubChem Compound record, the patents table displays a concatenated list of all patent identifiers that have been cited by the depositors of all PubChem Substance records that contain the same chemical structure as the compound (including same connectivity, isotopes and stereochemistry).

  • The disk icon in the upper right hand corner of the scrollable table enables you to save the list of patents as a comma separated value (*.csv) file.

  • If you want to link to, or display, the patents table in another web page, you can use the share button that appears in the upper right corner of the table-based widget.


  • The association between a patent identifier and a chemical is based on an assertion by a PubChem depositor that these patents are relevant to the chemical. It is not possible to validate chemical-patent associations in an automated way.

  • Click on a patent identifier to open the corresponding patent on the patent office's web site.
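The difference between the Substance and Compound patents tables can be pictured as combining the patent lists of all substances that share one structure. The sketch below is illustrative only: the SIDs, CIDs, and patent labels are made-up placeholders, `patents_for_compound` is a hypothetical helper rather than a PubChem function, and removing duplicates is an assumption (this document only says the list is concatenated).

```python
# Hypothetical data: patents each depositor cited for a substance (SID),
# and the compound (CID) that each substance's structure maps to.
patents_by_sid = {
    101: ["PATENT-A", "PATENT-B"],
    102: ["PATENT-B", "PATENT-C"],
    103: ["PATENT-D"],
}
cid_by_sid = {101: 1, 102: 1, 103: 2}


def patents_for_compound(cid):
    """Combine the patents cited by every substance with the same structure."""
    combined = set()  # assumption: duplicates are collapsed
    for sid, sid_cid in cid_by_sid.items():
        if sid_cid == cid:
            combined.update(patents_by_sid.get(sid, []))
    return sorted(combined)


print(patents_by_sid[101])      # Substance view: only that depositor's patents
print(patents_for_compound(1))  # Compound view: patents from SIDs 101 and 102
```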

Tip: If you want to retrieve all the PubChem Substance or Compound records that are associated with a specific patent identifier, enter the following query on a PubChem search page:
xxxxxxxxx[Patent]

For example:

EP0821690A1[Patent]
If you are searching the PubChem Compound database, the above query will retrieve all of the compounds (i.e., a non-redundant list of the unique chemical structures) associated with the patent. If you are searching the PubChem Substance database, it will retrieve all of the deposited chemical structures associated with the patent. (A separate section of this document provides an illustrated example that shows the difference between substance and compound records.)
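If you prefer to run the same query from a script, one option is NCBI's E-utilities ESearch service. The sketch below only builds the request URL and does not contact the server. It assumes that the `[Patent]` field shown above is also accepted by ESearch for the `pccompound` and `pcsubstance` databases, which this document does not state; `build_patent_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_patent_search_url(patent_id, db="pccompound"):
    """Return an ESearch URL for records tagged with a patent identifier."""
    if db not in ("pccompound", "pcsubstance"):
        raise ValueError("db must be 'pccompound' or 'pcsubstance'")
    # Assumption: the [Patent] field tag works in ESearch as it does on the web page.
    query = {"db": db, "term": f"{patent_id}[Patent]"}
    return f"{ESEARCH}?{urlencode(query)}"


print(build_patent_search_url("EP0821690A1"))
print(build_patent_search_url("EP0821690A1", db="pcsubstance"))
```

Choosing `pccompound` corresponds to the non-redundant list of structures, and `pcsubstance` to the deposited records, as described above.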

Biomolecular Interactions and Pathways back to top A PubChem summary page displays biomolecular interactions and pathways information, if/as available for a given molecule. Examples of information in this section include:

  • Protein Bound 3-D Structures (from Structure) - links to experimentally resolved 3D structures that contain biomolecules bound to the small molecule.


  • Biosystems and Pathways (from BioSystems) - biological systems, such as pathways, that include the small molecule as a component.


  • Interactions (from DrugBank) - proteins with which the small molecule interacts. Each protein is listed in a separate information block, which summarizes the nature of the interaction and provides links to related information.
Whenever information on a PubChem summary page has been derived from an external data source (such as those mentioned here), the source is noted in the lower right hand corner of the grey box that surrounds the information block. The data processing section of this document describes how the association is made between the PubChem compound/substance record and the data source.

Biological Test Results back to top A PubChem substance or compound summary page displays biological test results, if/as available, for the chemical structure currently displayed. (Note that you can embed biological test results displays within your own web pages, for a PubChem Compound or Substance of interest, by using the BioActivity Widget.)

Examples of information in this section include:

  • Graphical summary of the bioassays that have tested the chemical structure, categorizing the bioassays by:


    • bioactivity outcomes (active, inactive, inconclusive, unspecified)
    • top targets (names of genes associated with the protein sequence identifiers that were named as targets in the experiments)
    • bioactivity types (IC50, EC50, Potency, Ki, etc.)
    • bioassay types (screening, confirmatory, summary, other).

    The bar graphs are clickable and serve as filters that allow you to view the desired subset of data in the table that appears beneath the bar graphs. By default, the table lists all of the bioactivity results that are currently available for the chemical structure.

    If you click on a bar graph of interest, such as "bioactivity outcomes: active," the table will be refreshed to display only the subset of data you have selected. If desired, you can also refresh the bar graph display to reflect the current subset of data by clicking on the option to "Apply Filters to Charts." To revert to the default display that shows all data, click on the option to "Reset All Filters."

  • Scrollable table that lists the chemical structure(s) in PubChem tested in experiments reported in the PubChem BioAssay database.

    • The number of PubChem bioassays listed in the table for a given chemical structure depends on whether you are viewing a PubChem substance record (which was provided by a submitter) or a PubChem compound record (which was generated at NCBI and serves as an information hub for what is known about the chemical structure; illustrated example):

      • On a PubChem Substance summary page, the biological test results table will list only the bioassays that tested the exact substance currently displayed, and that were reported in the PubChem BioAssay database.

      • On a PubChem Compound summary page, the biological test results table will list all bioassays that tested any substance that has the same chemical structure (including connectivity, isotopes and stereochemistry) as the compound currently being viewed.

    • Examples of information in the table include:

      • PubChem identifier (SID or CID) of the molecule that has been tested in bioactivity experiments.

      • Results of the experiment, including bioactivity outcome, bioactivity type, and value. The assays are sorted by outcome, grouping together assays in which the chemical structure:

        (a) was used as a chemical probe
        (b) was shown to be active
        (c) was shown to be inactive
        (d) had inconclusive activity
        (e) had unspecified activity

        Click on the "Outcome" column header to reverse the sort order, if desired.

      • Assay ID (AID) of the PubChem BioAssay record that reports the results of the experiment which tested the molecule's bioactivity. (The data processing section of this document describes the method by which bioassay records are associated with a PubChem substance or compound record.) Click on an AID to open more detailed information about that experiment in the PubChem BioAssay database.

  • BioActivity Summary links appear beneath the scrollable table only on PubChem Compound summary pages. These links bring together biological activity data for the following sets of chemical structures:

    • "This Compound" -- all PubChem Substances that have the same chemical structure (including same connectivity, isotopes and stereochemistry) as the compound currently displayed.

    • "Similar Compounds" -- chemical structures with a Tanimoto score >90%.

    • "Similar Conformers" -- compounds that have a 3D similarity by shape/feature and pharmacophore complementariness.

    A separate document describes what you will see on a BioActivity Summary display.
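For orientation, a Tanimoto score such as the one mentioned for "Similar Compounds" is commonly computed on binary fingerprints as the number of shared bits divided by the number of bits set in either fingerprint. The sketch below uses small made-up bit sets rather than real PubChem fingerprints, and it reads ">90%" as strictly greater than 0.90; both function names are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def is_similar(bits_a, bits_b, threshold=0.90):
    """True when the score is strictly greater than the threshold."""
    return tanimoto(bits_a, bits_b) > threshold


query = set(range(20))     # hypothetical fingerprint with bits 0..19 set
near = set(range(1, 21))   # shares 19 bits with query; 21 bits in the union
far = set(range(10, 30))   # shares 10 bits with query; 30 bits in the union

print(round(tanimoto(query, near), 3), is_similar(query, near))
print(round(tanimoto(query, far), 3), is_similar(query, far))
```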

Whenever information on a PubChem summary page has been derived from an external data source (such as those mentioned here), the source is noted in the lower right hand corner of the grey box that surrounds the information block. The data processing section of this document describes how the association is made between the PubChem compound/substance record and the data source.
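The outcome grouping and the bar-graph filters described for the Biological Test Results table can be mimicked on a local list of results. This is a minimal sketch with invented AIDs and a simplified record layout; it is not how the summary page itself is implemented.

```python
# Default grouping order described above: probe, active, inactive,
# inconclusive, unspecified.
OUTCOME_ORDER = ["probe", "active", "inactive", "inconclusive", "unspecified"]
RANK = {outcome: position for position, outcome in enumerate(OUTCOME_ORDER)}

# Invented example rows; the AIDs are placeholders.
results = [
    {"aid": 1001, "outcome": "inactive"},
    {"aid": 1002, "outcome": "active"},
    {"aid": 1003, "outcome": "unspecified"},
    {"aid": 1004, "outcome": "probe"},
    {"aid": 1005, "outcome": "active"},
]


def sort_by_outcome(rows, reverse=False):
    """Group rows by outcome; reverse=True mimics clicking the column header."""
    return sorted(rows, key=lambda row: RANK[row["outcome"]], reverse=reverse)


def filter_by_outcome(rows, outcome):
    """Keep only rows with the chosen outcome, like clicking one bar in a chart."""
    return [row for row in rows if row["outcome"] == outcome]


print([row["aid"] for row in sort_by_outcome(results)])
print([row["aid"] for row in filter_by_outcome(results, "active")])
```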

Classification back to top A PubChem summary page displays classification information, if/as available for a given molecule. Examples of information in this section include ontologies and substance categorization classification:

Ontologies - hierarchical organizations of terms that describe the compound's identity (i.e., chemical classification) and/or its activities (i.e., biological and chemical roles, pharmacological uses). For example:

  • Medical Subject Headings (MeSH) is the National Library of Medicine (NLM)'s controlled vocabulary thesaurus of medical terms. It is used for indexing literature from thousands of the world's leading biomedical journals for the MEDLINE®/PubMed® database, and for cataloging medical books, documents, and audiovisual materials, in order to facilitate retrieval of medical information at various levels of specificity. The data processing section of this document describes the method by which MeSH terms are associated with PubChem Compound records.


  • Other sources of ontologies, such as ChEBI and KEGG, are depositors of records into the PubChem Substance database. If substance records from those depositors are associated with corresponding PubChem Compound records (because they contain the same chemical structure, including same connectivity, isotopes, and stereochemistry), then the classification information from the substance records is added to the "Classification: Ontologies" section of the PubChem Compound record.
Note: The PubChem Classification Browser can be used to browse PubChem data using a variety of hierarchical classification systems that have been associated with substances, compounds, and bioassays.
Substance Categorization Classification - The subheaders in this section of a PubChem Compound record reflect the various categories of depositors that have submitted corresponding PubChem Substance records. This allows you to quickly find the corresponding PubChem Substance records that are likely to contain a given type of information, such as Chemical Reactions. The categories may include:

  • Biological Properties - PubChem Substance records from depositors that provide information about the biological properties of a substance or compound.
  • Chemical Reactions - PubChem Substance records from depositors that provide information about the reactivity, synthesis, or known reactions of a substance or compound.
  • Imaging Agents - PubChem Substance records from depositors that provide information about contrast agents or imaging agents used in, for example, MRI.
  • Journal Publishers - PubChem Substance records from journal publishers that have published articles about a substance or compound.
  • Metabolic Pathways - PubChem Substance records from depositors that provide information on the metabolic pathways involving a substance or compound.
  • Molecular Libraries Screening Center Network - PubChem Substance records from depositors that are part of the NIH Molecular Libraries Screening Center Network (MLSCN).
  • NIH Substance Repository - PubChem Substance records from depositors that are members of the NIH Molecular Libraries Small Molecule Repository servicing the MLSCN.
  • Physical Properties - PubChem Substance records from depositors that provide information about the experimental physical properties of a substance or compound.
  • Protein 3D Structures - PubChem Substance records from depositors that provide information about the experimental 3-D structure of a substance or compound. (Most of the molecule records that fall into this depositor category are derived from Molecular Modeling Database records, which generally contain the 3-D structures of biomolecules, such as proteins, that may be bound to the substance or compound.)
  • Substance Vendors - PubChem Substance records from depositors that are sellers of a substance or compound.
  • Theoretical Properties - PubChem Substance records from depositors that provide information about the theoretical properties of a substance or compound.
  • Toxicology - PubChem Substance records from depositors that provide information about the toxicological properties of a substance or compound.
Whenever information on a PubChem summary page has been derived from an external data source (such as those mentioned here), the source is noted in the lower right hand corner of the grey box that surrounds the information block. The data processing section of this document describes how the association is made between the PubChem compound/substance record and the data source.

Chemical and Physical Properties back to top A PubChem summary page displays chemical and physical properties, if/as available for a given molecule. Examples of information in this section include:

  • Properties derived from HSDB, such as color/form, odor, melting point, density/specific gravity, dissociation constants, solubilities, spectral properties, etc.


  • Properties computed during automated data processing for PubChem Compound records, such as molecular formula and weight, exact mass, monoisotopic mass, H-bond donor/acceptor, counts of heavy atoms, isotope atoms, rotatable bonds, atom and bond stereocenters, rings, etc.
Whenever information on a PubChem summary page has been derived from an external data source (such as those mentioned here), the source is noted in the lower right hand corner of the grey box that surrounds the information block. The data processing section of this document describes how the association is made between the PubChem compound/substance record and the data source.

 
Tool Buttons back to top

BioActivity Summary | Chemical Structure Search | PubChem3D (PC3D) | Download ASN.1 file | Download XML file | Download SDF file
Click on a button to jump to its description.

BioActivity Summary (only for compounds) back to top This icon opens the BioActivity Summary page for this compound, for compounds that have similar chemical structures, or for compounds that have similar 3D conformers. The BioActivity Summary page reports the available biological screening results for a single chemical sample or a set of chemical samples. This service provides a way to examine and compare biological outcomes across multiple biological tests. Additional information about the BioActivity Services is provided in a separate document and in the publications about the PubChem BioAssay database.

Chemical Structure Search back to top This icon opens the PubChem Structure Search page and transfers this compound's isomeric SMILES string into the search field. The Structure Search page enables you to retrieve identical or similar molecules based on a variety of criteria, and if desired, to filter the results based on chemical and physical properties, stereochemistry, presence of particular chemical elements, bioactivity, depositor category, data source, etc. A separate Structure Search help document provides more information about using the tool.

PubChem3D (PC3D) views (only for compounds) back to top This icon allows you to open an interactive view of the 3D conformers that were generated for the molecule during NCBI data processing, in your choice of either the browser window (using the PubChem Web-based 3D Viewer) or the free PubChem3D Viewer desktop application.

Because 3D conformers are generated only for molecules in the PubChem Compound database, this icon appears only on PubChem Compound summary pages (and in the "Chemical Structure" folder tab of PubChem Summary pages).

Additional details about PubChem3D are available from:

Download ASN.1 file back to top Details of the PubChem Data Specifications for Small Molecule and Assay data (ASN.1 format) are provided in a file on the PubChem FTP site.

The scope of data saved when you use any download button depends on the type of PubChem record displayed:

When you are viewing a PubChem Substance record, this button allows you to download the structure data that was submitted by the depositor (including identifiers, coordinates, and properties) in the format you select (ASN.1, XML, or SDF).

When you are viewing a PubChem Compound record, this button allows you to download the data for the standardized structure that was generated by NCBI data processing (including identifiers, computed 2D and 3D coordinates, and properties) in the format you select (ASN.1, XML, or SDF).

Whether you are viewing a Substance or Compound record, the download buttons will not save additional information, such as biological annotations, that are displayed on the molecule's PubChem summary page in your web browser.

Download XML file Details of the PubChem Data Specifications for Small Molecule and Assay data (XML format) are provided in a file on the PubChem FTP site.
Download SDF file Details of the PubChem Data Specifications for SDF files are provided in files on the PubChem FTP site:
SD Field Descriptions (PDF file)
SD Field Descriptions (text file)
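For scripted downloads, PubChem's PUG REST service can return a single record in comparable formats. It is a separate interface from the download buttons described above, and the sketch below only constructs the URL without fetching anything. The mapping from button name to PUG REST output keyword (`ASNT` for text ASN.1) is an assumption that should be checked against the PUG REST documentation, and `record_url` is a hypothetical helper.

```python
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Button label -> PUG REST output keyword (assumed mapping; ASNT is text ASN.1).
OUTPUT_KEYWORD = {"ASN.1": "ASNT", "XML": "XML", "SDF": "SDF"}

# Record type -> identifier namespace used in the URL path.
NAMESPACE = {"substance": "sid", "compound": "cid"}


def record_url(record_type, identifier, fmt):
    """Build a URL for one Substance (SID) or Compound (CID) record."""
    if record_type not in NAMESPACE:
        raise ValueError("record_type must be 'substance' or 'compound'")
    if fmt not in OUTPUT_KEYWORD:
        raise ValueError("fmt must be one of: " + ", ".join(OUTPUT_KEYWORD))
    return (f"{PUG_REST}/{record_type}/{NAMESPACE[record_type]}/"
            f"{int(identifier)}/{OUTPUT_KEYWORD[fmt]}")


# CID 3672 is the ibuprofen record used as an example elsewhere in this document.
print(record_url("compound", 3672, "SDF"))
```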

 
Links and related information back to top

Properties | BioActivity | Related Compounds | Related Substances | Other Links | Chemical Vendors | LinkOut
The Entrez retrieval system, which provides the search interface for PubChem and other NCBI databases, is designed to provide integrated access to previously disparate data and make it possible to collect related information on a topic of interest within and across Entrez databases (illustrated example).

As part of Entrez, PubChem implements data processing steps to identify such associations and present them as link options on search results pages and on the displays of individual records.

In the display of an individual PubChem record, the associations are available in the "Links and Related Information" panel in the right margin of the page. Those links retrieve related data only for the particular substance or compound displayed.

On a search results page, the associations are available in the "Find Related Data" box that appears in the right margin. Those links will retrieve related data for all of the records in your search results (default), or for the subset of records you have selected with checkboxes.

The number and type of links that are displayed depend upon the data available for a particular PubChem record(s) and can include the following:
Link Group Link Name Description
Properties Properties back to top A subset of a compound's chemical and physical properties is shown in the "Links and Related Data" column of a PubChem compound summary page. These properties have been computed during NCBI data processing and are part of the molecule overview information. The complete set of properties is shown in the "chemical and physical properties" section of a compound summary page.

Properties are not computed for substances. However, each PubChem substance summary page has a "chemical structure" folder tab, which displays the corresponding PubChem compound and its computed properties.

BioActivity Data Links back to top BioActivity links are generally found on PubChem Compound summary pages, in both the right hand margin and in the Biological Test Results section of the page. The links retrieve all bioassays that have been done on PubChem Substances containing the same chemical structure (including same connectivity, isotopes and stereochemistry), similar chemical structures, or similar 3D conformers, as noted below. BioActivity links appear on a PubChem Substance summary page only if the depositor submitted bioassay data for that particular substance.
This Compound back to top Links to PubChem BioAssay records that tested this particular chemical structure. Specifically, the links include all BioAssays that tested PubChem Substances which have the same chemical structure (including same connectivity, isotopes and stereochemistry) as the compound currently displayed.

The data processing section of this document describes the method by which PubChem BioAssay records are associated with PubChem Substance and Compound records.
with Similar Compounds back to top Links to PubChem BioAssay records that tested this particular chemical structure, or that tested similar chemical structures with a Tanimoto score >90%.

The data processing section of this document describes how the chemical structures are validated and standardized and how similar compounds are identified.
with Similar Conformers back to top Links to PubChem BioAssay records that tested this particular molecule, or that tested similar conformers (compounds that have a 3D similarity by shape/feature and pharmacophore complementariness).

The data processing section of this document describes how the 3D conformers are computationally generated and how similar conformers are identified.
Related Compounds Same, Connectivity back to top The molecules in this group have the same regular chemical connectivity, ignoring isotopes and stereochemistry.
Same, Stereochemistry back to top Molecules that have the same connectivity and stereochemistry, ignoring isotopes.
Same, Isotopes back to top Molecules that have the same connectivity and isotopes, ignoring stereochemistry.
Similar Compounds back to top All compounds shown have a similarity score [Tanimoto] >90%.

If you want to find compounds with different scores, you can visit the PubChem structure search page.

The data processing section of this document describes how the chemical structures are validated and standardized and how similar compounds are identified.
Similar Conformers back to top Compounds that are similar in 3D shape, including shape/feature, and pharmacophore complementariness.

The data processing section of this document describes how the 3D conformers are computationally generated and the method by which similar conformers are identified.

If you want to view the similar conformers in a particular sort order (e.g., by shape then feature similarity, or by feature similarity then shape), or if you want to see only a subset of similar conformers that have specific properties, you can visit the PubChem structure search page and use the "3D Conformer" tab.
Parent Compound back to top Link to the "parent" compound of the record.

A parent is conceptually the "important" part of the molecule when the molecule has more than one covalent component. Specifically, a parent component must have at least one carbon and contain at least 70% of the heavy (non-hydrogen) atoms of all the unique covalent units (ignoring stoichiometry). Note that this is a very empirical definition and is subject to change.

For example, the "parent" compound in tetracycline hydrochloride (CID 54704426) and tetracycline metaphosphate (CID 54729668) is tetracycline (CID 54675776).
Unique Components back to top If a molecule is composed of more than one covalent component (e.g., Tylenol codeine), a separate PubChem Compound exists for each of its unique components (Tylenol and codeine). The "unique components" link retrieves those separate records, which are acid/base neutralized forms of unique components.
Related Substances All back to top All PubChem Substance records that contain this chemical structure, either as an independent molecule or as a component of a mixture.
Same Structure back to top PubChem Substance records that have exactly the same chemical structure, including connectivity, isotopes and stereochemistry.
Mixture back to top PubChem Substance records that contain this molecule as a component of a mixture.
Other Links Protein Structure back to top Experimentally resolved 3D structures for biomolecules such as protein, DNA, or RNA that are bound to the substance or compound currently displayed. The data processing section of this document describes the method by which the 3D structure records are associated with PubChem Substance and Compound records.
PubMed back to top Literature references for this compound/substance from the PubMed database. The specific set of references associated with a PubChem record depends on the type of record:
  • A PubChem Substance record links only to the literature references that have been cited by the depositor of that record.


  • A PubChem Compound record contains two sets of PubMed links:

    • Depositor-Supplied PubMed Citations - concatenated list of all PubMed records that have been cited by the depositors of all PubChem Substance records that contain the same chemical structure as the compound (including same connectivity, isotopes and stereochemistry). This set of references is accessible from the "Links and Related Information: Other Links: PubMed" in the right hand margin, and from the "Literature: Depositor-Supplied PubMed Citations" section, of a PubChem Compound summary page.


    • NLM-Curated PubMed Citations - all PubMed records tagged with MeSH terms that have been associated with the PubChem Compound through the method described in the data processing section of this document. This set of references is accessible from the "Literature: NLM Curated PubMed Citations" section of a PubChem Compound summary page.


    Note: The "NLM Curated PubMed Citations" and "Depositor Provided PubMed Citations" might have some -- though not necessarily complete -- overlap. Each set of PubMed records might contain items that are not present in the other set.
Gene back to top Some depositors (for example, the Comparative Toxicogenomics Database) include Gene IDs in the records they submit to the PubChem Substance database. In such cases, a Gene link will appear on the PubChem summary page for those substances, and on the summary page of the corresponding PubChem Compound record.
Taxonomy back to top Some depositors (for example, the Comparative Toxicogenomics Database) include Taxonomy IDs in the records they submit to the PubChem Substance database. In such cases, a Taxonomy link will appear on the PubChem summary page for those substances, and on the summary page of the corresponding PubChem Compound record.
OMIM back to top Some depositors (for example, the Comparative Toxicogenomics Database) include OMIM identifiers in the records they submit to the PubChem Substance database. In such cases, an OMIM link will appear on the PubChem summary page for those substances, and on the summary page of the corresponding PubChem Compound record.
NLM Toxicology Link back to top This link opens the corresponding ChemIDplus record that is associated with the currently displayed molecule. The association was made by the method described in the data processing section of this document.
Chemical Structure Search back to top This link opens the PubChem Structure Search page and transfers this compound's isomeric SMILES string into the search field. The Structure Search page enables you to retrieve identical or similar molecules based on a variety of criteria, and if desired, to filter the results based on chemical and physical properties, stereochemistry, presence of particular chemical elements, bioactivity, depositor category, data source, etc. A separate Structure Search help document provides more information about using the tool.
Chemical Vendors Chemical Vendors back to top Links to vendors can appear in PubChem Substance and PubChem Compound records, and the links are made in the following ways:

Some depositors to the PubChem Substance database have identified themselves as vendors when they established their PubChem deposition account, and therefore fall under the "Substance Vendor" depositor category.

A PubChem Substance record from such a depositor will have a "Chemical Vendors" link that lists only the depositor of that particular record.

A PubChem Compound record, on the other hand, will list all the "Chemical Vendors" that have deposited substance records with the same chemical structure as the compound (including same connectivity, isotopes and stereochemistry).

LinkOut LinkOut back to top Links that external resources have created, using LinkOut, to the PubChem record(s) you are viewing.

LinkOut is a service that allows you to link directly from PubMed and other NCBI databases (including PubChem) to a wide range of information and services beyond the NCBI systems. LinkOut aims to facilitate access to relevant online resources in order to extend, clarify, and supplement information found in NCBI databases.

"LinkOut" will appear in the "Links and Related Information" panel of a PubChem summary page only if an external resource has created a link to the PubChem record(s) you are viewing.
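The "Parent Compound" rule given in the table above (at least one carbon, and at least 70% of the heavy atoms of all unique covalent units, ignoring stoichiometry) can be sketched as follows. This is a simplified illustration, not PubChem's implementation: each component is reduced to an element-count dictionary, identical units are detected by formula rather than by structure, and `find_parent` is a hypothetical name. The tetracycline formula (C22H24N2O8) is supplied here and does not come from this document.

```python
def heavy_atom_count(atoms):
    """Count non-hydrogen atoms in a component given as {element: count}."""
    return sum(count for element, count in atoms.items() if element != "H")


def find_parent(components):
    """Return the name of the parent component, or None if no unit qualifies.

    components is a list of (name, {element: count}) pairs.
    """
    # Ignore stoichiometry: repeated identical units are counted once.
    # Simplification: units with the same formula are treated as identical.
    unique = {}
    for name, atoms in components:
        unique.setdefault(tuple(sorted(atoms.items())), (name, atoms))

    total = sum(heavy_atom_count(atoms) for _, atoms in unique.values())
    for name, atoms in unique.values():
        has_carbon = atoms.get("C", 0) >= 1
        if has_carbon and heavy_atom_count(atoms) >= 0.7 * total:
            return name
    return None


salt = [
    ("tetracycline", {"C": 22, "H": 24, "N": 2, "O": 8}),  # 32 heavy atoms
    ("hydrogen chloride", {"H": 1, "Cl": 1}),              # 1 heavy atom
]
print(find_parent(salt))

# Two made-up units of similar size: neither reaches 70% of the heavy atoms.
print(find_parent([("unit A", {"C": 6}), ("unit B", {"C": 5})]))
```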


 
Data Processing: sources of the information displayed on a PubChem Substance/Compound summary page back to top

Depositor supplied data: PubChem Substance Database
NCBI-generated records: PubChem Compound Database
Reporting errors (in substance records or compound records)

Depositor supplied data:  PubChem Substance Database back to top


repository | data fields | substance identifier (SID) | no curation | corresponding compound | substance classification (ontologies) | depositor categories
  • Repository:  The PubChem Substance database is a repository of depositor supplied data. The same chemical may be represented in many PubChem substance records, and those records vary in their information content, reflecting the amount and types of information provided by their depositors. (View a list of data sources.)


  • Data fields:  A separate file describes the procedure for "PubChem Substance Deposition using SD File Format" and describes the types of data that may be present in a depositor's record. The only required data field is the depositor's unique external registry ID, and the many other allowable data fields that are described reflect the wide range of information that can appear in an individual substance record. We encourage depositors to provide, at a minimum, a chemical graph and one or more names (synonyms) for the molecule.


  • Substance Identifier (SID):  An NCBI unique identifier, called the substance identifier (SID), is assigned by PubChem to each unique external registry ID provided by a PubChem data depositor. The SID is an integer that identifies the depositor's record within the PubChem Substance database, and will never be reused for another substance record. A depositor may "revoke" (or otherwise deprecate) a PubChem SID at any time for any reason. However, the link to the "revoked" PubChem SID remains available in perpetuity. There will be a message stating that the depositor deprecated the SID, but the link to the archived information will still be available. In addition, the PubChem CIDs pointed to by the old version of a PubChem SID at the time it was versioned or deprecated will also be available.
    Note: Although identifiers are unique within a PubChem database, the same integer can be used as an identifier in two or more different databases. For example, "2244" is a valid identifier in both the PubChem Substance and PubChem Compound databases, where:
    SID: 2244 is the PubChem Substance database record for cytidylate, and
    CID: 2244 is the PubChem Compound database record for aspirin.
  • No curation:  PubChem doesn't have curators and never changes/edits substance records. They remain as supplied by our depositors.


  • Corresponding compound:   Although PubChem does not curate submitted records, it does generate a corresponding compound record for every unique chemical structure submitted to the Substance database, using the NCBI automated data processing procedures described below, in order to provide a non-redundant view of the molecules in PubChem. The depositor-supplied substance record is then linked to the corresponding NCBI-generated compound record, and vice versa. If a depositor submits a substance that has a chemical structure already represented in an existing PubChem Compound record, a reciprocal link between the new substance record and the existing compound record is created, and the compound record is updated to reflect the newly available substance. (See illustrated example of substance records and corresponding compound record.)


  • Substance classification (ontologies):  Some depositors of PubChem Substance records, such as ChEBI and KEGG, maintain ontologies of terms that describe the substances they deposit into PubChem. Substance records from those depositors display applicable terms from their ontologies in the "Classification: Ontologies" section of the PubChem Substance summary page. If those substance records are associated with corresponding PubChem Compound records (because they contain the same chemical structure, including same connectivity, isotopes, and stereochemistry), then the classification information from the substance records is also added to the "Classification: Ontologies" section of the corresponding PubChem Compound record. The PubChem Classification Browser can be used to browse/retrieve substances and compounds using a variety of classification ontologies.


  • Depositor Categories:   Each lab or organization that deposits data into PubChem falls into one of the categories below. The category indicates the type of information you can expect to find for a molecule in that depositor's PubChem substance records or on the depositor's site.

    Each depositor's category is displayed on the list of data sources web page. The categories are also displayed in the "Classification" section of the corresponding PubChem Compound records, under the subheader "Substance Categorization Classification." That section of a PubChem Compound record allows you to quickly find the corresponding PubChem Substance records that are likely to contain a given type of information, such as Chemical Reactions.


Depositor Category | Meaning
Biological Properties | Depositor provides information about the biological properties of a substance or compound.
Chemical Reactions | Depositor provides information about the reactivity, synthesis, or known reactions of a substance or compound.
Imaging Agents | Depositor provides information about contrast agents or imaging agents used in, for example, MRI.
Journal Publishers | Depositor is a journal publisher and has articles published about a substance or compound.
Metabolic Pathways | Depositor provides information on the metabolic pathways involving a substance or compound.
Molecular Libraries Screening Center Network | Depositor is part of the NIH Molecular Libraries Screening Center Network (MLSCN).
NIH Substance Repository | Depositor is an NIH Molecular Libraries Small Molecule Repository servicing the MLSCN.
Physical Properties | Depositor provides information about the experimental physical properties of a substance or compound.
Protein 3D Structures | Depositor provides information about the experimental 3-D structure of a substance or compound. (Most of the molecule records that fall into this depositor category are derived from Molecular Modeling Database records, which generally contain the 3-D structures of biomolecules, such as proteins, that may be bound to the substance or compound.)
Substance Vendors | Depositor is a seller of a substance or compound.
Theoretical Properties | Depositor provides information about the theoretical properties of a substance or compound.
Toxicology | Depositor provides information about the toxicological properties of a substance or compound.

NCBI-generated records:  PubChem Compound Database


validate & standardize chemical structures | identify unique chemical structures (compound identifier (CID), compound descriptors) | identify SAME compounds | gather and validate chemical name synonyms | compute 3D conformers | compute chemical and physical properties | identify SIMILAR molecules | add biomedical annotations | create links to related information
  • Validate & standardize the chemical's structure

    When a molecule is submitted to the PubChem Substance database, the NCBI data processing procedures execute a series of steps to confirm that the structure is "valid" and to generate a canonical tautomeric form:


    • The validation steps consist of:
      • Atom verification: do all atoms correspond to a known atomic element? E.g., "*" is not a known atom
      • Implicit hydrogens are assigned to organic elements using simple valence rules, e.g., methane "C" gets four implicit hydrogens assigned to it.
      • Functional group standardization: common incorrect and hypervalent representations of functional groups are "fixed", e.g., nitro groups represented by N(=O)=O become [N+](=O)[O-]
      • Atom valences are validated: do all atoms have an "allowed" valence? E.g., five bonds to carbon is not valid
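      Two of the checks above (implicit hydrogen assignment and valence validation) can be sketched as follows. This is a minimal illustration, not PubChem's code: the table DEFAULT_VALENCE and the function names are hypothetical, only a few neutral elements are covered, and real processing also handles charges and elements with several allowed valences.

```python
# Toy valence table for a few neutral elements (an assumption made for
# illustration; real valence models allow several states per element).
DEFAULT_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}


def is_known_element(symbol):
    """Atom verification against the toy table; "*" is rejected."""
    return symbol in DEFAULT_VALENCE


def valence_is_allowed(symbol, bond_order_sum):
    """Valence validation: explicit bond orders must not exceed the valence."""
    return is_known_element(symbol) and bond_order_sum <= DEFAULT_VALENCE[symbol]


def implicit_hydrogens(symbol, bond_order_sum):
    """Implicit hydrogens fill whatever valence the explicit bonds leave open."""
    if not valence_is_allowed(symbol, bond_order_sum):
        raise ValueError(f"invalid atom or valence: {symbol!r} with {bond_order_sum} bonds")
    return DEFAULT_VALENCE[symbol] - bond_order_sum


print(implicit_hydrogens("C", 0))  # methane carbon -> 4
print(valence_is_allowed("C", 5))  # five bonds to carbon -> False
print(is_known_element("*"))       # unknown atom -> False
```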

    • The standardization steps consist of:
      • Valence bond (VB) canonicalization: equivalent/alternate VB/tautomeric forms of a structure are normalized into a single representation
      • Aromaticity detection: structure aromaticity is detected and validated to be kekulizable
      • Stereochemistry detection: sp3 and sp2 stereo centers are detected and stereo-wedge placement is standardized
      • Explicit hydrogen assignment: implicit hydrogens are converted to be explicit

    • Subsequent additional processing includes 2D coordinate layout assignment.


    • The resulting validated and standardized chemical structure appears in the corresponding PubChem Compound record, which is linked to the original PubChem Substance record. The PubChem Substance record displays the chemical structure that was submitted by the depositor. (See illustrated example of substance records and corresponding compound record.)


  • Identify each unique chemical structure


    • Assign Compound Identifier (CID)

      A compound identifier (CID) is an integer assigned to each unique chemical structure (unique chemical atom connectivity graph) that has been found among the millions of records deposited into PubChem by data submitters. Each stereoisomer of a compound has its own CID. It is also possible for different tautomeric forms of the same compound to have different CIDs. The complete collection of unique chemical structures is housed in the PubChem Compound database, which serves as a non-redundant view of the molecules in PubChem.

      The chemical structure represented by a CID is permanent. The URLs to PubChem Compound summary pages (e.g., http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=3672) are stable (always live), regardless of whether any substance record points to them.
      Note: Although identifiers are unique within a PubChem database, the same integer can be used as an identifier in two or more different databases. For example, "2244" is a valid identifier in both the PubChem Substance and PubChem Compound databases, where:
      SID: 2244 is the PubChem Substance database record for cytidylate, and
      CID: 2244 is the PubChem Compound database record for aspirin.
    • Apply Compound descriptors

      NCBI calculates the compound descriptors from the standardized (2D) chemical structure. Each descriptor is a text representation of the chemical structure, using the syntax specified by the following organizations/rules:


      • IUPAC Name - The standard name for the chemical, based on nomenclature rules from the International Union of Pure and Applied Chemistry (IUPAC).


      • InChI - InChI is short for IUPAC International Chemical Identifier. It is a linear notation that uses ASCII characters to represent the 2D shape of a molecule. (More information about InChI is available from the InChI web page and from a Jan-Feb 2009 publication.)


      • InChIKey - A fixed-length (27-character, including hyphens) condensed digital representation of the InChI identifier. (More information about InChIKey is available from the InChI web page.)


      • Canonical SMILES - The Simplified Molecular-Input Line-Entry System, or SMILES, is a specification in the form of a line notation for describing the structure of chemical molecules using short ASCII strings (more about SMILES...).


  • Identify Same compounds

    After the chemical structures are validated and standardized, the PubChem data processing procedure identifies the compounds that are the same in the following ways:

    • Same, Connectivity - Molecules that have the same regular chemical connectivity, ignoring isotopes and stereochemistry.


    • Same, Stereochemistry - Molecules that have the same connectivity and stereochemistry, ignoring isotopes.


    • Same, Isotopes - Molecules that have the same connectivity and isotopes, ignoring stereochemistry.


    • Same Parent - Molecules that have the same "parent" compound.
      A parent is conceptually the "important" part of the molecule when the molecule has more than one covalent component. Specifically, a parent component must have at least one carbon and contain at least 70% of the heavy (non-hydrogen) atoms of all the unique covalent units (ignoring stoichiometry). Note that this is a very empirical definition and is subject to change.

      For example, the "parent" compound in tetracycline hydrochloride (CID 54704426) and tetracycline metaphosphate (CID 54729668) is tetracycline (CID 54675776).

      The PubChem data processing procedures identify all compounds that have the same parent and allow easy access to that data set through the "same parent" link that appears on the compound summary pages. This makes it easy to find, for example, variants of a medication composed of an active chemical and a salt.
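      The parent rule described above (at least one carbon, and at least 70% of the heavy atoms across all unique covalent units) can be sketched as follows. The function names and the dictionary-based input format are hypothetical, chosen only for illustration; as the text notes, the rule itself is empirical and subject to change.

```python
def heavy_atom_count(formula):
    """Count non-hydrogen atoms in an element-count mapping."""
    return sum(n for element, n in formula.items() if element != "H")


def find_parent(units, threshold=0.70):
    """Return the name of the parent unit, or None if no unit qualifies.

    `units` maps a name to the element counts of each unique covalent unit,
    so a unit that occurs several times is listed once (stoichiometry ignored).
    """
    total = sum(heavy_atom_count(formula) for formula in units.values())
    for name, formula in units.items():
        has_carbon = formula.get("C", 0) >= 1
        if has_carbon and total and heavy_atom_count(formula) / total >= threshold:
            return name
    return None


# Tetracycline hydrochloride: tetracycline (C22H24N2O8) plus hydrogen chloride.
tetracycline_hcl = {
    "tetracycline": {"C": 22, "H": 24, "N": 2, "O": 8},
    "hydrogen chloride": {"H": 1, "Cl": 1},
}
print(find_parent(tetracycline_hcl))  # tetracycline (32 of 33 heavy atoms)
```

      Because the threshold is above 50%, at most one unit can qualify, so the order in which units are checked does not matter.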
  • Gather and validate chemical name synonyms

    The next step in PubChem data processing, after the identification of unique chemical structures and same compounds, is to gather and validate all of the synonyms that have been used for those molecules.

    Various data depositors might use different terms to refer to the same chemical structure.

    An individual PubChem Substance record shows only the synonyms that were provided by the depositor of that record. Different sets of synonyms might be provided by different submitters for the same molecule. The complete set of the synonyms that have been provided by all depositors of a particular chemical structure is referred to as the "unfiltered" list of synonyms. (For example, see the various synonyms provided by submitters of individual PubChem Substance records for ibuprofen (shown in the "Identification" section of each record), or view the total, unfiltered list of synonyms that depositors have used for ibuprofen.)

    The corresponding PubChem Compound record shows a "filtered" list of synonyms, derived from all PubChem Substance records containing the same structure, that have been found to consistently refer to that specific chemical structure. (For example, see the filtered list of synonyms in the PubChem Compound record for ibuprofen (CID 3672), in the "Identification" section of that record.)

    The "filtered" list of synonyms is created in the following way:


    1. All depositor-supplied synonyms for a given chemical structure are gathered from the corresponding PubChem Substance records to create the complete, "unfiltered" list of synonyms for that molecule.

    2. PubChem data processing identifies the subset of depositor-supplied synonyms that have been used consistently and only for the given chemical structure and its isotopes, stereoisomers, and tautomers. (Any synonym that has been used for two or more different chemical structures in the PubChem Substance database is filtered out of the list.) The resulting subset of consistent depositor-supplied synonyms represents the "filtered" list, which appears in the PubChem Compound record for the given chemical structure.

    3. Each synonym is given a score that determines the order in which the synonyms are shown. The score takes into account the frequency, readability, and consistency of each synonym:

      • Frequency - the number of times a synonym is provided by depositors for a particular chemical structure. Most commonly used synonym(s) show first.
      • Readability - The readability score is determined by the size of the synonym, the count of non-alphabetic characters, and capitalization, etc., so easily readable synonyms (e.g., ibuprofen) are shown before chemical names (e.g., 2-[4-(2-methylpropyl)phenyl]propanoic acid).
      • Consistency - The PubChem Substance records assigned to a synonym need to be consistent at any of the following levels (high to low): exact same structure, same stereo form, same connectivity, same parent structure, same parent stereo form, or same parent connectivity.
      • Equation - The equation used to determine the score of a synonym (the "clean synonym weight") is:
             59 * log((8- "synonym consistency level") * "synonym readability score" * "synonym frequency")

    4. A MeSH tree icon appears beside any synonym that has an exact match to a term in the National Library of Medicine's Medical Subject Headings (MeSH) database.
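    The filtering and scoring steps above can be sketched as follows. This is an illustration under stated assumptions, not PubChem's implementation: the function names and placeholder data are hypothetical, the logarithm is assumed to be the natural log (the base is not stated above), and consistency levels are assumed to be numbered 1 (exact same structure) through 6 (same parent connectivity).

```python
import math
from collections import defaultdict


def filter_synonyms(substances):
    """Keep synonyms that depositors used for exactly one structure group.

    `substances` is a list of (structure_group, synonyms) pairs, where the
    group key stands for a structure together with its isotopes,
    stereoisomers, and tautomers.
    """
    groups_by_synonym = defaultdict(set)
    for group, synonyms in substances:
        for synonym in synonyms:
            groups_by_synonym[synonym].add(group)
    return {s for s, groups in groups_by_synonym.items() if len(groups) == 1}


def clean_synonym_weight(consistency_level, readability, frequency):
    """59 * log((8 - level) * readability * frequency); natural log assumed."""
    return 59 * math.log((8 - consistency_level) * readability * frequency)


# Placeholder records: "shared name" is used for two different structures.
substances = [
    ("structure A", ["name one", "shared name"]),
    ("structure A", ["name one", "name two"]),
    ("structure B", ["name three", "shared name"]),
]
print(sorted(filter_synonyms(substances)))  # ['name one', 'name three', 'name two']
```

    With the other inputs held equal, a synonym at a more consistent level (a lower level number under the assumption above) receives a higher weight, as does a synonym that depositors supply more often.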

Additional notes about synonyms:


Some PubChem Compound records do not show a list of depositor-supplied synonyms (e.g., CID 444098). This usually means there are no depositor-supplied synonyms that are consistently used only for this chemical structure. If desired, you can see the synonyms that data depositors have used for this structure, using either approach below:

  • retrieve the PubChem Substance records that have the "same structure" and view the synonyms that individual depositors used for the chemical structure.

    OR

  • view the complete "unfiltered" list of synonyms found in all of the PubChem Substance records that have the "same structure" by inserting the CID of interest into the following URL format:

    http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?q=nama&cid=_____&namedisopt=Unfiltered

    For example:

    http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?q=nama&cid=444098&namedisopt=Unfiltered
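    The URL pattern above can also be filled in programmatically. A minimal sketch using only the Python standard library follows; the helper name unfiltered_synonyms_url is hypothetical, and the query parameters are copied from the format shown above.

```python
from urllib.parse import urlencode

BASE_URL = "http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi"


def unfiltered_synonyms_url(cid):
    """Build the unfiltered-synonym URL for a CID using the format shown above."""
    query = urlencode({"q": "nama", "cid": int(cid), "namedisopt": "Unfiltered"})
    return f"{BASE_URL}?{query}"


print(unfiltered_synonyms_url(444098))
```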
Note that some of the synonyms in the "unfiltered" list have also been used by various depositors for *other* chemical structures in the PubChem Substance database. That is why those synonyms are removed from the "filtered" list that appears in a PubChem Compound record.

The molecule name shown at the top of a PubChem summary page is selected from the list of synonyms in the record. The molecule name at the top of a PubChem Substance summary page is generally the synonym that was listed first by the data depositor in their submitted record. The molecule name at the top of a PubChem Compound summary page is generally the highest scoring term from the filtered list of depositor-supplied synonyms. A synonym that matches a MeSH term, however, is given priority in both cases.


  • Compute chemical and physical properties

    The following properties are computed for the 2D chemical structure and the 3D conformer of each unique, standardized chemical structure in the PubChem Compound database:

    2D compound properties:

    • Molecular Weight
    • Molecular Formula
    • XLogP
    • H-Bond Donor
    • H-Bond Acceptor
    • Rotatable Bond Count
    • Exact Mass
    • MonoIsotopic Mass
    • Topological Polar Surface Area
    • Heavy Atom Count
    • Formal Charge
    • Complexity
    • Isotope Atom Count
    • Defined Atom Stereocenter Count
    • Undefined Atom Stereocenter Count
    • Defined Bond Stereocenter Count
    • Undefined Bond Stereocenter Count
    • Covalently-Bonded Unit Count

    3D conformer properties:

    • Feature 3D Acceptor Count
    • Feature 3D Anion Count
    • Feature 3D Ring Count
    • Effective Rotor Count
    • Conformer Sampling RMSD
    • CID Conformer Count

    Properties from HSDB:

    If the compound is represented in the Hazardous Substances Data Bank (HSDB), properties that have been noted in the corresponding HSDB record are also annotated on the PubChem Compound record, such as:


    • Color/Form
    • Odor
    • Melting Point
    • Density/Specific Gravity
    • Dissociation Constants
    • Octanol/Water Partition Coefficient
    • Solubilities
    • Spectral Properties
    • Vapor Pressure
    • Other Chemical/Physical Properties

    The section of this document on biomedical annotations describes the method by which a PubChem Compound record is associated with an HSDB record.

  • Identify Similar molecules (structure clustering for compounds/substances)

    After the chemical structures are standardized, and after 3D conformers are generated for each molecule, the PubChem data processing procedure identifies similar molecules in two different ways:


    1. chemical analogs ("similar compounds"), using the Tanimoto score calculated from the 2D structure fingerprint

    2. 3D similarity ("similar conformers"), based on shape/feature and pharmacophore complementarity.

    You can use the "similar compounds" and "similar conformers" links on a PubChem summary page to retrieve molecules that have 2D or 3D similarity, respectively, to a compound of interest. (A separate part of this document provides a tip on how to find compounds that are similar in 3D but not 2D structure to a compound of interest.)

    More details about the methods used to calculate each type of similarity are below:


    1. Chemical analogs:  "Similar Compounds"

      After chemical structures are validated and standardized for the PubChem Compound database, molecules that have a similar chemical structure are identified using the method described below. The "similar compounds" link on a PubChem Compound summary page will retrieve all compounds that have a Tanimoto similarity score of 90% or above. If you want to find compounds at a different similarity threshold, you can use the PubChem structure search page.


        Similarity links are pre-computed in PubChem from a dictionary-based fingerprint, at a 90% threshold, using the Tanimoto score equation:

      Tanimoto = AB / ( A + B - AB )

      Where:

      Tanimoto is the Tanimoto score, a fraction between 0 and 1.
      AB is the count of bits set after a bit-wise AND (&) of fingerprints A and B
      A is the count of bits set in fingerprint A
      B is the count of bits set in fingerprint B
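      The equation above can be evaluated directly on bit fingerprints. The sketch below stores each fingerprint as a Python integer with one bit per substructure key; that representation, the 8-bit example values, and the function name are assumptions made for illustration (PubChem fingerprints have 881 keys).

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto score of two fingerprints stored as Python ints (one bit per key)."""
    a = bin(fp_a).count("1")          # bits set in fingerprint A
    b = bin(fp_b).count("1")          # bits set in fingerprint B
    ab = bin(fp_a & fp_b).count("1")  # bits set in both (bit-wise AND)
    if a + b - ab == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return ab / (a + b - ab)


fp_a = 0b11110000
fp_b = 0b11010010
print(tanimoto(fp_a, fp_b))  # 3 / (4 + 4 - 3) = 0.6
```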


      Each similarity link is equivalent to a chemical structure similarity search of the PubChem Compound database yielding all chemical structures with a Tanimoto score that is 90% or above.

      In addition to the Tanimoto equation above, PubChem uses a "boost" scheme that assigns a similarity score of:

      104% to structures with identical stereo, isotope, and connectivity.
      103% to structures with identical connectivity and either stereo or isotope.
      102% to structures with identical connectivity.
      101% to structures that are tautomers of the query.


      The cases of "boosted" scores greater than 101% correspond to cases that originally would have had a score of 100% similarity. However, in the case where tautomers get an artificial score of 101%, their natural score could be much lower, sometimes as low as 60%, especially for small compounds where the tautomeric system is a large part of the structure.
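      The boost scheme can be read as a set of overrides applied on top of the raw Tanimoto score. The sketch below assumes the relationships between query and hit (same connectivity, stereo, isotope, tautomer) have already been determined elsewhere; the function and parameter names are hypothetical.

```python
def boosted_similarity(raw_tanimoto, same_connectivity=False, same_stereo=False,
                       same_isotope=False, is_tautomer=False):
    """Return the boosted score (as a fraction) following the scheme above."""
    if same_connectivity and same_stereo and same_isotope:
        return 1.04
    if same_connectivity and (same_stereo or same_isotope):
        return 1.03
    if same_connectivity:
        return 1.02
    if is_tautomer:
        return 1.01  # replaces the raw score, which may be much lower
    return raw_tanimoto


print(boosted_similarity(0.60, is_tautomer=True))                          # 1.01
print(boosted_similarity(1.00, same_connectivity=True, same_stereo=True))  # 1.03
print(boosted_similarity(0.93))                                            # 0.93
```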

      There are 881 substructure-keys (skeys) in each fingerprint. Each bit in the fingerprint represents the presence (or absence) of a particular chemical substructure (e.g., a carboxylic acid) or a particular count of the same. These skeys are similar in nature to the well-known MDL MACCS skeys fingerprints.

       


    2. 3D similarity by shape/feature and pharmacophore complementarity:   "Similar Conformers"

      After 3D conformers are generated for the molecules in the PubChem Compound database, representative conformers are compared to each other in order to identify molecules that have a similar 3D shape ("similar conformers"), regardless of whether their 2D chemical structures are similar.

      The method for identifying similar conformers is described in:

      Bolton EE, Kim S, Bryant SH. PubChem3D: Similar conformers. J Cheminform. 2011 May 9;3(1):13. doi:10.1186/1758-2946-3-13. [PubMed PMID: 21554721] [Free Full Text on J Cheminform]

      and

      Kim S, Bolton EE, Bryant SH. PubChem3D: Biologically relevant 3-D similarity. J Cheminform. 2011 Jul 22;3(1):26. doi:10.1186/1758-2946-3-26. [PubMed PMID: 21781288] [Free Full Text on J Cheminform]

      Both articles are part of the PubChem3D thematic series with the (BMC) Journal of Cheminformatics: http://www.jcheminf.com/series/pubchem3d.
      As explained by Kim et al.:

      • PubChem3D uses two 3-D similarity measures: shape-Tanimoto (ST) and color-Tanimoto (CT).
        • The ST score is a measure of shape similarity.
        • The CT score quantifies the similarity of 3-D orientation of functional groups used to define pharmacophores (henceforth referred to simply as "features") between conformers by checking the overlap of fictitious "color" atoms used to represent the six functional group types: hydrogen-bond donors, hydrogen-bond acceptors, cations, anions, hydrophobes, and rings.
      • The PubChem "Similar Conformers" 3-D neighboring requires ST^(ST-opt) ≥ 0.8 and CT^(ST-opt) ≥ 0.5 for two molecules to become neighbors of each other, where the superscript "ST-opt" indicates that both scores are evaluated at the shape-optimized superposition.
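      The neighboring criterion reduces to a pair of threshold checks. The sketch below treats 0.8 and 0.5 as minimum values and assumes both scores were computed for the shape-optimized superposition; the constant and function names are hypothetical.

```python
ST_THRESHOLD = 0.8  # shape-Tanimoto minimum
CT_THRESHOLD = 0.5  # color-Tanimoto minimum


def are_conformer_neighbors(st_score, ct_score):
    """True when both 3-D similarity scores meet their thresholds."""
    return st_score >= ST_THRESHOLD and ct_score >= CT_THRESHOLD


print(are_conformer_neighbors(0.85, 0.60))  # True
print(are_conformer_neighbors(0.85, 0.40))  # False: features differ too much
```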

      The "similar conformers" link on a PubChem Compound summary page will therefore retrieve all compounds that have 3D similarity, as identified using this approach. If you want to view the similar conformers in a particular sort order (e.g., by shape then feature similarity, or by feature similarity then shape), or if you want to see only a subset of similar conformers that have specific properties, you can visit the PubChem structure search page and use the "3D Conformer" tab.

      A separate section of this document provides a tip on how to find compounds that are similar in 3D but not 2D structure.


  • Associate biomedical annotations from various resources with a compound

    Annotation sources:   MeSH | Other Classification Ontologies | DailyMed | HSDB | ChemIDplus | DrugBank | Structure | BioSystems | PubChem BioAssay

    A wide range of information may exist for a compound, in the literature and in external databases, beyond the information that has been provided in individual PubChem Substance records. To facilitate access to that information, the PubChem data processing procedures use the methods described below to associate chemical structures with external data sources, and to insert the information and links in the corresponding PubChem Compound record. The source of any information inserted in this way is noted in the lower right hand corner of the grey-bordered box that surrounds an information block. (Note that the PubChem Classification Browser can be used to browse/retrieve substances and compounds using a variety of classification ontologies, including MeSH and other ontologies noted below).


    • Medical Subject Headings (MeSH)

      The National Library of Medicine (NLM)'s Medical Subject Headings (MeSH) is a controlled vocabulary thesaurus of medical terms that is arranged in both an alphabetic and a hierarchical structure. It is used for indexing literature from thousands of the world's leading biomedical journals for the MEDLINE®/PubMed® database, and for cataloging medical books, documents, and audiovisual materials, in order to facilitate retrieval of medical information at various levels of specificity.

      Because MeSH terms provide a portal to a wealth of medical information about the compounds represented in PubChem, the following method is used to associate PubChem Compound records (or more technically, their CIDs) with corresponding MeSH terms:

      1. Start with the filtered list of synonyms in a PubChem Compound record:


        • Synonyms that have been used consistently and only for the given chemical structure and its isotopes, stereoisomers, and tautomers are retained.
        • Any synonym that has been used for two or more different chemical structures in the PubChem Substance database is filtered out of the list.

      2. Identify the subset of filtered synonyms that match a MeSH term.


      3. Identify the synonym (and therefore MeSH term) from that subset that has been most frequently used by depositors of PubChem Substance records containing the "same structure."
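      The three steps above can be sketched as follows. The function name, the exact-string comparison, and the placeholder names and counts are assumptions made for illustration; how PubChem normalizes terms before matching is not described here.

```python
def assign_mesh_term(synonym_frequency, mesh_terms):
    """Pick the MeSH-matching filtered synonym that depositors used most often.

    `synonym_frequency` maps each filtered synonym to the number of substance
    records (with the same structure) that supplied it.
    """
    matches = {s: n for s, n in synonym_frequency.items() if s in mesh_terms}
    if not matches:
        return None  # no filtered synonym matches a MeSH term
    return max(matches, key=matches.get)


# Placeholder data: "name b" is the most frequent synonym but is not a MeSH term.
filtered = {"name a": 12, "name b": 30, "name c": 7}
mesh = {"name a", "name c"}
print(assign_mesh_term(filtered, mesh))  # name a
```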


      Once a MeSH term is assigned to a PubChem Compound record (i.e., to a CID), the following additional information is linked to the PubChem Compound record:


      • All PubMed records tagged with that MeSH term are linked to the CID and accessible from the "Literature: NLM Curated PubMed Citations" section of the PubChem Compound record.

        Note: The "Literature: Depositor Provided PubMed Citations" section of the PubChem Compound record is a concatenated list of all PubMed records that have been cited by the depositors of all PubChem Substance records that contain the same chemical structure as the compound (including same connectivity, isotopes and stereochemistry).
        The "NLM Curated PubMed Citations" and "Depositor Provided PubMed Citations" might have some -- though not necessarily complete -- overlap. Each set of PubMed records might contain items that are not present in the other set.
      • Some MeSH headings have pharmacological actions associated with them. For example, the MeSH heading "aspirin" is associated with the MeSH term "cyclooxygenase inhibitors." If a MeSH term has been associated with a CID, and that MeSH term has one or more pharmacological actions associated with it, all of the associated pharmacological actions are inserted into the "Pharmacology" section of the PubChem Compound record.


      If a block of information on a PubChem Substance/Compound summary page was derived from MeSH, the information source (MeSH) is noted in the lower right hand corner of the grey-bordered box that surrounds it.



    • Other Classification Ontologies

      Some depositors of PubChem Substance records, such as ChEBI and KEGG, maintain ontologies of terms that describe the substances they deposit into PubChem. If substance records from those depositors are associated with corresponding PubChem Compound records (because they contain the same chemical structure, including same connectivity, isotopes, and stereochemistry), then the classification information from the substance records is added to the "Classification: Ontologies" section of the PubChem Compound record. The PubChem Classification Browser can be used to browse/retrieve substances and compounds using a variety of classification ontologies.



    • DailyMed

      The National Library of Medicine (NLM)'s DailyMed resource provides high quality information about marketed drugs, including FDA labels (package inserts).

      Medication information from DailyMed is displayed in a PubChem record by identifying connections between DailyMed -> MeSH -> PubChem Compound records. Specifically:

      1. If the drug name in a DailyMed record has an exact match to a MeSH term, or to any of the MeSH term's synonyms, then a connection is made between the DailyMed record and the MeSH term.

        If the drug name in a DailyMed record does not match any MeSH term, then each name in the drug's list of active ingredients is compared to MeSH. If an exact match is found, then a connection is made between the DailyMed record and the MeSH term.


      2. If that MeSH term has been annotated on any PubChem Compound records, then a connection is made between the PubChem Compound record and the DailyMed record.


      3. If one or more DailyMed records map to the same MeSH term, then links to all of those DailyMed records will appear in the "Use and Manufacturing: Medication Information" section of the PubChem Compound record(s) that have been annotated with that MeSH term.
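      The DailyMed -> MeSH -> PubChem Compound chain above amounts to two lookups. The sketch below is an illustration only: the record layout, the dictionary names, and the DailyMed identifiers are hypothetical, and exact string matching stands in for whatever comparison the production pipeline uses.

```python
from collections import defaultdict


def link_dailymed_to_cids(dailymed_records, mesh_names, cids_by_mesh_term):
    """Map each CID to the DailyMed record ids that reach it through MeSH.

    dailymed_records: dicts with "id", "drug_name", and "active_ingredients"
    mesh_names: maps a MeSH term or one of its synonyms to the MeSH term
    cids_by_mesh_term: maps a MeSH term to the CIDs annotated with it
    """
    links = defaultdict(list)
    for record in dailymed_records:
        # Step 1: try the drug name first, then fall back to the ingredients.
        if record["drug_name"] in mesh_names:
            terms = {mesh_names[record["drug_name"]]}
        else:
            terms = {mesh_names[name] for name in record["active_ingredients"]
                     if name in mesh_names}
        # Steps 2 and 3: attach the record to every CID annotated with a term.
        for term in terms:
            for cid in cids_by_mesh_term.get(term, []):
                links[cid].append(record["id"])
    return dict(links)


records = [
    {"id": "dm-1", "drug_name": "Aspirin", "active_ingredients": ["Aspirin"]},
    {"id": "dm-2", "drug_name": "Example Brand", "active_ingredients": ["Aspirin"]},
]
print(link_dailymed_to_cids(records, {"Aspirin": "Aspirin"}, {"Aspirin": [2244]}))
```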


      If a block of information on a PubChem Substance/Compound summary page was derived from DailyMed, the information source (DailyMed) is noted in the lower right hand corner of the grey-bordered box that surrounds it.


    • Hazardous Substances Data Bank (HSDB)

      The National Library of Medicine (NLM)'s Hazardous Substances Data Bank (HSDB) provides comprehensive, peer-reviewed toxicology data for about 5,000 chemicals. Information from HSDB is displayed in a PubChem record if there is a match between the molecule names in the HSDB and PubChem records. The name matching is done using the following method:


      • The synonyms present in an HSDB record are filtered in a similar way to the synonyms shown in a PubChem Compound record:
        • Synonyms that have been used consistently and only for the given chemical structure and its isotopes, stereoisomers, and tautomers are retained.
        • Any synonym that has been used for two or more different chemical structures in the PubChem Substance database is filtered out of the list.

      • If there is a match between any filtered synonyms in the PubChem Compound and HSDB records:
        • A link between the records is created.
        • Information from the HSDB record is then displayed in the PubChem Compound record as biological annotations.
        • The annotations from HSDB can include, for example:
          • Methods of Manufacturing
          • Formulations/Preparations
          • Therapeutic Uses
          • Mechanism of Action
          • Toxicity Summary
          • Reactivities and Incompatibilities
          • Decomposition
          • Environmental Fate
          • Bioconcentration
          • OSHA Standards
          • Threshold Limit Values
          • and more...
            (See the section of this document on "categories of information, as available, for a molecule" to see additional types of information imported from HSDB into PubChem.)

      • If a block of information on a PubChem Substance/Compound summary page was derived from HSDB, the information source (HSDB) is noted in the lower right hand corner of the grey-bordered box that surrounds it.


    • ChemIDplus

      The National Library of Medicine (NLM)'s Chemical Identification Plus Database (ChemIDplus) resource is an online dictionary of chemicals, including names, synonyms, and chemical structures.

      ChemIDplus is a depositor of records into the PubChem Substance database. If a substance record from ChemIDplus is associated with a corresponding PubChem Compound record (because they contain the same chemical structure, including same connectivity, isotopes and stereochemistry), then the safety and toxicology information from the ChemIDplus record is added to the PubChem Compound record.

      If a block of information on a PubChem Substance/Compound summary page was derived from ChemIDplus, the information source (ChemIDplus) is noted in the lower right hand corner of the grey-bordered box that surrounds it.



    • DrugBank

      The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information.

      DrugBank is a depositor of records into the PubChem Substance database. If a substance record from DrugBank is associated with a corresponding PubChem Compound record (because they contain the same chemical structure, including same connectivity, isotopes and stereochemistry), then the interaction information from the DrugBank record is added to the "Biomolecular Interactions and Pathways" section of the PubChem Compound record.

      If a block of information on a PubChem Substance/Compound summary page was derived from DrugBank, the information source (DrugBank) is noted in the lower right hand corner of the grey-bordered box that surrounds it.



    • Structure

      The NCBI's Structure database, also known as the Molecular Modeling Database (MMDB), is a depositor of records into the PubChem Substance database. It contains experimentally resolved 3D structures of proteins, RNA, and DNA, derived from the Protein Data Bank (PDB), with value-added features such as explicit chemical graphs, interactive views of the biomolecule's biologically active form ("biological unit"), interaction schematics that depict the contacts among the molecular components, as well as links to similar 3D structures, similar sequences, information about chemicals bound to the structures, literature, and more.

      Many of the experimentally resolved 3D structures include biomolecules such as protein, DNA, or RNA bound to small molecules. In such cases, the small molecule data are extracted from the 3D structure record and deposited into the PubChem Substance database, with a link back to the original structure record from which they came.

      If a substance record from the Structure database is associated with a corresponding PubChem Compound record (because they contain the same chemical structure, including the same connectivity, isotopes, and stereochemistry), then the "Biomolecular Interactions and Pathways" section of the PubChem Compound record will contain a thumbnail image and a link to the original 3D structure record.

      If a block of information on a PubChem Substance/Compound summary page was derived from the Structure database, the information source (Structure) is noted in the lower right-hand corner of the grey-bordered box that surrounds it.



    • BioSystems

      A biosystem is a group of molecules that interact in a biological system, such as a pathway, complex, or disease. The NCBI's BioSystems database provides integrated access to biological systems and their component genes, proteins, and small molecules, as well as literature describing those biosystems and other related data throughout Entrez.

      If a small molecule component of a biosystem is also present in a PubChem Substance and/or PubChem Compound record, then a link is made between the PubChem record and the biosystem.

      Specifically, the BioSystems data processing procedure identifies associations between biosystems and PubChem records as follows:


      • BioSystem records from source databases are parsed for small molecule identifiers, including PubChem Compound IDs (CIDs), PubChem Substance IDs (SIDs), and external registry names, such as local identifiers assigned to a substance by the source database. The types of links made between BioSystems and PubChem records depend on the types of identifiers found:
        • If SIDs are present in the source record, links are established to the corresponding PubChem Substance records and to associated CIDs in PubChem Compound.
        • If CIDs are present in the source record, links to the corresponding PubChem Compound records are made (however, the links are not extended to associated PubChem Substances).
        • If external registry names are present, those identifiers are mapped to the corresponding SIDs and links are made to those records in PubChem Substance as well as to associated CIDs in PubChem Compound.
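
      The three linking rules above can be sketched in a few lines of Python. This is only an illustration of the logic as described here, not the actual BioSystems processing code: the lookup tables, the source name "ExampleDB", the function name link_biosystem, and all ID values are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical lookup tables; every ID below is made up for illustration.
SID_TO_CID = {101: 11, 102: 11, 103: 12}            # substance -> associated compound
REGISTRY_TO_SID = {("ExampleDB", "C00001"): 103}    # (source, registry name) -> SID


@dataclass
class Links:
    sids: set = field(default_factory=set)   # links to PubChem Substance records
    cids: set = field(default_factory=set)   # links to PubChem Compound records


def link_biosystem(sids=(), cids=(), registry_names=(), source="ExampleDB"):
    """Apply the three identifier rules to one biosystem source record."""
    links = Links()

    # Rule 3: external registry names are first mapped to SIDs ...
    resolved_sids = list(sids)
    for name in registry_names:
        sid = REGISTRY_TO_SID.get((source, name))
        if sid is not None:
            resolved_sids.append(sid)

    # Rules 1 and 3: SIDs link to Substance records and to associated CIDs.
    for sid in resolved_sids:
        links.sids.add(sid)
        cid = SID_TO_CID.get(sid)
        if cid is not None:
            links.cids.add(cid)

    # Rule 2: CIDs link to Compound records only; the links are not
    # extended to the substances associated with those compounds.
    links.cids.update(cids)
    return links
```

      In this sketch, a record that lists only a CID produces no substance links, whereas a record that lists an SID or a registry name produces both a substance link and a compound link.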

      If a block of information on a PubChem Substance/Compound summary page was derived from the BioSystems database, the information source (BioSystems) is noted in the lower right-hand corner of the grey-bordered box that surrounds it.



    • PubChem BioAssay

      The PubChem BioAssay database contains bioactivity screens of chemical substances described in PubChem Substance, providing information such as the following for a given chemical:


      • bioactivity outcomes (active, inactive, inconclusive, unspecified)
      • molecular targets (proteins and/or genes)
      • bioactivity data (IC50, EC50, Potency, Ki, etc.)

      The PubChem BioAssay database also provides a description of each bioassay, which may include the experiment's rationale/purpose, the relationship between the assay target and a biological process or disease state, and the assay protocols specific to that screening procedure. (As an example, see the description of bioassay AID 1575, "Summary assay for the identification of compounds that inhibit NOD1.") These descriptions are searchable directly in the BioAssay database.

      The "Biological Test Results" section of a PubChem Substance/Compound summary page contains an excerpt of the bioactivity data available for the chemical, and you can follow the link for "BioActivity Summary: This Compound" to see the rest of the data in the PubChem BioAssay database itself. (As an example of what you will see when you follow that link, see the PubChem BioAssay data summary for CID 3672, Ibuprofen.)

      Associations between PubChem Substance or Compound records and BioAssay data are made in the following way:

      Depositors of bioassays submit their data to two PubChem databases: (1) they submit biological activity test results into the PubChem BioAssay database, and (2) they submit the descriptions of the substances that were tested into the PubChem Substance database. A direct link is then made between each BioAssay record and its corresponding Substance records.

      In addition, if a substance record is associated with a corresponding PubChem Compound record (because they contain the same chemical structure, including the same connectivity, isotopes, and stereochemistry), then the links to the BioAssay data are also added to the "Biological Test Results" section of the PubChem Compound record.

      Therefore:

      • A PubChem Substance record will link only to the bioassays that are associated directly with that particular substance (i.e., with that particular SID).


      • A PubChem Compound record will link to all of the bioassays that tested any PubChem Substance (i.e., any SID) containing the same chemical structure (including the same connectivity, isotopes, and stereochemistry) as the PubChem Compound.
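
      The difference between the two cases can be illustrated with a small Python sketch. It assumes two hypothetical in-memory tables (which SIDs each bioassay tested, and which CID each SID is associated with); the AID, SID, and CID values and the function names are made up and do not refer to real PubChem records.

```python
# Hypothetical data; the identifiers below do not refer to real records.
ASSAY_TESTED_SIDS = {9001: [101, 102], 9002: [102], 9003: [103]}  # AID -> tested SIDs
SID_TO_CID = {101: 11, 102: 11, 103: 12}                          # SID -> associated CID


def assays_for_substance(sid):
    """Bioassays linked directly to one substance record (one SID)."""
    return sorted(aid for aid, sids in ASSAY_TESTED_SIDS.items() if sid in sids)


def assays_for_compound(cid):
    """Bioassays that tested any SID associated with the given CID."""
    aids = set()
    for sid, mapped_cid in SID_TO_CID.items():
        if mapped_cid == cid:
            aids.update(assays_for_substance(sid))
    return sorted(aids)
```

      With these example tables, the substance with SID 101 links to a single bioassay, while the compound with CID 11 collects the bioassays of both of its substances (SIDs 101 and 102).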

      If a block of information on a PubChem Substance/Compound summary page was derived from the BioAssay database, the information source (BioAssay) is noted in the lower right-hand corner of the grey-bordered box that surrounds it.

      Note: BioAssay data -- classification of gene/protein targets -- The PubChem Substance/Compound summary page displays an overview of the bioassay information available. The details about individual experiments are available on the corresponding PubChem BioAssay summary pages. Those pages also display the Gene Ontology (GO) classification of the gene/protein target(s) tested by the bioassay. The GO terms are associated with each gene/protein automatically as part of the NCBI BioSystems database data processing procedures, using the method described in the BioSystems help document. All GO terms that apply to the gene(s)/protein(s) tested by the bioassay are shown in the GO hierarchy, including: (1) biological processes, (2) cellular components, and (3) molecular functions. Clicking on any GO term in the hierarchy will retrieve all bioassays that have tested one or more proteins associated with that term. As an example, see the GO terms for the protein target that was tested by the glucocorticoid receptor (GR) redistribution assay (AID 450).
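
      As a rough illustration of the retrieval by GO term described in the note above, the following Python sketch looks up every bioassay whose target(s) carry a given term. The tables, protein names, and term labels are placeholders invented for this example; the sketch does not model the GO hierarchy itself or PubChem's actual implementation.

```python
# Hypothetical tables; protein names, term labels, and AIDs are placeholders.
ASSAY_TARGETS = {
    9001: ["protein_A"],
    9002: ["protein_A", "protein_B"],
    9003: ["protein_C"],
}
TARGET_GO_TERMS = {
    "protein_A": {"term_1", "term_2"},
    "protein_B": {"term_2"},
    "protein_C": {"term_3"},
}


def assays_for_go_term(term):
    """Return the AIDs of bioassays that tested at least one target annotated with term."""
    return sorted(
        aid
        for aid, targets in ASSAY_TARGETS.items()
        if any(term in TARGET_GO_TERMS.get(target, set()) for target in targets)
    )
```

      Here a bioassay is returned as soon as any one of its targets is annotated with the selected term, which mirrors the "one or more proteins" behavior described in the note.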

      The PubChem Classification Browser can be used to browse/retrieve substances and compounds using a variety of classification ontologies, including GO.


Reporting errors

If you notice an error in a PubChem record, we appreciate your feedback. Where to send a report depends on whether you are viewing a substance record or a compound record:

Substance records: PubChem does not have curators and never edits substance records. They remain as supplied by our depositors, just as with GenBank records. Therefore, please send error reports directly to the depositor. Contact information for each depositor is accessible from the PubChem Substance Data Source page. Once the depositor corrects the error, PubChem will incorporate the correction at the next update.

Compound records: If you notice any errors in PubChem compound records, such as in properties or descriptors, please send a report to the NCBI help desk: info@ncbi.nlm.nih.gov. The PubChem staff will then look into the error and make a correction at the next update.

 
References


Citing PubChem Resources:

Please refer to the PubChem Publications page if you are referencing the overall PubChem Substance, Compound, or BioAssay database, or the various PubChem tools. That page lists recommended citations as well as additional articles that have been written about the PubChem resources and how they can be used.

PubChem Data Usage and Citation Guidelines:

Please see the PubChem Data Usage and Citation Guidelines page for information about how to cite individual or multiple records from a PubChem database.


 Revised 21 March 2014