PubChem Classification Browser Help
 
 
 
BRIEF TABLE OF CONTENTS
 
What is the Classification Browser?
  Overview
  Illustrated examples of use
Input options
  Choose a classification
  Use the desired mode
    Browse
    Search
      by keyword
      by PubChem unique identifier
  Select display settings
Output
  Tree View
  List View
  Nodes
  Saving search results
  Classification Widget
Data Processing
  Method for applying classifications to PubChem
    Substance and Compound records
    BioAssay records
Log of changes to Classification Browser
References
 
 
 


 
 


 
What is the PubChem Classification Browser?

Overview

The PubChem Classification Browser allows you to browse the distribution of PubChem data among the nodes of a hierarchy of interest, providing an aggregate view of PubChem data. It also allows you to search for PubChem records annotated with a desired hierarchy or term, providing a quick way to find the subset of PubChem records associated with that term.

Note that the Browser operates only on the subset of PubChem records that have been annotated with terms from the available hierarchies, and not all PubChem records. The methods by which records are annotated with terms from the available hierarchies are described in a subsequent section of this document.

Examples of how the PubChem Classification Browser can be used

  1. Find PubChem Compounds classified as "Methotrexate"
  2. Find PubChem BioAssays that tested protein targets involved in "DNA repair"

1.  Find PubChem Compounds classified as "Methotrexate" *
[Figures: Tree View and List View of search results for PubChem Compounds that have been annotated with the MeSH term "Methotrexate."]
* This illustration shows the numbers of PubChem Compounds that have been annotated with the MeSH term "Methotrexate" as of October 2, 2012. The numbers of records retrieved by a live search will be different as new data are added to PubChem. The number of records retrieved by the term "methotrexate" may also vary from those shown here if a different classification system is searched, because the methods used to associate a particular classification term and PubChem record depend on the classification system and the PubChem data type. The data processing section of this document provides additional details. The Tree View and List View are also discussed in more detail in a separate section of this document.


2.  Find PubChem BioAssays that tested protein targets involved in "DNA repair"
[Figures: Tree View and List View of search results for PubChem BioAssays whose protein targets have been annotated with the Gene Ontology term "DNA repair."]
This illustration shows the numbers of PubChem BioAssay records whose protein targets have been annotated with the Gene Ontology (GO) term "DNA repair" as of October 2, 2012. The numbers of records retrieved by a live search will be different as new data are added to PubChem. The data processing section of this document describes the method by which BioAssay records are associated with terms from the Gene Ontology classification system.

 
Input Options


Choose a classification to explore

The classification hierarchies that are currently available in the Classification Browser are listed below. When the Browser first loads, the MeSH tree is displayed by default. You can select a different classification at any time by using the "Select classification" menu.

The "Select classification" menu lists all of the available hierarchies, and the "Data type counts to display" menu changes based on which hierarchy is selected. For example, if the MeSH classification is selected, the counts menu presents only the choices "None," "Substances," and "Compounds." If the Gene Ontology classification is selected, the counts menu presents only the choices "None" and "BioAssays." The table below summarizes the available classifications and the PubChem data types to which each classification has been applied.

Note: If you choose to search by PubChem unique identifier, the input type you select (CID, SID, or AID) determines which choices appear in the "Select classification" menu. Because BioAssay data have been annotated only with Gene Ontology (GO) terms, only that classification appears in the menu if you search by AID. Conversely, all of the other classifications, but not GO, appear if you search by CID or SID.
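
The rule in the note above can be summarized as a small lookup. The following Python sketch is illustrative only: CLASSIFICATIONS and classification_menu are hypothetical names, not part of any PubChem interface, and the MeSH default for CID and SID searches follows the descriptions under "Use the desired mode."

```python
# Hypothetical sketch of how the UID input type limits the "Select classification" menu.
# The names below are illustrative only; this is not a PubChem API.
CLASSIFICATIONS = (
    "MeSH", "ChEBI", "Gene Ontology (GO)", "KEGG",
    "LIPID MAPS", "WHO ATC Code", "WIPO",
)
GO = "Gene Ontology (GO)"


def classification_menu(input_type: str) -> tuple[list[str], str]:
    """Return (menu options, default selection) for a search by UID."""
    if input_type == "AID":
        # BioAssay records can be viewed only against Gene Ontology.
        return [GO], GO
    if input_type in ("CID", "SID"):
        # Compounds and substances: every classification except GO, MeSH by default.
        return [c for c in CLASSIFICATIONS if c != GO], "MeSH"
    raise ValueError(f"unsupported input type: {input_type!r}")
```

For example, classification_menu("AID") offers only Gene Ontology, while classification_menu("SID") offers the six remaining classifications with MeSH preselected.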

Available classifications (in the "Select Classification" menu) and the types of PubChem data they can be used to retrieve:

MeSH, ChEBI, Gene Ontology (GO), KEGG, LIPID MAPS, WHO ATC Code, WIPO (International Patent Classification (IPC))

Each classification system is listed below with a description and the types of PubChem data it can be used to retrieve.*
MeSH
The National Library of Medicine (NLM)'s Medical Subject Headings (MeSH) is a controlled vocabulary thesaurus of medical terms that is arranged in both an alphabetic and a hierarchical structure.

MeSH is used for indexing literature from thousands of the world's leading biomedical journals for the MEDLINE®/PubMed® database, and for cataloging medical books, documents, and audiovisual materials, in order to facilitate retrieval of medical information at various levels of specificity.

The MeSH classification tree includes a branch of terms for Chemicals and Drugs. PubChem uses an automated name-matching procedure to apply MeSH terms to chemical structures, enabling retrieval of substances and compounds through the Classification Browser.

The MeSH tree, as shown in the Classification Browser, also includes terms from the MeSH Supplementary Concept Records (SCR). The SCR is a separate thesaurus that contains approximately 200,000 chemical names in addition to those already in MeSH. Terms from the SCR are grouped under nodes named "Supplementary Records" in the Classification Browser. This enhancement greatly increased the number of PubChem records that can be browsed or retrieved with the MeSH classification; for example, the number of PubChem Compounds linked to MeSH increased from 12,983 to 82,812 (as of Oct. 19, 2012).

Can be used to retrieve PubChem: Substances, Compounds
ChEBI
The European Bioinformatics Institute (EBI)'s Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on "small" chemical compounds.

ChEBI is a depositor of small molecules into the PubChem Substance database. It maintains an ontology of terms used to describe the small molecules, and includes relevant terms in its deposited substance records. ChEBI terms are also inherited by PubChem Compounds that are identical in chemical structure (including the same connectivity, isotopes, and stereochemistry) to the ChEBI substances. This enables retrieval of substances and compounds through the Classification Browser using the ChEBI ontology.

Can be used to retrieve PubChem: Substances, Compounds
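
The inheritance step described for ChEBI (and in the same way for KEGG, LIPID MAPS, and WIPO) can be pictured as a union of terms over a substance-to-compound mapping. This Python sketch assumes that the "identical chemical structure" relationship is already available as a dictionary from SID to CID; the function name inherit_terms and the toy identifiers and terms are hypothetical.

```python
from collections import defaultdict


def inherit_terms(substance_terms, sid_to_cid):
    """Propagate depositor-supplied terms from substances to compounds.

    substance_terms: dict mapping SID -> set of terms in the deposited record.
    sid_to_cid: dict mapping SID -> CID of the structurally identical compound
                (same connectivity, isotopes, and stereochemistry).
    Returns a dict mapping CID -> set of inherited terms.
    """
    compound_terms = defaultdict(set)
    for sid, terms in substance_terms.items():
        cid = sid_to_cid.get(sid)
        if cid is None:
            # No structurally identical compound, so nothing is inherited.
            continue
        compound_terms[cid] |= terms  # union of terms from every matching substance
    return dict(compound_terms)


# Toy example with hypothetical identifiers and terms.
deposited = {101: {"term A"}, 102: {"term B"}, 103: {"term C"}}
print(inherit_terms(deposited, {101: 9, 102: 9}))
```

Because two substances map to the same compound here, that compound carries the union of both substances' terms.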
Gene Ontology (GO)
The Gene Ontology (GO) project is an initiative to standardize the representation of gene and gene product attributes across species and databases. It provides a controlled vocabulary of terms for describing gene product characteristics and gene product annotation data.

GO is a depositor of data into the NCBI BioSystems database. It maintains an ontology of terms that includes three branches: biological processes, cellular components, and molecular functions. Terms from each of those branches are associated with genes and proteins through the method described in the data processing section of the BioSystems help document.

If any of those genes or proteins tagged with GO terms are targets of a biological test in the PubChem BioAssay database, the bioassay record is also tagged with the GO term(s). This enables retrieval of bioassays through the Classification Browser using Gene Ontology.

In addition, all substances (SIDs) and compounds (CIDs) found to be active in the assay (AID) are also tagged with the GO term(s) that were applied to the assay.

Can be used to retrieve PubChem: Substances, Compounds, BioAssays
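
The two-step GO tagging described above (targets to assays, then assays to their active substances and compounds) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the input dictionaries, the function name propagate_go_terms, and the toy identifiers are all hypothetical, and the gene-to-GO step itself is taken as given.

```python
def propagate_go_terms(target_go, assay_targets, assay_actives):
    """Tag assays with their targets' GO terms, then tag the active records.

    target_go: dict mapping target (gene/protein) ID -> set of GO terms.
    assay_targets: dict mapping AID -> set of target IDs tested by the assay.
    assay_actives: dict mapping AID -> set of ("SID" or "CID", id) pairs active in it.
    Returns (aid_terms, record_terms).
    """
    aid_terms = {}
    record_terms = {}
    for aid, targets in assay_targets.items():
        terms = set()
        for target in targets:
            terms |= target_go.get(target, set())
        if not terms:
            continue  # no annotated target, so the assay receives no GO terms
        aid_terms[aid] = terms
        # Every substance or compound active in the assay inherits the assay's terms.
        for record in assay_actives.get(aid, ()):
            record_terms.setdefault(record, set()).update(terms)
    return aid_terms, record_terms


# Toy example with hypothetical identifiers.
aids, records = propagate_go_terms(
    {"geneA": {"DNA repair"}},
    {1: {"geneA"}, 2: {"geneB"}},
    {1: {("CID", 5), ("SID", 7)}, 2: {("CID", 8)}},
)
print(aids, records)
```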
KEGG
The Kyoto Encyclopedia of Genes and Genomes (KEGG), by the Kanehisa Laboratory of the Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan, is a database resource for understanding high-level functions and utilities of biological systems, such as the cell, the organism, and the ecosystem, from molecular-level information.

KEGG is a depositor of small molecules into the PubChem Substance database. It maintains an ontology of terms used to describe the small molecules, and includes relevant terms in its deposited substance records. KEGG terms are also inherited by PubChem Compounds that are identical in chemical structure (including the same connectivity, isotopes, and stereochemistry) to the KEGG substances. This enables retrieval of substances and compounds through the Classification Browser using the KEGG ontology.

Can be used to retrieve PubChem: Substances, Compounds
LIPID MAPS
LIPID Metabolites And Pathways Strategy (LIPID MAPS) is a multi-institutional effort to identify and quantitate, using a systems biology approach and sophisticated mass spectrometers, the lipid species in mammalian cells, and to quantitate the changes in these species in response to perturbation.

LIPID MAPS is a depositor of small molecules into the PubChem Substance database. It maintains an ontology of terms used to describe the small molecules, and includes relevant terms in its deposited substance records. LIPID MAPS terms are also inherited by PubChem Compounds that are identical in chemical structure (including the same connectivity, isotopes, and stereochemistry) to the LIPID MAPS substances. This enables retrieval of substances and compounds through the Classification Browser using the LIPID MAPS ontology.

Can be used to retrieve PubChem: Substances, Compounds
WHO ATC Code
The World Health Organization (WHO) Anatomical Therapeutic Chemical (ATC) classification system divides active substances into different groups according to the organ or system on which they act and their therapeutic, pharmacological, and chemical properties.

Drugs are classified in groups at five different levels. The 1st level divides drugs into fourteen main groups, and the 2nd level places them into pharmacological/therapeutic subgroups. The 3rd and 4th levels are chemical/pharmacological/therapeutic subgroups, and the 5th level is the chemical substance. The 2nd, 3rd, and 4th levels are often used to identify pharmacological subgroups when that is considered more appropriate than therapeutic or chemical subgroups. Substances classified in the same ATC 4th level cannot be considered pharmacotherapeutically equivalent, since their mode of action, therapeutic effect, drug interactions, and adverse drug reaction profile may differ.

Medicinal products are classified according to the main therapeutic use of the main active ingredient, on the basic principle of only one ATC code for each route of administration (i.e., pharmaceutical forms with similar ingredients and strength will have the same ATC code). Immediate- and slow-release tablets will normally have the same ATC code. A medicinal product can be given more than one ATC code if it is available in two or more strengths or routes of administration with clearly different therapeutic uses. Normally, different stereoisomeric forms will have separate ATC codes. Prodrugs are usually assigned separate ATC codes if the dosages used are different and/or the nonproprietary names of the prodrug and the active drug are different.

Can be used to retrieve PubChem: Substances, Compounds
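
The five ATC levels are nested prefixes of a full 7-character code. For example, in L01BA01 (methotrexate), L is the 1st level, L01 the 2nd, L01B the 3rd, L01BA the 4th, and L01BA01 the chemical substance. The following Python sketch splits a code into those levels; atc_levels is a hypothetical helper that only checks the shape of the code and does not validate it against the WHO index.

```python
import re

# Shape of a full (5th-level) ATC code: letter, 2 digits, 2 letters, 2 digits.
ATC_CODE = re.compile(r"[A-Z]\d{2}[A-Z]{2}\d{2}")
# Length of the code prefix that identifies each of levels 1 through 5.
LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_levels(code: str) -> list[str]:
    """Split a 5th-level ATC code into its five nested levels."""
    code = code.strip().upper()
    if not ATC_CODE.fullmatch(code):
        raise ValueError(f"not a 5th-level ATC code: {code!r}")
    return [code[:n] for n in LEVEL_LENGTHS]


print(atc_levels("L01BA01"))  # ['L', 'L01', 'L01B', 'L01BA', 'L01BA01']
```

Because each level is a prefix of the next, the nodes above a given substance in the ATC tree can be found without any lookup table.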
WIPO
The World Intellectual Property Organization (WIPO) International Patent Classification (IPC), established by the Strasbourg Agreement of 1971, provides a hierarchical system of language-independent symbols for classifying patents and utility models according to the different areas of technology to which they pertain.

PubChem uses the IPC classifications available from the EPO (European Patent Office).

The IPC divides technology into eight sections with approximately 70,000 subdivisions. Each subdivision has a symbol consisting of Arabic numerals and letters of the Latin alphabet. The appropriate IPC symbols are indicated on each patent document, of which more than 1,000,000 were issued each year in the last 10 years. The IPC symbols are allotted by the national or regional industrial property office that publishes the patent document. For PCT documents, IPC symbols are allotted by the International Searching Authority (ISA). The classification is indispensable for retrieving patent documents in the search for "prior art." Such retrieval is needed by patent-issuing authorities, potential inventors, research and development units, and others concerned with the application or development of technology.

The Substance-Patent associations are made by depositors. WIPO terms are also inherited by PubChem Compounds that are identical in chemical structure (including the same connectivity, isotopes, and stereochemistry) to the substances that depositors have associated with patents. This enables retrieval of substances and compounds through the Classification Browser using the WIPO classification.

Can be used to retrieve PubChem: Substances, Compounds, Patents

* The data processing section of this document describes the method by which terms from each classification hierarchy are associated with the PubChem data types noted above.
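
As a sketch of how an IPC symbol encodes the hierarchy described in the WIPO entry, the following Python function splits a symbol such as "A61K 31/00" into section, class, subclass, and group. The regular expression is an assumption that covers the common printed form (section letters A to H, main groups written with "/00"); it is not an official WIPO grammar, and parse_ipc and IpcSymbol are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumed printed form: section (A-H), 2-digit class, subclass letter,
# optional space, main group, "/", subgroup (main groups use "/00").
IPC_SYMBOL = re.compile(r"([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})")


class IpcSymbol(NamedTuple):
    section: str
    ipc_class: str
    subclass: str
    group: str
    is_main_group: bool


def parse_ipc(symbol: str) -> IpcSymbol:
    """Split an IPC symbol into its hierarchical components."""
    match = IPC_SYMBOL.fullmatch(symbol.strip())
    if match is None:
        raise ValueError(f"unrecognized IPC symbol: {symbol!r}")
    section, cls, sub, main, subgroup = match.groups()
    subclass = f"{section}{cls}{sub}"
    return IpcSymbol(
        section=section,
        ipc_class=f"{section}{cls}",
        subclass=subclass,
        group=f"{subclass} {main}/{subgroup}",  # normalized with a single space
        is_main_group=(subgroup == "00"),
    )


print(parse_ipc("A61K 31/00"))
```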


Use the desired mode

  • Browse -- Simply select the desired hierarchy from the pull-down menu to browse through its terms and/or view the distribution of PubChem data among its nodes.

    • Some classifications have subcategories. For example, Gene Ontology is subdivided into 3 subcategories: biological process, cellular component, and molecular function. If subcategories are available, then a second dropdown menu appears allowing users to choose a subcategory of interest. The Classification Browser displays a single subcategory at a time.

    • If desired, use the display settings to show the counts of PubChem records associated with the hierarchy. You can also hide zero count nodes in order to view only the subset of nodes that have links to PubChem data.

    • Note that PubChem record counts reflect the subset of PubChem records that have been explicitly annotated with terms from the selected hierarchy, and not all PubChem records.

    • The methods by which classification terms are annotated on PubChem records are described in the data processing section of this document.

  • Search -- As an alternative to browsing an entire classification tree, you can search for a specific keyword to find nodes in a classification that contain the keyword in their node name and/or node description, or search for a specific PubChem unique identifier (UID) to view the classification terms that have been annotated on a compound (CID), substance (SID), or bioassay (AID).

    • Keyword - Enter a term of interest in the search box to find the nodes in the selected hierarchy that contain your search string in the node name and/or node description. If desired, you can display the counts of PubChem records associated with those nodes and click on the resulting counts to retrieve the records.

      • Your search string can appear in the node name or in the node description.
        • For example, a search of Gene Ontology for "DNA repair" (without the quotes) will find entries such as "DNA repair" (GO:0006281), "regulation of DNA repair" (GO:0006282), and more. A search of MeSH for "DNA repair" (without the quotes) will retrieve the exact term "DNA repair," and it will also retrieve terms such as "DNA Polymerase beta," which contains the search string in its description. When searching Gene Ontology, choose to display counts for BioAssays in order to see the number of PubChem BioAssay records associated with the terms retrieved.

      • Some classifications have subcategories. For example, Gene Ontology is subdivided into 3 subcategories: biological process, cellular component, and molecular function. If subcategories are available, then a second dropdown menu appears allowing users to choose a subcategory of interest. Note that a search is executed only within the currently selected subcategory. To see if your search term exists in other subcategories of the hierarchy, do a separate search in each one.

      • By default, the results of a search are shown in "List" view rather than "Tree" view. This displays a non-redundant list of terms that contain your search string, even if a term appears in multiple branches of the selected hierarchy. If a term appears in multiple branches, click on the [+] to view all of the classifications of that term within the selected hierarchy.

      • If you toggle to "Tree" view using the display control at the top of the page, all nodes that contain your search string (as all or part of the node name, or as part of the node's description) will be underlined. A given term may appear multiple times in the "Tree" view if that term appears in multiple branches of the selected hierarchy.

      • If desired, use the display settings to show the counts of PubChem records associated with the hierarchy. You can also hide zero count nodes in order to view only the subset of nodes that contain your search term AND have links to PubChem data.

    • PubChem unique identifier - Enter a compound identifier (CID), substance identifier (SID), or bioassay identifier (AID) to view the classification terms that have been annotated on the compound, substance, or bioassay, respectively. Note that the input type you select (CID, SID, or AID) will determine what choices appear in the "Select classification" menu; as noted in the table listing "available classifications and the types of PubChem data they can be used to retrieve," most of the classification systems have been applied to PubChem compounds and substances, and only one classification (GO) is available for BioAssay data.

      • CID - Use this option to enter a PubChem compound identifier (CID) of interest and view the classification terms with which it has been associated. The "Select Classification" menu will be set to "MeSH" by default, and you can change to another classification system at any time. The data processing section of this document describes the method by which classification terms have been applied to PubChem compounds.

      • SID - Use this option to enter a PubChem substance identifier (SID) of interest and view the classification terms with which it has been associated. The "Select Classification" menu will be set to "MeSH" by default, and you can change to another classification system at any time. The data processing section of this document describes the method by which classification terms have been applied to PubChem substances.

      • AID - Use this option to enter a PubChem assay identifier (AID) of interest and view the GO classification terms that have been applied to the gene/protein target tested by the bioassay. The data processing section of this document describes the method by which classification terms have been applied to PubChem bioassays.
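
The keyword behavior described above (matching the node name or description, and searching only within the selected subcategory) can be sketched in Python as follows. Case-insensitive substring matching is an assumption about the Browser's behavior, Node and keyword_search are hypothetical names, and the "MF-example" node with its placeholder description is invented purely to show that a description match counts and that other subcategories are not searched.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    name: str
    description: str = ""


def keyword_search(subcategories, selected, query):
    """Return IDs of nodes in the selected subcategory whose name or
    description contains the query (assumed case-insensitive substring match)."""
    q = query.casefold()
    return [
        node.node_id
        for node in subcategories[selected]
        if q in node.name.casefold() or q in node.description.casefold()
    ]


go = {
    "biological process": [
        Node("GO:0006281", "DNA repair"),
        Node("GO:0006282", "regulation of DNA repair"),
        Node("GO:0006260", "DNA replication"),
    ],
    "molecular function": [
        # Hypothetical node: the name does not match, the placeholder description does.
        Node("MF-example", "example binding activity", "placeholder text mentioning DNA repair"),
    ],
}
print(keyword_search(go, "biological process", "dna repair"))  # ['GO:0006281', 'GO:0006282']
```

The "MF-example" node is found only when "molecular function" is the selected subcategory, mirroring the advice to repeat a search in each subcategory.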


Select desired display settings

Whether you are browsing a classification hierarchy to see the distribution of PubChem data among its nodes, or searching to find nodes that contain your term(s) of interest in the node name or description, you can customize the display in the following ways:
  • Choose which PubChem record counts to display using the "Data type counts to display" menu. The options include:

    • None - displays all of the terms present in the classification hierarchy, whether or not the terms are associated with any PubChem data. This option appears regardless of which hierarchy you have selected.

    • Substances - displays the number of PubChem Substance records that have been annotated with terms from the classification hierarchy you have selected.

    • Compounds - displays the number of PubChem Compound records that have been annotated with terms from the hierarchy you have selected.

    • BioAssays - displays the number of PubChem BioAssay records whose protein targets have been annotated with terms from the hierarchy you have selected.

    Note: The "counts" options that are available will depend on which classification hierarchy you have selected. For example, the option to see counts for Substances and Compounds will appear if you are viewing the MeSH, ChEBI, KEGG, or LIPID MAPS hierarchies. The option to see counts for BioAssays will appear if you are viewing the Gene Ontology (GO) hierarchy. The associations between the terms and PubChem records have been made using the methods described in the data processing section of this document.



  • Choose which nodes to display in the selected hierarchy -- If you choose to display PubChem record counts for Substances, Compounds, or BioAssays, then another menu, "Display zero count nodes?" also appears, allowing you to indicate which nodes should be displayed in the hierarchy:

    • display all nodes - The "Display zero count nodes: YES" setting displays all nodes in the classification hierarchy you have selected, regardless of whether a given node is associated with PubChem data.

    • hide zero count nodes - The "Display zero count nodes: NO" setting displays only the nodes that have links to the data type (substances, compounds, or bioassays) that you selected in "Data Type Counts to Display," and hides nodes that are not associated with the PubChem data type of interest.

    (The "Display zero count nodes?" menu will not be shown if the setting of "Data Type Counts to Display: None" is in effect.)
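
The "Display zero count nodes: NO" setting can be thought of as pruning the hierarchy. The following Python sketch is a hypothetical illustration, not the Browser's implementation: TreeNode and hide_zero_count are invented names, and keeping a zero-count node whenever one of its descendants is kept (so that the path to every nonzero node stays visible) is a design assumption.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    name: str
    count: int = 0  # number of linked records of the selected data type
    children: list["TreeNode"] = field(default_factory=list)


def hide_zero_count(node):
    """Return a pruned copy of the tree, or None if nothing survives.

    A node is kept if its own count is nonzero or if any descendant is kept.
    """
    kept = [c for c in (hide_zero_count(ch) for ch in node.children) if c is not None]
    if node.count == 0 and not kept:
        return None
    return TreeNode(node.name, node.count, kept)


# Toy hierarchy with hypothetical names and counts.
tree = TreeNode("root", 3, [
    TreeNode("A", 3, [TreeNode("A1", 3), TreeNode("A2", 0)]),
    TreeNode("B", 0, [TreeNode("B1", 0)]),
])
pruned = hide_zero_count(tree)
print([c.name for c in pruned.children])  # ['A']
```

With "Data Type Counts to Display: None" there are no counts, which matches the Browser not offering this menu in that case.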

 
Output


Tree View

The "Tree" view is available whether you are just browsing through a hierarchy or have searched for specific terms within the hierarchy.
[Figure: Tree View of search results for PubChem Compounds that have been annotated with the MeSH term "Methotrexate."]
  • Hierarchical view of terms - displays the hierarchical order of terms within each branch of the classification. Some terms appear in multiple branches of a hierarchy, and the "Tree" view will show each instance of that term in the hierarchy's branches.

    • For example, the term "Methotrexate" appears in two different branches of the MeSH tree: "Biological Factors" and "Heterocyclic Compounds," and will therefore appear twice in the "Tree" view of that hierarchy. (In contrast, the "List" view shows each term only once, but enables you to view the multiple classifications of that term within the hierarchy, if desired.)

  • Underlined nodes contain your search term - If you search for a term within the hierarchy and choose to display the results in "Tree" view, any nodes that contain your search string (in the node name or node description) will be underlined.

    • For example, search for the term "immunosuppressant" in the MeSH hierarchy. By default, results will be shown in "List" view. Change to "Tree" view. Any terms that are underlined contain the string "immunosuppressant" in the node name or node description. (Mouse over the "?" icon, if available, to view the node description.)

  • Expand/contract nodes - Click on the triangle that appears to the left of a term to expand/contract its node.


  • Display zero count nodes? - If you choose to display PubChem record counts, another menu, "Display zero count nodes?" appears, allowing you to indicate which nodes to display in the hierarchy (all nodes, or hide zero count nodes). Select "Display zero count nodes?: NO" in order to view only the nodes that contain links to the PubChem data type of interest.

    • For example, search for the term "immunosuppressant" in the MeSH hierarchy, display the results in "Tree" view, and choose to see counts for PubChem Compounds. Then click "Display zero count nodes?: NO" in order to see only the nodes that contain links to records in the PubChem Compound database.
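
The way the "Tree" view repeats a term that sits in several branches can be illustrated by merging root-to-node lineages into one tree. This Python sketch uses the two Methotrexate lineages given in the "List View" section; build_tree and render are hypothetical helpers, only node names are matched (descriptions are omitted), and wrapping a match in underscores is a plain-text stand-in for underlining.

```python
# Lineages copied from the Methotrexate example in the "List View" section.
LINEAGES = [
    ["MeSH Tree", "Biological Factors", "Pigments, Biological", "Pterins",
     "Aminopterin", "Methotrexate"],
    ["MeSH Tree", "Heterocyclic Compounds", "Heterocyclic Compounds, 2-Ring",
     "Pteridines", "Pterins", "Aminopterin", "Methotrexate"],
]


def build_tree(lineages):
    """Merge root-to-node lineages into nested dicts (term -> children)."""
    root = {}
    for lineage in lineages:
        node = root
        for term in lineage:
            node = node.setdefault(term, {})
    return root


def render(tree, query, depth=0):
    """Indented tree; names containing the query are wrapped in _underscores_."""
    q = query.casefold()
    lines = []
    for term, children in tree.items():
        label = f"_{term}_" if q in term.casefold() else term
        lines.append("  " * depth + label)
        lines.extend(render(children, query, depth + 1))
    return lines


print("\n".join(render(build_tree(LINEAGES), "methotrexate")))
```

Because the two lineages diverge below "MeSH Tree," the rendered tree contains "Methotrexate" twice, once under each branch.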

List View

The "List" view option only appears if you search for specific terms within the hierarchy, and is the default view for displaying the search results.
[Figure: List View of search results for PubChem Compounds that have been annotated with the MeSH term "Methotrexate."]
  • Non-redundant list of terms - Each term from the hierarchy that contains your search string (either in the node name or the node description) will be listed only once, regardless of how many times that term appears in the tree structure of the hierarchy.

    • For example, search for "Methotrexate" in the MeSH hierarchy:

      • Each MeSH term that contains your search string will appear only once in the "List" view (default) of search results, even if that term appears in multiple branches of the MeSH tree. The sample search for Methotrexate finds two MeSH terms: "Methotrexate" and "Tetrahydrofolate Dehydrogenase". The MeSH term "Methotrexate" appears only once in the "List" view of search results, even though it is present in two branches, "Biological Factors" and "Heterocyclic Compounds."

        (In contrast, the "Tree" view of search results may show a given term more than once, if that term appears in multiple branches of the selected hierarchy.)

      • In either view, you can display counts for PubChem Substances or Compounds to retrieve the corresponding chemical structures.

  • Classification - The "List" view of search results shows the classification of each term that contains your search string. If a term appears in multiple branches of the selected hierarchy, only the first classification will be shown by default.

    Click on the plus [+] to see all classifications of the term within the selected hierarchy.
    Click on any term within a classification to open a pop-up window that displays the node in "Tree" view.

    • For example, search for "Methotrexate" in the MeSH hierarchy.

      • The plus [+] beside "Classification" for "Methotrexate" indicates that the term appears in multiple branches of the hierarchy. Click on the [+] to see all classifications of the term, in this case:
        MeSH Tree > Biological Factors > Pigments, Biological > Pterins > Aminopterin > Methotrexate
        MeSH Tree > Heterocyclic Compounds > Heterocyclic Compounds, 2-Ring > Pteridines > Pterins > Aminopterin > Methotrexate

      • In contrast, the other MeSH term ("Tetrahydrofolate Dehydrogenase") retrieved by the search appears only once in the MeSH tree. It therefore does not show a plus [+] beside "Classification," and instead shows a single lineage:
        MeSH Tree > Enzymes and Coenzymes > Enzymes > Oxidoreductases > Oxidoreductases Acting on CH-NH Group Donors > Tetrahydrofolate Dehydrogenase

      • In any of the classifications, click on a term of interest to open a pop-up window that displays the node in "Tree" view.

  • Display zero count nodes? - If you choose to display counts of PubChem records, you can use the option "Display zero count nodes: NO" in order to view only the nodes that contain links to the PubChem data type of interest.

    • For example, search for the term "Immunosuppressant" in the MeSH hierarchy, display the results in default "List" view, and choose to see counts for PubChem Compounds. Then select the option, "display zero count nodes: NO," in order to see only the nodes that contain links to records in the PubChem Compound database.
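
The non-redundant "List" view can be sketched by grouping matching lineages by term, showing the first classification, and flagging additional ones with [+]. This Python illustration uses the three lineages from the Methotrexate example above; list_view and render_list are hypothetical helpers, only the last term of each lineage is tested for brevity, and the DESCRIPTIONS entry is placeholder text standing in for the real MeSH description of "Tetrahydrofolate Dehydrogenase."

```python
LINEAGES = [
    ["MeSH Tree", "Biological Factors", "Pigments, Biological", "Pterins",
     "Aminopterin", "Methotrexate"],
    ["MeSH Tree", "Heterocyclic Compounds", "Heterocyclic Compounds, 2-Ring",
     "Pteridines", "Pterins", "Aminopterin", "Methotrexate"],
    ["MeSH Tree", "Enzymes and Coenzymes", "Enzymes", "Oxidoreductases",
     "Oxidoreductases Acting on CH-NH Group Donors", "Tetrahydrofolate Dehydrogenase"],
]
# Placeholder standing in for a real MeSH node description.
DESCRIPTIONS = {"Tetrahydrofolate Dehydrogenase": "placeholder text mentioning methotrexate"}


def list_view(lineages, descriptions, query):
    """Group lineages whose final term matches by name or description."""
    q = query.casefold()
    grouped = {}
    for lineage in lineages:
        term = lineage[-1]
        if q in term.casefold() or q in descriptions.get(term, "").casefold():
            grouped.setdefault(term, []).append(" > ".join(lineage))
    return grouped


def render_list(grouped):
    """One row per term: first classification shown, [+] if there are more."""
    rows = []
    for term, paths in grouped.items():
        marker = " [+]" if len(paths) > 1 else ""
        rows.append(f"{term}{marker}: {paths[0]}")
    return rows


for row in render_list(list_view(LINEAGES, DESCRIPTIONS, "methotrexate")):
    print(row)
```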

Nodes

  • Actions/icons available for a node (question mark, arrow, count) - Whether you are displaying a selected hierarchy in "Tree" view or "List" view, several icons may appear beside terms in the hierarchy:

    • Question Mark - appears only if a description is available for a given term. The description is displayed in a pop-up that appears when you mouse over the icon, and is derived from the source hierarchy.

    • Arrow - opens a page about the term on the web site of the source hierarchy.

    • Count - a box showing the number of PubChem records that are linked to a term appears only if: (a) you have chosen to display counts for Substance, Compound, or BioAssay records, and (b) the term has been associated with PubChem data using the methods described in the data processing section of this document. Choose the option to "hide zero count nodes" if you want to view only the terms that have links to PubChem data.


  • Display PubChem record counts and show/hide zero count nodes: Once you are viewing the desired classification hierarchy, you can change the "Data type counts to display" and "Display zero count nodes?" menus at any time. The options are the same as those described under "Select desired display settings" in the Input Options section of this document.

Saving search results

  • Save PubChem records for a given node - To save the PubChem Substance, Compound, or BioAssay records associated with a node of interest, choose the display setting that shows the counts for the record type of interest. Click on the box that shows the PubChem record count for a node of interest. That will retrieve the data in the Entrez search system, where you can choose to save the records in the desired format. The Entrez Help document provides additional detail about using that system, including displaying and saving a set of records.

Embed a classification widget in your own web page

  • As an alternative to using the Classification Browser directly on the PubChem web site, you can use the PubChem Widget to display the classification for a PubChem Compound/Substance/BioAssay of interest in your own web page.

 
Data Processing


By what method are the classifications applied to PubChem records?

The PubChem data processing procedures use the methods described below to associate PubChem records with terms from various classifications. This enables hierarchical organization of the data, and facilitates retrieval of chemical structures and bioactivity data that have been tagged with a given classification term.

The method by which a term is applied to a PubChem record depends upon the type of PubChem record (e.g., Substance/Compound record or BioAssay record) and the classification hierarchy in question:

Substance and Compound Records:
  • Association of Medical Subject Headings (MeSH) with compounds & substances:

    The National Library of Medicine (NLM)'s Medical Subject Headings (MeSH) is a controlled vocabulary thesaurus of medical terms that is arranged in both an alphabetic and a hierarchical structure. It is used for indexing literature from thousands of the world's leading biomedical journals for the MEDLINE®/PubMed® database, and for cataloging medical books, documents, and audiovisual materials, in order to facilitate retrieval of medical information at various levels of specificity.

    The MeSH classification tree includes a branch of terms for Chemicals and Drugs. The PubChem data processing procedure looks for matches between chemical names in the "Chemicals and Drugs" branch of the MeSH tree and the chemical names/synonyms listed in a Compound record. If a match is found, the MeSH term is applied to the PubChem Compound record, and to all of the PubChem Substance records that have the same chemical structure (including connectivity, isotopes and stereochemistry) as the PubChem Compound. In addition, a link is made between the PubChem Compound record and the PubMed records that have been tagged with that MeSH term, thereby providing a portal from the chemical structure to the medical literature.

    For additional information, see the PubChem Substance/Compound Summary Page Help document, which describes the details of the name-matching method that is used to associate PubChem Compound records (or more technically, their CIDs) with corresponding MeSH terms.

  • Association of Other Classification Ontologies with compounds & substances:

    Some depositors of PubChem Substance records, such as ChEBI, KEGG, and LIPID MAPS, maintain ontologies of terms that describe the substances they deposit into PubChem. Substance records from those depositors display applicable terms from their ontologies in the "Classification:Ontologies" section of the PubChem Substance summary page.

    Some depositors of PubChem Substance records associate their substance records with patents. In that case, applicable terms from the World Intellectual Property Organization (WIPO) International Patent Classification (IPC) will appear in the "Classification:Ontologies" section of the PubChem Substance summary page.

    If any of those substance records are in turn associated with corresponding PubChem Compound records (because they contain the same chemical structure, including same connectivity, isotopes, and stereochemistry), then the classification information from the substance records is also added to the "Classification: Ontologies" section of the corresponding PubChem Compound record.

  • Because these associations have been made, the PubChem Classification Browser can be used to browse/retrieve substances and compounds in a hierarchical manner, using a variety of classification ontologies.
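The two flows described above can be sketched as follows: MeSH terms are matched to a Compound by name and then pushed to every Substance with the same structure, while depositor ontology terms flow from a Substance up to its Compound only. This is a simplified illustration under stated assumptions. The case-insensitive, whitespace-collapsed exact match in `normalize` stands in for the real name-matching method (described in the Summary Page Help document), and all function names, IDs, and terms are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def normalize(name: str) -> str:
    # Assumption: case-insensitive, whitespace-collapsed exact matching.
    return " ".join(name.lower().split())


def match_mesh(mesh_names: dict[str, str],
               cid_synonyms: dict[int, list[str]]) -> dict[int, set[str]]:
    """mesh_names maps a chemical name from the MeSH tree to its MeSH term."""
    index = {normalize(n): term for n, term in mesh_names.items()}
    hits: dict[int, set[str]] = defaultdict(set)
    for cid, names in cid_synonyms.items():
        for n in names:
            term = index.get(normalize(n))
            if term is not None:
                hits[cid].add(term)
    return dict(hits)


def propagate(mesh_by_cid: dict[int, set[str]],
              ontology_by_sid: dict[int, set[str]],
              sid_to_cid: dict[int, int]):
    """sid_to_cid links each SID to the CID with the same structure."""
    cid_terms = {cid: set(t) for cid, t in mesh_by_cid.items()}
    sid_terms = {sid: set(t) for sid, t in ontology_by_sid.items()}
    for sid, cid in sid_to_cid.items():
        # MeSH: Compound -> every Substance with the same structure.
        sid_terms.setdefault(sid, set()).update(mesh_by_cid.get(cid, ()))
        # Ontologies: Substance -> its Compound (not to sibling Substances).
        cid_terms.setdefault(cid, set()).update(ontology_by_sid.get(sid, ()))
    return cid_terms, sid_terms
```

Reading from the original `mesh_by_cid` and `ontology_by_sid` inside the loop keeps one depositor's ontology terms from leaking onto other depositors' Substance records through the shared Compound.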

BioAssay records:
  • Association of Gene Ontology (GO) terms with gene/protein targets of PubChem BioAssays:

    The BioAssay database contains the results of tests measuring the effect of various small molecules on gene/protein target(s). When possible, the gene/protein target(s) are associated with corresponding terms from the Gene Ontology (GO) project. This is done in an automated way as part of the NCBI BioSystems database data processing procedures, using the method described in the BioSystems help document.

    As a result of these associations, the PubChem BioAssay summary pages display the Gene Ontology (GO) classification of the gene/protein target(s) that were tested by the bioassay.

    In addition, the PubChem Classification Browser can be used to browse/retrieve bioassays whose gene/protein targets have been annotated with the GO term(s) of interest.
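Retrieving bioassays by GO term amounts to an inverted index from GO term to AIDs, built through the assay-to-target and target-to-GO associations. The sketch below assumes simple in-memory mappings with hypothetical IDs. Whether a query on a parent term (for example GO:0006281, DNA repair) should also return assays annotated only with its descendant terms is not stated above, so descendant expansion is an optional parameter here rather than a claim about the Browser.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional


def index_assays_by_go(aid_targets: dict[int, list[str]],
                       target_go: dict[str, list[str]]) -> dict[str, set[int]]:
    """aid_targets: AID -> target IDs; target_go: target ID -> GO term IDs."""
    by_go: dict[str, set[int]] = defaultdict(set)
    for aid, targets in aid_targets.items():
        for target in targets:
            for go_id in target_go.get(target, ()):
                by_go[go_id].add(aid)
    return by_go


def assays_for_term(by_go: dict[str, set[int]], go_id: str,
                    children: Optional[dict[str, list[str]]] = None) -> set[int]:
    """Return AIDs annotated with go_id, plus descendants if `children` is given."""
    result: set[int] = set()
    stack, seen = [go_id], set()
    while stack:
        term = stack.pop()
        if term in seen:  # GO is a DAG, so a term can be reached twice
            continue
        seen.add(term)
        result |= by_go.get(term, set())
        if children:
            stack.extend(children.get(term, ()))
    return result
```

Passing `children=None` restricts results to assays whose targets carry exactly the requested term.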

 
Log of Changes to the Classification Browser
04 APR 2013 The World Intellectual Property Organization (WIPO) International Patent Classification (IPC) is now available in the PubChem Classification Browser. The WIPO classification can be used to browse/retrieve Compounds, Substances, or Patents in PubChem.
08 APR 2012 A PubChem Classification Widget is now available. It enables you to display the classification for a PubChem Compound/Substance/BioAssay of interest in your own web page.
10 JAN 2013 It is now possible to search by PubChem substance identifier (SID) in the Classification Browser, enabling you to view the classification terms that have been annotated on the deposited chemical structure of interest.
20 DEC 2012 It is now possible to search by PubChem unique identifier (compound identifier (CID) or assay identifier (AID)) in the Classification Browser, enabling you to view the classification terms that have been annotated on the chemical structure or bioassay of interest.
19 OCT 2012 The MeSH classification was enhanced to include terms from the MeSH Supplementary Concept Records (SCR), which cover many chemical names that are not MeSH headings. The Classification Browser displays SCR terms under nodes named "Supplementary Records." This enhancement greatly increased the number of PubChem records that can be browsed/retrieved with the MeSH classification; for example, the number of PubChem Compounds linked to MeSH increased from 12,983 to 82,812.
05 OCT 2012 Initial release of Classification Browser, beta version.
 
References


Citing PubChem Resources:

Please refer to the PubChem Publications page if you are referencing the overall PubChem Substance, Compound, or BioAssay database, or the various PubChem tools. That page lists recommended citations as well as additional articles that have been written about the PubChem resources and how they can be used.

PubChem Data Usage and Citation Guidelines:

Please see the PubChem Data Usage and Citation Guidelines page for information about how to cite individual or multiple records from a PubChem database.


 Revised 11 August 2016