The classification hierarchies that are currently available in the Classification Browser are listed below. When the Browser first loads, the MeSH tree is displayed by default. You can select a different classification at any time by using the "Select classification" menu.
The "Select classification" menu lists all of the available hierarchies, and the "Data type counts to display" menu changes based on which hierarchy is selected. For example, if the MeSH classification is selected, then the counts to display menu only presents the choices of "None", "Compound", and "Substance." If the Gene Ontology classification is selected, then the counts to display menu only presents the choices of "None" and "Assay." The table below summarizes the available classifications and the PubChem data types to which each classification has been applied.
Note: If you choose to search by PubChem unique identifier, the input type you select (CID or AID) will determine what choices appear in the "Select classification" menu. Because BioAssay data have been annotated with Gene Ontology (GO) terms, only that classification will appear in the menu if you choose to search by AID. Conversely, the other classifications, but not GO, will appear if you choose to search by CID.
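The menu behavior described above can be pictured as a pair of lookups. The following is only a minimal sketch, not the Browser's actual implementation: the function and variable names are hypothetical, and only the menu choices quoted in the text (for MeSH and Gene Ontology) are encoded. For any other classification the sketch falls back to "None" alone, which is a placeholder assumption.

```python
# Hypothetical names; this is an illustration, not PubChem code.
ALL_CLASSIFICATIONS = [
    "MeSH", "ChEBI", "Gene Ontology", "KEGG",
    "LIPID MAPS", "WHO ATC Code", "WIPO",
]

# Only the two examples given in the text are listed here.
COUNT_CHOICES = {
    "MeSH": ["None", "Compound", "Substance"],
    "Gene Ontology": ["None", "Assay"],
}


def classifications_for(id_type):
    """Return the classifications offered when searching by CID or AID."""
    if id_type == "AID":
        # Searching by AID offers only Gene Ontology.
        return ["Gene Ontology"]
    if id_type == "CID":
        # Searching by CID offers the other classifications, but not GO.
        return [c for c in ALL_CLASSIFICATIONS if c != "Gene Ontology"]
    raise ValueError(f"unsupported identifier type: {id_type!r}")


def count_choices_for(classification):
    """Return the 'Data type counts to display' choices for a classification."""
    # Assumption: classifications not quoted in the text fall back to "None".
    return COUNT_CHOICES.get(classification, ["None"])


print(classifications_for("AID"))
print(count_choices_for("MeSH"))
```
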
Available classifications (in the "Select Classification" menu) and the types of PubChem data they can be used to retrieve:
MeSH,
ChEBI,
Gene Ontology (GO),
KEGG,
LIPID MAPS,
WHO ATC Code,
WIPO (International Patent Classification (IPC))
Classification System: |
Description |
Can be used to retrieve PubChem: |
MeSH |
The National Library of Medicine (NLM)'s Medical Subject Headings (MeSH) is a controlled vocabulary thesaurus of medical terms that is arranged in both an alphabetic and a hierarchical structure.
MeSH is used for indexing literature from thousands of the world's leading biomedical journals for the MEDLINE®/PubMed® database, and for cataloging medical books, documents, and audiovisual materials, in order to facilitate retrieval of medical information at various levels of specificity.
The MeSH classification tree includes a branch of terms for Chemicals and Drugs. PubChem uses an automated name matching procedure to apply MeSH terms to chemical structures, enabling retrieval of substances and compounds through the Classification Browser.
The MeSH tree, as shown in the Classification Browser, also includes terms from the MeSH Supplementary Concept Records (SCR). The SCR is a separate thesaurus that contains approximately 200,000 chemical names in addition to those already in MeSH. Terms from the SCR are grouped under nodes named "Supplementary Records" in the Classification Browser. This enhancement greatly increased the number of PubChem records that can be browsed/retrieved with the MeSH Classification; for example, the number of PubChem Compounds linked to MeSH increased from 12,983 to 82,812 (as of Oct. 19, 2012).
|
Substances |
Compounds |
|
ChEBI |
The European Bioinformatics Institute (EBI)'s Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on "small" chemical compounds.
ChEBI is a depositor of small molecules into the PubChem Substance database. It maintains an ontology of terms used to describe the small molecules, and includes relevant terms in their deposited substance records. ChEBI terms are also inherited by PubChem Compounds that are identical in chemical structure (including same connectivity, isotopes, and stereochemistry) to the ChEBI substances. This enables retrieval of substances and compounds through the Classification Browser using the ChEBI ontology.
|
Substances |
Compounds |
|
Gene Ontology (GO) |
The Gene Ontology (GO) project is an initiative to standardize the representation of gene and gene product attributes across species and databases. It provides a controlled vocabulary of terms for describing gene product characteristics and gene product annotation data.
GO is a depositor of data into the NCBI BioSystems database. It maintains an ontology of terms that includes three branches: biological processes, cellular components, and molecular functions. Terms from each of those branches are associated with genes and proteins through the method described in the data processing section of the BioSystems help document.
If any of those genes or proteins tagged with GO terms are targets of a biological test in the PubChem BioAssay database, the bioassay record is also tagged with the GO term(s). This enables retrieval of bioassays through the Classification Browser using Gene Ontology.
In addition, all substances (SIDs) and compounds (CIDs) found to be active in the assay (AID) are also tagged with the GO term(s) that were applied to the assay.
|
Substances |
Compounds |
BioAssays |
KEGG |
The Kyoto Encyclopedia of Genes and Genomes (KEGG), by the Kanehisa Laboratory of the Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan, is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information.
KEGG is a depositor of small molecules into the PubChem Substance database. It maintains an ontology of terms used to describe the small molecules, and includes relevant terms in their deposited substance records. KEGG terms are also inherited by PubChem Compounds that are identical in chemical structure (including same connectivity, isotopes, and stereochemistry) to the KEGG substances. This enables retrieval of substances and compounds through the Classification Browser using the KEGG ontology.
|
Substances |
Compounds |
|
LIPID MAPS |
LIPID Metabolites And Pathways Strategy (LIPID MAPS) is a multi-institutional effort to identify and quantitate, using a systems biology approach and sophisticated mass spectrometers, the lipid species in mammalian cells, and to quantitate the changes in these species in response to perturbation.
LIPID MAPS is a depositor of small molecules into the PubChem Substance database. It maintains an ontology of terms used to describe the small molecules, and includes relevant terms in their deposited substance records. LIPID MAPS terms are also inherited by PubChem Compounds that are identical in chemical structure (including same connectivity, isotopes, and stereochemistry) to the LIPID MAPS substances. This enables retrieval of substances and compounds through the Classification Browser using the LIPID MAPS ontology.
|
Substances |
Compounds |
|
WHO ATC Code |
The World Health Organization (WHO) Anatomical Therapeutic Chemical (ATC) classification system divides active substances into different groups according to the organ or system on which they act and their therapeutic, pharmacological and chemical properties.
Drugs are classified in groups at five different levels. The 1st level divides drugs into fourteen main groups; the 2nd level places them into pharmacological/therapeutic subgroups. The 3rd and 4th levels are chemical/pharmacological/therapeutic subgroups and the 5th level is the chemical substance. The 2nd, 3rd and 4th levels are often used to identify pharmacological subgroups when that is considered more appropriate than therapeutic or chemical subgroups. Substances classified in the same ATC 4th level cannot be considered pharmacotherapeutically equivalent since their mode of action, therapeutic effect, drug interactions and adverse drug reaction profile may differ.
Medicinal products are classified according to the main therapeutic use of the main active ingredient, on the basic principle of only one ATC code for each route of administration (i.e. pharmaceutical forms with similar ingredients and strength will have the same ATC code). Immediate and slow release tablets will normally have the same ATC code. A medicinal product can be given more than one ATC code if it is available in two or more strengths or routes of administration with clearly different therapeutic uses. Normally, different stereoisomeric forms will have separate ATC codes. Prodrugs are usually assigned separate ATC codes if the dosages used are different and/or the nonproprietary name of the prodrug and the active drugs are different.
|
Substances |
Compounds |
|
WIPO |
The World Intellectual Property Organization (WIPO) International Patent Classification (IPC), established by the Strasbourg Agreement of 1971, provides a hierarchical system of language-independent symbols for the classification of patents and utility models according to the different areas of technology to which they pertain.
PubChem uses the IPC classifications available from the EPO (European Patent Office).
The IPC divides technology into eight sections with approximately 70,000 subdivisions. Each subdivision has a symbol consisting of Arabic numerals and letters of the Latin alphabet. The appropriate IPC symbols are indicated on each patent document, of which more than 1,000,000 were issued each year in the last 10 years. The IPC symbols are allotted by the national or regional industrial property office that publishes the patent document. For PCT documents, IPC symbols are allotted by the International Searching Authority (ISA). The Classification is indispensable for the retrieval of patent documents in the search for "prior art." Such retrieval is needed by patent-issuing authorities, potential inventors, research and development units, and others concerned with the application or development of technology.
The Substance-Patent associations are made by depositors. WIPO terms are also inherited by PubChem Compounds that are identical in chemical structure (including same connectivity, isotopes, and stereochemistry) to the substances associated with patents. This enables retrieval of substances and compounds through the Classification Browser using the WIPO classification.
|
Substances |
Compounds |
Patents |
* The data processing section of this document describes the method by which terms from each classification hierarchy are associated with the PubChem data types noted above.
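The five ATC levels described in the WHO ATC Code row of the table can be illustrated by splitting a full code into its prefixes. This is a sketch under a stated assumption: the character layout used here (one letter, two digits, one letter, one letter, two digits, as in "A10BA02") comes from the WHO ATC system itself and is not spelled out in the text above. The function name is hypothetical.

```python
import re

# Assumed layout of a full (5th-level) ATC code: letter, two digits,
# letter, letter, two digits. Example: "A10BA02".
ATC_PATTERN = re.compile(r"[A-Z]\d{2}[A-Z]{2}\d{2}")


def atc_levels(code):
    """Split a full ATC code into its 1st through 5th level prefixes."""
    code = code.strip().upper()
    if not ATC_PATTERN.fullmatch(code):
        raise ValueError(f"not a full 5th-level ATC code: {code!r}")
    # Prefix lengths 1, 3, 4, 5 and 7 correspond to levels 1 through 5.
    return [code[:1], code[:3], code[:4], code[:5], code[:7]]


print(atc_levels("A10BA02"))
# → ['A', 'A10', 'A10B', 'A10BA', 'A10BA02']
```

Each successive prefix names a narrower group, which is why two substances that share a 4th-level prefix still receive distinct 5th-level codes.
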
- Browse -- Simply select the desired hierarchy from the pull-down menu to browse through its terms and/or view distribution of PubChem data among its nodes.
- Some classifications have subcategories. For example, Gene Ontology is subdivided into 3 subcategories: biological process, cellular component, and molecular function. If subcategories are available, then a second dropdown menu appears allowing users to choose a subcategory of interest. The Classification Browser displays a single subcategory at a time.
- If desired, use the display settings to show the counts of PubChem records associated with the hierarchy. You can also hide zero count nodes in order to view only the subset of nodes that have links to PubChem data.
- Note that PubChem record counts reflect the subset of PubChem records that have been explicitly annotated with terms from the selected hierarchy, and not all PubChem records.
- The methods by which classification terms are annotated on PubChem records are described in the data processing section of this document.
- Search -- As an alternative to browsing an entire classification tree, you can search by keyword to find nodes in a classification that contain the keyword in their node name and/or node description, or search by PubChem unique identifier (UID) to view the classification terms that have been annotated on a chemical structure (CID) or bioassay (AID).
- Keyword - Enter a term of interest in the search box to find the nodes in the selected hierarchy that contain your search string in the node name and/or node description. If desired, you can display the counts of PubChem records associated with those nodes and click on the resulting counts to retrieve the records.
- Your search string can appear in the node name or in the node description.
- For example, a search of Gene Ontology for "DNA repair" (without the quotes) will find entries such as "DNA repair" (GO:0006281) and "regulation of DNA repair" (GO:0006282), and more. A search of MeSH for "DNA repair" (without the quotes) will retrieve the exact term "DNA repair," and it will also retrieve terms such as "DNA Polymerase beta," which contains the search string in its description. Choose to display counts for BioAssays in order to see the number of PubChem BioAssay records associated with the terms retrieved.
- Some classifications have subcategories. For example, Gene Ontology is subdivided into 3 subcategories: biological process, cellular component, and molecular function. If subcategories are available, then a second dropdown menu appears allowing users to choose a subcategory of interest. Note that a search is executed only within the currently selected subcategory. To see if your search term exists in other subcategories of the hierarchy, do a separate search in each one.
- By default, the results of a search are shown in "List" view rather than "Tree" view. This displays a non-redundant list of terms that contain your search string, even if a term appears in multiple branches of the selected hierarchy. If a term appears in multiple branches, click on the [+] to view all of the classifications of that term within the selected hierarchy.
- If you toggle to "Tree" view using the display control at the top of the page, all nodes that contain your search string (as all or part of the node name, or as part of the node's description), will be underlined. A given term may appear multiple times in the "Tree" view, if that term appears in multiple branches of the selected hierarchy.
- If desired, use the display settings to show the counts of PubChem records associated with the hierarchy. You can also hide zero count nodes in order to view only the subset of nodes that contain your search term AND have links to PubChem data.
- PubChem unique identifier - Enter a compound identifier (CID), substance identifier (SID), or bioassay identifier (AID) to view the classification terms that have been annotated on that chemical structure or bioassay. Note that the input type you select (CID, SID, or AID) will determine what choices appear in the "Select classification" menu; as noted in the table listing "available classifications and the types of PubChem data they can be used to retrieve," most of the classification systems have been applied to PubChem compounds and substances, and one of the classifications (GO) has been applied only to BioAssay data.
- CID - Use this option to enter a PubChem compound identifier (CID) of interest and view the classification terms with which it has been associated. The "Select Classification" menu will be set to "MeSH" by default, and you can change to another classification system at any time. The data processing section of this document describes the method by which classification terms have been applied to PubChem compounds.
- SID - Use this option to enter a PubChem substance identifier (SID) of interest and view the classification terms with which it has been associated. The "Select Classification" menu will be set to "MeSH" by default, and you can change to another classification system at any time. The data processing section of this document describes the method by which classification terms have been applied to PubChem substances.
- AID - Use this option to enter a PubChem assay identifier (AID) of interest and view the GO classification terms that have been applied to the gene/protein target tested by the bioassay. The data processing section of this document describes the method by which classification terms have been applied to PubChem bioassays.
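The keyword search and the non-redundant "List" view described above can be sketched as follows. This is an illustration only, not the Browser's actual search code. It assumes a case-insensitive substring match, and the node records are hypothetical: the names follow the examples in the text, while the description and the branch paths are invented.

```python
# Hypothetical node records from a made-up hierarchy.
SAMPLE_NODES = [
    {"name": "DNA repair", "description": "",
     "path": "branch 1 > DNA repair"},
    {"name": "DNA repair", "description": "",
     "path": "branch 2 > DNA repair"},
    {"name": "regulation of DNA repair", "description": "",
     "path": "branch 1 > regulation of DNA repair"},
    {"name": "DNA Polymerase beta",
     "description": "An enzyme involved in DNA repair.",
     "path": "branch 3 > DNA Polymerase beta"},
    {"name": "cell cycle", "description": "",
     "path": "branch 1 > cell cycle"},
]


def search_nodes(nodes, keyword):
    """Return {term name: [paths]} for nodes matching the keyword.

    A node matches when the keyword appears in its name or its description.
    Each term is listed once, with every branch in which it occurs, which
    mirrors the non-redundant "List" view.
    """
    needle = keyword.lower()
    hits = {}
    for node in nodes:
        in_name = needle in node["name"].lower()
        in_description = needle in node["description"].lower()
        if in_name or in_description:
            hits.setdefault(node["name"], []).append(node["path"])
    return hits


for term, paths in search_nodes(SAMPLE_NODES, "DNA repair").items():
    print(term, "appears in", len(paths), "branch(es)")
```

Grouping the matches by term name is what lets a term that sits in several branches appear once in the list, with its individual placements available on request.
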
Whether you are browsing a classification hierarchy to see the distribution of PubChem data among its nodes, or searching to find nodes that contain your term(s) of interest in the node name or description, you can customize the display in the following ways:
- Choose which PubChem record counts to display using the "Data type counts to display" menu. The options include:
- None - displays all of the terms present in the classification hierarchy, whether or not the terms are associated with any PubChem data. This option appears regardless of which hierarchy you have selected.
- Substances - displays the number of PubChem Substance records that have been annotated with terms from the classification hierarchy you have selected.
- Compounds - displays the number of PubChem Compound records that have been annotated with terms from the hierarchy you have selected.
- BioAssays - displays the number of PubChem BioAssay records whose protein targets have been annotated with terms from the hierarchy you have selected.
Note: The "counts" options that are available will depend on which classification hierarchy you have selected. For example, the option to see counts for Substances and Compounds will appear if you are viewing the MeSH, ChEBI, KEGG, or LIPID MAPS hierarchies. The option to see counts for BioAssays will appear if you are viewing the Gene Ontology (GO) hierarchy. The associations between the terms and PubChem records have been made using the methods described in the data processing section of this document.
- Choose which nodes to display in the selected hierarchy -- If you choose to display PubChem record counts for Substances, Compounds, or BioAssays, then another menu, "Display zero count nodes?" also appears, allowing you to indicate which nodes should be displayed in the hierarchy:
- display all nodes - The "Display zero count nodes: YES" setting displays all nodes in the classification hierarchy you have selected, regardless of whether a given node is associated with PubChem data.
- hide zero count nodes - The "Display zero count nodes: NO" setting displays only the nodes that have links to the data type (substances, compounds, or bioassays) that you selected in "Data Type Counts to Display," and hides nodes that are not associated with the PubChem data type of interest.
(The "Display zero count nodes?" menu will not be shown if the setting of "Data Type Counts to Display: None" is in effect.)
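The effect of the two display settings can be sketched as a simple filter. This is a minimal illustration with hypothetical names and invented counts. It treats the hierarchy as a flat list of nodes and does not model how counts on child nodes affect their parents, which the text above does not specify.

```python
# Hypothetical nodes with invented record counts per data type.
SAMPLE_COUNTS = [
    {"name": "node A", "counts": {"Compounds": 12, "Substances": 30}},
    {"name": "node B", "counts": {"Compounds": 0, "Substances": 4}},
    {"name": "node C", "counts": {}},
]


def visible_nodes(nodes, data_type=None, show_zero=True):
    """Return the names of the nodes shown under the given display settings.

    data_type=None corresponds to "Data type counts to display: None",
    in which case every node is shown and the zero-count setting is ignored.
    """
    if data_type is None or show_zero:
        return [n["name"] for n in nodes]
    # "Display zero count nodes: NO": keep only nodes linked to the data type.
    return [n["name"] for n in nodes if n["counts"].get(data_type, 0) > 0]


print(visible_nodes(SAMPLE_COUNTS, "Compounds", show_zero=False))
print(visible_nodes(SAMPLE_COUNTS, "Substances", show_zero=False))
```
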