PUG REST

 

This document describes the REST-style version of PUG (Power User Gateway), a web interface for accessing PubChem data and services. It details both the syntax of the HTTP requests and the available functions. This is a specification document; a less formal, tutorial-style PUG REST document is also available. For comments, help, or to suggest new functionality, please contact pubchem-help@ncbi.nlm.nih.gov.

 

Change log:

2012/04/24 – initial public beta release

2012/04/27 – added PNG output for substances and compounds

2012/05/02 – added dates operation

2012/05/07 – added xrefs operation

2012/05/16 – added TXT output for some cases

2012/05/25 – added input by xref

2012/05/29 – added input/search by SDF (via POST)

2012/06/01 – added similarity and identity search, and Entrez key information for listkey return

2012/07/13 – added component cids type

2012/08/31 – added tutorial document; changed status code strings                                           

2012/08/31 – changed assay target retrieval method, added protein name and gene symbol

2012/09/11 – added partial name matching

2012/10/25 – allow name input by POST

2013/01/15 – added simplified assay summary

2013/04/09 – added classification retrieval for assays

2013/04/16 – added SourceName, SourceCategory to xrefs operation

2014/01/03 – added assay input by target

2014/01/09 – added assay input by activity column name

2014/01/29 – added sid/cid title+description output

2014/04/08 – added a few more assay type filters

2014/06/03 – added basic conformer retrieval

 

Contents

 

URL-based API

The URL Path

Input

Operation

Output

HTTP Interface Details

Request Header

Request (POST) Body

Status Codes

HTTPS

Schemas

Operations

Full-record Retrieval

Compound Property Tables

Synonyms

Description

SIDS / CIDS / AIDS

Assay Description

Assay Targets

Assay Summary

Assay Dose-Response

Classification

Dates

XRefs

Conformers

Asynchronous Operations

Substructure / Superstructure

Similarity

Identity

Molecular Formula

Other Inputs

Source Names

Other Options

Pagination

 

 

URL-based API

 

The URL Path

 

Most – if not all – of the information the service needs to produce its results is encoded into the URL. The general form of the URL has three parts – input, operation, and output – after the common prefix, followed by operation options as URL arguments (after the ‘?’):

http://pubchem.ncbi.nlm.nih.gov/rest/pug/<input specification>/<operation specification>/[<output specification>][?<operation_options>]

 

Input

 

The input portion of the URL tells the service which records to use as the subject of the query. This is further subdivided into two or more locations in the URL “path” as follows:

<input specification> = <domain>/<namespace>/<identifiers>

<domain> = substance | compound | assay | <other inputs>

compound domain <namespace> = cid | name | smiles | inchi | sdf | inchikey | <structure search> | <xref> | listkey

<structure search> = {substructure | superstructure | similarity | identity}/{smiles | inchi | sdf | cid}

<xref> = xref / {RegistryID | RN | PubMedID | MMDBID | ProteinGI | NucleotideGI | TaxonomyID | MIMID | GeneID | ProbeID | PatentID}

substance domain <namespace> = sid | sourceid/<source name> | sourceall/<source name> | name | <xref> | listkey

<source name> = any valid PubChem depositor name

assay domain <namespace> = aid | listkey | type/<assay type> | sourceall/<source name> | target/<assay target> | activity/<activity column name>

<assay type> = all | confirmatory | doseresponse | onhold | panel | rnai | screening | summary | cellbased | biochemical | invivo | invitro | activeconcentrationspecified

<assay target> = gi | geneid | genesymbol

<identifiers> = comma-separated list of positive integers (e.g. cid, sid, aid) or identifier strings (source, inchikey); in some cases only a single identifier string (name, smiles, xref; inchi, sdf by POST only)

<other inputs> = sources / [substance, assay] | conformers

 

For example, to access CID 2244 (aspirin), one would construct the first part of the URL this way:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/<operation specification>/[<output specification>] 

 

Some source names contain the ‘/’ (forward slash) character, which is incompatible with the URL syntax; for these, replace the ‘/’ with a ‘.’ (period) in the URL. Other special characters may need to be escaped: ‘&’, for instance, should be replaced by ‘%26’. For example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sourceid/DTP.NCI/<operation specification>/[<output specification>]
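
The substitution rules above can be sketched in Python. This is only an illustration, not part of the service: the helper name encode_source_name is hypothetical, and it assumes that replacing ‘/’ with ‘.’ and percent-encoding every other reserved character is sufficient for any source name.

```python
from urllib.parse import quote

PREFIX = "http://pubchem.ncbi.nlm.nih.gov/rest/pug"


def encode_source_name(name: str) -> str:
    """Make a depositor name usable as a single URL path segment.

    Hypothetical helper: '/' becomes '.', as described above, and any
    other reserved character (such as '&' or a space) is percent-encoded.
    """
    return quote(name.replace("/", "."), safe="")


# Prints the input part of a URL for all substances from one source.
print(f"{PREFIX}/substance/sourceall/{encode_source_name('R&D Chemicals')}")
```

The same call turns a name containing a slash, such as ‘DTP/NCI’, into ‘DTP.NCI’.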

 

Operation

 

The operation part of the URL tells the service what to do with the input records, such as retrieving whole record data blobs or specific properties of a compound. The construction of this part of the “path” depends on the operation. Currently, if no operation is specified at all, the default is to retrieve the entire record. Which operations are available depends on the input domain; certain operations are applicable only to compounds and not to assays, for example.

compound domain <operation specification> = record | <compound property> | synonyms | sids | cids | aids | assaysummary | classification | <xrefs> | description | conformers

<compound property> = property / [comma-separated list of property tags]

substance domain <operation specification> = record | synonyms | sids | cids | aids | assaysummary | classification | <xrefs> | description

<xrefs> = xrefs / [comma-separated list of xrefs tags]

assay domain <operation specification> = record | aids | sids | cids | description | targets/<target type> | <doseresponse> | summary | classification | xrefs

<target type> = {ProteinGI, ProteinName, GeneID, GeneSymbol}

<doseresponse> = doseresponse/sid

 

For example, to access the molecular formula and InChI key for CID 2244, one would use a URL like:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/MolecularFormula,InChIKey/[<output specification>]                                               

 

Output

 

The final portion of the URL tells the service what output format is desired. Note that this is formally optional, as output format can also be specified in the HTTP Accept field of the request header – see below for more detail.

<output specification> = XML | ASNT | ASNB | JSON | JSONP [ ?callback=<callback name> ] | SDF | CSV | PNG | TXT

 

ASNT is NCBI’s text (human-readable) variant of ASN.1; ASNB is standard binary ASN.1 and is currently returned as Base64-encoded ASCII text. Note that not all formats are applicable to the results of all operations; one cannot, for example, retrieve a whole compound record as CSV or a property table as SDF. TXT output is only available in a restricted set of cases where all the returned values are of the same kind – for example, synonyms for a single CID where there is one synonym per line.

For example, to access the molecular formula for CID 2244 in JSON format, one would use the (now complete) URL:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/MolecularFormula/JSON

JSONP takes an optional callback function name (which defaults to “callback” if not specified). For example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/MolecularFormula/JSONP?callback=my_callback
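
Assembling the three path parts and the optional arguments can be sketched as follows. The function build_url is a hypothetical helper, not something the service provides, and it assumes each part passed in is already URL-encoded.

```python
from urllib.parse import urlencode

PREFIX = "http://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_url(input_spec, operation="", output="", **options):
    """Join <input>/<operation>/<output> and append ?<options>.

    Hypothetical helper; each argument is an already-encoded path
    fragment such as "compound/cid/2244". Empty parts are skipped,
    since both the operation and the output format are optional.
    """
    parts = [PREFIX, input_spec, operation, output]
    url = "/".join(p for p in parts if p)
    if options:
        url += "?" + urlencode(options)
    return url


print(build_url("compound/cid/2244", "property/MolecularFormula", "JSON"))
print(build_url("compound/cid/2244", "property/MolecularFormula", "JSONP",
                callback="my_callback"))
```

The two calls reproduce the JSON and JSONP example URLs given in this section.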

 

HTTP Interface Details

 

Request Header

 

The HTTP request header may be used to supply some types of information to this service.

The value of “Accept” may be a MIME type that will tell the service what output format is accepted by the client, and hence what format is returned by the server. The allowed values are:

Accept value              Output Format
application/xml           XML
application/json          JSON
application/javascript    JSONP
application/ber-encoded   ASNB
chemical/x-mdl-sdfile     SDF
text/csv                  CSV
image/png                 PNG
text/plain                TXT

 

The Content-Type in the HTTP response header will also be set by the reverse of the above table, e.g. XML data will have “Content-Type: application/xml”.

For example, the URL:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244

with “Accept: chemical/x-mdl-sdfile” in the request header will return CID 2244 in SDF format.
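
Selecting the output format through the header rather than the URL might look like the sketch below, which uses Python's standard urllib. The function name request_with_accept is hypothetical; the request is built but deliberately not sent.

```python
from urllib.request import Request

# Output format -> Accept value, copied from the table above.
ACCEPT = {
    "XML": "application/xml",
    "JSON": "application/json",
    "JSONP": "application/javascript",
    "ASNB": "application/ber-encoded",
    "SDF": "chemical/x-mdl-sdfile",
    "CSV": "text/csv",
    "PNG": "image/png",
    "TXT": "text/plain",
}


def request_with_accept(url: str, output: str) -> Request:
    """Build (but do not send) a GET request asking for `output`."""
    return Request(url, headers={"Accept": ACCEPT[output]})


req = request_with_accept(
    "http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244", "SDF")
print(req.get_header("Accept"))  # chemical/x-mdl-sdfile
```

Passing req to urllib.request.urlopen would perform the actual request.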

 

For proper transmission of certain special characters, strings passed e.g. for SMILES input may need to be URL encoded; for example, “smiles=C1C[CH+]1” should be encoded as “smiles=C1C%5BCH%2B%5D1”. For correct parsing of any POST body, the proper content type header must be included in the request header (see below).
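
The encoding of the SMILES string above can be reproduced with Python's standard library. The helper name encode_path_value is hypothetical.

```python
from urllib.parse import quote


def encode_path_value(value: str) -> str:
    """Percent-encode a string for use as a single URL path segment."""
    # safe="" also encodes '/', which would otherwise split the path.
    return quote(value, safe="")


print(encode_path_value("C1C[CH+]1"))  # C1C%5BCH%2B%5D1
```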

 

Request (POST) Body

 

Some parts of the URL may be moved to the body of a POST request, rather than being part of the URL path. For example, a list of CID integers – which may be too long to fit within the size limitations of a GET request URL – may be moved to the POST body:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/property/MolecularFormula,MolecularWeight/CSV

with “cid=1,2,3,4,5” in the POST body, would retrieve a CSV-formatted table of results for these CIDs. Note that for the service to parse such POST information correctly, the “Content-Type: application/x-www-form-urlencoded” value must be included in the request header. One may also use “Content-Type: multipart/form-data” with the POST body formatted accordingly. See here for more information on content type encoding.
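
A form-encoded POST of this kind can be sketched with Python's urllib as follows. The function name post_cids is hypothetical, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = ("http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/"
       "property/MolecularFormula,MolecularWeight/CSV")


def post_cids(url: str, cids) -> Request:
    """Build (but do not send) a form-encoded POST carrying a CID list."""
    # safe="," keeps the commas literal, giving "cid=1,2,3,4,5".
    body = urlencode({"cid": ",".join(map(str, cids))}, safe=",")
    return Request(
        url,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = post_cids(URL, [1, 2, 3, 4, 5])
print(req.get_method(), req.data)
```

Note that the identifier list is absent from the URL path itself; only the namespace (cid) remains there.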

 

Status Codes

 

If the operation was successful, the HTTP status code will be 200 (OK). If the server encounters an error, it will return an HTTP status code that gives some indication of what went wrong, possibly accompanied by more detailed, human-readable messages whose form depends on the output format (for example, in a <Fault> tag in XML). Codes in the 400 range indicate errors on the client side, and those in the 500 range indicate a problem on the server side; the codes currently in use are:

HTTP Status   Error Code                 General Error Category
200           (none)                     Success
400           PUGREST.BadRequest         Request is improperly formed (syntax error in the URL, POST body, etc.)
404           PUGREST.NotFound           The input record was not found (e.g. invalid CID)
405           PUGREST.MethodNotAllowed   Request not allowed (such as invalid MIME type in the HTTP Accept header)
504           PUGREST.Timeout            The request timed out, from server overload or too broad a request
501           PUGREST.Unimplemented      The requested operation has not (yet) been implemented by the server
500           PUGREST.ServerError        Some problem on the server side (such as a database server down, etc.)
500           PUGREST.Unknown            An unknown error occurred
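
A client might record these codes in a small lookup, as in the sketch below. The names ERROR_CODES and classify_status are hypothetical; the values are copied from the table above.

```python
# HTTP status -> PUG REST error codes, copied from the table above.
# Status 500 is shared by two error codes.
ERROR_CODES = {
    200: [],
    400: ["PUGREST.BadRequest"],
    404: ["PUGREST.NotFound"],
    405: ["PUGREST.MethodNotAllowed"],
    500: ["PUGREST.ServerError", "PUGREST.Unknown"],
    501: ["PUGREST.Unimplemented"],
    504: ["PUGREST.Timeout"],
}


def classify_status(status: int) -> str:
    """Return 'success', 'client error', 'server error', or 'other'."""
    if status == 200:
        return "success"
    if 400 <= status < 500:
        return "client error"
    if 500 <= status < 600:
        return "server error"
    return "other"


for status, codes in sorted(ERROR_CODES.items()):
    print(status, classify_status(status), ", ".join(codes) or "(none)")
```

With Python's urllib.request.urlopen, 4xx and 5xx responses are raised as urllib.error.HTTPError, whose code attribute can be passed to classify_status.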

 

 

HTTPS

 

This service supports both HTTP and HTTPS protocols. That is, every sample URL in this document that begins with “http://” may be substituted with “https://” for secure transmission. For example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/XML

https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/XML

 

Schemas

 

A schema for the XML data returned by PUG REST may be found at:

http://pubchem.ncbi.nlm.nih.gov/pug_rest/pug_rest.xsd

Some operations (such as full record retrieval) may use the standard PubChem schema at:

ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem.xsd

Classification data is returned with this schema:

http://pubchem.ncbi.nlm.nih.gov/pug_rest/hierarchy_data.xsd

 

Operations

 

Full-record Retrieval

 

Returns full records for PubChem substances, compounds, and assays.

Valid output formats for substances and compounds are XML, JSON(P), ASNT/B, SDF, and PNG. A compound record may optionally be either 2D or 3D; substances are always given with coordinates as deposited. For PNG output, only the first SID or CID is used if the input is a list.

Option        Allowed Values (default in bold)   Meaning
record_type   2d, 3d                             Type of conformer for compounds
image_size    large, small, <width>x<height>     Image size: large (300x300), small (100x100), or arbitrary (e.g. 320x240)

 

Valid output formats for assays are XML, JSON(P), ASNT/B, and CSV. Assay record retrieval is limited to a single AID with up to 10000 SIDs at a time; a subset of the SIDs of an assay may be specified as options:

Option    Allowed Values                         Meaning
sid       listkey, or comma-separated integers   SID rows to retrieve for an assay
listkey   valid SID listkey                      listkey containing SIDs, if using sid=listkey

 

Examples:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sourceid/IBM/5F1CA2B314D35F28C7F94168627B29E3/ASNT

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sourceid/DTP.NCI/747285/SDF

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sourceid/DTP.NCI/747285/PNG

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/SDF

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/PNG

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/SDF?record_type=3d

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/PNG?record_type=3d&image_size=small

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/aspirin/SDF

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/inchikey/BPGDAMSIGCZZLK-UHFFFAOYSA-N/SDF

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/1000/XML

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/1000/CSV?sid=26736081,26736082,26736083
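
The record_type and image_size options described above could be checked before a request is made, as in this sketch. The helper record_options is hypothetical, and it assumes that <width> and <height> are positive decimal integers, which this document does not state explicitly.

```python
import re
from urllib.parse import urlencode

# large, small, or <width>x<height> (assumed: positive integers).
_SIZE = re.compile(r"^(large|small|[1-9]\d*x[1-9]\d*)$")


def record_options(record_type=None, image_size=None) -> str:
    """Return a '?...' query string for the record options, or ''."""
    options = {}
    if record_type is not None:
        if record_type not in ("2d", "3d"):
            raise ValueError(f"record_type must be 2d or 3d: {record_type!r}")
        options["record_type"] = record_type
    if image_size is not None:
        if not _SIZE.match(image_size):
            raise ValueError(f"bad image_size: {image_size!r}")
        options["image_size"] = image_size
    return "?" + urlencode(options) if options else ""


base = "http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/PNG"
print(base + record_options(record_type="3d", image_size="small"))
```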

 

Compound Property Tables

 

Returns a table of compound properties. More than one property may be requested, in a comma-separated list of property tags in the request URL. Valid output formats for the property table are: XML, ASNT/B, JSON(P), CSV, and TXT (limited to a single property). Available properties are:

Property                   Notes
MolecularFormula
MolecularWeight
CanonicalSMILES
IsomericSMILES             includes stereo information
InChI                      standard InChI
InChIKey
IUPACName
XLogP
ExactMass
MonoisotopicMass
TPSA                       topological polar surface area
Complexity
Charge
HBondDonorCount
HBondAcceptorCount
RotatableBondCount
HeavyAtomCount             number of non-hydrogen atoms
IsotopeAtomCount           number of atoms with enriched isotope(s)
AtomStereoCount            total number of atoms with tetrahedral (sp3) stereo
DefinedAtomStereoCount
UndefinedAtomStereoCount
BondStereoCount            total number of bonds with planar (sp2) stereo
DefinedBondStereoCount
UndefinedBondStereoCount
CovalentUnitCount
Volume3D
XStericQuadrupole3D
YStericQuadrupole3D
ZStericQuadrupole3D
FeatureCount3D             total number of 3D features (sum of the following six)
FeatureAcceptorCount3D
FeatureDonorCount3D
FeatureAnionCount3D
FeatureCationCount3D
FeatureRingCount3D
FeatureHydrophobeCount3D
ConformerModelRMSD3D       RMSD of all conformers in the 3D model
EffectiveRotorCount3D
ConformerCount3D           number of conformers in the 3D model

 

Example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/1,2,3,4,5/property/MolecularFormula,MolecularWeight,InChIKey/CSV
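
A CSV property table could be read on the client side roughly as follows. This sketch assumes that the first CSV row is a header naming the columns and that one of those columns is called "CID"; neither detail is specified in this document. The function name parse_property_csv is hypothetical.

```python
import csv
import io


def parse_property_csv(text: str) -> dict:
    """Map each CID to its row of properties.

    Assumes the first CSV row is a header naming the columns and that
    one column is called "CID"; neither is stated in this document.
    """
    rows = csv.DictReader(io.StringIO(text))
    return {int(row["CID"]): row for row in rows}
```

The argument is the decoded body of the response; each value in the returned dictionary maps a column name to the string found in that row.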

 

Synonyms

 

Returns a list of substance or compound synonyms. Valid output formats for synonyms are XML, JSON(P), ASNT/B, and TXT (limited).

Examples:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/aspirin/synonyms/XML

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/CCCC/synonyms/XML

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sid/53789435/synonyms/TXT

 

Description

 

Returns the title and description for a SID or CID, the same as used in the web summary pages for these records. Valid output formats are XML, JSON(P), and ASNT/B.

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/1983/description/XML

 

SIDS / CIDS / AIDS

 

Returns a list of SIDs, CIDs, or AIDs, optionally converting from one record identifier type to another according to the options in the table below; these options, if present, must be specified as standard URL arguments (i.e. after the ‘?’). The list of identifiers may be grouped by input (e.g. when converting from one type to another); flattened to a unique target set (implied for TXT output); or stored on the server (which also implies flat), in which case a list key is returned. Valid output formats are XML, JSON(P), ASNT/B, and TXT.

Option        Allowed Values (default in bold)                        Meaning
aids_type     all, active, inactive                                   Type of AIDs to return, given SIDs or CIDs
sids_type     all, active, inactive, doseresponse                     Type of SIDs to return, given AIDs
sids_type     all, standardized, component                            Type of SIDs to return, given CIDs
sids_type     original, same_exact, same_stereo, same_isotopes,       Type of SIDs to return, given SIDs
              same_connectivity, same_tautomer, same_parent,
              same_parent_stereo, same_parent_isotopes,
              same_parent_connectivity, same_parent_tautomer
cids_type     all, active, inactive                                   Type of CIDs to return, given AIDs
cids_type     all, standardized, component                            Type of CIDs to return, given SIDs
cids_type     original, parent, component, similar_2d, similar_3d,    Type of CIDs to return, given CIDs
              same_stereo, same_isotopes, same_connectivity,
              same_tautomer, same_parent, same_parent_stereo,
              same_parent_isotopes, same_parent_connectivity,
              same_parent_tautomer
list_return   grouped, flat, listkey                                  Type of identifier list to return

 

Examples:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/name/glucose/sids/XML

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/name/glucose/sids/XML?list_return=listkey

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/listkey/xxxxxx/sids/XML (where ‘xxxxxx’ is the listkey from the above URL)

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/name/glucose/cids/XML?list_return=grouped

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/name/glucose/cids/XML?list_return=flat

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sourceall/MLSMR/sids/JSON?list_return=listkey

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sourceall/R%26D%20Chemicals/sids/XML?list_return=listkey

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sid/123061,123079/cids/XML?cids_type=all

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sid/2780/sids/JSON?sids_type=same_parent_tautomer

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/sids/JSON

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/inchi/cids/JSON (where the POST body contains “inchi=InChI=1S/C3H8/c1-3-2/h3H2,1-2H3”)

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/aids/JSON?aids_type=active

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/sids/JSON?sids_type=component

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/cids/TXT?cids_type=same_connectivity

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/21145249/cids/XML?cids_type=parent

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/1000/sids/XML?sids_type=inactive

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/504526/sids/JSON?sids_type=doseresponse

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/type/doseresponse/aids/JSON

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/sourceall/DTP.NCI/aids/XML

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/xref/PatentID/EP0711162A1/sids/XML

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/myxalamid/cids/XML?name_type=word

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/myxalamid/cids/XML?name_type=complete

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/target/genesymbol/USP2/aids/TXT

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/target/gi/116516899/aids/JSON

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/activity/EC50/aids/TXT
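
Because the allowed values of sids_type and cids_type depend on the type of the input identifiers, a client may wish to check an option before sending a request. The sketch below encodes the options table of this section; the names ALLOWED and check_option are hypothetical.

```python
# Allowed values per (option, input identifier type), from the options
# table in this section.
_SAME = ["same_stereo", "same_isotopes", "same_connectivity", "same_tautomer"]
_PARENT = ["same_parent", "same_parent_stereo", "same_parent_isotopes",
           "same_parent_connectivity", "same_parent_tautomer"]

ALLOWED = {
    ("aids_type", "sid"): ["all", "active", "inactive"],
    ("aids_type", "cid"): ["all", "active", "inactive"],
    ("sids_type", "aid"): ["all", "active", "inactive", "doseresponse"],
    ("sids_type", "cid"): ["all", "standardized", "component"],
    ("sids_type", "sid"): ["original", "same_exact"] + _SAME + _PARENT,
    ("cids_type", "aid"): ["all", "active", "inactive"],
    ("cids_type", "sid"): ["all", "standardized", "component"],
    ("cids_type", "cid"): (["original", "parent", "component",
                            "similar_2d", "similar_3d"] + _SAME + _PARENT),
}


def check_option(option: str, given: str, value: str) -> str:
    """Return 'option=value' if valid for inputs of type `given`."""
    if value not in ALLOWED.get((option, given), []):
        raise ValueError(f"{option}={value} is not valid given {given}s")
    return f"{option}={value}"


print(check_option("cids_type", "cid", "same_connectivity"))
```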

 

Assay Description

 

Returns assay descriptions. Valid output formats are XML, JSON(P), and ASNT/B.

Example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/490/description/XML

 

Assay Targets

 

Returns assay target information. Valid output formats are XML, JSON(P), ASNT/B, and TXT. Available target types are:

Target Type   Notes
ProteinGI     NCBI GI of a protein sequence
ProteinName   protein name
GeneID        NCBI Gene database identifier
GeneSymbol    gene symbol

 

Example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/490,1000/targets/ProteinGI,ProteinName,GeneID,GeneSymbol/XML

 

Assay Summary

 

Returns a summary of biological test results for the given SID(s) or CID(s), including assay experiment information, bioactivity, and target. Valid output formats are XML, JSON(P), ASNT/B, and CSV.

Examples:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/1000,1001/assaysummary/CSV

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sid/104234342/assaysummary/XML

 

There is also a per-AID assay summary available in a simplified format. Valid output formats are XML, JSON(P), and ASNT/B.

Example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/1000/summary/XML

 

Assay Dose-Response

 

Returns assay dose-response data for a single AID with up to 1000 SID(s). Valid output formats are XML, JSON(P), ASNT/B, and CSV. A subset of the SIDs of an assay may be specified as options:

Option    Allowed Values                         Meaning
sid       listkey, or comma-separated integers   SID rows to retrieve for an assay
listkey   valid SID listkey                      listkey containing SIDs, if using sid=listkey

 

Examples:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/504526/doseresponse/XML

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/504526/doseresponse/CSV?sid=104169547,109967232

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/doseresponse/XML (with “aid=504526&sid=104169547,109967232” in the POST body)

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/602332/sids/XML?sids_type=doseresponse&list_return=listkey

followed by

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/602332/doseresponse/CSV?sid=listkey&listkey=xxxxxx&listkey_count=100 (where ‘xxxxxx’ is the listkey returned by the previous URL)
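
When an assay has more SIDs than the 1000-SID limit allows in one request, the SIDs can be split into batches and sent as POST bodies of the form shown above. The following sketch only builds the bodies; the function name doseresponse_bodies is hypothetical.

```python
from urllib.parse import urlencode

MAX_SIDS = 1000  # per-request limit stated in this section


def doseresponse_bodies(aid: int, sids):
    """Yield POST bodies 'aid=...&sid=...' with at most 1000 SIDs each."""
    sids = list(sids)
    for start in range(0, len(sids), MAX_SIDS):
        chunk = ",".join(map(str, sids[start:start + MAX_SIDS]))
        # safe="," keeps the commas in the SID list literal.
        yield urlencode({"aid": aid, "sid": chunk}, safe=",")


for body in doseresponse_bodies(504526, [104169547, 109967232]):
    print(body)  # aid=504526&sid=104169547,109967232
```

Each body would be sent to the assay/aid/doseresponse URL with the “Content-Type: application/x-www-form-urlencoded” request header.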

 

Classification

 

Returns the nodes in the classification tree for a single SID, CID, or AID. Valid output formats are XML, JSON(P), and ASNT/B. Options are:

Option                Allowed Values (default in bold)   Meaning
classification_type   simple, original                   simple: simplified tree structure, each node has a single parent
                                                         original: as given by the depositor, nodes may have multiple parents

 

Examples:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sid/1917/classification/XML

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/1983/classification/JSON?classification_type=original

 

Dates

 

Returns dates associated with PubChem identifiers; note that not all date types are relevant to all identifier types – see the table below. Multiple date types may be requested. Valid output formats are XML, JSON(P), and ASNT/B. Options are:

Option       Allowed Values (default in bold)   Meaning
dates_type   deposition                         when an SID or AID first appeared
             modification                       when an SID or AID was last modified
             hold                               when an SID or AID will be released
             creation                           when a CID first appeared

 

Examples:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/dates/JSON

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sid/1,2,3,135653256/dates/XML?dates_type=modification,deposition,hold

http://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/1,624113/dates/XML?dates_type=deposition,hold

 

XRefs

 

Returns cross-references associated with PubChem SIDs or CIDs. Multiple types may be requested in a comma-separated list in the URL path. Valid output formats are XML, JSON(P), ASNT/B, and TXT (limited to a single type). Available cross-references are:

Cross-reference   Meaning
RegistryID        external registry identifier
RN                registry number
PubMedID          NCBI PubMed identifier
MMDBID            NCBI MMDB identifier
DBURL             external database home page URL
SBURL             external database substance URL
ProteinGI         NCBI protein GI
NucleotideGI      NCBI nucleotide GI
TaxonomyID        NCBI taxonomy identifier
MIMID             NCBI MIM identifier
GeneID            NCBI gene identifier
ProbeID           NCBI probe identifier
PatentID          patent identifier
SourceName        external depositor name
SourceCategory    depositor category(ies)

 

Examples:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sid/127378063/xrefs/PatentID/XML

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/vioxx/xrefs/RegistryID,RN,PubMedID/JSONP

 

Conformers

 

A list of conformer IDs, in diversity order, can be obtained for a CID. Valid output formats are XML, JSON(P), ASNT/B, and TXT (limited to a single CID):

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/conformers/XML

Individual conformer records – either computed 3D coordinates for compounds or deposited/experimental 3D coordinates for some substances – can be retrieved by conformer ID:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/conformers/000008C400000001/SDF

 

Asynchronous Operations

 

Substructure / Superstructure

 

This is a special type of compound namespace input that retrieves CIDs by substructure or superstructure search. It requires a CID, or a SMILES, InChI, or SDF string in the URL path or POST body (InChI and SDF by POST only). Because a structure search may require substantial time to complete, no operation may be specified in the URL; rather, this request will always return an asynchronous key, which should be used in subsequent requests to check for search completion or to retrieve the results. Valid output formats are XML, JSON(P), and ASNT/B.

Examples:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/substructure/smiles/C3=NC1=C(C=NC2=C1C=NC=C2)[N]3/XML

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/substructure/inchi/XML (where the POST body contains “inchi=InChI=1S/C9H6N4/c1-2-10-3-6-7(1)11-4-8-9(6)13-5-12-8/h1-5H,(H,12,13)”)

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/superstructure/cid/2244/XML

Followed by another request that may return a waiting message, or the final result, for example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/listkey/xxxxx/cids/XML (where ‘xxxxx’ is the ListKey returned in the prior search request)
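
The request-then-check pattern can be sketched as a polling loop. This document does not specify what the waiting message looks like, so the sketch leaves its detection to a caller-supplied is_waiting function. The names poll and http_get, the two-second interval, and the limit of 30 tries are all hypothetical choices.

```python
import time
from typing import Callable
from urllib.request import urlopen


def http_get(url: str) -> str:
    """Fetch a URL and return the response body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def poll(fetch: Callable[[str], str], url: str,
         is_waiting: Callable[[str], bool],
         interval: float = 2.0, max_tries: int = 30) -> str:
    """Fetch `url` until `is_waiting(response)` is false, then return it."""
    for attempt in range(max_tries):
        response = fetch(url)
        if not is_waiting(response):
            return response
        if attempt < max_tries - 1:
            time.sleep(interval)  # be polite to the server between checks
    raise TimeoutError(f"no result after {max_tries} tries: {url}")
```

A call such as poll(http_get, listkey_url, is_waiting) would repeat the listkey request until the final result arrives, where listkey_url is the URL built from the returned ListKey and is_waiting inspects the response body.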

 

Structure search options are specified via URL arguments:

Option                   Type      Meaning                                                                Default
MatchIsotopes            boolean   atoms must be of the specified isotope                                 false
MatchCharges             boolean   atoms must match the specified charge                                  false
MatchTautomers           boolean   allow match to tautomers of the given structure                        false
RingsNotEmbedded         boolean   rings may not be embedded in a larger system                           false
SingleDoubleBondsMatch   boolean   single or double bonds match aromatic bonds                            true
ChainsMatchRings         boolean   chain bonds in the query may match rings in hits                       true
StripHydrogen            boolean   remove any explicit hydrogens before searching                         false
Stereo                   enum      how to handle stereo; one of ignore, exact, relative, nonconflicting   ignore
MaxSeconds               integer   maximum search time in seconds                                         unlimited
MaxRecords               integer   maximum number of hits                                                 2M
listkey                  string    restrict to matches within hits from a prior search                    none

 

Example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/substructure/smiles/C1=NC2=C(N1)C(=O)N=C(N2)N/XML?MatchTautomers=true&MaxRecords=100
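
A search URL with options can be assembled as in the sketch below. The function structure_search_url is a hypothetical helper. It percent-encodes the SMILES string, so the printed URL looks different from the example above but denotes the same path.

```python
from urllib.parse import quote, urlencode

PREFIX = "http://pubchem.ncbi.nlm.nih.gov/rest/pug"


def structure_search_url(kind: str, smiles: str, output: str = "XML",
                         **options) -> str:
    """Build a substructure or superstructure search URL from a SMILES."""
    if kind not in ("substructure", "superstructure"):
        raise ValueError(f"unknown search kind: {kind!r}")
    # Booleans are written as lowercase 'true'/'false', as in the example.
    query = {k: str(v).lower() if isinstance(v, bool) else v
             for k, v in options.items()}
    url = f"{PREFIX}/compound/{kind}/smiles/{quote(smiles, safe='')}/{output}"
    return url + ("?" + urlencode(query) if query else "")


print(structure_search_url("substructure", "C1=NC2=C(N1)C(=O)N=C(N2)N",
                           MatchTautomers=True, MaxRecords=100))
```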

 

Similarity

 

This is a special type of compound namespace input that retrieves CIDs by 2D similarity search. It requires a CID, or a SMILES, InChI, or SDF string in the URL path or POST body (InChI and SDF by POST only). Because this search may require substantial time to complete, no operation may be specified in the URL; rather, this request will always return an asynchronous key, which should be used in subsequent requests to check for search completion or to retrieve the results. Valid output formats are XML, JSON(P), and ASNT/B.

Example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/similarity/cid/2244/XML

Followed by another request that may return a waiting message, or the final result, for example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/listkey/xxxxx/cids/XML (where ‘xxxxx’ is the ListKey returned in the prior search request)

 

Similarity search options are specified via URL arguments:

Option       Type      Meaning                                               Default
Threshold    integer   minimum Tanimoto score for a hit                      90
MaxSeconds   integer   maximum search time in seconds                        unlimited
MaxRecords   integer   maximum number of hits                                2M
listkey      string    restrict to matches within hits from a prior search   none

 

Example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/similarity/smiles/C1=NC2=C(N1)C(=O)N=C(N2)N/XML?Threshold=95&MaxRecords=100

 

Identity

 

This is a special type of compound namespace input that retrieves CIDs by identity search. It requires a CID, or a SMILES, InChI, or SDF string in the URL path or POST body (InChI and SDF by POST only). Because this search may require substantial time to complete, no operation may be specified in the URL; rather, this request will always return an asynchronous key, which should be used in subsequent requests to check for search completion or to retrieve the results. Valid output formats are XML, JSON(P), and ASNT/B.

Example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/identity/smiles/CCCCC/XML

Followed by another request that may return a waiting message, or the final result, for example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/listkey/xxxxx/cids/XML (where ‘xxxxx’ is the ListKey returned in the prior search request)

 

Identity search options are specified via URL arguments:

Option          Type      Values / Meaning                                      Default
identity_type   string    same_connectivity                                     same_stereo_isotope
                          same_tautomer
                          same_stereo
                          same_isotope
                          same_stereo_isotope
                          nonconflicting_stereo
                          same_isotope_nonconflicting_stereo
MaxSeconds      integer   maximum search time in seconds                        unlimited
MaxRecords      integer   maximum number of hits                                2M
listkey         string    restrict to matches within hits from a prior search   none

 

Example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/identity/smiles/C1=NC2=C(N1)C(=O)N=C(N2)N/XML?identity_type=same_tautomer

 

Molecular Formula

 

This is a special type of compound namespace input that retrieves CIDs by molecular formula search. It requires a formula string in the URL path. Because a formula search may require substantial time to complete, no operation may be specified in the URL; rather, this request will always return an asynchronous key, which should be used in subsequent requests to check for search completion or to retrieve the results. Valid output formats are XML, JSON(P), and ASNT/B.

Example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/formula/C10H21N/XML

Followed by another request that may return a waiting message, or the final result, for example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/listkey/xxxxx/cids/XML (where ‘xxxxx’ is the ListKey returned in the prior request)

 

Search options are specified via URL arguments:

Option               Type      Meaning                                                             Default
AllowOtherElements   boolean   Allow other elements to be present in addition to those specified   false
MaxSeconds           integer   maximum search time in seconds                                      unlimited
MaxRecords           integer   maximum number of hits                                              2M
listkey              string    restrict to matches within hits from a prior search                 none

 

Example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/formula/C10H21N/JSON?AllowOtherElements=true&MaxRecords=10

 

Other Inputs

 

These are special input domains that do not deal with lists of PubChem record identifiers; regular operations are not possible with these inputs.

               

Source Names

 

Returns a list of all current depositors (sources) of substances or assays. Valid output formats are XML, JSON(P), and ASNT/B.

Examples:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/sources/substance/XML

http://pubchem.ncbi.nlm.nih.gov/rest/pug/sources/assay/JSONP

 

Other Options

 

Pagination

 

When retrieving identifiers by listkey, the listkey_start and listkey_count options indicate at what index (zero-based) in the list to begin retrieval, and how many identifiers to return, respectively.

Example:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/1,2,3,4,5/cids/XML?list_return=listkey

followed by:

http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/listkey/xxxxxx/cids/XML?listkey_start=2&listkey_count=2

where ‘xxxxxx’ is the listkey returned by the first URL, will return a list containing (only) CIDs 3 and 4.
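
Walking through a stored list page by page can be sketched as follows. The generator page_urls is hypothetical and assumes the caller already knows the total number of identifiers in the list.

```python
PREFIX = "http://pubchem.ncbi.nlm.nih.gov/rest/pug"


def page_urls(listkey: str, total: int, page_size: int, output: str = "XML"):
    """Yield one compound/listkey URL per page of `page_size` CIDs."""
    for start in range(0, total, page_size):
        count = min(page_size, total - start)
        yield (f"{PREFIX}/compound/listkey/{listkey}/cids/{output}"
               f"?listkey_start={start}&listkey_count={count}")


for url in page_urls("xxxxxx", total=5, page_size=2):
    print(url)

# The slice selected by listkey_start=2 and listkey_count=2 on a stored
# list of CIDs 1 to 5, mirrored here with an ordinary Python list.
cids = [1, 2, 3, 4, 5]
print(cids[2:2 + 2])  # [3, 4]
```

Because listkey_start is zero-based, the second page of a five-item list with a page size of two starts at index 2, which matches the example above.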