PUG SOAP is a web services access layer to PubChem functionality. It is based on a WSDL which can be found at the URL:
PubChem’s PUG (Power User Gateway), documented elsewhere, is an XML-based interface suitable for low-level programmatic access to PubChem services, wherein data is exchanged through a relatively complex XML schema that is powerful but requires some expertise to use. PUG SOAP contains much of the same functionality, but broken down into simpler functions defined in a WSDL (http://www.w3.org/TR/wsdl), using the SOAP protocol (http://www.w3.org/TR/soap) for information exchange. This WSDL/SOAP layer is most suitable for SOAP-aware GUI workflow applications (Taverna, Pipeline Pilot) and programming languages (C#/.NET, Perl, Python, Java, etc.). See the Tips & Tricks section at the end of this document for more information on specific clients.
PUG SOAP is an interface to PubChem’s specialized search and analysis services – chemical structure searches, full record downloads, etc. For text and numeric queries via Entrez, a SOAP interface to Entrez’s eUtils is available at:
This may be used in conjunction with PUG SOAP, for example by using eSearch to find compounds with a particular molecular weight, saving the resulting ID list in an Entrez history key, and passing that key to PUG SOAP to retrieve the records in SDF format. See the main PubChem help page for a list of searchable fields in Entrez.
We welcome feedback and suggestions; please direct these to NCBI’s help desk at email@example.com. RSS and Atom feeds are also available for announcements specific to this web service, such as new functions, enhancements, etc.
PUG SOAP Concepts
PUG SOAP functions pass data to one another by means of keys. For example, when a SMILES or SDF structure is provided as initial input, PUG SOAP will return a “structure key” that can be used in subsequent functions that take a chemical structure as their starting point – like a substructure search or a standardization request. Similarly, functions that are used to specify a BioAssay table will return an “assay key.”
Sets of PubChem database identifiers (SIDs, CIDs, or AIDs) are contained in a “list key.” So for example, a similarity search (which takes a structure key as input) will return a list key as output. From there, one may use this list key to retrieve the actual identifiers, to download structures, to show the result set in Entrez, to limit the search space of subsequent queries, etc.
PubChem BioAssay, Substance, and Compound downloads produce a “download key” that may be used to get a URL from which the requested records can be downloaded.
Queued (Asynchronous) Operations
Many functions in PUG SOAP are run on dedicated servers shared by the entire PubChem community, on which jobs run in a first-come, first-served queue and may take some time to complete. The key that such a function returns must be used with a status check function, to make sure that the request has been completed before proceeding to the next step. The user’s application is responsible for periodically polling the status function, moving on only when success is achieved, or halting if an error status is returned.
Examples of how to do this polling within many common SOAP clients are provided later in this document.
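As a client-independent illustration, the polling pattern can be sketched in Python as below. This is a minimal sketch, not a definitive implementation: wait_for_operation and its get_status argument are hypothetical names, where get_status stands in for whatever call your SOAP client uses to invoke the status check function. The status names are the ones listed under the status check function later in this document, and the polling interval and retry limit are arbitrary assumptions.

```python
import time
from typing import Callable

# Status names as listed in the status check function description.
PENDING = {"Queued", "Running"}
FINISHED = {"Success", "HitLimit", "TimeLimit"}


def wait_for_operation(get_status: Callable[[str], str],
                       key: str,
                       interval: float = 5.0,
                       max_polls: int = 120,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status(key) until the operation finishes.

    get_status is a hypothetical wrapper around the SOAP status call.
    Returns the final status if the job finished (possibly at a limit),
    raises RuntimeError on any error status, and TimeoutError if the
    job is still pending after max_polls checks.
    """
    for _ in range(max_polls):
        status = get_status(key)
        if status in FINISHED:
            return status
        if status not in PENDING:
            # ServerError, InputError, DataError, Stopped, or anything unknown
            raise RuntimeError(f"operation {key!r} ended with status {status}")
        sleep(interval)
    raise TimeoutError(f"operation {key!r} still pending after {max_polls} polls")
```

Passing the sleep function as a parameter is only a convenience for testing the loop without waiting; in normal use the default time.sleep applies.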
PUG SOAP Functions
This section describes the functions available through the PUG SOAP WSDL, their inputs and outputs, and whether they are synchronous or asynchronous. Synchronous here means that the function returns a result right away. Asynchronous means that the operation is queued, that the status check function (GetOperationStatus) must be used with the returned key, and further operations on that key must not be performed until the check indicates success.
This document provides a general description of what each function does. For details on the SOAP messages, XML schema types, enumerations, etc., see the automatically generated (and highly detailed!) documentation, in HTML or PDF format, respectively:
The functions below are listed alphabetically. They can also be grouped into three categories: input, processing, and output. Input functions are the starting points, where the user provides structures and ID lists for further operations. Processing functions perform some complex calculation on PubChem’s computational infrastructure. Output functions retrieve the results of the processing. By design, the input functions all begin with “Input” and are synchronous. Processing functions may have any name and are asynchronous. Output functions begin with “Get” and are synchronous.
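The naming convention can be expressed as a small helper, sketched here in Python. The helper name classify_function is hypothetical, and it relies only on the prefix rule stated above; it does not consult the WSDL.

```python
from typing import Tuple


def classify_function(name: str) -> Tuple[str, bool]:
    """Return (category, is_synchronous) from a function name.

    Applies the documented convention: "Input..." functions are
    synchronous inputs, "Get..." functions are synchronous outputs,
    and everything else is an asynchronous processing function.
    """
    if name.startswith("Input"):
        return ("input", True)
    if name.startswith("Get"):
        return ("output", True)
    return ("processing", False)
```

A client could use such a check to decide whether a returned key needs to be polled with GetOperationStatus before it is used further.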
Given an assay key, prepare for download a file containing an assay data table in the selected format. See the assay query section of the PUG service documentation (http://pubchem.ncbi.nlm.nih.gov/pug/pughelp.html) for more detail on the supported formats. Compression is optional and defaults to gzip (.gz). Returns a download key. Asynchronous.
Given a list key, prepare for download a file containing those records in the selected format. See the web download service documentation (http://pubchem.ncbi.nlm.nih.gov/pc_fetch/pc_fetch-help.html) for more detail on the supported formats and file types. Returns a download key. Asynchronous.
Get the description of a column (readout) in a BioAssay, which may be the outcome, score, or a TID from the given AID. Synchronous.
Get the description of all columns (readouts) in a BioAssay. Synchronous.
Get the descriptive information for a BioAssay, including the number of user-specified readouts (TIDs) and whether a score readout is present. Optionally get version information. Synchronous.
Given a download key, return an FTP URL that may be used to download the requested file. Synchronous.
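Because download compression defaults to gzip, a client usually needs to decompress the file after fetching it from the returned URL. The following Python sketch handles both cases; maybe_gunzip is a hypothetical helper name, and fetching the bytes from the URL is left to the caller.

```python
import gzip

# Every gzip stream begins with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(data: bytes) -> bytes:
    """Return decompressed bytes if data is gzip-compressed, else data unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data
```

Checking the magic bytes rather than the file extension keeps the helper usable whether or not compression was requested.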
Given a list key, return an Entrez history key (db, query key, and WebEnv) corresponding to that list. Synchronous.
Given an Entrez history key (db, query key, and WebEnv), return an HTTP URL that may be used to view the list in Entrez. Synchronous.
Given a list key, return the identifiers as an array of integers. Note that this method expects there to be at least one identifier in the list, and will fault if the list is empty; see GetListItemsCount, which can be used to check for an empty list prior to calling GetIDList. The optional Start (zero-based) and Count parameters can be used to return smaller portions of the list, useful especially for large lists. Synchronous.
Return the number of IDs in the set represented by the given list key. Synchronous.
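One way to combine GetListItemsCount and GetIDList is to check the count first and then page through the list with Start and Count. The Python sketch below assumes two hypothetical callables, get_count and get_ids, that wrap those two SOAP calls in your client; the page size is an arbitrary choice.

```python
from typing import Callable, Iterator, List


def iter_id_pages(get_count: Callable[[str], int],
                  get_ids: Callable[[str, int, int], List[int]],
                  list_key: str,
                  page_size: int = 1000) -> Iterator[List[int]]:
    """Yield the identifiers behind list_key in pages.

    get_count(list_key) wraps GetListItemsCount; get_ids(list_key,
    start, count) wraps GetIDList with a zero-based start. Both are
    hypothetical wrappers supplied by the caller. For an empty list
    get_ids is never called, which avoids the documented fault.
    """
    total = get_count(list_key)
    for start in range(0, total, page_size):
        yield get_ids(list_key, start, min(page_size, total - start))
```

Iterating page by page keeps each response small for large result sets.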
Given a key for any asynchronous operation, return the status of that operation. The possible return values are as follows. Success: the operation completed normally. HitLimit, TimeLimit: the operation finished normally, but one of the limits was reached (e.g. before the entire database was searched). ServerError, InputError, DataError, Stopped: there was a problem with the input or on the server, and the job has died. Queued: the operation is waiting its turn in the public queue. Running: the operation is in progress. Synchronous.
Given a structure key that has been processed by Standardize, return the corresponding PubChem Compound database CID, or an empty value if the structure is not present in PubChem. Synchronous.
Given a structure key that has been processed by Standardize, return the chemical structure as a SMILES or InChI string. Synchronous.
Given a structure key that has been processed by Standardize, return the chemical structure as ASN, XML, or SDF, returned as a Base64-encoded string. Synchronous.
Given a key for any asynchronous operation, return any system messages (error messages, job info, etc.) associated with the operation, if any. Synchronous.
Search PubChem Compound for structures identical to the one given by the structure key input, based on a user-selected level of chemical identity: connectivity only, match isotopes and/or stereo, etc. The search may be limited by elapsed time or number of records found, or restricted to search only within a previous result set (given by a list key). Returns a list key. Asynchronous.
Convert IDs from one type to another, using any one of a variety of CID matching algorithms. Output can be a list or a downloaded file; download file compression is optional and defaults to gzip (.gz). Returns a list or download key. Asynchronous.
Specify an assay table from a BioAssay AID. The table may be complete, concise, or include a ListKey-specified set of readouts (TIDs). By default, all tested substances are included, but can be restricted to a ListKey-specified set of SIDs or CIDs. Returns an assay key. Synchronous.
Input an Entrez history key (db, query key, and WebEnv). Returns a list key. Synchronous.
Input a set of identifiers for a PubChem database, as an array of integers. Returns a list key. Synchronous.
Input a set of identifiers for a PubChem database, as a simple string of integer values separated by commas and/or whitespace. Returns a list key. Synchronous.
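The text form of an ID list can be built and checked on the client side before it is sent. The Python sketch below shows one way to do this; format_id_list and parse_id_list are hypothetical helper names, and the parser simply applies the "commas and/or whitespace" rule described above.

```python
import re
from typing import Iterable, List


def format_id_list(ids: Iterable[int]) -> str:
    """Join integer identifiers into a comma-separated string."""
    return ",".join(str(int(i)) for i in ids)


def parse_id_list(text: str) -> List[int]:
    """Split a string on commas and/or whitespace and convert to integers.

    Raises ValueError if any token is not an integer, so malformed
    input is caught before a request is made.
    """
    tokens = [t for t in re.split(r"[,\s]+", text) if t]
    return [int(t) for t in tokens]
```

Validating locally avoids a round trip to the server just to discover a stray character in the list.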
Input a chemical structure as a simple (one-line) string, either SMILES or InChI. Returns a structure key. Synchronous.
Input a chemical structure in ASN.1 (text or binary), XML, or SDF format. The structure must be encoded as a Base64 string. Currently only single structures are supported. Returns a structure key. Synchronous.
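Base64 encoding is available in most standard libraries. A minimal Python sketch follows; the helper names encode_structure and decode_structure are hypothetical, and the same decoding step applies to output functions that return a structure as a Base64 string.

```python
import base64


def encode_structure(data: bytes) -> str:
    """Encode raw structure bytes (e.g. the contents of an SDF file) as Base64 text."""
    return base64.b64encode(data).decode("ascii")


def decode_structure(text: str) -> bytes:
    """Decode a Base64 string returned by the service back into raw bytes."""
    return base64.b64decode(text)
```

Working with bytes rather than text avoids corrupting binary ASN.1 input through character-set conversion.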
Search PubChem Compound for structures of a given molecular formula, optionally allowing elements not specified to be present. The search may be limited by elapsed time or number of records found, or restricted to search only within a previous result set (given by a list key). Returns a list key. Asynchronous.
Compute a matrix of scores from one or two lists of IDs (if one, the IDs will be self-scored), of the selected type and in the selected format. Compression is optional and defaults to gzip (.gz). Returns a download key. Asynchronous.
Search PubChem Compound for structures similar to the one given by the structure key input, based on the given Tanimoto-based similarity score. The search may be limited by elapsed time or number of records found, or restricted to search only within a previous result set (given by a list key). Returns a list key. Asynchronous.
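For readers unfamiliar with the Tanimoto score, the general formula is the number of features shared by two structures divided by the number of features present in either. The Python sketch below illustrates only that formula on sets of feature indices. The fingerprint PubChem uses and the scale of the threshold parameter are not described in this document, so the inputs here are purely illustrative, and returning 0.0 for two empty sets is an assumed convention.

```python
from typing import Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two feature sets: |a & b| / |a | b|."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention for two empty feature sets
    return len(a & b) / union
```

Identical feature sets score 1.0 and disjoint sets score 0.0, so a higher threshold returns fewer, closer matches.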
Standardize the structure given by the structure key input, using the same algorithm PubChem uses to construct the Compound database. Returns a structure key. Asynchronous.
Search PubChem Compound for structures containing the one given by the structure key input, based on a user-selected level of chemical identity: connectivity only, match isotopes and/or stereo, etc. The search may be limited by elapsed time or number of records found, or restricted to search only within a previous result set (given by a list key). Returns a list key. Asynchronous.
Search PubChem Compound for structures contained within the one given by the structure key input, based on a user-selected level of chemical identity: connectivity only, match isotopes and/or stereo, etc. The search may be limited by elapsed time or number of records found, or restricted to search only within a previous result set (given by a list key). Returns a list key. Asynchronous.
The standard WSDL/SOAP interface to PUG SOAP makes these web services functions generic and compatible with any SOAP client - in theory. In practice, we have found that the support for WSDL/SOAP among various clients is highly variable, each with different quirks and workarounds necessary to make them work with PUG SOAP. On our PUG SOAP client help web page:
we share some of our experiences and tricks for working with some common clients, which we hope will help first-time users get started.
We have made every effort to design PUG SOAP to work as broadly and generically as possible. But since the SOAP clients are out of PubChem’s control, we cannot guarantee that every version of every client will work the same way, or that any given client will be compatible. Please contact NCBI’s help desk at firstname.lastname@example.org with any comments, suggestions, or questions. Provided we have access to the same client software, we will try to help with specific issues.