PubChem Upload FAQ
 
 
   

This file answers frequently asked questions about PubChem Upload, a tool for submitting data to the PubChem Substance and PubChem BioAssay databases. It supplements two other documents: the brief help document, which provides basic information about PubChem Upload, including sample files for submitting substances and assays; and the complete help document, which covers everything in the brief help document plus technical details about the PubChem Upload tool and FTP submissions. A tutorial is also available and provides step-by-step examples of how to use PubChem Upload for substance and assay submissions. If you need additional help, please contact pubchem-deposit-help@ncbi.nlm.nih.gov.

 
     
Frequently Asked Questions
 
   
 
Are there templates or sample files for submitting substances and assays?
 
  Yes, you can use an existing PubChem record as a template for a new submission if you select the option to "fill in forms" when you make a new submission. If you prefer to make your submission by uploading files rather than using wizards to fill out forms, the PubChem Upload help document provides links to sample files.

Templates:

The PubChem Upload interface for making a "New submission" enables you to specify an existing PubChem record for use as a template. After you specify which type of submission you will make (substance or bioassay), you can specify how you would like to enter the data (upload files or fill in forms). If you choose "fill in forms," the system will ask if you want to "start from" an existing PubChem SID or CID, or from an existing AID.
New Substance submission template:
If you make a new substance submission and choose to "fill in forms," the PubChem Upload system will ask you:
"Do you have an SID (PubChem Substance ID) or a CID (PubChem Compound ID) that you would like to start from?"
If so, specify the CID or SID that you would like to use as a template for the new substance submission.

The PubChem Upload system will then copy the chemical structure, chemical name, and synonyms from that compound or substance record into your new submission forms. Then you can use the wizards to complete the rest of the submission. (See the help document for additional details about substance submissions.)
New Assay submission template:
If you make a new bioassay submission and choose to "fill in forms," the PubChem Upload system will ask you:
"Do you have an AID (PubChem Assay ID) that you would like to start from?"
If so, specify the AID that you would like to use as a template for the new bioassay submission.

The PubChem Upload system will then copy the protocol and description from that assay record into your new submission forms. Upload will NOT copy the assay's RegID, and will not copy its data. However, the protocol will supply the data definition (the column headers) that will be used in the data table. (See the help document for additional details about bioassay submissions.)
Sample files:

If you prefer to make your submission by uploading files rather than using wizards to fill out forms, the following sections of the PubChem Upload help document provide links to sample files.
Substance submission sample files:
http://pubchem.ncbi.nlm.nih.gov/upload/docs/upload_help_complete.html#SubstanceSampleFile
Assay submission sample files:
http://pubchem.ncbi.nlm.nih.gov/upload/docs/upload_help_complete.html#AssaySampleFile
 
 
 
Can I submit bioassay data for PubChem substances that were submitted by a vendor or other third party?
 
  Yes, you can submit bioassay data for PubChem substances that were submitted by a vendor or other third party, but only if those substances are RNAi records. If you have tested other types of substances (such as chemicals), you need to submit those substances to PubChem before submitting the assay description and assay data.

As noted in the "three parts to an assay submission" section of the complete PubChem Upload help document, the first step is the submission of the substances you have tested (such as chemicals or RNAi) to the PubChem Substance database. This step provides you with a PubChem Substance identifier (SID) for each substance you have tested. You can then list each of your SIDs under the "PUBCHEM_SID" column of your assay data file (when you do the third step of the assay submission process).

RNAi assay depositors, however, also have the option of using the SID or External ID that was submitted to PubChem by an outside RNAi substance provider, as an alternative to submitting RNAi substance records from scratch. To do this, you can just specify, in your assay data file, either the SID or External ID of the RNAi records that the outside provider submitted to the PubChem Substance database. This allows RNAi assay depositors to essentially skip the substance submission step of the "three parts to an assay submission."

If the substance provider is an RNAi vendor, the External ID is usually a catalog ID. Both the External ID and the PubChem SID are displayed in PubChem Substance records, and can therefore be obtained by viewing the vendor's PubChem records. In addition, some vendors have provided files that list their catalog IDs and corresponding SIDs, and these are available on the PubChem FTP site (ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay/Extras/VendorCatalogs/).

To use an RNAi vendor's catalog ID or SID instead of your own SID, you need to do two things. First, upgrade your PubChem Upload account from a "test" account to a "full" account. Second, configure the account to indicate that you are using a third-party or vendor's product by checking the "For RNAi Assay depositors only: Use outside RNAi substance provider?" option under the "Upload Preferences" tab. (The Preferences tab is accessible from the "Account Settings" folder of your PubChem Upload home page.)

Then, when you submit the assay data file (which is the third step of the three parts to an assay submission), you can enter the vendor's External ID or their PubChem SID for the RNAi you have tested.

If you use the vendor's catalog ID, place it in the "PUBCHEM_EXT_DATASOURCE_REGID" column of your assay data file; if you use the vendor's SID, place it in the "PUBCHEM_SID" column of the data file.

For example, if you tested the siGENOME siRNA reagent for TXNDC15, which was submitted to PubChem Substance by GE Healthcare Dharmacon RNAi Technologies, you could enter the submitter's External ID for that RNAi (in this case, "D-056123-17") in the "PUBCHEM_EXT_DATASOURCE_REGID" column of your assay data file, or you could enter the corresponding PubChem Substance identifier (193126385) in the "PUBCHEM_SID" column of the file.
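
As a rough illustration of the two options in the example above, the following Python sketch builds one assay data row each way. Only the column names PUBCHEM_SID and PUBCHEM_EXT_DATASOURCE_REGID and the two example identifiers come from this FAQ. The CSV layout and the helper names rnai_row and write_rows are assumptions made for illustration, so consult the bioassay data specifications for the actual file format.

```python
import csv
import io

# Column names taken from this FAQ; a real assay data file has additional
# columns defined by the bioassay data specifications.
COLUMNS = ["PUBCHEM_SID", "PUBCHEM_EXT_DATASOURCE_REGID"]


def rnai_row(sid=None, ext_id=None):
    """Build one data row that identifies an RNAi reagent by SID or by External ID.

    Hypothetical helper: exactly one of the two identifiers must be given.
    """
    if (sid is None) == (ext_id is None):
        raise ValueError("give either sid or ext_id, not both and not neither")
    return {
        "PUBCHEM_SID": "" if sid is None else str(sid),
        "PUBCHEM_EXT_DATASOURCE_REGID": "" if ext_id is None else ext_id,
    }


def write_rows(rows):
    """Serialize rows as CSV text (the CSV layout is an assumption)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# The siGENOME reagent from the example above, identified each way.
by_catalog_id = rnai_row(ext_id="D-056123-17")
by_sid = rnai_row(sid=193126385)
csv_text = write_rows([by_catalog_id, by_sid])
```

In a real data file you would pick one of the two approaches for a given reagent; both rows appear here only to show where each identifier goes.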
 
 
 
Can I put my record on hold, and for how long?
 
  Yes, you can put your PubChem Substance or PubChem BioAssay record on hold. This can be useful, for example, if you have a paper under review and would like to embargo the PubChem records until the paper is published.

If you would like PubChem to hold your data, it is critical that you specify a PUBCHEM_HOLD_UNTIL_DATE at the time of initial submission. If you do, your data will be added to PubChem after you press "Commit," but they will not be visible to the public until the PUBCHEM_HOLD_UNTIL_DATE that you specified. (The substance data specifications and bioassay data specifications provide additional details about the PUBCHEM_HOLD_UNTIL_DATE for each type of data.)

An initial hold of up to one year from initial submission is accepted, with the possibility of additional extensions.
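
The one-year limit can be checked before you submit with a few lines of Python. This is only a sketch: the function names are hypothetical, and reading "one year" as the same calendar date in the following year is an assumption, so PubChem's own validation may differ.

```python
from datetime import date


def latest_initial_hold(submission_date):
    """Return the last hold-until date allowed for an initial hold.

    Assumes "one year" means the same calendar date one year later;
    29 February falls back to 28 February.
    """
    try:
        return submission_date.replace(year=submission_date.year + 1)
    except ValueError:  # 29 February has no counterpart in the next year
        return submission_date.replace(year=submission_date.year + 1, day=28)


def is_valid_initial_hold(submission_date, hold_until):
    """True if hold_until lies after submission and within one year of it."""
    return submission_date < hold_until <= latest_initial_hold(submission_date)


submitted = date(2014, 12, 11)
within_limit = is_valid_initial_hold(submitted, date(2015, 6, 1))
beyond_limit = is_valid_initial_hold(submitted, date(2016, 1, 1))
```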

You can update your record any time to remove the on-hold request and release the record to the public. The complete PubChem Upload Help document describes how to remove the on-hold request for substances and how to remove the on-hold request for assays.

If you DO NOT specify a hold-until-date at the time of initial submission, your PubChem record will be released automatically after the record goes through standard PubChem data processing.

Once a record is "published" (publicly released) in PubChem, it can no longer be put back into on-hold status. (That is, a retroactive hold cannot be placed on PubChem records that are already publicly accessible.) The only action available after that time is a revoke.

Notes for assay submissions:
  • If you have chosen to hold your assay data private, but you would like to provide access to select individuals such as collaborators or reviewers, you can create a URL that provides temporary access to a given on-hold assay submission (i.e., to a given AID and its associated substance records). You can then share it with the desired individuals. This function was added in order to facilitate collaboration (whether external or in-house) and administrative work such as publication and grant processing.

  • An assay cannot be released to the public before all of its associated substances (SIDs) are public. (Validation checks are performed on the release dates of an assay and its associated substance records, and an error is generated if a request is made to release an assay before all of its SIDs are public.) Therefore, if you would like to release your assay data to the public, you may need to do two steps if both the assay data and the substances are on-hold:

    1. Release the substances: Open your PubChem Upload home page/"Substances" folder tab/"On Hold" subfolder, and type the Assay ID (AID) in the text box entitled "Release (remove Hold Until Date) Substances." That will release all PubChem Substance records that are associated with that AID.

    2. Release the assay: Once the substances are public in PubChem, release the assay data by following the steps described in the Assay data release section of the complete Upload Help document.
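
The ordering rule behind these two steps can be sketched in Python as follows. The data model (a mapping from SID to release date) and the function name are hypothetical, and the SIDs and dates are made up. This is not PubChem's validation code, only an illustration of the check it describes.

```python
from datetime import date


def blocking_sids(sid_release_dates, assay_release_date):
    """Return the SIDs that are not yet public on the requested assay release date.

    sid_release_dates maps each SID to its release date, or to None if the
    substance is on hold with no release requested. Hypothetical data model.
    """
    return sorted(
        sid
        for sid, released in sid_release_dates.items()
        if released is None or released > assay_release_date
    )


# Made-up SIDs and dates, for illustration only.
substances = {
    1001: date(2015, 1, 10),
    1002: date(2015, 3, 1),
    1003: None,
}
blocked = blocking_sids(substances, date(2015, 2, 1))
```

An empty result means every associated substance is already public, so the assay release request should pass this particular check.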
 
 
 
 
Can I allow my collaborators or reviewers to access my on-hold assay data?
 
  Yes. If you have chosen to hold your assay data private (as noted in the overview/data release section of the complete Upload Help document), but you would like to provide access to select individuals such as collaborators or reviewers, you can create a URL that provides temporary access to a given assay submission (i.e., to a given AID).

To do this:
  • Log in to your Upload home page
  • Open the "Assays/In PubChem" folder
  • Mouse over the assay submission (AID) of interest to view the available actions (e.g., view in PubChem, use as template, modify, revoke, export SID list, and on-hold access).
  • Click on the option for "On-Hold Access".
  • Click on the "Create URL" button. That will open a new folder tab with the temporary URL that points to the on-hold assay. You can copy/paste that URL into an e-mail to the desired individuals (e.g., collaborators, reviewers) in order to enable them to access your on-hold assay. The dialog box also provides the following options:
    • Share: The "Share" button simply opens a new folder tab with the URL.
    • Expiration Date: By default, the expiration date for the URL is 90 days from the day on which it was created. If the Expiration date text box is left blank, the default 90 days will be applied. You can change the expiration date, if desired.
Please note that:
  • The ability to create a temporary URL to on-hold assay data was added to the PubChem Upload system in order to facilitate collaboration (whether external or in-house) and administrative work such as publication and grant processing.
  • The PubChem Upload system allows you to create a temporary URL only after your assay has been deposited to PubChem and has received an Assay identifier (AID). (The complete Upload Help document describes the assay submission steps, the last of which is deposition into PubChem and assignment of an AID.)
  • You can create, and then delete, a URL for a given AID as many times as you'd like. A different URL will be generated each time you create one. Each AID can have only a single URL at any given time.
  • The temporary URL will expire on the expiration date, or when the data comes off of its on-hold status, whichever comes first.
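
The expiration rule in the last note can be written as a short Python sketch. The function name and arguments are hypothetical, and PubChem Upload exposes this only through the web interface described above. The 90-day default is the one stated in this FAQ.

```python
from datetime import date, timedelta

DEFAULT_URL_LIFETIME_DAYS = 90  # default stated in this FAQ


def url_expiry(created, hold_until, expiration=None):
    """Return the date on which a temporary on-hold access URL stops working.

    The URL expires on its expiration date or when the assay leaves on-hold
    status, whichever comes first. Hypothetical helper, not a PubChem API.
    """
    if expiration is None:
        # A blank expiration date means the 90-day default applies.
        expiration = created + timedelta(days=DEFAULT_URL_LIFETIME_DAYS)
    return min(expiration, hold_until)


default_case = url_expiry(date(2015, 1, 1), hold_until=date(2015, 12, 31))
hold_ends_first = url_expiry(date(2015, 1, 1), hold_until=date(2015, 2, 15))
```

The sketch treats the hold-until date as the moment the data leave on-hold status; if you release the assay earlier by hand, the URL would stop working at that earlier point.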
 
 
 
 
How can I update a previous substance submission? Will the SID remain the same?
 
  The general approach for updating any PubChem record you have submitted is as follows:
  1. Log in to PubChem Upload. That will open your PubChem Upload home page, which provides access to all of your submissions.
  2. Open the folder tab for the type of record you would like to update (e.g., substance or assay) and select the appropriate subfolder (e.g., In-PubChem).
  3. Mouse over the record you would like to modify to open a pop-up menu of actions that you can take for that record, then click on "Modify."
  4. Update the desired data elements, using any of the same flexible methods for data entry that are available for new submissions (e.g., use wizards or upload files).
  5. Preview the record in PubChem and/or continue editing.
  6. Once you are satisfied with the changes and ready to deposit the revised record into PubChem, click "Commit."

The detailed steps for updating a PubChem Substance record that you previously submitted are as follows:
  1. Log in to PubChem Upload. That will open your PubChem Upload home page, which provides access to all of your submissions.
  2. Click on the "Substances" folder tab
  3. Click on the "In PubChem" subfolder to view a full list of substance records that you have deposited
  4. Scroll through the page to locate the substance record that you would like to modify, or simply enter the substance identifier in the search box in the upper right corner of the folder to instantly find the desired record.
  5. Mouse over that record to open a pop-up menu of actions that you can take for that record, then click on "Modify." That will open a wizard that contains the data from the substance record (organized under folder tabs such as Name and Structure, Dates, Comments, Cross-References).
  6. Click on the desired folder tab and data element to edit it within the wizard. Or, if you prefer to simply upload a file that contains the modified substance record, click on the "Upload Modified Substance Record" button in the lower right corner of the wizard.
  7. Repeat step 6 to make more edits, if desired.
  8. When you are finished making edits, click "Upload Modified Record." The Upload System will then validate the modified record.
  9. If desired, you can preview the record in PubChem and/or continue editing.
  10. Once you are satisfied with the changes and ready to deposit the revised record into PubChem, click "Commit."

The SID for your PubChem substance record will remain the same, regardless of how many or what type of changes you make to the record. The unique Registry Identifier (RegID) that you associate with a record is the only piece of information that determines the SID. Once the SID is first assigned, all subsequent updates become versions of that same SID, assuming that you continue to provide the same RegID. If you provide a new RegID, a new SID will be assigned.
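
The relationship between RegID, SID, and record versions can be pictured with a toy Python model. The class name, the SID numbering, and the record contents below are invented for illustration and do not reflect how PubChem stores data. The sketch only mirrors the rule that the RegID alone determines the SID.

```python
import itertools


class ToySubstanceRegistry:
    """Toy model of SID assignment for a single depositor (not PubChem code)."""

    def __init__(self, first_sid=1):
        self._next_sid = itertools.count(first_sid)
        self._sid_by_regid = {}
        self._versions = {}

    def deposit(self, regid, record):
        """Store a record and return (sid, version)."""
        if regid not in self._sid_by_regid:
            # A RegID not seen before gets a new SID.
            self._sid_by_regid[regid] = next(self._next_sid)
        sid = self._sid_by_regid[regid]
        # Every deposit under the same RegID becomes a new version of that SID.
        self._versions.setdefault(sid, []).append(record)
        return sid, len(self._versions[sid])


registry = ToySubstanceRegistry()
first = registry.deposit("my-regid-1234", {"comment": "initial deposit"})
second = registry.deposit("my-regid-1234", {"comment": "revised record"})
third = registry.deposit("my-regid-5678", {"comment": "revised record"})
```

Here the second deposit reuses the SID of the first because the RegID is unchanged, while the third deposit gets a new SID even though its content is identical to the second.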
 
 
 
Can I update the chemical structure in a previous substance submission, and how would that affect the SID-CID association?
 
  Yes, you can update the chemical structure in a previous substance submission, by following the steps noted in the FAQ above on "How can I update a previous substance submission?"

As noted in that FAQ, the SID for the PubChem substance record will not change, as long as you provided the same Registry Identifier (RegID) that was originally associated with that SID.

A new chemical structure, however, will trigger the PubChem data processing pipeline to recalculate the association between the substance record and a PubChem Compound record. Depending on the nature of the change to the chemical structure, there may or may not be a change to the SID-CID association. For example, if the structure is changed to a different tautomeric form, it may resolve back to the same CID. On the other hand, if the structure's stereo flags or protonation state are changed, it will be regarded as a different structure and the SID-CID association will change. Finally, if PubChem changes the standardization rules between updates, that could result in a change of CID for the same SID structure, although that is unlikely.

As an example of a chemical structure change, let's say a depositor submits a PubChem substance record with a RegID of "my-regid-1234" and includes the structure for aspirin and the synonym "aspirin" in the submission. At a later time, the depositor updates that record, using the same RegID, by replacing the original chemical structure and the synonym "aspirin" with the chemical structure for caffeine and the synonym "caffeine." The SID will not change because the RegID remained the same. However, the CID associated with the record will change in this example because the chemical structure changed. If there is no structure in the updated record, or if there is some problem with the structure, then this SID will have no associated CID.
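
The aspirin-to-caffeine example can be traced in a few lines of Python. The lookup table below is a toy stand-in for PubChem's structure standardization, which is far more involved; the function name and the use of SMILES strings as the structure format are assumptions made for this sketch.

```python
# Toy stand-in for PubChem structure standardization: a lookup from a
# SMILES string to a CID. The real pipeline is far more involved.
TOY_CID_BY_SMILES = {
    "CC(=O)OC1=CC=CC=C1C(=O)O": 2244,       # aspirin
    "CN1C=NC2=C1C(=O)N(C(=O)N2C)C": 2519,   # caffeine
}


def associated_cid(smiles):
    """Return the CID for a structure, or None if there is no usable structure."""
    if not smiles:
        return None
    return TOY_CID_BY_SMILES.get(smiles)


record = {"regid": "my-regid-1234", "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O"}
cid_before = associated_cid(record["smiles"])

# Update under the same RegID: the SID is unchanged, the structure is not.
record["smiles"] = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
cid_after = associated_cid(record["smiles"])

# An update with no structure leaves the SID without an associated CID.
cid_without_structure = associated_cid("")
```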
 
 
 
How can I add the PubMed ID (PMID) of my recently published paper to my PubChem BioAssay or Substance record?
 
  The steps for adding a PMID to a bioassay record differ slightly from those for adding a PMID to a substance record. This is because the PubChem Upload user interface is adapted to accommodate the differences in the nature of assay and substance data. However, the main concepts are the same in both cases: (a) choose the PubChem record you'd like to modify, (b) select the type of cross-reference you would like to add (in this case, a PubMed ID), then (c) enter the ID for the record to which you would like to make a cross-reference. Note that you can use a similar set of steps to add other types of cross-references, such as those to genes, biosystems (pathways), 3D structures (from the Molecular Modeling Database, MMDB), and more.

To add a PubMed cross-reference to a PubChem BioAssay record, follow these steps:
  1. Log in to PubChem Upload
  2. Click on the "Assays" folder tab
  3. Click on the "In PubChem" subfolder to view a full list of bioassay records that you have deposited
  4. Scroll through the page to locate the bioassay record you would like to modify
  5. Mouse over that record to open a pop-up menu of actions that you can take for that record, then click on "Modify"
  6. Scroll down to the Cross-References section, and click on "Edit" to bring up a spreadsheet that lists the existing cross-references. The spreadsheet will be blank if no references have been provided yet for that assay.
  7. Click on "Add More Rows" to create a new row.
  8. Open the "XREF_TYPE" menu and select the option for "PubMed Id (PMID)." That value will then automatically appear under the 'XREF_TYPE' column in the newly created row of the spreadsheet.
  9. Enter the PMID number of the publication under the 'XREF_VALUE' column.
  10. Repeat steps 8-9 to add more PMIDs, if needed.
  11. Click "Save" when finished.
  12. Click "Validate."
  13. Once the submission passes validation, click "Commit."
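
If you keep your PMIDs in a list, a small Python helper can prepare the values you will enter in steps 8-9. XREF_TYPE, XREF_VALUE, and the "PubMed Id (PMID)" label are the names shown in the steps above; the helper name, the dict-per-row layout, and the PMIDs themselves are made up for this sketch.

```python
def pmid_xref_rows(pmids):
    """Build cross-reference rows for a list of PubMed IDs.

    XREF_TYPE and XREF_VALUE are the column names shown in the wizard;
    the dict-per-row layout is an assumption made for this sketch.
    """
    rows = []
    for pmid in pmids:
        value = str(pmid).strip()
        if not value.isdigit():
            raise ValueError(f"PMID must be numeric, got {pmid!r}")
        rows.append({"XREF_TYPE": "PubMed Id (PMID)", "XREF_VALUE": value})
    return rows


# Made-up PMIDs, for illustration only.
xref_rows = pmid_xref_rows([12345678, " 23456789 "])
```

Checking that each value is numeric before you paste it into the spreadsheet catches stray characters early, although the Upload system performs its own validation in step 12.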
To add a PubMed cross-reference to a PubChem Substance record, follow these steps:
  1. Log in to PubChem Upload
  2. Click on the "Substances" folder tab
  3. Click on the "In PubChem" subfolder to view a full list of substance records that you have deposited
  4. Scroll through the page to locate the substance record that you would like to modify, or simply enter the substance identifier in the search box in the upper right corner of the folder to instantly find the desired record.
  5. Mouse over that record to open a pop-up menu of actions that you can take for that record, then click on "Modify"
  6. Click on the Cross-References folder tab, which lists various types of cross references that can be added (e.g., PubMed ID, GenBank ID, OMIM ID, Gene ID, BioSystem ID).
  7. Click on the type of cross-reference that you would like to add (in this case, PubMed ID), and enter the PMID number of the publication that you would like linked to your substance record.
  8. Repeat step 7 to add more cross-references, if desired.
  9. When you are finished adding cross-references, click "Upload Modified Record." The Upload System will then validate the modified record.
  10. If desired, you can preview the record in PubChem and/or continue editing.
  11. Once you are satisfied with the changes and ready to deposit the revised record into PubChem, click "Commit."
 
 
 
How can I add a cross-reference from my PubChem Substance or BioAssay record to related data such as a gene, biosystem (pathway), 3D protein structure, or other data type?
 
  To add cross-references from your PubChem BioAssay or Substance record to related data such as genes, biosystems (pathways), 3D protein structures (from the Molecular Modeling Database, MMDB), and more, simply follow the steps listed in the FAQ on how to add the PubMed ID (PMID) of a recently published paper to your PubChem BioAssay or Substance record. The only difference occurs at the step where you select the type of cross-reference you would like to add. Instead of choosing PubMed ID, select the other type of cross-reference that you want.

Note that the steps for adding cross-references to bioassay records differ slightly from those for adding cross-references to substance records. This is because the PubChem Upload user interface is adapted to accommodate the differences in the nature of assay and substance data. However, the main concepts are the same in both cases: (a) choose the PubChem record you'd like to modify, (b) select the type of cross-reference you would like to add, then (c) enter the ID for the record to which you would like to make a cross-reference.
 
 
 
What is a DSN (display name) and DSN-ID, and can either one be changed?
 
  DSN stands for "data source name," also sometimes referred to as "display name." It is the name of the data depositor, which is displayed in the "Data Source" (or "Source") line on PCSubstance and PCAssay search results pages, as well as in the detailed displays of individual PubChem records. The DSN is specified by the depositor and can be changed by the depositor.

DSN-ID is a numeric identifier for a given PubChem data source (depositor). A DSN-ID is assigned by PubChem to each depositor, and it remains stable regardless of how many times the depositor changes the display name. (Some DSN-IDs assigned in past years were alphanumeric, and those, too, have remained (and will remain) stable. However, PubChem now assigns only numeric DSN-IDs to new data depositors.)

If a DSN has been changed over time, a user can search by any one of the old display names, or by the current display name in the [SourceName] field, and find all the records that were submitted by that depositor.

For example, each of the following searches will retrieve the same set of PubChem Substance records. They represent three different data source names, in chronological order, that have been displayed over time for the same depositor:

SASTRA[SourceName]
"Quorum sensing and Peptidomimetic Laboratory, SASTRA"[SourceName]
"SASTRA University, Quorum sensing and Peptidomimetics Laboratory"[SourceName]

NOTE: Only the last display name will work if it is searched in the [CurrentSourceName] search field:
"SASTRA University, Quorum sensing and Peptidomimetics Laboratory"[CurrentSourceName]
The two previous names will not work if the search is limited to the CurrentSourceName field.
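
The search terms above can also be assembled programmatically. The Python sketch below builds the term and, as an assumption that goes beyond this FAQ, wraps it in an NCBI E-utilities ESearch URL against the pcsubstance database. The function names are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def source_name_term(name, current_only=False):
    """Build a search term limited to the SourceName or CurrentSourceName field."""
    field = "CurrentSourceName" if current_only else "SourceName"
    # Quote names that contain spaces or commas so they are searched as a phrase.
    quoted = f'"{name}"' if any(ch in name for ch in " ,") else name
    return f"{quoted}[{field}]"


def esearch_url(term, db="pcsubstance"):
    """Return an E-utilities ESearch URL for the term (no request is sent)."""
    query = urlencode({"db": db, "term": term})
    return f"{ESEARCH_URL}?{query}"


old_name = source_name_term("SASTRA")
current_name = source_name_term(
    "SASTRA University, Quorum sensing and Peptidomimetics Laboratory",
    current_only=True,
)
url = esearch_url(old_name)
```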

 
 
 
 
What is the difference between Upload ID, Registry ID, SID, CID, and AID?
 
 

There may be several types of unique identifiers (UIDs) associated with PubChem records:

  • Upload ID - An Upload ID is similar to the bar code used to track a package that is being shipped from a starting point to a destination. It is used to track your data while they are being transported through the submission process until they reach PubChem. A "package" (i.e., an Upload ID) can contain one to many substances, or a single assay. Additional details:

    • Whether you begin a new submission from scratch, use an existing PubChem record as a template for a new submission, or modify your records (modify substances or modify assays) that are already in PubChem, the PubChem Upload system will assign an Upload ID to the data set that you are submitting or updating. This enables PubChem to track your data through the submission/update process, and also enables you to save in-progress submissions or modifications and return to them at a later time.

    • Your Upload IDs are visible only to you through your PubChem Upload Home Page (in the Substances/Pending, Substances/SubmissionHistory, and Assays/Pending subfolders).

    • For substance submissions, the Upload ID can be associated with one to many substances. That is, a new substance submission can be used to deposit one to many small molecules. A single Upload ID will be assigned to that data set. When the submission is committed to PubChem, each small molecule will receive its own PubChem Substance Identifier (SID). Once the molecules are in the PubChem Substance database, only the SIDs are shown in the PubChem records. The association between the Upload ID and corresponding SIDs is visible only to you, in the Substances folder tab, Submission History subfolder, of your PubChem Upload Home Page.

    • For assay submissions, each Upload ID is associated with a single assay submission. When the submission is committed to PubChem, the assay will receive a PubChem Assay Identifier (AID). (Therefore, for each Upload ID, there is only one corresponding AID.) Once the assay record is in the PubChem BioAssay database, only the AID is shown.

  • Registry Identifier (RegID) - Each substance, and each assay, must have an identifier (also known as "Registry ID," "RegID," "External ID," or "PUBCHEM_EXT_DATASOURCE_REGID") that you have assigned to it, and that must be unique among the substances and assays that you (i.e., that a given data source) have submitted to PubChem. The substance data specifications and bioassay data specifications provide additional details about the Registry ID for each type of data. This identifier is sometimes referred to as an "external registry ID" because it is assigned by the external data source (i.e., by you), and not by PubChem.

  • Substance Identifier (SID) and Assay Identifier (AID) - Once your data are "committed" to PubChem, each substance receives a stable Substance Identifier (SID), and each assay receives a stable Assay Identifier (AID) from PubChem. At that time, the Upload ID is no longer very important.

  • Compound Identifier (CID) - If the chemical structure of a substance passes the PubChem standardization procedure, the substance's Registry ID and SID will be mapped to the Compound Identifier (CID) for the corresponding standardized structure in the PubChem Compound database. (The PubChem Substance/Compound Summary Page help document describes the difference between substance and compound records.)

  • Upload IDs for data updates/modifications -- Upload IDs are also assigned to data that are being updated/modified, so they, too, can be tracked during the update process. As is true for submissions, a single Upload ID for tracking data modifications can be associated with a set of one to many substances that are undergoing revision, or a single assay record. Once the updated records are committed to PubChem, they retain their original SIDs and AIDs.
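
The "package" analogy above can be summarized as a small Python data model. Everything in it (class name, field names, the format of the Upload ID string) is hypothetical; it only encodes the rule that one Upload ID carries one to many substances, or exactly one assay.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UploadPackage:
    """Toy model of one Upload ID and what it can carry (not PubChem code)."""

    upload_id: str
    kind: str  # "substance" or "assay"
    regids: List[str] = field(default_factory=list)

    def is_valid(self):
        """An Upload ID holds one to many substances, or exactly one assay."""
        if self.kind == "substance":
            return len(self.regids) >= 1
        if self.kind == "assay":
            return len(self.regids) == 1
        return False


# Made-up identifiers, for illustration only.
substances = UploadPackage("upload-001", "substance", ["cmpd-a", "cmpd-b", "cmpd-c"])
assay = UploadPackage("upload-002", "assay", ["assay-x"])
two_assays = UploadPackage("upload-003", "assay", ["assay-x", "assay-y"])
```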
 
 
 
When I upload a very large file, I sometimes see a time out or bad gateway error message. What should I do?
 
  If you are uploading a very large file through a web browser interface, it is possible you might see an error message such as a time out or bad gateway. This usually means that your browser timed out before it could get notification back to you confirming that your file was uploaded. If you simply close the pop-up that displays the error message and wait a few minutes, you should see that the file has been uploaded. After a few minutes, refresh your original window, which should display a message such as "validating" or "validated." If you see a message such as "parsing failed," then the file might have had some problem. But generally, PubChem Upload eventually receives everything in large files, even if a time out error message is displayed during the upload process.
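
The wait-and-refresh advice above amounts to a simple polling loop, sketched here in Python. PubChem Upload does not document a status API in this FAQ, so get_status is a hypothetical callable that you would supply yourself (for example, a function that asks what the refreshed page shows). The status strings are the ones quoted above.

```python
import time

FINAL_STATES = {"validated", "parsing failed"}


def wait_for_upload(get_status, interval_seconds=60, max_checks=10, sleep=time.sleep):
    """Re-check the upload status until it is final or max_checks is reached.

    get_status is a hypothetical callable supplied by the caller; it should
    return a status string such as "validating" or "validated".
    """
    status = get_status()
    for _ in range(max_checks - 1):
        if status in FINAL_STATES:
            break
        sleep(interval_seconds)  # give the server time to finish processing
        status = get_status()
    return status


# Simulated sequence of page states; sleep is replaced so nothing actually waits.
simulated = iter(["validating", "validating", "validated"])
result = wait_for_upload(lambda: next(simulated), sleep=lambda seconds: None)
```

A result of "parsing failed" suggests a problem with the file itself rather than with the connection.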

As an alternative, you can use FTP to upload very large files. Details about requesting an FTP account and the procedures for FTP submissions are provided in the complete PubChem Upload help document. Please note, however, that FTP is mainly intended as a way to begin a new submission, and not necessarily as a way to add to an existing submission.

 
 
 
How can I contact PubChem with questions about data submissions?
 
  If you have questions about PubChem data submissions, or if you would like to request an FTP account for large scale data submissions, please write to pubchem-deposit-help@ncbi.nlm.nih.gov.
 
 
 
 
 
 
 Revised 11 December 2014