

PubChem Deposition Gateway


This site allows users to test data exchange for deposition of chemical structure and/or bioassay data into PubChem, and to provide data to be added to PubChem. Please obtain an account for login before you start. PubChem Deposition Gateway Accounts come in two types, Test and Deposition.
 
1. Getting Started
2. PubChem Deposition Gateway
3. PubChem Deposition Gateway FAQ's
4. FTP Depositions & FAQ
5. PubChem Deposition Documents and Examples


1. Getting Started
1.1 Logging into the PubChem Deposition Gateway
  • New User
  • Existing User
1.1.1 New User: If you are new to the PubChem Deposition Gateway, you will need to create an account. We strongly recommend that you start with a Test Account, which you can create by clicking the "Create Test Account" button.

If you already have a "Test Account" and wish to proceed to putting data into PubChem, you should create a "Deposition Account". To make a "Deposition Account", press the "Create Deposition Account" button.

1.1.2 Existing User: If you already have an account, you may log in to begin using the PubChem Deposition Gateway.

Enter your username in the text box labeled "Username:" and your password in the text box labeled "Password:", then press the "Log In" button below those text boxes.

If you have forgotten your password, please click "Forgot Password?". You will be prompted for your "Username". An e-mail will be sent to the primary contact for that account with further instructions on how to proceed.


1.2 Creating a New Account

Choosing an Account Type
When creating a new account you have two types from which to choose:
  1. Test Account Type:
    A Test account allows a user to go through all the steps of uploading substance and/or bioassay data. Anyone with a valid email account can get a test account. The purpose of a Test account is to allow potential depositors to validate that their data is properly suited for submission into the PubChem Deposition Gateway, without actually putting any of the data into the public PubChem system or signing off on the PubChem deposition agreement. Any data submitted via a Test account will remain accessible for up to one week.

    For instructions on creating a Test Account, see section 1.2.1.

  2. Deposition Account Type:
    A Deposition account, like a Test account, begins the deposition process by allowing a user to upload and validate substance and bioassay data. The Deposition account, however, is the only way you can finish this process to make your data public in PubChem.

    Deposition accounts require more detailed information to set up and may involve direct contact with PubChem administrators for approval. Furthermore, Deposition account users must agree to, and are bound by, the PubChem Deposition Agreement. This agreement gives PubChem the right to redistribute the information deposited. It is important to note that all data deposited into PubChem is covered under fair-usage and, as such, does not require the depositor to assign away copyrights or other ownership prior to PubChem deposition, i.e., the deposited data does not have to become part of the public domain.

    Organizations with multiple data collections may need multiple deposition accounts. PubChem requires a separate Deposition account with a unique username for each unique data source. If you have multiple substance collections that need to be treated independently within PubChem, you will need separate Deposition accounts for each. For example, the "NIST" and "NIST Chemistry WebBook" substance collections are both generated by the organization NIST. Note, however, that if you have one collection but many users that are allowed to make depositions, we now have a mechanism to assign multiple logins. If you are not sure whether you need multiple deposition accounts, please contact the PubChem Deposition Help Desk.

    For instructions on creating a Deposition Account, see section 1.2.2.


1.2.1 Creating a Test Account
A Test account allows a user to go through all the steps of uploading substance and/or bioassay data. To set up a Test account, please follow three simple steps:
  1. Fill out all form items; those denoted with a "*" are required.
  2. Press the "Register" button.
  3. Go to the URL link, typically by clicking on it in the e-mail you receive from the PubChem Deposition Gateway.
After these three steps are complete, you may log in and begin using the PubChem Deposition Gateway. If you do not complete Step 3 within 24 hours of completing the first two steps, you will need to start again from Step 1.

Test Account Information
  • Username
  • E-Mail
    • Notify
  • Password
  • First Name
  • Last Name
  • Additional Information
  • Terms of Use

Username
Choose a username. If the username you request is taken, you will need to provide a different one. The username you provide must be at least six alphanumeric characters long and cannot contain spaces.

E-Mail
Before your Test Account is activated for use, an e-mail will be sent to this address with further instructions.

Notify About Submission Status Changes
This is a pull-down menu that allows you to choose whether or not you would like automated e-mails sent to you as your submission is processed or as its status changes. There is a third option for multi-user deposition accounts, entitled "My submissions only", which means that you will only receive e-mails for submissions you have personally initiated. You can modify this notification setting in your Account Info after you log in.

Password
Choose an account password. You must type the same password in the text boxes "Password" and "Confirm Password". Please commit this password to memory, as you will need it every time you log in to the PubChem Deposition Gateway.

The password you provide must be at least six characters long and cannot be your username.
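
The username and password rules above can be checked before you submit the form. The following is a minimal sketch, not the Gateway's own validation code. The function names are hypothetical, and two readings are assumptions: "six alphanumeric characters" is treated as a minimum count of letters and digits, and the password is compared to the username exactly (case-sensitively).

```python
def username_problems(username: str) -> list[str]:
    """Return the rule violations for a proposed username (empty list = OK)."""
    problems = []
    # Assumption: "six alphanumeric characters" is read as a minimum count of
    # letters and digits; other characters are neither counted nor rejected.
    if sum(ch.isalnum() for ch in username) < 6:
        problems.append("needs at least six alphanumeric characters")
    if any(ch.isspace() for ch in username):
        problems.append("must not contain spaces")
    return problems


def password_problems(password: str, username: str) -> list[str]:
    """Return the rule violations for a proposed password (empty list = OK)."""
    problems = []
    if len(password) < 6:
        problems.append("needs at least six characters")
    # Assumption: the comparison with the username is exact (case-sensitive).
    if password == username:
        problems.append("must differ from the username")
    return problems
```

An empty list from both functions means the values satisfy the rules as stated here; the Gateway may still reject a username that is already taken.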

First Name
Your first name.

Last Name
Your last name.

Additional Information
Please type any additional information or notes in this text box.

1.2.2 Creating a Deposition Account
A Deposition account allows a user to go through all the steps necessary to put substance and/or bioassay data into PubChem. To set up a Deposition Account, please follow three simple steps:
  1. Fill out all form items (those denoted with a "*" are required); then click the "Register" button.
  2. Within a few minutes you will receive an e-mail from the PubChem Deposition Gateway. Click on the URL link inside it to confirm your e-mail address. Curators will then review your account and may contact you to verify your account information.
  3. Within a couple of days you will receive a second e-mail from the PubChem Deposition Gateway. Upon receipt, log in to your account at our website. At this point you must agree to a Data Transfer Agreement (DTA), typically by clicking on the button when you first log in.

After these three steps are complete and your information is reviewed and verified by a PubChem administrator, you may receive an e-mail notifying you that you may log in and begin using the PubChem Deposition Gateway. It is possible that you may be contacted by phone or e-mail by a PubChem administrator during this process. If you do not complete Step 3 within 24 hours of completing the first two steps, you may need to start again from Step 1.

Multiple Users on One Account
It is now possible to create one deposition account that contains multiple users, each having their own login and password.
  • How this works
    • Separate logins
      You can now have an arbitrary number of users each with her own password on one deposition account. No one can see the password of another user, but there is one primary user who can add and remove other users.
    • Separate deposition tracking
      For each action that a user takes, the processing history for a substance or assay deposition will record that user's id. The user who initiates a deposition, however, will remain associated with the deposition even if others work on it. The pending listing, for example, of that substance or assay deposition will show that user as the owner.
    • Joint access
      Although logins and tracking are kept separate, access is not. All users from the same data source will see all of that data source's depositions, whether they initiated them or not. Users are also free to take follow-up actions on any submission from their data source. For example, if jane_doe deposits a file of substances that pass validation, juan_carlos (from the same data source) may commit the submission for publishing or, for that matter, may even delete it.

  • How you implement it
    If you would like to have multiple users for your data source, please follow these steps.
    1. Choose one person to be the administrator and have that person create the initial deposition account as described. If you already have a deposition account and would like to add users, skip to the next step.
    2. Your administrator must log in, go to the Account Info > Contacts tab, and click on the Add Contact View on the left side. Fill out this form for each additional user. The user can enter her password while the administrator is present, or the administrator can enter a temporary password that the user changes afterwards.
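
The multi-user behaviour described under "How this works" can be pictured with a small model. This is only an illustrative sketch: the class, field, and function names are hypothetical and do not reflect how the Gateway is implemented. It shows the three stated properties: every action is logged with the acting user's id, the initiating user stays the owner, and visibility depends only on the data source.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    id: int
    data_source_id: int
    owner: str  # the user who initiated the deposition; never reassigned
    history: list[tuple[str, str]] = field(default_factory=list)

    def record_action(self, user: str, action: str) -> None:
        """Log an action together with the id of the user who took it."""
        self.history.append((user, action))


def start_submission(sub_id: int, data_source_id: int, user: str) -> Submission:
    """Create a submission owned by the initiating user and log the upload."""
    submission = Submission(sub_id, data_source_id, owner=user)
    submission.record_action(user, "upload")
    return submission


def visible_submissions(user_data_source_id: int,
                        submissions: list[Submission]) -> list[Submission]:
    """Joint access: a user sees every submission from their data source."""
    return [s for s in submissions if s.data_source_id == user_data_source_id]
```

In this model, a commit by juan_carlos on a submission started by jane_doe adds an entry to the history but leaves jane_doe as the owner.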

Deposition Account Information
  • Username
  • E-Mail
    • Notify
  • Password
  • First Name
  • Last Name
  • Data Source
  • Company/Organization
  • Job Title
  • Phone Number
  • Street Address
  • City
  • State, Province or Area
  • ZIP
  • Country
  • Additional Information
  • Terms of Use

Username
Choose a username. If the username you request is taken, you will need to provide a different one. The username you provide must be at least six alphanumeric characters long and cannot contain spaces.

E-Mail
Before your Deposition Account is activated for use, we will send an e-mail to this address with further instructions.

Notify About Submission Status Changes
This is a pull-down menu that allows you to choose whether or not you would like automated e-mails sent to you as your submission is processed or as its status changes. There is a third option for multi-user deposition accounts, entitled "My submissions only", which means that you will only receive e-mails for submissions you have personally initiated. You can modify this notification setting in your Account Info after you log in.

Password
Choose an account password. You must type the same password in the text boxes "Password" and "Confirm Password". Please commit this password to memory, as you will need it every time you log in to the PubChem Deposition Gateway.

The password you provide must be at least six characters long and cannot be your username.

First Name
Your first name.

Last Name
Your last name.

Data Source
Your Data Source consists of a Display Name, which you set initially and can change later, and a Data Source Id (a number), which PubChem sets automatically.

The Data Source Display Name consists of one or more words to uniquely distinguish data coming from an organization; it is prominently displayed on all PubChem substance and bioassay records. PubChem users may search with your full Display Name or with keywords from it to find your data. We advise that this name be short but informative, such that it includes common names by which your organization is known.

In addition, you will be automatically assigned a Data Source Id (a number), which cannot be changed even though your Display Name can be changed. This Id is used to track all of your data in PubChem and is what must be used to identify your data records even though it is your Display Name which will be visible to PubChem users on our webpages.

Organization (Data Source) URL
This should be the appropriate home page for people wanting more information on your organization.

Company / Organization
The Company or Organization name associated with this Deposition Account. Please include the division or group name, if appropriate.

Job Title
Your job title or position within the company or organization you represent.

Phone Number
The phone number where you and your organization can be reached.

Street Address
City
State, Province, or Area
ZIP

Country
Your physical or legal address to which correspondence may be sent.

Additional Information
Please type any additional information or notes in this text box.

Data Transfer Agreement
To apply for a deposition account, you must agree to a Data Transfer Agreement for this website, because your data will be released to the public in PubChem (unlike data in a test account). To agree, make sure the "I agree" check box is checked on your first login. Note that this check box does not appear on the first registration page; it appears after your account has been reviewed and you log in for the first time. If your organization requires modifications to the Data Transfer Agreement, please contact a PubChem Curator.


2. PubChem Deposition Gateway

Uploading and managing depositions is fairly straightforward. First, you will need to familiarize yourself with the required file format documentation and review the PubChem Deposition Gateway help documentation.

Essentially, the deposition process consists of uploading an appropriately formatted PubChem data file. The file contents are subjected to several stages of validation. After all validation checks are complete, an e-mail may be sent notifying you to review your submission.

After your review, PubChem Deposition Gateway users with a Deposition Account may "Commit" their successfully validated data. Typically, your committed data will be made available in PubChem within two days. It is possible, however, that your committed data availability may be delayed, especially if you are notified that there is further action required on your behalf.

The PubChem Deposition Gateway provides the means for you to manage and review your depositions. You may delete or review prior submissions. If you have a Deposition Account, you may also generate reports of prior submissions and retrieve the PubChem Substance identifiers (SID) for your successful depositions.

When you enter the PubChem Deposition Gateway, you will see, immediately below the heading "PubChem Deposition Gateway", a navigation bar with various tabs and icons. You may click on the tab text or icons at any time for navigation. The tabs are described in the sections that follow. At the time of login, you will always default to the "Welcome" tab.

2.1  Home Tab
Clicking the "Home" Tab gives you a screen listing short descriptions of the main activities from which you can choose:
  • Substances - Deposit new or resume suspended submissions
  • Assays - Deposit new or resume suspended submissions
  • Account Info - Review your account information/preferences
Please note that you may also use the main navigation bar tabs with identical labels as the links, as they perform the same function.

2.2  Substances Tab
Clicking the "Substances" Tab puts you onto the substance welcome page.

2.2.1  Substances > Welcome Tab
The substance welcome page lists the main substance deposition activities from which you can choose:
  • New - Create chemical structure records for deposition to PubChem
  • Pending - Resume unfinished depositions
  • Deposited in PubChem - Access previous depositions in PubChem
Please note that you may also use the main navigation bar tabs with identical labels as the links, as they perform the same function.

2.2.2  Substances > New Tab
Clicking on Substances > New Tab opens up a third row of tabs, which provide options for new record creation.

2.2.2.1 Substances > New > Upload File Tab
"Upload File" Tab provides an interface for uploading a file into the PubChem Deposition Gateway.

The file you upload is expected to be in SD File format. (However, we also allow a CSV format using SD tags, as described below.) Each substance in the SD file must have a unique registry ID in the appropriate SD field. A description of the allowed and required SD fields is available in the file format documentation. For an example of an SD file suitable for deposition, see the example SD file.
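
Because every substance must carry a unique registry ID, it can save a failed upload to check for duplicates locally first. The sketch below is an assumption-laden illustration, not a PubChem tool: it relies on the standard SD file conventions (records end with a "$$$$" line, and a data item header such as "> <TAG>" is followed by its value on the next line) and on PUBCHEM_EXT_DATASOURCE_REGID being the registry ID field. The function names are hypothetical.

```python
from collections import Counter

REGID_TAG = "PUBCHEM_EXT_DATASOURCE_REGID"


def split_records(sd_text: str) -> list[list[str]]:
    """Split SD file text into records, each a list of lines, on '$$$$'."""
    records, record = [], []
    for line in sd_text.splitlines():
        if line.strip() == "$$$$":
            records.append(record)
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):
        records.append(record)  # tolerate a missing final delimiter
    return records


def registry_id(record: list[str]) -> str:
    """Return the record's registry ID, or '' if the field is absent."""
    for i, line in enumerate(record[:-1]):
        if line.startswith(">") and f"<{REGID_TAG}>" in line:
            return record[i + 1].strip()
    return ""


def duplicate_registry_ids(sd_text: str) -> list[str]:
    """Return registry IDs that occur in more than one record."""
    counts = Counter(registry_id(r) for r in split_records(sd_text))
    return sorted(rid for rid, n in counts.items() if rid and n > 1)
```

An empty result only means no duplicates were found within the file; the Gateway performs its own crosschecks against your other submissions during validation.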

For CSV input, all the SD tags are allowed, and they serve as column headers. The SD tags can come in any order, with the exception of PUBCHEM_EXT_DATASOURCE_REGID, which should always be in the first column. Each substance must occupy a single row. When an SD tag is associated with multiple values, they are separated with a newline character within the cell. (Hint: when entering data with Excel, use Alt-Enter to create a newline in a data cell, or double-click a data cell before pasting a multiple-line entry into it.) For an example of a CSV file for deposition, see the example CSV file. This file can be imported into any spreadsheet program for viewing or editing.
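
A CSV file following these rules can also be generated programmatically. The sketch below uses Python's standard csv module and is offered as one possible approach, not as an official PubChem utility. The function name is hypothetical, and "EXAMPLE_TAG" in the usage note is a placeholder rather than a real SD tag; consult the SD field documentation for the tags you actually need.

```python
import csv
import io

REGID_TAG = "PUBCHEM_EXT_DATASOURCE_REGID"


def substances_to_csv(substances: list[dict[str, list[str]]]) -> str:
    """Render substances as CSV text with the registry ID in column one.

    Each substance maps an SD tag to its list of values. Multiple values for
    one tag are joined with a newline inside a single (quoted) cell, so every
    substance still occupies exactly one CSV row.
    """
    other_tags = sorted({tag for s in substances for tag in s} - {REGID_TAG})
    header = [REGID_TAG] + other_tags
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)  # quotes cells that contain newlines
    writer.writerow(header)
    for substance in substances:
        writer.writerow(["\n".join(substance.get(tag, [])) for tag in header])
    return buffer.getvalue()
```

For example, `substances_to_csv([{REGID_TAG: ["A-1"], "EXAMPLE_TAG": ["first", "second"]}])` produces a header row plus one data row whose second cell holds both values separated by a newline.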

Press the "Browse..." button to select a file to upload to the PubChem Deposition Gateway. After selecting a file, provide comments in the "Comments:" text box that will help you track this deposition and, perhaps, provide useful information to the PubChem Deposition Gateway administrators. Please remember to press the "Submit" button after you have selected the appropriate file and provided necessary comments. When the file transfer is complete, you will be transferred to "Pending" displaying this submission.

Please note that the file you upload to the PubChem Deposition Gateway may be compressed. Compressing your file may substantially reduce the time it takes to transfer your data. We support files compressed using the "gzip" compressor. Please note that we do not support "zip" or "bzip2" compressed files.
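
If you prepare files with a script, gzip compression can be done with Python's standard library. This is a minimal sketch; the function name and the ".gz" naming choice are assumptions made for illustration, not Gateway requirements.

```python
import gzip
import shutil
from pathlib import Path


def gzip_for_upload(source: Path) -> Path:
    """Write a gzip-compressed copy of `source` next to it and return its path."""
    target = source.with_name(source.name + ".gz")
    with open(source, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large files are not loaded whole
    return target
```

Calling `gzip_for_upload(Path("substances.sdf"))` would create "substances.sdf.gz" in the same directory and leave the original file untouched.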

2.2.2.2 Substances > New > Fill in Form Tab
Clicking on "Fill in Form" Tab produces a substance entry form that allows one to create a single substance record for the deposition to PubChem. When the form is filled out and submitted, a new deposition containing one record is created.

2.2.2.2.1 Substance Form Fields
The substance entry form contains the fields allowed in the SD file format. You can review these fields in this document. Note that the only required field is the "Substance Name (External Registry ID)".

2.2.2.2.2 Importing/Exporting an SD File
Clicking the Import button prompts the user to "Browse" for a desired SD File on their computer. The substance record (including the chemical structure, if any) contained in the selected file is then read into the form and can be edited as needed. Note that if the file contains more than one record, only the first one is imported into the form.

Clicking the Export button gives the user a choice to either save or view the SD File that would be produced if the form were submitted to PubChem.

2.2.2.2.3 Chemical Structure Input
In addition to importing an SD File into the form, there are two more ways to provide a chemical structure:
  • Sketch it - click on the structure image area or the "Edit" button to open the PubChem Sketcher.
  • Use CID, SMILES, or InChI - enter the string into the textbox below the structure image area and click "Apply".

    2.2.2.3 Substances > New > Revoke Substance Tab
    Clicking on "Revoke Substance" Tab produces a screen that allows one to create a single revoke record for the deposition to PubChem. When the form is filled out and submitted, a new deposition containing one record is created.

    2.2.2.4 Category Assignment
    When substances are deposited in PubChem, the depositor category will be assigned to all substances. Based on the depositor's category, users can expect to find additional category-specific information either on the PubChem substance summary page or on the depositor's site.

    The different categories and their descriptions are the following:

    Category - Description
    Biological Properties - Depositor provides information about the biological properties of a substance or compound
    Chemical Reactions - Depositor provides information about the reactivity, synthesis, or known reactions of a substance or compound
    Imaging Agents - Depositor provides information about the contrast agent or imaging agent used in, for example, MRI
    Journal Publishers - Depositor is a journal publisher and has articles published about a substance or compound
    Metabolic Pathways - Depositor provides information on the metabolic pathways involving a substance or compound
    Molecular Libraries Screening Center Network - Depositor is part of the NIH Molecular Libraries Screening Center Network (MLSCN)
    NIH Substance Repository - Depositor is an NIH Molecular Libraries Small Molecule Repository servicing the MLSCN
    Physical Properties - Depositor provides information about the experimental physical properties of a substance or compound
    Protein 3D Structures - Depositor provides information about the experimental 3-D structure of a substance or compound
    Substance Vendors - Depositor is a seller of a substance or compound
    Theoretical Properties - Depositor provides information about the theoretical properties of a substance or compound
    Toxicology - Depositor provides information about the toxicological properties of a substance or compound

    2.2.3  Substances > Pending Tab
    This tab gives you a list of your unfinished or recently added depositions to PubChem.

    A "Filter by Status" pull-down menu allows you to filter your submissions by multiple criteria. By default, all submissions, "Any Status", are shown. The other filter criteria are:
    1. Failed - Submissions that failed validation checks at any processing stage
    2. Processing - Submissions being validated by the PubChem System
    3. Commit Required - Submissions validated by PubChem and awaiting your Commit
    4. Committed - Submissions committed by you awaiting PubChem approval
    5. Rejected - Submissions rejected by a PubChem administrator
    6. Approved - Submissions approved by a PubChem administrator
    7. Deposited in PubChem - Submissions now publicly available in PubChem
    Below the "Filter by Status" is a summary line denoting the number of submissions meeting the filtering criteria. Furthermore, the summary line includes the total count of records parsed, substances standardized, and processing (error, warning, and informational) messages. Please note that substances standardized count refers to chemical structures considered valid by PubChem.

    The pending submission table columns provide a summary of each deposition:
    ID
    The submission id number used in the Deposition Gateway.

    Submitter
    The person who initiated this submission. This is only present for accounts which have multiple users.

    Started
    The date and time on which you initiated your submission.

    Status
    Summary of the submission status denoted by a graphical depiction. Each dot represents a particular phase of the submission process. Green dots denote success. Red dots denote failure. Blue dots denote an uninitiated submission phase. Yellow dots denote a submission phase in-progress. When all the dots are green, your deposition is in PubChem.

    Data Set
    The name of your data set, which, if you went the file upload route, is most likely the name of your uploaded file, unless you have subsequently modified the deposition.

    Records
    The total number of substances uploaded in this submission.

    Curator
    The person handling your deposition (typically assigned after you have committed it). Unlike the other fields, this field points to the curator's email address.

    Select
    A checkbox used to mark a deposition as selected for merging with other depositions.

    The "Merge" procedure joins the records in two or more depositions into one new deposition. The original depositions are not retained. The resulting deposition appears in the "Pending" depositions list, and can be further edited, split, added to, merged, etc.

    The "Merge" is performed by selecting (checking the boxes in the list) the depositions to be merged and then clicking the "Merge" button.

    Note that the "Merge" feature is only available for depositions that have not yet been committed. Also note that as of now, the depositions are fully re-processed after a merge, and therefore may take some time to reach the "Commit" stage again.
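
The merge rules described above can be summarized in a short model. This is a sketch under stated assumptions, not the Gateway's implementation: the class and function names are hypothetical, the new id scheme is invented for illustration, and "not yet been committed" is interpreted as any status other than Committed, Rejected, Approved, or Deposited in PubChem.

```python
from dataclasses import dataclass

# Assumption: these statuses (from the "Filter by Status" list) all mean the
# deposition has already been committed and can no longer be merged.
COMMITTED_OR_LATER = {"Committed", "Rejected", "Approved", "Deposited in PubChem"}


@dataclass
class Deposition:
    id: int
    records: list[str]
    status: str = "Commit Required"


def merge(pending: dict[int, Deposition], selected_ids: list[int]) -> Deposition:
    """Join the selected depositions into one new deposition.

    The originals are removed from `pending`, and the result starts over in
    "Processing" because merged depositions are fully re-processed.
    """
    ids = list(dict.fromkeys(selected_ids))  # drop repeats, keep order
    if len(ids) < 2:
        raise ValueError("select two or more depositions to merge")
    selected = [pending[i] for i in ids]
    if any(d.status in COMMITTED_OR_LATER for d in selected):
        raise ValueError("committed depositions cannot be merged")
    merged = Deposition(
        id=max(pending) + 1,  # hypothetical id scheme
        records=[r for d in selected for r in d.records],
        status="Processing",
    )
    for i in ids:
        del pending[i]
    pending[merged.id] = merged
    return merged
```

The checks run before anything is deleted, so a rejected merge leaves the pending list unchanged.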
    With the exception of the "Select" column, the pending submission table column headings can be used to sort the table. For example, clicking on "Started" will sort the table by date. Clicking the "Started" column header a second time will reverse the order of the sort.

    Clicking on any row will put you in the Validation Summary View under a dynamically-created Pending <Submission-Id> Tab for the corresponding submission.

    2.2.4  Substances > Pending <Submission-Id> Tab
    This tab is created when viewing any details for a particular submission in process.

    A. Submission Page Overview
    In this section the basic elements common to all views for a particular submission will be explained.

    To Proceed box
    On the left side below the tabs, this box gives a hint for the next step needed to push your submission forward towards deposition.

    Progress Meter
    In the middle below the tabs, this meter gives information about the current state of the deposition.

    The progress meter shows a graphical timeline of your deposition. The main stages of the process are written above the meter. Each step may have multiple actions that must be completed before going on to the next step. The specific action that the system is undergoing at the moment is written immediately below the meter.

    The Summary information below the Progress Meter provides the Submitted timestamp of your upload and a summary of processing statistics. Critical statistics will be highlighted in red.

    The phases of the submission process, in order, are:
    1. Submit
    2. Standardize
    3. Approve
    4. Deposit in PubChem

    Submit
    The Submit processing phase includes your initial file submission upload and the system's syntax validation of your submission. If your file is incorrectly formatted, you can consult the appropriate Views for more information, but you must fix and resubmit the file in order to proceed.
    Successful completion of this phase automatically initiates the next processing phase.

    Standardize
    The Standardize processing phase performs several actions on the data. It begins by examining the data provided by your submission. The chemical structure information is validated and standardized for use within PubChem. Your textual information is also examined and validated. Substances that fail this standardization process will not be assigned a CID. It is possible that this phase will fail for some poorly formatted files. If your substances do not have chemical information available, then it is expected and normal that they will "fail" Standardization. This is the only action in which failing may not require your intervention. In addition to this action, the Standardize phase also includes additional validation checks.

    The first validation action crosschecks your submission with other submissions you have within the PubChem Deposition Gateway. These crosschecks detect duplicate registry ID's and duplicate structures.

    The second validation action crosschecks your submission with previous submissions already deposited in PubChem. These crosschecks detect duplicate registry ID's and duplicate structures.

    The Standardize phase is initiated automatically after successful completion of the Submit phase.

    Approve
    The Approve processing phase is available only to deposition account users and must be initiated by the depositor by pushing the "Commit" button. This phase cannot be initiated if any previous processing phase has failed.

    The Approval phase is completed by a PubChem registrar. The registrar examines the deposition for completeness and, after a brief review, either approves or rejects the submission for addition to PubChem. A PubChem registrar may contact you by phone or e-mail concerning your submission.

    If the PubChem registrar approves the submission, the final Deposit in PubChem phase is initiated automatically.

    Deposit in PubChem
    The Deposit in PubChem phase is the final processing stage, where your submission is actually loaded into the PubChem system. A report, which you may download, is generated from this loading process. After your data is successfully loaded into PubChem, the data will usually become publicly accessible within two business days.
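
The phase order and the commit rule can be expressed compactly. The sketch below is illustrative only. It assumes that the Status column shows one dot per phase listed above, uses the dot colors described for that column, and models failure at the level of a whole phase (individual substances that "fail" standardization for lack of chemical information are not treated as a phase failure). All names are hypothetical.

```python
PHASES = ["Submit", "Standardize", "Approve", "Deposit in PubChem"]

# Dot colors as described for the Status column of the pending table.
DOT_COLORS = {
    "success": "green",
    "failure": "red",
    "not started": "blue",
    "in progress": "yellow",
}


def status_dots(states: dict[str, str]) -> list[str]:
    """Return one dot color per phase; phases without a state are not started."""
    return [DOT_COLORS[states.get(phase, "not started")] for phase in PHASES]


def can_commit(account_type: str, states: dict[str, str]) -> bool:
    """Commit starts the Approve phase.

    It requires a deposition account and no failure in the earlier phases.
    """
    earlier_ok = all(states.get(p) == "success" for p in ("Submit", "Standardize"))
    return account_type == "deposition" and earlier_ok


def is_in_pubchem(states: dict[str, str]) -> bool:
    """All dots green means the deposition has been loaded into PubChem."""
    return all(color == "green" for color in status_dots(states))
```

For a submission that has passed Submit and Standardize, `status_dots` gives two green dots followed by two blue ones, and `can_commit` is true only for a deposition account.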

    Pending Deposition Views Available
    1. Validation Summary View
    2. Validation Details View
    3. List All View
    4. List Failed Standardization View
    5. PubChem Structure Preview View
    6. History View

    A. Validation Summary View
    Clicking the Validation Summary View provides a summary table of the unique message categories encountered during your deposition. The summary table columns are:
    1. Severity - Submission message severity
    2. Category - Submission message category
    3. Count - Count of submission messages of the category type
    4. Phase - Submission processing phase where the message was encountered
    5. Split Records - An icon that, when clicked, splits off the records with this particular issue into a new deposition.

    The column headings can be used to sort the table. For example, clicking on "Category" will sort the submission table by category. Clicking the "Category" column header a second time will reverse the order of the sort.

    Clicking on text in a row will transfer you to the Validation Details View filtered to show you only those records with that particular message category corresponding to the row you click.
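
The relationship between the summary table and the details view amounts to grouping and filtering messages. The following sketch illustrates that idea; it is not the Gateway's code. The Message fields mirror the column names in this documentation, while the function names and any category names used with them are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Message(NamedTuple):
    severity: str
    category: str
    text: str
    record: int  # SD file record number
    line: int    # SD file line number that generated the message
    phase: str


def summarize(messages: list[Message]) -> list[dict]:
    """Build summary rows: one per (severity, category, phase) with a count."""
    counts = Counter((m.severity, m.category, m.phase) for m in messages)
    return [
        {"severity": sev, "category": cat, "count": n, "phase": phase}
        for (sev, cat, phase), n in sorted(counts.items())
    ]


def details_for(messages: list[Message], category: str) -> list[Message]:
    """Filter the details list to one message category, as a row click does."""
    return [m for m in messages if m.category == category]
```

Each dictionary returned by `summarize` corresponds to one row of the summary table, and `details_for` returns the messages you would see after clicking that row.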

    B. Validation Details View
    Clicking the Validation Details View provides a detailed list of all messages generated by the PubChem processing. The details table columns are:
    1. Severity - Submission message severity
    2. Category - Submission message category
    3. Message - Submission message
    4. Record - SD file record number
    5. Line - SD file line number that generated the message
    6. Phase - Submission processing phase where the message was encountered
    7. Edit Records - Icons for editing or deleting records.
    8. Split Records - An icon that, when clicked, splits off this particular record into a new deposition.
    The column headings can be used to sort the table. For example, clicking on "Message" will sort the details table by message. Clicking the "Message" column header a second time will reverse the order of the sort.

    To filter the details table by category of validation message, return to the Validation Summary View. To filter the details table for a substance, proceed to the List All View.

    Clicking on any text in a row will, typically, spawn a new browser window. Depending on the context of the message, this window will display different information. If the message is in the context of the records you uploaded, the window may display the SD file record number, a depiction image of the SD file record, and the uploaded SD file record, prefixed with the SD file line number. Alternately, if the message context refers to collision with a different submission, you will view the submission record from that deposition.

    C. List All View
    Clicking the List All View provides a summary view of the processed data records with counts of messages associated with that record. The processed records table allows for rapid navigation to particular records in your submission. The standardized table columns are:
    1. Record - Sequential record number of your submission
    2. Depiction - Thumbnail depiction for the submitted substance
    3. RegID - Your unique registry ID for the substance
    4. Comments - Your comments provided at the time of submission.
    5. Edit Records - Icons for editing or deleting records.
    The column headings can be used to sort the table. For example, clicking on "RegID" will sort the table by registry ID. Clicking the "RegID" column header a second time will reverse the order of the sort.

    Clicking on a row will open a new window for the PubChem Structure Preview View showing the substance record corresponding to the row you click.

    D. List Failed Standardization View
    In the case that you have substances which have failed standardization, an extra View will appear to isolate those failed substances. Clicking the List Failed Standardization View displays the same information as the List All View filtered for failed records only.

    Please note that if your substances do not have chemical information available, then it is expected and normal that they will "fail" Standardization.

    As with the List All View, clicking on a row will open a new window for the PubChem Structure Preview View showing the substance record corresponding to the row you click.

    E. PubChem Structure Preview View
    Clicking a particular substance record in the List All View or the List Failed Standardization View opens a summary page of the complete individual substance record. The display of your substance closely resembles how the submitted data will appear in the PubChem Substance Summary CGI. URL links are available on this page so you can test the links back to your website and verify that the URLs work as intended. Additionally, you can see how PubChem processing affects the data you have provided.

    Various export buttons are available to allow you to examine the PubChem data records generated for your substance in ASN.1, XML, and SDF file formats.

    F. History View
    Clicking the History View displays a detailed chronology of the processing phases for the submission. The history table columns are:
    • Originator - Your unique registry ID for the substance
    • Date - Timestamp when a particular stage has completed
    • Stage - Graphical depiction of the stage
    • Comments - Processing stage specific comments.
    The column headings can be used to sort the table. For example, clicking on the column "Date" will sort the submission table by date. Clicking the "Date" column header a second time will reverse the order of the sort.

    Action buttons
    • Download Data Set Icon
    • Add a New Substance Icon
    • Add a Revoke Record Icon
    • Split Listed Below Icon
    • Save Report Icon
    • Delete Icon
    • Commit Button
    Commit Button
    Clicking the Commit Button enables you to deposit the submission in PubChem. Your submission will be reviewed and, if approved, will be made public in the PubChem data system.

    Download Data Set Icon
    Clicking the Download Data Set Icon will allow you to download the data you submitted.

    Add a New Substance Icon
    Clicking on Add a New Substance Icon redirects the user to the Substance > New > Fill in Form Tab, and the record created there gets appended to the current deposition.

    Add a Revoke Record Icon
    Clicking on Add a Revoke Record Icon redirects the user to the Substance > New > Revoke Substance Tab, and the record created there gets appended to the current deposition.

    Split Listed Below Icon
    When a pending deposition is accessed via the "Validation Summary" View, the "Split Listed Below" Icon is available. In this view, any record that has an error, a warning or just an info message appears on the list. Therefore, clicking "Split Listed Below" would split off all of those records into a new deposition, and the ones remaining would have no issues of any kind associated with them.

    When a pending deposition is accessed via the "Validation Details" View, the "Split Listed Below" Icon is available as well. In this view, however, only selected records with particular errors appear on the list. Therefore, clicking "Split Listed Below" would split off only those records into a new deposition.

    Save Report Icon
    Clicking the Save Report Icon allows you to download a report file in CSV (comma separated value) format.

    Delete Icon
    Clicking the Delete Icon enables you to delete the submission you are currently viewing from the Deposition system only. This means the submission will have no effect on PubChem.

    2.2.5  Substances > Deposited in PubChem Tab
    This tab gives you an archive list of your substance submissions which were successfully deposited in PubChem.

    Clicking on a row will take you to an Archived Submission Details View. From there you will be able to find specific substances and go to the corresponding entry in PubChem.

    The column headings can be used to sort the table. For example, clicking on the column "PC-Aid" will sort the table by that id. Clicking this column header a second time will reverse the order of the sort.

    ID
    The submission id number used in the Deposition Gateway.

    Submitter
    The person who initiated this submission. This is only present for accounts which have multiple users.

    Finished
    The date and time on which your submission entered PubChem.

    File
    The name of your uploaded datafile.

    Records
    The total number of substances uploaded in this submission.

    2.3  Assays Tab
    Clicking the "Assays" Tab puts you onto the assay welcome page.

    2.3.1  Assays > Welcome Tab
    The assay welcome page lists the main assay deposition activities from which you can choose.
    Please note that you may also use the main navigation bar tabs with identical labels as the links, as they perform the same function.

    Choosing an Assay Action
    Perhaps the biggest change in how the Deposition Gateway handles assay depositions is that now you choose from four distinct assay actions whenever you want to affect your public data in PubChem. The first choice you must make is whether you want to create a new assay or modify one of your existing PubChem assays. If you want to modify a PubChem assay, you have three further choices of possible modifications:
    1. New assay
    2. Add/Change data (Modify)
    3. Alter description (Modify)
    4. Replace assay (Modify)
    You can leave your unfinished assay action at any time and return later to find it under the Pending tab. Once your action becomes part of PubChem, it will be removed from the pending table. If you'd like to make further modifications, you will choose it under the Modify tab in your list of PubChem assays.

    2.3.2  Assays > New Tab
    • Understanding Basic Concepts
    • New Assay Deposition Overview
    • New Tab Description

    Clicking the "New" Tab is the starting point for creating a new biological assay deposition in the PubChem Deposition Gateway as a means of making your data public in PubChem. The following is a brief overview of the assay deposition process. If you would like to skip to the specific explanation of the New Tab, see the "New Tab Description" section below.

    Understanding Basic Concepts

    Prior to depositing biological assay data into PubChem, it is important to understand the nomenclature we use so that you and we are referring to the same elements. Please read the following paragraph to make sure we are clear on a few terms: Substance, Assay Description, Assay Data, and Activity Summary.

    An Assay Description refers to the protocol and parameters of an assay, which can only be defined once. Assay Data are the actual results; as long as they follow the protocol of the Assay Description, Assay Data on new substances can be continually added. A Substance is the material being tested; typically it is what is in an assay plate well. A Substance can be a discrete chemical entity, e.g. aspirin, or a complex mixture, e.g. a plant extract. If you think the material in two assay plate wells is the same, we ask that you refer to it as the same Substance with a single Activity Summary. If you think the material in two wells differs, please refer to them as two distinct Substances, hopefully with different chemical structures (or different mixtures), and surely with distinct Activity Summaries. It is of course very common to do replicates across different batches and salt forms of a Substance when you believe the salt form to be irrelevant to activity. Your data, however, must be reduced to a single Activity Summary per substance that is submitted as an integer value: "inactive" - 1, "active" - 2, "inconclusive" - 3 (if there are indeed contradictory replicates), "unspecified" - 4, or "probe" - 5. In this way, your results will be much more accessible and understandable to users through the various searching and graphing functions of the PubChem Bioassay system.
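
    As an illustration of reducing replicates to one Activity Summary per substance, the following is a minimal Python sketch. The integer codes come from the paragraph above. The reduction rule (agreeing replicates keep their call, contradictory replicates become "inconclusive") is an assumption made for this example, not a rule enforced by the Deposition Gateway, and the RegIDs are made up.

```python
from enum import IntEnum


class ActivityOutcome(IntEnum):
    """Integer codes for the Activity Summary, as listed in the text above."""
    INACTIVE = 1
    ACTIVE = 2
    INCONCLUSIVE = 3
    UNSPECIFIED = 4
    PROBE = 5


def summarize_replicates(calls):
    """Reduce the replicate calls for one substance to a single outcome.

    Assumed rule (not a PubChem requirement): agreeing replicates keep
    their call, contradictory replicates become INCONCLUSIVE, and an empty
    list of calls becomes UNSPECIFIED.
    """
    distinct = set(calls)
    if not distinct:
        return ActivityOutcome.UNSPECIFIED
    if len(distinct) == 1:
        return ActivityOutcome(distinct.pop())
    return ActivityOutcome.INCONCLUSIVE


# Hypothetical replicate calls keyed by made-up RegIDs.
replicates = {
    "REG-001": [2, 2, 2],  # all active
    "REG-002": [1, 2],     # contradictory
    "REG-003": [1],        # single inactive measurement
}
summary = {reg: int(summarize_replicates(calls)) for reg, calls in replicates.items()}
```

    Your own laboratory may use a different rule for resolving replicates; the only requirement stated here is that each substance ends up with exactly one integer outcome.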

    New Assay Deposition Overview

    There are several steps in a new assay submission that must be followed sequentially to complete the process:
    (Submit Substances) > Create Description > Add Data > Approve > Deposit in PubChem
    • Submit Substances
      This step is not mentioned on the Progress Meter but is a prerequisite for assay depositions. Before depositing assay data please make sure that all substances tested in your assay are deposited in PubChem and have SIDs assigned. In your assay data file you can refer to substances with either PubChem-assigned SIDs or your own unique RegIDs, but in order to complete the assay deposition we must be able to find a valid PubChem SID for each substance. Details on substance deposition are given in the description of the Substances tab.

    • Create/Edit Description
      An Assay Description defines the results you wish to report. You have the ability to provide a detailed description of the assay being performed. There are separate sections to provide a description, protocol, comments, annotated cross-references, result definitions, and restrictions on allowed data values. The description for a particular assay must be input before the corresponding data can be uploaded for the sake of validation. Note that once an assay description is defined, you can continually add results tested on new substances.

    • Add Data
      Once your description has been entered and verified, you upload your appropriately formatted assay data file. The data will be parsed and validated. All issues ranging from informational to critical errors will be reported back to you. If there are critical errors, you must fix them and resubmit your data file by first deleting the data file with errors. For more information, see here.

    • Approve
      If your data has no critical errors, it will be available for Preview. Either click on the 'Preview In PubChem' button or on the 'Preview' tab.

      The Preview will show you how your assay will appear to users in PubChem. This is the last opportunity to validate your data before you commit it. If you confirm that your assay action should be public in PubChem, click on the "Commit" button and a reviewer will inspect and approve it. At this stage if you find an error, you must contact the reviewer for any possible "emergency" assistance as you have already approved it for deposition. Once Deposited in PubChem, additional changes can be made by starting a Modify action.

    • Deposit in PubChem
      Once your data is approved by a PubChem curator it awaits final processing. The assay processing cycle is designed to run once a day. The processing includes additional validation steps and intensive post-processing. Please note that due to loading and synchronization schedules of the PubChem database servers, a moderate publishing delay should be anticipated. After curator approval, the data typically will become public in PubChem within 48 hours.

    New Tab Description
    Click on the New tab under the Assays tab to begin uploading a bioassay into the NCBI PubChem Deposition Gateway. If you are returning to resume working on a new assay deposition, please look under the Pending tab to find it.

    Progress Meter
    Just under the rows of tabs in the middle of the page is the progress meter.
    New Assay Progress Meter
    The progress meter shows a graphical timeline of your deposition. The main stages of the process are written above the meter. Each step may have multiple actions that must be completed before going on to the next step. A brief explanation of each step can be found in the previous section.

    Input Assay Description
    Once you have read this section and are ready to input your description, begin by choosing your method of input. Prior to uploading or entering anything, please review this help document describing the allowed file formats. Once the bioassay has been deposited, all parts of it must pass an automated validation procedure without errors in order to be accepted into PubChem. If you need to make changes to your assay after deposition to PubChem, please refer to the "Modify" tab.

    An Assay Description defines the results you wish to report. You have the ability to provide a detailed description of the assay being performed. There are separate sections to provide a description, the protocol, comments, the activity outcome method, target data, annotated cross-references, result definitions, and restrictions on allowed data values. The description for a particular assay must be input before the corresponding data can be uploaded for the sake of validation. Once your assay is defined in PubChem with an initial set of data, you can continually add results tested on new substances for the same assay description by going to the "Modify" tab.

    Descriptions can be input in a number of ways: by filling in the form on the webpage, by uploading an XML file or a series of CSV files, or by using one of your existing PubChem assays as a template. You must define at least one result definition (TID).  To see an example description, download and upload this example file.

    Fill in Form

    Enter each of the required and optional fields necessary to describe the assay (as described below) into the corresponding boxes. Once the boxes are completed, click on "Create" to enter the data or "Cancel" to start over.

    Upload Assay Description from File
    There are two basic ways description information can be uploaded via a file.
    1. By using the native PubChem Bioassay description specification, the appropriate XML (.xml,.xml.gz) or ASN.1 (.asn,.asnt,.asn.gz,.asnt.gz) file can be uploaded. This option requires the depositor to have some programming experience to generate such a file, though a file downloaded from the PubChem FTP site could be modified and used as a guide. The real advantage of this option is that it can be combined with BioAssay FTP uploads to give an automated upload procedure for large numbers of assays. Here is an example XML file.

    2. By using CSV (.csv) files, individual sections of the assay description can be uploaded one at a time and the information will be progressively populated in the webform. Alternatively, a single spreadsheet file, either OpenOffice (.ods) or Excel (.xls,.xlsx), containing multiple sheets can be uploaded at once (see examples). This allows one to take description information saved in popular spreadsheet programs and upload it without conversion to other formats. There is, however, a requirement that the description information be organized in specific ways so that our system can recognize and better validate it. The principal way this is done is through standard header tags that must be used at the top of each column. Details about how to set up such files can be found in this accompanying help document.

    Use PubChem Assay as Template
    Choose one of your existing PubChem assays from the pulldown menu and click on "Load". This option is only for convenience; the assay you are creating will have no special link to the assay you chose for a template. You will be required to create a new RegID and Name for your new assay as with the other two input methods.


    Assay Description Fields
    The description of the assay defines the assay purpose and parameters. Fundamentally, the Assay Description defines the "columns" that are populated by the Assay Data "rows". Each "column" is assigned a result type identity (TID) in the Results Definition section. The Assay Data uploaded later must be reported in the same order as the TIDs defined in the Assay Description. Additionally, the Assay Data must be consistent with the Assay Description TID definitions.

    The description of an assay consists of nine parts: External Assay RegID, Name, Description, Protocol, Comments, Activity Outcome Method, Target Data, XRef Data and Results Definition.
    External Assay RegID
    The external assay identifier assigned by the depositor. This must be unique amongst your other PubChem assays.

    Name
    A short, informative name of the assay for display purposes.

    Description
    A definition of the assay purpose and parameters.

    Protocol
    The assay protocol description must be provided here.

    Comments
    Any comments on the assay can be provided here.

    Substance Type
    By default assays are assumed to be tested on small molecules. With this pulldown, nucleotides can also be specified.

    Grant Number
    For NIH screening centers only, a grant number can be specified. Note that this string is not validated.

    Hold Until Date
    Optional hold-until date for bioassay data you upload into PubChem. If this field is set to a future date, your bioassay data will be made accessible to PubChem users only after that date. Until that date, you can access the data only via the PubChem deposition-system account you used for the upload. Only set a hold-until date if you wish to delay public release of bioassay data, for example to match public access in PubChem with the publication date of a journal article about that bioassay. Please note that PubChem will only accept bioassays with either no hold-until date, or a hold-until date less than one year in the future from the initial upload date.
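
    The one-year limit can be checked before you upload. The following Python sketch is only an illustration: the function name is hypothetical, and counting "one year" as 365 days is an assumption, since this document does not say how the Gateway measures it.

```python
from datetime import date, timedelta

# Assumption: "one year" is approximated as 365 days. The help text does
# not say how the Deposition Gateway actually counts it.
MAX_HOLD = timedelta(days=365)


def hold_until_is_acceptable(upload_date, hold_until=None):
    """Return True if no hold-until date is set, or if it falls less than
    one (assumed 365-day) year after the initial upload date."""
    if hold_until is None:
        return True
    return hold_until - upload_date < MAX_HOLD


upload = date(2010, 1, 15)  # illustrative upload date
ok_no_hold = hold_until_is_acceptable(upload)
ok_short_hold = hold_until_is_acceptable(upload, date(2010, 6, 1))
ok_long_hold = hold_until_is_acceptable(upload, date(2011, 3, 1))
```

    In this sketch a hold-until date in the past also passes the check, because such a date does not delay release.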

    Project Category
    • NIH Molecular Libraries Probe Production Network (MLPCN) - This assay category should be selected by depositors who participate in MLPCN when the assay experiment was funded by an MLPCN grant.
    • NIH Molecular Libraries Screening Center Network (MLSCN) - This assay category should be selected by depositors who participate in MLSCN when the assay experiment was funded by an MLSCN grant.
    • NIH Molecular Libraries Probe Production Network (MLPCN), Assay Provider - This category should be selected for bioassay depositions where assay data is provided or developed by assay providers participating in MLPCN projects.
    • NIH Molecular Libraries Screening Center Network (MLSCN), Assay Provider - This category should be selected for bioassay depositions where assay data is provided by assay providers participating in MLSCN projects.
    • Literature, Extracted - This assay category should be used for assays that have their data extracted from literature by a third party (not by the author or article publisher).
    • Literature, Author - This assay category should be used for assays that have their data extracted from an article by the author.
    • Literature, Publisher - This assay category should be used for assays that have their data extracted from literature by the publisher.
    • RNAi Global Initiative - This assay category should be used for assays that are being deposited under the RNAi Global Initiative.
    • Assay Vendor - This category should be used for bioassay depositions contributed by assay service providers.


    Activity Outcome Method
    You must classify the activity outcome method of your assay here. Choices include:
    • Screening assay - Single Concentration Activity Observed:
      Activity outcome was defined based on the percentage of inhibition from test at a single dose.

    • Confirmatory assay - Concentration-Response Relationship Observed:
      Activity outcome was defined based on EC50/IC50 values and so forth, derived from dose-response curves following testing at multiple concentrations.

    • Summary assay - Candidate Probes/Leads with Supporting Evidence:
      An assay which summarizes information from multiple assays.

      A summary assay is a special kind of assay which gives users a summary of the project and a brief overview of all related screening and confirmatory assays. A summary assay should be created simultaneously with the first (screening or confirmatory) assay of the project. At the beginning, data is optional for creating a summary assay (unlike other assay types). As the project progresses, the summary assay needs to be updated with additional descriptions, related assays, any probes identified and associated test results (if needed). A summary assay should always reference all assays it summarizes through its XRef fields to related assays. When linking to related assays with XRefs, make sure to provide a brief comment on how each assay fits into the overall picture of the project. Note that if you are linking to another assay which is pending in the deposition system, but not yet deposited into PubChem, you must link to it with the RegID that you supplied (its PubChem AID will not yet be assigned).

      To identify probes, depositors must minimally supply a CSV datafile with two columns defined including their headers: PUBCHEM_SID and PUBCHEM_ACTIVITY_OUTCOME, where the latter column will have a value of 5 set for probes. In case additional depositor-defined readouts are provided, the regular CSV file format should be used. Readouts previously reported in related assays do not need to be repeated in the summary assay.

    • Other - An assay which does not fall into the above categories
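
    For summary assays, the minimal two-column probe datafile described above can be produced with a few lines of Python. This is a sketch only: the function name is hypothetical and the SIDs are placeholders. The two header names and the value 5 for probes come from the text above; any further formatting rules are covered in the help document on allowed file formats.

```python
import csv
import io


def probe_csv(probe_sids):
    """Build the minimal two-column CSV text that marks substances as probes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["PUBCHEM_SID", "PUBCHEM_ACTIVITY_OUTCOME"])
    for sid in probe_sids:
        writer.writerow([sid, 5])  # 5 is the activity outcome code for "probe"
    return buffer.getvalue()


# Placeholder SIDs for illustration only; use the SIDs of your own substances.
text = probe_csv([111, 222])
```

    The resulting text can be saved to a .csv file and uploaded as the assay data for the summary assay.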
    Active Concentration TID
    For Confirmatory assays only, an additional pulldown menu appears in which you must indicate which of your TIDs provides the active concentration summary. Such a summary might be reported as the concentration that produces 50% of the maximum possible biological response (such as IC50, EC50, AC50 or GI50), or as a constant parameter such as Ki, on which the activity outcome call of your assay is based. Please choose the column number and TID name as found in your Results Definition list.

    Target Data
    For any assay designed to identify chemicals interacting with a protein target, such as enzyme inhibitors, please add the identifier of the target molecule from one of the following NCBI databases:

    Please note that for such assays you should not add an additional XRef protein link. In the opposite case, in which it is only known that an assay identifies modulators that affect some biological process, for example compounds affecting the expression of a certain protein, it is appropriate to identify a protein with an XRef link (described in the next section) and not with Target data.

    Related Assay XRefs
    The Related Assay XRefs section allows for linking an assay (e.g. "A") to other PubChem assays (e.g. "B"), including relevant assays from other depositors. To link assays "A" and "B", the depositor can add links (XRefs) to both of them, in which case the XRefs become part of both assay records. Being part of the records, the links will be included in the assay ASN blobs when exporting those assays using PubChem web interfaces or FTP.

    Depositors have the option of adding XRefs to only one of the assays (e.g. "A"); PubChem will then automatically add a reciprocal link to all display interfaces for assay "B". In that case, however, the XRef link will not become part of the assay record for "B" and will not be included in export functions (e.g. FTP). Also note that PubChem does not automatically build back-links from assay "B" while assay "A" has a hold-until date. PubChem-built back-links will appear after the hold-until date.

    Other XRefs
    The Other XRefs section links to relevant data from other NCBI databases and beyond. Examples include PubMed Ids (PMIDs), Taxonomy Ids, OMIM Ids, reference URLs to your source database/assay, etc.

    Attention: for XRef protein links please see the previous section on Target data to determine whether you should make an XRef protein link or fill out Target data information. You should not do both.
    Type
    Choose from a list to classify the data type.

    Primary Citation? (PubMed-Id Type Only)
    If checked for a PubMed-Id, indicates citation is directly relevant to the assay, thereby allowing your assay to be discoverable in PubMed from the cited record.

    Value
    The actual data, such as a URL or an identifier.

    Annotation
    A comment to describe the XRef data.
    Results Definition
    Column definitions for the assay results that will be uploaded in the next step. Use the "Add" and "Remove" buttons to create the same number of results definitions as there are columns in the assay data. For each definition there are the following fields:
    Name
    The name of a result. Keep this short, but informative.

    Type
    The result type typically is either a Float, Integer, Boolean or String.

    Optionally, the type can be used to specify an identifier, such as one coming from another NCBI Entrez database. For example, if PubMed Id is chosen as the type, then all data values in this column will be checked to ensure that they are valid PubMed identifiers. The following is a list of accepted identifier types:
    • PubMed Id
    • MMDB Id
    • URL
    • Protein GI
    • Nucleotide GI
    • Taxonomy Id
    • OMIM Id
    • Gene Id
    • Probe Id
    • PubChem BioAssay Id
    • PubChem Substance Id
    • PubChem Compound Id
    • Protein Target GI
      Use this only when an assay contains multiple targets.
    • Biosystems Target Id
      Use this only when an assay contains multiple targets.
    • Target Name
      Use this only when an assay contains multiple targets.
    • Target Description
      Use this only when an assay contains multiple targets.
    • Target Tax-Id
      Use this only when an assay contains multiple targets.
    • Gene Target Id
      Use this only when an assay contains multiple targets.
    • DNA Target GI
      Use this only when an assay contains multiple targets.
    • RNA Target GI
      Use this only when an assay contains multiple targets.

    Unit
    Various units are available to choose from if appropriate.

    Description
    Further description of the result beyond its name.

    Constraint
    Limits on the range of accepted values for integers and floats.  The more limits that can be introduced, the more validation can be performed on future data added to the assay.  A minimum and/or maximum can be specified or specific acceptable values can be specified.
    Set of Values
    Individual allowed values for integer type only.

    Minimal Value
    A single number to specify minimum possible allowed value for integer or float type only.

    Maximal Value
    A single number to specify the maximum possible allowed value for integer or float type only.

    Range
    A Minimal Value and a Maximal Value.
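
    The effect of these constraints on later data validation can be pictured with the following Python sketch. It is not the Gateway's validation code: the function and parameter names are hypothetical, and treating the minimum and maximum as inclusive bounds is an assumption.

```python
def satisfies_constraint(value, allowed=None, minimum=None, maximum=None):
    """Check one result value against an optional set of allowed values and
    optional minimum/maximum bounds (assumed here to be inclusive)."""
    if allowed is not None and value not in allowed:
        return False
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


# A percent-inhibition column limited to a Range of 0..100 (illustrative).
in_range = satisfies_constraint(50.0, minimum=0, maximum=100)
below_min = satisfies_constraint(-1.0, minimum=0, maximum=100)

# An integer score column limited to a Set of Values (illustrative).
in_set = satisfies_constraint(2, allowed={1, 2, 3})
not_in_set = satisfies_constraint(4, allowed={1, 2, 3})
```

    A result definition with no constraint at all accepts every value in this sketch, which is why tighter limits allow more errors to be caught.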
    Attribute: Tested Concentration
    If the box is checked, enter the micromolar concentration at which this result was tested. This attribute indicates that the readout under this test result field is biological concentration-response data, and it provides the value of the tested concentration in micromolar.

    Attribute: Concentration-Response (CR) Plot Labels
    Use this attribute to track concentration-response series for confirmatory assays only.

    If the Tested Concentration attribute for a result definition is filled in, then the optional "CR Plot Labels" menu appears for that TID. By default, only one CR label appears in the menu but the user can add labels by visiting the "Concentration-Response (CR) Plot Labels" section at the bottom of the description page.

    Multiple labels are useful for assays with multiple series of data and tested concentrations. For each CR Plot Label series there should be at least three activity data points with tested concentration attributes set.

    Collecting this information allows PubChem to annotate and track the concentration-response series reported, and will facilitate the development of new features such as drawing dose-response curves upon request of PubChem users.
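
    The recommendation of at least three data points per CR Plot Label can be checked on your own list of result definitions before you enter them. The Python sketch below uses a hypothetical in-memory representation (a list of dictionaries with the keys "name", "tested_concentration" and "cr_label"); it does not reflect any PubChem file format.

```python
from collections import Counter

MIN_POINTS_PER_SERIES = 3  # "at least three activity data points" per label


def underpopulated_series(tid_definitions):
    """Return the CR plot labels that have fewer than three result
    definitions with a tested concentration set."""
    counts = Counter(
        tid["cr_label"]
        for tid in tid_definitions
        if tid.get("tested_concentration") is not None
        and tid.get("cr_label") is not None
    )
    return sorted(label for label, n in counts.items() if n < MIN_POINTS_PER_SERIES)


# Hypothetical result definitions: series 1 has three points, series 2 has two.
tids = [
    {"name": "Inhibition at 1 uM", "tested_concentration": 1.0, "cr_label": 1},
    {"name": "Inhibition at 10 uM", "tested_concentration": 10.0, "cr_label": 1},
    {"name": "Inhibition at 100 uM", "tested_concentration": 100.0, "cr_label": 1},
    {"name": "Replicate at 1 uM", "tested_concentration": 1.0, "cr_label": 2},
    {"name": "Replicate at 10 uM", "tested_concentration": 10.0, "cr_label": 2},
    {"name": "IC50", "tested_concentration": None, "cr_label": None},
]
short_series = underpopulated_series(tids)
```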

    Derived by equation?
    PubChem attempts to record and distinguish experimental dose-response data points from data that were calculated theoretically, for example by curve-fitting algorithms. For each concentration-response series input, if this box is not checked, the series is recorded as 'experimental data'.

    If checked, this option allows one to define an alternative curve fit as desired, (e.g., dropping outliers, using other fitting functions), by supplying just enough data points (about 10 are recommended) to allow a Hill equation to draw a line that presumably fits another experimental series that you have defined.
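
    To illustrate what such equation-derived points might look like, the Python sketch below generates ten log-spaced points from a four-parameter Hill equation. The parameterization shown is one common form and is an assumption; this document does not state which form PubChem uses, and all parameter values are illustrative.

```python
def hill_response(conc, bottom, top, ac50, hill_slope):
    """Four-parameter Hill equation (one common form, assumed here)."""
    return bottom + (top - bottom) / (1.0 + (ac50 / conc) ** hill_slope)


def log_spaced(low, high, count):
    """Return `count` concentrations evenly spaced on a logarithmic scale."""
    ratio = (high / low) ** (1.0 / (count - 1))
    return [low * ratio ** i for i in range(count)]


# Ten micromolar concentrations between 0.01 and 100 (illustrative values).
concentrations = log_spaced(0.01, 100.0, 10)

# Derived (not experimental) responses for a curve with AC50 = 1 uM.
fitted = [(c, hill_response(c, 0.0, 100.0, 1.0, 1.0)) for c in concentrations]
```

    Each pair in `fitted` would correspond to one result definition with its Tested Concentration attribute set and the "Derived by equation?" box checked.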
    Is Panel Assay?
    Panel Assay Introduction
    PubChem now expands the bioassay data model to support the presentation and annotation of profiling screening results. The following video gives a quick overview of how a Panel Assay looks in PubChem.

    A single panel-type PubChem bioassay record may contain readouts and the respective bioactivity outcome annotations for screening tests over multiple individual targets, cell lines or species. Each such target, cell line or species is regarded as a "panel component". A description of the experiments, including a short name, general goal, specific experimental protocol, and assay target information, can be provided for each individual panel component. A panel component should be associated with one or more test result fields (TIDs). The test results for each panel component can be designated as "bioactivity outcome" or, if needed, "active concentration"; otherwise they are treated as regular readouts. Because profiling test results are complex, this expansion of the PubChem bioassay data model allows one to describe a compound profiling screening test, and enables PubChem to record and annotate multiple related bioactivity outcomes under a single AID. Such grouping facilitates straightforward comparison and evaluation of compound bioactivities using the profiling results through the PubChem data analysis tools. To see a panel assay example, check out the kinase profiling assay.
    Creating a Panel Assay
    The following video gives a quick overview of how to create a Panel Assay in the Deposition Gateway:


    The following video shows the appropriate format of your panel CSV file:


    Checking the Is Panel Assay? box designates your assay as a multi-target panel assay and enables an additional input mechanism to define your assay. Such assays are very complex in nature and we have tried to make the interface as user-friendly as possible. Please remember, however, that extra attention should be paid to panel assay definitions and data to ensure their accuracy. Also remember, if the assay seems too complicated to deposit, it may also be too complicated for PubChem users to understand!

    Name
    Short name for the panel, such as "Kinase Profiling".

    Description
    Short description of the panel.

    Load Panel Component Info from CSV file
    A comma-delimited CSV file is used to define panel components. Note that this CSV file is additional to and independent of the CSV file used later for your assay data.

    The Panel Component Info CSV file consists of one required and several optional columns as follows:
    • PANEL_ID (Required)
      This is your panel component id and is important because it allows you to associate one or more result descriptions (TIDs) with it. It must be an integer starting from one and ascending by ones.
    • PANEL_NAME (Optional)
      Short name of panel component.
    • PANEL_DESCRIPTION (Optional)
      Short description about specifics of panel component, such as about cell line, or target information.
    • PANEL_PROTOCOL (Optional)
      Specific procedure used to generate results for the panel.
    • PANEL_COMMENT (Optional)
      Additional information.
    • The following three labels are used to specify a target, which is often provided for profiling assays across protein families.
      • PANEL_TARGET_NAME (Optional)
        If omitted, this is filled in automatically; provide a value only if you want to override it.
      • PANEL_TARGET_ID (Optional)
        Required if any of the target fields is present.
      • PANEL_TARGET_TYPE (Optional)
        Required if any of the target fields is present. It is an integer: Protein (1), DNA (2), RNA (3), Gene (4), BioSystems (5).
    • PANEL_TAXONOMY (Optional)
      NCBI Taxonomy-id (integer).
    • PANEL_GENE (Optional)
      NCBI Gene-id (integer).
    • PANEL_ACT_OUTCOME_METHOD (Optional)
      Assay outcome qualifier (integer). Choices include screening (1), confirmatory (2), summary (3) and other (0). See here for more explanation.
    • PANEL_TID_NAMES_REGULAR (Optional)
      Names of existing result descriptions (TIDs) in your assay separated by a "|". This is a convenient way of mapping panel components to one or more TIDs. In addition, all TIDs listed in this column will get marked as regular type TIDs. This speeds up the input process so that you do not need to choose from the pulldown menu of each TID. Please make sure to first define your TIDs and then upload your panel info file.
    • PANEL_TID_NAMES_OUTCOME (Optional)
      Names of existing result descriptions (TIDs) in your assay separated by a "|". This is a convenient way of mapping panel components to one or more TIDs. In addition, all TIDs listed in this column will get marked as outcome type TIDs. This speeds up the input process so that you do not need to choose from the pulldown menu of each TID. Please make sure to first define your TIDs and then upload your panel info file.
    • PANEL_TID_NAMES_SCORE (Optional)
      Names of existing result descriptions (TIDs) in your assay separated by a "|". This is a convenient way of mapping panel components to one or more TIDs. In addition, all TIDs listed in this column will get marked as score type TIDs. This speeds up the input process so that you do not need to choose from the pulldown menu of each TID. Please make sure to first define your TIDs and then upload your panel info file.
    • PANEL_TID_NAMES_AC (Optional)
      Names of existing result descriptions (TIDs) in your assay separated by a "|". This is a convenient way of mapping panel components to one or more TIDs. In addition, all TIDs listed in this column will get marked as active concentration type TIDs. This speeds up the input process so that you do not need to choose from the pulldown menu of each TID. Please make sure to first define your TIDs and then upload your panel info file.
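
    As a rough illustration of the columns above, the following Python sketch writes a small Panel Component Info CSV file. The column labels come from the list above; the component names, target ids and TID names are made-up examples. It assumes that the labels are used as the header row and that only the columns you need have to be present, neither of which is confirmed by this page.

```python
import csv
import io

# Column labels are taken from the list above; all values are made-up examples.
COLUMNS = [
    "PANEL_ID",
    "PANEL_NAME",
    "PANEL_TARGET_ID",
    "PANEL_TARGET_TYPE",
    "PANEL_TID_NAMES_OUTCOME",
    "PANEL_TID_NAMES_AC",
]

PROTEIN = 1  # PANEL_TARGET_TYPE code for Protein, per the list above


def panel_rows(components):
    """Yield one CSV row per panel component, numbering PANEL_ID from 1."""
    for panel_id, comp in enumerate(components, start=1):
        yield {
            "PANEL_ID": panel_id,
            "PANEL_NAME": comp["name"],
            "PANEL_TARGET_ID": comp["target_id"],
            "PANEL_TARGET_TYPE": PROTEIN,
            # Several TID names for one component are joined with "|".
            "PANEL_TID_NAMES_OUTCOME": "|".join(comp["outcome_tids"]),
            "PANEL_TID_NAMES_AC": "|".join(comp["ac_tids"]),
        }


def write_panel_csv(components):
    """Return the panel component CSV as text (header row assumed)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(panel_rows(components))
    return buf.getvalue()


# Hypothetical components; the TID names must already exist in your assay.
components = [
    {"name": "Kinase A", "target_id": 111111,
     "outcome_tids": ["Kinase A Outcome"], "ac_tids": ["Kinase A IC50"]},
    {"name": "Kinase B", "target_id": 222222,
     "outcome_tids": ["Kinase B Outcome", "Kinase B Confirm"],
     "ac_tids": ["Kinase B IC50"]},
]

print(write_panel_csv(components))
```

    Numbering the components with enumerate keeps PANEL_ID starting at one and ascending by ones, as required above.
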
    Alternative Ways to Upload a Panel Assay
    All of the preceding discussion of panel assays assumes that a new panel assay will be loaded into the Deposition Gateway in three steps: 1) fill out the form in the web interface for the description, 2) upload the optional panel csv file to define panel components, and 3) upload the data csv file.

    Because of the complexity of panel assays, it may be more efficient, however, to create an XML file of the assay based on our data specification to minimize the use of the web form. This will allow you to bypass some or all of the steps mentioned above. Here are the alternative options:
    • XML File Upload via the Web - Description Only
      In this route the entire assay description including panel components is defined in an XML file and uploaded using the file upload option for new assays. This will prepopulate all fields in the description form and allow you to accept it and move on to the step of uploading your regular data csv file. This means you do not provide a special panel csv file.
    • XML File Upload via FTP
      In this route the entire assay description and all data is defined in one XML file and uploaded using a private FTP account. The assay will show up in the Deposition Gateway after all parsing and validation of data has completed. No other files are needed to define your panel assay.

    Create your Description

    Once you have finished entering your description and verified that it is accurate, click on the "Create" button. If the system finds no errors with it, it will become a pending assay in the Deposition Gateway and you will be routed to a dynamically-created tab entitled "New assay assay-id", where the assay-id is an identifier used to keep track of the assay while it is in the Deposition Gateway. To continue reading about the next step in the new assay deposition process, click on Add Data. Otherwise, we will now discuss the next Tab under the Assay Tab, the Modify Tab, which allows for various operations on existing PubChem assays.

    2.3.3  Assays > Modify Tab
    Clicking on the "Modify" tab routes you to a Modify "Welcome" tab starting on a third row of tabs.

    2.3.3.1  Assays > Modify > Welcome Tab
    The modify assay welcome page lists the four types of modifications you can make to one of your existing PubChem assays:
    • Add/Change Data - without description change
    • Alter Description - with changes such as fixing a typographical error or adding XRef data like a URL (no data can be added)
    • Replace Assay - to make a significant change such as adding/removing/modifying a result column (all data must be resubmitted to replace existing data)
    • Revoke Entire Assay - to suppress from searches of PubChem
    2.3.3.2  Assays > Modify > Add/Change Data Tab
    This tab is the starting point for the most common type of modification to an existing PubChem assay: adding or changing data results. With this mechanism you can add new data results, replace selected data results, or remove data results that are no longer valid.

    Note that any duplicated substance (SID/RegID) test results for a given assay (whether in the same data file or not) will be archived in PubChem. Only the most recent one will be available for searching.

    If you are returning to resume working on an Add/Change Data action, please look under the Pending tab to find it.

    To revoke test results please submit a CSV file with the following format. If your intention is to revoke the actual substance, you must first revoke it from any assays where it is a test result, then revoke it from the Substances tab.
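
    The revoke format itself is not reproduced here. As a rough sketch based on the "Column 7: PUBCHEM_ASSAYDATA_REVOKE" description in the CSV format section of this document, a revoke file could be generated as follows. The SIDs and the result names "IC50" and "Efficacy" are made up, and keeping every defined result column present (but blank) is an assumption drawn from the "Columns 8 and higher" description, not from a confirmed revoke template.

```python
import csv
import io

# The seven fixed columns, in the order documented in the CSV format section.
FIXED_COLUMNS = [
    "PUBCHEM_SID",
    "PUBCHEM_EXT_DATASOURCE_REGID",
    "PUBCHEM_ACTIVITY_OUTCOME",
    "PUBCHEM_ACTIVITY_SCORE",
    "PUBCHEM_ACTIVITY_URL",
    "PUBCHEM_ASSAYDATA_COMMENT",
    "PUBCHEM_ASSAYDATA_REVOKE",
]


def revoke_csv(sids, tid_names):
    """Return CSV text that revokes the test results for the given SIDs."""
    header = FIXED_COLUMNS + list(tid_names)
    revoke_index = header.index("PUBCHEM_ASSAYDATA_REVOKE")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    for sid in sids:
        row = [""] * len(header)  # every column blank ...
        row[0] = sid              # ... except PUBCHEM_SID
        row[revoke_index] = 1     # ... and the revoke flag
        writer.writerow(row)
    return buf.getvalue()


# Hypothetical SIDs and result (TID) names.
print(revoke_csv([123, 456], ["IC50", "Efficacy"]))
```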

    Progress Meter
    Just under the rows of tabs in the middle of the page is the progress meter.
    The progress meter shows a graphical timeline of your deposition. The main steps of the process are written above the meter. These steps to Add/Change Data to a PubChem assay must be followed sequentially to complete the process:
    (Submit Substances) > Add Data > Approve > Deposit in PubChem
    Each step may have multiple actions that must be completed before going on to the next step. Click on a step for a brief explanation.

    Submit CSV Data File
    Pick the PubChem assay you wish to modify from the pull-down menu, click on "Browse" to choose your CSV data file, and click "Submit". Note that if you are already modifying this assay in a pending action, you will not be able to proceed. When it loads, an Add/Change Data <Assay-Id> tab will be created for you for the next step of validating your data. Also note that for this action you can only view the description; you cannot modify it.

    2.3.3.3  Assays > Modify > Alter Description Tab
    This tab is the starting point for making small changes to your description. Typical examples of changes you can make here are fixing typographical errors in the Description/Protocol/Comments sections and adding XRef data, like a URL to your website. No data can be added in this action. You cannot change the meaning or number of Results columns because such changes invalidate the assay's existing data. If you must do that, please see the Replace Assay tab.

    If you are returning to resume working on an Alter Description action, please look under the Pending tab to find it.

    For this action the revision of your PubChem assay will be incremented, but the version will remain unchanged. While in the Deposition Gateway, the pending assay will show a blank revision since it is being modified.

    Progress Meter
    Just under the rows of tabs in the middle of the page is the progress meter.
    The progress meter shows a graphical timeline of your deposition. The main steps of the process are written above the meter. These steps to Alter Description of a PubChem assay must be followed sequentially to complete the process:
    Edit Description > Approve > Deposit in PubChem
    Each step may have multiple actions that must be completed before going on to the next step. Click on a step for a brief explanation.

    Choose existing Description to modify
    Pick the PubChem assay you wish to modify from the pull-down menu and click "Load". Note that if you are already modifying this assay in a pending action, you will not be able to proceed. When it loads, an Alter Description <Assay-Id> tab will be created for you for the next step of modifying your description. Remember that for this action you cannot submit data.

    2.3.3.4  Assays > Modify > Replace Assay Tab
    This tab is the starting point for making significant changes to your description. Typical examples of changes you make here are adding or removing a Results column or changing the data type of a Results column. For this action you must resubmit all of your data results along with your description change. If an existing data result is not resubmitted with this action, it will no longer be available in PubChem when the change is made public.

    Special note: This is a powerful action that should be used as a last resort. It is your responsibility to maintain consistency with what this assay currently means in PubChem. Think of PubChem users who expect that the existing data for this assay may grow in number, but will not change in definition. Even here you cannot modify the External Assay RegID. If this is what you want to do, please consider creating a new assay.

    You cannot use this action to make only small description changes, like adding a URL XRef. To do that please see the Alter Description tab. However, if you have a modification for a result's name or description which would invalidate existing PubChem test results, then this is the correct action.

    If you are returning to resume working on a Replace Assay action, please look under the Pending tab to find it.

    For this action the version of your PubChem assay will be incremented and the revision will be reset to zero. While in the Deposition Gateway, the pending assay will show a blank for both version and revision since they are being modified.
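
    The version and revision rules for the Alter Description and Replace Assay actions can be summarized in a small Python sketch. The class and method names are hypothetical, and the starting value of 1.0 is only an example; the sketch models nothing beyond the two rules stated in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssayVersion:
    """Version and revision of a PubChem assay, displayed as "Ver.Rev"."""

    version: int
    revision: int

    def alter_description(self) -> "AssayVersion":
        # Alter Description: revision is incremented, version is unchanged.
        return AssayVersion(self.version, self.revision + 1)

    def replace_assay(self) -> "AssayVersion":
        # Replace Assay: version is incremented, revision is reset to zero.
        return AssayVersion(self.version + 1, 0)

    def __str__(self) -> str:
        return f"{self.version}.{self.revision}"


current = AssayVersion(1, 0)  # hypothetical starting point
current = current.alter_description().alter_description()
print(current)  # 1.2
current = current.replace_assay()
print(current)  # 2.0
```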

    Progress Meter
    Just under the rows of tabs in the middle of the page is the progress meter.
    The progress meter shows a graphical timeline of your deposition. The main steps of the process are written above the meter. These steps to Replace a PubChem assay must be followed sequentially to complete the process:
    (Submit Substances) > Edit Description > Add Data > Approve > Deposit in PubChem
    Each step may have multiple actions that must be completed before going on to the next step. Click on a step for a brief explanation.

    Choose existing Description to modify
    Pick the PubChem assay you wish to modify from the pull-down menu and click "Load". Note that if you are already modifying this assay in a pending action, you will not be able to proceed. When it loads, a Replace Assay <Assay-Id> tab will be created for you for the next step of modifying your description. Remember that this action will replace all of this assay's existing PubChem data.

    2.3.3.5  Assays > Modify > Revoke Entire Assay Tab
    This tab allows you to suppress one of your PubChem Assays from Entrez searches. Once your assay is revoked, it will only be publicly available through the PubChem BioAssay Summary service by providing the AID. This operation is considered a major update to the bioassay record, so you must provide a comment giving the reason for the revocation; the comment will be included in the bioassay record. As with other deposition operations, it will be reviewed by a curator.

    2.3.4  Assays > Pending Tab
    This tab gives you a list of your unfinished or recently added depositions to PubChem. To resume working on a given assay (or simply to view its detailed information), click on one of the fields in its row. If you have not yet started on your desired assay operation, please choose from the New or Modify tabs as appropriate.

    Please note that once your deposition has been successfully uploaded to PubChem, you will view it in PubChem and not in the Deposition Gateway. The successful deposition will remain listed here for a short time and then you can see a history of the operation under the Deposited in PubChem tab. If you'd like to make further modifications, you will choose it under the Modify tab in your list of PubChem assays.

    Also note that unfinished assay actions will be deleted from the Deposition Gateway after one month of inactivity. This will have no effect on PubChem and only means that you will need to re-enter your description and/or data as appropriate.

    The column headings can be used to sort the table. For example, clicking on the column "Action" will sort the table by the type of action. Clicking this column header a second time will reverse the order of the sort.

    Assay
    The temporary id assigned to track the assay while in the deposition system.

    PC-AID (Ver.Rev)
    The permanent assay id number assigned once the assay is accepted into the PubChem system. Note that this identifier will only be non-zero for one of the Modify operations to one of your existing PubChem assays.

    "(Ver.Rev)" refers to the Version and Revision of your PubChem assay. These will also be blank if the PC-AID has not yet been assigned. In addition, if the modify operation you are performing will change either the Revision or both the Version and Revision, then the respective placeholders will show a "-" to indicate they are being updated.

    Action
    One of four types of actions you can perform on PubChem assays.
    1. New assay
    2. Add/Change data (Modify)
    3. Alter description (Modify)
    4. Replace assay (Modify)

    Status
    The current step of the assay in the deposition process.

    RegID
    The substance registry id as supplied by the depositor.

    Name
    A descriptive name of the assay.

    Date
    The date and time on which your assay operation began.

    Curator
    The person handling your deposition (typically assigned after you have committed it). Unlike the other fields, this field points to the curator's email address.

    2.3.5  Assays > <Assay-Action> <Assay-Id> Tab
    • Assays > New Assay <Assay-Id> Tab
    • Assays > Add/Change data <Assay-Id> Tab
    • Assays > Alter description <Assay-Id> Tab
    • Assays > Replace assay <Assay-Id> Tab
    This tab is created when working on and viewing any details for a particular assay in process. Its name tells you the type of action you are performing and the temporary id assigned to track the assay while in the deposition system. Please note that this is different from the permanent PubChem-AID.

    Page Layout

    To Proceed box
    The To Proceed box on the left side below the tabs gives you a hint of what you must do next in order to advance your deposition to completion.

    Views
    The Views box on the lower left side lets you pick appropriate informational views of your deposition relevant to the current stage of the process. We will now go through a detailed explanation of the various views available. Some views are unique to one step of the process, some are unique to one of the four assay actions, and others are common (for example "View Description"). Please find the View you have questions about and read more about it.
    Add Data View

    This is the View where you upload your assay data file in CSV format. Click on "Browse" to choose your CSV data file, and click "Submit". If you are trying to find this view and already have data uploaded in your deposition, first click on Delete Data, then you will see this View. Also note that this View is not appropriate for the Alter description action as it does not allow data uploads.

    The data will be parsed and validated against the description information to find all relevant issues with the data.  If there are any errors, you must resolve them before the data can be committed into PubChem.

    CSV formatted assay data

    The PubChem BioAssay Deposition System allows the use of CSV (Comma Separated Value) formatted data files for assay data deposition. The CSV column ordering for the first seven columns is fixed and must be exactly as documented below.  Beyond that, there must be a column for each result (TID) defined in the description.

    The best way to get familiar with this format is to click on the "CSV Template" link (in the Add Data View only) to download a CSV template file based on the Assay Description that you have already entered. This is a guide so that you can cut and paste your data into this CSV file while maintaining the correct number of columns. For fields without data there will be nothing but consecutive commas. We also have an example CSV file with data. Your CSV file should have the column headers shown below as well as the names of the result definitions that you have defined; any deviations will cause errors.

    Note that any duplicated substance (SID/RegID) test results for a given assay (whether in the same data file or not) will be archived in PubChem. Only the most recent one will be available for searching.

    The following columns are accepted in your CSV file along with column headers using the names of your result definitions. If a particular data cell does not have anything to report for a given column or it is not applicable, simply leave it blank.

    Column 1: PUBCHEM_SID
    If you have previously deposited your Substance description into PubChem, you may use the Substance identifier (SID) assigned by PubChem. This must be an unsigned integer value and, in nearly all cases, your organization must have deposited the Substance associated with this SID. Alternatively, you may use Column 2 to provide your own Substance identifier; if you do, you must set this column to "0". If you have not previously deposited your Substance descriptions into PubChem, they must, at a minimum, be in the PubChem deposition system before you upload Assay Data. Assay Data can refer to Substance descriptions in the PubChem deposition system by setting this column to "0" and using Column 2 to provide your identifier for the Substance.

    Column 2: PUBCHEM_EXT_DATASOURCE_REGID
    You may use your own identifier for Substance descriptions previously loaded into either PubChem or the PubChem deposition system. If you provide a value in this column, you must set the value in Column 1 to "0" or leave it blank.

    If you choose to identify the Substance for which you are providing data using Column 1, please leave this column blank.

    Column 3: PUBCHEM_ACTIVITY_OUTCOME
    The Activity Summary for every Substance has two parts, the outcome and the score. The outcome for each Substance is reported as an integer value in this column and must be one of five different values:
    1 - Substance is considered inactive.
    2 - Substance is considered active.
    3 - Substance activity outcome is inconclusive.
    4 - Substance activity outcome is unspecified.
    5 - Substance identified as a probe.

    Column 4: PUBCHEM_ACTIVITY_SCORE
    The Activity Summary for every Substance has two parts, the outcome and the score. The score for this Substance is reported in this column and must be an integer value, where larger values are more active and smaller values are less active. Please make sure your scores are on a linear scale because that's how they will be interpreted. We encourage depositors to consider using the range 0-100, although values larger and smaller are allowed. The score values are used to allow PubChem users to partition, sort, and profile Assay Data results within and between biological assays.

    Column 5: PUBCHEM_ACTIVITY_URL
    A URL may optionally be provided in this column for the Assay Data reported for this Substance. This URL will be shown within PubChem displays to allow a PubChem user to link to your website, where you may choose to provide additional information or interfaces to your Assay Data, for example, dose-response curves, replicate data, etc.

    Column 6: PUBCHEM_ASSAYDATA_COMMENT
    Your textual annotation and comments may optionally be provided for Assay Data reported for this Substance in this column.

    Column 7: PUBCHEM_ASSAYDATA_REVOKE
    When you submit data, leave this column blank or enter "0". To suppress Assay Data for this Substance, enter "1" in this column; in that case, leave all other columns blank except for Column 1: PUBCHEM_SID. Suppressing Assay Data does not delete data from PubChem; rather, it eliminates all references and links to this information. However, all pre-existing links to this information will still function, and a disclaimer will be displayed stating that the data is revoked.

    You may un-revoke Assay Data for a Substance by depositing either the same or new data for this Substance. Do not revoke and submit the same substance in the same file.

    Columns 8 and higher (one column per TID): PUBCHEM_ASSAYDATA_VALUE
    All remaining columns correspond one-to-one, in order, to the result definitions (TIDs) defined in the associated Assay Description. All defined columns must be present; however, values are optional in individual fields. Consult the CSV template file auto-generated from your description information to see the layout.
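
    The column layout described above can be sketched in Python as follows. The seven fixed column names come from this document; the result names "IC50" and "Efficacy", the SID, the RegID and all values are made-up examples, and the checks only mirror the rules stated above rather than reproducing the Deposition Gateway's own validation. Downloading the CSV template remains the authoritative way to get the header for your assay.

```python
import csv
import io

# The seven fixed columns, in the documented order.
FIXED_COLUMNS = [
    "PUBCHEM_SID",
    "PUBCHEM_EXT_DATASOURCE_REGID",
    "PUBCHEM_ACTIVITY_OUTCOME",
    "PUBCHEM_ACTIVITY_SCORE",
    "PUBCHEM_ACTIVITY_URL",
    "PUBCHEM_ASSAYDATA_COMMENT",
    "PUBCHEM_ASSAYDATA_REVOKE",
]
# 1 inactive, 2 active, 3 inconclusive, 4 unspecified, 5 probe
VALID_OUTCOMES = {1, 2, 3, 4, 5}


def data_row(tid_names, results, *, outcome, score, sid=0, regid="",
             url="", comment=""):
    """Build one data row: seven fixed columns, then one value per TID."""
    if bool(sid) == bool(regid):
        raise ValueError("identify the substance by SID or by RegID, not both")
    if outcome not in VALID_OUTCOMES:
        raise ValueError("outcome must be an integer from 1 to 5")
    if not isinstance(score, int):
        raise ValueError("score must be an integer")
    # The revoke column is left blank for normal data rows.
    fixed = [sid, regid, outcome, score, url, comment, ""]
    # Missing results stay blank, but their columns are still emitted.
    return fixed + [results.get(name, "") for name in tid_names]


def write_data_csv(tid_names, rows):
    """Return the assay data CSV as text, header line first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(FIXED_COLUMNS + list(tid_names))
    writer.writerows(rows)
    return buf.getvalue()


tid_names = ["IC50", "Efficacy"]  # hypothetical result definitions
rows = [
    data_row(tid_names, {"IC50": 1.5, "Efficacy": 92},
             sid=12345, outcome=2, score=80),
    data_row(tid_names, {"IC50": 40.0},
             regid="CMPD-0002", outcome=1, score=5),
]
print(write_data_csv(tid_names, rows))
```
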
    Validation Summary View
    Display issue categories related to the parsing and validation of your assay data.  This view shows the general types of issues found in processing the data including errors, warnings and info.  If errors are found, they must be resolved before the data will be accepted into PubChem.  Warnings and info issues do not have to be resolved, but often indicate something that should be adjusted.
    N
    Issue count.

    Severity
    Issue type: Error, Warning or Info.  All Error issues must be resolved.

    Category
    A short description of the issue.  For greater detail, go to the Validation Details View.

    Count
    The number of instances of this issue found.
    Depositors are able to modify/change their uploaded CSV file by uploading a new one.

    Validation Details View
    Display all instances of issues related to the parsing and validation of your assay data.  This view lists a line for each issue found in processing the data including errors, warnings and info.  This list can be very large in some cases, so it is best to begin with the Validation Summary View.  If errors are found, they must be resolved before the data will be accepted into PubChem.  Warnings and info issues do not have to be resolved, but often indicate something that should be adjusted.
    N
    Issue count.

    Severity
    Issue type: Error, Warning or Info.  All Error issues must be resolved.

    Category
    A short description of the issue.

    Message
    The detailed message for this instance of the issue.

    Data Row
    The record number from the input file where the issue was found, if applicable, otherwise set to 0. Note that this number will be one less than the line number of your CSV file because of the header line.

    Column
    The column in the input file where this issue was found if applicable.
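
    Because the Data Row number is one less than the line number in your CSV file, a small helper can translate between the two when you are hunting for the offending line. The Python sketch below uses hypothetical function names and sample text, and assumes a single header line and no quoted fields that span several lines.

```python
from typing import Optional


def csv_line_number(data_row: int) -> Optional[int]:
    """Translate a reported Data Row into a 1-based CSV file line number.

    A Data Row of 0 means the issue is not tied to a record, so None
    is returned in that case.
    """
    if data_row < 0:
        raise ValueError("data row numbers are never negative")
    if data_row == 0:
        return None
    return data_row + 1  # the header occupies line 1


def offending_line(csv_text: str, data_row: int) -> Optional[str]:
    """Return the text of the CSV line that a Data Row refers to."""
    line_number = csv_line_number(data_row)
    if line_number is None:
        return None
    return csv_text.splitlines()[line_number - 1]


# Made-up, shortened CSV text for illustration only.
sample = "PUBCHEM_SID,PUBCHEM_ACTIVITY_OUTCOME\n111,2\n222,1\n"
print(csv_line_number(2))         # 3
print(offending_line(sample, 2))  # 222,1
```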
     

    View Description View
    Review the description of a pending assay in read-only format. To see the description in machine-readable format, click on the Export Files Pulldown and choose either the XML or ASN format (if you have data loaded, those options will also include the data in the file). If you want to edit the description, you must first delete any uploaded data, then go to the Edit Description View. If you want to remove the assay from the Deposition Gateway (no effect on PubChem), again make sure any uploaded data is deleted, then click the Delete Session Icon. For more information on specific assay description fields, see here.

    Edit Description View
    Make modifications to the description of a pending assay. This view is only available when any uploaded data has been deleted from your pending deposition. This view is never available for the Add/Change data action. Restrictions on what you can edit apply in particular for the Alter Description action, but in no case can you edit the External Assay RegID. For more information on specific assay description fields, see here.

    History View
    This view displays a detailed chronology of the processing steps for the pending assay action.  This history is only for the current pending action and does not include previous actions you have committed to PubChem for the same PubChem AID. To see an overall history of the actions you have committed to PubChem for all of your assays, click on the Deposited in PubChem tab. The columns are:
    N
    Count of processing step.

    Originator
    Who initiated the step, typically you, a curator or an automatic process ("Service Daemon").

    Date
    The date and time the step was taken.

    Status
    The name of the assay deposition step.

    Comments
    Additional description of the step taken.

    View Data View
    Display uploaded assay data in read-only format. Of course this view is only appropriate if you have uploaded an assay data file. If that file has passed the first phase of the Add Data step, which is "Data Parsing", then you will see your data file parsed into columns with the corresponding headers at the top and the data displayed on multiple pages as necessary.

    The first column, "N", numbers the records and the next seven columns are the predefined columns specified earlier for the CSV format. The second column, SID, is the PubChem Substance identifier. Each SID number links back to the appropriate PubChem substance summary page. The remaining columns, TID1..N, correspond to the Results Definitions as shown in the View Description View. If you have failed "Data Validation", the second phase of the Add Data step, it is useful to look at this parsing and make sure it is what you intended. Perhaps you forgot a comma somewhere in your CSV file and your data is lined up with the wrong column headers.

    Note that if your file could not get past the first phase of "Data Parsing", then an attempt will be made to show the text of your file as is. For convenience we will add line numbers, "N", and then show the text under the header "Unparsed Text".

    If you would like to change something in your data file, first click on Delete Data, and then resubmit your modified file. If you would like to download your original CSV file or the machine-readable (XML/ASN) file generated from it, click on the Export Files Pulldown.


    Assay Action buttons
    • Export Files Pulldown
    • Delete Data Icon
    • Delete Session Icon
    • Commit Button
    Export Files Pulldown
    Clicking the Export Files Pulldown allows you to download various files. If you have submitted a CSV data file, you can download it or the parsed XML or ASN file that we create from it. You can also download the description only as an XML or ASN file.

    Delete Data Icon
    Clicking the Delete Data Icon enables you to delete the attached data file so that you can resubmit it, go back to edit your description, or delete the action from the Deposition System. Deleting the data is required before you can go backwards in the deposition process. Also, remember that deleting here refers to the Deposition System only; this action will have no effect on PubChem.

    Delete Session Icon
    Clicking the Delete Session Icon enables you to delete the current assay action from the Deposition System's pending list. This action will have no effect on assays in PubChem.

    Commit Button
    Clicking the Commit Button enables you to deposit the submission in PubChem. Your submission will be reviewed and, if approved, will be made public in the PubChem data system.

    2.3.6  Assays > Deposited in PubChem Tab
    This tab gives you a history of all assay actions taken through the Deposition Gateway which successfully affected PubChem. Each action is listed on one line. This means that a PubChem assay will have one line for when it was first created, and may have additional lines for when it was modified, either by adding more data or by modifying its description.

    Clicking on a row will take you to the corresponding entry in PubChem.

    The column headings can be used to sort the table. For example, clicking on the column "PC-AID" will sort the table by that id. Clicking this column header a second time will reverse the order of the sort.

    PC-AID
    The permanent assay id number assigned to an assay in PubChem (not the temporary assay-id used in the Deposition Gateway).

    Version
    The version of your assay in PubChem upon completion of this action. This number will increase when you make significant changes to your description; please see the Replace assay tab for details.

    Revision
    The revision of your assay in PubChem upon completion of this action. In general this number will increase when you make small changes to your description; please see the Alter description tab for details.

    Action
    One of four types of actions you can perform on PubChem assays.
    1. New assay
    2. Add/Change data (Modify)
    3. Alter description (Modify)
    4. Replace assay (Modify)
    Started
    The date and time on which your assay operation began.

    Finished
    The date and time on which your assay operation entered PubChem.

    User
    The person who initiated this assay action. This is useful for accounts which have multiple users.

    nRecords
    The total number of tested substances uploaded in this action.

    nRevoked
    The number of substances marked to be revoked for this assay in this action.

    Datafile
    The name of your uploaded datafile. Note that this will be blank for the Alter Description action.

    Curator
    The person who handled your deposition. Unlike the other fields, this field points to the curator's email address.




    2.4  Account Info Tab
    This tab allows you to manage your account preferences and contact information. It creates a second row of tabs: Account, Contacts, and Preferences. By default you are placed under the Account tab in the second row.

    Multiple Users on One Account
    It is now possible to create one deposition account that contains multiple users, each having their own login and password. For an overview of the process, click here.

    Views
    The Views box on the lower left side lets you pick appropriate views of your account information relevant to the second-row tab that you are under. A detailed explanation of the available views is given in the sections explaining each of the second-row tabs. To read more, please find the View you have questions about.


    2.4.1  Account Info > Account Tab
    This tab puts you by default into the "View Account" View that allows you to manage your account information.

    For an explanation of individual fields under this tab, please see the appropriate test or deposition account description.

    View Account View
    This View gives you read-only access to your account information.

    Edit Account View
    This View allows you to modify some of your account information. If there is information you need to update, but the field cannot be edited, please contact the PubChem Deposition Help Desk. After you make all desired changes, be sure to press the "Update" button to commit your changes.


    2.4.2  Account Info > Contacts Tab
    The Contacts Tab is only available for the primary user of a deposition account (i.e. the user who first opened the account for your data source).

    List Contacts View
    This view displays a summary of contact information for the "Primary Contact" (primary user) and below that one row for each of the "Additional Contacts". Clicking on the Primary Contact row takes you back to the Account tab. Clicking on a row of the Additional Contacts takes you to the View Contact view for that contact.

    The contacts listed include information from the following columns:
    1. Full Name - Full name of the contact
    2. Email - E-mail address of the contact
    3. Title - Contact's title within the data source organization
    4. Phone - Phone number of the contact.
    5. Notify - Should this contact be sent deposition status e-mails.
    Note that at the end of each Additional Contacts row is a "Delete" link. Use this very carefully as it will remove one of your organization's users from our system.

    Add Contact View
    This view allows you to add a contact for this deposition account. Please fill in all fields as completely as possible. You must fill in fields with a red "*".

    Allow to login independently
    The first checkbox on the "Add Contact" form determines whether this new contact will be able to login independent of the primary user.
    • Checked - "Username" and "Password" fields will appear. Ask the contact what username she would like; it must be unique within our system. For the password, either have the contact fill it in at your computer or set a temporary one and then the contact can change it. For more information on having multiple users on your account, click here.
    • Unchecked - The contact can receive update notifications on submissions as requested and his contact information is available for reference, but he must use the primary login/password to get access to the account.
    After completing the contact information form, click on the "Register" button.

    View Contact View
    This View gives you read-only access to the contact's account information. Individual fields are defined just like the primary deposition account.

    Edit Contact View
    This View allows you to modify the contact's account information. Note that once a contact with independent login has been added, you should be very careful about making any changes to their account information. Each user can make his own changes. Also note that the primary user can change a contact's password, but cannot view it. Individual fields are defined just like the primary deposition account. Please note: If you uncheck the Allow Login box for a contact that has an existing login, both his login and password will be lost.


    2.4.3  Account Info > Preferences Tab
    This tab displays a few preferences that you can review and revise. As with the other tabs, if you wish to make modifications, click on the "Edit Preferences" View on the left.

    Data Source Description Terms
    One of the more powerful aspects of PubChem and its background search engine, Entrez, is its categorization and linking of related data. This section offers a list of terms to categorize the type of data you provide to PubChem. Please choose at least one term (more than one is ok too). PubChem users looking only for toxicology data, for example, will be able to limit their search to those data sources, thereby making your data more accessible.

    Auto-Confirm Substance & Assay FTP Depositions
    This checkbox only applies to substance and assay depositions made via FTP. If checked, all such substance and assay (Alter description only) depositions will be automatically confirmed on your end if they pass validation. This means you will not have to click on the "Commit" button on the user interface in such a case. The submission will still need to be reviewed and approved by a PubChem curator, but one manual step will be eliminated.

    Note that for assay depositions this automated process only applies to the Alter description type of deposition.

    Resolve Substance Names
    If this box is checked and if no structure is provided in the substance record, PubChem processing will attempt to use provided synonyms to auto-generate the deposited compound structure. This processing includes the use of CID as synonym (e.g. "CID1" will use the structure of CID 1 for the structure record), matching synonym to MeSH (e.g. "Aspirin" will use the structure of CID 2244), and name to structure software (e.g. "2-acetyloxybenzoic acid" will yield the same structure as CID 2244).

    Consider 3D Substance Coordinates as Experimental
    If this box is checked, the depositor confirms that all 3D substance coordinates supplied were experimentally-derived. If 3D coordinates were generated by a computational algorithm, do not mark this box as it is not in the scope of the PubChem database to display such information.

    Include CIDs with Get SIDs Download Report
    If this box under the Preferences tab is checked, an extra CID column will be included at the end of the CSV file downloaded with the 'Get SIDs' link for substance depositions.

    Note: If you are currently looking at a Substance deposition, you can find this checkbox by clicking on the Account Info tab and then the Preferences tab. If you are not the primary user on your account, you will need to ask that person to login and check this box.

    Ignore Past Hold-Until dates for Substances
    If this box under the Preferences tab is checked, any substance record Hold-Until date set in the past will be stripped out and ignored for the sake of versioning.

    Avoid registry ID in list of chemical structure synonyms
    If this box under the Preferences tab is checked, registry IDs will not be used as synonyms, so they will not get used as preferential names for records.

    Use outside RNAi substance provider
    If this box under the Preferences tab is checked, RNAi assay depositors are able to use substance records from an outside RNAi provider in addition to their own deposited substance records.

    View Preferences View
    This View gives you read-only access to your preferences.

    Edit Preferences View
    This View allows you to modify your preferences. After you make all desired changes, be sure to press the "Update" button to commit your changes.

    Add Icon
    Clicking the Add Icon allows you to add a secondary contact for a deposition account. Please fill in all fields as completely as is possible. You must fill in all fields with a red "*". After completing the contact information form, please push the "Register" button.
     


    2.5  Navigation Icons

    2.5.1 Check Mark Icon
    Clicking the "Checkmark" icon in the Main Navigation Bar spawns a new web browser window and displays the PubChem Deposition Agreement in PDF format (Adobe Acrobat Reader is required to view it). PubChem Depositors must (electronically) sign this agreement prior to adding any data to PubChem.

    2.5.2 Movie Man Icon
    Clicking the "Movie Man" icon in the Main Navigation Bar spawns a new web browser window and plays a movie within that window demonstrating the use of the PubChem deposition system (the Macromedia Flash Player plug-in is required to view it).

    2.5.3 Question Mark Icon
    Clicking the "Question Mark" icon in the Main Navigation Bar spawns a new web browser window and displays the PubChem Deposition Gateway help document. You can learn about the various features of the deposition system by exploring this document.

    2.5.4 Person Icon
    Clicking the "Person" icon in the Main Navigation Bar asks whether you would like to log out of the PubChem Deposition Gateway.
     

     3. PubChem Deposition Gateway FAQ's

    Q: I uploaded my file, now what?

    A: The PubChem Deposition Gateway will parse and validate the data you submitted. You can watch this process proceed, or you may submit another file or logout and come back later. When this processing is complete, as denoted by the submission status bar or by receipt of an e-mail, you may want to review the submission. If you have a Test Account, and the submission proceeded without error, you have successfully tested your data and can be assured that your data is ready for use with the PubChem Deposition System. If you have a Deposition Account, and the submission proceeded without errors, you may commit your data to PubChem by pressing the "Commit" button.

    Q: Can I supply an additional datasource URL as well as my datasource URL? Can I supply an additional substance URL as well as my substance URL?

    A: You have two URL's per substance. One URL is associated with your data source name and the other is associated with your unique registry ID. Beyond that, you would need to use the Entrez "link-out" mechanism that can "piggy-back" URL's on your (or anyone's) substances.


    Q: Can I supply multiple lines of additional searchable text per Substance?

    A: All additional information should be put in the comments ("PUBCHEM_SUBSTANCE_COMMENT") section of the SD file. You can have as many lines as you need there. You could also put URL's there, too.


    Q: If I have new substance information available, how do I update PubChem?
    Do I need to re-deposit the complete substance record (including the new information) or can I just deposit the new information?


    A: To update, please re-deposit the complete substance record, including the new information, using the same registry identifier. Updates are versioned but only the most current data will be readily visible, searchable, or downloadable. Please note that the revised record will still have the same SID. Please also note that PubChem will not version substance records if nothing has changed.

    Q: If a substance ceases to be part of my data, how do I delete the record in PubChem?

    A: You will need to re-deposit the record such that it contains an empty CTAB section, the registry ID tag ("PUBCHEM_EXT_DATASOURCE_REGID"), and a revoke tag ("PUBCHEM_REVOKE_SUBSTANCE"). We suggest that you provide a comment (a line of text) with the PUBCHEM_SUBSTANCE_COMMENT tag to state why you revoked the record, e.g. "Deprecated in favor of record ABC123". As an example, we added a revoke record to the example SDF file (last record).
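    As a rough illustration of the answer above, the following Python sketch builds one revoke record as text. It assumes that an "empty CTAB" can be written as a V2000 molfile block with zero atoms and zero bonds, and that the revoke tag needs no value; neither detail is confirmed here, so please compare the result with the revoke record in the example SDF file before relying on it. The function name revoke_record is made up for this example.

```python
def revoke_record(regid: str, reason: str) -> str:
    """Build one SD-file record asking to revoke the substance `regid`.

    ASSUMPTIONS (not confirmed by this help page): the empty CTAB is a
    V2000 block with zero atoms/bonds, and the revoke tag carries no value.
    """
    lines = [
        "",  # molfile header line 1 (name), left blank
        "",  # molfile header line 2 (program/timestamp), left blank
        "",  # molfile header line 3 (comment), left blank
        "  0  0  0  0  0  0  0  0  0  0999 V2000",  # counts line: no atoms, no bonds
        "M  END",
        "> <PUBCHEM_EXT_DATASOURCE_REGID>",
        regid,
        "",  # a blank line ends each SD data item
        "> <PUBCHEM_REVOKE_SUBSTANCE>",
        "",  # assumed: no value, just the item terminator
        "> <PUBCHEM_SUBSTANCE_COMMENT>",
        reason,
        "",
        "$$$$",  # SD-file record separator
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(revoke_record("ABC122", "Deprecated in favor of record ABC123"))
```

    The registry ID "ABC122" is a placeholder; use the same registry ID under which the substance was originally deposited.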


    Q: If I find a mistake in my PubChem Substance record, is it better to update or remove my substance?

    A: The best way is to "update" the substance record.


    Q: After I get a deposition account, is my test account still active? Can I still use it to test submissions?

    A: Yes, the test account is still active. You may continue to use that account for testing. Please be advised that test accounts will not allow you to deposit data into PubChem. Deposition accounts will allow you to deposit data into PubChem, after processing has successfully completed.

    Q: We have various flags "nucleophile", "electrophile", "yuck" that we are starting to attach to molecules in our deposition. We'd like to send that data to PubChem in the most useful way possible. We think of them as "properties". What is the best way to do that?

    A: The substance/compound properties you mentioned above will go to PubChem's "Comment". You can simply put them under the SD tag <PUBCHEM_SUBSTANCE_COMMENT>.

    Q: If we have CAS registry numbers, is it best to put them in <PUBCHEM_GENERIC_REGISTRY_NAME> or <PUBCHEM_SUBSTANCE_SYNONYM> ? Does it matter?

    A: PubChem's original design lets users put all "Registry" items under <PUBCHEM_GENERIC_REGISTRY_NAME>. Since many depositors already put CAS numbers in their own synonyms field, those CAS numbers automatically go to <PUBCHEM_SUBSTANCE_SYNONYM>. So it doesn't matter; it is up to you which field you put them in.

    Q: We are starting to collect annotations of compounds e.g. inhibits enzyme XYZ with Ki=10uM using a spectrophotometric assay. Also, we annotate compounds as being aggregators (non specific inhibitors) at a particular concentration. This is starting to sound like something that overlaps with the PubChem Assay database. So far we don't have a lot of data to upload, but that may soon change. Can you advise us on the best way to send this data to you?

    A: Yes, you are right. Such bio-data related to your substance belongs in the PubChem BioAssay database.

    Q: From time to time, compounds become depleted at various suppliers. We would like to either A. indicate in the comment record that this supplier's stock is depleted. OR B. remove a supplier from the comment record completely.

    A: Once you update your record, we will archive all old version content. We recommend that you indicate the depleted stock in your comment (option A).

    Q: Is the only way to do this to upload the full SD record again, overwriting the previous one? I think this is true, but wanted to make sure.

    A: Yes.

    Q: Do you want compounds that are depleted in PubChem? I figure the answer is yes, because what you are really looking for is maximum coverage of chemical space. So I'm thinking, why don't you just run a combichem/de novo design program to enumerate millions of molecules, and then load them into PubChem? Obviously, just chemical space isn't what you're after. Can you help me understand the PubChem perspective on this issue?

    A: The PubChem Substance database is depositor-based: every deposited substance is assigned an SID. The PubChem Compound database is a non-redundant database of unique structures: every compound in it has a unique CID. If the substance(s) linked with a compound become depleted, the compound will be deprecated/suppressed. We keep all deprecated/suppressed compounds archived, so compounds will never be depleted.


     4. FTP Depositions & FAQ

    FTP-based deposition provides a path for completely automated data upload into PubChem. If you have a large amount of data to be uploaded into PubChem or if you update your data on a daily or weekly basis, you may be a good candidate to use the PubChem FTP deposition method.
    To get started with FTP-based depositions, you must:
    1. Have an approved deposition account
    2. Have performed previous data uploads into PubChem
    3. Request an FTP account from PubChem
    Please note that an FTP account is independent of your PubChem deposition account and has different login credentials. The PubChem deposition account will be configured to interact directly with data uploaded via FTP. The procedure to create, set up, and configure your FTP account to interact with your PubChem deposition account will take one or more business days.


    Substance-based FTP Depositions
    To deposit data for a Substance deposition via FTP, you must:
    1. Upload a file using your FTP account
    2. Name the file you upload with the suffix ".sdf.in" (or ".sdf.gz.in", if a compressed file)
    Please note that the file suffix lets the deposition system know your file is intended to be a new Substance deposition. After the file is recognized as being present, it is transferred into the deposition system. There may be a delay between the completion of your FTP upload and the processing of your file, because the deposition system processes FTP-deposited data at particular times of the day and may wait to verify that your transfer is actually complete. You can tell that FTP-based deposition processing has begun when the ".sdf.in" (or ".sdf.gz.in") suffixed file is removed from your FTP account directory and a status file is created. This status file has the suffix ".sdf.status". For example, if you upload the file "smid.sdf.gz.in", the status file created will be "smid.sdf.status".
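    The naming convention can be sketched in Python as follows. This is only an illustration: the helper name companion_files is made up, and the ".sdf.out" and ".sdf.err" names are taken from the report and error files described later in this section.

```python
def companion_files(upload_name: str) -> dict:
    """Map an uploaded file name to the files the system is described as creating.

    Example from this document: "smid.sdf.gz.in" -> "smid.sdf.status".
    """
    # Check the longer suffix first so ".sdf.gz.in" is not mistaken for ".sdf.in".
    for suffix in (".sdf.gz.in", ".sdf.in"):
        if upload_name.endswith(suffix):
            stem = upload_name[: -len(suffix)]
            return {
                "status": stem + ".sdf.status",
                "out": stem + ".sdf.out",
                "err": stem + ".sdf.err",
            }
    raise ValueError(
        f"{upload_name!r} does not end in '.sdf.in' or '.sdf.gz.in', "
        "so it would not be recognized as a substance deposition"
    )


if __name__ == "__main__":
    print(companion_files("smid.sdf.gz.in"))
```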
    The status file informs you of the processing progress. The possible status file contents and their meaning are listed below.

    Status  Meaning
    I       Submitted
    -P      Parsing
    !P      Parsing Failed
    P       Parsed
    -S      Standardizing
    !S      Standardization Failed
    S       Standardized
    -V0     Validating I
    !V0     Validation I Failed
    V0      Validated I
    -V1     Validating II
    !V1     Validation II Failed
    V1      Validated II
    C       Committed for PubChem
    A       Approved for PubChem
    R       Rejected for PubChem
    -D0     Uploading to PubChem
    -D      Depositing to PubChem
    !D      Depositing to PubChem Failed
    D       Deposited in PubChem
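
    If you check the status file from a script, the table above can be turned into a lookup. The sketch below is an illustration only: it assumes the status file contains just the code (possibly with surrounding whitespace), and it relies on the pattern visible in the table that codes starting with "-" are steps in progress and codes starting with "!" are failures. The names SUBSTANCE_STATUS and classify are made up for this example.

```python
# Status codes and meanings copied from the table above.
SUBSTANCE_STATUS = {
    "I": "Submitted",
    "-P": "Parsing",
    "!P": "Parsing Failed",
    "P": "Parsed",
    "-S": "Standardizing",
    "!S": "Standardization Failed",
    "S": "Standardized",
    "-V0": "Validating I",
    "!V0": "Validation I Failed",
    "V0": "Validated I",
    "-V1": "Validating II",
    "!V1": "Validation II Failed",
    "V1": "Validated II",
    "C": "Committed for PubChem",
    "A": "Approved for PubChem",
    "R": "Rejected for PubChem",
    "-D0": "Uploading to PubChem",
    "-D": "Depositing to PubChem",
    "!D": "Depositing to PubChem Failed",
    "D": "Deposited in PubChem",
}


def classify(status_text: str) -> str:
    """Return 'failed', 'in progress' or 'reached' for a status file's contents."""
    code = status_text.strip()
    if code not in SUBSTANCE_STATUS:
        raise ValueError(f"unknown status code: {code!r}")
    if code.startswith("!"):
        return "failed"
    if code.startswith("-"):
        return "in progress"
    return "reached"  # a stage has been reached (note that "R" means Rejected)


if __name__ == "__main__":
    for code in ("-V0", "!S", "V1", "D"):
        print(code, SUBSTANCE_STATUS[code], classify(code))
```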

    After processing completes to the point of "Validated II", you will need to log into the deposition system, review your submission, and then, if there are no issues, commit your data to be loaded into PubChem. An auto-commit feature can be requested, whereby the deposition commit step is performed on your behalf automatically. This removes the necessity for you to login and commit your data into PubChem. In many ways, FTP-based deposition is much like a normal deposited file. You can login to your deposition account at any time to see the progress of your deposition(s) or to get your SIDs. When processing is complete and your data is loaded into PubChem, you will see the suffixed file ".sdf.err" and, if all went well, the suffixed file ".sdf.out". The file with the ".sdf.out" suffix (e.g., "smid.sdf.out") is your report file containing your PubChem Substance identifiers (SID's).
    Please note that the ".sdf.out" log file is a CSV text file, easily read by Excel or other spreadsheet applications. These files contain no column headers. The columns are in the following order:
    • Data Source Name
    • External Registry ID
    • SID
    • SID Version
    • Load Code
    The "Load Code" column values, described below, allow you to know or track the substances that you have added, modified, or suppressed.

    Load Code  Description
    0          substance load failed (internal error)
    1          existing substance replaced (internal use only)
    2          new substance created
    3          new substance version, PubChem structure same
    4          new substance version, PubChem structure changed
    5          no change, identical substance
    6          no change, but new PubChem structure (internal use only)
    7          substance revoked/suppressed
    8          substance is "on-hold"
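
    Because the ".sdf.out" report has no header row, a script has to supply the column names itself. The following Python sketch shows one way to read the report and count records per load code. The sample rows, the source name "MySource" and the helper names are invented for illustration; only the column order and the load code descriptions come from this document.

```python
import csv
import io
from collections import Counter

# Column order as listed above (the report itself has no header row).
COLUMNS = ["source_name", "registry_id", "sid", "sid_version", "load_code"]

# Load code descriptions copied from the table above.
LOAD_CODES = {
    0: "substance load failed (internal error)",
    1: "existing substance replaced (internal use only)",
    2: "new substance created",
    3: "new substance version, PubChem structure same",
    4: "new substance version, PubChem structure changed",
    5: "no change, identical substance",
    6: "no change, but new PubChem structure (internal use only)",
    7: "substance revoked/suppressed",
    8: 'substance is "on-hold"',
}


def read_report(text: str) -> list:
    """Parse the text of an .sdf.out report into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        row = dict(zip(COLUMNS, (field.strip() for field in record)))
        row["load_code"] = int(row["load_code"])
        rows.append(row)
    return rows


def summarize(rows: list) -> Counter:
    """Count records per load code description."""
    return Counter(LOAD_CODES.get(r["load_code"], "unknown load code") for r in rows)


# Hypothetical report contents, for illustration only.
SAMPLE = "MySource,REG-001,1001,1,2\nMySource,REG-002,1002,3,4\n"

if __name__ == "__main__":
    for description, count in summarize(read_report(SAMPLE)).items():
        print(count, description)
```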

    The presence of a non-zero length file with the suffix ".sdf.err" (e.g., "smid.sdf.err") indicates that there was a problem with your uploaded file and that your data may not be loaded into PubChem. The ".sdf.err" file will contain a human-readable text message explaining why the FTP uploaded file failed. Please note that the status file is not deleted after processing and publishing are completed. The final contents, if all went well, will be "D", which means "Deposited in PubChem".


    FAQ

    Q: How do I specify the URL associated with my data source name? Does PubChem use the URL I provided for my deposition account at registration time? Or, does PubChem use the URL specified using PUBCHEM_EXT_DATASOURCE_URL in the SDfile?
    A: The URL specified when you created your deposition system account is used in the PubChem source page and is associated with your data source name in the PubChem sources display page, for example, the BioCyc data source name. The URL provided in the SD tag "PUBCHEM_EXT_DATASOURCE_URL" gives the organization (data source) URL per substance, which is allowed to change from substance to substance for the same data source name that you deposit.

    Q: How do I deposit substance data using FTP?
    A: You log in to your private PubChem FTP account, upload your file(s), commit your processed data, and check back to the FTP account for your load report containing your SID's. Your uploaded substance deposition file must have the suffixed extension ".sdf.in" or ".sdf.gz.in", if "gzip" compressed. After the file is "recognized" by the PubChem deposition system, it will disappear from your FTP account and you will see a file with the suffix ".sdf.status". A code in this status file tells you which processing stage your uploaded file has reached. After the processing is completed and your data is successfully loaded, there will be an ".sdf.out" file containing your SID's.

    Q: Is there any kind of report on the success or failure of the FTP uploaded substance data?
    A: Yes, you can examine the ".sdf.status" file, the ".sdf.err" file, or the ".sdf.out" file. The ".sdf.status" file contains the current status of the data processing. The ".sdf.out" file contains a load report containing a list of records and the load action taken. The ".sdf.err" file may contain a human readable error as to why your FTP-based substance deposition failed.

    Q: After we deposit our substances, will we get the SID's or CID's for linking purpose? How do we get the SID's or CID's? Will you put it on the ftp site for us to pick up?
    A: After processing is completed and your substances are loaded into PubChem, a ".sdf.out" file will appear in your FTP account. This file will contain your SID's that correspond to your registry ID's. CID's are not provided.

    Q: How do I compose a URL to link back to PubChem from my website?
    A: To generate a URL to link to your substance, for example, SID 2244, the URL will be:
    //pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=2244

    We do not recommend linking directly back to a CID associated with your substance using the "summary.cgi", as it may change as we change our preference for different tautomer or resonance forms of a structure. You may "safely" generate a URL to the CID associated with your substance via, for example, SID 2244:
    //www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=PureSearch&db=pccompound&details_term=2244%5BSID%5D
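
    If you generate these links from a script, the two URL patterns above can be built as in the following Python sketch. The URLs in this document are written without a scheme; the "https:" prefix used here is an assumption, as are the function names.

```python
from urllib.parse import quote

SCHEME = "https:"  # ASSUMPTION: the URLs in this document omit the scheme


def substance_url(sid: int) -> str:
    """Link to the summary page of a substance (SID)."""
    return f"{SCHEME}//pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid={sid}"


def compound_search_url(sid: int) -> str:
    """Link to the compound(s) associated with a substance, searched by SID."""
    term = quote(f"{sid}[SID]")  # "[" and "]" become %5B and %5D
    return (
        f"{SCHEME}//www.ncbi.nlm.nih.gov/entrez/query.fcgi"
        f"?cmd=PureSearch&db=pccompound&details_term={term}"
    )


if __name__ == "__main__":
    print(substance_url(2244))
    print(compound_search_url(2244))
```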


    Q: Is it possible to link back to PubChem without having to know the Substance identifier (SID)?
    A: Yes, you can compose a URL using your data source name and external registry ID. For example, to make a link to the substance for the data source name "NIAID" and the external registry ID "115500", the URL would be:
    //www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=PureSearch&db=pcsubstance&details_term=niaid%5Bsourcename%5D%20AND%20115500%5Bsourceid%5D
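
    The same link can be composed programmatically, as in this Python sketch. The function name and the "https:" scheme are assumptions. The source name is lower-cased only to reproduce the example above; whether lower case is required is not stated here.

```python
from urllib.parse import quote


def substance_url_by_registry_id(source_name: str, registry_id: str,
                                 scheme: str = "https:") -> str:
    """Build a PubChem Substance search link from source name and registry ID."""
    # Lower-casing mirrors the "NIAID" -> "niaid" example in this document.
    term = f"{source_name.lower()}[sourcename] AND {registry_id}[sourceid]"
    return (
        f"{scheme}//www.ncbi.nlm.nih.gov/entrez/query.fcgi"
        f"?cmd=PureSearch&db=pcsubstance&details_term={quote(term)}"
    )


if __name__ == "__main__":
    print(substance_url_by_registry_id("NIAID", "115500"))
```

    Percent-encoding the whole search term with quote also protects registry IDs that contain spaces or other reserved characters.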


    Q: If we send records with the previous source ID again, will the previous record be overwritten?
    A: Yes, the registry ID you provide is the key to the record. Whenever you send us a record with that registry ID, we will interpret it as an update, replacing the complete record with the information provided.

    Assay-based FTP Depositions

    Bioassay data depositions can be initiated via FTP in much the same way that substance depositions can, but for assays you must additionally specify the type of assay deposition operation.

    To begin, follow the same three-step setup procedure as described for substance FTP depositions. Note that you use the same FTP account for depositing either substance or assay data.

    Once your FTP account is set up, you should have an assay directory structure under your top level directory that contains the following upload directories:

    /assay/new
    /assay/modify/data_only
    /assay/modify/alter_descr
    /assay/modify/replace_all

    You must decide which of the four types of assay operations you want to perform and place your file to be deposited into the appropriate directory listed above. You should be familiar with performing these assay deposition operations before trying them with FTP. For more information on them, see here.

    Assay FTP Deposition File Format
    To upload any kind of assay data or description changes, a single XML or ASN.1 file is required. This file must adhere to the specification for assays and be filled out as appropriate. Search in the specification file (XML Schema, ASN.1) for the tag PC-AssayContainer; this will always be the outermost container for your assay, whether it contains description and data or only description. You can find examples of such XML files from the PubChem public FTP site of bioassays. For assay deposition path-specific XML examples look at Bioassay XML examples for FTP section. No CSV files are permitted using FTP.
    You can also download templates of XML files from pending depositions that you are making in the Deposition Gateway. You will need one file with both the data and description filled out in the cases of new, data_only or replace_all operations. For the alter_descr operation, only the description should be filled out. Let's now reiterate these instructions by assay deposition operation:

    • New assay deposition
      For this operation you need to fill out new description information, including a unique aid-source (RegId) and name for your assay, and assay data results. You can look at one of your existing assays from the PubChem FTP site for guidance.
      Your assay FTP upload file goes in the directory /assay/new. XML example is here.

    • Modify existing PubChem Assay
      For these three operations, you are doing something to affect your current assay in PubChem. Therefore, you need to specify the assay AID correctly so that it can be found. The best way to do this is to first copy the XML file of your current assay and modify it as you wish. In the following three types of Modify operations, we'll briefly mention what you should change.

      • Add/Change data without description change
        For this operation, you should take a copy of your current assay's XML file and replace the data section with the data that you want to add, delete or modify. Be careful to first remove all data or the system will think that you want to add that data again! Also note that for this operation you must make no changes to the existing description.
        Your assay FTP upload file goes in the directory /assay/modify/data_only. XML example is here.

      • Alter description
        For this operation, you should take a copy of your current assay's XML description file without the data section and make your minor alterations. Any significant changes, such as adding TID data result definitions, will result in an error.
        Your assay FTP upload file goes in the directory /assay/modify/alter_descr. XML example is here.

      • Replace assay
        For this operation, you should take a copy of your current assay's XML file and make your description and data section modifications. Please note that all of your existing data for this assay in PubChem will be replaced by the contents of this uploaded file.
        Your assay FTP upload file goes in the directory /assay/modify/replace_all. XML example is here.
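
    The choice of upload directory can be summarized in a small lookup, sketched below in Python. The operation keys and the helper name remote_path are made up for this example, and the file name "my_assay.xml" is a placeholder. The ".in" ending is the naming requirement described under Assay FTP Deposition Communication and Processing.

```python
import posixpath

# One upload directory per assay operation, as listed above.
ASSAY_DIRECTORIES = {
    "new": "/assay/new",
    "data_only": "/assay/modify/data_only",
    "alter_descr": "/assay/modify/alter_descr",
    "replace_all": "/assay/modify/replace_all",
}


def remote_path(operation: str, local_name: str) -> str:
    """Return the FTP path to upload `local_name` to for the given operation."""
    try:
        directory = ASSAY_DIRECTORIES[operation]
    except KeyError:
        raise ValueError(f"unknown assay operation: {operation!r}") from None
    # The input file name must end in ".in" to be picked up.
    name = local_name if local_name.endswith(".in") else local_name + ".in"
    return posixpath.join(directory, name)


if __name__ == "__main__":
    print(remote_path("alter_descr", "my_assay.xml"))
```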


    XML Validation against PubChem XSD Schema
    To increase the efficiency of the data exchange for your Bioassay FTP submission, PubChem highly recommends that depositors first validate XML files before uploading them to the PubChem FTP site for processing. XML validation will make sure that your file conforms to the PubChem Bioassay specification and should help speed up the deposition by isolating XML errors. To check if your XML document conforms to the PubChem XSD Schema, the XML document must be validated against that XSD Schema. You can find PubChem's XSD schema here.

    One XML validator that you might use is xmllint, which is often included in standard Linux installations. To validate XML using xmllint, run the following Linux command:

    xmllint --noout --schema "ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem.xsd" FileName.xml

    Please be advised that PubChem does not support or maintain xmllint, but you can find more information on it here. Depositors may of course use any other equivalent XML package for validation.
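
    If you prepare your uploads from a script, the same xmllint command can be run before each upload. The Python sketch below is one possible wrapper: it assumes xmllint is installed and on your PATH, and that a zero exit code means the file validated. The function names are made up for this example.

```python
import subprocess

SCHEMA_URL = "ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem.xsd"


def xmllint_command(xml_path: str, schema: str = SCHEMA_URL) -> list:
    """Build the same command line as shown above, as an argument list."""
    return ["xmllint", "--noout", "--schema", schema, xml_path]


def validate(xml_path: str, schema: str = SCHEMA_URL) -> tuple:
    """Run xmllint and return (ok, messages).

    Requires xmllint to be installed; raises FileNotFoundError otherwise.
    """
    result = subprocess.run(
        xmllint_command(xml_path, schema),
        capture_output=True,
        text=True,
    )
    # xmllint writes its validation messages to standard error.
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    ok, messages = validate("FileName.xml")
    print("valid" if ok else "not valid")
    print(messages)
```

    Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.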

    Assay FTP Deposition Communication and Processing
    As with substance FTP depositions, initial communication between you and our deposition system occurs through files in your FTP directory using a naming convention. Your input XML/ASN.1 file must end in ".in". Once that file is picked up by our system, it will try to process it and put the status of the deposition in another file with the extension ".status". There will also be a file ending in ".err" which will contain an explanation of any errors found. In some cases if a processing error occurs right at the start, the deposition will not have a status yet and the ".status" file will be empty.
    The status file informs you of the processing progress. The possible status file contents and their meaning are listed below.

    Status  Meaning
    I       Description created
    U       Data Submitted
    -P      Parsing Data
    !P      Data Parsing Failed
    P       Parsed
    -V      Validating
    !V      Data Validation Failed
    V       Data Validated
    C       Committed for PubChem
    A       Approved for PubChem
    R       Revise for PubChem
    -D      Depositing to PubChem
    !D      Depositing to PubChem Failed
    D       Deposited in PubChem

    After processing completes to the point of "Validated", you will need to log into the deposition system, review your submission, and then, if there are no issues, commit your data to be loaded into PubChem. An auto-commit feature can be requested, whereby the deposition commit step is performed on your behalf automatically. This removes the necessity for you to login and commit your data into PubChem. In many ways, FTP-based deposition is much like a normal deposited file. You can login to your deposition account at any time to see the progress of your deposition(s).
    Once you have resolved any processing errors that might come up, your assay will proceed to the validated stage. At this point, you can switch to the Deposition Gateway web interface and view your deposition. This gives you more interactive information about your deposition and is necessary for you to confirm the validity of your new assay or changes to your existing assay. From the validated stage you will no longer need the FTP system.
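
    The workflow described above can be expressed as a small decision helper. The Python sketch below is an illustration only: it assumes the ".status" file contains just the status code, and the returned hints paraphrase this section rather than quote any system message. The names ASSAY_STATUS and next_step are made up.

```python
# Assay status codes and meanings copied from the table above.
ASSAY_STATUS = {
    "I": "Description created",
    "U": "Data Submitted",
    "-P": "Parsing Data",
    "!P": "Data Parsing Failed",
    "P": "Parsed",
    "-V": "Validating",
    "!V": "Data Validation Failed",
    "V": "Data Validated",
    "C": "Committed for PubChem",
    "A": "Approved for PubChem",
    "R": "Revise for PubChem",
    "-D": "Depositing to PubChem",
    "!D": "Depositing to PubChem Failed",
    "D": "Deposited in PubChem",
}


def next_step(status_text: str) -> str:
    """Suggest what to do next for the contents of an assay .status file."""
    code = status_text.strip()
    if code == "":
        # An early processing error can leave the status file empty.
        return "no status yet: check the .err file"
    if code not in ASSAY_STATUS:
        raise ValueError(f"unknown status code: {code!r}")
    if code.startswith("!"):
        return f"{ASSAY_STATUS[code]}: read the .err file and resolve the error"
    if code == "V":
        # Without auto-commit, the commit step is done in the web interface.
        return "Data Validated: log in to the Deposition Gateway, review and commit"
    return f"{ASSAY_STATUS[code]}: no action needed from the FTP side"


if __name__ == "__main__":
    for text in ("", "-P", "!V", "V", "D"):
        print(repr(text), "->", next_step(text))
```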





     5. PubChem Deposition Documents and Examples

    Specifications
    Examples

