PubChem Deposition Gateway
This site lets you test the data exchange for depositing chemical
structure and/or bioassay data into PubChem, and deposit data to be added to
PubChem. You need an account to log in before
you start. PubChem Deposition Gateway accounts come in two types,
Test and Deposition.
1. Getting Started
1.1 Logging into the PubChem Deposition Gateway
1.1.1 New User: If you are new to the
PubChem Deposition Gateway, you will need to create an account. We
strongly recommend that you start with a Test Account. To
create a Test Account, click the "Create
Test Account" button.
If you already have a "Test Account" and wish
to proceed to putting data into PubChem, you should create a "Deposition
Account". To make a "Deposition Account",
press the "Create Deposition Account" button.
1.1.2 Existing User: If you already have an
account, you may log in to begin using the PubChem Deposition Gateway.
Please enter your username in the text box labeled "Username:" and
your password in the text box labeled "Password:", then click the
"Log In" button below these text boxes.
If you have forgotten your password, please click "Forgot Password?".
You will be prompted for your "Username". An e-mail will be sent to
the primary contact for that account with further instructions on how to
proceed.
1.2 Creating a New Account
Choosing an Account Type
When creating a new account you have two types from which to choose:
-
Test Account Type:
A Test account allows a
user to go through all the steps of uploading substance and/or bioassay data.
Anyone with a valid email account can get a test account. The purpose of
a Test
account is to allow potential depositors to validate that their data is
properly suited for submission into the PubChem Deposition Gateway, without
actually putting any of the data into the public PubChem system or signing off
on the PubChem deposition agreement. Any data
submitted via a Test account will remain accessible for up to one week.
For instructions on creating a Test Account, continue
here.
-
Deposition Account Type:
A Deposition account, like a Test account, begins the deposition process
by allowing a user to upload and validate substance and bioassay data.
The Deposition account, however, is the only way you can finish this
process to make your data public in PubChem.
Deposition accounts require more detailed information to set up and may
involve direct contact with PubChem administrators for approval. Furthermore,
Deposition account users must agree to and are bound by the
PubChem Deposition Agreement.
This agreement gives PubChem the right to redistribute the information
deposited. It is important to note that all data deposited into PubChem is
covered under fair-usage and, as such, does not require the depositor to
assign away copyrights or other ownership prior to PubChem deposition,
i.e., the deposited data does not have to become part of the public domain.
Organizations with multiple data collections may need multiple
deposition accounts. PubChem requires a separate Deposition account
with a unique username for each unique data source. If you have multiple
substance collections that need to be treated independently within PubChem,
you will need separate Deposition accounts for each. For example, the
"NIST" and "NIST Chemistry WebBook" substance
collections are both generated by the organization NIST. Note, however,
that if you have one collection but many users that are allowed to make
depositions, we now have a mechanism to assign
multiple logins.
If you are not sure whether you need multiple deposition accounts,
please contact the
PubChem Deposition Help Desk.
For instructions on creating a Deposition Account, continue
here.
1.2.1 Creating a Test Account
A Test account allows a user to go through all the steps of uploading
substance and/or bioassay data. To set up a Test account, please follow three
simple steps:
-
Fill out all form items, those denoted with a
"*" are required.
-
Press the "Register" button.
-
Open the URL link in the e-mail you receive from the PubChem Deposition
Gateway, typically by clicking on it.
After these three steps are complete, you may log in and begin using the PubChem
Deposition Gateway. If you do not complete Step 3 within 24 hours of completing
the first two steps, you will need to start again from Step 1.
Test Account Information
-
Username
-
E-Mail
-
Password
-
First Name
-
Last Name
-
Additional Information
-
Terms of Use
Username
Choose a username. If the username you request is taken, you will need to
provide a different one. The username you provide must be at least six
alphanumeric characters long and cannot contain spaces.
E-Mail
Before your Test Account is activated for use, an e-mail will be sent to this
address with further instructions.
Notify About Submission Status Changes
This is a pull-down menu that allows you to choose whether or not you would
like automated e-mails sent to you as your submission is processed or as its
status changes. There is a third option for
multi-user deposition accounts entitled
"My submissions only" which means that you will only be mailed
for submissions you have personally initiated. You can modify this
notification setting in your Account Info
after you login.
Password
Choose an account password. You must type the same password in the text boxes
"Password" and "Confirm Password". Please commit this
password to memory, as you will need it every time you attempt to login to the
PubChem Deposition Gateway.
The password you provide must be at least six characters long and cannot be
your username.
First Name
Your first name.
Last Name
Your last name.
Additional Information
Please type any additional information or notes in this text box.
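The username and password rules stated above can be checked before you fill in the form. The following is a minimal sketch in Python, assuming the rules are exactly as worded in this document; the function names are hypothetical, and the Gateway itself may apply additional checks. "At least six alphanumeric characters" is interpreted here as a count of letters and digits, which is an assumption.

```python
def username_problems(username):
    """Return a list of rule violations for a proposed username."""
    problems = []
    # Assumption: the six-character minimum counts letters and digits only.
    if sum(ch.isalnum() for ch in username) < 6:
        problems.append("must contain at least six alphanumeric characters")
    if any(ch.isspace() for ch in username):
        problems.append("cannot contain spaces")
    return problems


def password_problems(password, username):
    """Return a list of rule violations for a proposed password."""
    problems = []
    if len(password) < 6:
        problems.append("must be at least six characters long")
    if password == username:
        problems.append("cannot be the same as the username")
    return problems


if __name__ == "__main__":
    print(username_problems("jane doe"))
    print(password_problems("abc", "jane_doe"))
```

An empty list means that none of the documented rules were violated. A username that passes these checks may still be rejected if it is already taken.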
1.2.2 Creating a Deposition Account
A Deposition account allows a user to go through all the steps necessary to put
substance and/or bioassay data into PubChem. To set up a Deposition Account,
please follow three simple steps:
-
Fill out all form items (those denoted with a
"*" are required);
then click the "Register" button.
-
Within a few minutes you will receive an e-mail from the PubChem
Deposition Gateway. Click on the URL link inside of it to confirm
your e-mail address. Curators will then review your account and may contact
you to verify your account information.
-
Within a couple of days you will receive a second e-mail from the PubChem
Deposition Gateway. Upon receipt, login to your account at our website. At
this point you must agree to a
Data Transfer Agreement (DTA), typically by clicking on the button
when you first login.
After these three steps are complete and your information is reviewed and
verified by a PubChem administrator, you may receive an e-mail notifying you that
you may log in and begin using the PubChem Deposition Gateway. A PubChem
administrator may contact you by phone or e-mail during
this process. If you do not complete Step 3 within 24 hours of completing the
first two steps, you may need to start again from Step 1.
Multiple Users on One Account
It is now possible to create one deposition account that contains multiple
users, each having their own login and password.
- How this works
- Separate logins
You can now have an arbitrary number of users each with her own
password on one deposition account. No one can see the password of
another user, but there is one primary user who can add and
remove other users.
- Separate deposition tracking
For each action that a user takes, the processing history for
a substance or assay deposition will
record that user's id.
The user who initiates a deposition, however, will remain
associated with the deposition even if others work on it. The
pending listing, for example, of that
substance or
assay
deposition will show that user as the owner.
- Joint access
Although logins and tracking are separate for each user, access is not. All
users from the same data source will see all of their depositions
whether they initiated them or not. Users are also free to make
follow-up actions on any submission from their data source. For
example, if jane_doe deposits a file of substances that pass
validation, juan_carlos (from the same data source) may commit
the submission for publishing or for that matter may even
delete it.
- How you implement it
If you would like to have multiple users for your data source, please
follow these steps.
- Choose one person to be the administrator and have that person
create the initial deposition account
as described. If you already have a
deposition account and would like to add users, skip to the next step.
- Your administrator must log in, go to the
Account Info >
Contacts tab, and click on the
Add Contact View on
the left side.
Fill out this form for each additional user. The user can
enter her password while the administrator is present, or the administrator
can enter a temporary password that the user changes afterwards.
Deposition Account Information
-
Username
-
E-Mail
-
Password
-
First Name
-
Last Name
-
Data Source
-
Company/Organization
-
Job Title
-
Phone Number
-
Street Address
-
City
-
State, Province or Area
-
ZIP
-
Country
-
Additional Information
-
Terms of Use
Username
Choose a username. If the username you request is taken, you will need to
provide a different one. The username you provide must be at least six
alphanumeric characters long and cannot contain spaces.
E-Mail
Before your Deposition Account is activated for use, we will send an e-mail to
this address with further instructions.
Notify About Submission Status Changes
This is a pull-down menu that allows you to choose whether or not you would
like automated e-mails sent to you as your submission is processed or as its
status changes. There is a third option for
multi-user deposition accounts entitled
"My submissions only" which means that you will only be mailed
for submissions you have personally initiated. You can modify this
notification setting in your Account Info
after you login.
Password
Choose an account password. You must type the same password in the text boxes
"Password" and "Confirm Password". Please commit this
password to memory, as you will need it every time you attempt to login to the
PubChem Deposition Gateway.
The password you provide must be at least six characters long and cannot be
your username.
First Name
Your first name.
Last Name
Your last name.
Data Source
Your Data Source consists of a Display Name, which you set initially and can change later,
and a Data Source Id (a number), which PubChem assigns automatically.
The Data Source Display Name consists of one or more words to uniquely distinguish
data coming from an organization; it is prominently displayed on all PubChem
substance and bioassay records. PubChem users may search with your full Display
Name or with keywords from it to find your data. We advise that this name be short
but informative, such that it includes common names by which your organization
is known.
In addition, you will be automatically assigned a Data Source Id (a number),
which cannot be changed even though your Display Name can be changed. This Id
is used to track all of your data in PubChem and is what must be used to identify
your data records even though it is your Display Name which will be visible to
PubChem users on our webpages.
Organization (Data Source) URL
This should be the appropriate home page for people wanting more
information on your organization.
Company / Organization
The Company or Organization name associated with this Deposition Account.
Please include the division or group name, if appropriate.
Job Title
Your job title or position within the company or organization you represent.
Phone Number
The phone number where you and your organization can be reached.
Street Address
City
State, Province, or Area
ZIP
Country
Your physical or legal address to which correspondence may be sent.
Additional Information
Please type any additional information or notes in this text box.
Data Transfer Agreement
To apply for a deposition account, you must agree to a
Data Transfer Agreement
(html format) for this website as your data will be
released to the public in PubChem (unlike a test account). To agree, you
must make sure the "I agree" check box is checked on your first
login. Note, this check box does not appear on the first registration page, but
after your account has been reviewed and you login for the first time. If
your organization requires modifications to the Data Transfer Agreement,
please contact a
PubChem Curator.
2. PubChem Deposition Gateway
Uploading and managing depositions is fairly straightforward. First, you will
need to familiarize yourself with the
required file
format documentation
and review the PubChem Deposition Gateway help
documentation.
Essentially, the deposition process consists of uploading an appropriately
formatted PubChem data file. The file contents are subjected to several stages
of validation. After all validation checks are complete, an e-mail may be sent
notifying you to review your submission.
After your review, PubChem Deposition Gateway users with a Deposition
Account may "Commit" their successfully validated data.
Typically, your committed data will be made available in PubChem within two
days. It is possible, however, that your committed data availability may be
delayed, especially if you are notified that there is further action required
on your behalf.
The PubChem Deposition Gateway provides the means for you to manage and review
your depositions. You may delete or review prior submissions. If you have a
Deposition Account, you may also generate reports of prior submissions and
retrieve the PubChem Substance identifiers (SID) for your successful
depositions.
When you enter the PubChem Deposition Gateway, you will see, immediately below
the heading "PubChem Deposition Gateway", a navigation bar with
various tabs and icons. You may click on the tab text or icons at any time for
navigation. These tabs are:
-
Home - default login tab displaying major activities
-
Substances - to create new and manage existing substance depositions
-
Assays - to create new and manage existing bioassay depositions
-
Account Info - to manage your account settings
-
Navigation Icons
At the time of login, you will always default to the "Home" tab.
2.1 Home Tab
Clicking the "Home" Tab gives you a screen listing short descriptions of the
main activities from which you can choose:
-
Substances - Deposit new or resume suspended submissions
-
Assays - Deposit new or resume suspended submissions
-
Account Info - Review your account information/preferences
Please note that you may also use the main navigation bar tabs with identical
labels as the links, as they perform the same function.
2.2 Substances Tab
Clicking the "Substances" Tab puts you onto the substance welcome page.
2.2.1 Substances > Welcome Tab
The substance welcome page lists the main substance deposition activities from
which you can choose:
-
New - Create chemical structure records for deposition to PubChem
-
Pending - Resume unfinished depositions
-
Deposited in PubChem - Access previous depositions in PubChem
Please note that you may also use the main navigation bar tabs with identical
labels as the links, as they perform the same function.
2.2.2 Substances > New Tab
Clicking on Substances > New Tab opens up a third row of tabs, which
provide options for new record creation.
2.2.2.1 Substances > New > Upload File Tab
"Upload File" Tab provides an interface for uploading
a file into the PubChem Deposition Gateway.
The file you upload is expected to be in
SD File format. (However, we also allow a CSV format using SD tags, as described below.)
Each substance in the SD file must have a unique
registry ID in the appropriate SD field. A description of the allowed and
required SD fields is available.
For examples of suitable SD files for deposition, see
this SD file.
For CSV input, all the SD tags are allowed, and they serve as column headers.
The SD tags can come in any order, with the exception of PUBCHEM_EXT_DATASOURCE_REGID,
which should always be in the first column. Each substance occupies exactly one row.
When an SD tag is associated with multiple values, the values are separated by a newline
character within the cell.
(Hint: when entering data with Excel, use Alt-Enter to create a newline in a data cell, or
double-click a data cell before pasting a multi-line entry into it.)
For an example of a CSV file for deposition, see
this CSV file. This file can be imported into any spreadsheet
program for viewing or editing.
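The CSV layout described above can be produced with a standard CSV writer, which quotes cells containing newlines automatically. The following is a minimal sketch in Python, not an official tool. The helper name write_deposition_csv is hypothetical, and the tag PUBCHEM_SUBSTANCE_SYNONYM is used only for illustration; consult the SD field documentation for the tags that actually apply to your data.

```python
import csv
import io

REGID_TAG = "PUBCHEM_EXT_DATASOURCE_REGID"


def write_deposition_csv(records, extra_tags, out):
    """Write one row per substance, with the registry ID in the first column.

    records    -- list of dicts mapping an SD tag to a string or list of strings
    extra_tags -- the remaining SD tags, in any order
    out        -- a text file object opened with newline=""
    """
    writer = csv.writer(out)
    writer.writerow([REGID_TAG] + list(extra_tags))
    for record in records:
        row = [record[REGID_TAG]]
        for tag in extra_tags:
            value = record.get(tag, "")
            if isinstance(value, (list, tuple)):
                # Multiple values share one cell, separated by newlines.
                value = "\n".join(value)
            row.append(value)
        writer.writerow(row)


if __name__ == "__main__":
    demo = [
        {REGID_TAG: "DEMO-0001",
         "PUBCHEM_SUBSTANCE_SYNONYM": ["aspirin", "acetylsalicylic acid"]},
        {REGID_TAG: "DEMO-0002", "PUBCHEM_SUBSTANCE_SYNONYM": "caffeine"},
    ]
    buffer = io.StringIO(newline="")
    write_deposition_csv(demo, ["PUBCHEM_SUBSTANCE_SYNONYM"], buffer)
    print(buffer.getvalue())
```

When writing to a real file, open it with newline="" so that the newlines embedded in a cell are not altered.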
Press the "Browse..." button to select a file to upload to the
PubChem Deposition Gateway. After selecting a file, provide comments in the
"Comments:" text box that will help you track this deposition and,
perhaps, provide useful information to the PubChem Deposition Gateway
administrators. Please remember to press the "Submit" button after
you have selected the appropriate file and provided necessary comments. When
the file transfer is complete, you will be transferred to "Pending"
displaying this submission.
Please note that the file you upload to the PubChem Deposition Gateway may be
compressed. Compressing your file may substantially reduce the time it takes to
transfer your data. We support files compressed using the
"gzip" compressor. Please note
that we do not support "zip" or "bzip2" compressed files.
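If you do not have the "gzip" command-line tool available, a gzip-compressed copy of your file can be created with a few lines of Python. This is a minimal sketch; the function name gzip_file and the file name in the usage line are hypothetical.

```python
import gzip
import shutil


def gzip_file(src_path, dest_path=None):
    """Write a gzip-compressed copy of src_path and return the new path."""
    if dest_path is None:
        dest_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dst:
        # Stream the data so large SD files are not loaded into memory at once.
        shutil.copyfileobj(src, dst)
    return dest_path


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        print(gzip_file(sys.argv[1]))
```

For example, gzip_file("my_substances.sdf") would write "my_substances.sdf.gz" next to the original file, which is left unchanged.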
2.2.2.2 Substances > New > Fill in Form Tab
Clicking on "Fill in Form" Tab produces a substance entry form
that allows one to create a single substance record for the deposition to PubChem.
When the form is filled out and submitted, a new deposition containing one record is created.
2.2.2.2.1 Substance Form Fields
The substance entry form contains the fields allowed in the SD file format. You can review these fields
in this document. Note that the only required field is the
"Substance Name (External Registry ID)".
2.2.2.2.2 Importing/Exporting an SD File
Clicking the Import button prompts you to "Browse" for the desired SD File on your computer.
The substance record (including the chemical structure, if any) contained in the selected file is then read
into the form and can be edited as needed. Note that if the file contains more than one record, only the first one is
imported into the form.
Clicking the Export button lets you either save or view the SD File that would be produced if
the form were submitted to PubChem.
2.2.2.2.3 Chemical Structure Input
In addition to importing an SD File into the form, there are two more ways to provide a chemical structure:
Sketch it - click on the structure image area or the "Edit" button to open the
PubChem Sketcher.
Use CID, SMILES, or InChI - enter the string into the textbox below the structure image area and click "Apply".
2.2.2.3 Substances > New > Revoke Substance Tab
Clicking on "Revoke Substance" Tab produces a screen that allows one to create a single revoke record for the deposition
to PubChem. When the form is filled out and submitted, a new deposition containing one record is created.
2.2.2.4 Category Assignment
When substances are deposited in PubChem, the depositor category will be
assigned to all substances. Based on the depositor's category, users can
expect to find additional category-specific information either on the PubChem
substance summary page or on the depositor's site.
The different categories and their descriptions are the following:
| Category | Description |
| Biological Properties | Depositor provides information about the biological properties of a substance or compound |
| Chemical Reactions | Depositor provides information about the reactivity, synthesis, or known reactions of a substance or compound |
| Imaging Agents | Depositor provides information about the contrast agent or imaging agent used in, for example, MRI |
| Journal Publishers | Depositor is a journal publisher and has articles published about a substance or compound |
| Metabolic Pathways | Depositor provides information on the metabolic pathways involving a substance or compound |
| Molecular Libraries Screening Center Network | Depositor is part of the NIH Molecular Libraries Screening Center Network (MLSCN) |
| NIH Substance Repository | Depositor is an NIH Molecular Libraries Small Molecule Repository servicing the MLSCN |
| Physical Properties | Depositor provides information about the experimental physical properties of a substance or compound |
| Protein 3D Structures | Depositor provides information about the experimental 3-D structure of a substance or compound |
| Substance Vendors | Depositor is a seller of a substance or compound |
| Theoretical Properties | Depositor provides information about the theoretical properties of a substance or compound |
| Toxicology | Depositor provides information about the toxicological properties of a substance or compound |
2.2.3 Substances > Pending Tab
This tab gives you a list of your unfinished or recently added depositions
to PubChem.
A "Filter by Status" pull-down menu allows you to filter your
submissions by multiple criteria. By default, all submissions, "Any
Status", are shown. The other filter criteria are:
-
Failed - Submissions that failed validation checks at any processing stage
-
Processing - Submissions being validated by the PubChem System
-
Commit Required - Submissions validated by PubChem and awaiting your Commit
-
Committed - Submissions committed by you awaiting PubChem approval
-
Rejected - Submissions rejected by a PubChem administrator
-
Approved - Submissions approved by a PubChem administrator
-
Deposited in PubChem - Submissions now publicly available in PubChem
Below the "Filter by Status" is a summary line denoting the number of
submissions meeting the filtering criteria. Furthermore, the summary line
includes the total count of records parsed, substances standardized, and
processing (error, warning, and informational) messages. Please note that
substances standardized count refers to chemical structures considered valid by
PubChem.
The pending submission table columns provide a summary of each deposition:
ID
The submission id number used in the Deposition Gateway.
Submitter
The person who initiated this submission. This is only present for accounts
which have multiple users.
Started
The date and time on which you initiated your submission.
Status
Summary of the submission status denoted by a graphical depiction.
Each dot represents a particular phase of the submission process. Green dots
denote success. Red dots denote failure. Blue dots denote an uninitiated
submission phase. Yellow dots denote a submission phase in-progress. When all
the dots are green, your deposition is in PubChem.
Data Set
The name of your data set, which, if you went the file upload route, is most
likely the name of your uploaded file, unless you have subsequently modified
the deposition.
Records
The total number of substances uploaded in this submission.
Curator
The person handling your deposition (typically assigned after you have
committed it). Unlike the other fields, this field
points to the curator's email address.
Select
A checkbox used to denote a deposition as selected for merging with other
depositions
The "Merge" procedure joins the records in two or more depositions
into one new deposition. The original depositions are not retained. The
resulting deposition appears in the "Pending" depositions list, and can
be further edited, split, added to, merged, etc.
The "Merge" is performed by selecting (checking the boxes in the list)
the depositions to be merged and then clicking the "Merge" button.
Note that the "Merge" feature is only available for depositions that
have not yet been committed. Also note that as of now, the depositions are
fully re-processed after a merge, and therefore may take some time to reach
the "Commit" stage again.
With the exception of the "Select" column, the pending submission
table column headings can be used to sort the table. For
example, clicking on "Started" will sort the table by date.
Clicking the "Started" column header a second time will
reverse the order of the sort.
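The dot colors in the Status column described above can be read as a simple lookup. The following Python sketch restates that legend; the names Dot and summarize_status are hypothetical and are not part of the Gateway.

```python
from enum import Enum


class Dot(Enum):
    """Dot colors shown in the Status column and what each one denotes."""
    GREEN = "success"
    RED = "failure"
    BLUE = "not yet started"
    YELLOW = "in progress"


def summarize_status(dots):
    """Reduce one row of status dots to a single description."""
    if not dots:
        raise ValueError("expected at least one status dot")
    if all(dot is Dot.GREEN for dot in dots):
        return "deposited in PubChem"
    if any(dot is Dot.RED for dot in dots):
        return "failed"
    if any(dot is Dot.YELLOW for dot in dots):
        return "in progress"
    # Remaining case: a mix of green and blue dots, with nothing running.
    return "waiting for the next phase"


if __name__ == "__main__":
    print(summarize_status([Dot.GREEN, Dot.YELLOW, Dot.BLUE, Dot.BLUE]))
```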
Clicking on any row will put you
in the Validation Summary View under a dynamically-created Pending
<Submission-Id> Tab for the corresponding submission.
2.2.4 Substances > Pending <Submission-Id> Tab
This tab is created when viewing any details for a particular submission
in process.
A. Submission Page Overview
In this section the basic elements common to all views for a particular submission
will be explained.
To Proceed box
On the left side below the tabs, this box gives a hint for the next
step needed to push your submission forward towards deposition.
Progress Meter
In the middle below the tabs, this meter gives information about the
current state of the deposition.
The progress meter shows a graphical timeline of your deposition. The main stages of
the process are written above the meter. Each step may have multiple actions that
must be completed before going on to the next step. The specific action that the
system is undergoing at the moment is written immediately below the meter.
The Summary information below the Progress Meter provides the Submitted timestamp
of your upload and a summary of processing statistics. Critical statistics will
be highlighted in red.
The phases of the submission process, in order, are:
-
Submit
-
Standardize
-
Approve
-
Deposit in PubChem
Submit
The Submit processing phase includes your initial file submission upload and the
system's syntax validation of your submission. If your file is incorrectly formatted,
you can consult the appropriate Views for more information, but you must fix and
resubmit the file in order to proceed.
Successful completion of this phase automatically initiates the
next processing phase.
Standardize
The Standardize processing phase performs several actions on the data. It begins
by examining the data provided by your submission.
The chemical structure information is validated and standardized for use within
PubChem. Your textual information is also examined and validated. Substances
that fail this standardization process will not be assigned a CID. It is
possible that this phase will fail for some poorly formatted files. If your
substances do not have chemical information available, then it is expected and
normal that they will "fail" Standardization. This is the only action in which
failing may not require your intervention. In addition to this action, the
Standardize phase also includes additional validation checks.
The first validation action crosschecks your submission with
other submissions you have within the PubChem Deposition Gateway. These
crosschecks detect duplicate registry IDs and duplicate structures.
The second validation action crosschecks your submission with
previous submissions already deposited in PubChem. These crosschecks detect
duplicate registry IDs and duplicate structures.
The Standardize phase is initiated automatically after successful completion of
the Submit phase.
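Because duplicate registry IDs are flagged during this phase, it can save time to look for repeats within a single SD file before uploading it. The following Python sketch does only that. It assumes the usual SD file layout, in which each record ends with a "$$$$" line and a data item header such as "> <PUBCHEM_EXT_DATASOURCE_REGID>" is followed by its value on the next line. The function names are hypothetical, and the sketch does not reproduce the Gateway's own checks against your other submissions, against PubChem, or for duplicate structures.

```python
from collections import Counter

REGID_TAG = "PUBCHEM_EXT_DATASOURCE_REGID"


def registry_ids(sd_text):
    """Return the registry ID of each record (None if a record has none)."""
    ids = []
    current = None
    expect_value = False
    for line in sd_text.splitlines():
        if line.strip() == "$$$$":
            # End of record: store what was found and reset for the next one.
            ids.append(current)
            current = None
            expect_value = False
        elif line.startswith(">") and "<" + REGID_TAG + ">" in line:
            expect_value = True
        elif expect_value:
            current = line.strip()
            expect_value = False
    return ids


def duplicate_registry_ids(sd_text):
    """Return the sorted registry IDs that occur in more than one record."""
    counts = Counter(regid for regid in registry_ids(sd_text) if regid)
    return sorted(regid for regid, n in counts.items() if n > 1)


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as handle:
            print(duplicate_registry_ids(handle.read()))
```

A trailing record that is not terminated by "$$$$" is ignored by this sketch.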
Approve
The Approve processing phase is available only to deposition account users and
must be initiated by the depositor by pushing the "Commit" button.
This phase cannot be initiated if any previous processing phase has failed.
The Approval phase is completed by a PubChem registrar. The registrar examines
the deposition for completeness and, after a brief review, either approves or
rejects the submission for addition to PubChem. A PubChem registrar may contact
you by phone or e-mail concerning your submission.
If the PubChem registrar approves the submission, the final Deposit in PubChem
phase is initiated automatically.
Deposit in PubChem
The Deposit in PubChem phase is the final processing stage, where your submission is
actually loaded into the PubChem system. A report, which you may download, is
generated from this loading process. After your data is successfully loaded
into PubChem, the data will usually become publicly accessible within two
business days.
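The ordering of the four phases, and the conditions under which "Commit" is possible, can be summarized in a short Python sketch. It is an illustration of the rules described above rather than the Gateway's implementation; the names next_phase and can_commit, and the "success"/"failure" strings, are hypothetical.

```python
PHASES = ("Submit", "Standardize", "Approve", "Deposit in PubChem")


def next_phase(results):
    """Return the next phase to run, or None if finished or blocked.

    results maps a phase name to "success" or "failure" for phases that
    have already been run.
    """
    for phase in PHASES:
        outcome = results.get(phase)
        if outcome == "failure":
            return None  # a failed phase blocks everything after it
        if outcome is None:
            return phase
    return None  # all four phases succeeded


def can_commit(account_type, results):
    """Commit starts the Approve phase and is limited to deposition accounts."""
    return account_type == "deposition" and next_phase(results) == "Approve"


if __name__ == "__main__":
    done = {"Submit": "success", "Standardize": "success"}
    print(next_phase(done), can_commit("deposition", done), can_commit("test", done))
```

Note that this models the outcome of a whole phase. Individual substances without chemical information can "fail" standardization, as described above, without the phase itself being treated as failed.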
Pending Deposition Views Available
-
Validation Summary View
-
Validation Details View
-
List All View
-
List Failed Standardization View
-
PubChem Structure Preview View
-
History View
A. Validation Summary View
Clicking the Validation Summary View provides a summary table of the unique message
categories encountered during your deposition. The summary table columns are:
-
Severity - Submission message severity
-
Category - Submission message category
-
Count - Count of submission messages of the category type
-
Phase - Submission processing phase where the message was encountered
-
Split Records - An icon that, when clicked, splits off the records with this particular issue
into a new deposition.
The column headings can be used to sort the table. For example, clicking on
"Category" will sort the submission table by category. Clicking the
"Category" column header a second time will reverse the order of the sort.
Clicking on text in a row will transfer you to the Validation Details View
filtered to show you only those records with that
particular message category corresponding to the row you click.
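The relationship between the summary table and the individual messages can be illustrated in Python: each summary row is a count of messages that share a severity, category, and phase. This is a sketch under assumptions. The message dictionaries, the category names in the demo, and the function name summarize_messages are hypothetical, and the Gateway may group messages differently.

```python
from collections import Counter


def summarize_messages(messages):
    """Collapse individual messages into one summary row per group."""
    counts = Counter(
        (m["severity"], m["category"], m["phase"]) for m in messages
    )
    return [
        {"severity": severity, "category": category, "count": n, "phase": phase}
        for (severity, category, phase), n in sorted(counts.items())
    ]


if __name__ == "__main__":
    demo = [
        {"severity": "warning", "category": "Example category A", "phase": "Standardize"},
        {"severity": "warning", "category": "Example category A", "phase": "Standardize"},
        {"severity": "error", "category": "Example category B", "phase": "Submit"},
    ]
    for row in summarize_messages(demo):
        print(row)
```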
B. Validation Details View
Clicking the Validation Details View provides a detailed list of all messages
generated by the PubChem processing. The details table columns are:
-
Severity - Submission message severity
-
Category - Submission message category
-
Message - Submission message
-
Record - SD file record number
-
Line - SD file line number that generated the message
-
Phase - Submission processing phase where the message was encountered
-
Edit Records - Icons for editing or deleting records.
-
Split Records - An icon that, when clicked, splits off this particular record into a new deposition.
The column headings can be used to sort the table. For example, clicking on
"Message" will sort the details table by message. Clicking the
"Message" column header a second time will reverse the order of the sort.
To filter the details table by category of validation message, return
to the Validation Summary View. To filter the details table for a substance,
proceed to the List All View.
Clicking on any text in a row will, typically, spawn a new browser window.
Depending on the context of the message, this window will display different
information. If the message is in the context of the records you uploaded, the
window may display the SD file record number, a depiction image of the SD file
record, and the uploaded SD file record, prefixed with the SD file line number.
Alternately, if the message context refers to collision with a different
submission, you will view the submission record from that deposition.
C. List All View
Clicking the List All View provides a summary view of the processed data
records with counts of messages associated with that record. The processed
records table allows for rapid navigation to particular records in your
submission. The standardized table columns are:
-
Record - Sequential record number of your submission
-
Depiction - Thumbnail depiction for the submitted substance
-
RegID - Your unique registry ID for the substance
-
Comments - Your comments provided at the time of submission.
-
Edit Records - Icons for editing or deleting records.
The column headings can be used to sort the table. For example, clicking on
"RegID" will sort the table by registry ID. Clicking the "RegID"
column header a second time will reverse the order of the sort.
Clicking on a row will open a new window for the PubChem Structure Preview
View showing the substance record corresponding to the row you click.
D. List Failed Standardization View
In the case that you have substances which have failed standardization, an
extra View will appear to isolate those failed substances. Clicking the List
Failed Standardization View displays the same information as the List All View
filtered for failed records only.
Please note that if your substances do not have chemical information available,
then it is expected and normal that they will "fail" Standardization.
As with the List All View, clicking on a row will open a new window for the PubChem
Structure Preview View showing the substance record corresponding to the row
you click.
E. PubChem Structure Preview View
Clicking a particular substance record in the List All View or the List Failed
Standardization View opens a summary page of complete individual
substance records. The display of your substance closely resembles how the
submitted data will appear in the PubChem Substance Summary CGI. The URL links
are available on this page so you can test the links back to your website to
verify that the URLs work as intended. Additionally, you can see how PubChem
processing affects the data you have provided.
Various export buttons are available to allow you to examine the PubChem data
records generated for your substance in ASN.1, XML, and SDF file formats.
F. History View
Clicking the History View displays a detailed chronology of the processing phases
for the submission. The history table columns are:
-
Originator - Your unique registry ID for the substance
-
Date - Timestamp when a particular stage has completed
-
Stage - Graphical depiction of the stage
-
Comments - Processing stage specific comments.
The column headings can be used to sort the table. For example, clicking on the
column "Date" will sort the submission table by date. Clicking the
"Date" column header a second time will reverse the order of the sort.
Action buttons
-
Download Data Set Icon
-
Add a New Substance Icon
-
Add a Revoke Record Icon
-
Split Listed Below Icon
-
Save Report Icon
-
Delete Icon
-
Commit Button
Commit Button
Clicking the Commit Button enables you to deposit the submission in PubChem.
Your submission will be reviewed and, if approved, will be made public in
the PubChem data system.
Download Data Set Icon
Clicking the Download Data Set Icon will allow you to download the data you submitted.
Add a New Substance Icon
Clicking on Add a New Substance Icon redirects the user to the
Substance > New > Fill in Form Tab, and the record created there gets appended to the current
deposition.
Add a Revoke Record Icon
Clicking on Add a Revoke Record Icon redirects the user to the
Substance > New > Revoke Substance Tab, and the record created there gets appended to the current
deposition.
Split Listed Below Icon
When a pending deposition is accessed via the Validation Summary View, the "Split Listed Below"
Icon is available. In this view, any record that has an error, a warning, or just an info message appears
on the list. Clicking "Split Listed Below" therefore splits off all of those records into a
new deposition, and the records remaining have no issues of any kind associated with them.
When a pending deposition is accessed via the Validation Details View, the "Split Listed Below"
Icon is available as well. In this view, however, only the selected records with particular errors appear
on the list. Clicking "Split Listed Below" therefore splits off only those records into a
new deposition.
Save Report Icon
Clicking the Save Report Icon allows you to download a report file in CSV (comma
separated value) format.
Delete Icon
Clicking the Delete Icon enables you to delete the submission you are currently
viewing from the Deposition system only. This means the submission will have no
effect on PubChem.
2.2.5 Substances > Deposited in PubChem Tab
This tab gives you an archive list of your substance submissions which were
successfully deposited in PubChem.
Clicking on a row will take you to an
Archived Submission Details View. From
there you will be able to find specific substances and go to the corresponding
entry in PubChem.
The column headings can be used to sort the table. For example, clicking on the
column "PC-Aid" will sort the table by that id. Clicking this
column header a second time will reverse the order of the sort.
ID
The submission id number used in the Deposition Gateway.
Submitter
The person who initiated this submission. This is only present for accounts
which have multiple users.
Finished
The date and time on which your submission entered PubChem.
File
The name of your uploaded datafile.
Records
The total number of substances uploaded in this submission.
2.3 Assays Tab
Clicking the "Assays" Tab puts you onto the assay welcome page.
2.3.1 Assays > Welcome Tab
The assay welcome page lists the main assay deposition activities from
which you can choose:
Please note that you may also use the main navigation bar tabs with identical
labels as the links, as they perform the same function.
Choosing an Assay Action
Perhaps the biggest change in how the Deposition Gateway handles assay
depositions is that now you choose from four distinct assay actions
whenever you want to affect your public data in PubChem. The first choice
you must make is whether you want to create a new assay or modify one of
your existing PubChem assays. If you want to modify a PubChem assay, you
have three further choices of possible modifications:
-
New assay
-
Add/Change data (Modify)
-
Alter description (Modify)
-
Replace assay (Modify)
You can leave your unfinished assay action at any time and return later
to find it under the Pending tab.
Once your action becomes part of PubChem, it will be removed from the
pending table. If you'd like to make further modifications, you will
choose it under the Modify tab
in your list of PubChem assays.
2.3.2 Assays > New Tab
-
Understanding Basic Concepts
-
New Assay Deposition Overview
-
New Tab Description
Clicking the "New" Tab is the starting point for creating a new
biological assay deposition in the PubChem Deposition Gateway as a means
of making your data public in PubChem. A brief overview of the assay
deposition process follows. If you would like to skip to the specific
explanation of the New Tab, click
here.
Understanding Basic Concepts
Prior to depositing biological assay data into PubChem, it is important to
understand the nomenclature we use so that you and we are referring to the
same elements. Please read the following paragraph to make sure we are clear
on a few terms: Substance, Assay Description, Assay Data, and
Activity Summary.
An Assay Description refers to the protocol and parameters of an assay,
which can only be defined once. Assay Data are the actual results;
as long as they follow the protocol of the Assay Description, Assay Data on new
substances can be continually added. A Substance is the stuff
being tested; typically it is what is in an assay plate well. A Substance can
be a discrete chemical entity, e.g. aspirin, or a complex mixture, e.g. a plant
extract. If you think the material in two assay plate wells is the same, we ask
that you refer to it as the same Substance with a single Activity Summary. If
you think the material in two wells differs, please refer to it as two distinct
Substances, hopefully with different chemical structures (or different
mixtures), and surely with distinct Activity Summaries. It is of course very
common to do replicates across different batches and salt forms of a Substance
when you believe the salt form to be irrelevant to activity. Your data,
however, must be reduced to a single Activity Summary per substance that
is submitted as an integer value: "inactive" = 1, "active" = 2,
"inconclusive" = 3 (if there are indeed contradictory replicates),
"unspecified" = 4, or "probe" = 5.
In this way, your results will be much more accessible and
understandable to users through the various searching and graphing functions of
the PubChem Bioassay system.
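The integer codes above can be written down as a small lookup. The following Python sketch is illustrative only: the function name summarize_replicates and the rule for an empty list of replicates are assumptions made for this example, not part of the Deposition Gateway. Only the five integer codes and the "contradictory replicates are inconclusive" rule come from the text above.

```python
from enum import IntEnum


class ActivityOutcome(IntEnum):
    """Activity Summary codes as listed in this help document."""
    INACTIVE = 1
    ACTIVE = 2
    INCONCLUSIVE = 3
    UNSPECIFIED = 4
    PROBE = 5


def summarize_replicates(calls):
    """Reduce replicate calls ("active" / "inactive") to one outcome.

    Hypothetical helper: contradictory replicates become INCONCLUSIVE,
    as described above. Treating an empty list as UNSPECIFIED is an
    assumption of this sketch.
    """
    unique = set(calls)
    if not unique:
        return ActivityOutcome.UNSPECIFIED
    if unique == {"active"}:
        return ActivityOutcome.ACTIVE
    if unique == {"inactive"}:
        return ActivityOutcome.INACTIVE
    return ActivityOutcome.INCONCLUSIVE
```

For example, replicates of one Substance that were called "active" in one batch and "inactive" in another would be reduced to the single value 3.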
New Assay Deposition Overview
There are several steps in a new assay submission that must be followed
sequentially to complete the process:
(Submit Substances) > Create Description > Add Data > Approve >
Deposit in PubChem
-
Submit Substances
This step is not mentioned on the Progress Meter but is a prerequisite for
assay depositions. Before depositing assay data please make sure that all
substances tested in your assay are deposited in PubChem and have SIDs
assigned. In your assay data file you can refer to substances with either
PubChem-assigned SIDs or your own unique RegIDs, but in order to complete
the assay deposition we must be able to find a valid PubChem SID for each
substance. Details on substance deposition are given in the description of
the Substances tab.
-
Create/Edit Description
An Assay Description defines the results you wish to report. You have the
ability to provide a detailed description of the assay being performed. There
are separate sections to provide a description, protocol, comments,
annotated cross-references, result definitions, and restrictions on allowed
data values. The description for a particular assay must be input before the
corresponding data can be uploaded for the sake of validation. Note that
once an assay description is defined, you can continually add results
tested on new substances.
-
Add Data
Once your description has been entered and verified, you upload your
appropriately formatted assay data file. The data will be parsed and
validated. All issues ranging from informational to critical errors will
be reported back to you. If there are critical errors, you must fix
them, delete the data file that contains the errors, and then upload the
corrected data file. For more information, see here.
-
Approve
If your data has no critical errors, it will be available for Preview.
Either click on the 'Preview In PubChem' button or on the 'Preview' tab.
The Preview will show you how your assay will appear to users in PubChem. This
is the last opportunity to validate your data before you commit it. If
you confirm that your assay action should be public in PubChem,
click on the "Commit" button
and a reviewer will inspect and approve it. At this stage if you find
an error, you must contact the reviewer for any possible
"emergency" assistance as you have already approved
it for deposition. Once Deposited in PubChem, additional changes can
be made by starting a Modify
action.
-
Deposit in PubChem
Once your data is approved by a PubChem curator it awaits final processing.
The assay processing cycle is designed to run once a day. The processing
includes additional validation steps and intensive post-processing. Please
note that due to loading and synchronization schedules of the PubChem
database servers, a moderate publishing delay should be anticipated. After
curator approval, the data typically will become public in PubChem within 48 hours.
New Tab Description
Click on the New tab under the Assays tab to begin uploading a bioassay
into the NCBI PubChem Deposition Gateway. If you are returning to resume
working on a new assay deposition, please look under the
Pending tab to find it.
Progress Meter
Just under the rows of tabs in the middle of the page is the progress meter.
The progress meter shows a graphical timeline of your deposition. The main stages of
the process are written above the meter. Each step may have multiple actions that
must be completed before going on to the next step. A brief explanation of each
step can be found in the
previous section.
Input Assay Description
Once you have read this section and are ready to input your description, begin
by choosing your method of input:
Prior to uploading or entering anything, please review this help document
describing the allowed file formats. Once the bioassay has been deposited,
all parts of it must pass an automated validation procedure without errors
in order to be accepted into PubChem. If you need to make changes to your assay
after deposition to PubChem, please refer to the
"Modify" tab.
An Assay Description defines the results you wish to report. You have the
ability to provide a detailed description of the assay being performed. There
are separate sections to provide a description, the protocol, comments, the
activity outcome method, target data,
annotated cross-references, result definitions, and restrictions on allowed
data values. The description for a particular assay must be input before the
corresponding data can be uploaded for the sake of validation. Once your assay
is defined in PubChem with an initial set of data, you can continually add
results tested on new substances for the same assay description by going to the
"Modify" tab.
Descriptions can be input in a number of ways: by filling in the form on the
webpage, by uploading an XML file or a series of CSV files, or by using one of
your existing PubChem assays as a template.
You must define at least one result definition (TID). To see an example
description, download and upload this
example file.
Fill in Form
Enter each of the required and optional fields necessary to describe
the assay (as described below) into the corresponding boxes. Once the boxes are
completed, click on "Create" to enter the data or "Cancel"
to start over.
Upload Assay Description from File
There are two basic ways description information can be uploaded via a file.
- By using the native
PubChem Bioassay description specification, the appropriate XML (.xml, .xml.gz)
or ASN.1 (.asn, .asnt, .asn.gz, .asnt.gz) file can be uploaded. This option requires the
depositor to have some programming experience to generate such a file,
though a file downloaded from the
PubChem FTP site could be modified and used as a guide. The real advantage
to this option is that it can be combined with BioAssay FTP uploads to give
an automated upload procedure for large numbers of assays. Here is an
example XML file.
-
By using CSV (.csv) files, individual sections
of the assay description can be uploaded one at a time and the
information will be progressively populated in the webform. Alternatively,
a single spreadsheet file, either OpenOffice (.ods) or Excel (.xls,.xlsx),
containing multiple sheets can be uploaded at once
(see examples). This allows one to take
description information saved in popular spreadsheet programs and upload
it without conversion to other formats. There is, however, a requirement
that the description information be organized in specific ways so that
our system can recognize and better validate it. The principal
way this is done is through standard header tags that must be used at
the top of each column. Details about how to set up such files can be
found in this
accompanying help document.
Use PubChem Assay as Template
Choose one of your existing PubChem assays from the pulldown
menu and click on "Load". This option is only for convenience;
the assay you are creating will have no special link to the assay you
chose for a template. You will be required to create a new RegID and Name
for your new assay as with the other two input methods.
Assay Description Fields
The description of the assay defines the assay purpose and parameters.
Fundamentally, the Assay Description defines the "columns" that are populated
by the Assay Data "rows". Each "column" is assigned a result type
identity (TID) in the Results Definition section. The Assay Data uploaded
later must be reported in the same order as the TIDs defined in the Assay
Description. Additionally, the Assay Data must be consistent with the Assay
Description TID definitions.
The description of an assay consists of nine parts: External Assay RegID, Name,
Description, Protocol, Comments, Activity Outcome Method, Target Data, XRef
Data and Results Definition.
External Assay RegID
The external assay identifier assigned by the depositor. This must be
unique amongst your other PubChem assays.
Name
A short, informative name of the assay for display purposes.
Description
A definition of the assay purpose and parameters.
Protocol
The assay protocol description must be provided here.
Comments
Any comments on the assay can be provided here.
Substance Type
By default assays are assumed to be tested on small molecules. With this
pulldown, nucleotides can also be specified.
Grant Number
For NIH screening centers only, a grant number can be specified. Note
that this string is not validated.
Hold Until Date
Optional hold-until date for bioassay data you upload into PubChem. If this
field is set to a future date, your bioassay data will be made accessible
to PubChem users only after that date. Until that date, you can access the
data only via the PubChem deposition-system account you used for the
upload. Only set a hold-until date if you wish to delay
public release of bioassay data, for example to match public access in
PubChem with the publication date of a journal article about that bioassay.
Please note that PubChem will only accept bioassays with either no
hold-until date or a hold-until date less than one year in the future from
the initial upload date.
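The hold-until rule can be checked before upload with a few lines of Python. This is a minimal sketch, not the Gateway's own validation: the function name is hypothetical, and "one year" is approximated as 365 days, which is an assumption.

```python
from datetime import date, timedelta


def hold_until_is_acceptable(hold_until, upload_date):
    """Return True if the hold-until date satisfies the rule above.

    Assumption: "less than one year" is treated as fewer than 365 days
    after the initial upload date. No hold-until date (None) is accepted.
    """
    if hold_until is None:
        return True
    return hold_until < upload_date + timedelta(days=365)
```

A depositor uploading on 2024-01-01 could therefore set a hold-until date of 2024-06-01, but not 2025-06-01.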
Project Category
- NIH Molecular Libraries Probe Production Network (MLPCN)
This assay category should be selected by depositors who participate in MLPCN when
the assay experiment was funded by an MLPCN grant.
- NIH Molecular Libraries Screening Center Network (MLSCN)
This assay category should be selected by depositors who participate in MLSCN when
the assay experiment was funded by an MLSCN grant.
- NIH Molecular Libraries Probe Production Network (MLPCN), Assay Provider
This category should be selected for bioassay depositions where assay data is provided
or developed by assay providers participating in MLPCN projects.
- NIH Molecular Libraries Screening Center Network (MLSCN), Assay Provider
This category should be selected for bioassay depositions where assay data is provided
by assay providers participating in MLSCN projects.
- Literature, Extracted
This assay category should be used for assays whose data were extracted from the
literature by a third party (not by the author or the article publisher).
- Literature, Author
This assay category should be used for assays whose data were extracted from an
article by its author.
- Literature, Publisher
This assay category should be used for assays whose data were extracted from the
literature by the publisher.
- RNAi Global Initiative
This assay category should be used for assays that are deposited under the
RNAi Global Initiative.
- Assay Vendor
This category should be used for bioassay depositions contributed by assay service providers.
Activity Outcome Method
You must classify the activity outcome method of your assay here.
Choices include:
-
Screening assay
- Single Concentration Activity Observed:
Activity outcome was
defined based on the percentage of inhibition from test at a single dose.
-
Confirmatory assay
- Concentration-Response Relationship Observed:
Activity outcome
was defined based on values such as EC50 or IC50, derived from
dose-response curves obtained by testing at multiple concentrations.
-
Summary assay
- Candidate Probes/Leads with Supporting Evidence:
An assay which summarizes information from multiple assays.
Summary assay is a special kind of assay which gives
users a summary of the project and brief overview of all related
screening and confirmatory assays. A summary assay should be created
simultaneously with the first (screening or confirmatory) assay of
the project. At the beginning, data is optional
for creating a summary assay (unlike other assay types).
As the project progresses, the summary assay needs to be updated
with additional descriptions, related assays, any probes
identified and associated test results (if need be).
A summary assay should always reference all assays
it summarizes through its XRef fields
to related assays. When linking to related assays with XRefs,
make sure to provide a brief comment of how each assay fits into
the overall picture of the project. Note that if you are linking
to another assay which is pending in the deposition system, but not
yet deposited into PubChem, you must link to it with the RegID that
you supplied (its PubChem AID will not yet be assigned).
To identify probes, depositors must minimally supply a CSV datafile with
two columns defined including their headers: PUBCHEM_SID and
PUBCHEM_ACTIVITY_OUTCOME, where the latter column will have a value of 5
set for probes. In case additional depositor-defined readouts are provided,
the regular CSV file format
should be used. Readouts previously reported in related assays do not need to
be repeated in the summary assay.
-
Other
- An assay which does not fall into the above categories
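For the summary assay case described above, the minimal probe datafile has just two columns. The Python sketch below writes such a file as text. The function name and the SID values used with it are placeholders invented for illustration; only the two header names and the outcome value 5 come from this document.

```python
import csv
import io


def probe_csv(sids):
    """Build the minimal two-column probe datafile as a CSV string.

    Every listed SID is given PUBCHEM_ACTIVITY_OUTCOME = 5 (probe).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["PUBCHEM_SID", "PUBCHEM_ACTIVITY_OUTCOME"])
    for sid in sids:
        writer.writerow([sid, 5])
    return buffer.getvalue()


# Placeholder SIDs; replace them with the SIDs of your own probes.
print(probe_csv([111, 222]))
```

If additional depositor-defined readouts are reported, the regular CSV file format applies instead of this reduced form.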
Active Concentration TID
For Confirmatory assays only,
an additional pulldown menu appears requiring you to indicate which of your
TIDs provides the active concentration summary. Such a summary might be reported
as the concentration that produces 50% of the maximum possible biological
response, such as IC50, EC50, AC50, or GI50, or as a constant
parameter such as Ki, on which the activity outcome of your
assay is based. Please choose the column number and TID name as found in
your Results Definition list.
Target Data
For any assay designed to identify chemicals interacting with a protein target,
such as enzyme inhibitors, please add the identifier of the target molecule
from one of the following NCBI databases:
Please note that for such assays you should not add an
additional XRef protein link. In the opposite case, where it is only
known that an assay identifies
modulators that affect some biological process (for example,
compounds affecting the expression of a certain protein), it is appropriate to
identify a protein with an XRef link (described in the next section) and not
with Target data.
Related Assay XRefs
The Related Assay XRefs section allows for linking an assay (e.g. "A") to other
PubChem assays (e.g. "B"), including relevant assays from other depositors. To link
assays "A" and "B", a depositor can add links (XRefs) to both of them, in which case
the XRefs become part of the assay records. Being part of the records, the links are
included in the assay ASN blobs when those assays are exported using the PubChem web interfaces or FTP.
Depositors also have the option of adding XRefs to only one of the assays (e.g. "A"); PubChem
will then automatically add a reciprocal link to all display interfaces for assay "B".
In that case, however, the XRef link will not become part of the assay record for "B"
and will not be included in export functions (e.g. FTP). Also note that PubChem does
not automatically build back-links from assay "B" when assay "A" has a hold-until date.
PubChem-built back-links will appear after the hold-until date.
Other XRefs
The Other XRefs section links to
relevant data from other NCBI databases and beyond. Examples include PubMed Ids
(PMIDs), Taxonomy Ids, OMIM Ids, and reference URLs to your source database/assay.
Attention: for XRef protein links please
see the previous section on Target data to determine whether you should
make an XRef protein link or fill out Target data information. You should
not do both.
Type
Choose from a list to classify the data type.
Primary Citation? (PubMed-Id Type Only)
If checked for a PubMed-Id, this indicates that the citation is directly relevant to the
assay, which allows your assay to be discovered in PubMed from the cited record.
Value
The actual data, such as a URL or an identifier.
Annotation
A comment to describe the XRef data.
Results Definition
Column definitions for the assay results that will be uploaded in the next
step. Use the "Add" and "Remove" buttons to create the same
number of results definitions as there are columns in the assay data. For each
definition there are the following fields:
Name
The name of a result. Keep this short, but informative.
Type
The result type typically is either a Float, Integer, Boolean or String.
Optionally, the type can be used to specify an identifier, such as one
coming from another NCBI Entrez database. For example, if PubMed Id
is chosen as the type, then all data values in this column will be checked
to ensure that they are valid PubMed identifiers. The following is a list of
accepted identifier types:
-
PubMed Id
-
MMDB Id
-
URL
-
Protein GI
-
Nucleotide GI
-
Taxonomy Id
-
OMIM Id
-
Gene Id
-
Probe Id
-
PubChem BioAssay Id
-
PubChem Substance Id
-
PubChem Compound Id
-
Protein Target GI
Use this only when an assay contains multiple targets.
-
Biosystems Target Id
Use this only when an assay contains multiple targets.
-
Target Name
Use this only when an assay contains multiple targets.
-
Target Description
Use this only when an assay contains multiple targets.
-
Target Tax-Id
Use this only when an assay contains multiple targets.
-
Gene Target Id
Use this only when an assay contains multiple targets.
-
DNA Target GI
Use this only when an assay contains multiple targets.
-
RNA Target GI
Use this only when an assay contains multiple targets.
Unit
Various units are available to choose from if appropriate.
Description
A fuller description of the result, beyond its name.
Constraint
Limits on the range of accepted values for integers and floats. The more
limits that can be introduced, the more validation can be performed on future
data added to the assay. A minimum and/or maximum can be specified or
specific acceptable values can be specified.
Set of Values
Individual allowed values for integer type only.
Minimal Value
A single number to specify the minimum possible allowed value for integer or
float type only.
Maximal Value
A single number to specify the maximum possible allowed value for integer or
float type only.
Range
A Minimal Value and a Maximal Value.
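The four constraint kinds above can be pictured as a single check applied to each data value. The following Python sketch is only an illustration of the idea. The function name and parameter names are hypothetical, and treating the minimum and maximum as inclusive bounds is an assumption, since this document does not say whether the limits themselves are allowed values.

```python
def check_constraint(value, allowed=None, minimum=None, maximum=None):
    """Return True if a numeric value satisfies the given constraint.

    allowed  - Set of Values (individual allowed integers)
    minimum  - Minimal Value
    maximum  - Maximal Value
    Passing both minimum and maximum corresponds to a Range.
    Assumption: both bounds are inclusive.
    """
    if allowed is not None and value not in allowed:
        return False
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True
```

A percent-inhibition readout, for instance, might be given a Range of 0 to 100 so that a stray value such as 1000 is caught at upload time.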
Attribute: Tested Concentration
If the box is checked, enter the micromolar concentration at which this result was
tested. This concentration attribute indicates that the readout under
this test result field is biological concentration-response data, and it
provides the value of the tested concentration in micromolar units.
Attribute: Concentration-Response (CR) Plot Labels
Use this attribute to track concentration-response series for confirmatory assays only.
If the Tested Concentration attribute for a result
definition is filled in, then the optional "CR Plot Labels" menu appears for that TID.
By default, only one CR label appears in the menu but the user can add labels by
visiting the "Concentration-Response (CR) Plot Labels" section at the bottom of the
description page.
Multiple labels are useful for assays with multiple series of data
and tested concentrations. For each CR Plot Label series there should be at least
three activity data points with tested concentration attributes set.
Collecting this information allows PubChem to annotate and track the
concentration-response series reported, and will facilitate the development
of new features such as drawing dose-response curves upon request of PubChem users.
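The recommendation of at least three data points per CR Plot Label can be checked before submission. The Python sketch below assumes a simple, hypothetical representation: a mapping from each TID name that has a tested concentration set to the CR Plot Label it was assigned. It is not the Gateway's own validation.

```python
from collections import Counter


def labels_with_too_few_points(tid_labels, minimum=3):
    """Return the CR Plot Labels that have fewer than `minimum` TIDs.

    tid_labels maps a TID name (with Tested Concentration set) to its
    CR Plot Label. This data layout is an assumption for illustration.
    """
    counts = Counter(tid_labels.values())
    return sorted(label for label, n in counts.items() if n < minimum)
```

An empty result means every labeled series has enough tested concentrations.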
Derived by equation?
PubChem attempts to record and distinguish experimental dose-response data points from
data calculated theoretically, for example with curve-fitting algorithms. For each
concentration-response series, if this box is not checked, the series is
recorded with the status 'experimental data'.
If checked, this option allows you to define an alternative curve fit as desired
(e.g., dropping outliers or using other fitting functions) by supplying just enough
data points (about 10 are recommended) to allow a Hill equation to draw a line
that presumably fits another experimental series that you have defined.
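As an illustration of supplying about 10 calculated points, the Python sketch below evaluates a four-parameter Hill equation at log-spaced concentrations. The function names, the default parameter values, and the concentration range are assumptions chosen for this example. PubChem's own curve drawing may use a different parameterization.

```python
def hill_response(conc, ac50, hill_slope=1.0, bottom=0.0, top=100.0):
    """Four-parameter Hill equation; conc and ac50 share the same unit
    (micromolar in this document) and conc must be greater than zero."""
    return bottom + (top - bottom) / (1.0 + (ac50 / conc) ** hill_slope)


def curve_points(ac50, n_points=10, low=0.01, high=100.0, **params):
    """Return (concentration, response) pairs at log-spaced concentrations."""
    ratio = (high / low) ** (1.0 / (n_points - 1))
    concentrations = [low * ratio ** i for i in range(n_points)]
    return [(c, hill_response(c, ac50, **params)) for c in concentrations]


# Ten calculated points for a hypothetical compound with AC50 = 1 micromolar.
for conc, response in curve_points(1.0):
    print(f"{conc:.4g}\t{response:.2f}")
```

Each calculated point would be reported under a TID with the matching Tested Concentration attribute, with the "Derived by equation?" box checked for that series.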
Is Panel Assay?
Panel Assay Introduction
PubChem now expands the bioassay data model to support the presentation and annotation
of profiling screening results. The following video gives a quick overview of
how a Panel Assay looks in PubChem.
A single panel-type PubChem bioassay record may contain
readouts and the respective bioactivity outcome annotations for screening tests over
multiple individual targets, cell lines or species. Each of such targets, cell lines
or species is regarded as a "panel component". Description of the
experiments, including a short name, general goal, specific experimental
protocol, and information of assay target, can be provided for each individual
panel component. A panel component should be associated with one or multiple
test result fields (TIDs). The test results for each panel component can be
designated as "bioactivity outcome", "active concentration"
if need be, or otherwise are treated as regular readouts.
Profiling test results are complex. This expansion of the PubChem bioassay data model
allows one to describe a compound profiling screening test, and enables PubChem to
record and annotate multiple related bioactivity outcomes under a single AID.
Such grouping facilitates straightforward comparison and evaluation of compound
bioactivities using the profiling results through the PubChem data analysis tools.
To see a panel assay example, check out the
kinase profiling assay.
Creating a Panel Assay
The following video gives a quick overview of how to create a Panel Assay in the Deposition
Gateway:
The following video shows the appropriate format of your panel CSV file:
Checking the Is Panel Assay? box designates your assay as a multi-target panel assay and enables
an additional input mechanism to define your assay. Such assays are very complex
in nature and we have tried to make the interface as user-friendly as possible.
Please remember, however, that extra attention should be paid to panel assay
definitions and data to ensure their accuracy. Also remember, if the assay seems
too complicated to deposit, it may also be too complicated for PubChem users to
understand!
Name
Short name for the panel, such as "Kinase Profiling".
Description
Short description of the panel.
Load Panel Component Info from CSV file
A comma-delimited CSV file is used to define panel components. Note that this CSV file
is additional to and independent of the CSV file used later for your assay data.
The Panel Component Info CSV file consists of one required and several optional columns as follows:
-
PANEL_ID (Required)
This is your panel component id and is important because it allows you to associate
one or more result descriptions (TIDs) with it. It must be an integer starting from
one and ascending by ones.
-
PANEL_NAME (Optional)
Short name of panel component.
-
PANEL_DESCRIPTION (Optional)
Short description of the specifics of the panel component, such as cell line
or target information.
-
PANEL_PROTOCOL (Optional)
Specific procedure used to generate results for the panel.
-
PANEL_COMMENT (Optional)
Additional information.
- The following three labels are used to specify a
target, which
is often provided for profiling assays across protein families.
-
PANEL_TARGET_NAME (Optional)
Not necessary to provide - this will be filled in automatically
unless you provide a value.
-
PANEL_TARGET_ID (Optional)
This is mandatory if any of the target fields are present.
-
PANEL_TARGET_TYPE (Optional)
This is mandatory if any of the target fields are present. It
is an integer: Protein(1), DNA(2), RNA(3), Gene(4), BioSystems(5).
-
PANEL_TAXONOMY (Optional)
NCBI Taxonomy-id (integer).
-
PANEL_GENE (Optional)
NCBI Gene-id (integer).
-
PANEL_ACT_OUTCOME_METHOD (Optional)
Assay outcome qualifier (integer). Choices include screening (1),
confirmatory(2), summary(3) and other(0). See
here for more explanation.
-
PANEL_TID_NAMES_REGULAR (Optional)
Names of existing result descriptions (TIDs) in your assay separated by
a "|". This is a convenient way of mapping panel
components to one or more TIDs. In addition, all TIDs listed in this column
will get marked as regular type TIDs. This speeds up the input process so that
you do not need to choose from the pulldown menu of each TID. Please
make sure to first define your TIDs and then upload your panel info file.
-
PANEL_TID_NAMES_OUTCOME (Optional)
Names of existing result descriptions (TIDs) in your assay separated by
a "|". This is a convenient way of mapping panel
components to one or more TIDs. In addition, all TIDs listed in this column
will get marked as outcome type TIDs. This speeds up the input process so that
you do not need to choose from the pulldown menu of each TID. Please
make sure to first define your TIDs and then upload your panel info file.
-
PANEL_TID_NAMES_SCORE (Optional)
Names of existing result descriptions (TIDs) in your assay separated by
a "|". This is a convenient way of mapping panel
components to one or more TIDs. In addition, all TIDs listed in this column
will get marked as score type TIDs. This speeds up the input process so that
you do not need to choose from the pulldown menu of each TID. Please
make sure to first define your TIDs and then upload your panel info file.
-
PANEL_TID_NAMES_AC (Optional)
Names of existing result descriptions (TIDs) in your assay separated by
a "|". This is a convenient way of mapping panel
components to one or more TIDs. In addition, all TIDs listed in this column
will get marked as active concentration type TIDs. This speeds up the input process so that
you do not need to choose from the pulldown menu of each TID. Please
make sure to first define your TIDs and then upload your panel info file.
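As a rough illustration of the labels above, the following Python sketch assembles the optional labels for one panel component. The helper name panel_record, the sample TID names, and the sample target ID are hypothetical, and the overall layout of the panel csv file is not covered here.

```python
# Sketch only: the helper name and all sample values are hypothetical.
TARGET_TYPES = {"Protein": 1, "DNA": 2, "RNA": 3, "Gene": 4, "BioSystems": 5}


def panel_record(target_name=None, target_id=None, target_type=None,
                 regular=(), outcome=(), score=(), ac=()):
    """Return the panel labels for one panel component as a dict of strings."""
    record = {}
    if any(v is not None for v in (target_name, target_id, target_type)):
        # Once any target field is used, the ID and the type are mandatory.
        if target_id is None or target_type is None:
            raise ValueError("PANEL_TARGET_ID and PANEL_TARGET_TYPE are required")
        record["PANEL_TARGET_ID"] = str(target_id)
        record["PANEL_TARGET_TYPE"] = str(TARGET_TYPES[target_type])
        if target_name is not None:
            record["PANEL_TARGET_NAME"] = target_name
    tid_columns = {
        "PANEL_TID_NAMES_REGULAR": regular,
        "PANEL_TID_NAMES_OUTCOME": outcome,
        "PANEL_TID_NAMES_SCORE": score,
        "PANEL_TID_NAMES_AC": ac,
    }
    for label, names in tid_columns.items():
        if names:
            # Several TID names in one cell are separated by "|".
            record[label] = "|".join(names)
    return record
```

For example, panel_record(target_id=12345, target_type="Protein", ac=["IC50 A", "IC50 B"]) yields a PANEL_TID_NAMES_AC value of "IC50 A|IC50 B". The TID names must already be defined in your assay before the panel file is uploaded.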
Alternative Ways to Upload a Panel Assay
All of the preceding discussion of panel assays assumes that a new panel assay
will be loaded into the Deposition Gateway in three steps: 1) fill out the
form in the web interface for the description, 2) upload the optional panel
csv file to define panel components, and 3) upload the data csv file.
Because of the complexity of panel assays, it may be more efficient, however,
to create an XML file of the assay based on our data specification to minimize the use of
the web form. This will allow you to bypass some or all of the steps mentioned
above. Here are the alternative options:
-
XML File Upload via the Web - Description Only
In this route the entire assay description including panel components is
defined in an XML file and uploaded using the file upload option for new
assays. This will prepopulate all fields in the description form and allow
you to accept it and move on to the step of uploading your regular data
csv file. This means you do not provide a special panel csv file.
-
XML File Upload via FTP
In this route the entire assay description and all data is
defined in one XML file and uploaded using a private FTP
account. The assay will show up in the Deposition Gateway after all parsing and
validation of data has completed. No other files are needed to define
your panel assay.
Create your Description
Once you have finished entering your description and verified that it is
accurate, click on the "Create" button. If the system finds no
errors with it, it will become a pending assay in the Deposition Gateway and
you will be routed to a dynamically-created tab entitled "New assay
assay-id" where the assay-id is an identifier to keep
track of the assay while it is in the Deposition Gateway. To continue
reading about the next step in the new assay deposition process, click on
Add Data. Otherwise, we will now discuss
the next Tab under the Assay Tab, the Modify Tab, which allows for various
operations to existing PubChem assays.
2.3.3 Assays > Modify Tab
Clicking on the "Modify" tab routes you to a Modify "Welcome"
tab starting on a third row of tabs.
2.3.3.1 Assays > Modify >
Welcome Tab
The modify assay welcome page lists the four types of modifications you
can make on one of your existing PubChem assays:
-
Add/Change Data
- without description change
-
Alter Description
- with changes such as fixing a typographical error or adding XRef
data like a URL (no data can be added)
-
Replace Assay
- to make a significant change such as adding/removing/modifying a
result column (all data must be resubmitted to replace existing data)
-
Revoke Entire Assay
- to suppress from searches of PubChem
2.3.3.2 Assays > Modify >
Add/Change Data Tab
This tab is the starting point for the most common type of modification to
an existing PubChem assay: adding or changing data results. With this
mechanism you can add new data results, replace selected data results, or
remove data results that are no longer valid.
Note that any duplicated substance (SID/RegID) test results for a given assay
(whether in the same data file or not) will be archived in PubChem. Only the most
recent one will be available for searching.
If you are returning to resume working on an Add/Change Data action, please
look under the Pending tab to find it.
To revoke test results please submit a csv file with the
following format.
If your intention is to revoke the actual substance, you must first revoke it
from any assays where it is a test result, then revoke it from the
Substances tab.
Progress Meter
Just under the rows of tabs in the middle of the page is the progress meter.
The progress meter shows a graphical timeline of your deposition. The main
steps of the process are written above the meter. These steps to Add/Change
Data to a PubChem assay must be followed sequentially to complete the process:
(Submit Substances) >
Add Data >
Approve >
Deposit in PubChem
Each step may have multiple actions that
must be completed before going on to the next step. Click on a step for a
brief explanation.
Submit CSV Data File
Pick the PubChem assay you wish to modify from the pull-down menu, click on
"Browse" to choose your CSV data file, and click "Submit".
Note that if you are already modifying this assay in a pending action,
you will not be able to proceed. When it loads, an
Add/Change Data <Assay-Id> Tab
will be created for you for the next step of validating your data.
Also note that for this action you can only view the description; you cannot
modify it.
2.3.3.3 Assays > Modify >
Alter Description Tab
This tab is the starting point for making small changes to your description.
Typical examples of changes you can make here are fixing typographical errors
in the Description/Protocol/Comments sections
and adding XRef data, like a URL to your website. No data can be added in this
action. You cannot change the meaning or number of Results columns
because such changes invalidate the assay's existing data. If you must do that,
please see the
Replace Assay tab.
If you are returning to resume working on an Alter Description action,
please look under the Pending tab
to find it.
For this action the revision of your PubChem assay will be incremented, but
the version will remain unchanged. While in the Deposition Gateway, the
pending assay will show a blank revision since it is being modified.
Progress Meter
Just under the rows of tabs in the middle of the page is the progress meter.
The progress meter shows a graphical timeline of your deposition. The main
steps of the process are written above the meter. These steps to Alter
Description of a PubChem assay must be followed sequentially to complete
the process:
Edit Description >
Approve >
Deposit in PubChem
Each step may have multiple actions that
must be completed before going on to the next step. Click on a step for a
brief explanation.
Choose existing Description to modify
Pick the PubChem assay you wish to modify from the pull-down menu and click
"Load". Note that if you are already modifying this assay in a
pending action, you will not be able to proceed. When it loads, an
Alter Description <Assay-Id> Tab
will be created for you for the next step of modifying your description.
Remember that for this action you cannot submit data.
2.3.3.4 Assays > Modify > Replace Assay Tab
This tab is the starting point for making significant changes to your description.
Typical examples of changes you make here are adding or removing a Results
column or changing the data type of a Results column. For this action you
must resubmit all of your data results along with your description change.
If an existing data result is not resubmitted with this action, it will no
longer be available in PubChem when the change is made public.
Special note: This is a powerful action that should be used as a last
resort. It is your responsibility to maintain consistency with what this
assay currently means in PubChem. Keep in mind that PubChem users expect
the existing data for this assay to grow in number, but not to change
in definition. Even here you cannot modify the
External Assay RegID. If this is what
you want to do, please consider creating a new assay.
You cannot use this action to make only small description changes,
like adding a URL XRef. To do that please see the
Alter Description tab.
However, if you have a modification for a result's name or description which
would invalidate existing PubChem test results, then this is the correct action.
If you are returning to resume working on a Replace Assay action, please
look under the Pending tab to find it.
For this action the version of your PubChem assay will be incremented and
the revision will be reset to zero. While in the Deposition Gateway, the
pending assay will show a blank for both version and revision since they
are being modified.
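The version and revision rules described for the Alter Description and Replace Assay actions can be sketched in a few lines of Python. The names AssayVersion and after_action are hypothetical, and the sketch only covers the two actions whose numbering behavior is stated in this document.

```python
from typing import NamedTuple


class AssayVersion(NamedTuple):
    version: int
    revision: int


def after_action(current, action):
    """Return the (version, revision) pair once the action reaches PubChem."""
    if action == "Alter description":
        # Small change: the revision is incremented, the version is unchanged.
        return AssayVersion(current.version, current.revision + 1)
    if action == "Replace assay":
        # Significant change: the version is incremented, the revision resets to zero.
        return AssayVersion(current.version + 1, 0)
    # The effect of other actions on the numbering is not described here.
    raise ValueError(f"numbering rule not covered for action: {action!r}")
```

For instance, an assay at version 1, revision 2 would become version 1, revision 3 after an Alter Description action, and version 2, revision 0 after a Replace Assay action.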
Progress Meter
Just under the rows of tabs in the middle of the page is the progress meter.
The progress meter shows a graphical timeline of your deposition. The main
steps of the process are written above the meter. These steps to Replace
a PubChem assay must be followed sequentially to complete the process:
(Submit Substances) >
Edit Description >
Add Data >
Approve >
Deposit in PubChem
Each step may have multiple actions that
must be completed before going on to the next step. Click on a step for a
brief explanation.
Choose existing Description to modify
Pick the PubChem assay you wish to modify from the pull-down menu and click
"Load". Note that if you are already modifying this assay in a
pending action, you will not be able to proceed. When it loads, a
Replace Assay <Assay-Id> Tab
will be created for you for the next step of modifying your description.
Remember that this action will replace all of this assay's existing
PubChem data.
2.3.3.5 Assays >
Modify > Revoke Entire Assay Tab
This tab allows you to suppress one of your PubChem Assays from Entrez searches.
Once your assay is revoked, it will only be publicly available through the
PubChem BioAssay Summary service by providing the AID. This operation is
considered a major update to the bioassay record, so you must provide
a comment stating the reason for the revocation; this comment will be included
in the bioassay record. As with other deposition operations, it will be
reviewed by a curator.
2.3.4 Assays > Pending Tab
This tab gives you a list of your unfinished or recently added depositions
to PubChem. To resume working on a given assay (or simply to view its
detailed information), click on one of the fields in its row. If you
have not yet started on your desired assay operation, please choose from the
New or
Modify tabs as appropriate.
Please note that once your deposition has been successfully uploaded to PubChem,
you will view it in PubChem and not in the Deposition Gateway. The successful
deposition will remain listed here for a short time and then you can see a
history of the operation under the
Deposited in PubChem tab.
If you'd like to make further modifications, you will
choose it under the Modify tab
in your list of PubChem assays.
Also note that unfinished assay actions will be deleted from the Deposition
Gateway after one month of inactivity. This will have no effect on PubChem
and only means that you will need to re-enter your description and/or data
as appropriate.
The column headings can be used to sort the table. For example, clicking on the
column "Action" will sort the table by the type of action. Clicking this
column header a second time will reverse the order of the sort.
Assay
The temporary id assigned to track the assay while in the deposition system.
PC-AID (Ver.Rev)
The permanent assay id number assigned once the assay is accepted into the
PubChem system. Note that this identifier will only be non-zero for one of
the Modify operations to one of your
existing PubChem assays.
"(Ver.Rev)" refers to the Version and Revision of your PubChem
assay. These will also be blank if the PC-AID has not yet been assigned.
In addition, if the modify operation you are performing will
change either the Revision
or the Version and Revision,
then the respective placeholders will show a "-" to indicate
they are being updated.
Action
One of four types of actions you can perform on PubChem assays.
-
New assay
-
Add/Change data (Modify)
-
Alter description (Modify)
-
Replace assay (Modify)
Status
The current step of the assay in the deposition process.
RegID
The substance registry id as supplied by the depositor.
Name
A descriptive name of the assay.
Date
The date and time on which your assay operation began.
Curator
The person handling your deposition (typically assigned after you have
committed it). Unlike the other fields, this field
points to the curator's email address.
2.3.5 Assays > <Assay-Action> <Assay-Id> Tab
-
Assays > New Assay <Assay-Id> Tab
-
Assays > Add/Change data <Assay-Id> Tab
-
Assays > Alter description <Assay-Id> Tab
-
Assays > Replace assay <Assay-Id> Tab
This tab is created when working on and viewing any details for a particular assay
in process. Its name tells you the type of
action you are performing and
the temporary id assigned to track the assay while in the deposition system.
Please note that this is different from the permanent
PubChem-AID.
Page Layout
To Proceed box
The To Proceed box on the left side below the tabs gives you a hint of what
you must do next in order to advance your deposition to completion.
Views
The Views box on the lower left side lets you pick appropriate informational
views of your deposition relevant to the current stage of the process.
We will now go through a detailed explanation of the various views available.
Some views are unique to one step of the process, some are unique to one
of the four assay actions, and others are common (for example "View
Description"). Please find the View you have questions about and read
more about it.
Add Data View
This is the View where you upload your assay data file in
CSV format.
Click on "Browse" to choose your CSV data file, and click
"Submit". If you are trying to find this view and already have
data uploaded in your deposition, first click on
Delete Data,
then you will see this View. Also note that this View is not appropriate
for the Alter description action
as it does not allow data uploads.
The data will be parsed and validated against the description information to
find all relevant issues with the data. If there are any errors, you must
resolve them before the data can be committed into PubChem.
CSV formatted assay data
The PubChem BioAssay Deposition System allows the use of CSV (Comma Separated
Value) formatted data files for assay data deposition. The CSV column ordering
for the first seven columns is fixed and must be exactly as documented
below. Beyond that, there must be a column for each result (TID) defined
in the description.
The best way to get familiar with this format is to click on the
"CSV Template" link (in the Add Data View only) to
download a CSV template file using the Assay Description that you have already
entered. This is a guide so that you can cut and paste your data into
this CSV file while maintaining the correct number of columns.
For fields without data there will be nothing but consecutive commas. We
also have an example CSV file with
data. Your CSV file should have the column headers shown below as well as the names
of the result definitions that you have defined; any deviations will cause errors.
Note that any duplicated substance (SID/RegID) test results for a given assay
(whether in the same data file or not) will be archived in PubChem. Only the most
recent one will be available for searching.
The following columns are accepted in your CSV file along with column headers
using the names of your result definitions. If a particular data cell does not have
anything to report for a given column or it is not applicable, simply leave it blank.
Column 1: PUBCHEM_SID
If you have previously deposited your Substance description into PubChem, you
may use your Substance identifier (SID) assigned by PubChem. This must be an
unsigned integer value and, in nearly all cases, your organization must have
deposited the Substance associated with this SID. Optionally, you may choose to
use "Column2" instead, to provide your own Substance identifier, and, if you
do, you must set this column value to be "0". If you have not previously
deposited your Substance descriptions into PubChem, you must, at a minimum,
have these in the PubChem deposition system prior to uploading Assay Data. If
you have Substance descriptions in the PubChem deposition system, you may have
Assay Data refer to these by setting the value in this column to "0" and use
"Column2" to provide your identifier to this Substance.
Column 2: PUBCHEM_EXT_DATASOURCE_REGID
You may use your own identifier for Substance descriptions previously loaded
into either PubChem or the PubChem deposition system. If you provide a value in
this column, you must set the value in "Column1" to "0" or leave it blank.
If you choose to identify the Substance for which you are providing data using
"Column1", please leave this column blank.
Column 3: PUBCHEM_ACTIVITY_OUTCOME
The Activity Summary for every Substance has two parts, the outcome and the
score. The outcome for each Substance is reported as an integer value in this
column and must be one of five different values:
1 - Substance is considered inactive.
2 - Substance is considered active.
3 - Substance activity outcome is inconclusive.
4 - Substance activity outcome is unspecified.
5 - Substance identified as a probe.
Column 4: PUBCHEM_ACTIVITY_SCORE
The Activity Summary for every Substance has two parts, the outcome and the
score. The score for this Substance is reported in this column and must be an
integer value, where larger values are more active and smaller values are less
active. Please make sure your scores are on a linear scale because that's how
they will be interpreted. We encourage depositors to consider using the range
0-100, although values larger and smaller are allowed. The score values are
used to allow PubChem users to partition, sort, and profile Assay Data results
within and between biological assays.
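The rules for the first four columns can be checked before upload with a short Python sketch such as the one below. The function name row_problems and the message texts are hypothetical, and the sketch treats blank outcome and score cells as acceptable, which is an assumption rather than a documented rule.

```python
from enum import IntEnum


class ActivityOutcome(IntEnum):
    INACTIVE = 1
    ACTIVE = 2
    INCONCLUSIVE = 3
    UNSPECIFIED = 4
    PROBE = 5


def row_problems(sid, regid, outcome, score):
    """Check the first four cells of one data row, each given as a string."""
    problems = []
    # A SID of "0" or blank means the substance is identified by the RegID.
    has_sid = sid.strip() not in ("", "0")
    has_regid = regid.strip() != ""
    if has_sid == has_regid:
        problems.append("identify the substance by SID or by RegID, not both or neither")
    elif has_sid and not sid.strip().isdigit():
        problems.append("PUBCHEM_SID must be an unsigned integer")
    if outcome.strip():
        try:
            ActivityOutcome(int(outcome))
        except ValueError:
            problems.append("PUBCHEM_ACTIVITY_OUTCOME must be an integer from 1 to 5")
    if score.strip():
        try:
            int(score)
        except ValueError:
            problems.append("PUBCHEM_ACTIVITY_SCORE must be an integer")
    return problems
```

A row such as ("12345", "", "2", "80") produces no problems, while ("12345", "ABC-001", "9", "2.5") produces three. The Deposition Gateway performs its own validation; a check like this only helps catch simple mistakes earlier.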
Column 5: PUBCHEM_ACTIVITY_URL
A URL may optionally be provided in this column for the Assay Data reported
for this Substance. This URL will be provided within PubChem displays to allow a
PubChem user to link to your website, where you may choose to provide
additional information or interfaces to your Assay Data, for example,
dose-response curves, replicate data, etc.
Column 6: PUBCHEM_ASSAYDATA_COMMENT
Your textual annotation and comments may optionally be provided for Assay Data
reported for this Substance in this column.
Column 7: PUBCHEM_ASSAYDATA_REVOKE
When you submit the data you must leave this blank or put a value '0' in this
column. You may optionally suppress Assay Data for this Substance by putting a
value of "1" in this column. In this case, leave all other columns blank except
for Column 1: PUBCHEM_SID. Suppressing Assay Data does not delete data
from PubChem, rather it eliminates all references and links to this
information; however, all pre-existing links to this information will still
function and a disclaimer will be displayed specifying this data is revoked.
You may un-revoke Assay Data for a Substance by depositing either the same or
new data for this Substance. Do not revoke and submit the same substance in the
same file.
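A revoke file following the Column 7 rule might be produced as in the Python sketch below. The function name revoke_csv is hypothetical, and the sketch assumes that a revoke file carries the same header as a regular data file, including the result (TID) columns; please check the documented revoke format before relying on it.

```python
import csv
import io

FIXED_COLUMNS = [
    "PUBCHEM_SID",
    "PUBCHEM_EXT_DATASOURCE_REGID",
    "PUBCHEM_ACTIVITY_OUTCOME",
    "PUBCHEM_ACTIVITY_SCORE",
    "PUBCHEM_ACTIVITY_URL",
    "PUBCHEM_ASSAYDATA_COMMENT",
    "PUBCHEM_ASSAYDATA_REVOKE",
]


def revoke_csv(sids, tid_names):
    """Return CSV text that revokes the test results for the given SIDs."""
    header = FIXED_COLUMNS + list(tid_names)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for sid in sids:
        row = [""] * len(header)   # every other column stays blank
        row[0] = str(sid)          # Column 1: PUBCHEM_SID
        row[6] = "1"               # Column 7: PUBCHEM_ASSAYDATA_REVOKE
        writer.writerow(row)
    return buffer.getvalue()
```

Keep revoke rows in a file of their own, so that no substance is both revoked and submitted in the same file.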
Columns 8 and higher (one column per TID): PUBCHEM_ASSAYDATA_VALUE
All remaining columns correspond one-to-one, in order, to
the result definitions (TIDs)
defined in the associated Assay Description. All defined "columns" must be
present; however, values are optional in individual fields. Consult the
auto-generated
CSV template file with your description information to see the layout.
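To check a data file against this layout before uploading it, a Python sketch along the following lines may help. The function name layout_problems and the message texts are hypothetical; the auto-generated CSV template remains the authoritative reference for your assay.

```python
import csv
import io

FIXED_COLUMNS = [
    "PUBCHEM_SID",
    "PUBCHEM_EXT_DATASOURCE_REGID",
    "PUBCHEM_ACTIVITY_OUTCOME",
    "PUBCHEM_ACTIVITY_SCORE",
    "PUBCHEM_ACTIVITY_URL",
    "PUBCHEM_ASSAYDATA_COMMENT",
    "PUBCHEM_ASSAYDATA_REVOKE",
]


def layout_problems(csv_text, tid_names):
    """Compare the header and the cell count of every row with the expected layout."""
    expected = FIXED_COLUMNS + list(tid_names)  # seven fixed columns, then one per TID
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    problems = []
    if header != expected:
        problems.append("header does not match the fixed columns followed by the TID names")
    # Data rows are numbered from 1, so each is one less than its line number.
    for data_row, cells in enumerate(reader, start=1):
        if len(cells) != len(expected):
            problems.append(
                f"data row {data_row}: {len(cells)} cells, expected {len(expected)}"
            )
    return problems
```

An empty list means the header and the column counts look right; it says nothing about whether the values themselves will pass validation.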
Validation Summary View
Display issue categories related to the parsing and validation of your assay
data. This view shows the general types of issues found in processing
the data including errors, warnings and info. If errors are found,
they must be resolved before the data will be accepted into PubChem.
Warnings and info issues do not have to be resolved, but often indicate
something that should be adjusted.
N
Issue count.
Severity
Issue type: Error, Warning or Info. All
Error issues must be resolved.
Category
A short description of the issue. For greater detail, go to the
Validation Details View.
Count
The number of instances of this issue found.
Depositors are able to modify/change their uploaded CSV file by uploading a
new one.
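The relationship between individual issues and the summary can be pictured with the Python sketch below, which groups issue instances by severity and category and reports whether any errors remain. The function names and the sample categories are hypothetical and are not taken from the Deposition Gateway itself.

```python
from collections import Counter


def summarize(issues):
    """Count issue instances per (severity, category) pair."""
    return Counter(issues)


def has_blocking_errors(issues):
    """Only Error issues must be resolved; Warning and Info issues do not block."""
    return any(severity == "Error" for severity, _category in issues)


# Hypothetical issue instances, one (severity, category) pair per instance.
issues = [
    ("Error", "Unknown substance"),
    ("Error", "Unknown substance"),
    ("Warning", "Score outside 0-100"),
]
summary = summarize(issues)
```

Here summary[("Error", "Unknown substance")] is 2, corresponding to the Count column, and has_blocking_errors(issues) is True until both instances are fixed in a newly uploaded file.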
Validation Details View
Display all instances of issues related to the parsing and validation of
your assay data.
This view lists a line for each issue found in processing the data including
errors, warnings and info. This list can be very large in some cases, so
it is best to begin with the
Validation Summary View.
If errors are found,
they must be resolved before the data will be accepted into PubChem.
Warnings and info issues do not have to be resolved, but often indicate
something that should be adjusted.
N
Issue count.
Severity
Issue type: Error, Warning or Info. All
Error issues must be resolved.
Category
A short description of the issue.
Message
The detailed message for this instance of the issue.
Data Row
The record number from the input file where the issue was found, if applicable,
otherwise set to 0. Note that this number will be one less than the line
number of your CSV file because of the header line.
Column
The column in the input file where this issue was found if applicable.
View Description View
Review the description of a pending assay in read-only format. To see
the description in machine-readable format, click on the
Export Files Pulldown and choose either
the XML or ASN format (if you have data loaded, those options will also
include the data in the file). If you want to edit the description, you
must first delete any uploaded data,
then go to the Edit Description View.
If you want to remove the assay from the Deposition Gateway (no effect on
PubChem), again make sure any uploaded data is deleted, then click the
Delete Session Icon.
For more information on specific assay description fields, see
here.
Edit Description View
Make modifications to the description of a pending assay.
This view is only available when any uploaded data has been
deleted from your pending deposition.
This view is never available for the
Add/Change data action.
Restrictions on what you can edit apply in particular for the
Alter Description action, but in
no case can you edit the External Assay RegID.
For more information on specific assay description fields, see
here.
History View
This view displays a detailed chronology of the processing steps for the
pending assay action. This history is only for the current pending
action and does not include previous actions you have committed to PubChem
for the same PubChem AID. To see an overall history of the actions you
have committed to PubChem for all of your assays, click on the
Deposited in PubChem tab. The
columns are:
N
Count of processing step.
Originator
Who initiated the step, typically you, a curator or an automatic
process ("Service Daemon").
Date
The date and time the step was taken.
Status
The name of the assay deposition step.
Comments
Additional description of the step taken.
View Data View
Display uploaded assay data in read-only format. Of course this view
is only appropriate if you have uploaded an assay data file. If that
file has passed the first phase of the
Add Data step, which is
"Data Parsing", then you will see your data file parsed into
columns with the corresponding headers at the top and the data displayed on
multiple pages as necessary.
The first column,
"N", numbers the records and the next seven columns are the
predefined columns specified earlier
for the CSV format. The second
column, SID,
is the PubChem Substance identifier. Each SID number links back to the
appropriate PubChem substance summary page. The remaining columns,
TID1..N,
correspond to the Results
Definitions as shown in the
View Description View. If you have
failed "Data Validation", the second phase of the
Add Data
step, it is useful to look at this parsing and make sure it is what you
intended. Perhaps you forgot a comma somewhere in your CSV file and your
data is lined up with the wrong column headers.
Note that if your file
could not get past the first phase of "Data Parsing", then an
attempt will be made to show the text of your file as is. For
convenience we will add line numbers, "N", and then show the
text under the header "Unparsed Text".
If you would like to change something in your data file, first click on
Delete Data, and then resubmit your
modified file. If you would like to download your original CSV file or
the machine-readable (XML/ASN) file generated from it, click on the
Export Files Pulldown.
Assay Action buttons
-
Export Files Pulldown
-
Delete Data Icon
-
Delete Session Icon
-
Commit Button
Export Files Pulldown
Clicking the Export Files Pulldown allows you to download various files.
If you have submitted a CSV data file, you can download it or the parsed
XML or ASN file that we create from it. You can also download the description
only as an XML or ASN file.
Delete Data Icon
Clicking the Delete Data Icon enables you to delete the attached data file so
that you can resubmit it or go backwards to edit your description or delete
the action from the Deposition System. The Delete Data Icon is required for
going backwards in the deposition process. Also, remember that deleting here
refers to the Deposition system only; this action will have no effect on
PubChem.
Delete Session Icon
Clicking the Delete Session Icon enables you to delete the current assay action
from the Deposition System's pending list. This action will have no effect on
assays in PubChem.
Commit Button
Clicking the Commit Button enables you to deposit the submission in PubChem.
Your submission will be reviewed and, if approved, will be made public in
the PubChem data system.
2.3.6 Assays > Deposited in PubChem Tab
This tab gives you a history of all assay actions taken by the Deposition Gateway
which successfully affected PubChem. Each action will be listed on one
line. This means that each PubChem assay will have one line for when
the assay was first created. It could have additional lines for when it
was modified, either by adding more data or by modifying its description.
Clicking on a row will take you to the corresponding entry in PubChem.
The column headings can be used to sort the table. For example, clicking on the
column "PC-AID" will sort the table by that id. Clicking this
column header a second time will reverse the order of the sort.
PC-AID
The permanent assay id number assigned to an assay in PubChem (not the
temporary assay-id used in the Deposition Gateway).
Version
The version of your assay in PubChem upon completion of this action.
This number will increase when you make significant changes to your
description; please see the
Replace assay tab for details.
Revision
The revision of your assay in PubChem upon completion of this action.
In general this number will increase when you make small changes to your
description; please see the
Alter description tab for details.
Action
One of four types of actions you can perform on PubChem assays.
-
New assay
-
Add/Change data (Modify)
-
Alter description (Modify)
-
Replace assay (Modify)
Started
The date and time on which your assay operation began.
Finished
The date and time on which your assay operation entered PubChem.
User
The person who initiated this assay action. This is useful for accounts
which have multiple users.
nRecords
The total number of tested substances uploaded in this action.
nRevoked
The number of substances marked to be revoked for this assay in this action.
Datafile
The name of your uploaded datafile. Note that this will be blank for the
Alter Description action.
Curator
The person who handled your deposition. Unlike the other fields, this field
points to the curator's email address.
2.4 Account Info Tab
This tab allows you to manage your account preferences and contact information.
It creates a second row of the following tabs:
By default you are placed under the
Account Tab in the second row.
Multiple Users on One Account
It is now possible to create one deposition account that contains multiple
users, each having their own login and password. For an overview of the
process, click here.
Views
The Views box on the lower left side lets you pick appropriate views of
your account information relevant to the second-row tab that you are under.
The various views available are explained in detail under
the sections for each of the second-row tabs. To read more, please
find the View you have questions about.
2.4.1 Account Info > Account Tab
This tab puts you by default into the "View Account" View that
allows you to manage your account information.
For an explanation of individual fields under this tab, please see the
appropriate test or
deposition account description.
View Account View
This View gives you read-only access to your account information.
Edit Account View
This View allows you to modify some of your account information. If there
is information you need to update, but the field cannot be edited, please
contact the
PubChem Deposition Help Desk. After you make all desired changes, be sure to
press the "Update" button to commit your changes.
2.4.2 Account Info > Contacts Tab
The Contacts Tab is only available for the primary user of a deposition
account (i.e. the user who first opened the account for your data source).
List Contacts View
This view displays a summary of contact information for the "Primary
Contact" (primary user) and below that one row for each of the
"Additional Contacts". Clicking on the Primary Contact row takes
you back to the
Account tab. Clicking on a row of
the Additional Contacts takes you to the
View Contact view for that
contact.
The contacts listed include information from the following columns:
-
Full Name - Full name of the contact
-
Email - E-mail address of the contact
-
Title - Contact's title within the data source organization
-
Phone - Phone number of the contact.
-
Notify - Whether this contact should be sent
deposition status e-mails.
Note that at the end of each Additional Contacts row is a "Delete"
link. Use this very carefully as it will remove one of your organization's
users from our system.
Add Contact View
This view allows you to add a contact for this deposition account.
Please fill in all fields as completely as possible. You must fill
in fields with a red "*".
Allow to login independently
The first checkbox on the "Add Contact" form determines whether
this new contact will be able to login independent of the primary user.
- Checked -
"Username" and "Password" fields will
appear. Ask the contact what username she would like; it must be unique
within our system. For the password, either have the contact fill it in
at your computer or set a temporary one and then the contact can change
it. For more information on having multiple users on your account, click
here.
- Unchecked - The contact can receive update notifications on
submissions as requested and his contact information is available for
reference, but he must use the primary login/password to get access
to the account.
After completing the contact information form, click on the
"Register" button.
View Contact View
This View gives you read-only access to the contact's account information.
Individual fields are defined just like the primary
deposition account.
Edit Contact View
This View allows you to modify the contact's account information. Note that
once a contact with independent login has been added, you should be very
careful about making any changes to their account information. Each user can make
his own changes. Also note that the primary user can change a contact's
password, but cannot view it. Individual fields are defined just like the primary
deposition account.
Please note: If you uncheck the
Allow Login box for a contact that has an
existing login, both his login and password will be lost.
2.4.3 Account Info > Preferences Tab
This tab displays a few preferences that you can review and revise. As with
the other tabs, if you wish to make modifications, click on the "Edit
Preferences" View on the left.
Data Source Description Terms
One of the more powerful aspects of PubChem and its background search engine,
Entrez, is its categorization and linking of related data. This section
offers a list of terms to categorize the type of data you provide to PubChem.
Please choose at least one term (more than one is ok too). PubChem users
looking only for toxicology data, for example, will be able to limit their
search to those data sources, thereby making your data more accessible.
Auto-Confirm Substance & Assay FTP Depositions
This checkbox only applies to substance and assay depositions made via
FTP. If checked, all such substance and assay
(Alter description only) depositions will be
automatically confirmed on your end if they pass validation.
This means you will not have to click on the "Commit" button on the
user interface in such a case. The submission will still need to be reviewed
and approved by a PubChem curator, but one manual step will be eliminated.
Note that for assay depositions this automated process only applies to the
Alter description type of deposition.
Resolve Substance Names
If this box is checked and no structure is provided in the substance record,
PubChem processing will attempt to use the provided synonyms to auto-generate the
deposited compound structure. This processing includes using a CID as a synonym
(e.g. "CID1" will use the structure of CID 1 for the structure record),
matching a synonym to MeSH (e.g. "Aspirin" will use the structure of CID 2244),
and name-to-structure software (e.g. "2-acetyloxybenzoic acid" will yield the
same structure as CID 2244).
Consider 3D Substance Coordinates as Experimental
If this box is checked, the depositor confirms that all 3D substance coordinates
supplied were experimentally-derived. If 3D coordinates were generated by a
computational algorithm, do not mark this box as it is not in the scope of the
PubChem database to display such information.
Include CIDs with Get SIDs Download Report
If this box under the Preferences tab is checked, an extra CID column will
be included at the end of the CSV file downloaded with the 'Get SIDS' link for
substance depositions.
Note: If you are currently looking at a Substance deposition, you can find this
checkbox by clicking on the Account Info tab and then the Preferences
tab. If you are not the primary user on your account, you will need to ask that
person to login and check this box.
Ignore Past Hold-Until dates for Substances
If this box under the Preferences tab is checked, any substance record
Hold-Until date set in the past will be stripped out and ignored for the sake of
versioning.
Avoid registry ID in list of chemical structure synonyms
If this box under the Preferences tab is checked, registry IDs will not be used as synonyms, so they
will not get used as preferential names for records.
Use outside RNAi substance provider
If this box under the Preferences tab is checked, RNAi assay depositors are able to use substance records from an
outside RNAi provider in addition to their own deposited substance records.
View Preferences View
This View gives you read-only access to your preferences.
Edit Preferences View
This View allows you to modify your preferences. After you make all desired
changes, be sure to press the "Update" button to commit your changes.
Add Icon
Clicking the Add Icon allows you to add a secondary contact for a deposition
account. Please fill in all fields as completely as is possible. You must fill
in all fields with a red "*". After completing the contact
information form, please push the "Register" button.
2.5 Navigation Icons
2.5.1 Check Mark Icon
Clicking the "Checkmark" icon in the Main Navigation Bar opens a new
web browser window and displays the PubChem Deposition Agreement in PDF format
(Adobe Acrobat Reader is required to view it). PubChem Depositors must
(electronically) sign
this agreement prior to adding any data to PubChem.
2.5.2 Movie Man Icon
Clicking the "Movie Man" icon in the Main Navigation Bar opens a new
web browser window and plays a movie within that window
demonstrating the use of the PubChem deposition system (the
Macromedia Flash Player plug-in is required to view it).
2.5.3 Question Mark Icon
Clicking the "Question Mark" icon in the Main Navigation Bar spawns a
new web browser window and displays the PubChem Deposition Gateway help
document. You can learn about the various features of the deposition system by
exploring this document.
2.5.4 Person Icon
Clicking the "Person" icon in the Main Navigation Bar asks whether
you would like to log out of the PubChem Deposition Gateway.
3. PubChem Deposition Gateway FAQ's
Q: I uploaded my file, now what?
A: The PubChem Deposition Gateway will parse and validate the data you
submitted. You can watch this process proceed, or you may submit another file
or logout and come back later. When this processing is complete, as denoted by
the submission status bar or by receipt of an e-mail, you may want to review
the submission. If you have a Test Account, and the submission proceeded
without error, you have successfully tested your data and can be assured that
your data is ready for use with the PubChem Deposition System. If you have a
Deposition Account, and the submission proceeded without errors, you may commit
your data to PubChem by pressing the "Commit" button.
Q: Can I supply an additional datasource URL as well as my datasource URL? Can I
supply an additional substance URL as well as my substance URL?
A: You have two URL's per substance. One URL is associated with your data
source name and the other is associated with your unique registry ID. Beyond
that, you would need to use the Entrez "link-out" mechanism that can
"piggy-back" URL's on your (or anyone's) substances.
Q: Can I supply multiple lines of additional searchable text per Substance?
A: All additional information should be put in the comments
("PUBCHEM_SUBSTANCE_COMMENT") section of the SD file. You can have as
many lines as you need there. You can also put URL's there.
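As a minimal sketch of what a multi-line comment looks like in an SD file, the Python helper below formats one SD data item using the standard SD file layout: a "> <TAG>" header line, one value per line, and a terminating blank line. The helper name and the sample comment lines are hypothetical and only for illustration. Blank lines are dropped because a blank line ends a data item.

```python
def sd_data_item(tag, lines):
    """Format one SD file data item: header line, value lines, blank line."""
    # A blank line terminates a data item, so blank values are skipped.
    values = [line.strip() for line in lines if line.strip()]
    if not values:
        raise ValueError("an SD data item needs at least one non-blank line")
    return "> <" + tag + ">\n" + "\n".join(values) + "\n\n"


# Hypothetical comment lines, for illustration only.
comment_item = sd_data_item(
    "PUBCHEM_SUBSTANCE_COMMENT",
    ["First line of searchable text", "Second line of searchable text"],
)
print(comment_item, end="")
```

The resulting text would be placed after the structure block of the record, before the "$$$$" record separator.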
Q: If I have new substance information available, how do I update PubChem?
Do I need to re-deposit the complete substance record (including the new information)
or can I just deposit the new information?
A: To update, please re-deposit the complete substance record, including the new information,
using the same registry identifier. Updates are versioned but only the most current data
will be readily visible, searchable, or downloadable. Please note that the revised record
will still have the same SID. Please also note that PubChem will not version substance
records if nothing has changed.
Q: If a substance ceases to be part of my data, how do I delete the record in
PubChem?
A: You will need to re-deposit the record such that it contains an empty CTAB section,
the registry ID tag ("PUBCHEM_EXT_DATASOURCE_REGID"), and a revoke tag ("PUBCHEM_REVOKE_SUBSTANCE").
We suggest that you provide a comment (a line of text) with the PUBCHEM_SUBSTANCE_COMMENT tag to explain why you
revoked the record, e.g. "Deprecated in favor of record ABC123". As an example, we added a revoke record to the example
SDF file (last record).
Q: If I find a mistake in my PubChem Substance record, is it better to update or
remove my substance?
A: The best way is to "update" the substance record.
Q: After I get a deposition account, is my test account still active? Can I
still use it to test submissions?
A: Yes, the test account is still active. You may continue to use that account
for testing. Please be advised that test accounts will not allow you to deposit
data into PubChem. Deposition accounts will allow you to deposit data into
PubChem, after processing has successfully completed.
Q: We have various flags "nucleophile", "electrophile",
"yuck" that we are starting to attach to molecules in our deposition.
We'd like to send that data to PubChem in the most useful way possible. We
think of them as "properties". What is the best way to do that?
A: The substance/compound properties you mentioned above belong in PubChem's
"Comment" field. You can simply put them under the SD tag
<PUBCHEM_SUBSTANCE_COMMENT>.
Q: If we have CAS registry numbers, is it best to put them in
<PUBCHEM_GENERIC_REGISTRY_NAME> or <PUBCHEM_SUBSTANCE_SYNONYM> ?
Does it matter?
A: PubChem's original design let users put all "Registry" items
under <PUBCHEM_GENERIC_REGISTRY_NAME>. Since many depositors already put
CAS numbers in their own synonyms field, those CAS numbers will
automatically go to <PUBCHEM_SUBSTANCE_SYNONYM>. So it doesn't
matter; it is up to you which field you put them in.
Q: We are starting to collect annotations of compounds e.g. inhibits enzyme XYZ
with Ki=10uM using a spectrophotometric assay. Also, we annotate compounds as
being aggregators (non specific inhibitors) at a particular concentration. This
is starting to sound like something that overlaps with the PubChem Assay
database. So far we don't have a lot of data to upload, but that may soon
change. Can you advise us on the best way to send this data to you?
A: Yes, you are right. Such bio-data related to your substance will go to
the PubChem BioAssay database.
Q: From time to time, compounds become depleted at various suppliers. We would
like to either A. indicate in the comment record that this supplier's stock is
depleted. OR B. remove a supplier from the comment record completely.
A: Once you update your record, we will archive all old version content. We
recommend that you indicate the depletion in your comment.
Q: Is the only way to do this to upload the full SD record again, overwriting
the previous one? I think this is true, but wanted to make sure.
A: Yes.
Q: Do you want compounds that are depleted in PubChem? I figure the answer is
yes, because what you are really looking for is maximum coverage of chemical
space. So I'm thinking, why don't you just run a combichem/de novo design
program to enumerate millions of molecules, and then load them into PubChem?
Obviously, just chemical space isn't what you're after. Can you help me
understand the PubChem perspective on this issue?
A: The PubChem Substance database is depositor-based. Every deposited substance
is assigned an SID. The PubChem Compound database is a non-redundant database of
unique structures. Every compound in the database has a unique CID. If the
substance(s) linked with a compound become depleted, the compound will be
deprecated/suppressed. We keep all deprecated/suppressed compounds
archived, so compounds are never removed from the archive.
4. FTP Depositions & FAQ
FTP-based deposition provides a path for completely automated data upload into
PubChem. If you have a large amount of data to be uploaded into PubChem or if
you update your data on a daily or weekly basis, you may be a good candidate to
use the PubChem FTP deposition method.
To get started with FTP-based depositions, you must:
1. Have an approved deposition account
2. Have performed previous data uploads into PubChem
3. Request an FTP account from PubChem
Please note that an FTP account is independent of your PubChem deposition
account with different login credentials. The PubChem deposition account will
be configured to interact directly with data uploaded via FTP. The procedure to
create, setup, and configure your FTP account to interact with your PubChem
deposition account will take one or more business days.
Substance-based FTP Depositions
To deposit data for a Substance deposition via FTP, you must:
1. Upload a file using your FTP account
2. Name the file you upload with the suffix ".sdf.in" (or ".sdf.gz.in", if a
compressed file)
Please note that the file suffix lets the deposition system know your file is
intended to be a new Substance deposition. After the file is recognized as
being present, the file is transferred into the deposition system. There may be
a delay between the completion of your FTP upload and the start of processing,
because the deposition system processes FTP deposited
data at particular times of the day and may wait to verify that your transfer
is actually complete. You can tell that FTP-based deposition processing has begun when
the ".sdf.in" (or ".sdf.gz.in") suffixed file is removed from your FTP account
directory and a status file is created. This status file has the suffix
".sdf.status". For example, if you upload the file "smid.sdf.gz.in", the status
file created will be "smid.sdf.status".
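The naming convention can be sketched in Python as follows. This is only an illustration under the assumption that the report files described later in this section (".sdf.out" and ".sdf.err") share the same base name as the status file, as the "smid" examples suggest. The function name is hypothetical.

```python
def companion_files(upload_name):
    """Derive the names of the files the system creates for an uploaded file."""
    # Check the longer suffix first so ".sdf.gz.in" is not mistaken for ".sdf.in".
    for suffix in (".sdf.gz.in", ".sdf.in"):
        if upload_name.endswith(suffix):
            base = upload_name[: -len(suffix)]
            break
    else:
        raise ValueError("file name must end in .sdf.in or .sdf.gz.in")
    return {
        "status": base + ".sdf.status",  # processing status code
        "out": base + ".sdf.out",        # load report with SIDs
        "err": base + ".sdf.err",        # error message, if any
    }


print(companion_files("smid.sdf.gz.in")["status"])
```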
The status file informs you of the processing progress. The possible status
file contents and their meaning are listed below.
| Status | Meaning |
| I | Submitted |
| -P | Parsing |
| !P | Parsing Failed |
| P | Parsed |
| -S | Standardizing |
| !S | Standardization Failed |
| S | Standardized |
| -V0 | Validating I |
| !V0 | Validation I Failed |
| V0 | Validated I |
| -V1 | Validating II |
| !V1 | Validation II Failed |
| V1 | Validated II |
| C | Committed for PubChem |
| A | Approved for PubChem |
| R | Rejected for PubChem |
| -D0 | Uploading to PubChem |
| -D | Depositing to PubChem |
| !D | Depositing to PubChem Failed |
| D | Deposited in PubChem |
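If you poll the status file from a script, a lookup like the following Python sketch may help. It assumes the status file contains only the status code, possibly with surrounding whitespace; that detail is an assumption and is not stated above. The "failed" and "in progress" labels simply reflect the pattern visible in the table, where codes starting with "!" are failures and codes starting with "-" are steps still running.

```python
# Status codes and meanings copied from the substance status table.
SUBSTANCE_STATUS = {
    "I": "Submitted",
    "-P": "Parsing",
    "!P": "Parsing Failed",
    "P": "Parsed",
    "-S": "Standardizing",
    "!S": "Standardization Failed",
    "S": "Standardized",
    "-V0": "Validating I",
    "!V0": "Validation I Failed",
    "V0": "Validated I",
    "-V1": "Validating II",
    "!V1": "Validation II Failed",
    "V1": "Validated II",
    "C": "Committed for PubChem",
    "A": "Approved for PubChem",
    "R": "Rejected for PubChem",
    "-D0": "Uploading to PubChem",
    "-D": "Depositing to PubChem",
    "!D": "Depositing to PubChem Failed",
    "D": "Deposited in PubChem",
}


def describe_status(file_text):
    """Return (code, meaning, state) for the contents of a .sdf.status file."""
    code = file_text.strip()  # assumption: the file holds just the code
    if code not in SUBSTANCE_STATUS:
        return code, "Unknown status", "unknown"
    if code.startswith("!"):
        state = "failed"
    elif code.startswith("-"):
        state = "in progress"
    else:
        state = "reached"  # a completed stage, including "R" (rejected)
    return code, SUBSTANCE_STATUS[code], state


print(describe_status("!V1\n"))
```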
After processing completes to the point of "Validated II", you will need to log
into the deposition system, review your submission, and then, if there are no
issues, commit your data to be loaded into PubChem. An auto-commit feature can
be requested, whereby the deposition commit step is performed on your behalf
automatically. This removes the necessity for you to login and commit your data
into PubChem. In many ways, an FTP-based deposition is much like a normal
file deposited through the web interface. You can login to your deposition account at any time to see the
progress of your deposition(s) or to get your SIDs. When processing is complete
and your data is loaded into PubChem, you will see a file with the suffix ".sdf.err"
and, if all went well, a file with the suffix ".sdf.out". The file with the
".sdf.out" suffix (e.g., "smid.sdf.out") is your report file containing your
PubChem Substance identifiers (SID's).
Please note that the ".sdf.out" log file is a CSV text file, easily read by
Excel or other spreadsheet applications. These files contain no column headers.
The columns are in the following order:
- Data Source Name
- External Registry ID
- SID
- SID Version
- Load Code
The "Load Code" column values, described below, allow you to know or track the
substances that you have added, modified, or suppressed.
| Load Code | Description |
| 0 | substance load failed (internal error) |
| 1 | existing substance replaced (internal use only) |
| 2 | new substance created |
| 3 | new substance version, PubChem structure same |
| 4 | new substance version, PubChem structure changed |
| 5 | no change, identical substance |
| 6 | no change, but new PubChem structure (internal use only) |
| 7 | substance revoked/suppressed |
| 8 | substance is "on-hold" |
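Because the ".sdf.out" report has no header row, a script has to supply the column names itself. The Python sketch below reads such a report and attaches the load code description to each row. The dictionary keys and the two sample rows are invented for illustration; only the column order and the load code descriptions come from this document.

```python
import csv
import io

# Column order as listed for the ".sdf.out" report (the file has no header row).
COLUMNS = ("source_name", "registry_id", "sid", "sid_version", "load_code")

# Load code descriptions copied from the load code table.
LOAD_CODES = {
    0: "substance load failed (internal error)",
    1: "existing substance replaced (internal use only)",
    2: "new substance created",
    3: "new substance version, PubChem structure same",
    4: "new substance version, PubChem structure changed",
    5: "no change, identical substance",
    6: "no change, but new PubChem structure (internal use only)",
    7: "substance revoked/suppressed",
    8: 'substance is "on-hold"',
}


def read_load_report(text):
    """Parse the text of a .sdf.out report into a list of dictionaries."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip empty lines
        if len(record) != len(COLUMNS):
            raise ValueError("expected 5 columns, got %d" % len(record))
        row = dict(zip(COLUMNS, (field.strip() for field in record)))
        row["load_code"] = int(row["load_code"])
        row["action"] = LOAD_CODES.get(row["load_code"], "unknown load code")
        rows.append(row)
    return rows


# Invented sample rows, for illustration only.
sample = "ExampleSource,ABC123,1001,1,2\nExampleSource,ABC124,1002,3,5\n"
report = read_load_report(sample)
for row in report:
    print(row["registry_id"], row["sid"], row["action"])
```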
The presence of a non-zero length file with the suffix ".sdf.err" (e.g.,
"smid.sdf.err") indicates that there was a problem with your uploaded file
and that your data may not be loaded into PubChem. The ".sdf.err" file will
contain a human readable text message explaining why the FTP uploaded file
failed. Please note that the status file is not deleted after processing and
publishing are completed. The final contents, if all went well, will be "D",
which means "Deposited in PubChem".
FAQ
Q: How do I specify the URL associated with my data source name? Does PubChem
use the URL I provided for my deposition account at registration time? Or, does
PubChem use the URL specified using PUBCHEM_EXT_DATASOURCE_URL in the SDfile?
A: The URL specified when you created your deposition system account is used in
the PubChem source page and is associated with your data source name in the
PubChem sources display page, for example, the
BioCyc data source name. The URL provided in the SD tag
"PUBCHEM_EXT_DATASOURCE_URL" gives the organization (data source) URL per substance,
which is allowed to change from substance to substance for the same data source name
that you deposit.
Q: How do I deposit substance data using FTP?
A: You login to your private PubChem FTP account, upload your file(s), commit
your processed data, and check back to the FTP account for your load report
containing your SID's. Your uploaded substance deposition file must have the
suffix ".sdf.in" or, if "gzip" compressed, ".sdf.gz.in". After the
file is "recognized" by the PubChem deposition system, it will disappear from
your FTP account and you will see a file with the suffix ".sdf.status". The status
file tells you which processing stage your uploaded file has reached via a
code in the file. After the processing is completed and your data
is successfully loaded, there will be a ".sdf.out" file containing your SID's.
Q: Is there any kind of report on the success or failure of the FTP uploaded
substance data?
A: Yes, you can examine the ".sdf.status" file, the ".sdf.err" file, or the
".sdf.out" file. The ".sdf.status" file contains the current status of the data
processing. The ".sdf.out" file contains a load report containing a list of
records and the load action taken. The ".sdf.err" file may contain a human
readable error as to why your FTP-based substance deposition failed.
Q: After we deposit our substances, will we get the SID's or CID's for linking
purpose? How do we get the SID's or CID's? Will you put it on the ftp site for
us to pick up?
A: After processing is completed and your substances are loaded into PubChem, a
".sdf.out" file will appear in your FTP account. This file will contain your
SID's that correspond to your registry ID's. CID's are not provided.
Q: How do I compose a URL to link back to PubChem from my website?
A: To generate a URL to link to your substance, for example, SID 2244, the URL
will be:
//pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=2244
We do not recommend linking directly back to a CID associated with your
substance using the "summary.cgi", as it may change as we change our preference
for different tautomer or resonance forms of a structure. You may "safely"
generate a URL to the CID associated with your substance via, for example,
SID 2244:
//www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=PureSearch&db=pccompound&details_term=2244%5BSID%5D
Q: Is it possible to link back to PubChem without having to know the Substance
identifier (SID)?
A: Yes, you can compose a URL using your data source name and external
registry ID. For example, to make a link to the substance for the data source
name "NIAID" and the external registry ID "115500", the URL would be:
//www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=PureSearch&db=pcsubstance&details_term=niaid%5Bsourcename%5D%20AND%20115500%5Bsourceid%5D
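The link patterns from the last two answers can be generated with a short Python sketch like the one below. The function names are hypothetical. The URLs are assembled without any embedded spaces and are kept scheme-relative, exactly as they are written in this document. Lowercasing the data source name mirrors the "niaid" example and is an assumption, since this document does not say whether the lowercase form is required.

```python
from urllib.parse import quote

ENTREZ_QUERY = "//www.ncbi.nlm.nih.gov/entrez/query.fcgi"


def substance_url(sid):
    """Link directly to a substance record by SID."""
    return "//pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=%d" % sid


def compound_url_for_sid(sid):
    """Link to the compound associated with a substance, searching by SID."""
    term = quote("%d[SID]" % sid, safe="")
    return ENTREZ_QUERY + "?cmd=PureSearch&db=pccompound&details_term=" + term


def substance_url_for_registry_id(source_name, registry_id):
    """Link to a substance by data source name and external registry ID."""
    # Assumption: the source name is lowercased, as in the "niaid" example.
    query = "%s[sourcename] AND %s[sourceid]" % (source_name.lower(), registry_id)
    term = quote(query, safe="")
    return ENTREZ_QUERY + "?cmd=PureSearch&db=pcsubstance&details_term=" + term


print(substance_url(2244))
print(compound_url_for_sid(2244))
print(substance_url_for_registry_id("NIAID", "115500"))
```

Using `quote` rather than hand-written escapes means that registry IDs containing spaces or other reserved characters are percent-encoded as well.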
Q: If we send records with the previous source ID again, will the previous
record be overwritten?
A: Yes, the registry ID you provide is the key to the record. Whenever you send
us a record with that registry ID, we will interpret it as an update, replacing
the complete record with the information provided.
Assay-based FTP Depositions
Bioassay data depositions can be initiated via FTP in much the same way that substance
depositions can, but for assays you must additionally specify the type
of assay deposition operation.
To begin, follow the same three-step setup procedure as described for
substance FTP depositions. Note that you use the
same FTP account for depositing either substance or assay data.
Once your FTP account is set up, you should have the following directory structure
under your top level directory:
/assay/new
/assay/modify/data_only
/assay/modify/alter_descr
/assay/modify/replace_all
You must decide which of the four types of assay operations you want to perform
and place your file to be deposited into the appropriate directory listed above.
You should be familiar with performing these assay deposition operations
before trying them with FTP. For more information on them, see
here.
Assay FTP Deposition File Format
To upload any kind of assay data or description changes, a single XML or ASN.1 file
is required. This file must adhere to the specification for assays and be filled
out as appropriate. Search in the specification file (XML Schema,
ASN.1) for the tag PC-AssayContainer; this will
always be the outermost container for your assay, whether it contains description and data
or only description. You can find examples
of such XML files from the
PubChem public FTP site of bioassays. For assay deposition path-specific XML examples look at
Bioassay XML examples for FTP section. No CSV files are permitted using FTP.
You can also download templates of XML files from pending depositions that you are making
in the Deposition Gateway. You will need one file with both the data and description
filled out in the cases of new, data_only or replace_all operations.
For the alter_descr operation, only the description should be filled out. Let's now
reiterate these instructions by assay deposition operation:
- New assay deposition
For this operation you need to fill out new description information, including a unique
aid-source (RegId) and name for your assay, and assay data results. You can look at one
of your existing assays from the
PubChem FTP site for guidance.
Your assay FTP upload file goes in the directory /assay/new. XML example is here.
- Modify existing PubChem Assay
For these three operations, you are doing something to affect your current assay in PubChem.
Therefore, you need to specify the assay AID correctly so that it can be found. The best
way to do this is to first copy the XML file of your current assay and modify it as you wish.
In the following three types of Modify operations, we'll briefly mention what you
should change.
- Add/Change data without description change
For this operation, you should take a copy of your current assay's
XML file and replace
the data section with the data that you want to add, delete or modify. Be careful to
first remove all existing data from the copy, or the system will think that you want to add that data again!
Also note that for this operation you must make no changes to the existing description.
Your assay FTP upload file goes in the directory /assay/modify/data_only.
XML example is here.
- Alter description
For this operation, you should take a copy of your current assay's
XML description file
without the data section and make your minor alterations. Any significant
changes, such as adding TID data result definitions, will result in an error.
Your assay FTP upload file goes in the directory /assay/modify/alter_descr.
XML example is here.
- Replace assay
For this operation, you should take a copy of your current assay's
XML file and make your
description and data section modifications. Please note that all of your existing data
for this assay in PubChem will be replaced by the contents of this uploaded file.
Your assay FTP upload file goes in the directory /assay/modify/replace_all.
XML example is here.
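To avoid placing a file in the wrong directory, a script can look up the upload directory by operation name. The Python sketch below is one way to do that. The function name and the example file name are hypothetical. The trailing ".in" follows the assay naming convention described under "Assay FTP Deposition Communication and Processing".

```python
import posixpath

# Upload directory for each assay operation, as listed above.
ASSAY_DIRECTORIES = {
    "new": "/assay/new",
    "data_only": "/assay/modify/data_only",
    "alter_descr": "/assay/modify/alter_descr",
    "replace_all": "/assay/modify/replace_all",
}


def remote_path(operation, file_name):
    """Return the FTP path an assay file should be uploaded to."""
    if operation not in ASSAY_DIRECTORIES:
        raise ValueError("unknown assay operation: %r" % operation)
    if not file_name.endswith(".in"):
        file_name += ".in"  # assay input files must end in ".in"
    return posixpath.join(ASSAY_DIRECTORIES[operation], file_name)


# "my_assay.xml" is a made-up file name.
print(remote_path("alter_descr", "my_assay.xml"))
```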
XML Validation against PubChem XSD Schema
To increase the efficiency of the data exchange for your Bioassay FTP
submission, PubChem highly recommends that depositors first validate XML files
before uploading them to the PubChem FTP site for processing.
XML validation will make sure that your file conforms to the PubChem Bioassay
specification and should help speed up the deposition time by isolating XML
errors. To check if your XML document conforms to the PubChem XSD Schema, the
XML document must be validated against that XSD Schema. You can find PubChem's
XSD schema
here.
One XML validator that you might use is xmllint, which is often included
in standard Linux installations. To validate XML using xmllint, run the
following Linux command:
xmllint --noout --schema "ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem.xsd" FileName.xml
Please be advised that PubChem does not support or maintain xmllint, but you
can find more information on it here.
Depositors may of course use any other equivalent XML package for validation.
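If you prepare uploads with a script, the same xmllint command can be run from Python before each upload. The following is a minimal sketch that assumes xmllint is installed and on your PATH. The function names are hypothetical. It relies on xmllint returning an exit code of zero when the file validates.

```python
import shutil
import subprocess

SCHEMA_URL = "ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem.xsd"


def xmllint_command(xml_path, schema=SCHEMA_URL):
    """Build the same xmllint command line shown above."""
    return ["xmllint", "--noout", "--schema", schema, xml_path]


def validate(xml_path, schema=SCHEMA_URL):
    """Run xmllint and return (is_valid, messages)."""
    if shutil.which("xmllint") is None:
        raise RuntimeError("xmllint is not installed or not on PATH")
    result = subprocess.run(
        xmllint_command(xml_path, schema), capture_output=True, text=True
    )
    # xmllint exits with 0 when the document validates against the schema.
    return result.returncode == 0, result.stderr


print(" ".join(xmllint_command("FileName.xml")))
```

Passing a local copy of the schema file as `schema` avoids fetching it over the network on every run.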
Assay FTP Deposition Communication and Processing
As with substance FTP depositions, initial communication between you and our deposition system
occurs through files in your FTP directory using a naming convention. Your input XML/ASN.1 file
must end in ".in". Once that file is picked up by our system, it will try to process it and put
the status of the deposition in another file with the extension ".status". There will also be a
file ending in ".err" which will contain an explanation of any errors found. In some cases if a
processing error occurs right at the start, the deposition will not have a status yet and the
".status" file will be empty.
The status file informs you of the processing progress. The possible status
file contents and their meaning are listed below.
| Status | Meaning |
| I | Description created |
| U | Data Submitted |
| -P | Parsing Data |
| !P | Data Parsing Failed |
| P | Parsed |
| -V | Validating |
| !V | Data Validation Failed |
| V | Data Validated |
| C | Committed for PubChem |
| A | Approved for PubChem |
| R | Revise for PubChem |
| -D | Depositing to PubChem |
| !D | Depositing to PubChem Failed |
| D | Deposited in PubChem |
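A script that watches an assay ".status" file mainly needs to know what to do next. The Python sketch below maps each status to a suggested next step. It assumes the file contains only the status code, possibly with surrounding whitespace. The action labels are our own shorthand, and the suggestion for "R" (log in for details) is an assumption, since this document does not describe that case. If auto-commit is enabled on your account, the commit step at "V" is performed for you.

```python
# Status codes and meanings copied from the assay status table.
ASSAY_STATUS = {
    "I": "Description created",
    "U": "Data Submitted",
    "-P": "Parsing Data",
    "!P": "Data Parsing Failed",
    "P": "Parsed",
    "-V": "Validating",
    "!V": "Data Validation Failed",
    "V": "Data Validated",
    "C": "Committed for PubChem",
    "A": "Approved for PubChem",
    "R": "Revise for PubChem",
    "-D": "Depositing to PubChem",
    "!D": "Depositing to PubChem Failed",
    "D": "Deposited in PubChem",
}


def next_step(status_text):
    """Return (meaning, action) for the contents of an assay .status file."""
    code = status_text.strip()  # assumption: the file holds just the code
    if code == "":
        # An empty status file can mean an error occurred right at the start.
        return "No status yet", "check_err_file"
    if code not in ASSAY_STATUS:
        return "Unknown status", "check_err_file"
    meaning = ASSAY_STATUS[code]
    if code.startswith("!"):
        return meaning, "check_err_file"
    if code == "V":
        return meaning, "log_in_review_and_commit"
    if code == "R":
        return meaning, "log_in_for_details"  # assumption, see lead-in
    if code == "D":
        return meaning, "done"
    return meaning, "wait"


print(next_step("V\n"))
```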
After processing completes to the point of "Validated", you will need to log
into the deposition system, review your submission, and then, if there are no
issues, commit your data to be loaded into PubChem. An auto-commit feature can
be requested, whereby the deposition commit step is performed on your behalf
automatically. This removes the necessity for you to login and commit your data
into PubChem. In many ways, an FTP-based deposition is much like a normal
file deposited through the web interface. You can login to your deposition account at any time to see the
progress of your deposition(s).
Once you have resolved any processing errors that might come up, your assay will proceed to the
validated stage. At this point, you can switch to the Deposition Gateway web interface and view
your deposition. This gives you more interactive information about your deposition and is
necessary for you to confirm the validity of your new assay or changes to your existing assay.
From the validated stage you will no longer need the FTP system.
5. PubChem Deposition Documents and Examples

Specifications
Examples
-
Variations of acceptable substance entries
(SDF)
-
BioAssay examples for web-based Deposition System
-
Substances not yet deposited in PubChem
Submit SD file, then submit
non-panel bioassay testing results (includes all three files):
SD file,
Description file,
Data CSV file
-
Substances already deposited in PubChem
2.1. Submit non-panel bioassay testing results
(includes description and CSV files):
Description file,
Data CSV file
2.2. Submit panel bioassay testing results (2 options):
a) single XML file containing description with panel members and single CSV file for data
Description file with panel members,
Data CSV file
b) single XML file for description, CSV file for panel members, and CSV file for data
Description file,
Panel CSV file,
Data CSV file
-
BioAssay Description via Spreadsheet (Web Only)
-
Small Molecule deposition:
1.1. Series of CSV files which are progressively loaded via the Web:
General fields of Description,
Result Definitions of Description,
Cross References (XRefs) of Description,
Target of Description,
Categorized Comments of Description
1.2. Excel file (.xlsx or .xls)
containing multiple sheets loaded via the Web
1.3. OpenOffice Spreadsheet file
containing multiple sheets loaded via the Web
-
RNAi deposition:
1.1. Spreadsheet example from PubChem BioAssay
1904:
Excel file of description
containing multiple sheets loaded via the Web
CSV file of substance information
1.2. Series of CSV files which are progressively loaded via the Web:
General fields of Description,
Result Definitions of Description,
Cross References (XRefs) of Description,
Target of Description,
Categorized Comments of Description
1.3. Excel file (.xlsx or .xls)
containing multiple sheets loaded via the Web
1.4. OpenOffice Spreadsheet file
containing multiple sheets loaded via the Web
-
BioAssay XML examples for FTP depositions
-
New assay deposition: XML file, Panel XML file
-
Altering assay description: XML file
-
Replacing assay: XML file, Panel XML file
-
Adding/changing data of assay: XML file
-
PubChem Course Examples
- Confirmatory Assay: Description, Data
- Primary Assay: Description, Data
- Summary Assay: Description, Data
- Dose-Response Series Assay: Description, Data