PubChem Assay Tags

PUBCHEM_EXT_DATASOURCE_REGID

The external assay identifier assigned by you, the assay owner. This must be unique amongst all of your PubChem assays.

PUBCHEM_ASSAY_NAME

A short, informative name of the assay for display purposes.

PUBCHEM_ASSAY_DESCRIPTION

A description of the assay's purpose and parameters.

PUBCHEM_ASSAY_PROTOCOL

The protocol used to perform the assay and generate its data. This might include an explanation of how the Activity Outcome and Score values in the Assay Data were determined.

PUBCHEM_ASSAY_COMMENTS

Additional information that might not fit in the Description or Protocol sections.

CATEGORIZED COMMENTS

These are Tag-Value pairs that provide a convenient placeholder for submitter-defined ontologies or other definitions outside the scope of the PubChem specification. All such comments are searchable in PubChem. The tutorial includes an example of Tag-Value pairs.

TARGET

For any assay designed to identify chemicals that interact with a target (for example, an enzyme inhibition assay), please specify the target's sequence identifier here. For a chemical assay, the target is typically a protein, but it can also be a gene, nucleotide, or pathway ID. Cell-based assays can skip this field.

Note that only one or a few targets should be identified here. For assays such as RNAi screens, where the target changes with each tested result, specify the target in a column of the Assay Data instead.

X-REFERENCES

Cross-references (XRefs) can be made to many NCBI database records related to your assay. These include other PubChem BioAssays (by AID or RegId) as well as PubMed IDs, Taxonomy IDs, and identifiers from many other databases.

Please do not repeat a protein identifier here if it is already given in the Target section.

ASSAY DATA

For most assay submissions, the assay data contain the actual data reported for each tested substance. The submitter may define as many columns as desired, reporting not only numerical values, such as IC50 and percent inhibition, but also labels and database identifiers.

One column of activity outcome values must be reported, giving the submitter's judgment of whether each row should be considered inactive (1) or active (2).

Read more about our assay data specifications and file formats.

PANEL INFO

PubChem extends the BioAssay data model to support the presentation and annotation of profiling (panel) screening results.

Panel assays are complex, and we have tried to make the interface as user-friendly as possible. Please pay extra attention to panel assay definitions and data to ensure their accuracy.

To see a panel assay example, please take a look at this kinase profiling assay.

PUBCHEM_SUBSTANCE_TYPE

Please indicate the type of substance tested in your assay (for example, chemical or RNAi) to help categorize assays.

PUBCHEM_ACTIVITY_OUTCOME_METHOD

Classify your assay by how the activity outcome was defined. Choices include:

Screening assay - Single-concentration activity observed.

Confirmatory assay - Concentration-response relationship observed (EC50, IC50, etc.).

Summary assay - Overview of and links to multiple, related assays.

Other - Assay does not fall into the above categories.

PUBCHEM_PROJECT_CATEGORY

Classify your assay if it was created as part of a specialized project. If none of these categories apply, please choose 'Other'.

Literature, Extracted - Select if the assay data were extracted from the literature by a third party (not by the author or the article's publisher).

Literature, Author / Publisher - Select if the assay data were extracted from the article by its author or publisher.

RNAi Global Initiative - Select if the work is from a member of the RNAi Global Initiative.

Assay Vendor - Select if the assay was contributed by an assay service provider.

NIH Molecular Libraries - Select if the assay experiment was funded by the NIH Molecular Libraries program.

PUBCHEM_GRANT_NUMBER

A grant number can be specified. Note that this string is not validated.

PUBCHEM_ASSAY_GROUP

A label to be added to multiple assays for the purpose of logically grouping them.

PUBCHEM_HOLD_UNTIL_DATE

Optional hold-until date to delay public access of assay data in PubChem. This may be useful, for example, to coordinate release of data with a journal publication.

Note that until that date, access to the data is restricted to your PubChem Upload account.

Please consult our help documentation about how best to manage public exposure.

PUBCHEM_SID

If you have previously deposited your Substance description into PubChem, you may use your Substance identifier (SID) assigned by PubChem. This must be an unsigned integer value and, in nearly all cases, your organization must have deposited the Substance associated with this SID.

PUBCHEM_EXT_DATASOURCE_REGID

Instead of a SID, you may use your own identifier for Substance descriptions you previously deposited in PubChem.

PUBCHEM_ACTIVITY_OUTCOME

This field lets the submitter make an expert judgment about the activity of each test result. The value is a number, 1 (inactive) or 2 (active), assigned by whatever means the submitter considers appropriate. An explanation of how it was determined should be provided in the Protocol or Comments section of the Assay Description.

In addition to active and inactive, this field can be set to 3 (inconclusive), 4 (unspecified), or 5 (probe). The 'probe' designation indicates that the activity of the test result has been tested and confirmed through multiple rounds of experimental inquiry.
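The outcome codes above map naturally onto an enumeration. The following Python sketch shows one way a submitter might validate the PUBCHEM_ACTIVITY_OUTCOME column locally before upload; the names ActivityOutcome and parse_outcome are hypothetical and not part of any PubChem tool.

```python
from enum import IntEnum


class ActivityOutcome(IntEnum):
    """Hypothetical enum of the PUBCHEM_ACTIVITY_OUTCOME codes described above."""
    INACTIVE = 1
    ACTIVE = 2
    INCONCLUSIVE = 3
    UNSPECIFIED = 4
    PROBE = 5


def parse_outcome(cell: str) -> ActivityOutcome:
    """Convert a spreadsheet cell to an ActivityOutcome; reject anything else."""
    try:
        return ActivityOutcome(int(cell.strip()))
    except ValueError:
        # int() fails on non-numeric text; ActivityOutcome() fails on unknown codes.
        raise ValueError(f"invalid PUBCHEM_ACTIVITY_OUTCOME value: {cell!r}") from None


if __name__ == "__main__":
    print(parse_outcome(" 2 ").name)  # ACTIVE
```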

PUBCHEM_ACTIVITY_SCORE

The activity of a test result may be assigned a normalized score between 0 and 100, where the most active result rows have scores closer to 100 and inactive ones scores closer to 0, so that results can be ranked and hits prioritized.
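This page does not prescribe how the score is computed. As one possible approach, and only an assumption here, a submitter could linearly rescale a raw measurement such as percent inhibition so that the most active row receives 100 and the least active receives 0. The function name activity_scores is hypothetical.

```python
def activity_scores(values, higher_is_more_active=True):
    """Min-max rescale raw activity values to integer scores in 0..100.

    One possible normalization (an assumption, not a PubChem rule).
    Set higher_is_more_active=False for measures where smaller is better.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # No spread, so the data carry no ranking information.
        return [0 for _ in values]
    scores = []
    for v in values:
        frac = (v - lo) / (hi - lo)
        if not higher_is_more_active:
            frac = 1.0 - frac
        scores.append(round(100 * frac))
    return scores


if __name__ == "__main__":
    print(activity_scores([5.0, 50.0, 95.0]))  # [0, 50, 100]
```

For potency values such as IC50 that span orders of magnitude, rescaling their logarithms rather than the raw values may give a more useful ranking.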

PUBCHEM_ACTIVITY_URL

A URL may optionally be provided in this column for the Assay Data reported for this Substance. PubChem displays this URL so that users can link to your website, where you may provide additional information or interfaces to your Assay Data, such as dose-response curves or replicate data.

PUBCHEM_ASSAYDATA_COMMENT

You may optionally provide your own textual annotations and comments in this column for the Assay Data reported for this Substance.

PUBCHEM_ASSAYDATA_REVOKE

When you submit data, you must leave this column blank or put the value '0' in it. You may optionally suppress Assay Data for a Substance by putting the value '1' in this column; in that case, leave all other columns blank except Column 1: PUBCHEM_SID. Suppressing Assay Data does not delete it from PubChem; rather, it removes all references and links to the data. Pre-existing links will still work, however, and a disclaimer will be displayed stating that the data have been revoked.

You may un-revoke Assay Data for a Substance by depositing either the same or new data for this Substance. Do not revoke and submit the same substance in the same file.
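The revocation rules above can be checked before upload. This Python sketch represents each row as a dictionary keyed by column header; the helper names revoke_row and check_revoke_conflicts and the example SIDs are hypothetical. It builds a revocation row that fills only PUBCHEM_SID and PUBCHEM_ASSAYDATA_REVOKE, and flags any SID that is both revoked and submitted with data in the same file.

```python
def revoke_row(sid: int, columns: list) -> dict:
    """Row that suppresses Assay Data for one SID; every other column stays blank."""
    row = {column: "" for column in columns}
    row["PUBCHEM_SID"] = str(sid)
    row["PUBCHEM_ASSAYDATA_REVOKE"] = "1"
    return row


def check_revoke_conflicts(rows: list) -> None:
    """Raise ValueError if a SID is both revoked and submitted in the same file."""
    revoked = {r["PUBCHEM_SID"] for r in rows
               if r.get("PUBCHEM_ASSAYDATA_REVOKE", "") == "1"}
    submitted = {r["PUBCHEM_SID"] for r in rows
                 if r.get("PUBCHEM_ASSAYDATA_REVOKE", "") in ("", "0")}
    clash = revoked & submitted
    if clash:
        raise ValueError(f"SIDs both revoked and submitted: {sorted(clash)}")


if __name__ == "__main__":
    cols = ["PUBCHEM_SID", "PUBCHEM_ACTIVITY_OUTCOME", "PUBCHEM_ASSAYDATA_REVOKE"]
    rows = [revoke_row(101, cols),
            {"PUBCHEM_SID": "102", "PUBCHEM_ACTIVITY_OUTCOME": "2",
             "PUBCHEM_ASSAYDATA_REVOKE": ""}]
    check_revoke_conflicts(rows)  # passes: 101 is only revoked, 102 only submitted
```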

CUSTOM

Define your own result definitions here, one per column. You must give each a name, and you can also specify parameters such as the data type and unit. For example, to report an EC50, name the column "EC50", set the data type to "FLOAT", and set the unit to "MICROMOLAR".

General Description Items

DESCR_TAG

A table column header for general description tags.

DESCR_VALUE

A table column header for general description values.

Result Definitions Items

PUBCHEM_RESULT_TAG

This header goes in the first row, first column of the spreadsheet. Immediately under it are optional tags to define properties of result definitions, such as RESULT_UNIT. In all data rows below that, this column contains an increasing number starting from one.
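To make this layout concrete, the following Python sketch (standard library only) assembles the rows of such a spreadsheet: the PUBCHEM_RESULT_TAG header row, one row per result-definition tag, then data rows numbered from one. The function name build_result_sheet, the example SIDs, and the column order are hypothetical; only the tag names come from this page, and the EC50 column follows the example given under CUSTOM.

```python
import csv
import io


def build_result_sheet(columns, definitions, records):
    """Return spreadsheet rows in the layout described above.

    columns:     result column headers, in order
    definitions: {tag: {column: value}}, e.g. {"RESULT_UNIT": {"EC50": "MICROMOLAR"}}
    records:     one {column: value} dict per tested substance
    """
    rows = [["PUBCHEM_RESULT_TAG", *columns]]
    for tag, values in definitions.items():
        rows.append([tag, *(values.get(c, "") for c in columns)])
    # Data rows: the first column holds an increasing number starting from one.
    for number, record in enumerate(records, start=1):
        rows.append([str(number), *(str(record.get(c, "")) for c in columns)])
    return rows


if __name__ == "__main__":
    sheet = build_result_sheet(
        ["PUBCHEM_SID", "PUBCHEM_ACTIVITY_OUTCOME", "EC50"],
        {"RESULT_TYPE": {"EC50": "FLOAT"}, "RESULT_UNIT": {"EC50": "MICROMOLAR"}},
        [{"PUBCHEM_SID": 101, "PUBCHEM_ACTIVITY_OUTCOME": 2, "EC50": 0.75},
         {"PUBCHEM_SID": 102, "PUBCHEM_ACTIVITY_OUTCOME": 1}],
    )
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(sheet)
    print(out.getvalue(), end="")
```

Run as a script, this prints the header row, the RESULT_TYPE and RESULT_UNIT rows (blank except under EC50), and the data rows "1,101,2,0.75" and "2,102,1,", the last with an empty EC50 cell.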

RESULT_TYPE

The result type is typically Float, Integer, Boolean, or String.

Optionally, the type can specify an identifier, such as one from another NCBI Entrez database. For example, if PubMed ID is chosen as the type, all data values in this column will be checked to ensure that they are valid PubMed identifiers.
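A local pre-check of values against their declared result type might look like the following Python sketch. The type spellings, the accepted Boolean spellings, and the PUBMED_ID type name are assumptions for illustration. For identifier types, the sketch can only check that a value looks like a positive integer, whereas PubChem checks that it is an actual valid identifier.

```python
import re

_POSITIVE_INT = re.compile(r"[1-9][0-9]*")
_BOOLEAN_SPELLINGS = {"true", "false", "1", "0"}  # assumed, not from the spec


def looks_valid(result_type: str, cell: str) -> bool:
    """Shape check of one cell against a result type (no database lookup)."""
    kind = result_type.strip().upper()
    text = cell.strip()
    if kind == "FLOAT":
        try:
            float(text)
        except ValueError:
            return False
        return True
    if kind == "INTEGER":
        try:
            int(text)
        except ValueError:
            return False
        return True
    if kind == "BOOLEAN":
        return text.lower() in _BOOLEAN_SPELLINGS
    if kind == "PUBMED_ID":  # hypothetical name for an identifier type
        return _POSITIVE_INT.fullmatch(text) is not None
    if kind == "STRING":
        return True
    raise ValueError(f"unknown result type: {result_type!r}")
```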

RESULT_UNIT

Various units are available to better define the measurement of a given result column.

RESULT_DESCR

An optional description to explain what is being measured for a given result column.

RESULT_ATTR_CONC_MICROMOL

An optional micromolar concentration at which this result was tested. This attribute implies that the result is biological concentration-response data.

RESULT_CONC_RESPONSE_SERIES_ID

For confirmatory assays, an optional ID, starting from 1, that groups columns into series for defining dose-response curves. If one series is defined, every column in that series has '1' in this field; a second series uses '2', and so forth.
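The series ID, together with RESULT_ATTR_CONC_MICROMOL, is what lets a dose-response curve be reassembled from separate columns. This Python sketch groups column definitions by series and orders each series by concentration. The tuple layout, the function name group_series, the column names, and the rule that every column in a series must carry a concentration are assumptions made for illustration.

```python
from collections import defaultdict


def group_series(columns):
    """Group result columns into dose-response series.

    columns: iterable of (name, series_id, conc_micromolar); series_id is None
    for columns outside any series. Returns {series_id: [(conc, name), ...]}
    with each series sorted by increasing concentration.
    """
    series = defaultdict(list)
    for name, series_id, conc in columns:
        if series_id is None:
            continue  # e.g. a summary IC50 column
        if conc is None:
            # Assumption of this sketch: series columns need a concentration.
            raise ValueError(f"column {name!r} in series {series_id} has no concentration")
        series[series_id].append((conc, name))
    return {sid: sorted(points) for sid, points in series.items()}


if __name__ == "__main__":
    cols = [("Inhib 10 uM", 1, 10.0), ("Inhib 0.1 uM", 1, 0.1),
            ("Inhib 1 uM", 1, 1.0), ("IC50", None, None)]
    print(group_series(cols))
```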

RESULT_IS_ACTIVE_CONCENTRATION

For confirmatory assays, this column allows an optional "1" for the one result column that summarizes the active concentration. This is typically reported as an IC50, EC50, AC50, GI50, etc., or as a constant such as Ki.
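Because at most one result column may carry this flag, a pre-upload check is straightforward. In this Python sketch, the function name active_concentration_column and the input shape (a mapping from column name to its RESULT_IS_ACTIVE_CONCENTRATION cell) are hypothetical.

```python
from typing import Optional


def active_concentration_column(flags: dict) -> Optional[str]:
    """Return the single column flagged with "1", or None if none is flagged."""
    flagged = [name for name, value in flags.items() if str(value).strip() == "1"]
    if len(flagged) > 1:
        raise ValueError(f"more than one active-concentration column: {flagged}")
    return flagged[0] if flagged else None


if __name__ == "__main__":
    print(active_concentration_column({"Inhib 1 uM": "", "IC50": "1"}))  # IC50
```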

Categorized Comments Items

CAT_COMMENT_TAG

A submitter-defined tag to define a categorized comment. This tag column must appear as the first column of the spreadsheet.

CAT_COMMENT_VALUE

The value of a submitter-defined categorized comment.

Target Items

TARGET_TYPE

The database type of target identifier supplied.

TARGET_ID

The required, first table column header for target data. The values in this column are the actual primary identifiers from one of the accepted databases.

TARGET_NAME

The optional name of the target. If left blank, a standard name from the sequence database will be used where possible.

TARGET_DESCRIPTION

Any additional description of the target beyond its name.

TARGET_COMMENT

Any additional comments or annotations for the target.

XREF Items

XREF_TYPE

The database type of XRef identifier supplied. This type column must appear as the first column of the spreadsheet.

XREF_VALUE

The actual identifier value from the cross-referenced NCBI database.

XREF_ANNOTATION

An explanatory text describing the relevance of this cross-referenced item to the assay.

PANEL Items

RESULT_PANEL_ID (Required)

Integer (starting from 1) shared by one or more result description columns to group them together. Optionally, a panel type can be appended to the number, as in 1_REGULAR, 1_OUTCOME, 1_SCORE, or 1_AC. By default, a plain integer is interpreted as the regular type.
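This Python sketch parses RESULT_PANEL_ID values into a panel number and a panel type. The names parse_panel_id and PanelId are hypothetical; the sketch accepts only the four type suffixes named above, which may not be the complete set, and treating the suffix case-insensitively is likewise an assumption.

```python
import re
from typing import NamedTuple

# Panel types named in the text; the specification may allow others.
PANEL_TYPES = {"REGULAR", "OUTCOME", "SCORE", "AC"}
_PANEL_ID = re.compile(r"([1-9][0-9]*)(?:_([A-Z]+))?")


class PanelId(NamedTuple):
    number: int
    kind: str


def parse_panel_id(cell: str) -> PanelId:
    """Parse values such as "1", "1_OUTCOME" or "2_AC"."""
    match = _PANEL_ID.fullmatch(cell.strip().upper())
    if match is None:
        raise ValueError(f"malformed RESULT_PANEL_ID: {cell!r}")
    kind = match.group(2) or "REGULAR"  # a plain integer means the regular type
    if kind not in PANEL_TYPES:
        raise ValueError(f"unknown panel type in RESULT_PANEL_ID: {cell!r}")
    return PanelId(int(match.group(1)), kind)


if __name__ == "__main__":
    print(parse_panel_id("1"), parse_panel_id("2_OUTCOME"))
```

Columns that share the same number then belong to one panel component, so grouping by PanelId.number recovers the panel structure.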

RESULT_PANEL_NAME (Optional)

Short name of panel component.

RESULT_PANEL_DESCR (Optional)

Short description of the specifics of the panel component, such as the cell line or target information.

RESULT_PANEL_PROTOCOL (Optional)

Specific procedure used to generate results for the panel.

RESULT_PANEL_COMMENT (Optional)

Additional information.

RESULT_PANEL_TARGET_NAME (Optional)

Not required; it is filled in automatically unless you provide a value.

RESULT_PANEL_TARGET_ID (Optional)

This is mandatory if any of the target fields are present.

RESULT_PANEL_TARGET_TYPE (Optional)

This is mandatory if any of the target fields are present. It is an integer: Protein(1), DNA(2), RNA(3), Gene(4), BioSystems(5).
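The conditional requirement above (ID and type become mandatory once any target field is filled) can be expressed as a small check. In this Python sketch, the enum PanelTargetType mirrors the integer codes listed above, while the function name check_panel_target and its dictionary input are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class PanelTargetType(IntEnum):
    PROTEIN = 1
    DNA = 2
    RNA = 3
    GENE = 4
    BIOSYSTEMS = 5


def check_panel_target(fields: dict) -> Optional[PanelTargetType]:
    """Validate the RESULT_PANEL_TARGET_* cells of one panel component.

    Returns the target type, or None if no target field is filled.
    """
    filled = {k: v.strip() for k, v in fields.items()
              if k.startswith("RESULT_PANEL_TARGET_") and v.strip()}
    if not filled:
        return None
    for required in ("RESULT_PANEL_TARGET_ID", "RESULT_PANEL_TARGET_TYPE"):
        if required not in filled:
            raise ValueError(f"{required} is required when any target field is present")
    # Raises ValueError for non-integers or codes outside 1..5.
    return PanelTargetType(int(filled["RESULT_PANEL_TARGET_TYPE"]))
```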

RESULT_PANEL_TAXONOMY (Optional)

NCBI Taxonomy-id (integer).

RESULT_PANEL_GENE (Optional)

NCBI Gene-id (integer).

RESULT_PANEL_ACT_OUTCOME_METHOD (Optional)

Assay outcome qualifier (integer). Choices include screening (1), confirmatory (2), summary (3), and other (0).
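These codes can be represented as an enumeration. This Python sketch accepts either the integer code or the category name and normalizes both to the same value; the names OutcomeMethod and parse_outcome_method are hypothetical, and accepting category names is a convenience of the sketch, not something this page says the upload format allows.

```python
from enum import IntEnum


class OutcomeMethod(IntEnum):
    """RESULT_PANEL_ACT_OUTCOME_METHOD codes listed above."""
    OTHER = 0
    SCREENING = 1
    CONFIRMATORY = 2
    SUMMARY = 3


def parse_outcome_method(cell: str) -> OutcomeMethod:
    """Accept an integer code ("2") or a category name ("confirmatory")."""
    text = cell.strip()
    try:
        if text.isdigit():
            return OutcomeMethod(int(text))
        return OutcomeMethod[text.upper()]
    except (KeyError, ValueError):
        raise ValueError(f"invalid outcome method: {cell!r}") from None


if __name__ == "__main__":
    print(int(parse_outcome_method("confirmatory")))  # 2
```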