Steps for downloading:
1) Perform a search in PC-Substance or PC-Compound Entrez.
2) From the Tools buttons near the top of the Entrez results page, select
Structure Download (an icon representing a local disk).
3) This will take you to a format selection page. There are two menus, one to choose the data format:
Text ASN.1: ASN.1 is PubChem's native data format; this is the record as it exists in our database. The text flavor of ASN.1 is both computer and human readable (to some extent), but is not an industry-supported type of ASN.1 data - so ASN.1 parsing libraries other than NCBI's may not be able to read it. The ASN.1 specification for this data is available on the PubChem FTP site.
Binary ASN.1: This is industry-standard ASN.1 in binary (non-human readable) format. Any ASN.1 parsing library should be able to read this format. The ASN.1 specification for this data is available on the PubChem FTP site.
XML: This data is exactly equivalent to the ASN.1 data, but is in standard XML format. The XML schema for this data is available on the PubChem FTP site.
SDF: This data is in standard SDF format, converted from the original ASN.1. A full description of all the SD tags used to present PubChem records is available on the PubChem FTP site.
Image / Small Image: Retrieves the images used in PubChem web pages, in large (currently 300x300) or small (100x100) size. The data is always returned in PNG format, stored as SID/CID-numbered files in a ZIP archive, regardless of
the compression selection below.
SMILES: Retrieves the isomeric SMILES description of the records. The format is a text file, where each line
contains SID/CID - [tab] - SMILES string.
InChI: Retrieves the InChI description of the records (see http://www.iupac.org/inchi). The format is a text file, where each line
contains SID/CID - [tab] - InChI string.
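The SMILES and InChI downloads described above share the same tab-separated layout, so one small reader can handle both. The following is a minimal sketch, assuming an uncompressed file whose lines each hold an integer ID, a tab, and the structure string; the function name parse_id_table and the sample lines are illustrative and are not part of PubChem.

```python
def parse_id_table(text):
    """Parse lines of the form '<ID><TAB><string>' into a dict keyed by integer ID.

    Hypothetical helper: assumes each non-blank line starts with an
    integer SID or CID, followed by a tab and a SMILES or InChI string.
    """
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        id_field, _, value = line.partition("\t")
        records[int(id_field)] = value.strip()
    return records


# Illustrative sample in the CID - [tab] - SMILES layout.
sample = "2244\tCC(=O)OC1=CC=CC=C1C(=O)O\n702\tCCO\n"
smiles_by_cid = parse_id_table(sample)
```

Keying the result by ID makes it easy to join the strings against other per-record data.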
A second menu lets you choose the compression for the resulting data file:
GZip: This is the default and recommended compression, as it is recognized by most modern decompression
applications. Information on GZip (.gz) is available at www.gzip.org.
BZip2: These files are slightly smaller, but the format is not as widely used and takes a little longer
to decompress. Information on BZip2 (.bz2) is available at www.bzip.org.
None: No compression.
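Once a file has been downloaded, the compression choice above determines how it must be opened. The sketch below uses only Python's standard gzip and bz2 modules and picks a decompressor from the file extension; it assumes a text format (text ASN.1, XML, SDF, SMILES, or InChI) and assumes the file name ends in .gz or .bz2 when compressed. The helper name open_download is hypothetical.

```python
import bz2
import gzip


def open_download(path):
    """Open a downloaded text file, choosing a decompressor by extension.

    Assumption: compressed files carry a .gz or .bz2 suffix; anything
    else is treated as uncompressed text.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


# Example use (the file name is a placeholder):
#     with open_download("results.sdf.gz") as handle:
#         for line in handle:
#             ...
```

Binary ASN.1 and the image ZIP archive are not text and would need to be opened in binary mode instead.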
If 3D coordinates for the records are desired, select the "Use 3D" checkbox. Substances with deposited 3D coordinates will always be returned in 3D. Any records that do not have 3D information will be omitted from the download. Note that this option affects only the following download types:
Substance images: When the deposited form of the substance has 3D coordinates supplied by the depositor, this will change the images to a 3D rendering of these coordinates.
Compound images: When the compound has computed 3D coordinates, this will change the images to a 3D rendering of these coordinates.
Compound records: When the compound has computed 3D coordinates and the requested format is one that includes coordinate information, this will change the coordinates to 3D. Multiple conformers of each CID may be requested, though some compounds may have fewer conformers available than the number requested.
4) Press the Download button to begin the download process. Because the records are being retrieved directly from the PubChem database, it is necessary to queue download requests in order to prevent server overload. You will see a series of self-refreshing pages during this process.
In particular, the Queue status shows what's happening:
Waiting: There are requests in the queue already, and this job is waiting for its turn.
Running: This job's turn has come, and the download file is being prepared.
Done: The request has been completed.
You do not have to keep your browser open on this page the entire time; you can bookmark this status page and come back to it later to check your request's progress, anytime within 24 hours of the initial request.
5) When the download is finished, your file should start transferring automatically.
You can also download by FTP from the given URL link - either directly through your browser or with any FTP client. Your file will remain on the FTP site for at least a week.
It is now possible to download directly without going through Entrez. Simply navigate to the download service URL
(https://pubchem.ncbi.nlm.nih.gov/pc_fetch), select a database, and supply a list of IDs. These should be SIDs
for PubChem Substance or CIDs for PubChem Compound, and one may either
enter them in the web page form or upload a local file of IDs.
The IDs must be integers, separated by any combination of whitespace, commas, or semicolons.
One may also choose from a list of prior Entrez searches, if available, but note that the history item selection
must match the database selection.
The rest of the download operation then proceeds as described above.
Note that these additional inputs will not appear in the web form when downloading from Entrez.
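When preparing a local file of IDs for upload, it can help to check that the file splits cleanly under the separator rule given above (any combination of whitespace, commas, or semicolons). The following is a minimal sketch of such a check; the function name parse_id_list is hypothetical, and the code only mirrors the stated rule rather than the service's own parser.

```python
import re


def parse_id_list(text):
    """Split a string of SIDs or CIDs on whitespace, commas, or semicolons.

    Raises ValueError if any token is not an integer, which flags a
    malformed ID before the list is submitted.
    """
    tokens = re.split(r"[\s,;]+", text.strip())
    return [int(token) for token in tokens if token]


# Mixed separators, as the download form is described as accepting.
ids = parse_id_list("2244, 702;\n  5793\t962")
```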
The Save Job button produces an XML data structure that may be used with PUG, or as a model for constructing PUG download requests.
See https://pubchem.ncbi.nlm.nih.gov/pug/pughelp.html for
more information on accessing PubChem through PUG.