Archived Sequence Data
Raw 16S rRNA data for each sample (before removing primers) are available on figshare at doi:10.25573/data.14686665. Raw ITS data for each sample (before removing primers) are available on figshare at doi:10.25573/data.14686755.
All trimmed sequence data (primers removed) are deposited at the European Nucleotide Archive (ENA) under the study accession number PRJEB45074 (ERP129199). The trimmed 16S rRNA data are deposited under sample accession numbers ERS6485270-ERS6485284. The trimmed ITS data are deposited under sample accession numbers ERS6485285-ERS6485299.
Pipeline Data
Data for each individual pipeline are available through the Smithsonian figshare under a single collection at doi:10.25573/data.c.5667571. In addition, data from each pipeline are available for download from figshare using the links at the bottom of each workflow page (where applicable).
Submitting Sequence Data
We submitted our data to the European Nucleotide Archive (ENA). The ENA does not like RAW data and prefers to have primers removed, so we submitted the trimmed fastq files. You can find these data under the study accession number PRJEB45074 (ERP129199). The RAW files are on our figshare site (see above).
To submit to the ENA you need two data tables (plus your sequence data). One file describes the samples and the other file describes the sequencing data.
You can download our submission data tables here:
Note, these forms are study specific, so please use these as guides only.
Instructions for Submitting to the ENA
Even though I have done this dozens of times, the process of submitting sequence data to read archives still baffles me.
Note: I submit data with primers, barcodes, etc. removed.
Register Project & Upload Sample Data
- Go to https://www.ebi.ac.uk/ena/submit and select Submit to ENA Interactively.
- Login or Register.
- Go to New Submission tab and, if this is a new project, select Register study (project).
- Hit Next
- Enter details and hit Submit.
- Next, Select Checklist. This will be specific to the type of samples you have and basically will create a template so you can add your sample metadata. For this study I chose GSC MIxS soil, checklist accession number ERC000022
- Next
- Now go through and select/deselect fields as needed. Note, some fields are mandatory.
- Once finished, hit Next to fill in any details that apply to all samples and then download the template. Alternatively, you can download the template and fill in the data by hand.
- Upload the sample sheet.
- Once everything looks good and uploaded, click the New Submission tab.
Upload Sequence Data
- Hit Skip and then select Two Fastq files (Paired). Download the template and fill in the details.
- Next, make sure all the trimmed fastq files are gzipped .gz (these are what you submit).
- Navigate to the directory with the trimmed, compressed fastq files and run:
md5sum *.gz
- Add the checksums and file names to the fastq submission form. You can read more about Preparing A File For Upload here.
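The compression and checksum steps above can be sketched as two small shell functions. This is only an illustration, not part of the original workflow: the function names `compress_fastq` and `make_checksum_table`, the `.fastq` extension, and the two-column tab-separated output are all assumptions, and the column layout is not the ENA template format. You still need to copy the values into the fastq submission form.

```shell
# Gzip every uncompressed fastq file in a directory.
# Assumption: uncompressed files end in .fastq
compress_fastq() {
    local dir="$1"
    local f
    for f in "$dir"/*.fastq; do
        [ -e "$f" ] || continue   # skip when the glob matches nothing
        gzip "$f"
    done
}

# Write "file name <TAB> MD5 checksum" for every .gz file in a directory.
# Assumption: the output file name and layout are illustrative only.
make_checksum_table() {
    local dir="$1"
    local out="$2"
    local f sum
    : > "$out"                    # start with an empty output file
    for f in "$dir"/*.gz; do
        [ -e "$f" ] || continue
        # md5sum prints "<checksum>  <path>"; keep only the checksum
        sum=$(md5sum "$f" | awk '{print $1}')
        printf '%s\t%s\n' "$(basename "$f")" "$sum" >> "$out"
    done
}
```

For example, `compress_fastq .` followed by `make_checksum_table . checksums.tsv` gives one line per file that can be pasted next to the matching file name in the form.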
STOP
Before uploading the sheet, you first must upload the fastq files. Documentation for Uploading Files To ENA can be found here.
- In the same directory, run:
lftp webin2.ebi.ac.uk -u Webin-XXXXX
where Webin-XXXXX is your user name. Enter your password, then at the prompt run:
mput *.gz
# when finished
bye
The files should begin uploading. Depending on internet speed and/or file sizes/numbers, this could take a while.
- Once the upload is finished, upload and submit the fastq submission form. If the sample alias field does not autofill, you may need to upload the sample form again.
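If you prefer not to type the upload commands at the lftp prompt, the same session can probably be run in one line using lftp's `-e` option, which executes the given commands after connecting. The sketch below only prints the command so you can inspect it before running it yourself. The helper name `build_upload_command` is hypothetical, and the host and Webin-XXXXX placeholder are taken from the steps above.

```shell
# Print (do not run) a one-shot lftp command that uploads all .gz files
# in the current directory and then closes the connection.
# Assumption: the helper name is hypothetical; the host comes from the steps above.
build_upload_command() {
    local user="$1"
    local host="${2:-webin2.ebi.ac.uk}"
    # The glob stays quoted so that lftp, not the shell, expands *.gz
    printf 'lftp -u %s -e "mput *.gz; bye" %s\n' "$user" "$host"
}
```

Running `build_upload_command Webin-XXXXX` prints the command. lftp should still prompt for your password when the printed command is run, because only the user name is passed to `-u`.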
Source Code
The source code for this page can be accessed on GitHub by clicking this link.
Last updated on 2022-06-29 07:31:27 EST