SWELTR - Data Availability

Archived Sequence Data

Raw 16S rRNA data for each sample (before removing primers) is available on figshare at doi:10.25573/data.14686665. Raw ITS data for each sample (before removing primers) is available on figshare at doi:10.25573/data.14686755.

All trimmed sequence data (primers removed) is deposited at the European Nucleotide Archive (ENA) under the study accession number PRJEB45074 (ERP129199). The trimmed 16S rRNA data (primers removed) are deposited under sample accession numbers ERS6485270-ERS6485284. The trimmed ITS data (primers removed) are deposited under sample accession numbers ERS6485285-ERS6485299.

Pipeline Data

Data for each individual pipeline are available through the Smithsonian figshare under a single collection at doi:10.25573/data.c.5667571. In addition, data from each pipeline are available for download from figshare using the links at the bottom of each workflow page (where applicable).

Show descriptions and DOI’s for each workflow

Scott, Jarrod (2022): (high-temp/16s) No 0. Raw 16S rRNA fastq data files for the high-temp SWELTR study. Smithsonian Tropical Research Institute. Dataset. https://doi.org/10.25573/data.14686665.v1

Scott, Jarrod (2022): (high-temp/ITS) No 0. Raw ITS fastq data files for the high-temp SWELTR study. Smithsonian Tropical Research Institute. Dataset. https://doi.org/10.25573/data.14686755.v1

Scott, Jarrod (2022): (high-temp) No 1. DADA2 Workflow (16S rRNA/ITS) Output. Smithsonian Tropical Research Institute. Dataset. https://doi.org/10.25573/data.14687184.v1

Scott, Jarrod (2022): (high-temp) No 2. Data Preparation (16S rRNA/ITS) Output. Smithsonian Tropical Research Institute. Dataset. https://doi.org/10.25573/data.14690739.v1

Scott, Jarrod (2022): (high-temp) No 3. Filtering: (16S rRNA/ITS) Output. Smithsonian Tropical Research Institute. Dataset. https://doi.org/10.25573/data.14701440.v1

Scott, Jarrod (2022): (high-temp) No 4. Taxonomic diversity: (16S rRNA/ITS) Output. Smithsonian Tropical Research Institute. Dataset. https://doi.org/10.25573/data.16826737.v1

Scott, Jarrod (2022): (high-temp) No 5. Aplha diversity (16S rRNA/ITS) Output. Smithsonian Tropical Research Institute. Dataset. https://doi.org/10.25573/data.16826779.v1

Scott, Jarrod (2022): (high-temp) No 6. Beta diversity (16S rRNA/ITS) Output. Smithsonian Tropical Research Institute. Dataset. https://doi.org/10.25573/data.16828063.v1

Scott, Jarrod (2022): (high-temp) No 7. Differentially Abundant ASVs & Taxa (16S rRNA/ITS) Output. Smithsonian Tropical Research Institute. Dataset. https://doi.org/10.25573/data.16828249.v1

Scott, Jarrod (2022): (high-temp) No 8. Metadata Analysis (16S rRNA/ITS) Output. Smithsonian Tropical Research Institute. Dataset. https://doi.org/10.25573/data.16828294.v1

Submitting Sequence Data

We submitted out data to the European Nucleotide Archive (ENA). The ENA does not like RAW data and prefers to have primers removed. So we submitted the trimmed Fastq files to the ENA. You can find these data under the study accession number PRJEB45074 (ERP129199). The RAW files on our figshare site (see above).

To submit to the ENA you need two data tables (plus your sequence data). One file describes the samples and the other file describes the sequencing data.

You can download our submission data tables here:

Note, these forms are study specific, so please use these as guides only.

Instructions for Submitting to the ENA

Even though I have done this dozens of times, the process of submitting sequence data to read archives still baffles me.

Note: I submit data with primers, barcodes, etc removed

Register Project & Upload Sample Data

go to https://www.ebi.ac.uk/ena/submit and select Submit to ENA Interactively.
Login or Register.
Go to New Submission tab and, if this is a new project, select Register study (project).
Hit Next
Enter details and hit Submit.
Next, Select Checklist. This will be specific to the type of samples you have and basically will create a template so you can add your sample metadata. For this study I chose GSC MIxS soil, checklist accession number ERC000022
Next
Now go through and select/deselect fields as needed. Note, some fields are mandatory.
Once finished, hit Next to fill in any details that apply to All samples and the download the template. Alternatively, you can download the template and fill in the data by hand.
Upload the sample sheet.
Once everything looks good and uploaded, click the New Submission tab.

Upload Sequence Data

Hit Skip and then select Two Fastq files (Paired), Download the template and fill in the details.
Next, make sure all the trimmed fastq files are gzipped .gz (these are what you submit).
Navigate to the directory with the trimmed, compressed fastq files and run:

md5sum *.gz

Add the checksums and file names to the fastq submission form. You can read more about Preparing A File For Upload here.

STOP

Before uploading the sheet, you first must upload the fastq files. Documentation for Uploading Files To ENA can be found here.

In the same directory, run:

lftp webin2.ebi.ac.uk -u Webin-XXXXX

where Webin-XXXXX is your user name. Enter your password

at the prompt run:

mput *.gz
# when finished
bye

The files should begin uploading. Depending on internet speed and/or file sizes/numbers, this could take a while.

Once the upload is finished, upload and submit the fastq submission form. If the sample alias field dows not autofill you may need to upload the sample form again.

Source Code

The source code for this page can be accessed on GitHub by clicking this link.

Last updated on

[1] "2022-06-29 07:31:27 EST"

--- title: "Data Availability" description: | On this page, you can find information on obtaining sequence data, data products, and processing scripts. Code is embedded in indivudual workflows on the website or can be accessed through the project GitHub repo. --- ## Archived Sequence Data Raw **16S rRNA** data for each sample (before removing primers) is available on figshare at [doi:10.25573/data.14686665](https://doi.org/10.25573/data.14686665). Raw **ITS** data for each sample (before removing primers) is available on figshare at [doi:10.25573/data.14686755](https://doi.org/10.25573/data.14686755). All trimmed sequence data (primers removed) is deposited at the European Nucleotide Archive (ENA) under the study accession number [PRJEB45074 (ERP129199)](https://www.ebi.ac.uk/ena/browser/view/PRJEB45074). The trimmed **16S rRNA** data (primers removed) are deposited under sample accession numbers [ERS6485270-ERS6485284](https://www.ebi.ac.uk/ena/browser/view/ERS6485270-ERS6485284). The trimmed **ITS** data (primers removed) are deposited under sample accession numbers [ERS6485285-ERS6485299](https://www.ebi.ac.uk/ena/browser/view/ERS6485285-ERS6485299). ## Pipeline Data Data for each individual pipeline are available through the Smithsonian figshare under a single collection at [doi:10.25573/data.c.5667571](https://doi.org/10.25573/data.c.5667571). In addition, data from each pipeline are available for download from figshare using the links at the bottom of each workflow page (where applicable). </br> :::{.callout-note collapse="true" icon=false} ## **Show** descriptions and DOI's for each workflow Scott, Jarrod (2022): (high-temp/16s) No 0. Raw 16S rRNA fastq data files for the high-temp SWELTR study. Smithsonian Tropical Research Institute. Dataset. [https://doi.org/10.25573/data.14686665.v1](https://doi.org/10.25573/data.14686665.v1) Scott, Jarrod (2022): (high-temp/ITS) No 0. Raw ITS fastq data files for the high-temp SWELTR study. Smithsonian Tropical Research Institute. Dataset. [https://doi.org/10.25573/data.14686755.v1](https://doi.org/10.25573/data.14686755.v1) Scott, Jarrod (2022): (high-temp) No 1. [DADA2 Workflow (16S rRNA/ITS)](https://sweltr.github.io/high-temp/dada2.html) Output. Smithsonian Tropical Research Institute. Dataset. [https://doi.org/10.25573/data.14687184.v1](https://doi.org/10.25573/data.14687184.v1) Scott, Jarrod (2022): (high-temp) No 2. [Data Preparation](https://sweltr.github.io/high-temp/data-prep.html) (16S rRNA/ITS) Output. Smithsonian Tropical Research Institute. Dataset. [https://doi.org/10.25573/data.14690739.v1](https://doi.org/10.25573/data.14690739.v1) Scott, Jarrod (2022): (high-temp) No 3. [Filtering](https://sweltr.github.io/high-temp/filtering.html): (16S rRNA/ITS) Output. Smithsonian Tropical Research Institute. Dataset. [https://doi.org/10.25573/data.14701440.v1](https://doi.org/10.25573/data.14701440.v1) Scott, Jarrod (2022): (high-temp) No 4. [Taxonomic diversity](https://sweltr.github.io/high-temp/taxa.html): (16S rRNA/ITS) Output. Smithsonian Tropical Research Institute. Dataset. [https://doi.org/10.25573/data.16826737.v1](https://doi.org/10.25573/data.16826737.v1) Scott, Jarrod (2022): (high-temp) No 5. [Aplha diversity](https://sweltr.github.io/high-temp/alpha.html) (16S rRNA/ITS) Output. Smithsonian Tropical Research Institute. Dataset. [https://doi.org/10.25573/data.16826779.v1](https://doi.org/10.25573/data.16826779.v1) Scott, Jarrod (2022): (high-temp) No 6. [Beta diversity](https://sweltr.github.io/high-temp/beta.html) (16S rRNA/ITS) Output. Smithsonian Tropical Research Institute. Dataset. [https://doi.org/10.25573/data.16828063.v1](https://doi.org/10.25573/data.16828063.v1) Scott, Jarrod (2022): (high-temp) No 7. [Differentially Abundant ASVs & Taxa](https://sweltr.github.io/high-temp/da.html) (16S rRNA/ITS) Output. Smithsonian Tropical Research Institute. Dataset. [https://doi.org/10.25573/data.16828249.v1](https://doi.org/10.25573/data.16828249.v1) Scott, Jarrod (2022): (high-temp) No 8. [Metadata Analysis](https://sweltr.github.io/high-temp/metadata.html) (16S rRNA/ITS) Output. Smithsonian Tropical Research Institute. Dataset. [https://doi.org/10.25573/data.16828294.v1](https://doi.org/10.25573/data.16828294.v1) ::: ## Submitting Sequence Data We submitted out data to the [European Nucleotide Archive (ENA)](https://www.ebi.ac.uk/ena). The ENA does not like RAW data and prefers to have primers removed. So we submitted the trimmed Fastq files to the ENA. You can find these data under the study accession number **PRJEB45074 (ERP129199)**. The RAW files on our figshare site (see above). To submit to the ENA you need two data tables (plus your sequence data). One file describes the samples and the other file describes the sequencing data. You can download our submission data tables here: * [Sample submission form](include/data-availability/ENA_Sample_SUBMISSION_SWELTR-high-temp.tsv) * [FASTq submission form](include/data-availability/ENA_PAIRED_FASTQ_SUBMISSION_SWELTR-high-temp.tsv) Note, these forms are study specific, so please use these as guides only. ### Instructions for Submitting to the ENA Even though I have done this dozens of times, the process of submitting sequence data to read archives still baffles me. > Note: I submit data with primers, barcodes, etc **removed** #### Register Project & Upload Sample Data 1) go to https://www.ebi.ac.uk/ena/submit and select **Submit to ENA Interactively**. 2) Login or Register. 3) Go to **New Submission** tab and, if this is a new project, select **Register study (project)**. 4) Hit Next 5) Enter details and hit **Submit**. 6) Next, [Select Checklist](https://www.ebi.ac.uk/ena/submit/checklists). This will be specific to the type of samples you have and basically will create a template so you can add your sample metadata. For this study I chose **GSC MIxS soil**, checklist accession number [ERC000022](https://www.ebi.ac.uk/ena/data/view/ERC000022) 7) Next 8) Now go through and select/deselect fields as needed. Note, some fields are mandatory. 9) Once finished, hit **Next** to fill in any details that apply to *All* samples and the download the template. Alternatively, you can download the template and fill in the data by hand. 10) Upload the sample sheet. 11) Once everything looks good and uploaded, click the **New Submission** tab. #### Upload Sequence Data 12) Hit **Skip** and then select **Two Fastq files (Paired)**, Download the template and fill in the details. 13) Next, make sure all the trimmed fastq files are gzipped **.gz** (these are what you submit). 14) Navigate to the directory with the trimmed, compressed fastq files and run: ``` md5sum *.gz ``` 16) Add the checksums and file names to the fastq submission form. You can read more about [Preparing A File For Upload here](https://ena-docs.readthedocs.io/en/latest/submit/fileprep/preparation.html). **STOP** Before uploading the sheet, you first must upload the fastq files. Documentation for Uploading Files To ENA can be found [here](https://ena-docs.readthedocs.io/en/latest/submit/fileprep/upload.html). 17) In the same directory, run: ``` lftp webin2.ebi.ac.uk -u Webin-XXXXX ``` where `Webin-XXXXX` is your user name. Enter your password at the prompt run: ``` mput *.gz # when finished bye ``` The files should begin uploading. Depending on internet speed and/or file sizes/numbers, this could take a while. 18) Once the upload is finished, upload and submit the fastq submission form. If the `sample alias` field dows not autofill you may need to upload the sample form again. ```{r} #| message: false #| results: hide #| eval: true #| echo: false remove(list = ls()) ### COmmon formatting scripts ### NOTE: captioner.R must be read BEFORE captions_XYZ.R source(file.path("assets", "functions.R")) ``` #### Source Code {.appendix} The source code for this page can be accessed on GitHub `r fa(name = "github")` by [clicking this link](`r source_code()`). #### Last updated on {.appendix} ```{r,echo=FALSE, eval=TRUE} Sys.time() ```