
Table 1 The details of the real datasets investigated by MIOSTONE

From: Modeling microbiome-trait associations with taxonomy-adaptive neural networks

| Dataset | Sample size | Feature size | Data source | Notes |
| --- | --- | --- | --- | --- |
| AlzBiom | 175 samples (75 amyloid-positive and 100 healthy control) | 8350 taxa | EBI-ENA ID: PRJEB47976 | The sequencing data are clean (quality-controlled and host-filtered). |
| ASD | 60 samples (30 typically developing and 30 constipated ASD) | 7287 taxa | EBI-ENA ID: PRJNA451479 | The sequencing data are raw (not quality-controlled). |
| GD | 162 samples (100 Graves’ disease and 62 healthy control) | 8487 taxa | EBI-ENA IDs: PRJNA602729, PRJNA602731, PRJNA602732, PRJNA638403, PRJNA638404, PRJNA638405 | Most samples have a pair of FASTQ files. Four samples (listed in three.lst) also have a third, very small unpaired FASTQ file, which should be excluded from the analysis. Twelve samples have only one FASTQ file, which appears to contain single-end reads. Two samples, GA61 (SRR12000211) and GA89 (SRR12005695), are missing from the metadata and were therefore dropped. |
| RUMC | 114 samples (42 Parkinson’s disease and 72 healthy control) | 7256 taxa | Qiita ID: 12975 | 20 samples present in the BIOM table are missing from the metadata and were dropped. |
| TBC | 113 samples (46 Parkinson’s disease and 67 healthy control) | 6227 taxa | Qiita ID: 14476 | 5 samples present in the BIOM table are missing from the metadata and were dropped. |
| IBD | 174 samples (108 Crohn’s disease and 66 ulcerative colitis) | 5287 taxa | Qiita ID: 12675 | The dataset contains metagenomic sequencing data and associated metadata. More details can be found at: https://qiita.ucsd.edu/study/description/12675 |
| HMP2 | 1158 samples (728 Crohn’s disease and 430 ulcerative colitis) | 10614 taxa | Qiita ID: 11484 | The dataset contains metagenomic sequencing data and associated metadata from the Human Microbiome Project. More details can be found at: https://hmpdacc.org/ihmp |
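
The notes for GD, RUMC and TBC describe dropping samples that appear in the sequencing or BIOM data but have no metadata entry. The following is a minimal sketch of that filtering step, not the authors' actual pipeline. The function name, the dictionary-based data layout, and the toy sample IDs and counts are all hypothetical and chosen only for illustration.

```python
def drop_samples_missing_metadata(abundance, metadata):
    """Keep only samples that have a metadata entry.

    abundance: dict mapping sample ID -> dict of taxon -> count (hypothetical layout)
    metadata:  dict mapping sample ID -> phenotype label (hypothetical layout)
    Returns (filtered abundance dict, sorted list of dropped sample IDs).
    """
    kept = {sid: row for sid, row in abundance.items() if sid in metadata}
    dropped = sorted(sid for sid in abundance if sid not in metadata)
    return kept, dropped


# Toy data: invented IDs and counts, not taken from any dataset in Table 1.
abundance = {
    "S1": {"taxon_a": 10, "taxon_b": 0},
    "S2": {"taxon_a": 3, "taxon_b": 7},
    "S3": {"taxon_a": 0, "taxon_b": 5},
}
metadata = {"S1": "case", "S3": "control"}

kept, dropped = drop_samples_missing_metadata(abundance, metadata)
print("kept:", sorted(kept))
print("dropped:", dropped)
```

Returning the list of dropped IDs alongside the filtered table makes it easy to confirm that the number of removed samples matches the count reported in the notes (for example, 20 for RUMC and 5 for TBC).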