From: Modeling microbiome-trait associations with taxonomy-adaptive neural networks
Dataset | Sample size | Feature size | Data source | Notes |
---|---|---|---|---|
AlzBiom | 175 samples (75 amyloid-positive and 100 healthy control) | 8350 taxa | EBI-ENA ID: PRJEB47976 | The sequencing data are clean (quality-controlled and host-filtered) |
ASD | 60 samples (30 typically developing and 30 constipated ASD) | 7287 taxa | EBI-ENA ID: PRJNA451479 | The sequencing data is raw (non-QC’ed) |
GD | 162 samples (100 Graves’ disease and 62 healthy control) | 8487 taxa | EBI-ENA ID: PRJNA602729, PRJNA602731, PRJNA602732, PRJNA638403, PRJNA638404, PRJNA638405 | Most samples have a pair of FASTQ files. Four samples (three.lst) have a third, very small, unpaired FASTQ file, which should be excluded from the analysis. Twelve samples have only one FASTQ file, which appears to contain single-end reads. Two samples, GA61 (SRR12000211) and GA89 (SRR12005695), are missing from the metadata and were therefore dropped |
RUMC | 114 samples (42 Parkinson’s disease and 72 healthy control) | 7256 taxa | Qiita ID: 12975 | 20 samples in the BIOM file are missing from the metadata and were dropped |
TBC | 113 samples (46 Parkinson’s disease and 67 healthy control) | 6227 taxa | Qiita ID: 14476 | 5 samples in the BIOM file are missing from the metadata and were dropped |
IBD | 174 samples (108 Crohn’s disease and 66 ulcerative colitis) | 5287 taxa | Qiita ID: 12675 | The dataset contains metagenomic sequencing data and associated metadata. More details can be found at: https://qiita.ucsd.edu/study/description/12675 |
HMP2 | 1158 samples (728 Crohn’s disease and 430 ulcerative colitis) | 10614 taxa | Qiita ID: 11484 | The dataset contains metagenomic sequencing data and associated metadata from the Human Microbiome Project. More details can be found at: https://hmpdacc.org/ihmp |
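
Several notes above describe dropping samples that appear in the abundance data but have no metadata entry. The following is a minimal sketch of that filtering step, not the authors' actual pipeline. The function name `drop_samples_without_metadata`, the dictionary-based data layout, and the sample IDs are all hypothetical and chosen only for illustration.

```python
def drop_samples_without_metadata(abundance, metadata):
    """Keep only samples that also have a metadata entry.

    abundance: dict mapping sample ID -> {taxon: count} (hypothetical layout)
    metadata:  dict mapping sample ID -> phenotype label (hypothetical layout)
    Returns (kept_abundance, dropped_ids).
    """
    kept = {sid: taxa for sid, taxa in abundance.items() if sid in metadata}
    dropped = sorted(set(abundance) - set(metadata))
    return kept, dropped


# Toy data with made-up sample IDs; "S2" has no metadata entry.
abundance = {
    "S1": {"taxon_a": 10, "taxon_b": 0},
    "S2": {"taxon_a": 3, "taxon_b": 7},
    "S3": {"taxon_a": 0, "taxon_b": 5},
}
metadata = {"S1": "case", "S3": "control"}

kept, dropped = drop_samples_without_metadata(abundance, metadata)
print(sorted(kept))  # sample IDs retained for analysis
print(dropped)       # sample IDs removed for lack of metadata
```

Returning the dropped IDs alongside the filtered data makes it easy to report how many samples were removed, as the notes for RUMC, TBC, and GD do.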