Data Sources

One of the more challenging aspects of working with genomics data is finding real data to work with. This page is a directory of places where you can obtain both open access data and protected access data (which requires an application). Many sites offer both kinds of data, so they appear in both lists.

caution

This list is by no means complete! New data sources are being added all the time, and we have only added a few to start this page off. Please open a pull request with any data sources you are aware of so we can incorporate them.

Open Access Data

| Data Source Name | Types of Data | Description | Data Source URL |
| --- | --- | --- | --- |
| 1000 Genomes Project | WGS, Variant | One of the first large-scale, international sequencing efforts, running from 2008 to 2015. You can learn more at the 1000genomes.org website. | Main Page, FTP |
| Kids First Data Resource Commons | Expression Counts | Learn more. | Link |
| St. Jude Cloud | Expression Counts | St. Jude Cloud hosts pediatric disease and cancer survivorship data for over 17,500 subjects. Learn more by reading the publication associated with St. Jude Cloud. | Link |
| Refine.bio | RNA-Seq and others | A search engine for normalized transcriptome data, created by Alex's Lemonade Stand Foundation. | Link |

Protected Access Data

| Data Source Name | Types of Data | Description | Data Source URL |
| --- | --- | --- | --- |
| International Cancer Genome Consortium | WGS, WES, RNA-Seq, Variant | Learn more. | Link |
| Kids First Data Resource Commons | WGS, WES, RNA-Seq, Variant | Learn more. | Link |
| St. Jude Cloud | WGS, WES, RNA-Seq, Variant | St. Jude Cloud hosts pediatric disease and cancer survivorship data for over 17,500 subjects. Learn more by reading the publication associated with St. Jude Cloud. | Link |