HiChIP Data Sets

To download one of the data sets, simply use the wget command:

wget https://s3.amazonaws.com/dovetail.pub/HiChIP/fastqs/HiChiP_CTCF_2M_R1.fastq.gz
wget https://s3.amazonaws.com/dovetail.pub/HiChIP/fastqs/HiChiP_CTCF_2M_R2.fastq.gz

For testing purposes, we recommend using the 2M reads data set; for any other purpose, we recommend using the 800M reads data set.

Sequenced (human) libraries:

Library	Link
GM12878 CTCF 2M	https://s3.amazonaws.com/dovetail.pub/HiChIP/fastqs/HiChiP_CTCF_2M_R1.fastq.gz https://s3.amazonaws.com/dovetail.pub/HiChIP/fastqs/HiChiP_CTCF_2M_R2.fastq.gz
GM12878 CTCF (deep sequencing)	https://s3.amazonaws.com/dovetail.pub/HiChIP/fastqs/CTCF-DS_R1.fastq.gz https://s3.amazonaws.com/dovetail.pub/HiChIP/fastqs/CTCF-DS_R2.fastq.gz
GM12878 H3K27Ac (deep sequencing)	https://s3.amazonaws.com/dovetail.pub/HiChIP/fastqs/H3K27Ac_R1.fastq.gz https://s3.amazonaws.com/dovetail.pub/HiChIP/fastqs/H3K27Ac_R2.fastq.gz
GM12878 H3K4me3 (deep sequencing)	https://s3.amazonaws.com/dovetail.pub/HiChIP/fastqs/H3K4me3_R1.fastq.gz https://s3.amazonaws.com/dovetail.pub/HiChIP/fastqs/H3K4me3_R2.fastq.gz
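
Every library in the table above is a pair of files (R1 and R2) that share a base URL and a file prefix. The following is a minimal sketch of fetching both files of a pair in one call, assuming that naming pattern holds for the library you choose. The function names `fastq_urls` and `download_library` and the `DRY_RUN` variable are hypothetical helpers introduced for illustration; they are not part of any Dovetail tool.

```shell
# Base URL copied from the links in the table above.
BASE_URL="https://s3.amazonaws.com/dovetail.pub/HiChIP/fastqs"

# Print the R1 and R2 URLs for a library prefix (e.g. HiChiP_CTCF_2M).
fastq_urls() {
    local prefix="$1"
    local mate
    for mate in R1 R2; do
        printf '%s/%s_%s.fastq.gz\n' "$BASE_URL" "$prefix" "$mate"
    done
}

# Download both files of a pair. wget -c resumes a partial download.
# Set DRY_RUN=1 to print the URLs without downloading anything.
download_library() {
    local url
    fastq_urls "$1" | while read -r url; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$url"
        else
            wget -c "$url"
        fi
    done
}

# Example (requires network access):
# download_library HiChiP_CTCF_2M
```

Running with `DRY_RUN=1` first is a cheap way to check the URLs before starting a large download.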

Peak files from the ENCODE project (Human, hg38)

Sample	Target	Accession	URL	Output type
GM12878	CTCF	ENCFF017XLW	https://www.encodeproject.org/files/ENCFF017XLW/@@download/ENCFF017XLW.bed.gz	conservative IDR thresholded peaks
IMR-90	H3K4ac	ENCFF823NUO	https://www.encodeproject.org/files/ENCFF823NUO/@@download/ENCFF823NUO.bed.gz	replicated peaks
GM12878	H3K4me3	ENCFF188SZS	https://www.encodeproject.org/files/ENCFF188SZS/@@download/ENCFF188SZS.bed.gz	replicated peaks
IMR-90	H3K14ac	ENCFF106EAN	https://www.encodeproject.org/files/ENCFF106EAN/@@download/ENCFF106EAN.bed.gz	replicated peaks
GM12878	H3K27ac	ENCFF367KIF	https://www.encodeproject.org/files/ENCFF367KIF/@@download/ENCFF367KIF.bed.gz	replicated peaks
GM12878	H3K27me3	ENCFF153VOQ	https://www.encodeproject.org/files/ENCFF153VOQ/@@download/ENCFF153VOQ.bed.gz	replicated peaks
GM12878	H3K36me3	ENCFF268HMO	https://www.encodeproject.org/files/ENCFF268HMO/@@download/ENCFF268HMO.bed.gz	replicated peaks
GM12878	SMC3	ENCFF534PUK	https://www.encodeproject.org/files/ENCFF534PUK/@@download/ENCFF534PUK.bed.gz	bed
MCF-7	Klf4	ENCFF287QDZ	https://www.encodeproject.org/files/ENCFF287QDZ/@@download/ENCFF287QDZ.bed.gz	conservative IDR thresholded peaks
GM23338	Nanog	ENCFF897LBK	https://www.encodeproject.org/files/ENCFF897LBK/@@download/ENCFF897LBK.bed.gz	conservative IDR thresholded peaks
GM12878	POLR2A	ENCFF794VYB	https://www.encodeproject.org/files/ENCFF794VYB/@@download/ENCFF794VYB.bed.gz	conservative IDR thresholded peaks
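
The URLs in the table above all follow the same pattern, with the accession appearing twice. As a sketch under that assumption, a peak file can be fetched from its accession alone. The function name `encode_bed_url` is a hypothetical helper introduced for illustration.

```shell
# Build the download URL for an ENCODE bed.gz file from its accession,
# following the pattern used by the URLs in the table above.
encode_bed_url() {
    local acc="$1"
    printf 'https://www.encodeproject.org/files/%s/@@download/%s.bed.gz\n' "$acc" "$acc"
}

# Example (requires network access):
# wget -c "$(encode_bed_url ENCFF017XLW)"
# gunzip -c ENCFF017XLW.bed.gz | head -n 3   # peek at the first peaks
```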

Data used for HiChIP Comparative Analysis (Mouse, mm10)

To get a list of all the files generated from the HiChIP Comparative Analysis tutorial, including the required reference genomes, you can use the command:

aws s3 ls s3://dovetail.pub/HiChIP/compare_samples/

Use wget to download any given file, replacing “s3://” with “https://s3.amazonaws.com/”, followed by the remaining path to the file. For example:

wget https://s3.amazonaws.com/dovetail.pub/HiChIP/compare_samples/Reference_Genome/mm10.fa
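
The prefix replacement described above can be scripted. This is a minimal sketch; `s3_to_https` is a hypothetical helper name, and the comment about anonymous listing assumes the bucket allows unauthenticated access.

```shell
# Convert an s3:// URI into the matching https://s3.amazonaws.com/ URL.
s3_to_https() {
    local uri="$1"
    case "$uri" in
        s3://*)
            # Strip the s3:// prefix and prepend the HTTPS endpoint.
            printf 'https://s3.amazonaws.com/%s\n' "${uri#s3://}"
            ;;
        *)
            echo "not an s3:// URI: $uri" >&2
            return 1
            ;;
    esac
}

# Example (requires network access):
# wget -c "$(s3_to_https s3://dovetail.pub/HiChIP/compare_samples/Reference_Genome/mm10.fa)"
#
# If the AWS CLI has no credentials configured, listing a public bucket
# may require the --no-sign-request option:
# aws s3 ls --no-sign-request s3://dovetail.pub/HiChIP/compare_samples/
```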

Data Set	Link
Fastqs (Sample A)	https://s3.amazonaws.com/dovetail.pub/HiChIP/compare_samples/fastq_inputs/sampleA_R1.fastq.gz https://s3.amazonaws.com/dovetail.pub/HiChIP/compare_samples/fastq_inputs/sampleA_R2.fastq.gz
Fastqs (Sample B)	https://s3.amazonaws.com/dovetail.pub/HiChIP/compare_samples/fastq_inputs/sampleB_R1.fastq.gz https://s3.amazonaws.com/dovetail.pub/HiChIP/compare_samples/fastq_inputs/sampleB_R2.fastq.gz

Note: The full data set, including input files and generated output, is ~183 Gb (roughly 5 h with a network speed of 10 Mb/s).
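
The time estimate in the note can be reproduced, or adapted to your own connection, with a short calculation. This sketch assumes the size and the speed are given in matching units (gigabytes with megabytes per second, or gigabits with megabits per second) and that 1 G = 1000 M. `estimate_hours` is a hypothetical helper name.

```shell
# Estimate download time in hours.
# $1 = data size in G units, $2 = network speed in M units per second.
# Assumes 1 G = 1000 M and a constant transfer rate.
estimate_hours() {
    awk -v size="$1" -v speed="$2" \
        'BEGIN { printf "%.1f\n", size * 1000 / speed / 3600 }'
}

estimate_hours 183 10   # prints 5.1
```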