Introduction

ChIP-exo is a technology used in molecular biology to demarcate protein binding locations on the genome. Conceptually it is similar to ChIP-seq, but it has higher resolution. It has the following key steps:

  1. Crosslink protein to target DNA
  2. Fragmentation
  3. Chromatin immunoprecipitation (ChIP) pull-down
  4. Treatment with lambda exonuclease, which digests double-stranded DNA in the 5′-3′ direction
  5. High-throughput sequencing

Installation

Prerequisite:

Install procedure:

tar zxf ChEAP-VERSION.tar.gz
cd ChEAP-VERSION
python setup.py install  # installs ChEAP system-wide
python setup.py install --root=/home/user/ChEAP  # alternatively, installs ChEAP at user level
# for the user-level install, add the install location to the search paths
export PYTHONPATH=/home/user/ChEAP/usr/local/lib/python2.7/site-packages:$PYTHONPATH
export PATH=/home/user/ChEAP/usr/local/bin:$PATH

Walkthrough example of ChIP-exo data analysis

Step1: Download the ChIP-exo fastq files (accession number SRA044886):

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR346/SRR346401/SRR346401.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR346/SRR346402/SRR346402.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR346/SRR346403/SRR346403.fastq.gz
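
The downloaded files are gzip-compressed, while the bowtie commands in the next step name plain .fastq files. A minimal sketch of decompressing them with the Python standard library, equivalent to running gunzip on each file. The helper name gunzip_file is illustrative and not part of ChEAP:

```python
import gzip
import os
import shutil


def gunzip_file(gz_path):
    """Decompress gz_path next to the original and return the new path."""
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz file: %s" % gz_path)
    out_path = gz_path[:-3]  # strip the ".gz" suffix
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


if __name__ == "__main__":
    for run in ("SRR346401", "SRR346402", "SRR346403"):
        gz = run + ".fastq.gz"
        if os.path.exists(gz):  # skip runs that were not downloaded
            print(gunzip_file(gz))
```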

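Step2: Alignment (fastq -> sam). Decompress the downloaded files first (for example with gunzip), because the commands below read uncompressed .fastq files. Then use bowtie to map the reads to the genome. The color-space human genome index is required (download from here):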

bowtie -S -C -q -m 1 PATH/bowtie/indexes/colorspace/hg19c SRR346401.fastq Hg19_CTCF_1.sam
bowtie -S -C -q -m 1 PATH/bowtie/indexes/colorspace/hg19c SRR346402.fastq Hg19_CTCF_2.sam
bowtie -S -C -q -m 1 PATH/bowtie/indexes/colorspace/hg19c SRR346403.fastq Hg19_CTCF_3.sam
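
The three bowtie commands differ only in the run accession and the replicate number, so they can be generated in a loop. The sketch below builds the same argument lists and annotates each bowtie (version 1) flag. The function names and the dry_run switch are illustrative, and PATH/... is the placeholder from the commands above, not a real path:

```python
import subprocess

REPLICATES = {1: "SRR346401", 2: "SRR346402", 3: "SRR346403"}


def bowtie_command(index_prefix, fastq, sam_out):
    """Build the bowtie (version 1) argument list used in Step 2."""
    return [
        "bowtie",
        "-S",        # write alignments in SAM format
        "-C",        # reads and index are in color space
        "-q",        # input reads are in FASTQ format
        "-m", "1",   # suppress reads with more than one reportable alignment
        index_prefix,
        fastq,
        sam_out,
    ]


def align_all(index_prefix, dry_run=True):
    """Print (dry_run=True) or execute the alignment for every replicate."""
    for rep, run in sorted(REPLICATES.items()):
        cmd = bowtie_command(index_prefix, run + ".fastq", "Hg19_CTCF_%d.sam" % rep)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)


if __name__ == "__main__":
    align_all("PATH/bowtie/indexes/colorspace/hg19c")
```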

Step3: Convert SAM into BAM using samtools, then sort and index the BAM file. If your aligner already produces a sorted BAM file, only the index step is needed:

samtools view -bS Hg19_CTCF_1.sam > Hg19_CTCF_1.bam
samtools sort Hg19_CTCF_1.bam Hg19_CTCF_1.sorted
samtools index Hg19_CTCF_1.sorted.bam

samtools view -bS Hg19_CTCF_2.sam > Hg19_CTCF_2.bam
samtools sort Hg19_CTCF_2.bam Hg19_CTCF_2.sorted
samtools index Hg19_CTCF_2.sorted.bam

samtools view -bS Hg19_CTCF_3.sam > Hg19_CTCF_3.bam
samtools sort Hg19_CTCF_3.bam Hg19_CTCF_3.sorted
samtools index Hg19_CTCF_3.sorted.bam
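
The sort commands above use the legacy samtools (0.1.x) form, "samtools sort in.bam out_prefix". Recent samtools releases (1.3 and later) expect "samtools sort -o out.bam in.bam" instead. The sketch below generates the three commands per replicate for either style. The function name and the legacy_sort switch are illustrative, and "-o" replaces the shell redirection used above so that the commands can be run without a shell:

```python
def samtools_commands(rep, legacy_sort=True):
    """Return the view/sort/index argument lists for one replicate."""
    sam = "Hg19_CTCF_%d.sam" % rep
    bam = "Hg19_CTCF_%d.bam" % rep
    prefix = "Hg19_CTCF_%d.sorted" % rep
    sorted_bam = prefix + ".bam"
    if legacy_sort:
        # samtools 0.1.x: second argument is an output prefix
        sort_cmd = ["samtools", "sort", bam, prefix]
    else:
        # samtools 1.3+: output file is given with -o
        sort_cmd = ["samtools", "sort", "-o", sorted_bam, bam]
    return [
        ["samtools", "view", "-bS", "-o", bam, sam],
        sort_cmd,
        ["samtools", "index", sorted_bam],
    ]


if __name__ == "__main__":
    for rep in (1, 2, 3):
        for cmd in samtools_commands(rep, legacy_sort=False):
            print(" ".join(cmd))
```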

You can download our results:

wget http://dldcc-web.brc.bcm.edu/lilab/ChIP_exo_Analysis_Package/Hg19_CTCF_rep1.bam
wget http://dldcc-web.brc.bcm.edu/lilab/ChIP_exo_Analysis_Package/Hg19_CTCF_rep1.bam.bai
wget http://dldcc-web.brc.bcm.edu/lilab/ChIP_exo_Analysis_Package/Hg19_CTCF_rep2.bam
wget http://dldcc-web.brc.bcm.edu/lilab/ChIP_exo_Analysis_Package/Hg19_CTCF_rep2.bam.bai
wget http://dldcc-web.brc.bcm.edu/lilab/ChIP_exo_Analysis_Package/Hg19_CTCF_rep3.bam
wget http://dldcc-web.brc.bcm.edu/lilab/ChIP_exo_Analysis_Package/Hg19_CTCF_rep3.bam.bai

Step4: Convert BAM into BigWig format. Reads mapped to the forward strand define one boundary of the binding region, and reads mapped to the reverse strand define the other boundary, so two bigwig files are produced for each sample. Note that you can download wigToBigWig from UCSC. The hg19 chromosome size file can be downloaded from here:

Bam2wig.py -i Hg19_CTCF_1.sorted.bam  -r hg19.size -o Hg19_CTCF_1_1nt -e 1
wigToBigWig Hg19_CTCF_1_1nt_Forward.wig hg19.size Hg19_CTCF_1_1nt_Forward.bw
wigToBigWig Hg19_CTCF_1_1nt_Reverse.wig hg19.size Hg19_CTCF_1_1nt_Reverse.bw

Bam2wig.py -i Hg19_CTCF_2.sorted.bam  -r hg19.size -o Hg19_CTCF_2_1nt -e 1
wigToBigWig Hg19_CTCF_2_1nt_Forward.wig hg19.size Hg19_CTCF_2_1nt_Forward.bw
wigToBigWig Hg19_CTCF_2_1nt_Reverse.wig hg19.size Hg19_CTCF_2_1nt_Reverse.bw

Bam2wig.py -i Hg19_CTCF_3.sorted.bam  -r hg19.size -o Hg19_CTCF_3_1nt -e 1
wigToBigWig Hg19_CTCF_3_1nt_Forward.wig hg19.size Hg19_CTCF_3_1nt_Forward.bw
wigToBigWig Hg19_CTCF_3_1nt_Reverse.wig hg19.size Hg19_CTCF_3_1nt_Reverse.bw
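
With -e 1, only the first nucleotide at the 5′ end of each read contributes to the signal, and the two strands are kept apart. The sketch below illustrates that idea in plain Python. It assumes reads are given as 0-based, half-open intervals with a strand; the tuple layout and function name are hypothetical and do not reflect the internals of the conversion script:

```python
from collections import Counter


def five_prime_coverage(reads):
    """Count 5' ends per position, separately for each strand.

    reads: iterable of (chrom, start, end, strand), 0-based half-open.
    Returns (forward, reverse) Counters keyed by (chrom, position).
    """
    forward, reverse = Counter(), Counter()
    for chrom, start, end, strand in reads:
        if strand == "+":
            forward[(chrom, start)] += 1    # 5' end is the leftmost base
        elif strand == "-":
            reverse[(chrom, end - 1)] += 1  # 5' end is the rightmost base
        else:
            raise ValueError("unknown strand: %r" % strand)
    return forward, reverse


if __name__ == "__main__":
    reads = [
        ("chr1", 100, 135, "+"),
        ("chr1", 100, 140, "+"),
        ("chr1", 110, 150, "-"),
    ]
    fwd, rev = five_prime_coverage(reads)
    print(dict(fwd))
    print(dict(rev))
```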

Visualize the bigwig tracks in the UCSC genome browser by copying the following lines and pasting them into UCSC as custom tracks. Note that the assembly version is hg19:

track type=bigWig name="Hg19_CTCF_rep1_1nt_Forward" visibility=2 color=0,0,153 windowingFunction=maximum db=hg19 viewLimits=0:30 autoScale=on alwaysZero=on bigDataUrl=http://dldcc-web.brc.bcm.edu/lilab/ChIP_exo_Analysis_Package/Hg19_CTCF_rep1_1nt_Forward.bw
track type=bigWig name="Hg19_CTCF_rep1_1nt_Reverse" visibility=2 color=153,0,0 windowingFunction=maximum db=hg19 viewLimits=0:30 autoScale=on alwaysZero=on bigDataUrl=http://dldcc-web.brc.bcm.edu/lilab/ChIP_exo_Analysis_Package/Hg19_CTCF_rep1_1nt_Reverse.bw
track type=bigWig name="Hg19_CTCF_rep2_1nt_Forward" visibility=2 color=0,0,153 windowingFunction=maximum db=hg19 viewLimits=0:30 autoScale=on alwaysZero=on bigDataUrl=http://dldcc-web.brc.bcm.edu/lilab/ChIP_exo_Analysis_Package/Hg19_CTCF_rep2_1nt_Forward.bw
track type=bigWig name="Hg19_CTCF_rep2_1nt_Reverse" visibility=2 color=153,0,0 windowingFunction=maximum db=hg19 viewLimits=0:30 autoScale=on alwaysZero=on bigDataUrl=http://dldcc-web.brc.bcm.edu/lilab/ChIP_exo_Analysis_Package/Hg19_CTCF_rep2_1nt_Reverse.bw
track type=bigWig name="Hg19_CTCF_rep3_1nt_Forward" visibility=2 color=0,0,153 windowingFunction=maximum db=hg19 viewLimits=0:30 autoScale=on alwaysZero=on bigDataUrl=http://dldcc-web.brc.bcm.edu/lilab/ChIP_exo_Analysis_Package/Hg19_CTCF_rep3_1nt_Forward.bw
track type=bigWig name="Hg19_CTCF_rep3_1nt_Reverse" visibility=2 color=153,0,0 windowingFunction=maximum db=hg19 viewLimits=0:30 autoScale=on alwaysZero=on bigDataUrl=http://dldcc-web.brc.bcm.edu/lilab/ChIP_exo_Analysis_Package/Hg19_CTCF_rep3_1nt_Reverse.bw
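
The six track lines follow one pattern, varying only in replicate number, strand and color. A short sketch that regenerates them, which may be handy when adapting the tracks to your own files. The function name is illustrative; the settings and base URL are copied from the lines above:

```python
BASE_URL = "http://dldcc-web.brc.bcm.edu/lilab/ChIP_exo_Analysis_Package"
COLORS = {"Forward": "0,0,153", "Reverse": "153,0,0"}  # blue / red, as above


def track_line(rep, strand):
    """Return one UCSC custom-track line for a replicate and strand."""
    name = "Hg19_CTCF_rep%d_1nt_%s" % (rep, strand)
    return (
        'track type=bigWig name="%s" visibility=2 color=%s '
        "windowingFunction=maximum db=hg19 viewLimits=0:30 "
        "autoScale=on alwaysZero=on bigDataUrl=%s/%s.bw"
        % (name, COLORS[strand], BASE_URL, name)
    )


if __name__ == "__main__":
    for rep in (1, 2, 3):
        for strand in ("Forward", "Reverse"):
            print(track_line(rep, strand))
```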

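Step5: Filter out irreproducible signal. Taking the geometric mean of two replicates sets the signal to zero wherever either replicate has no signal: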

#filter signal (forward strand)
BigWig_overlay.py -s hg19.size -i Hg19_CTCF_1_1nt_Forward.bw -j Hg19_CTCF_2_1nt_Forward.bw -a geometricMean -o CTCF_Forward.wig
#convert wig to bigwig
wigToBigWig CTCF_Forward.wig hg19.size CTCF_Forward.bw
#filter signal (reverse strand)
BigWig_overlay.py -s hg19.size -i Hg19_CTCF_1_1nt_Reverse.bw -j Hg19_CTCF_2_1nt_Reverse.bw -a geometricMean -o CTCF_Reverse.wig
#convert wig to bigwig
wigToBigWig CTCF_Reverse.wig hg19.size CTCF_Reverse.bw
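
To see why the geometric mean acts as a reproducibility filter, consider two per-base signal vectors. A minimal sketch, assuming non-negative signals of equal length; the function name is illustrative and this is not the BigWig_overlay.py code itself:

```python
import math


def geometric_mean_overlay(signal1, signal2):
    """Position-wise geometric mean of two equal-length signal lists."""
    if len(signal1) != len(signal2):
        raise ValueError("signals must have the same length")
    return [math.sqrt(a * b) for a, b in zip(signal1, signal2)]


if __name__ == "__main__":
    rep1 = [0, 4, 9, 2]
    rep2 = [5, 4, 1, 0]
    # Positions seen in only one replicate (first and last) drop to zero.
    print(geometric_mean_overlay(rep1, rep2))  # [0.0, 4.0, 3.0, 0.0]
```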

Step6: Peak calling. Call peaks from the filtered signal profiles:

ChEAP_PeakCalling.py -s hg19.size -b CTCF_Forward.bw -d CTCF_Reverse.bw -o Hg19_CTCF

Step7: Peak pairing. Pair the peaks on the forward strand with the peaks on the reverse strand:

ChEAP_PeakPairing.py -i Hg19_CTCF.single_nt_peak.xls -b CTCF_Forward.bw -d CTCF_Reverse.bw -o Hg19_CTCF_peakspair

Usage

Bam2Wig.py

Convert a BAM file into wiggle format. Wiggle files can easily be converted into bigwig using UCSC tools. To use Bam2Wig.py, the BAM file must be sorted and indexed with samtools.

Options:
--version show program’s version number and exit
-h, --help show this help message and exit
-i INPUT_FILE, --input-file=INPUT_FILE
 Input file in BAM format. BAM file must be sorted and indexed using samTools. HowTo: http://genome.ucsc.edu/goldenPath/help/bam.html
-r CHROMSIZE, --chromSize=CHROMSIZE
 Chromosome size file. Tab or space separated text file with 2 columns: first column is chromosome name, second column is size of the chromosome.
-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX
 Prefix of output wig file(s). “Prefix_Forward.wig” and “Prefix_Reverse.wig” will be generated.
-b BIN, --bin=BIN
 Chromosome chunk size. Each chromosome will be cut into small chunks of this size. Decreasing the chunk size saves RAM. default=100000 (bp)
-e EXTENSION, --extension=EXTENSION
 Number of nucleotides of coverage to keep, counted from the 5’ end of each read. default=none (full read coverage will be used)
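
The chromosome size file expected by -r is a plain two-column text file. A small sketch of reading one, accepting either tabs or spaces as described above. The function name is illustrative; the two sizes in the example are the hg19 lengths of chr1 and chr2:

```python
def read_chrom_sizes(lines):
    """Parse 'name size' lines (tab or space separated) into a dict."""
    sizes = {}
    for line in lines:
        fields = line.split()  # splits on any run of tabs or spaces
        if not fields:
            continue           # ignore blank lines
        if len(fields) < 2:
            raise ValueError("expected two columns: %r" % line)
        sizes[fields[0]] = int(fields[1])
    return sizes


if __name__ == "__main__":
    example = ["chr1\t249250621\n", "chr2 243199373\n", "\n"]
    print(read_chrom_sizes(example))
    # For a real file: with open("hg19.size") as fh: read_chrom_sizes(fh)
```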

BigWig_extract_signal.py

Extract signals from bigwig file for regions specified in BED file (chrom, start, end):

Usage: ../bin/BigWig_extract_signal.py    input.bed    input.bw
* Bed file: chrom  start  end
* Direct output to STDOUT

BigWig_summary_signal.py

Similar to BigWig_extract_signal.py, but for each specified region report “sum”, “mean”, “median” and “std” instead of raw signals:

Usage: ../bin/BigWig_summary_signal.py    input.bed    input.bw
* This script will append 'sum','mean','median','std' to each entry in bed file
* Direct output to STDOUT
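
A sketch of the four statistics appended to each BED entry, using only the standard library. The function names are illustrative, and the use of the population standard deviation is an assumption, since the text above does not say which definition the script uses:

```python
import statistics


def summarize_signal(values):
    """Return (sum, mean, median, std) for a non-empty list of signals."""
    if not values:
        raise ValueError("region has no signal values")
    return (
        sum(values),
        statistics.mean(values),
        statistics.median(values),
        statistics.pstdev(values),  # population std (assumption)
    )


def append_summary(bed_line, values):
    """Append the four statistics to a tab-separated BED line."""
    stats = summarize_signal(values)
    return bed_line.rstrip("\n") + "\t" + "\t".join("%g" % s for s in stats)


if __name__ == "__main__":
    print(append_summary("chr1\t100\t108", [2, 4, 4, 4, 5, 5, 7, 9]))
```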

BigWig_overlay.py

Manipulate two bigwig files. Operations include “Add”, “Average”, “Division”, “Max”, “Min”, “Product”, “Subtract” and “geometricMean” (geometric mean).

Options:
--version show program’s version number and exit
-h, --help show this help message and exit
-i BIGWIG_FILE1, --bwfile1=BIGWIG_FILE1
 First BigWig file
-j BIGWIG_FILE2, --bwfile2=BIGWIG_FILE2
 Second BigWig file
-a ACTION, --action=ACTION
 After pairwise alignment of the two bigwig files, perform one of the following actions (select only one keyword): “Add” = add signals. “Average” = average signals. “Division” = divide signals in the 1st bigwig file by the corresponding ones in the 2nd bigwig file (1 is added to both signals first). “Max” = pick the larger signal. “Min” = pick the smaller signal. “Product” = multiply signals. “Subtract” = subtract signals in the 2nd bigwig file from the corresponding ones in the 1st bigwig file. “geometricMean” = take the geometric mean of signals.
-o OUTPUT_WIG, --output=OUTPUT_WIG
 Output wig file
-s CHROMSIZE, --chromSize=CHROMSIZE
 Chromosome size file. Tab or space separated text file with 2 columns: first column is chromosome name, second column is size of the chromosome.
-c CHUNK_SIZE, --chunk=CHUNK_SIZE
 Chromosome chunk size. Each chromosome will be cut into small chunks of this size. Decreasing the chunk size saves RAM. default=100000 (bp)
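
Processing a chromosome in fixed-size chunks bounds memory use, because only one chunk of signal needs to be held at a time. A minimal sketch of such chunking with the default size of 100000 bp; the function name and the 0-based, half-open coordinates are assumptions made for illustration:

```python
def iter_chunks(chrom_size, chunk_size=100000):
    """Yield (start, end) intervals, 0-based half-open, covering a chromosome."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, chrom_size, chunk_size):
        # the last chunk is truncated at the chromosome end
        yield start, min(start + chunk_size, chrom_size)


if __name__ == "__main__":
    print(list(iter_chunks(250000)))
    # [(0, 100000), (100000, 200000), (200000, 250000)]
```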

ChEAP_PeakCalling.py

Perform peak calling for ChIP-exo data from bigwig files.

Options:
--version show program’s version number and exit
-h, --help show this help message and exit
-b FORWARD_BW, --forward=FORWARD_BW
 BigWig file for forward reads (extend 1 nt from 5’ end of read)
-d REVERSE_BW, --reverse=REVERSE_BW
 BigWig file for reverse reads (extend 1 nt from 5’ end of read)
-s CHROMSIZE, --chromSize=CHROMSIZE
 Chromosome size file. Tab or space separated text file with 2 columns: first column is chromosome name, second column is size of the chromosome.
-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX
 Prefix of output files
-z FUZZY_SIZE, --fuzziness=FUZZY_SIZE
 Peaks within fuzzy window will be merged. default=10 (bp)
-w WINDOW_SIZE, --bgw=WINDOW_SIZE
 Background window size used to determine background signal level (lambda in Poisson model). default=200 (bp)
-c CHUNK_SIZE, --chunk=CHUNK_SIZE
 Chromosome chunk size. Each chromosome will be cut into small chunks of this size. Decreasing the chunk size saves RAM. default=100000 (bp)
-p PVALUE_CUTOFF, --pvalue=PVALUE_CUTOFF
 Pvalue cutoff for peak detection. default=0.1
-r BG_ROOT_NUM, --bg-root-num=BG_ROOT_NUM
 Background peak root number. default=100
-e EXTENTION_SIZE, --extention=EXTENTION_SIZE
 Window size used to calculate peak area. Larger values significantly reduce speed and make the peak calls less meaningful. default=5
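
The options above mention a Poisson model whose lambda is estimated from a background window, and a p-value cutoff. The sketch below shows one generic way such a test can work: estimate lambda as the mean signal in the window around a position, then compute the upper-tail Poisson probability of the observed count. It is only an illustration under those assumptions; the function names are hypothetical, and how ChEAP actually estimates the background (including --bg-root-num and --extention) is not described here and may differ:

```python
import math


def poisson_sf(k, lam):
    """Upper-tail probability P(X >= k) for X ~ Poisson(lam), k an integer."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):     # accumulate P(X = 0) ... P(X = k - 1)
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def background_lambda(signal, pos, window=200):
    """Mean signal in a window centred on pos (assumed background estimate)."""
    half = window // 2
    region = signal[max(0, pos - half):min(len(signal), pos + half + 1)]
    return sum(region) / float(len(region))


if __name__ == "__main__":
    signal = [1] * 401
    signal[200] = 12  # a sharp pile-up of 5' ends
    lam = background_lambda(signal, 200)
    pvalue = poisson_sf(signal[200], lam)
    print(lam, pvalue, pvalue < 0.1)
```

Because the tail is computed as one minus a cumulative sum, very small p-values lose precision; that is acceptable for comparing against a cutoff such as 0.1.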

ChEAP_PeakPairing.py

Perform peak pairing for ChIP-exo data, based on peak calling results.

Options:
--version show program’s version number and exit
-h, --help show this help message and exit
-i INPUT_FILE, --input-file=INPUT_FILE
 File generated by ChEAP_PeakCalling.py
-b FORWARD_BW, --forward=FORWARD_BW
 BigWig file for forward reads (extend 1 nt from 5’ end of read)
-d REVERSE_BW, --reverse=REVERSE_BW
 BigWig file for reverse reads (extend 1 nt from 5’ end of read)
-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX
 Output peak-pair file in bed format
-m MAX_DISTANCE, --max-dist=MAX_DISTANCE
 Maximum distance allowed for peak pairing. default=50
-n MIN_DISTANCE, --min-dist=MIN_DISTANCE
 Minimum distance allowed for peak pairing. default=18
-t MAX_DS_PEAK_NUM, --mdpn=MAX_DS_PEAK_NUM
 Maximum number of downstream peaks considered. Setting this to 1 is equivalent to picking the nearest downstream peak. default=1
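
The pairing idea can be sketched as follows: for each forward-strand peak, look for the nearest reverse-strand peak downstream whose distance lies between the minimum and maximum (defaults 18 and 50). This is a simplified illustration for a single chromosome with --mdpn=1. The function name, the inclusive bounds, and applying the minimum distance before choosing the nearest candidate are all assumptions rather than ChEAP_PeakPairing.py's documented behavior:

```python
import bisect


def pair_peaks(forward, reverse, min_dist=18, max_dist=50):
    """Pair forward peaks with the nearest qualifying downstream reverse peak.

    forward, reverse: peak positions on one chromosome.
    Returns a list of (forward_pos, reverse_pos) tuples.
    """
    rev = sorted(reverse)
    pairs = []
    for f in sorted(forward):
        # first reverse peak at least min_dist downstream of f
        i = bisect.bisect_left(rev, f + min_dist)
        if i < len(rev) and rev[i] - f <= max_dist:
            pairs.append((f, rev[i]))
    return pairs


if __name__ == "__main__":
    forward = [100, 500, 900]
    reverse = [110, 130, 560, 1000]
    print(pair_peaks(forward, reverse))  # [(100, 130)]
```

In the example, the reverse peak at 110 is too close to the forward peak at 100, and the peaks at 560 and 1000 are farther than 50 bp from any forward peak, so only one pair is reported.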

Cross_strand_euclidean.py

Determine the optimum peak-pair size by calculating the cross-strand Euclidean distance.

Options:
--version show program’s version number and exit
-h, --help show this help message and exit
-p PEAK_FILE, --peak-file=PEAK_FILE
 Peak file generated by ChEAP_PeakCalling
-f FORWARD_PEAK, --forward=FORWARD_PEAK
 BigWig file of forward peak (first 5nt)
-r REVERSE_PEAK, --reverse=REVERSE_PEAK
 BigWig file of reverse peak (first 5nt)
-c CHROMSIZE, --chromSize=CHROMSIZE
 Chromosome size file. Tab or space separated text file with 2 columns: first column is chromosome name, second column is size of the chromosome.
-w WINDOW_SIZE, --window=WINDOW_SIZE
 Window size (on genome) to calculate cross strand distance. default=5
-s MAX_DISTANCE, --shift-size=MAX_DISTANCE
 Maximum shift size. default=100
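
The principle behind a cross-strand distance scan can be sketched like this: slide the reverse-strand signal against the forward-strand signal, compute the Euclidean distance at every shift up to the maximum, and take the shift with the smallest distance as the estimated peak-pair size. The function names, the fixed-length comparison window, and the toy signals are assumptions made for illustration; the script's own windowing (--window) and normalization are not described here:

```python
import math


def cross_strand_distances(forward, reverse, max_shift=100):
    """Euclidean distance between forward[i] and reverse[i + s] for each shift s."""
    if len(forward) != len(reverse):
        raise ValueError("signals must have the same length")
    n = len(forward) - max_shift  # compare the same number of positions at every shift
    if n <= 0:
        raise ValueError("signals must be longer than max_shift")
    distances = []
    for s in range(max_shift + 1):
        sq = sum((forward[i] - reverse[i + s]) ** 2 for i in range(n))
        distances.append(math.sqrt(sq))
    return distances


def best_shift(forward, reverse, max_shift=100):
    """Return the shift with the smallest cross-strand distance."""
    distances = cross_strand_distances(forward, reverse, max_shift)
    return min(range(len(distances)), key=distances.__getitem__)


if __name__ == "__main__":
    forward = [0] * 100
    reverse = [0] * 100
    forward[10] = 5  # forward-strand peak
    reverse[40] = 5  # matching reverse-strand peak 30 bp downstream
    print(best_shift(forward, reverse, max_shift=50))  # 30
```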

Contacts

  • Wang Liguo: wangliguo78 AT gmail.com
  • Junsheng Chen: johnsonchen1987 AT gmail.com