Creating pie charts of binding from ChIP-seq data using HOMER

Feeds:: Posts; Comments

Creating pie charts of binding from ChIP-seq data using HOMER

April 25, 2012 by wginy

This post will go through the steps to make a pie graph annotating binding locations using HOMER output. HOMER is typically used to analyze chip-seq/faire-seq data.

HOMER is a set of programs that can be used to call and annotate chip-seq peaks (among other things).
This post will go through the steps create a pie chart of genomic annotations output by HOMER using R

First, I would like to begin with a disclaimer from the documentation of the pie() function in R:

Pie charts are a very bad way of displaying information. The eye is good at judging linear measures and bad at judging relative areas. A bar chart or dot chart is a preferable way of displaying this type of data.

However, pie charts are still often used in biology to display this kind of information.

First you will want to generate annotation data outlined on this page of HOMER documentation. I ran something similar:
annotatePeaks.pl mypeaks.bed hg19 > anno_mypeaks.txt
This will give you a file that looks like this:
head -n 3 anno_mypeaks.txt

PeakID (cmd=mypeaks.bed.txt hg19)

Chr

Start

End

Strand

Peak Score

Focus Ratio/Region Size

Annotation

Detailed Annotation

Distance to TSS

Nearest PromoterID

Entrez ID

Nearest Unigene

Nearest Refseq

Nearest Ensembl

Gene Name

Gene Alias

Gene Description

1504

chr2

2.33E+08

0.044207

Intergenic

-3591

NR_031717

1E+08

NR_031717

MIR1471

MIRN1471|hsa-mir-1471

microRNA 1471

1485

chr2

2.2E+08

0.043998

intron (NM_025216, intron 2 of 3)

2321

NM_025216

80326

Hs.121540

NM_025216

ENSG00000135925

WNT10A

FLJ14301|SSPS

wingless-type MMTV integration site family, member 10A

Next, read the annotation output into R:
anno = read.table("anno_mypeaks.txt", sep="\t", header=T, quote="")
You will notice that the 8th row (Annotation) conetains information on where the peak overlaps. By cutting this out, we can create a pie chart.
pietable = table(unlist(lapply(strsplit(as.character(anno$Annotation), " \\("),"[[",1))) #take the part before first ' (' pie(pietable, main="mypeaks annotation")

Now if you look closely, you will notice that this pie chart is missing a category, on the HOMER annotation page, it lists 7 categories but the above chart is only showing 6 (missing promoter-TSS). To solve this, we can initalize a table of 0s called newtable.
newtable = table(c("3' UTR","5' UTR","exon","Intergenic","intron","promoter-TSS","TTS")) newtable[names(newtable)] = 0 #reset everything to 0 newtable[names(pietable)] = pietable pie(newtable, main="mypeaks annotation")

Next, I will add %s and numbers to each of the labels:
names(newtable) = paste(names(newtable), "(", round(newtable/sum(newtable)*100), "%, ", newtable, ")", sep="") pie(newtable, main="mypeaks annotation")

Next, I will choose better looking colors than the default, this can be done in two ways, either specifying the colors or choosing colors from gradient, I use the latter:
#use colors() to see all possible colors #rainbow(7) looks ugly pie(newtable, main="mypeaks annotation", col=rainbow(9))

The figures thus far look small and cramped, you can export the figures are a bigger size using
dev.copy(png, "mypeaks.png", width=800, height=600);dev.off();

After some cropping, this is the same picture as the one shown in the beginning of the post.

It is also possible to make higher quality figures by copying to pdf and in dev.copy() and converting the pdf into png using the convert program on linux.

Now keep in mind that what you probably want to know is enrichment over genomic background and this figure will not tell you that. It will only tell you where the regions that you have identified are located even though there is more intergenic regions in the genome. This plot will also depend on how long your regions are since promoter-TSS > TTS > exon > 5′ UTR > 3′ UTR > introns > Intergenic (see homer documentation)

Lastly, the information stored from HOMER annotation output can also be used to make other figures such as histogram of distance from TSS just make sure you know what you are looking at since distance to TSS can be calculated in many ways (closest end of region? peak of region (given in macs output)? middle of region?, HOMER uses the closest end for this calclation)

Posted in Uncategorized | Leave a Comment

Comments RSS

Learning bioinformatics

Tips I picked up while learning bioinformatics