This post will go through the steps to make a pie graph annotating binding locations using HOMER output. HOMER is typically used to analyze chip-seq/faire-seq data.
HOMER is a set of programs that can be used to call and annotate chip-seq peaks (among other things).
This post will go through the steps create a pie chart of genomic annotations output by HOMER using R
First, I would like to begin with a disclaimer from the documentation of the pie() function in R:
Pie charts are a very bad way of displaying information. The eye is good at judging linear measures and bad at judging relative areas. A bar chart or dot chart is a preferable way of displaying this type of data.
However, pie charts are still often used in biology to display this kind of information.
First you will want to generate annotation data outlined on this page of HOMER documentation. I ran something similar:
annotatePeaks.pl mypeaks.bed hg19 > anno_mypeaks.txt
This will give you a file that looks like this:
head -n 3 anno_mypeaks.txt
Next, read the annotation output into R:
anno = read.table("anno_mypeaks.txt", sep="\t", header=T, quote="")
You will notice that the 8th row (Annotation) conetains information on where the peak overlaps. By cutting this out, we can create a pie chart.
pietable = table(unlist(lapply(strsplit(as.character(anno$Annotation), " \\("),"[[",1))) #take the part before first ' ('
pie(pietable, main="mypeaks annotation")
Now if you look closely, you will notice that this pie chart is missing a category, on the HOMER annotation page, it lists 7 categories but the above chart is only showing 6 (missing promoter-TSS). To solve this, we can initalize a table of 0s called newtable.
newtable = table(c("3' UTR","5' UTR","exon","Intergenic","intron","promoter-TSS","TTS"))
newtable[names(newtable)] = 0 #reset everything to 0
newtable[names(pietable)] = pietable
pie(newtable, main="mypeaks annotation")
Next, I will add %s and numbers to each of the labels:
names(newtable) = paste(names(newtable), "(", round(newtable/sum(newtable)*100), "%, ", newtable, ")", sep="")
pie(newtable, main="mypeaks annotation")
Next, I will choose better looking colors than the default, this can be done in two ways, either specifying the colors or choosing colors from gradient, I use the latter:
#use colors() to see all possible colors
#rainbow(7) looks ugly
pie(newtable, main="mypeaks annotation", col=rainbow(9))
The figures thus far look small and cramped, you can export the figures are a bigger size using
dev.copy(png, "mypeaks.png", width=800, height=600);dev.off();
After some cropping, this is the same picture as the one shown in the beginning of the post.
It is also possible to make higher quality figures by copying to pdf and in dev.copy()
and converting the pdf into png using the convert
program on linux.
Now keep in mind that what you probably want to know is enrichment over genomic background and this figure will not tell you that. It will only tell you where the regions that you have identified are located even though there is more intergenic regions in the genome. This plot will also depend on how long your regions are since promoter-TSS > TTS > exon > 5′ UTR > 3′ UTR > introns > Intergenic (see homer documentation)
Lastly, the information stored from HOMER annotation output can also be used to make other figures such as histogram of distance from TSS just make sure you know what you are looking at since distance to TSS can be calculated in many ways (closest end of region? peak of region (given in macs output)? middle of region?, HOMER uses the closest end for this calclation)
Click here to comment