Plotting huge files in R can be a cake walk. Here is how you can proceed:
Suppose say you have a huge file with 2 million entries in[TAB] format.
Then go ahead and read the file in the following format:
>data <- read.table(file="fileName",head=TRUE,sep=",")
# read.table is a file reading function
# Note here, if you have no header then say head=FALSE and if you data is separated by
# \t then say sep="\t"
#You may like to see summary of your data by saying
> summary(data)
V1 V2
HPAA-aaa57a02.b1: 1 Min. :0.000000
HPAA-aaa57a02.g1: 1 1st Qu.:0.009985
HPAA-aaa57a03.b1: 1 Median :0.020565
HPAA-aaa57a03.g1: 1 Mean :0.021674
HPAA-aaa57a04.b1: 1 3rd Qu.:0.033898
HPAA-aaa57a04.g1: 1 Max. :0.065789
(Other) :1140845
# Pretty Cool..
If you have not written down the column names in the data file, don't worry R provides one for you.
>names(data)
>[1] "V1" "V2"
# So, V1 and V2 are your column names.
# To Plot a Histogram simply say
>hist(data$V2)
# If you did not select any output device by default it will plot as a Rplots.ps file
# on your working directory
# You can however copy this to a file of your interest
>dev.copy(postscript,"myfile.ps",height=7,width=7)
# Here Height and width are in inches
# Sometimes it does not allow you to do a dev.copy when your dev.list() is NULL.
# You can populate it by first plotting something or by doing a hist(data$V2)
# or just a plot(data$V2)
# Then do
>hist(data$V2)
# You can check how many devices are attached to your screen by doing a
> dev.list()
postscript postscript postscript
2 3 4
# You can switch off your current device by doing a
>dev.off()
postscript
2
> dev.list()
postscript postscript
2 3
# I Plotted my histogram using the following command:
5
> hist(data$V2,breaks=100,main="Trace File Blast hit Frequency plot, cutoff=30"
,xlab="Y=M/(R+L - 2O)",col="red",las=1,freq=TRUE)
> lines(density(data$V2))
This will save the file in the current file "myfile.ps". Every subsequent call to a plot function will keep plotting in different pages of the same file.
Alternatively a truehist() can be called to plot histograms. But before that do a
>require(MASS)
> truehist(data$V2,nbins=100,main="Trace File Blast hit Frequency plot, cutoff=
30",xlab="Y=M/(R+L - 2O)",col="blue",las=1,prob=TRUE)
Making a boxplot:
Plotting a boxplot is equally easier using R. Follow the following commands:
data <- read.table(file="bgcorrect.out",head=FALSE,sep="\t")
summary(data)
#boxplot(data$vals,main='normalized data after synthetic normalization', ylab='G
ene Expression values');
boxplot(log2(data))
help(boxplot)
SAVING OUTPUT AS TEXT FILE
#In case you are just interested to save the histogram output file into a text file
> h <- hist(data$V2,breaks=100,plot=FALSE) # Plot= FALSE will not plot, default true
> save(h,ascii=TRUE,file="data.txt") # Will save data in ascii format, default=FALSE
# If you want to read the content of h
>h
# You will get the following
# $breaks -> break points
# $count -> frequency
# $density -> plot density
# $intensities -> plot intensity
# $mids -> mid points of data
So, just printing a single variable will be
>h$mids # Will print the mid point..
Suppose say you have a huge file with 2 million entries in
Then go ahead and read the file in the following format:
>data <- read.table(file="fileName",head=TRUE,sep=",")
# read.table is a file reading function
# Note here, if you have no header then say head=FALSE and if you data is separated by
# \t then say sep="\t"
#You may like to see summary of your data by saying
> summary(data)
V1 V2
HPAA-aaa57a02.b1: 1 Min. :0.000000
HPAA-aaa57a02.g1: 1 1st Qu.:0.009985
HPAA-aaa57a03.b1: 1 Median :0.020565
HPAA-aaa57a03.g1: 1 Mean :0.021674
HPAA-aaa57a04.b1: 1 3rd Qu.:0.033898
HPAA-aaa57a04.g1: 1 Max. :0.065789
(Other) :1140845
# Pretty Cool..
If you have not written down the column names in the data file, don't worry R provides one for you.
>names(data)
>[1] "V1" "V2"
# So, V1 and V2 are your column names.
# To Plot a Histogram simply say
>hist(data$V2)
# If you did not select any output device by default it will plot as a Rplots.ps file
# on your working directory
# You can however copy this to a file of your interest
>dev.copy(postscript,"myfile.ps",height=7,width=7)
# Here Height and width are in inches
# Sometimes it does not allow you to do a dev.copy when your dev.list() is NULL.
# You can populate it by first plotting something or by doing a hist(data$V2)
# or just a plot(data$V2)
# Then do
>hist(data$V2)
# You can check how many devices are attached to your screen by doing a
> dev.list()
postscript postscript postscript
2 3 4
# You can switch off your current device by doing a
>dev.off()
postscript
2
> dev.list()
postscript postscript
2 3
# I Plotted my histogram using the following command:
5
> hist(data$V2,breaks=100,main="Trace File Blast hit Frequency plot, cutoff=30"
,xlab="Y=M/(R+L - 2O)",col="red",las=1,freq=TRUE)
> lines(density(data$V2))
This will save the file in the current file "myfile.ps". Every subsequent call to a plot function will keep plotting in different pages of the same file.
Alternatively a truehist() can be called to plot histograms. But before that do a
>require(MASS)
> truehist(data$V2,nbins=100,main="Trace File Blast hit Frequency plot, cutoff=
30",xlab="Y=M/(R+L - 2O)",col="blue",las=1,prob=TRUE)
summary(data)
#boxplot(data$vals,main='normalized data after synthetic normalization', ylab='G
ene Expression values');
boxplot(log2(data))
help(boxplot)
SAVING OUTPUT AS TEXT FILE
#In case you are just interested to save the histogram output file into a text file
> h <- hist(data$V2,breaks=100,plot=FALSE) # Plot= FALSE will not plot, default true
> save(h,ascii=TRUE,file="data.txt") # Will save data in ascii format, default=FALSE
# If you want to read the content of h
>h
# You will get the following
# $breaks -> break points
# $count -> frequency
# $density -> plot density
# $intensities -> plot intensity
# $mids -> mid points of data
So, just printing a single variable will be
>h$mids # Will print the mid point..
No comments:
Post a Comment