R Data Frame
Posted by 一笑 on 2008-10-19 06:12:23
Don't know the name of an R command?
Use apropos() to search for it by part of its name.
e.g. looking for a histogram function? apropos("hist")
Already know a command but don't know how to use it?
Use ? for help.
e.g. ?hist opens the manual page for the hist() function.
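The same lookups can also be written as ordinary function calls, which is handy inside scripts. apropos() returns a character vector of matching names, and its pattern is a regular expression. In the sketch below, the "^hist" pattern and the variable names are illustrative choices, not part of the original notes.

```r
# apropos() returns the names of objects on the search path
# whose names match the (regular-expression) pattern.
matches <- apropos("hist")
print(matches)

# Anchor the pattern with ^ to keep only names that start with "hist".
starts_with_hist <- apropos("^hist")
print(starts_with_hist)

# help("hist") is the function-call form of ?hist.
# It is commented out so running the script does not open a help viewer.
# help("hist")
```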
Data come in three basic types: categorical, discrete numeric, and continuous numeric.
x<- c(1,2,3)
books<-1:8
books<-seq(1,8,1) # from 1 to 8 in increments of 1
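Both ways of building books give the same numbers. seq() becomes necessary when the step is not 1, or when you want a fixed number of points. The sketch below checks this; the 0.5 step and the length.out value are only illustrative.

```r
books_colon <- 1:8              # integer sequence 1, 2, ..., 8
books_seq   <- seq(1, 8, 1)     # same values via seq(from, to, by)
print(all(books_colon == books_seq))

# seq() can use a step other than 1 ...
halves <- seq(0, 2, by = 0.5)   # 0.0 0.5 1.0 1.5 2.0
print(halves)

# ... or produce a fixed number of evenly spaced points.
five_points <- seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
print(five_points)
```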
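Statistical functions in R:
mean(x); mean(x, trim=1/20) # trim 5 percent of the data from each end (top and bottom) before averaging
median(x); max(x); min(x)
var(x); sd(x) # the standard deviation is sd(x), not std(x)
summary(x); quantile(x, .25); quantile(x, c(0.05,0.95))
fivenum(x) # Tukey's five-number summary: min, lower hinge, median, upper hinge, max (the hinges are close to the .25 and .75 quantiles)
IQR(x) # interquartile range
sum(x)
pairs(myframe) # scatterplot matrix: one scatterplot for every pair of columns in a data frame (myframe is built below)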
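Here the functions above are run on a small made-up vector that contains one outlier. The values are invented purely for illustration. The trimmed mean and the median resist the outlier, while mean() and sd() do not. With only five values, trim=1/20 removes nothing, because floor(5 × 0.05) = 0. The sketch therefore also uses trim=0.2, which drops one value from each end.

```r
x <- c(1, 2, 3, 4, 100)     # made-up data with one large outlier

mean(x)                     # 22: pulled up by the outlier
mean(x, trim = 1/20)        # still 22: floor(5 * 0.05) = 0 values trimmed
mean(x, trim = 0.2)         # 3: drops one value from each end, averages 2, 3, 4
median(x)                   # 3
var(x); sd(x)               # sample variance 1902.5 and its square root
quantile(x, c(0.25, 0.75))  # 2 and 4 (default type 7 quantiles)
fivenum(x)                  # 1 2 3 4 100
IQR(x)                      # 4 - 2 = 2
sum(x)                      # 110
summary(x)
```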
Vectors:
a<-c(1,3,5,9,7)
which(a==3) # position(s) of the element(s) equal to 3: 2
length(a) # number of elements: 5
cummax(a) # cumulative maximum: 1 3 5 9 9
cummin(a) # cumulative minimum: 1 1 1 1 1
a[-2] # all but the second element: [1] 1 5 9 7
a[2] # the second element
a[2:4] # the second through fourth elements
a[c(1,5)] # the first and fifth elements
a[a<3 | a>5] # the elements less than 3 or greater than 5: 1 9 7
diff(a) # each element minus the one before it: 2 2 4 -2
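The sketch below checks each indexing result with stopifnot(), which stops with an error if any condition is FALSE. The expected values were worked out by hand for a = c(1, 3, 5, 9, 7). The which(a > 4) line is an extra illustration, not from the original notes.

```r
a <- c(1, 3, 5, 9, 7)

stopifnot(
  which(a == 3) == 2,                          # position of the value 3
  identical(which(a > 4), c(3L, 4L, 5L)),      # positions of all values above 4
  length(a) == 5,
  identical(cummax(a), c(1, 3, 5, 9, 9)),      # running maximum
  identical(cummin(a), c(1, 1, 1, 1, 1)),      # running minimum
  identical(a[-2], c(1, 5, 9, 7)),             # a negative index drops element 2
  identical(a[2:4], c(3, 5, 9)),
  identical(a[c(1, 5)], c(1, 7)),
  identical(a[a < 3 | a > 5], c(1, 9, 7)),     # a logical index keeps the TRUE positions
  identical(diff(a), c(2, 2, 4, -2))           # a[i + 1] - a[i]
)
cat("all vector checks passed\n")
```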
Categorical vectors:
YN<-c("Yes","No","No","Yes","No")
table(YN) # frequency table of YN: No 3, Yes 2
Categorical data are often summarized in tables. In R, a categorical variable is stored as a factor.
factor(YN) # or as.factor(YN)
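A factor stores each value as an integer code that points into a set of levels. By default the levels are sorted alphabetically. The sketch below shows the codes and how to choose the level order with levels=. The names f and f2 are only illustrative.

```r
YN <- c("Yes", "No", "No", "Yes", "No")

f <- factor(YN)        # levels are sorted alphabetically by default
levels(f)              # "No"  "Yes"
as.integer(f)          # 2 1 1 2 1: the integer code of each element
table(f)               # No: 3, Yes: 2 (same counts as table(YN))

# Supply levels= to choose the order yourself, e.g. "Yes" first.
f2 <- factor(YN, levels = c("Yes", "No"))
levels(f2)             # "Yes" "No"
table(f2)              # Yes: 2, No: 3
```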
Want to build a data set like the ones in SRS, Stata, or Excel?
Use data frames.
In a data frame, as in an SRS or Stata dataset, all columns must have the same number of elements.
Let's first build two variables, X and Y:
X<-seq(0, 100, by=0.5) # 0, 0.5, 1, ..., 100
Y<-2*X
myframe<-data.frame(X,Y)
Now you have a data frame called myframe.
You can index it just like a matrix:
myframe[2,2] gives the element in row 2, column 2 (the second value of Y).
myframe[2,] gives the second row.
myframe[,1] gives the first column (the X values).
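Here is a runnable version of the construction and indexing above. It assumes the intended X is the grid 0, 0.5, ..., 100, written in R as seq(0, 100, by = 0.5). The last lines show that columns of incompatible lengths are rejected. A shorter column is recycled only when its length divides the number of rows evenly, e.g. data.frame(a = 1:4, b = 1:2).

```r
X <- seq(0, 100, by = 0.5)   # 201 values: 0, 0.5, 1, ..., 100
Y <- 2 * X
myframe <- data.frame(X, Y)

dim(myframe)                 # 201 rows, 2 columns
head(myframe, 3)             # first three rows
myframe[2, 2]                # row 2, column 2: Y[2] = 2 * 0.5 = 1
myframe[2, ]                 # row 2 as a one-row data frame
myframe[, 1][1:3]            # first three values of column 1 (X): 0.0 0.5 1.0

# Columns whose lengths are incompatible (3 vs 2) raise an error:
res <- tryCatch(data.frame(a = 1:3, b = 1:2), error = function(e) e)
inherits(res, "error")       # TRUE
```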
Or you can name your columns:
names(myframe)<- c("source", "result")
Then you can refer to a column by name with the "$" operator.
e.g. myframe$source[3] is equivalent to myframe[3, 1]
e.g. myframe$result[3] is equivalent to myframe[3, 2]
If you forget the names you gave to a data frame's columns,
names(myframe) returns them: "source" "result"
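Besides $, base R also offers the [[ ]] operator. It takes the column name as a string, so the name can be stored in a variable. The sketch below rebuilds myframe from scratch so it runs on its own. The variable col is purely illustrative.

```r
X <- seq(0, 100, by = 0.5)
myframe <- data.frame(X, Y = 2 * X)
names(myframe) <- c("source", "result")

names(myframe)                 # "source" "result"
myframe$source[3]              # 1
myframe$result[3]              # 2, the same as myframe[3, 2]
myframe[["result"]][3]         # [[ ]] takes the column name as a string
col <- "result"
myframe[[col]][3]              # useful when the name is held in a variable
```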
pairs(myframe) # two by two scatter plots
Subsetting a data frame
Sometimes we need only part of the data for an analysis.
Use a logical vector or the subset() function.
x<-c("b","b","b","g","g","g","g","b","g","b")
y<-c(1.80, 1.70, 1.75, 1.50,1.65,1.72,1.55,1.75,1.58,1.77)
bgheight<-data.frame(x,y)
Suppose you want the mean, median, and standard deviation (or variance, var()) of the boys' heights.
Using a logical vector:
mean(bgheight$y[bgheight$x=="b"]) # or mean(bgheight[bgheight$x=="b", 2])
sd(bgheight$y[bgheight$x=="g"]) # the same idea for the girls
Using subset():
bgheight.boy<-subset(bgheight, subset= x=="b", select=y)
mean(bgheight.boy$y); median(bgheight.boy$y); sd(bgheight.boy$y) # subset() returns a data frame, so extract the column with $
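The sketch below confirms that the two routes select the same heights. It also shows tapply(), a base R function that is not in the original notes. tapply() computes a summary for every group of x in one call. The hand-computed means are 1.754 for the boys and 1.600 for the girls.

```r
x <- c("b","b","b","g","g","g","g","b","g","b")
y <- c(1.80, 1.70, 1.75, 1.50, 1.65, 1.72, 1.55, 1.75, 1.58, 1.77)
bgheight <- data.frame(x, y)

# Logical vector: index one column by a condition on another column.
boys_y <- bgheight$y[bgheight$x == "b"]

# subset() returns a data frame; $y pulls out the numeric column.
boys_df <- subset(bgheight, subset = x == "b", select = y)

# Both routes give the same heights: 1.80 1.70 1.75 1.75 1.77
mean(boys_y); median(boys_y); var(boys_y); sd(boys_y)

# tapply() applies a function to y within each group of x at once.
m <- tapply(bgheight$y, bgheight$x, mean)
m                                        # b: 1.754, g: 1.600
```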
Adding variables to a data frame:
grade=c(10,11,11,12,10,10,11,11,12,12)
bgheight$grade<-grade
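A new column needs one value per row. A column can also be computed from existing columns; the y_cm column below is a hypothetical example. If the vector's length does not divide the number of rows, the assignment is rejected. The sketch rebuilds bgheight so it runs on its own.

```r
x <- c("b","b","b","g","g","g","g","b","g","b")
y <- c(1.80, 1.70, 1.75, 1.50, 1.65, 1.72, 1.55, 1.75, 1.58, 1.77)
bgheight <- data.frame(x, y)

grade <- c(10, 11, 11, 12, 10, 10, 11, 11, 12, 12)
bgheight$grade <- grade          # new column, one value per row
names(bgheight)                  # "x" "y" "grade"

# A column computed from an existing one (height in cm, for illustration).
bgheight$y_cm <- bgheight$y * 100

# 3 values cannot fill 10 rows (10 is not a multiple of 3), so this errors.
res <- tryCatch({ bgheight$bad <- 1:3; "assigned" },
                error = function(e) "error")
res                              # "error"
```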