R Data Frame

Posted by 一笑 on 2008-10-19 06:12:23

Don't know the name of an R command?
Use apropos() to search by part of the name.
e.g. looking for a histogram function? Try apropos("hist")


Already know a command but don't know how to use it?
Use ? for help.

e.g. ?hist opens the manual page for the hist() function.
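
A minimal sketch of both lookups. The search pattern "hist" is just the example from above; the help calls are left commented out because they open a viewer rather than return a value.

```r
# Find every visible object whose name contains "hist"
matches <- apropos("hist")
print(matches)

# Open the manual page for a function you already know by name
# ?hist            # same as help("hist")

# Run the examples from the manual page to see the function in action
# example(hist)
```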

Data: three basic types: categorical, discrete numeric and continuous numeric
x<- c(1,2,3)
books<-1:8
books<-seq(1,8,1) # from 1 to 8, increment is 1

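Statistical functions in R:
mean(x)     mean(x, trim=1/20)   # trim 5 percent of the data from each end (top and bottom)
median(x)       max(x)      min(x)
var(x)          sd(x)   # standard deviation; the function is sd(), not std()
summary(x)  quantile(x, .25)  quantile(x, c(0.05,0.95))
fivenum(x)    # Tukey's five numbers: min, lower hinge, median, upper hinge, max (the hinges are close to the .25 and .75 quantiles)
IQR(x)   # interquartile range
sum(x)
pairs(myframe)   # scatterplot matrix: one plot for each pair of columns (myframe is a data frame, built below)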
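
To see how these functions behave, here is a small sketch on a made-up vector with one large outlier. The data and variable names are only for illustration. Note that trim = 0.1 is used instead of 1/20, because with ten values a 5 percent trim removes nothing.

```r
# Hypothetical sample: nine small values and one large outlier
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

m      <- mean(x)                     # 14.5, pulled up by the outlier
m_trim <- mean(x, trim = 0.1)         # 5.5, one value dropped from each end
med    <- median(x)                   # 5.5
five   <- fivenum(x)                  # 1 3 5.5 8 100
q      <- quantile(x, c(0.25, 0.75))  # 3.25 and 7.75 with the default method
spread <- IQR(x)                      # 4.5, i.e. 7.75 - 3.25

print(summary(x))
```

The hinges from fivenum() (3 and 8) are close to, but not the same as, the quartiles from quantile() (3.25 and 7.75).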

Vectors:
a<-c(1,3,5,9,7)
which(a==3)         # which position holds the value 3?
length(a)              # how many elements?
cummax(a)          # the cumulative maximum
cummin(a)           # the cumulative minimum
a[-2]                     # all but the second element; output: [1] 1 5 9 7
a[2]                      # the second element
a[2:4]                   # elements 2 through 4
a[c(1,5)]              # the first and fifth elements
a[a<3 | a>5]        # the elements less than 3 or greater than 5
diff(a)                  # difference between each element and the one before it; output: [1]  2  2  4 -2
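
The same vector, run end to end, with the result of each call noted in a comment. The variable names on the left are only for illustration.

```r
a <- c(1, 3, 5, 9, 7)

pos       <- which(a == 3)     # 2
run_max   <- cummax(a)         # 1 3 5 9 9
run_min   <- cummin(a)         # 1 1 1 1 1
no_second <- a[-2]             # 1 5 9 7
outside   <- a[a < 3 | a > 5]  # 1 9 7
steps     <- diff(a)           # 2 2 4 -2

print(steps)
```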

Categorical vectors:
YN<-c("Yes","No","No","Yes","No")
table(YN)                                                    # count how often each value occurs
It is often useful to tabulate categorical data. R stores categorical data in a type it calls a factor.
factor(YN)   or as.factor(YN)
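
A short sketch of what a factor looks like inside, using the same YN vector. Note that the functions are spelled factor() and as.factor(), without a trailing "s".

```r
YN <- c("Yes", "No", "No", "Yes", "No")

f      <- factor(YN)
lev    <- levels(f)      # "No" "Yes" (sorted alphabetically)
counts <- table(f)       # No: 3, Yes: 2
codes  <- as.integer(f)  # 2 1 1 2 1, the position of each value in levels(f)

print(counts)
```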

Want to build a data set like the ones in SRS, Stata or Excel?

Use data frames.

In a data frame, like a dataset in SRS or Stata, all columns must have the same number of elements.
Let's first build two variables, X and Y:
X<-seq(0, 100, 0.5)   # from 0 to 100 in steps of 0.5
Y<-2*X
myframe<-data.frame(X,Y)
Now you have a data frame called myframe.

To access it, index it just like a matrix:
myframe[2,2] gives the element in row 2 and column 2 (the second value of Y).
myframe[2,] gives the second row.
myframe[,1] gives the first column (the X values).
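
Putting the pieces together, as a sketch. It assumes X is meant to run from 0 to 100 in steps of 0.5, which in R is written seq(0, 100, by = 0.5).

```r
X <- seq(0, 100, by = 0.5)   # 0, 0.5, 1, ..., 100
Y <- 2 * X
myframe <- data.frame(X, Y)

size <- dim(myframe)     # 201 rows, 2 columns
cell <- myframe[2, 2]    # 1, the second value of Y
row2 <- myframe[2, ]     # a one-row data frame: X = 0.5, Y = 1
col1 <- myframe[, 1]     # a plain numeric vector of the X values

print(head(myframe, 3))
```

A single row comes back as a data frame, while a single column comes back as an ordinary vector.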

Or you can name your columns:
names(myframe)<- c("source", "result")
Then you can refer to the columns by name using the "$" operator.
e.g. myframe$source[3]  is equivalent to myframe[3, 1]
e.g. myframe$result[3]  is equivalent to myframe[3, 2]

If you forget the names you gave to a data frame,
names(myframe) will return them: "source" "result"
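
A small check that the named and positional forms really pick the same cell. The three-row frame here is a made-up stand-in so the block runs on its own.

```r
# Hypothetical small frame, just for illustration
myframe <- data.frame(X = c(1, 2, 3), Y = c(2, 4, 6))
names(myframe) <- c("source", "result")

by_name  <- myframe$result[3]       # 6
by_pos   <- myframe[3, 2]           # 6
by_label <- myframe[3, "result"]    # 6, a column name also works inside [ ]

print(names(myframe))
```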

pairs(myframe)   # scatterplot matrix: one plot for each pair of columns

Subsetting a data frame.
Sometimes we only need part of the data for an analysis, not all of it.
Use a logical vector or the subset() function.
x<-c("b","b","b","g","g","g","g","b","g","b")
y<-c(1.80, 1.70, 1.75, 1.50,1.65,1.72,1.55,1.75,1.58,1.77)
bgheight<-data.frame(x,y)
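Suppose you want the mean, median and standard deviation of the heights of the boys (or the girls).
Using a logical vector:
mean(bgheight$y[bgheight$x=="b"])  or  mean(bgheight[bgheight$x=="b", 2])
sd(bgheight$y[bgheight$x=="g"])
Using subset():
bgheight.boy<-subset(bgheight, subset= x=="b", select=y)   # the result is a one-column data frame
mean(bgheight.boy$y) median(bgheight.boy$y)  sd(bgheight.boy$y)   # pull out the column first; these functions expect a vector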
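
Both approaches side by side, as a runnable sketch. One thing to watch: subset() with select=y returns a one-column data frame, not a vector, so the column is extracted with $y before it is passed to mean(), median() or sd(). The helper names is_boy and boys are only for illustration.

```r
x <- c("b", "b", "b", "g", "g", "g", "g", "b", "g", "b")
y <- c(1.80, 1.70, 1.75, 1.50, 1.65, 1.72, 1.55, 1.75, 1.58, 1.77)
bgheight <- data.frame(x, y)

# Logical vector: TRUE for the rows that are boys
is_boy    <- bgheight$x == "b"
boy_mean  <- mean(bgheight$y[is_boy])     # 1.754
girl_mean <- mean(bgheight$y[!is_boy])    # 1.6
girl_sd   <- sd(bgheight$y[!is_boy])

# subset(): keep the boys' rows and only the y column
boys       <- subset(bgheight, subset = x == "b", select = y)
boy_median <- median(boys$y)              # 1.75

print(boys)
```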

Adding variables to a data frame
grade<-c(10,11,11,12,10,10,11,11,12,12)
bgheight$grade<-grade
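
Once the column is added it behaves like any other. A sketch that rebuilds the frame so it runs on its own, then uses the new column to pick out rows; the name grade10_mean is only for illustration.

```r
x <- c("b", "b", "b", "g", "g", "g", "g", "b", "g", "b")
y <- c(1.80, 1.70, 1.75, 1.50, 1.65, 1.72, 1.55, 1.75, 1.58, 1.77)
bgheight <- data.frame(x, y)

# One grade value per row, in the same order as the rows
grade <- c(10, 11, 11, 12, 10, 10, 11, 11, 12, 12)
bgheight$grade <- grade

cols <- names(bgheight)    # "x" "y" "grade"

# Mean height of the children in grade 10 (rows 1, 5 and 6)
grade10_mean <- mean(bgheight$y[bgheight$grade == 10])

print(bgheight)
```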








         
