Quantitative
Data Analysis

  Working with R
What is R
   A computer language oriented toward statistical applications
Advantages
   Completely free: just download it from the Internet
   Many add-on packages for specialized uses
   Open source
Getting Started: Installing R
Have an Internet connection
Go to http://cran.r-project.org/
On the R for Windows screen, click “base”
Find and click the link to download R
Click Run, OK, or Next on every screen
End up with an R icon on the desktop
At http://cran.r-project.org/
Downloading Base R
Click on Windows
On the next screen, click on “base”
Then click Run, OK, or Next on the following screens
And finally click “Finish”
   This puts an R icon on the desktop
Rgui and the R Console,
 ending with the R prompt (>)
The R prompt (>)
> This is the “R prompt.”
  It says R is ready to take your command.
Enter these after the prompt and observe the output:
   > 2+3
   > 2^3+(5)
   > 6/2+(8+5)
   > 2 ^ 3 + (5)
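For reference, these are the values R prints for the four expressions above. The ^ operator is applied before + and /, and the extra spaces in the last line make no difference:

```r
2 + 3          # [1] 5
2^3 + (5)      # 8 + 5: [1] 13
6/2 + (8 + 5)  # 3 + 13: [1] 16
2 ^ 3 + (5)    # spaces are ignored: [1] 13
```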
Installing Packages and Libraries
install.packages("akima")
install.packages("chron")
install.packages("lme4")
install.packages("mcmc")
install.packages("deSolve")   # successor to "odesolve", which is no longer on CRAN
install.packages("spdep")
install.packages("spatstat")
install.packages("tree")
install.packages("lattice")
Installing Packages and Libraries
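R.version              # the version of R you are running
installed.packages()   # packages installed on this computer
update.packages()      # update installed packages from the repositories
setRepositories()      # choose which package repositories to use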
Help
help(mean)
?mean
help will not find a function in a package unless the package is installed
and loaded with library()
help.search("aspline") will find functions in packages that are installed
but not loaded
apropos("lm") lists the objects on the search path whose names contain "lm"
Help
For help on whole package:
    help(package=akima)
   objects(grep("akima",search()))

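library("akima")
my.packages <- search()              # packages currently on the search path
aki <- grep("akima",my.packages)     # position of akima in that list
my.objects <- objects(aki)           # objects (functions, data) in akima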
Help
example(mean)

demo()
demo(package = .packages(all.available = TRUE))
demo(graphics)

vignette(all=TRUE)
V <- vignette("sp")
print(V)
edit(V)
Maintenance
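ls() / objects()   # list the objects in the workspace
search()           # list the attached packages and environments
class(a)           # class (type) of object a
rm(a,b,c)          # remove objects a, b and c
rm(list=ls())      # remove every (non-hidden) object in the workspace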
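A short session with the maintenance commands above; the object names a, b and c are just examples:

```r
a <- 1; b <- "text"; c <- TRUE   # three objects in the workspace
ls()          # includes "a" "b" "c"
class(a)      # "numeric"
class(b)      # "character"
rm(a, b, c)   # remove the three objects
exists("b")   # FALSE once b has been removed
```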
Maintenance
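getwd()                  # show the working directory
setwd(dir)               # change it; dir is the path of the folder to use
source("myprogram.R")    # run the R commands stored in myprogram.R
save(list = ls(all.names = TRUE), file = "all.Rdata")   # save every object
load("all.Rdata")        # restore the saved objects
save.image()             # save the whole workspace to .RData
savehistory()            # save the command history to .Rhistory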
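A minimal save/load round trip. It writes to a temporary file from tempfile() instead of "all.Rdata" so nothing is left behind; the object names x and y are illustrative:

```r
x <- 1:10
y <- "hello"
f <- tempfile(fileext = ".Rdata")   # temporary path for the example
save(x, y, file = f)                # write x and y to disk
rm(x, y)                            # remove them from the workspace
load(f)                             # restores x and y under their original names
x
```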
To cite use of R
To cite the use of R for statistical work, R
documentation recommends the following:
  R Development Core Team (2010). R: A language and
environment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0,
URL http://www.R-project.org/.


Get the latest citation by typing citation() at the
prompt.
Email Support Lists
http://r-project.org, under "Mailing Lists"
r-help is the most general one
Before posting, read:
 http://www.R-project.org/posting-guide.html
Send the smallest possible example of your problem (generated data
is handy)
sessionInfo() lists your computer and R details, ready to cut and paste into
your question

Programming with R
Basic concepts
Code
Commands
Programs
Objects
Types
Functions
Operators
assignment
a <- 1
assign("b", 2)
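Besides the two forms above, R accepts rightward assignment and = at the top level, and assign() is useful when the name is built as a string. The names d, e and var1 are only illustrative:

```r
a <- 1                      # the usual assignment arrow
assign("b", 2)              # name supplied as a string
3 -> d                      # rightward assignment
e = 4                       # '=' also assigns at the top level
name <- paste0("var", 1)    # build a name programmatically...
assign(name, 5)             # ...which creates the object var1
get("var1")                 # 5
```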
Mathematical operators
+ - * / ^ arithmetic
> >= < <= == != relational
! & | logical
$ list indexing (the ‘element name’ operator)
: create a sequence
~ model formulae
Logical operators
! logical NOT
& logical AND
| logical OR
< less than
<= less than or equal to
> greater than
>= greater than or equal to
== logical equals (double =)
!= not equal
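&& AND of single values, as used in if()
|| OR of single values, as used in if()
xor(x,y) exclusive OR
isTRUE(x) TRUE only if x is a single TRUE value (similar to identical(TRUE,x))
all(x) TRUE if all the values in x are TRUE
any(x) TRUE if at least one value in x is TRUE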
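The element-wise operators work on whole vectors, while && and || take single values, which is what an if() condition needs. A small illustration with made-up vectors:

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)
x & y        # element-wise AND: TRUE FALSE FALSE
x | y        # element-wise OR:  TRUE TRUE TRUE
xor(x, y)    # FALSE TRUE TRUE
!x           # FALSE TRUE FALSE
all(x)       # FALSE
any(x)       # TRUE
n <- 5
if (n > 0 && n < 10) print("between 0 and 10")   # && on single values
isTRUE(c(TRUE, TRUE))   # FALSE: not a single TRUE
```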
Mathematical functions
log(x) log to base e of x
exp(x) antilog of x (e^x)
log(x,n) log to base n of x
log10(x) log to base 10 of x
sqrt(x) square root of x
factorial(x) x!
choose(n,x) binomial coefficients n!/(x!(n−x)!)
gamma(x) Γ(x) for real x; (x−1)! for integer x
lgamma(x) natural log of Γ(x)
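A few values that can be checked by hand, including the relation between gamma() and factorial():

```r
log(exp(2))    # 2: log() and exp() are inverses
log(8, 2)      # 3, since 2^3 = 8
log10(1000)    # 3
sqrt(16)       # 4
factorial(4)   # 24
choose(5, 2)   # 10 = 5!/(2! 3!)
gamma(5)       # 24: for integer x, gamma(x) = (x-1)!
lgamma(5)      # log(24)
```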
Mathematical functions
floor(x) greatest integer <= x
ceiling(x) smallest integer >= x
trunc(x) x with its fractional part dropped (rounded toward zero)
round(x, digits=0) round the value of x to an integer
abs(x) the absolute value of x, ignoring the minus sign if there is one
signif(x, digits=6) round x to 6 significant digits
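Negative numbers show how floor(), ceiling(), trunc() and round() differ; the values are arbitrary examples:

```r
x <- c(-2.7, 2.3)
floor(x)     # -3  2
ceiling(x)   # -2  3
trunc(x)     # -2  2  (toward zero)
round(x)     # -3  2
abs(x)       #  2.7 2.3
signif(123456.7, digits = 3)   # 123000
```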
Trigonometrical functions
cos(x) cosine of x in radians
sin(x) sine of x in radians
tan(x) tangent of x in radians
acos(x), asin(x), atan(x) inverse trigonometric transformations of real or complex numbers
acosh(x), asinh(x), atanh(x) inverse hyperbolic transformations of real or complex numbers
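Because these functions work in radians, angles in degrees must be converted first (multiply by pi/180). A few checks:

```r
sin(pi / 2)          # 1
cos(0)               # 1
tan(pi / 4)          # 1, up to floating-point error
acos(-1)             # pi
deg <- 60
sin(deg * pi / 180)  # sqrt(3)/2, about 0.866
atanh(tanh(0.5))     # 0.5: atanh() undoes tanh()
```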
Infinity and Things that Are Not a Number
Inf (is.finite, is.infinite)
    3/0
    2 / Inf
    exp(-Inf)
    (0:3)^Inf
NaN (is.nan)
    0/0
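The results of the expressions above, with the test functions applied to small example vectors:

```r
3/0            # Inf
2/Inf          # 0
exp(-Inf)      # 0
(0:3)^Inf      # 0 1 Inf Inf
0/0            # NaN
is.finite(c(1, Inf, NaN))      # TRUE FALSE FALSE
is.infinite(c(1, Inf, -Inf))   # FALSE TRUE TRUE
is.nan(c(1, 0/0))              # FALSE TRUE
```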
Vectors
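a <- c(1,2,3,4,5)            # combine values into a vector
a <- 1:5                     # the sequence 1 to 5
a <- scan()                  # type values at the console; an empty line ends input
a <- seq(1,10,2)             # 1 to 10 in steps of 2
b <- 1:4
a <- seq(1,10,along.with=b)  # as many evenly spaced values as b has elements
x <- runif(10)               # 10 uniform random numbers between 0 and 1
which(a == 2)                # positions of the elements equal to 2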
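The vectors these commands produce (scan() is left out because it waits for keyboard input). Note that which() returns positions, not values:

```r
a <- seq(1, 10, 2)                  # 1 3 5 7 9
b <- 1:4
a2 <- seq(1, 10, along.with = b)    # 4 values from 1 to 10: 1 4 7 10
x <- runif(10)                      # different numbers on every run
which(a2 == 7)                      # 3: the third element equals 7
```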
Plotting functions
x<-seq(-10,10,0.1)
y<-x^3
plot(x,y,type="l")
Vector functions
max(x) maximum value in x
min(x) minimum value in x
sum(x) total of all the values in x
sort(x) a sorted version of x
rank(x) vector of the ranks of the values in x
order(x) an integer vector containing the permutation to sort x into
ascending order
range(x) vector of min(x) and max(x)
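rank() and order() are easy to confuse. On a small example vector, rank() says where each value would sit after sorting, while order() gives the positions to pick, in turn, to produce the sorted vector:

```r
x <- c(30, 10, 20)
max(x); min(x); sum(x)   # 30, 10, 60
sort(x)       # 10 20 30
rank(x)       # 3 1 2
order(x)      # 2 3 1
x[order(x)]   # 10 20 30, the same as sort(x)
range(x)      # 10 30
```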
More functions
cumsum(x) vector containing the sum of all of the elements up to
that point
cumprod(x) vector containing the product of all of the elements up to
that point
cummax(x) vector of non-decreasing numbers which are the
cumulative maxima of the values in x up to that point
cummin(x) vector of non-increasing numbers which are the
cumulative minima of the values in x up to that point
pmax(x,y,z) vector, of length equal to the longest of x, y or z,
containing the maximum of x, y or z for the ith position in each
pmin(x,y,z) vector, of length equal to the longest of x, y or z,
containing the minimum of x, y or z for the ith position in each
rowSums(x) row totals of dataframe or matrix x
colSums(x) column totals of dataframe or matrix x
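The cumulative and parallel functions on small example vectors; pmax() and pmin() compare the vectors position by position:

```r
x <- c(1, 3, 2, 5)
cumsum(x)    # 1 4 6 11
cumprod(x)   # 1 3 6 30
cummax(x)    # 1 3 3 5
cummin(x)    # 1 1 1 1
pmax(c(1, 5, 3), c(4, 2, 6))   # 4 5 6
pmin(c(1, 5, 3), c(4, 2, 6))   # 1 2 3
m <- matrix(1:4, nrow = 2)     # columns (1, 2) and (3, 4)
rowSums(m)   # 4 6
colSums(m)   # 3 7
```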
functions
Geometric mean (p.49)
geometric<-function (x)
exp(mean(log(x)))
Harmonic mean (p.51)
harmonic<-function (x)
  1/mean(1/x)
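Using the two functions on numbers where the answer is easy to verify; for positive data the geometric mean never exceeds the arithmetic mean:

```r
geometric <- function(x) exp(mean(log(x)))
harmonic  <- function(x) 1 / mean(1 / x)
v <- c(1, 10, 100)
mean(v)            # 37
geometric(v)       # 10, the cube root of 1 * 10 * 100
harmonic(c(2, 6))  # 3 = 2 / (1/2 + 1/6)
```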
Exercises
Finding the value in a vector that is closest to a specified value
closest<-function(xv,sv){
  xv[which(abs(xv-sv)==min(abs(xv-sv)))]
}

Calculate a trimmed mean of x which ignores both the
smallest and largest values

trimmed.mean <- function (x) {
  mean(x[-c(which(x==min(x)),which(x==max(x)))])
}
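Trying both exercise solutions on made-up vectors. closest() returns every value that ties for nearest, and trimmed.mean() drops all copies of the minimum and the maximum:

```r
closest <- function(xv, sv) {
  xv[which(abs(xv - sv) == min(abs(xv - sv)))]
}
trimmed.mean <- function(x) {
  mean(x[-c(which(x == min(x)), which(x == max(x)))])
}
closest(c(1, 5, 9), 6)             # 5
closest(c(1, 5, 7), 6)             # 5 7: both are 1 away
trimmed.mean(c(1, 2, 3, 4, 100))   # 3: the 1 and the 100 are ignored
```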
Sets
union(x,y)
intersect(x,y)
setdiff(x,y)
setequal(x,y)
is.element(el,set)
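The set functions on two small example vectors; setdiff() depends on argument order, while setequal() ignores order and duplicates:

```r
x <- c(1, 2, 3, 4)
y <- c(3, 4, 5)
union(x, y)       # 1 2 3 4 5
intersect(x, y)   # 3 4
setdiff(x, y)     # 1 2 (in x but not in y)
setdiff(y, x)     # 5
setequal(x, c(4, 3, 2, 1, 1))   # TRUE
is.element(3, x)  # TRUE, the same as 3 %in% x
```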
Matrices
X<-matrix(c(1,0,0,0,1,0,0,0,1),nrow=3)
dim(X)
is.matrix(X)

vector<-c(1,2,3,4,4,3,2,1)
V<-matrix(vector,byrow=TRUE,nrow=2)   # 2 x 4 matrix, filled row by row
dim(vector) <- c(2,4)                 # make vector itself a 2 x 4 matrix, filled by column
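The two ways of shaping the same values give different matrices: byrow = TRUE fills along rows, while setting dim() fills down columns. The copy W keeps the original vector intact for comparison:

```r
vector <- c(1, 2, 3, 4, 4, 3, 2, 1)
V <- matrix(vector, byrow = TRUE, nrow = 2)
V[1, ]        # 1 2 3 4
W <- vector
dim(W) <- c(2, 4)
W[1, ]        # 1 3 4 2
X <- matrix(c(1,0,0,0,1,0,0,0,1), nrow = 3)
all(X == diag(3))   # TRUE: X is the 3 x 3 identity matrix
```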
Matrices
X<-rbind(X,apply(X,2,mean))   # add a row of column means
X<-cbind(X,apply(X,1,var))    # add a column of row variances
sweep
matdata<-read.table("datasweepdata.txt")
cols<-apply(matdata,2,mean)   # the mean of each column
sweep(matdata,2,cols)         # subtract each column's mean from that column
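Since the data file is not available here, a small made-up matrix stands in for matdata. By default sweep() subtracts, so every column of the result has mean 0; FUN can change the operation:

```r
m <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)   # hypothetical data
cols <- apply(m, 2, mean)      # column means: 2 20
centred <- sweep(m, 2, cols)   # columns -1 0 1 and -10 0 10
colMeans(centred)              # 0 0
sweep(m, 2, cols, FUN = "/")   # divide by the column means instead
```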
lists
person <- list()
person$name <- "Alberto"
person$age <- 37
person$nationality <- "Spain"
class(person)
[1] "list"

> person
$name
[1] "Alberto"

$age
[1] 37

$nationality
[1] "Spain"

names(person)
[1] "name"        "age"         "nationality"
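The same list can be built in one call. [[ ]] (like $) extracts an element, whereas [ ] returns a smaller list:

```r
person <- list(name = "Alberto", age = 37, nationality = "Spain")
person[["name"]]         # "Alberto"
person["name"]           # a list of length 1
class(person["name"])    # "list"
person$age <- person$age + 1   # elements can be updated in place
length(person)           # 3
```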
Strings
phrase<-"the quick brown fox jumps over the lazy dog"
letras <- table(strsplit(phrase,split=character(0)))        # frequency of each character
numwords<-1+table(strsplit(phrase,split=character(0)))[1]   # spaces + 1 (the space is the first table entry)

words <- unlist(strsplit(phrase,split=" "))   # split the phrase into words
words[grep("o",words)]                        # words containing "o"
"fox" %in% unlist(strsplit(phrase,split=" "))
unlist(strsplit(phrase,split=" ")) %in% c("fox","dog")
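What these commands return for the example phrase; splitting with split = "" gives single characters, the same as character(0):

```r
phrase <- "the quick brown fox jumps over the lazy dog"
words <- unlist(strsplit(phrase, split = " "))
length(words)                 # 9
words[grep("o", words)]       # "brown" "fox" "over" "dog"
"fox" %in% words              # TRUE
words %in% c("fox", "dog")    # TRUE only at positions 4 and 9
letters.used <- table(strsplit(phrase, split = "")[[1]])
letters.used[["o"]]           # 4
```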
Strings
nchar(words)
paste(words[1],words[2])
toupper(words)
Regular expressions
grep("^t", words)
words[grep("^t", words)]
words[grep("s$", words)]
gsub("o","O",words)
regexpr()
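The pattern functions side by side. grep() gives positions and grepl() gives TRUE/FALSE; sub() replaces only the first match, gsub() every match; regexpr() reports where the first match starts:

```r
words <- c("the", "quick", "brown", "fox", "jumps",
           "over", "the", "lazy", "dog")
grep("^t", words)          # 1 7
grepl("^t", words)         # TRUE at positions 1 and 7
words[grep("s$", words)]   # "jumps"
sub("o", "O", "foo")       # "fOo"
gsub("o", "O", "foo")      # "fOO"
regexpr("o", "brown")      # 3, with a match.length attribute
```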
Dataframes
lista <- data.frame()
lista[1,1] = "Alberto"
lista[1,2] = 37
lista[2,1] = "Ana"
lista[2,2] = 23
names(lista) <- c("Name", "Age")
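Building the data frame cell by cell works, but it is more common to create it column by column in one call. The extra row ("Luis", 41) is hypothetical:

```r
lista <- data.frame(Name = c("Alberto", "Ana"),
                    Age  = c(37, 23))
mean(lista$Age)                   # 30
lista[lista$Age > 30, "Name"]     # "Alberto"
lista <- rbind(lista, data.frame(Name = "Luis", Age = 41))
nrow(lista)                       # 3
```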
Missing values
NA (is.na)
x<-c(1:8,NA)
mean(x)              # NA, because one value is missing
mean(x,na.rm=TRUE)   # drop the NA first: 4.5
which(is.na(x))
as.vector(na.omit(x))
x[!is.na(x)]
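The results for the vector above, plus sum(is.na(x)) as a quick way to count missing values:

```r
x <- c(1:8, NA)
mean(x)                 # NA
mean(x, na.rm = TRUE)   # 4.5
which(is.na(x))         # 9
sum(is.na(x))           # 1
x[!is.na(x)]            # 1 2 3 4 5 6 7 8
```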
Dates and Times in R
date()
date<- as.POSIXlt(Sys.time())
unlist(unclass(date))
difftime(time1, time2)   # time difference between two date-times
excel.dates <- c("27/02/2004", "27/02/2005",
"14/01/2003","28/06/2005", "01/01/1999")
strptime(excel.dates,format="%d/%m/%Y")
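For calendar dates without times, as.Date() with the same format string is often simpler, and subtracting two dates gives a difftime. The two dates come from excel.dates; 2004 is a leap year, so the interval is 366 days:

```r
d <- as.Date(c("27/02/2004", "27/02/2005"), format = "%d/%m/%Y")
d                                         # "2004-02-27" "2005-02-27"
d[2] - d[1]                               # Time difference of 366 days
difftime(d[2], d[1], units = "weeks")     # the same interval in weeks
format(d, "%A %d %B %Y")                  # weekday and month names depend on the locale
```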
Testing and Coercing in R
if
if (y > 0) print(1) else print (-1)
z <- ifelse (y < 0, -1, 1)
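The slide assumes y already exists. With a made-up vector you can see the difference: ifelse() works element by element, while if() needs a single TRUE/FALSE. sign() is a built-in alternative that also returns 0 for 0:

```r
y <- c(-2, 0, 3)
ifelse(y < 0, -1, 1)                      # -1 1 1
if (y[1] > 0) print(1) else print(-1)     # prints -1
sign(y)                                   # -1 0 1
```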
Loops and Repeats
for (i in 1:10) print(i^2)


i <- 1
while(i <= 10) {
    print(i^2)
    i <- i + 1
}


i <- 1
repeat {
    if (i > 10) break
    print(i^2)
    i <- i + 1
}
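To keep the squares instead of printing them, preallocate a vector and fill it in the loop. The vectorized expression (1:10)^2 gives the same result without a loop:

```r
squares <- numeric(10)            # preallocate the result
for (i in 1:10) squares[i] <- i^2
squares                           # 1 4 9 ... 100
(1:10)^2                          # same values, no loop
```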
Exercise
Compute the Fibonacci series 1, 1, 2, 3, 5, 8

fibonacci<-function(n) {
    a<-1
    b<-0
    while(n>0) {
        swap<-a        # keep the current term
        a<-a+b         # next term = sum of the previous two
        b<-swap
        n<-n-1
    }
    b                  # the nth Fibonacci number
}
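As a cross-check on the loop version, a recursive definition (fib.rec is a hypothetical name) gives the same series. It is much slower for large n, because it recomputes the same terms many times:

```r
fib.rec <- function(n) if (n <= 2) 1 else fib.rec(n - 1) + fib.rec(n - 2)
sapply(1:8, fib.rec)   # 1 1 2 3 5 8 13 21
```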
Avoid loops
x<-runif(10000000)
system.time(max(x))


pc<-proc.time()
cmax<-x[1]
for (i in 2:length(x)) {
    if(x[i]>cmax) cmax<-x[i]
}
proc.time()-pc
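The loop and max() give the same answer; only the speed differs. This sketch uses a shorter vector than the slide so that it runs quickly:

```r
x <- runif(1e5)
cmax <- x[1]
for (i in 2:length(x)) {
  if (x[i] > cmax) cmax <- x[i]
}
identical(cmax, max(x))               # TRUE
system.time(for (k in 1:10) max(x))   # vectorized max(), run 10 times
```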
switch

central<-function(y, measure) {
    switch(measure,
    Mean = mean(y),
    Geometric = exp(mean(log(y))),
    Harmonic = 1/mean(1/y),
    Median = median(y),
    stop("Measure not included"))

}
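Calling central() with a few of its measures; any name not listed falls through to the final stop() call. The data are made up so that the results are easy to check:

```r
central <- function(y, measure) {
  switch(measure,
         Mean = mean(y),
         Geometric = exp(mean(log(y))),
         Harmonic = 1/mean(1/y),
         Median = median(y),
         stop("Measure not included"))
}
y <- c(1, 2, 4)
central(y, "Median")      # 2
central(y, "Geometric")   # 2, the cube root of 1 * 2 * 4
central(y, "Harmonic")    # 12/7, about 1.714
try(central(y, "Mode"))   # Error: Measure not included
```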

Working with datasets
Help for Datasets
To list built-in datasets:

data()
data(package = .packages(all.available = TRUE))
data(swiss)

For help on a dataset: help(swiss)
 “Standardized fertility measure and socio-economic indicators for
each of 47 French-speaking provinces of Switzerland at about 1888.”
The attach Command
To access individual variables, do this:
> attach(swiss)
Now try:
> mean(Fertility)
> detach(swiss)
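Without attach(), you can name the data frame with $ or evaluate an expression inside it with with(). Nothing needs detaching afterwards:

```r
mean(swiss$Fertility)          # name the data frame explicitly
with(swiss, mean(Fertility))   # evaluate inside swiss, no attach needed
```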
Using R Functions: Simple Stuff

rownames(swiss)
colnames(swiss)
summary(swiss)
Applying functions
    mean(swiss$Fertility)
    sd(swiss$Fertility)
    apply(swiss,2,max)
Factors
class(Detergent)
nlevels(Detergent)
levels(Detergent)
as.factor()
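Detergent only appears once the daphnia data are read in later, so here is a self-contained factor built from hypothetical values. By default levels are sorted alphabetically; the levels argument sets them explicitly:

```r
f <- factor(c("low", "high", "low", "medium"))
class(f)      # "factor"
nlevels(f)    # 3
levels(f)     # "high" "low" "medium"
g <- factor(c("low", "high"), levels = c("low", "medium", "high"))
levels(g)     # "low" "medium" "high"
as.factor(c("a", "b", "a"))   # coerce a character vector
```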
Working with your dataset
fix(swiss)
hist(swiss$Agriculture)
plot(swiss$Catholic,swiss$Fertility)
Working with your own datasets

write.table(swiss, "swiss.txt")
swiss2 <- read.table("swiss.txt")

data<-read.table(file.choose(),header=TRUE)

readLines()
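A write/read round trip, using a temporary file instead of "swiss.txt". write.table() writes a header plus the row names, and read.table() recognizes that layout because the header line has one field fewer than the data lines:

```r
f <- tempfile(fileext = ".txt")
write.table(swiss, f)
swiss2 <- read.table(f)
dim(swiss2)             # 47 6, as for swiss
head(readLines(f), 2)   # the first two raw lines of the file
```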
Reading data from files
read.table(file) reads a file in table format and creates a data frame
from it; the default separator sep="" is any whitespace; use
header=TRUE to read the first line as a header of column names; use
as.is=TRUE to prevent character vectors from being converted to
factors (no conversion is the default since R 4.0.0); use comment.char="" to prevent "#" from being
interpreted as a comment; use skip=n to skip n lines before reading data; see the
help for options on row naming, NA treatment, and others
read.csv("filename", header=TRUE) same as read.table, but with defaults set for
reading comma-delimited files
read.delim("filename", header=TRUE) same as read.table, but with defaults set
for reading tab-delimited files
read.fwf(file,widths)
reads a table of fixed-width formatted data into a data.frame; widths
is an integer vector giving the widths of the fixed-width fields
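A sketch of read.csv() and read.fwf() on small hypothetical files written with writeLines() to temporary paths:

```r
f1 <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Alberto,37", "Ana,23"), f1)
people <- read.csv(f1)     # header=TRUE is the default for read.csv
people                     # 2 rows, columns name and age

f2 <- tempfile(fileext = ".txt")
writeLines(c("AB123", "CD456"), f2)
fw <- read.fwf(f2, widths = c(2, 3))   # 2 characters, then 3
fw                         # V1 = "AB" "CD", V2 = 123 456
```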
Example
data<-read.table(".datadaphnia.txt",header=TRUE)
names(data)
attach(data)
table(Detergent)
tapply(Growth.rate,Detergent,mean)
aggregate(Growth.rate,list(Detergent), mean)
tapply(Growth.rate,list(Water,Daphnia),median)
with(data,boxplot(Growth.rate ~ Detergent))
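The daphnia file is not included, so this sketch uses R's built-in warpbreaks data as a stand-in: breaks plays the role of Growth.rate, and tension and wool play the roles of the grouping factors. tapply() and aggregate() compute the same group means; only the shape of the output differs:

```r
m1 <- tapply(warpbreaks$breaks, warpbreaks$tension, mean)           # named vector
m2 <- aggregate(warpbreaks$breaks, list(warpbreaks$tension), mean)  # data frame
m1
m2
med <- tapply(warpbreaks$breaks,
              list(warpbreaks$wool, warpbreaks$tension), median)    # 2 x 3 table
med
```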

Introduction to R programming