- Gentle introduction to:
  - Programming languages
  - R for data science
- Provide a foundation for Gerko Vink’s course
Introduction to R and RStudio
install.packages("ggplot2") #Install new package (you only need to do it once)
library(ggplot2) #Load the package
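Because installation is only needed once, a script can check whether a package is already present before installing it. The sketch below is one possible way to do that; `install_if_missing` is a hypothetical helper name, not part of R or of the course material.

```r
# Hypothetical helper: install a package only when it is not yet available.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call never triggers a download;
# replace it with "ggplot2" to mirror the example above.
install_if_missing("stats")
```

You still need library() afterwards to load the package in each new session.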
Source editor: Write your R code (load data, clean it, model it, etc.)
Environment: All the variables that you have defined
Files: File explorer to find your files
Help: Get information about code (super useful!)
Console: Write R code (not recommended at this point) and see the output of your R scripts
Plots: See your plots and export them
History: All the code you have run
Packages: Your installed packages, with the loaded ones ticked (I don’t recommend loading/unloading packages this way)
Terminal: Run commands on your terminal (this is not R, you won’t need to use this)
- “../somefile.csv”: find “somefile.csv” one level up (in the parent directory)
- “../../somefile.csv”: find “somefile.csv” two levels up
- “./somefile.csv”: find “somefile.csv” in the current level (not so useful, it is identical to “somefile.csv”)
- “~/somefile.csv”: find “somefile.csv” in your home directory
Tell the computer to save an object (a number, a string, a spreadsheet) with a name.
Creating variables in R is very straightforward:
<- (assignment operator)
For example, if you assign the value 100 (an element) to variable a, you would type
a <- 100
print(a)
## [1] 100
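Once a value is stored under a name, that name can be reused in later calculations and reassigned. A minimal sketch (the names `b` and `total` are illustrative and do not come from the course material):

```r
a <- 100        # store the value 100 under the name a
b <- 25         # b is a hypothetical second variable
total <- a + b  # variables can be combined in expressions
print(total)

a <- a / 2      # reassigning a replaces its previous value
print(a)
```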
- character: "some text"
- numeric: e.g., 2.1
- integer: e.g., 2L
- logical: TRUE/FALSE
- factor: e.g., factor("amsterdam")
- vector: c(2, 4, 2)
- list: list(first_col = 1, second = "a", third = TRUE)
- matrix: matrix(c(4, 4, 4, 4), nrow = 2, ncol = 2)
- data.frame: the most important one, roughly a spreadsheet
Everything that is published on the Comprehensive R Archive Network (CRAN) and is aimed at R users must be accompanied by a help file.
If you know the name of the function that performs an operation, e.g. anova(), then you just type ?anova or help(anova) in the console, or use the “Help” menu.
If you do not know the name of the function, type ?? followed by your search criterion. For example, ??anova returns a list of all help pages that contain the word ‘anova’.
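Both lookups can also be written as ordinary function calls, which is handy inside a script. The sketch below uses base R only; `args()` is added here as an extra way to peek at a function without opening its help page.

```r
# Equivalent to typing ?anova in the console
help("anova")

# Equivalent to typing ??anova: search the installed help pages
help.search("anova")

# Show a function's arguments without opening its help page
args(anova)
```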
Alternatively, the internet will tell you almost everything you’d like to know. Sites such as http://www.stackoverflow.com and http://www.stackexchange.com, as well as Google and LLMs, can be of tremendous help.
For R-related issues, use ‘R:’ as a prefix in your search term.
To call an object, just type the name you have given to it.
For example, we assigned the value 100 to object a.
a <- 100
To call object a, we would type
a
## [1] 100
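Typing an object's name works the same way for every data type, and `class()` reports which type an object has. The sketch below creates one object per type; all variable names and the values in the toy data frame are made up for illustration.

```r
text_value    <- "some text"
numeric_value <- 2.1
integer_value <- 2L
logical_value <- TRUE
city          <- factor("amsterdam")
numbers       <- c(2, 4, 2)
record        <- list(first_col = 1, second = "a", third = TRUE)
grid          <- matrix(c(4, 4, 4, 4), nrow = 2, ncol = 2)
# Toy data frame with made-up values, for illustration only
people        <- data.frame(name = c("x", "y"), age = c(30, 41))

class(text_value)     # "character"
class(numeric_value)  # "numeric"
class(integer_value)  # "integer"
class(logical_value)  # "logical"
class(city)           # "factor"
class(numbers)        # "numeric": a vector has the class of its elements
class(record)         # "list"
is.matrix(grid)       # TRUE
class(people)         # "data.frame"
```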
# This is a comment, it won't be read by R
student_number <- 4
paste("The number of students is:", student_number, sep = " ")
## [1] "The number of students is: 4"
# sep can be any string, such as "\n" (newline) or "\t" (tab)
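A few variations on the separator, as a rough sketch. The label "Students" is an arbitrary example string; `paste0()` and `cat()` are base R functions.

```r
student_number <- 4

# sep = "" joins the pieces with nothing in between
paste("Students", student_number, sep = "")    # "Students4"

# paste0() is shorthand for paste(..., sep = "")
paste0("Students", student_number)             # "Students4"

# Any string can act as the separator
paste("Students", student_number, sep = ": ")  # "Students: 4"

# "\n" only shows up as a line break when printed with cat()
cat(paste("Students", student_number, sep = "\n"), "\n")
```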
# install.packages("tidyverse") #installing packages
library(readr) #loading the library to read csv, usually on top of the file
# Using the readr library (the readr:: prefix is optional, but useful to show which package the function comes from)
data <- readr::read_csv("../common_datasets/dataset_boys.csv", col_select = c("age","hgt"))
## Rows: 748 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): age, hgt
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
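The "../" at the start of the path passed to read_csv() is resolved relative to the working directory. A rough sketch of how to inspect this from R using base functions only; "somefile.csv" is a placeholder and does not need to exist.

```r
getwd()             # the current working directory
dirname(getwd())    # its parent directory, which is what ".." refers to

# file.path() joins path components with "/"
file.path("..", "somefile.csv")        # "../somefile.csv"
file.path("..", "..", "somefile.csv")  # "../../somefile.csv"

# path.expand() replaces "~" with your home directory
path.expand("~/somefile.csv")

# file.exists() checks a path before you try to read it
file.exists(file.path("..", "somefile.csv"))
```

If a file cannot be found, checking getwd() first is usually the quickest way to see where R is looking.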
# Summary statistics
summary(data)
##       age              hgt
##  Min.   : 0.035   Min.   : 50.00
##  1st Qu.: 1.581   1st Qu.: 84.88
##  Median :10.505   Median :147.30
##  Mean   : 9.159   Mean   :132.15
##  3rd Qu.:15.267   3rd Qu.:175.22
##  Max.   :21.177   Max.   :198.00
##                   NA's   :20
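The NA's row in the summary counts missing values. Many base R functions return NA when their input contains a missing value, unless told to drop it. A small sketch with a made-up vector (the values are hypothetical and not taken from dataset_boys.csv):

```r
heights <- c(50.0, 84.9, NA, 147.3)  # hypothetical values, one missing

sum(is.na(heights))          # number of missing values: 1
mean(heights)                # NA, because one value is missing
mean(heights, na.rm = TRUE)  # mean of the three observed values
```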
Goal: Get used to RStudio by using R as a calculator, and install one package
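As a starting point for the calculator part, these are some expressions you could type into the console one at a time. This is only a sketch of common base R operators, not a required exercise list.

```r
2 + 3        # addition: 5
10 - 4       # subtraction: 6
6 * 7        # multiplication: 42
10 / 4       # division: 2.5
2^10         # exponentiation: 1024
17 %% 5      # remainder: 2
17 %/% 5     # integer division: 3
sqrt(16)     # square root: 4
(1 + 2) * 3  # parentheses control the order: 9
```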