Rows: 748 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): gen, phb, reg
dbl (6): age, hgt, wgt, bmi, hc, tv
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Practical C
Exercises
In this practical we will go over reproducibility, control flow (if-else statements and for loops), and functions.
First, create a new project (File -> New Project), install renv and initialize the dependency management (renv::init()).
Then, create a new notebook for this homework (File -> New File -> Quarto Document -> Document).
In the following exercises we will use the same dataset as last time, boys.
You need to download dataset_boys.csv from here and add it to a folder “data” in the same folder as your markdown file. In t
Then, read the file dataset_boys.csv using the code in the next cell.
```r
# install.packages(c("readr", "stringr"), repos = "http://cran.us.r-project.org")
library(readr)
library(stringr)

# Read the CSV file into a tibble
boys <- readr::read_delim("data/dataset_boys.csv", delim = ",")
# Keep only a few columns
boys <- boys[, c("age", "wgt", "bmi")]
# Drop rows with missing values
boys <- na.omit(boys)
```
Exercise 1–2: Control flow
- Create a for-loop that loops over all numbers between 0 and 10, but only prints the numbers 3, 4, and 5.
```r
for (i in 0:10) {
  if (i %in% 3:5) {
    print(i)
  }
}
```
```
[1] 3
[1] 4
[1] 5
```
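The same filter can be written inside a loop in two other common ways. `next` skips the rest of the current iteration. An explicit if-else branch can collect or count values instead of only printing them. This is a minimal sketch; the names `kept` and `skipped` are illustrative and not part of the exercise.

```r
# Variant 1: skip the unwanted values with next
for (i in 0:10) {
  if (i < 3 || i > 5) {
    next  # jump straight to the next value of i
  }
  print(i)
}

# Variant 2: an explicit if-else that collects the wanted values
kept <- integer(0)  # values between 3 and 5
skipped <- 0        # how many values were left out
for (i in 0:10) {
  if (i >= 3 && i <= 5) {
    kept <- c(kept, i)
  } else {
    skipped <- skipped + 1
  }
}
print(kept)
print(skipped)
```

Inside `if ()` the condition must be a single `TRUE` or `FALSE`, so these sketches use the scalar operators `&&` and `||`. The vectorised `&` and `|` are the ones to use when subsetting a whole vector at once, as in the next exercise.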
- Try to do the same thing without a for-loop, by subsetting a vector from 0 to 10 directly.
```r
num <- 0:10
num[num >= 3 & num <= 5]
```
```
[1] 3 4 5
```
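The same subset can be written in a few other vectorised ways. The sketch below uses illustrative variable names and shows three of them:

- `%in%` for membership.
- `which()` to get the positions of the matches.
- `ifelse()`, the element-wise counterpart of an if-else statement.

```r
num <- 0:10

# %in% gives the same logical index as num >= 3 & num <= 5
by_membership <- num[num %in% 3:5]

# which() returns the positions of the TRUE values; since num starts at 0,
# the value k is stored at position k + 1
positions <- which(num >= 3 & num <= 5)

# ifelse() tests every element at once: keep 3-5, replace the rest with NA
masked <- ifelse(num >= 3 & num <= 5, num, NA)

print(by_membership)
print(positions)
print(masked)
```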
Exercise 3–5: Functions
- Create a function that calculates the sample standard deviation. Validate it by comparing its output to that of the built-in function sd(). Write documentation for this function.
\[\sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^N{(x_i - \bar{x})^2}}\]
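As a worked check of the formula, take the vector $x = (1, 2, 3, 3)$ used in the validation below, so $N = 4$:

\[
\begin{aligned}
\bar{x} &= \frac{1 + 2 + 3 + 3}{4} = 2.25\\
\sum_{i=1}^{4}(x_i - \bar{x})^2 &= (-1.25)^2 + (-0.25)^2 + 0.75^2 + 0.75^2 = 1.5625 + 0.0625 + 0.5625 + 0.5625 = 2.75\\
\sigma &= \sqrt{\frac{2.75}{4 - 1}} = \sqrt{0.91\overline{6}} \approx 0.9574
\end{aligned}
\]

This matches the value 0.9574271 returned by both `std_own()` and `sd()`. Dividing by $N - 1$ rather than $N$ (Bessel's correction) is what makes this the sample, not the population, standard deviation.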
```r
#' Calculate the sample standard deviation of a numeric vector
#'
#' \deqn{\sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^N (x_i - \bar{x})^2}}
#'
#' @param values A numeric vector
#' @returns The sample standard deviation of `values`
#' @examples
#' std_own(c(3, 5, 4))
std_own <- function(values) {
  s <- sqrt(1 / (length(values) - 1) * sum((values - mean(values))^2))
  return(s)
}
std_own(c(1, 2, 3, 3))
```
```
[1] 0.9574271
```
```r
sd(c(1, 2, 3, 3))
```
```
[1] 0.9574271
```
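Comparing against `sd()` on a single vector is a weak test. The sketch below defines `std_own2()`, a hypothetical extension that:

- checks its input,
- adds an `na.rm` argument mirroring the one `sd()` has,
- returns `NA` for fewer than two values.

It then validates `std_own2()` with `all.equal()` on several random vectors. `all.equal()` tolerates the tiny floating-point differences that `==` would flag.

```r
#' Sample standard deviation with input checks (hypothetical extension)
#'
#' @param values A numeric vector
#' @param na.rm Should missing values be removed first?
#' @returns The sample standard deviation, or NA if it is undefined
std_own2 <- function(values, na.rm = FALSE) {
  if (!is.numeric(values)) {
    stop("`values` must be a numeric vector")
  }
  if (na.rm) {
    values <- values[!is.na(values)]
  }
  n <- length(values)
  if (n < 2) {
    return(NA_real_)  # undefined for fewer than two values, like sd()
  }
  sqrt(sum((values - mean(values))^2) / (n - 1))
}

# Compare with sd() on several random vectors of different lengths
set.seed(42)
for (k in 1:5) {
  x <- rnorm(sample(2:50, 1))
  stopifnot(isTRUE(all.equal(std_own2(x), sd(x))))
}

std_own2(c(1, NA, 3))                # NA, as sd(c(1, NA, 3)) would return
std_own2(c(1, NA, 3), na.rm = TRUE)  # same as sd(c(1, NA, 3), na.rm = TRUE)
```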
- Use a for loop to apply the function to the columns `c("age","wgt")` of the dataset `boys`. Since `boys` is a tibble (more on this later today), you need to use the notation `boys[["age"]]` to extract the values of a column as a vector.
```r
for (col in c("age", "wgt")) {
  print(col)
  print(std_own(boys[[col]]))
}
```
```
[1] "age"
[1] 6.876048
[1] "wgt"
[1] 26.04846
```
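Printing inside the loop is fine for a quick look, but usually you want to keep the results. The sketch below preallocates a named vector and fills it by column name. To make it run on its own, it uses the built-in `mtcars` data frame as a stand-in for `boys`.

It also shows why `[[ ]]` is needed: `[ ]` returns a data frame rather than a vector. On a tibble, even `boys[, "age"]` stays a one-column tibble, whereas a plain data frame would drop it to a vector.

```r
std_own <- function(values) {
  sqrt(1 / (length(values) - 1) * sum((values - mean(values))^2))
}

cols <- c("mpg", "hp")  # mtcars columns standing in for c("age", "wgt")

# Preallocate a named numeric vector, one slot per column
results <- setNames(numeric(length(cols)), cols)

for (col in cols) {
  results[[col]] <- std_own(mtcars[[col]])  # [[ ]] extracts a plain vector
}
print(results)

# [ ] keeps the data frame structure, [[ ]] returns the column itself
print(class(mtcars["mpg"]))
print(class(mtcars[["mpg"]]))
```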
- Redo exercise 4 using apply, sapply or lapply.
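```r
# apply() converts the columns to a matrix; MARGIN = 2 applies FUN to each
# column and returns a named vector
apply(boys[, c("age", "wgt")], MARGIN = 2, FUN = std_own)
```
```
     age       wgt
6.876048 26.048460
```
```r
# sapply() simplifies the result to a named vector
sapply(boys[, c("age", "wgt")], FUN = std_own)
```
```
     age       wgt
6.876048 26.048460
```
```r
# lapply() always returns a list
lapply(boys[, c("age", "wgt")], FUN = std_own)
```
```
$age
[1] 6.876048

$wgt
[1] 26.04846
```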
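Two details are worth knowing about this family of functions:

- `vapply()` behaves like `sapply()` but makes you declare what each call returns (here one number). An unexpected result then raises an error instead of silently changing the output type.
- `apply()` first converts its input to a matrix. If any selected column is character, the whole matrix becomes character and `std_own()` would fail.

The sketch again uses `mtcars` as a stand-in for `boys`; `mixed` is a made-up toy data frame.

```r
std_own <- function(values) {
  sqrt(1 / (length(values) - 1) * sum((values - mean(values))^2))
}

cols <- c("mpg", "hp")  # mtcars columns standing in for c("age", "wgt")

# vapply() checks that every call returns exactly one number
by_vapply <- vapply(mtcars[cols], std_own, FUN.VALUE = numeric(1))
print(by_vapply)

# apply() converts its input to a matrix first; one character column
# turns every value into a string, so std_own() would fail on it
mixed <- data.frame(x = c(1, 2, 3), label = c("a", "b", "c"))
print(typeof(as.matrix(mixed)))
```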
Exercise 6–7: Dependency files and a more complicated example
- Export your dependency files (renv::snapshot()). Which files were created?
```r
renv::snapshot()
```
```
- The lockfile is already up to date.
```
```r
# Look inside renv.lock, .Rprofile and the renv/ folder
```
- Find all libraries used in the .Rmd/.qmd files of https://github.com/jgarciab/R. Steps:
  - Download the materials;
  - Use `list.files(path=???, pattern=???, full.names=TRUE, recursive=TRUE)` to find all .Rmd (or .qmd) files;
  - Use `sapply` and `readLines` to read all files;
  - Use `lapply` and `stringr::str_match_all` to find the pattern `library\\(.*\\)`.
```r
# Find the names of the files (you'll have to adjust the path)
files <- list.files(path = "../..", pattern = "\\.qmd$", full.names = TRUE, recursive = TRUE)
# Read the files; simplify = FALSE always gives a named list (one element per file)
files_list <- sapply(files, readLines, simplify = FALSE)
# Find the pattern in all files
files_list <- lapply(files_list,
                     function(x) str_match_all(x, "library\\(.*\\)"))
# Create a (named) vector
files_list <- unlist(files_list)
# Print the matches
files_list
```
```
../../Material/Part B - Data types and structures/Practical_B_walkthrough.qmd1
"library(readr)"
../../Material/Part B - Data types and structures/Practical_B_walkthrough.qmd2
"library(dplyr)"
../../Material/Part C - Programming/Practical_C_walkthrough.qmd1
"library(readr)"
../../Material/Part C - Programming/Practical_C_walkthrough.qmd2
"library(stringr)"
../../Material/Part C - Programming/Practical_C_walkthrough.qmd3
"library(readr)"
../../Material/Part C - Programming/Practical_C_walkthrough.qmd4
"library(stringr)"
```
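The same pipeline can be tried without downloading anything. The sketch below writes two small, made-up .qmd files to a temporary folder and then follows the steps above. It departs from the exercise in three ways, all assumptions made for portability:

- It swaps `stringr::str_match_all()` for base R's `gregexpr()`/`regmatches()`.
- It uses the pattern `library\\([^)]*\\)`, so two calls on the same line are not merged into one match.
- It reduces the matches to a sorted list of unique package names.

```r
# Hypothetical demo folder with two made-up .qmd files
demo_dir <- file.path(tempdir(), "library_demo")
dir.create(file.path(demo_dir, "sub"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("# Demo analysis", "library(readr); library(dplyr)", "x <- 1"),
           file.path(demo_dir, "a.qmd"))
writeLines(c("library(readr)", "library(stringr)"),
           file.path(demo_dir, "sub", "b.qmd"))

# Step 1: find the files (recursive = TRUE also searches sub-folders)
files <- list.files(path = demo_dir, pattern = "\\.qmd$",
                    full.names = TRUE, recursive = TRUE)

# Step 2: read them into a named list, one element per file
files_list <- sapply(files, readLines, simplify = FALSE)

# Step 3: collect every library(...) call; [^)]* stops at the first ")"
calls <- unlist(lapply(files_list, function(x) {
  regmatches(x, gregexpr("library\\([^)]*\\)", x))
}))

# Step 4: keep only the package names, without duplicates
packages <- sort(unique(sub("library\\(([^)]*)\\)", "\\1", calls)))
print(packages)

unlink(demo_dir, recursive = TRUE)  # clean up the demo folder
```

A plain text search like this has two blind spots. It also picks up library() calls inside comments. It misses packages used via require() or the pkg::fun notation. Treat the result as a rough inventory rather than an exact dependency list.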
- If you are done, try to upload your files to GitHub. Steps:
  - Create an account.
  - Create a new repository.
  - Upload the files.
If interested, try to set up Git in RStudio (https://support.posit.co/hc/en-us/articles/200532077).