Author

Javier Garcia-Bernardo

Published

November 27, 2023

Exercises

In this practical we will go over reproducibility, control flow (if-else statements and for loops), and functions.

First, create a new project (File -> New Project), install renv, and initialize dependency management with renv::init().

Then, create a new notebook for this homework (File -> New File -> Quarto document -> Document).

In the following exercises we will use the same dataset as last time, boys.

You need to download dataset_boys.csv from here and add it to a folder “data” in the same folder as your markdown file. In t

Then, read the file dataset_boys.csv using the code in the next cell.

Code
#install.packages(c("readr", "stringr"), repos = "http://cran.us.r-project.org")
library(readr)
library(stringr)

# Read the comma-separated file into a tibble
boys <- readr::read_delim("data/dataset_boys.csv", delim = ",")
# Keep only a few columns
boys <- boys[, c("age", "wgt", "bmi")]
# Drop rows with missing values
boys <- na.omit(boys)

Rows: 748 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): gen, phb, reg
dbl (6): age, hgt, wgt, bmi, hc, tv

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
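The order of the last two steps matters: na.omit() drops every row that has a missing value in any column, so selecting the columns first keeps rows whose only gaps are in columns we no longer need. A minimal sketch on a small made-up data frame (df and its values are hypothetical, not taken from boys):

```r
# Hypothetical toy data: each row has exactly one missing value
df <- data.frame(
  age = c(1, 2, NA),
  wgt = c(10, NA, 30),
  hc  = c(NA, 40, 50)
)

# Dropping NAs across all columns removes every row
nrow(na.omit(df))                      # 0

# Selecting columns first keeps row 1, whose only NA was in hc
nrow(na.omit(df[, c("age", "wgt")]))   # 1
```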

Exercise 1–2: Control flow

  1. Create a for loop that iterates over all integers from 0 to 10 but prints only 3, 4, and 5.
Code
for (i in 0:10) {
  if (i %in% 3:5) {
    print(i)
  }
}
[1] 3
[1] 4
[1] 5
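The same loop can also be written with next, which skips the rest of the current iteration. This is handy when the condition for excluding a value is easier to state than the condition for keeping it. A sketch (the kept vector is an addition for illustration, not part of the exercise):

```r
kept <- integer(0)  # collects the printed values so they can be reused
for (i in 0:10) {
  if (i < 3 || i > 5) {
    next  # jump straight to the next iteration, skipping print()
  }
  print(i)
  kept <- c(kept, i)
}
```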
  2. Do the same without a for loop, by subsetting the vector 0:10 directly.
Code
num <- 0:10
num[num >= 3 & num <=5]
[1] 3 4 5
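The subsetting works because num >= 3 & num <= 5 is a logical vector as long as num, and indexing with it keeps the elements at the TRUE positions. A short sketch comparing this with two equivalent forms, %in% (which mirrors the if condition in the loop) and which():

```r
num <- 0:10

# Three equivalent ways to keep only 3, 4 and 5
by_range <- num[num >= 3 & num <= 5]         # logical vector from two comparisons
by_set   <- num[num %in% 3:5]                # logical vector from set membership
by_index <- num[which(num >= 3 & num <= 5)]  # integer positions of the TRUEs

by_range  # 3 4 5
```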

Exercise 3–5: Functions

  3. Create a function that calculates the sample standard deviation. Validate it by comparing its output to that of sd(). Write documentation for the function.

\[\sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}\]
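As a hand check, plugging in the vector (1, 2, 3, 3) that is used below gives N = 4 and a mean of 9/4 = 2.25, so:

\[
\sigma = \sqrt{\frac{(1-2.25)^2 + (2-2.25)^2 + 2\,(3-2.25)^2}{4-1}}
       = \sqrt{\frac{1.5625 + 0.0625 + 1.125}{3}}
       = \sqrt{\frac{2.75}{3}} \approx 0.9574
\]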

Code
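#' Calculate the sample standard deviation of a numeric vector
#' \deqn{\sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}}
#'
#' @param values A numeric vector with at least two elements
#' @returns The sample standard deviation of values (a single number)
#' @examples
#' std_own(c(3, 5, 4))
std_own <- function(values) {
  n <- length(values)
  # Sum of squared deviations from the mean, divided by N - 1
  s <- sqrt(1 / (n - 1) * sum((values - mean(values))^2))
  return(s)
}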

std_own(c(1,2,3,3))
[1] 0.9574271
Code
sd(c(1,2,3,3))
[1] 0.9574271
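Comparing printed values by eye only checks the first seven significant digits. A slightly more systematic check, sketched below, compares std_own() with sd() on several random vectors using all.equal(), which tolerates tiny floating-point differences (the seed, number of vectors, and vector length are arbitrary choices):

```r
# Sample standard deviation, same formula as std_own() above
std_own <- function(values) {
  sqrt(1 / (length(values) - 1) * sum((values - mean(values))^2))
}

# Compare against sd() on five random vectors of length 20
set.seed(42)
checks <- vapply(seq_len(5), function(k) {
  x <- rnorm(20)
  isTRUE(all.equal(std_own(x), sd(x)))  # tolerant floating-point comparison
}, logical(1))

all(checks)
```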
  4. Use a for loop to apply the function to the columns c("age","wgt") of the dataset boys. Since boys is a tibble (more on this later today), you’ll need the notation boys[["age"]] to extract a column as a vector of values.
Code
for (col in c("age","wgt")) {
  print(col)
  print(std_own(boys[[col]]))
}
[1] "age"
[1] 6.876048
[1] "wgt"
[1] 26.04846
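The double brackets matter: single brackets return a one-column data frame (a one-column tibble, in the case of boys), while double brackets return the underlying vector. The sketch below shows this on a small hypothetical stand-in for boys (the values are made up) and stores the results in a named vector instead of only printing them:

```r
# Hypothetical stand-in for boys (values are made up)
df <- data.frame(age = c(1.5, 10.2, 17.8), wgt = c(10.1, 32.4, 60.0))

std_own <- function(values) {
  sqrt(1 / (length(values) - 1) * sum((values - mean(values))^2))
}

class(df["age"])    # "data.frame": still a table with one column
class(df[["age"]])  # "numeric": the column values themselves

# Pre-allocate a named vector and fill it inside the loop
cols <- c("age", "wgt")
results <- setNames(numeric(length(cols)), cols)
for (col in cols) {
  results[[col]] <- std_own(df[[col]])
}
results
```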
  5. Redo the previous exercise using apply, sapply, or lapply.
Code
# Returns a named vector; apply() first converts the data frame to a matrix
apply(boys[, c("age","wgt")], MARGIN=2, FUN=std_own)
      age       wgt 
 6.876048 26.048460 
Code
# Returns a named vector (sapply() simplifies the list of results)
sapply(boys[, c("age","wgt")], FUN=std_own)
      age       wgt 
 6.876048 26.048460 
Code
# Returns a named list, one element per column
lapply(boys[, c("age","wgt")], FUN=std_own)
$age
[1] 6.876048

$wgt
[1] 26.04846
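Two details of these functions are worth knowing. vapply() works like sapply() but checks that every result has the declared type and length, and apply() converts a data frame with as.matrix() before looping, so a single character column turns every column into character. A sketch on hypothetical toy data (df and mixed are made up for illustration):

```r
df <- data.frame(age = c(1.5, 10.2, 17.8), wgt = c(10.1, 32.4, 60.0))

std_own <- function(values) {
  sqrt(1 / (length(values) - 1) * sum((values - mean(values))^2))
}

# vapply(): errors unless each result is exactly one number
v <- vapply(df, std_own, FUN.VALUE = numeric(1))
v

# apply() on a mixed-type data frame: as.matrix() makes everything character,
# so a numeric function such as std_own() would fail on these columns
mixed <- data.frame(age = c(1.5, 10.2), reg = c("north", "south"))
apply(mixed, MARGIN = 2, FUN = class)  # both columns report "character"
```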

Exercise 6–7: Dependency files and a more complicated example

  6. Export your dependency files with renv::snapshot(). Which files were created?
Code
renv::snapshot()
- The lockfile is already up to date.
Code
# Look inside renv.lock (package versions) and .Rprofile (activates renv at startup)
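renv.lock is a JSON file with one entry per package under "Packages", so you can simply open it in RStudio. As an alternative, here is a rough base-R sketch that lists the recorded package names by matching the "Package" fields. The helper lockfile_packages() is hypothetical, and it assumes the pretty-printed layout renv writes (one "Package" field per line); a proper JSON parser such as jsonlite would be more robust than this regex:

```r
# Hypothetical helper: list package names recorded in an renv lockfile.
# Assumes one  "Package": "<name>"  field per line, as in renv's output.
lockfile_packages <- function(path = "renv.lock") {
  lines <- readLines(path, warn = FALSE)
  # Keep only the lines that declare a package name
  pkg_lines <- grep('"Package":', lines, value = TRUE, fixed = TRUE)
  # Extract the quoted name after "Package":
  sub('.*"Package": *"([^"]+)".*', "\\1", pkg_lines)
}

# Usage, from the project folder:
# lockfile_packages()
```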
  7. Find all libraries used in the .qmd files of https://github.com/jgarciab/R. Steps:
  • Download the materials;
  • Use list.files(path=???, pattern=???, full.names=TRUE, recursive=TRUE) to find all files;
  • Use sapply and readLines to read all files;
  • Use lapply and stringr::str_match_all to find the pattern library\\(.*\\).
Code
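# Find the names of the files (you'll have to adjust the path)
# "\\.qmd$" is a regular expression: files whose names end in .qmd
files <- list.files(path = "../..", pattern = "\\.qmd$", full.names = TRUE, recursive = TRUE)

# Read files; simplify = FALSE always returns a named list, even if all files have the same number of lines
files_list <- sapply(files, readLines, simplify = FALSE)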

# Find the pattern in all files
files_list <- lapply(files_list, 
                     function(x) str_match_all(x, "library\\(.*\\)"))

# Flatten into a named character vector
files_list <- unlist(files_list)

# Print pattern
files_list
../../Material/Part B - Data types and structures/Practical_B_walkthrough.qmd1 
                                                              "library(readr)" 
../../Material/Part B - Data types and structures/Practical_B_walkthrough.qmd2 
                                                              "library(dplyr)" 
              ../../Material/Part C - Programming/Practical_C_walkthrough.qmd1 
                                                              "library(readr)" 
              ../../Material/Part C - Programming/Practical_C_walkthrough.qmd2 
                                                            "library(stringr)" 
              ../../Material/Part C - Programming/Practical_C_walkthrough.qmd3 
                                                              "library(readr)" 
              ../../Material/Part C - Programming/Practical_C_walkthrough.qmd4 
                                                            "library(stringr)" 
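Two refinements may be useful here. The greedy .* in library\\(.*\\) runs to the last closing parenthesis on a line, so a line such as library(dplyr); library(readr) comes back as a single match, and the same package is listed once per occurrence. A sketch on a few made-up lines (the lines vector is hypothetical) that uses a capture group with [^)]* and unique():

```r
library(stringr)

# Hypothetical lines of code, standing in for the output of readLines()
lines <- c("library(readr)", "x <- 1", "library(dplyr); library(readr)")

# Greedy pattern: one match spanning both calls on the third line
greedy <- unlist(str_match_all(lines, "library\\(.*\\)"))

# [^)]* stops at the first ")"; the inner parentheses capture the package name
matches <- str_match_all(lines, "library\\(([^)]*)\\)")
# Column 1 holds the full match, column 2 the captured name
pkgs <- unique(unlist(lapply(matches, function(m) m[, 2])))
pkgs
```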
  8. If you are done, try to upload your files to GitHub. Steps:
  • Create an account.
  • Create a new repository.
  • Upload files.

If you are interested, try setting up Git in RStudio (https://support.posit.co/hc/en-us/articles/200532077).