Practical C
Exercises
In this practical we will go over reproducibility, control flow (if-else statements and for loops), and functions.
First, create a new project (File -> New Project), install renv, and initialize the dependency management (renv::init()).
Then, create a new notebook for this homework (File -> New File -> Quarto document -> Document).
In the following exercises we will use the same dataset as last time, boys.
You need to download dataset_boys.csv from here and add it to a folder “data” in the same folder as your markdown file.
Then, read the file dataset_boys.csv using the code in the next cell.
Code
# install.packages(c("readr", "stringr"), repos = "http://cran.us.r-project.org")
library(readr)
library(stringr)
# Read the file
boys <- readr::read_delim("data/dataset_boys.csv", delim = ",")
# Keep only a few columns
boys <- boys[, c("age", "wgt", "bmi")]
# Drop rows with missing values
boys <- na.omit(boys)
Exercise 1–2: Control flow
- Create a for-loop that loops over all numbers between 0 and 10, but only prints numbers 3, 4, and 5.
Code
for (i in 0:10) {
  if (i %in% 3:5) {
    print(i)
  }
}
[1] 3
[1] 4
[1] 5
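The same loop can also be written with `next`, which skips the rest of the current iteration. This is only a sketch of an alternative, not the intended solution; the variable `printed` is a hypothetical helper added here so the result can be inspected afterwards.

```r
# Collect the values that get printed (illustrative helper variable)
printed <- c()
for (i in 0:10) {
  # Skip everything outside 3..5 and move on to the next iteration
  if (i < 3 || i > 5) {
    next
  }
  print(i)
  printed <- c(printed, i)
}
```

Using `next` keeps the main body of the loop unindented, which can help when the body grows longer than a single `print()`.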
- Try to do the same thing without a for-loop, by subsetting a vector from 0 to 10 directly.
Code
num <- 0:10
num[num >= 3 & num <= 5]
[1] 3 4 5
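To see why the subsetting works, it may help to look at the intermediate logical vector. The sketch below splits the one-liner into steps; the name `mask` is an illustrative choice, not part of the exercise.

```r
num <- 0:10

# The comparison yields one TRUE/FALSE per element of num
mask <- num >= 3 & num <= 5
print(mask)

# Indexing with a logical vector keeps the positions that are TRUE
print(num[mask])

# The same selection using %in%, as in the for-loop exercise
print(num[num %in% 3:5])
```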
Exercise 3–5: Functions
- Create a function that calculates the sample standard deviation. Validate it by comparing the output to the function sd(). Write documentation for this function.
\[\sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^N{(x_i - \bar{x})^2}}\]
Code
#' Calculate the sample standard deviation of a numeric vector
#' $$\sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^N{(x_i - \bar{x})^2}}$$
#'
#' @param values A numeric vector
#' @returns STD of the vector
#' @examples
#' std_own(c(3,5,4))
std_own <- function(values) {
  s <- sqrt(1/(length(values)-1)*sum((values - mean(values))^2))
  return(s)
}
std_own(c(1,2,3,3))
[1] 0.9574271
Code
sd(c(1,2,3,3))
[1] 0.9574271
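Comparing printed output by eye works for one vector; a slightly more systematic check compares the two functions programmatically. The following is a minimal sketch, assuming the same formula as above. The seed and the random test data are arbitrary choices made for illustration.

```r
# Same formula as in the exercise, with N - 1 in the denominator
std_own <- function(values) {
  n <- length(values)
  sqrt(sum((values - mean(values))^2) / (n - 1))
}

set.seed(1)       # arbitrary seed, for reproducibility only
x <- rnorm(100)   # hypothetical test data

# all.equal() compares numbers with a small tolerance,
# which is safer than == for floating-point results
check <- isTRUE(all.equal(std_own(x), sd(x)))
print(check)
```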
- Use a for loop to apply the function to the columns c("age","wgt") of the dataset boys. boys is a tibble (more on this later today), so you’ll need to use the notation boys[["age"]] to extract the vector of values of the column.
Code
for (col in c("age","wgt")) {
print(col)
print(std_own(boys[[col]]))
}
[1] "age"
[1] 6.876048
[1] "wgt"
[1] 26.04846
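Printing inside the loop is fine for a quick look, but the values are lost afterwards. One possible variation, sketched below, stores the results in a named vector. The data frame `toy` is a small made-up stand-in for `boys`, and `results` is a hypothetical name.

```r
std_own <- function(values) {
  sqrt(sum((values - mean(values))^2) / (length(values) - 1))
}

# Made-up stand-in for the boys data (not the real values)
toy <- data.frame(age = c(1, 2, 3, 4), wgt = c(10, 12, 15, 19))

cols <- c("age", "wgt")
# Pre-allocate a named numeric vector, one slot per column
results <- setNames(numeric(length(cols)), cols)
for (col in cols) {
  results[[col]] <- std_own(toy[[col]])
}
print(results)
```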
- Redo the previous exercise using apply, sapply or lapply.
Code
#Returns vector
apply(boys[, c("age","wgt")], MARGIN=2, FUN=std_own)
age wgt
6.876048 26.048460
Code
#Returns vector
sapply(boys[, c("age","wgt")], FUN=std_own)
age wgt
6.876048 26.048460
Code
#Returns list
lapply(boys[, c("age","wgt")], FUN=std_own)
$age
[1] 6.876048
$wgt
[1] 26.04846
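The three calls differ mainly in the shape of what they return. Base R also offers `vapply()`, which is not part of the exercise but makes the expected return type explicit. The sketch below uses a made-up data frame `toy` and the built-in `sd()` to stay short.

```r
# Made-up stand-in for the boys data (not the real values)
toy <- data.frame(age = c(1, 2, 3, 4), wgt = c(10, 12, 15, 19))

res_sapply <- sapply(toy, sd)   # simplified to a named numeric vector
res_lapply <- lapply(toy, sd)   # always a list, one element per column

# vapply() fails with an error if any result is not a single numeric value
res_vapply <- vapply(toy, sd, FUN.VALUE = numeric(1))

print(res_vapply)
```

Note that `apply()` first converts the data frame to a matrix, so it is best suited to tables whose columns all share one type.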
Exercise 6–7: Dependency files and a more complicated example
- Export your dependency files (renv::snapshot()). Which files were created?
Code
renv::snapshot()
- The lockfile is already up to date.
Code
#Look inside renv.lock and .Rprofile
- Find all libraries used in the .Rmd files of https://github.com/jgarciab/R. Steps:
- Download the materials;
- Use list.files(path=???, pattern=???, full.names=TRUE, recursive=TRUE) to find all Rmd files;
- Use sapply and readLines to read all files;
- Use lapply and stringr::str_match_all to find the pattern library\\(.*\\).
Code
# Find the name of the files (you'll have to adjust the path)
# pattern is a regular expression: match names ending in ".qmd"
files <- list.files(path="../..", pattern="\\.qmd$", full.names=TRUE, recursive=TRUE)

# Read files
files_list <- sapply(files, readLines)

# Find the pattern in all files
files_list <- lapply(files_list,
                     function(x) str_match_all(x, "library\\(.*\\)"))
# Create a (named) vector
files_list <- unlist(files_list)

# Print pattern
files_list
../../Material/Part B - Data types and structures/Practical_B_walkthrough.qmd1
"library(readr)"
../../Material/Part B - Data types and structures/Practical_B_walkthrough.qmd2
"library(dplyr)"
../../Material/Part C - Programming/Practical_C_walkthrough.qmd1
"library(readr)"
../../Material/Part C - Programming/Practical_C_walkthrough.qmd2
"library(stringr)"
../../Material/Part C - Programming/Practical_C_walkthrough.qmd3
"library(readr)"
../../Material/Part C - Programming/Practical_C_walkthrough.qmd4
"library(stringr)"
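The output above contains duplicates, and the pattern `.*` is greedy, so two `library()` calls on one line would be captured as a single match. A possible refinement is sketched below using base R only (`gregexpr()` and `regmatches()` instead of stringr). The input lines are made up for illustration, and the names `greedy`, `calls`, and `pkgs` are hypothetical.

```r
# Made-up lines, shaped like what readLines() returns
lines <- c("library(readr)", "x <- 1", "library(dplyr); library(stringr)")

# Greedy pattern from the exercise: .* runs to the last ")" on the line
greedy <- unlist(regmatches(lines, gregexpr("library\\(.*\\)", lines)))

# Restrictive pattern: stop at the first closing parenthesis
calls <- unlist(regmatches(lines, gregexpr("library\\([^)]*\\)", lines)))

# Strip the library( ... ) wrapper and drop duplicates
pkgs <- unique(sub("library\\(([^)]*)\\)", "\\1", calls))
print(pkgs)
```

This sketch only handles plain `library(pkg)` calls; packages used via `pkg::fun()` or `require()` would need additional patterns.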
- If you are done, try to upload your files to GitHub. Steps:
- Create an account.
- Create a new repository.
- Upload files.
If interested, try to set up Git in RStudio (https://support.posit.co/hc/en-us/articles/200532077).