My personal notes from DataCamp’s course
The assertive package comes with several functions for validating the arguments passed to a function.
library(assertive)
calc_harmonic_mean <- function(x, na.rm = FALSE) {
  # Fail early if x is not numeric
  assert_is_numeric(x)
  if (any(is_non_positive(x), na.rm = TRUE)) {
    stop("x contains non-positive values, so the harmonic mean makes no sense.")
  }
  # Keep only the first value of na.rm and coerce it to logical (both steps warn)
  na.rm <- coerce_to(use_first(na.rm), target_class = "logical")
  print(1 / mean(1 / x, na.rm = na.rm))
}
calc_harmonic_mean(1:5, na.rm = 1:5)
## Warning: Only the first value of na.rm (= 1) will be used.
## Warning: Coercing use_first(na.rm) to class 'logical'.
## [1] 2.189781
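For comparison, here is a rough base-R sketch of the same checks without assertive. The name calc_harmonic_mean_base is hypothetical and not from the course. Unlike the assertive version, it silently uses the first value of na.rm instead of warning.

```r
# Base-R sketch of the same argument checks (hypothetical helper, not from the course)
calc_harmonic_mean_base <- function(x, na.rm = FALSE) {
  if (!is.numeric(x)) {
    stop("x must be numeric.")
  }
  if (any(x <= 0, na.rm = TRUE)) {
    stop("x contains non-positive values, so the harmonic mean makes no sense.")
  }
  # Use only the first element of na.rm and coerce it to logical (no warning here)
  na.rm <- as.logical(na.rm[1])
  1 / mean(1 / x, na.rm = na.rm)
}

calc_harmonic_mean_base(1:5, na.rm = 1:5)
```

This should give the same value as the assertive version above, just without the two warnings.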
Functions can only return one value. If you want to return multiple things, then you can store them all in a list.
If users want to have the list items as separate variables, they can assign each list element to its own variable using zeallot’s multi-assignment operator, %<-%.
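Before bringing in zeallot, a minimal base-R sketch of the "return a list" idea may help. The helper summarize_vector is made up for illustration and is not part of the course material.

```r
# Hypothetical helper: bundles several results into one returned list
summarize_vector <- function(x) {
  list(
    mean = mean(x),
    range = range(x),
    n = length(x)
  )
}

res <- summarize_vector(c(2, 4, 9))

# Without multi-assignment, elements are pulled out one at a time
res$mean        # 5
res[["range"]]  # 2 9
res$n           # 3
```

Pulling each element out by hand like this is the step that %<-% shortens.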
Create a model object to use in the examples:
suppressPackageStartupMessages(library(dplyr))
snake_river_visits <- readRDS('data/snake_river_visits.rds')
model <- lm(n_visits ~ gender + income + travel, snake_river_visits)
Returning a list and splitting its items via the %<-% operator:
library(broom)
library(zeallot)
groom_model <- function(model) {
  # Bundle the model-level, coefficient-level, and observation-level summaries
  list(
    model = glance(model),
    coefficients = tidy(model),
    observations = augment(model)
  )
}
# Call groom_model on model, assigning to 3 variables
c(mdl, cff, obs) %<-% groom_model(model)
# See these individual variables
mdl; cff; obs
## # A tibble: 1 x 11
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.224 0.210 43.3 16.3 1.61e-16 7 -1791. 3599. 3630.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
## # A tibble: 7 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 61.5 7.83 7.86 5.06e-14
## 2 genderfemale 10.8 4.94 2.19 2.88e- 2
## 3 income($25k,$55k] -1.95 7.74 -0.251 8.02e- 1
## 4 income($55k,$95k] -19.3 8.01 -2.42 1.63e- 2
## 5 income($95k,$Inf) -18.6 7.47 -2.49 1.32e- 2
## 6 travel(0.25h,4h] -26.6 6.00 -4.44 1.24e- 5
## 7 travel(4h,Infh) -45.1 6.30 -7.16 5.06e-12
## # A tibble: 346 x 12
## .rownames n_visits gender income travel .fitted .se.fit .resid .hat .sigma
## <chr> <dbl> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 25 2 female ($95k… (4h,I… 8.67 5.79 -6.67 0.0179 43.4
## 2 26 1 female ($95k… (4h,I… 8.67 5.79 -7.67 0.0179 43.4
## 3 27 1 male ($95k… (0.25… 16.3 5.35 -15.3 0.0153 43.4
## 4 29 1 male ($95k… (4h,I… -2.18 4.79 3.18 0.0122 43.4
## 5 30 1 female ($55k… (4h,I… 7.95 6.55 -6.95 0.0229 43.4
## 6 31 1 male [$0,$… [0h,0… 61.5 7.83 -60.5 0.0326 43.3
## 7 33 80 female [$0,$… [0h,0… 72.4 7.39 7.61 0.0291 43.4
## 8 34 104 female ($95k… [0h,0… 53.8 6.35 50.2 0.0215 43.3
## 9 35 55 male ($25k… (0.25… 33.0 5.57 22.0 0.0165 43.4
## 10 36 350 female ($25k… [0h,0… 70.4 6.35 280. 0.0215 40.6
## # … with 336 more rows, and 2 more variables: .cooksd <dbl>, .std.resid <dbl>
Sometimes you want to return multiple things from a function, but you want the result to have a particular class (for example, a data frame or a numeric vector), so returning a list isn’t appropriate. This is common when you have a result plus metadata about the result. (Metadata is “data about the data”. For example, it could be the file a dataset was loaded from, or the username of the person who created the variable, or the number of iterations for an algorithm to converge.) In that case, you can store the metadata as attributes on the result, set with attr().
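pipeable_plot <- function(data, formula) {
  plot(formula, data)
  # Attach the formula as metadata; the result is still a data frame
  attr(data, 'formula') <- formula
  # Return invisibly so the data is not printed when the call ends a pipeline
  invisible(data)
}
plt_dist_vs_speed <- cars %>%
  pipeable_plot(dist ~ speed)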
str(plt_dist_vs_speed)
## 'data.frame': 50 obs. of 2 variables:
## $ speed: num 4 4 7 7 8 9 10 10 10 11 ...
## $ dist : num 2 10 4 22 16 10 18 26 34 17 ...
## - attr(*, "formula")=Class 'formula' language dist ~ speed
## .. ..- attr(*, ".Environment")=<environment: 0x5590dd8f7c88>
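To round this off, here is a small sketch of reading the metadata back later. It uses only base R and the built-in cars dataset; add_formula is a hypothetical stand-in for the function above with the plotting left out.

```r
# Hypothetical stand-in for pipeable_plot() without the plotting step
add_formula <- function(data, formula) {
  attr(data, "formula") <- formula
  data
}

tagged <- add_formula(cars, dist ~ speed)

# The result keeps its class, so it still behaves like a data frame
class(tagged)

# The metadata can be read back with attr() and reused, e.g. to fit a model
stored_formula <- attr(tagged, "formula")
fit <- lm(stored_formula, data = tagged)
coef(fit)
```

The point of the design is that downstream code can treat the result as an ordinary data frame and only look at the attribute when it needs the extra information.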