Yesterday we applied some unsupervised machine learning algorithms,
today we will be applying two types of supervised machine learning
algorithms: Support Vector Machines (SVM) and Random Forests (RF). We
will continue with the cattle movement data that we worked with over the last
two days. Because the dataset contained the column label,
which held the annotated behaviour based on direct visual
observation, we have a labelled dataset and thus the possibility to use
supervised learning.
Recall from the lecture on the data science workflow that in
predictive modelling it is necessary to split the data into train and
validation set. This can be done randomly, but this should only be done
with great care. As we are dealing with time series data on individual
animals, which generally show high degrees of temporal autocorrelation,
randomly splitting the data into train and validation sets does not
ensure that both sets are independent! In tutorial 9 we called
non-independent train and validation sets a form of data
leakage, in which information is inadvertently shared between the
training and validation datasets. In general, when dividing a dataset
into training and validation sets, the goal is to ensure that no
information is shared between the two.
Thus, it may be better to keep all the data of a specific individual
together, and allocate all data from a single individual (or
time-series) to either the train dataset or the validation dataset. Such
partitioning of the data into train and validation sets can be done
once, or using a k-fold cross validation approach. We will use
both here.
Exercise 13.1a
Download the file dat_ts2.rds from Brightspace >
Time series analyses > datasets > cattle, and store in an
appropriate folder (e.g. data / processed / cattle /). Load the
file into an object called dat. Note that it is an .rds
file, which can be read into R with the function read_rds
(see tutorial 1). Also load (and install if needed) the packages
tidyverse, tidymodels and kernlab.
library(tidyverse)
library(tidymodels)
library(kernlab)
dat <- readRDS("data/processed/cattle/dat_ts2.rds")
dat
The data dat does not have the column label
that we used in the previous 2 tutorials; it has been reclassified into the
new column isGrazing, which is a factor with 2
levels indicating whether or not the cows were grazing.
For your information, the dataset loaded above was created from the
dataset “dat_ts1.rds” loaded yesterday via a short chain of dplyr
functions.
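A hypothetical sketch of that recoding is shown below; the exact label value used to define grazing (here assumed to be "grazing") is an assumption and not the original code.
# Hypothetical sketch: recode the annotated 'label' column into a
# two-level factor 'isGrazing' (the label value "grazing" is assumed)
dat_ts2 <- read_rds("data/processed/cattle/dat_ts1.rds") %>%
  mutate(isGrazing = factor(label == "grazing")) %>%
  select(-label)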
The rsample package, part of tidymodels,
offers functions to split the data into different sets (e.g. for
training and testing). In tutorial 9 we already saw the simple
initial_split function that partitions the data randomly
into 2 parts. However, because this function randomly allocates individual
records to either of the datasets, rather than grouped sets of records
(e.g. belonging to the same individual, or time series thereof), it is
better not to use this function here.
Exercise 13.2a
Check the helpfile of the group_initial_split function and
create a train-test split based on grouping variable
IDburst, where you specify that ca 75% of the data should
be assigned to the training dataset, and the rest to the test dataset.
Assign the result to an object called main_split and check
the object by printing it to the console
main_split <- group_initial_split(dat, group = "IDburst", prop = 3/4)
main_split
## <Training/Testing/Total>
## <1551/608/2159>
We have now split the data into 2 parts: 1 part can be used for
model development (here ca 75%), whereas the other part (here ca 25%)
can be used for testing the model predictions. In a data science project
involving machine learning, model development is often done
using k-fold cross validation, in which trainable parameters
are learned and hyperparameters are tuned. To do k-fold
cross validation with groups, we can use the group_vfold_cv
function, which is a function that creates splits of the data based on
some grouping variable in a k-fold (here called
V-fold) cross-validation approach.
Exercise 13.2b
Check the helpfile of the group_vfold_cv function. Retrieve
the training data from the object main_split
created above (remember from earlier that the function
training can be used here), and with the training data
create a train-validation split with 5 folds based on grouping variable
IDburst. Assign the result to an object called
dat_split and check the object by printing it to the
screen.
dat_split <- group_vfold_cv(training(main_split), group = IDburst, v = 5)
dat_split
You’ll see that the output of group_vfold_cv is a
tibble, here with 5 rows (as we set it to 5 folds), in which the first
column, splits, is a named list with the split data. It shows
the number of records in the train dataset and in the validation
dataset for each fold. Using the function analysis, we can
extract the data to be used for training, whereas with the
assessment function we can extract the data that should be
used for validation. To retrieve the first record/fold of the data
splits, we can use dat_split$splits[[1]].
Exercise 13.2c
Extract the train and validation data for the first fold, and assign
these to the objects dat_train and dat_val,
respectively. Print the objects to the console so you can inspect them.
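One way to do this (a sketch; the object names follow the exercise text):
# Extract the analysis (train) and assessment (validation) data of fold 1
dat_train <- analysis(dat_split$splits[[1]])
dat_val <- assessment(dat_split$splits[[1]])
dat_train
dat_val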
Using the rsample package we have split the data into
k folds used for training and validation. We will now proceed
and use the recipes package to set up a data recipe.
Recall from tutorial 9 that a data recipe is a description of what steps
should be applied to a data set in order to get it ready for data
analysis. Most steps in the recipe are defined using functions that
start with the prefix step_.
Exercise 13.3a
Create a data recipe called dataRecipe that:
takes the training data as input;
specifies that the formula for the analyses is
isGrazing ~ .;
removes the column IDburst from the data;
centers all predictors;
scales all predictors;
prepares the data recipe using the training data.
Create the recipe with these steps in the order shown above.
To specify a formula, use the function recipe; to remove
columns use step_rm; to center use
step_center; to scale use step_scale. To
prepare the data recipe use the prep function (this will
then automatically use the data passed on to the recipe
function for training the data recipe; thus, here the
training data!). Chain all steps together in one
sequence using pipes.
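A possible solution (a sketch; it assumes the fold-1 object dat_train created above):
# Define the recipe on the training data, remove IDburst, center and scale
# all predictors, and prepare (train) the recipe
dataRecipe <- recipe(isGrazing ~ ., data = dat_train) %>%
  step_rm(IDburst) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors()) %>%
  prep()
dataRecipe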
Now that we have split the data into train and validation sets, and
now that we’ve set up and prepared a data recipe, let’s
bake some data: the recipes metaphor for applying the
preprocessing computations to new data.
Exercise 13.4a
Bake the dataset dat_train and do the same for the
dat_val dataset. Assign the results to objects called
baked_train and baked_val, respectively.
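A possible solution (a sketch, assuming the prepared recipe dataRecipe from above):
# Apply (bake) the prepared recipe to the train and validation data
baked_train <- bake(dataRecipe, new_data = dat_train)
baked_val <- bake(dataRecipe, new_data = dat_val)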
Now that all data is properly pre-processed we can proceed to train a
model/algorithm. Recall from tutorial 9 that a model is trained using
the parsnip package, by first defining the
model, then specifying the engine that will execute the model
estimation for us, and finally fitting the model. Which models
currently are supported and via which function they should be defined
can be seen
here.
We will focus on a Support Vector Machine (SVM), and specifically an SVM
with a Gaussian radial basis function.
An SVM with a Gaussian radial basis function can be specified in the
parsnip package via the svm_rbf function. For a
classification problem (i.e. our machine learning task of classifying
the annotated behaviour), it takes 3 arguments: mode which
should be set to "classification", as well as the
hyperparameters cost and rbf_sigma. Check the
helpfile for information on these parameters.
Exercise 13.5a
Fit an SVM to the train data baked_train. Do this by
first specifying the model: set its mode to classification, with the
cost parameter set to 1.0, and the
rbf_sigma parameter set to 0.001. Then, set the engine
to "kernlab". Finally, fit the model using the formula
isGrazing ~ . (i.e. isGrazing is the response,
and all other columns are predictors) with data
baked_train. Assign the fitted model to an object called
dat_SVMfit.
NOTE: If you have problems installing the tidymodels
package, you can do this exercise a bit differently: in the box below this
exercise there is information on how to fit the SVM to the data directly
using the kernlab package, thus without the use of the
interface that parsnip supplies.
Specify the model using svm_rbf, then set the engine
using set_engine, and fit the model using
fit.
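A possible solution with parsnip (a sketch; your own code may differ slightly):
# Specify the SVM model, set the engine, and fit it to the baked training data
dat_SVMfit <- svm_rbf(mode = "classification",
                      cost = 1.0,
                      rbf_sigma = 0.001) %>%
  set_engine("kernlab") %>%
  fit(isGrazing ~ ., data = baked_train)
dat_SVMfit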
## parsnip model object
##
## Support Vector Machine object of class "ksvm"
##
## SV type: C-svc (classification)
## parameter : cost C = 1
##
## Gaussian Radial Basis kernel function.
## Hyperparameter : sigma = 0.001
##
## Number of Support Vectors : 319
##
## Objective Function Value : -265.9519
## Training error : 0.084979
## Probability model included.
6 Making predictions
With the fitted model stored in object dat_SVMfit, we
can now generate predictions for the validation dataset
baked_val using the predict function. By default the
function returns the predicted class, but we can also
retrieve the predicted class probabilities by adding the function
argument type = "prob" (when fitted using parsnip; the column
names will then be .pred_FALSE and
.pred_TRUE), or type = "probabilities" (when
fitted directly using kernlab; the column names will then be
FALSE and TRUE).
Exercise 13.6a
Predict the fitted SVM on the baked_val validation dataset:
predict the class probabilities, and save to an object
dat_preds. Also add the columns of dat_val
using the function bind_cols. Because we predict
probabilities, and because our response variable isGrazing
is a factor with levels "TRUE" and
"FALSE", we get as predictions the columns
".pred_FALSE" and ".pred_TRUE".
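A possible solution (a sketch):
# Predict class probabilities on the baked validation data and add the
# columns of dat_val
dat_preds <- predict(dat_SVMfit, baked_val, type = "prob") %>%
  bind_cols(dat_val)
dat_preds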
Let’s also add not just the predicted probabilities of both classes,
but also the predicted class itself. You can do so by adding the
prediction column using:
dat_preds <- dat_preds %>%
  bind_cols(predict(dat_SVMfit, baked_val, type = "class"))
Because here we predict the response variable isGrazing,
which is a factor with levels "TRUE" and
"FALSE", the newly created column “.pred_class” also is a
factor with these levels!
dat_preds
After model fitting and prediction using parsnip, we can use
the functions from the package yardstick to compute
model evaluation metrics. The metrics function can be used
for this, taking input arguments truth and
estimate (which should be the predicted class, and not the
predicted class probability!). For example, the following code will
compute model evaluation metrics based on a classification problem:
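A sketch of such a call is shown below; it assumes the dat_preds object with the predicted class column .pred_class created above.
dat_preds %>%
  metrics(truth = isGrazing, estimate = .pred_class)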
Exercise 13.7a
Compute the metrics for the fitted model, using column “isGrazing” as
the true label, and the column with the predicted class as the estimate.
The results are estimates for the accuracy and
kappa statistic for the model prediction. The accuracy is quite
high: around 75% of the labels have been correctly predicted! To see how
good the predictions are for each class, we can compute the confusion
matrix using the conf_mat function:
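For instance, along these lines (a sketch, again using dat_preds):
dat_preds %>%
  conf_mat(truth = isGrazing, estimate = .pred_class)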
We see that most of the records where the cattle did or
did not graze were correctly classified as such. In this case,
207 of the 212 records where the cattle were in fact grazing were
correctly classified as such (recall: ca 98%), whereas in 207 of
the 298 records where cattle were predicted to be grazing they were
indeed doing so (precision: ca 71%). As mentioned in the
lectures, instead of being interested in accuracy only,
we are often interested in these precision and recall
metrics of model predictions. See
here
for a good overview of what these (and other) measures mean (reading
material for the lecture on the data science workflow). The yardstick
package offers the function pr_curve to compute the
precision and recall along a gradient of threshold values.
Exercise 13.7b
Compute the precision and recall using the function
pr_curve. Do this based on the data dat_preds
(in 13.6 you have added the class predictions), using
isGrazing as the true label, and .pred_TRUE
(or TRUE if you did not use parsnip for model fitting but
rather kernlab directly) as the predicted value, and set the
“event_level” argument to value “second” (namely, from FALSE and TRUE,
the second value is the focus). Assign the result to an object
called PR, inspect it by printing it to the screen, and
plot PR using the autoplot function.
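A possible solution (a sketch):
# Precision-recall curve, with TRUE (the second factor level) as the event
PR <- dat_preds %>%
  pr_curve(truth = isGrazing, .pred_TRUE, event_level = "second")
PR
autoplot(PR)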
A very helpful performance measure is the F1 score: it is the
weighted average of precision and recall. Therefore, this score takes
both false positives and false negatives into account. Intuitively it is
not as easy to understand as accuracy, but F1 is usually more useful
than accuracy, especially if you have an uneven class distribution. F1
can be computed using:
F1 = 2 * (precision * recall) / (precision + recall)
Exercise 13.7c
Add the F1 score as a new column F1 to PR.
Show the row of data that corresponds to the maximum F1. Classify the
predictions based on the predicted probability .pred_TRUE and
the threshold value corresponding to the maximum F1 score.
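A possible solution (a sketch; the helper object bestThreshold is introduced here only for illustration):
# Add the F1 score to the precision-recall table
PR <- PR %>%
  mutate(F1 = 2 * precision * recall / (precision + recall))

# Row with the maximum F1
PR %>%
  filter(F1 == max(F1, na.rm = TRUE))

# Classify predictions using the threshold at the maximum F1
bestThreshold <- PR %>%
  filter(F1 == max(F1, na.rm = TRUE)) %>%
  slice(1) %>%
  pull(.threshold)
dat_preds %>%
  mutate(.pred_class_F1 = factor(.pred_TRUE >= bestThreshold,
                                 levels = c("FALSE", "TRUE")))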
Another helpful metric is the AUC (area under the ROC
curve), which indicates how well the probabilities from the
positive classes are separated from the negative classes (see link to
the metrics above; ROC is the receiver operating
characteristic, see e.g. this
wikipedia page).
Compute the AUC, using the pr_auc function with the data
from dat_preds, where isGrazing is the true
class, and .pred_TRUE as the predicted value, and set the
“event_level” argument to value “second” (namely, from FALSE and TRUE,
the second value is the focus).
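A possible solution (a sketch). Note that pr_auc gives the area under the precision-recall curve; for the area under the ROC curve, the roc_auc function can be used in the same way:
# Area under the precision-recall curve
dat_preds %>%
  pr_auc(truth = isGrazing, .pred_TRUE, event_level = "second")

# Area under the ROC curve
dat_preds %>%
  roc_auc(truth = isGrazing, .pred_TRUE, event_level = "second")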
To recap, so far we have made:
first a main binary split based on the IDburst identifier, splitting
the data into a model development set (ca 75% of the data) and
a test dataset (the remaining ca 25% of the data);
and subsequently we split the model development dataset further
into V-folds of training and validation sets,
also using IDburst as grouping variable;
we specified and prepared a data recipe, so that we could
prepare (bake) a dataset before model fitting;
we set up a model specification (one with pre-set values for the
hyperparameters cost and rbf_sigma);
we fitted the specified model to the training data;
we predicted the fitted model on the left-out validation
dataset, and evaluated the predictive performance using several
performance metrics.
Where the set of tidymodels tools will really start to shine is when
we start combining multiple steps into a workflow, and then use
the tidymodels tools (from the tune and dials
packages) that allow us to efficiently do hyperparameter tuning!
Before we can do that, let’s first repeat a few steps we’ve been
doing above, yet without actually training the data recipe (using the
prep function) or the model (using the fit
function).
Exercise 13.8a
Create a specification of a data recipe using all steps as done
above (see section 3 above), yet without the prep function.
Assign it to an object called dat_rec.
Create a model specification using all steps as done above (see
section 5 above), yet without the fit function. Assign it
to an object called svm_spec. However, instead of giving
the “cost” and “rbf_sigma” a value, this time only specify that these
will be hyperparameters that need to be tuned! Do so by giving them the
value tune().
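A sketch of the recipe specification (same steps as in section 3, here defined on the training part of main_split, and without prep):
# Specification of the data recipe (not yet prepared)
dat_rec <- recipe(isGrazing ~ ., data = training(main_split)) %>%
  step_rm(IDburst) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors())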
# Defining model specification (without fitting it to data)
svm_spec <- svm_rbf(mode = "classification",
                    cost = tune(),
                    rbf_sigma = tune()) %>%
  set_engine("kernlab")
Exercise 13.8c
Now that you’ve created a data recipe specification
(dat_rec) as well as a model specification
(svm_spec), let’s combine them into a workflow
object: do so using the workflow function to create an
empty workflow, then adding the model specification
svm_spec using the add_model function, and
then add the data recipe specification dat_rec using the
add_recipe function. Combine these into a single object
called svm_wf using the pipe operator %>%.
Inspect the workflow object svm_wf by printing it to the
console.
# Bundle specification of data recipe and model into workflow
svm_wf <- workflow() %>%
  add_model(svm_spec) %>%
  add_recipe(dat_rec)
svm_wf
You’re almost ready to start tuning the hyperparameters! But first
you’ll need to specify a strategy for the values of the hyperparameters
that you’ll be assessing.
Using the functions rbf_sigma and cost, you
can set the range of values for which the parameters will be evaluated
during tuning. Set parameter “rbf_sigma” to be sampled in a range from
-5 to 0 (this is on a log10 scale by default!), which you can do using
rbf_sigma(range = c(-5,0)). Similarly, specify that the
“cost” parameter should be sampled in a range from -4 to 8 (on a log2
scale!), using cost(range = c(-4,8)).
Use these two hyperparameters at their given scales to set up a
regular grid for hyperparameter tuning: do this using the
grid_regular function where both hyperparameter tuning
specifications (as shown above) are included, and where you will also
need to insert a value for the argument levels, which is
the number of values each hyperparameter will get. Setting
levels to a high value will lead to many combinations of
hyperparameters that need to be evaluated, thus for now set it to a low
value, e.g. 5 (later, if everything runs fine and in a manageable amount
of time, you can increase it to e.g. 10 or higher).
Assign the resultant tuning grid to an object called
svm_grid.
After you’ve created svm_grid, inspect it by printing it to
the console, and count the number of distinct values of
rbf_sigma as well as of cost (using the
count function).
Note that in the code below, we use levels = 10, which
will lead to a fine grid for hyperparameter tuning, but also a long
computation time!
# Specify grid of hyperparameter values for which to assess fit
svm_grid <- grid_regular(rbf_sigma(range = c(-5, 0)),
                         cost(range = c(-4, 8)),
                         levels = 10)
svm_grid
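To count the distinct values of each hyperparameter in the grid (a sketch):
svm_grid %>% count(rbf_sigma)
svm_grid %>% count(cost)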
Now you’re all set to tune the hyperparameters! For this use the
function tune_grid, which takes a workflow
specification (the object svm_wf as created above) as
input, as well as inputs to the arguments resamples (the
data-split object dat_split created above) and
grid (a tuning grid specification such as
svm_grid created above). Save the result to an object
called svm_res.
Note that running tune_grid can take a long
time (minutes), so this may be an ideal time to stretch your legs,
get some water, tea or coffee, etc.! And, as mentioned above, start with
a low value of the levels argument in the specification of
the tuning grid (in function grid_regular), and only
increase this value if the command runs without error and results in
promising outcomes.
# Tune the hyperparameters with the elements prepared above
svm_res <- svm_wf %>%
  tune_grid(resamples = dat_split,
            grid = svm_grid)
Exercise 13.8f
After having evaluated the model performance for all combinations of
hyperparameter values as listed in the svm_grid object, we
can now extract the model performance metrics using the
collect_metrics function. By default, this function will
compute some performance metrics per combination of hyperparameter
values; which ones depends on the machine learning problem at hand
(classification or regression). Here we are faced with a classification
problem, and the collect_metrics function thus evaluates
model performance using accuracy and roc_auc. By default, the
collect_metrics function will summarise multiple values of
a performance metric for each combination of hyperparameter values into
a mean value (column mean) and its standard error (column
std_err), with information shown on how many values were
summarised (column n). For object svm_res, we
indeed get these summary statistics: it is based on 5-fold cross
validation (as specified in the data split object
dat_split), so we have 5 performance metric
values per combination of hyperparameter values.
Extract the performance metrics for the svm_res object.
Also, get the combination of hyperparameter values that results in the
highest roc_auc (for this you could either filter the
metrics table to that combination of hyperparameters that maximizes
“roc_auc”, or you could use the function select_best where you pass the
svm_res object as input and additionally set
metric = "roc_auc"). Then, plot the mean value of the
roc_auc metric as a function of cost and
rbf_sigma (for this you could opt for
geom_tile or geom_contour_filled).
# Get best set of hyperparameters (in terms of roc_auc)
best_hyperparms <- svm_res %>%
  collect_metrics() %>%
  filter(.metric == "roc_auc") %>%
  filter(mean == max(mean))
best_hyperparms

# Plot the mean roc_auc as function of both hyperparameter values
svm_res %>%
  collect_metrics() %>%
  filter(.metric == "roc_auc") %>%
  ggplot(mapping = aes(x = rbf_sigma, y = cost, z = mean)) +
  geom_contour_filled(bins = 10) +
  geom_point(data = best_hyperparms,
             mapping = aes(x = rbf_sigma, y = cost)) +
  scale_y_log10() +
  scale_x_log10() +
  labs(x = "Sigma", y = "Cost")
The plot above shows the model’s performance (mean ROC_AUC value over
the 5 folds) as a function of the hyperparameters cost and sigma. The
plotted point shows the value with the highest performance.
Note that if you tuned the hyperparameters on a coarser grid, then
the plot will be different (coarser).
9 Final fit and predictions
Using the tuned hyperparameters (thus the best combination
of hyperparameters in terms of performance on the validation datasets),
we can now proceed to the last step in a predictive machine learning
pipeline. After having established the best values of the
hyperparameters, we need to:
update the specification of the model (by filling in the
tuned values of the hyperparameters);
train the data recipe on the entire model development
dataset (thus: prep the data recipe on the entire
training part of the main_split data
split);
bake the training, as well as the testing data, using the
prepared data recipe;
fit the updated model to the prepared training data;
predict the fitted model onto the prepared testing data;
evaluate the predictive performance of the fitted model on the
testing data.
We could do these steps by repeating the steps done above
(i.e. exercises 13.3 - 13.7) with an updated specification of the model
and an updated definition of the training and testing data (thus: not
the k cross validation folds, but the main train-test data
split).
Alternatively, and more efficiently, we could use several dedicated
tidymodels functions to do this:
function select_best selects the best combination of
hyperparameters from a tuned grid (here from the object
svm_res that holds the results of the
tune_grid function) based on a specified metric that will
be used to sort the models in the grid (e.g., based on the metric
“roc_auc”);
function finalize_model can be used to then update a
model specification with the selected hyperparameters (as done in the
previous step);
function update_model can then be used to update the
model specification in an existing workflow object: it first removes the
existing model specification and then adds the new specification to the
workflow;
function last_fit can then be used to perform the final
model fit on the entire training set (i.e. the entire dataset that was
used during cross-validation) and then predict with the fitted model on
the left-out test dataset (which was not used in any of the
steps above during model development!). This last step will thus
quantify the predictive performance of the model! The model
predictions and prediction metrics can then be retrieved using the
collect_metrics and collect_predictions
functions.
For example, the following sequence of operations, with hypothetical
names for the different types of objects (e.g. for the hyperparameter
tuning results, the model specification, the specified workflow and the
specification of a train-test datasplit) will combine all these
steps:
# Get best hyperparameter values based on ROC-AUC
best_hyperparms <- tuning_results %>%
  select_best(metric = "roc_auc")

# Update the model-specification using these best hyperparameters
final_model <- model_specification %>%
  finalize_model(best_hyperparms)

# Update the workflow
model_workflow_final <- model_workflow %>%
  update_model(final_model)

# check the steps above
model_workflow_final

# perform a last fit of the model, based on ALL training data,
# and predict on the TESTING data
final_fit <- model_workflow_final %>%
  last_fit(train_test_split)

# Inspect result
final_fit

# collect metrics
final_fit %>%
  collect_metrics()

# get predictions
final_fit %>%
  collect_predictions() %>%
  conf_mat(truth = isGrazing,
           estimate = .pred_class)
Exercise 13.9a
Using the different objects created in the exercises above, select the
best combination of hyperparameters according to ROC-AUC, update the
model specification and workflow, refit the model on the entire model
development dataset (thus, the training part of the
main_split object) and assess its predictive performance on
the testing dataset (show both the prediction metrics, as well
as the confusion matrix).
# Get best hyperparameter values based on ROC-AUC
best_auc <- svm_res %>%
  select_best(metric = "roc_auc")

# Update the model-specification using these best hyperparameters
final_model <- svm_spec %>%
  finalize_model(best_auc)

# Update the workflow
svm_wf_final <- svm_wf %>%
  update_model(final_model)

# Inspect result
svm_wf_final

# perform a last fit of the model based on ALL the training data, and predict on the TESTING data
svm_wf_final_fit <- svm_wf_final %>%
  last_fit(main_split)

# collect metrics
svm_wf_final_fit %>%
  collect_metrics()
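To also show the confusion matrix on the test data (a sketch, following the generic example above):
# get predictions and compute the confusion matrix on the test data
svm_wf_final_fit %>%
  collect_predictions() %>%
  conf_mat(truth = isGrazing,
           estimate = .pred_class)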
The selected best hyperparameters are: cost = 16 and rbf_sigma =
0.00599. The model fits the test data very well: the overall
accuracy of prediction is 93.6%!
10 Challenge
Challenge
Above we have only focussed on a kernel SVM using the “kernlab”
implementation as the engine. As a challenge, try using a different
algorithm, e.g. a neural network (e.g. in the “nnet” or “keras”
implementation), or a random forest (e.g. in the “randomForest” or
“ranger” implementation). See
here
for examples. You can focus on mlp for neural networks (a
multilayer perceptron model a.k.a. a single layer, feed-forward neural
network) or rand_forest for random forests,
respectively.
When tuning their hyperparameters, start simple, e.g. by tuning at most 2
hyperparameters using a regular grid that does not have many levels.
A random forest could be fitted and tuned as follows:
# Load library for engine
library(ranger)

# Specify model
rf_spec <- rand_forest(trees = 1000,
                       mtry = tune(),
                       min_n = tune(),
                       mode = "classification") %>%
  set_engine("ranger", seed = 1234567)

# Bundle specification of data recipe and model into workflow
rf_wf <- workflow() %>%
  add_model(rf_spec) %>%
  add_recipe(dat_rec)

# Specify grid of hyperparameter values for which to assess fit
rf_grid <- grid_regular(mtry(range = c(1, 5)),
                        min_n(range = c(1, 10)),
                        levels = 10)

# Tune the hyperparameters with the elements prepared above
rf_res <- rf_wf %>%
  tune_grid(resamples = dat_split,
            grid = rf_grid)

# Get best hyperparameter values based on ROC-AUC
rf_best <- rf_res %>%
  select_best(metric = "roc_auc")

# Update the model-specification using these best hyperparameters
rf_final <- rf_spec %>%
  finalize_model(rf_best)

# Update the workflow
rf_wf_final <- rf_wf %>%
  update_model(rf_final)

# check the steps above
rf_wf_final

# perform a last fit of the model, based on ALL training data,
# and predict on the TESTING data
rf_final_fit <- rf_wf_final %>%
  last_fit(main_split)

# Inspect result
rf_final_fit

# collect metrics
rf_final_fit %>%
  collect_metrics()

# get predictions
rf_final_fit %>%
  collect_predictions() %>%
  conf_mat(truth = isGrazing,
           estimate = .pred_class)
Tuning and fitting a neural network (here using the “nnet” engine) could
be done as follows:
# Load library for engine
library(nnet)

# Specify model
nn_spec <- mlp(hidden_units = tune(),
               penalty = tune(),
               dropout = 0.0,
               epochs = 20,
               activation = "softmax") %>%
  set_engine("nnet", seed = 1234567) %>%
  set_mode("classification")

# Bundle specification of data recipe and model into workflow
nn_wf <- workflow() %>%
  add_model(nn_spec) %>%
  add_recipe(dat_rec)

# Specify grid of hyperparameter values for which to assess fit
nn_grid <- grid_regular(hidden_units(range = c(1, 10)),
                        penalty(range = c(-10, 0)),
                        levels = 10)

# Tune the hyperparameters with the elements prepared above
nn_res <- nn_wf %>%
  tune_grid(resamples = dat_split,
            grid = nn_grid)

# Get best hyperparameter values based on ROC-AUC
nn_best <- nn_res %>%
  select_best(metric = "roc_auc")

# Update the model-specification using these best hyperparameters
nn_final <- nn_spec %>%
  finalize_model(nn_best)

# Update the workflow
nn_wf_final <- nn_wf %>%
  update_model(nn_final)

# check the steps above
nn_wf_final

# perform a last fit of the model, based on ALL training data,
# and predict on the TESTING data
nn_final_fit <- nn_wf_final %>%
  last_fit(main_split)

# Inspect result
nn_final_fit

# collect metrics
nn_final_fit %>%
  collect_metrics()

# get predictions
nn_final_fit %>%
  collect_predictions() %>%
  conf_mat(truth = isGrazing,
           estimate = .pred_class)
11 Submit your last plot
Submit your script file as well as a plot: either your last created
plot, or a plot that best captures your solution to the challenge.
Submit the files on Brightspace via Assessment > Assignments >
Skills day 13.
Note that your submission will not be graded or
evaluated. It is used only to facilitate interaction and to get insight
into progress.
12 Recap
Today, we’ve further explored the tidymodels packages,
including the workflows, dials and tune
packages that we used to perform hyperparameter tuning.
An R script of today’s exercises can be downloaded
here