Overview

Today’s goal: to apply unsupervised learning algorithms to the time series data prepared yesterday.

Resources
Packages

tidyverse, lubridate, tsibble, feasts, Rtsne, momentuHMM

Main functions

as_tsibble, mutate_all, prcomp, Rtsne, kmeans, hclust, fitHMM

Today, we are going to use the data that we prepared and explored yesterday, on the movement and activity of cattle (measured via GPS and accelerometer sensors as well as via visual observations) during an experiment at Carus research farm here in Wageningen, to train some unsupervised learning algorithms. We will be exploring dimension reduction methods (PCA and t-SNE) to find prevailing patterns in the data, and we will apply some clustering methods, e.g. K-means and hierarchical clustering, as well as unsupervised algorithms specific for time series data: Hidden Markov models.

1 Load and prepare the data

We continue with the data we worked with yesterday. We have run yesterday’s exercises using the code shown in yesterday’s answers, and uploaded the resulting object to Brightspace as the file “dat_ts1.rds”.

Exercise 12.1a
Download the file “dat_ts1.rds” from Brightspace > Time series analyses > datasets > cattle, and store it in an appropriate folder (e.g. data/processed/cattle/). Load the file into an object called dat. Note that it is an .rds file, which can be read into R with the function readRDS (see tutorial 1). Also load (and install if needed) the packages tidyverse, lubridate, tsibble, and feasts.

library(tidyverse)
library(lubridate)
library(tsibble)
library(feasts)
dat <- readRDS("data/processed/cattle/dat_ts1.rds")

Before continuing, let’s rearrange the data a little bit.

Exercise 12.1b
Remove the columns dx, dy and head from object dat, and convert it into a tsibble, where the columns ID and burst are the keys, and column dttm is the index. Store the newly created tsibble as an object called datTS.

dat <- dat %>%
  select(-c(dx, dy, head)) %>%
  select(ID, dttm, everything()) # just for reordering columns
datTS <- dat %>%
  as_tsibble(key = c(ID, burst),
             index = dttm)

When you print the object datTS to the console, you see that we are dealing with a time-regular tsibble object with 16 unique keys, 2,191 records, 17 columns, and a time interval of 3 seconds:

datTS
## # A tsibble: 2,191 x 17 [3s] <GMT>
## # Key:       ID, burst [16]
##       ID dttm                burst     t label         x        y xacc_mean xacc_sd yacc_mean yacc_sd zacc_mean zacc_sd VeDBA_mean VeDBA_sd
##    <dbl> <dttm>              <dbl> <dbl> <fct>     <dbl>    <dbl>     <dbl>   <dbl>     <dbl>   <dbl>     <dbl>   <dbl>      <dbl>    <dbl>
##  1     1 2018-04-21 09:05:00     1   300 idle    682170. 5762774.     -2.47    9.98     -11.8    26.3     -4.26    18.6       30.8     18.6
##  2     1 2018-04-21 09:05:03     1   303 grazing 682169. 5762773.     -6.06   13.5      -34.7    21.9    -13.0     19.8       46.2     17.8
##  3     1 2018-04-21 09:05:06     1   306 grazing 682169. 5762773.     -1.91   18.8      -30.4    27.7     -7.80    22.5       46.9     20.2
##  4     1 2018-04-21 09:05:09     1   309 grazing 682169. 5762773.     -7.59   14.7      -23.5    25.3     -3.15    21.5       40.5     16.9
##  5     1 2018-04-21 09:05:12     1   312 grazing 682169. 5762773.     -8.88   13.3      -18.8    12.6     -4.31    11.8       28.5     10.5
##  6     1 2018-04-21 09:05:15     1   315 grazing 682169. 5762773.     -1.75   22.4      -19.6    23.6     -5.09    21.1       40.3     16.7
##  7     1 2018-04-21 09:05:18     1   318 grazing 682169. 5762773.     -4.60   16.8      -23.2    22.8     -1.83    15.0       35.8     17.5
##  8     1 2018-04-21 09:05:21     1   321 grazing 682169. 5762773.     -6.55   15.1      -15.2    21.4     -2.14    16.2       30.8     16.6
##  9     1 2018-04-21 09:05:24     1   324 grazing 682169. 5762773.     -6.04   16.6      -27.0    28.1     -9.91    24.3       45.6     20.6
## 10     1 2018-04-21 09:05:27     1   327 grazing 682168. 5762773.     -7.10   13.4      -23.5    28.3     -6.14    21.4       41.1     19.3
## # ℹ 2,181 more rows
## # ℹ 2 more variables: step <dbl>, angle <dbl>

2 Dimension reduction

As the first set of unsupervised learning algorithms, we are going to apply two dimension reduction approaches to the data: principal component analysis (PCA) and t-Distributed Stochastic Neighbour Embedding (t-SNE). Both algorithms need a table (or matrix) of numeric data as input, subsetted to the features of interest. Therefore, let’s first subset the data by selecting only specific columns and removing rows that contain NA values.

Exercise 12.2a

Use object datTS, convert it into a tibble, and remove columns ID, dttm, burst, t, label, x and y. We have to convert the object back to a tibble first, as otherwise we would keep the index and key columns that reference each unique observation in a tsibble (thus, columns dttm, ID and burst respectively). In this exercise, however, we only want to retrieve the numerical features and store them in a separate object.

Also, overwrite the column angle with the cosine of angle. Then, filter the data by keeping only records that have an angle that is not NA. Try to do all steps in 1 pipe sequence, and assign the result to an object called datx.

Convert the object of class tsibble to tibble using as_tibble. Use the function cos to compute the cosine of a value. Records with NA values in specified columns can be removed using drop_na, or by using filter in combination with is.na.


datx <- datTS %>%
  as_tibble() %>%
  select(-c(ID,dttm,burst,t,label,x,y)) %>%
  mutate(angle = cos(angle)) %>%
  drop_na(angle)
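
The hint above mentioned filter combined with is.na as an alternative to drop_na; as a quick check, the following pipe gives the same result:

# Equivalent to drop_na(angle): keep only the rows where angle is not NA
datTS %>%
  as_tibble() %>%
  select(-c(ID, dttm, burst, t, label, x, y)) %>%
  mutate(angle = cos(angle)) %>%
  filter(!is.na(angle))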

The object datx now contains all the features that we computed yesterday based on the GPS and accelerometer data (note that we could have computed many more features).

It is often advised to first scale the features before doing analyses. Scaling here means subtracting the mean and then dividing by the standard deviation, so that all features are expressed on the same scale (with zero mean and unit variance). The dplyr package provides a nice function, across, which can be used inside a mutate call, in combination with tidyselect functions for indexing columns by their names or properties (e.g. you can use where(is.numeric) to automatically select all numeric columns in a tibble). Using the across function, you can apply a function to several columns in one statement; for example, the following code scales all numeric columns in dataset dat to zero mean and unit variance:

dat %>% 
  mutate(across(where(is.numeric), ~ (.x - mean(.x)) / sd(.x)))

Note that the ~ symbol denotes a formula in which .x refers to the column being processed. Instead of using the across function in this way (using a formula with the ~ symbol and .x as the column), you can also pass in a function, e.g.:

# Define a function to standardize the data
fnScale <- function(x){
  y <- (x - mean(x)) / sd(x)
  return(y)
}

# Apply the function across all numeric columns
dat %>% 
  mutate(across(where(is.numeric), fnScale))

Exercise 12.2b
Scale all numeric features in the tibble datx.

datx <- datx %>% 
  mutate(across(where(is.numeric), ~ (.x - mean(.x)) / sd(.x)))

In the case of the dataset that we use this week, we actually have labelled data: the behaviour of the cattle was visually observed and annotated into several classes, here reduced to the classes “grazing”, “walking” and “idle” (the column label in dat and datTS). To help us give some meaning to the patterns that the unsupervised machine learning algorithms will show us, let’s keep the labels in memory.

Exercise 12.2c
Store the labels in the column label of datTS in a vector called datxLabels, after having filtered out all the rows on which the column angle is NA. Count the number of occurrences of each label.

To retrieve the values from a column, you can use the pull function, or the $ operator. To count the number of occurrences, you may want to use the function table or count.


To store the labels:

datxLabels <- datTS %>%
  drop_na(angle) %>%
  pull(label)

Then, to count the number of occurrences of each label:

table(datxLabels)
## datxLabels
## grazing walking    idle 
##    1656     240     263
datTS %>%
  drop_na(angle) %>%
  count(label)
## # A tibble: 3 × 2
##   label       n
##   <fct>   <int>
## 1 grazing  1656
## 2 walking   240
## 3 idle      263

2.1 PCA

Now that we’ve prepared the data (datx), and also stored a matching vector with true (observed) labels (datxLabels), let’s apply the first dimension reduction algorithm: PCA. There are multiple packages and functions that compute a PCA; here we are going to use the function prcomp from the stats package, which is automatically loaded when you start R. Apart from a table with numeric values, the prcomp function does not need additional input arguments (although you can supply some optional arguments; see the help file of the prcomp function).
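
As an aside: prcomp can also centre and scale the data itself via its optional center and scale. arguments. Since we already standardised datx manually, the call below gives the same result as prcomp(datx), but this is handy to know when working with unscaled data:

# prcomp can centre and scale the input itself; datx is already scaled,
# so this gives the same result as prcomp(datx)
prcomp(datx, center = TRUE, scale. = TRUE)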

Exercise 12.2d
Apply a PCA on the data datx, assign the result to an object called datx_pca. Check the contents of the datx_pca object, e.g. by printing it to the console, by using names(datx_pca) or by using summary(datx_pca).

datx_pca <- prcomp(datx)
datx_pca
## Standard deviations (1, .., p=10):
##  [1] 2.0437206 1.1887108 1.1616227 1.0408751 0.8639934 0.7121532 0.5736013 0.4258363 0.3675481 0.2798058
## 
## Rotation (n x k) = (10 x 10):
##                     PC1        PC2         PC3         PC4          PC5          PC6          PC7         PC8         PC9         PC10
## xacc_mean   0.037169211  0.1168051 -0.56243531  0.56752224 -0.347432867  0.448172320 -0.143425548 -0.02118836  0.01615148 -0.060920642
## xacc_sd    -0.405963958 -0.1890052 -0.04519275  0.07075323  0.133530718 -0.251465092 -0.765468125 -0.21858203 -0.03530045 -0.276389369
## yacc_mean   0.375327231 -0.4044292  0.13759970  0.24112769 -0.054704303 -0.156912683 -0.003020708  0.01118408  0.73822604 -0.213925730
## yacc_sd    -0.438084953 -0.1669035 -0.03852115  0.11340582  0.006235495 -0.016677265  0.567770657 -0.13053586 -0.08491595 -0.647262836
## zacc_mean   0.322703242 -0.3041226 -0.23602125  0.37142865  0.084276850 -0.568091497  0.143712058 -0.09346818 -0.47366783  0.161634150
## zacc_sd    -0.432126056 -0.1702750 -0.03849030  0.17261652 -0.076446713 -0.174994576  0.019690786  0.81421018  0.08430956  0.214695749
## VeDBA_mean  0.005798483 -0.5220882  0.56100575  0.17160873 -0.247041063  0.427925606 -0.083179055 -0.03367593 -0.34962342  0.093671080
## VeDBA_sd   -0.448861389 -0.1624421 -0.07143879  0.10997560  0.035954440 -0.006326124  0.207323994 -0.49455561  0.30347584  0.611505570
## step        0.092185560 -0.4736074 -0.40823435 -0.25170755  0.602165298  0.398467931  0.002952121  0.12371238 -0.01615637  0.015219142
## angle       0.012004246 -0.3383226 -0.34283059 -0.57380525 -0.648503482 -0.125375719 -0.012783822 -0.04369966 -0.01549639 -0.004401788
names(datx_pca)
## [1] "sdev"     "rotation" "center"   "scale"    "x"
summary(datx_pca)
## Importance of components:
##                           PC1    PC2    PC3    PC4     PC5     PC6    PC7     PC8     PC9    PC10
## Standard deviation     2.0437 1.1887 1.1616 1.0409 0.86399 0.71215 0.5736 0.42584 0.36755 0.27981
## Proportion of Variance 0.4177 0.1413 0.1349 0.1083 0.07465 0.05072 0.0329 0.01813 0.01351 0.00783
## Cumulative Proportion  0.4177 0.5590 0.6939 0.8023 0.87691 0.92763 0.9605 0.97866 0.99217 1.00000

The element sdev (which you can access using datx_pca$sdev) contains the standard deviation associated to each principal component. The element x (which you can access using datx_pca$x) contains the computed principal components themselves (stored in a matrix).
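
For example, a quick way to inspect these elements (a minimal sketch, assuming the datx_pca object created above):

# Scores on the first two principal components, and the variance per component
head(datx_pca$x[, 1:2])
datx_pca$sdev^2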

Exercise 12.2e
Plot a barplot of the element sdev and calculate the fraction of total variance that is explained by the first 2 principal components. Then, convert the element x into a tibble and add a column label that gets the values of the earlier created datxLabels. Create a scatterplot (ggplot with geom_point) with the first principal component on the x-axis and the second principal component on the y-axis. Let the colour of the points be determined by the label column. Do you see a clear pattern in relation to the cattle’s activity?

The function barplot can be used to create a barplot from values, e.g. the sdev values. The fraction of total variance that a principal component explains is its variance (sd^2) divided by the sum of the variances across the principal components. The function cumsum computes a cumulative sum across a vector of values, which may be handy when computing the fraction of total variance explained by the first \(n\) principal components.


# Barplot
barplot(datx_pca$sdev)

# Variance per axis
datx_pca$sdev^2 / sum(datx_pca$sdev^2)
##  [1] 0.41767938 0.14130333 0.13493672 0.10834209 0.07464847 0.05071622 0.03290185 0.01813366 0.01350916 0.00782913
# plot the cumulative explained variance
plot(seq_along(datx_pca$sdev), 
     100 * cumsum(datx_pca$sdev^2) / sum(datx_pca$sdev^2),
     xlab = "Number of PC included", 
     ylab = "Cumulative variance explained (%)",
     type = "b")

# Get the data as a tibble, with label as new column
datx_pcatbl <- as_tibble(datx_pca$x) %>%
  mutate(label = datxLabels)

# Plot the first 2 principal components
datx_pcatbl %>%
  ggplot(aes(x=PC1, y=PC2, col=label)) +
  geom_point() +
  coord_fixed() # to have a 1:1 ratio on x and y axis

There seems to be quite some clustering of data for the 3 different behaviours.

The ggfortify package provides a nice interface to plotting PCA results using ggplot (using the function autoplot). When you run this code you will get a nicer plot summarising the PCA results, including the contribution of the individual features to the principal components and the contribution of the plotted principal components to the total variance.

library(ggfortify)
autoplot(datx_pca,
         data = tibble(label=datxLabels),
         colour = 'label',
         loadings = TRUE,
         loadings.colour = 'blue',
         loadings.label = TRUE,
         loadings.label.size = 4,
         loadings.label.colour = 'black')

2.2 t-SNE

A relatively new dimension reduction approach is t-Distributed Stochastic Neighbour Embedding (t-SNE), proposed in 2008. Whereas PCA is a linear dimension reduction approach, t-SNE is non-linear. Here, we will use the Rtsne function from the package with the same name. Thus, install the package and load it. Check the Rtsne help file via ?Rtsne. The argument X should be a table (matrix, data.frame), where each row is an observation and each column a variable (i.e. a tidy dataset!). Via the dims argument, we can set the dimensionality of the output (defaults to 2). There are other arguments that we could supply, but for now we leave them at their default values (which for our data will be fine).
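
One practical note: t-SNE uses a random initialisation, so repeated runs on the same data give slightly different embeddings. If you want reproducible output, you can set a seed before calling Rtsne (a minimal sketch; the package is installed and loaded in the exercise below):

library(Rtsne)
set.seed(123)                       # fix the random seed so the embedding is reproducible
tsne_repro <- Rtsne(datx, dims = 2)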

Exercise 12.2f
Install and load the Rtsne package. Apply the t-SNE to the data datx, keep 2 output dimensions (argument dims), and assign the result to an object called dat_tsne (running the algorithm may take half a minute or so). Explore the object dat_tsne, e.g. using the names and summary functions. Convert the element Y of output dat_tsne into a tibble, and add a column label that gets the values of the earlier created datxLabels. Create a scatterplot (ggplot with geom_point) with the first component of the t-SNE on the x axis and the second component of the t-SNE on the y-axis. Let the colour of the points be determined by the label column.

library(Rtsne)
dat_tsne <- Rtsne(datx, dims = 2)
names(dat_tsne)
##  [1] "N"                   "Y"                   "costs"               "itercosts"           "origD"               "perplexity"         
##  [7] "theta"               "max_iter"            "stop_lying_iter"     "mom_switch_iter"     "momentum"            "final_momentum"     
## [13] "eta"                 "exaggeration_factor"
summary(dat_tsne)
##                     Length Class  Mode   
## N                      1   -none- numeric
## Y                   4318   -none- numeric
## costs               2159   -none- numeric
## itercosts             20   -none- numeric
## origD                  1   -none- numeric
## perplexity             1   -none- numeric
## theta                  1   -none- numeric
## max_iter               1   -none- numeric
## stop_lying_iter        1   -none- numeric
## mom_switch_iter        1   -none- numeric
## momentum               1   -none- numeric
## final_momentum         1   -none- numeric
## eta                    1   -none- numeric
## exaggeration_factor    1   -none- numeric
dat_tsne <- dat_tsne$Y %>%
  as_tibble() %>%
  mutate(label = datxLabels)

dat_tsne %>%
  ggplot(aes(x=V1, y=V2, col=label)) +
  geom_point() +
  coord_fixed() # to have a 1:1 ratio on x and y axis

It seems to separate the clusters a bit better than the PCA does (although we are not doing a cluster analysis yet).

3 Clustering

3.1 K-means clustering

The simplest clustering algorithm is K-means clustering, where K is a number that should be chosen a priori. In R, the kmeans function from the stats package performs K-means clustering on a data matrix (e.g. datx as computed above).

Exercise 12.3a
Apply the K-means algorithm to the data datx. Use \(K=3\). Assign the result to an object called k3. Retrieve the value of the K centers (using the element k3$centers). Also, create the same plot as in the t-SNE exercise (a scatterplot of the 2 t-SNE axes), but now with the colour of the points indicating the assigned cluster according to the K-means algorithm (using the element k3$cluster).

k3 <- kmeans(datx,
             centers = 3)

dat_K <- dat_tsne %>%
  mutate(cluster = factor(k3$cluster))

dat_K %>%
  ggplot(aes(x=V1, y=V2, col=cluster)) +
  geom_point() +
  coord_fixed()
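
Here we chose \(K=3\) because we know there are three annotated behaviours. Without such prior knowledge, a common heuristic is to compare the total within-cluster sum of squares for a range of K values and look for an “elbow” in the curve, for example:

# Total within-cluster sum of squares for K = 1, ..., 8
wss <- sapply(1:8, function(k) kmeans(datx, centers = k, nstart = 10)$tot.withinss)
plot(1:8, wss, type = "b",
     xlab = "Number of clusters K",
     ylab = "Total within-cluster sum of squares")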

The cluster package offers a function to partition (i.e. cluster) the data into \(k\) clusters “around medoids”, a more robust version of K-means:

library(cluster)
autoplot(pam(datx, 3), frame = TRUE, frame.type = 'norm')

3.2 Hierarchical clustering

Whereas K-means clustering is a fast and flat type of clustering, hierarchical clustering is much slower because it assigns each record to its own unique cluster and evaluates the dissimilarity between each pair of clusters. It will merge the 2 clusters that are most similar, and will continue to do so until all records belong to the same cluster. The results of a hierarchical clustering can be shown as a dendrogram, i.e. a tree diagram. We will use the hclust function from the stats package. The hclust function needs a dissimilarity structure as produced by the function dist as input. Thus, with input data x, you can apply hierarchical clustering using hclust(dist(x)). The dist function has a method argument, which can be e.g. “euclidean” or “manhattan” and denotes the metric used to compute distances between points. The hclust function performs agglomerative hierarchical clustering and also has a method argument, yet this is the argument to specify the type of linkage: e.g. “complete”, “single”, or “average”.

Exercise 12.3b
Apply hierarchical clustering to the object datx, with Manhattan distances and complete linkage. Store the output as object with name hc, and plot the result.

To plot the output of hclust, simply use the plot() function.


hc <- hclust(dist(datx, method = "manhattan"), method = "complete")
plot(hc)

The plot from the previous exercise does not look very nice yet. Using the dendextend package, we can plot the dendrogram a bit more nicely. The following code colours the records based on their observed behaviour as stored in the object datxLabels:

library(dendextend)
dend <- as.dendrogram(hc)
colCodes <- c(grazing = "green", walking = "red", idle = "blue")
labels_colors(dend) <- colCodes[as.integer(datxLabels)][order.dendrogram(dend)]
plot(dend)
legend(x = "topleft", bty = "n", pch = 16, col = colCodes, legend = names(colCodes))
abline(h = 25, col = 8, lty = 2)

The dendrogram now gives us a bit more insight. We see that there is quite a bit of clustering in the features associated with the different behaviours, although there still is some mixing.

Since hierarchical clustering does not yield a specific number of clusters, but rather a tree-like hierarchy of similarity, we need to prune the tree in order to convert it into discrete clusters. For this we can use the cutree function, here at a height of 25 (the horizontal grey dashed line in the previous plot):

table(behaviour=datxLabels, cluster=cutree(hc, h = 25))
##          cluster
## behaviour    1    2    3    4    5
##   grazing 1521   73   45   10    7
##   walking   64    7  154   11    4
##   idle      38    0   37  187    1

We notice that more clusters are found (at this height of cutting the tree) than there are behaviours, yet also that the clusters correlate with specific behaviours.
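
Alternatively, cutree can cut the tree into a fixed number of clusters (argument k) instead of at a fixed height, e.g. into 3 clusters to match the number of annotated behaviours:

# Cut the dendrogram into exactly 3 clusters and cross-tabulate with the observed labels
table(behaviour = datxLabels, cluster = cutree(hc, k = 3))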

3.3 Hidden Markov models

The last class of algorithms that we will look at today are Hidden Markov models (HMMs): models that assume that a process with unobservable (i.e. hidden) states (i.e. the clusters, or latent behaviours) follows a Markov chain (a sequence of possible events in which the probability of each event depends only on the state attained in the previous event). There are several packages in R that implement HMMs; here we will use the momentuHMM package (see this paper).

Before we can fit an HMM using the momentuHMM package, we have to prepare the data using the prepData function. Input to this function can be a data.frame (or tibble or tsibble). By specifying the coordNames argument (e.g. the x and y coordinates), the prepData function will calculate the movement statistics step and angle. However, as we have already done so ourselves yesterday, and as they are now columns in our data object datTS, we can set the coordNames argument to NULL so that this step will be skipped in prepData. The function prepData assumes that there is a column with the name ID, which should contain the identifier of the time series. In our object datTS we actually have 2 keys: ID and burst, so we should first merge them into a single column ID. However, as datTS is a tsibble, with both key and index, we cannot straight away change the columns that are either a key or an index. Thus, we should first convert it back to a tibble, then merge both columns, and then prepare the data using prepData.

Exercise 12.3c
Install and load the momentuHMM package, and prepare the object datTS for analyses using the prepData function. Assign the output to an object called preppedDat. Plot the object preppedDat (using the plot function).

To merge 2 columns into a single column, we can use either the unite function in tidyr, or the str_c function in stringr. When you use plot(preppedDat) all time series will be plotted, but you can choose to plot only a specific time series by using the animals argument.


library(momentuHMM)

preppedDat <- datTS %>%
  as_tibble() %>%
  drop_na(angle) %>%
  unite("ID", c(ID,burst)) %>%
  prepData(coordNames=NULL)
plot(preppedDat, animals = 13)
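
As mentioned in the hint, the merging of ID and burst could also be done with str_c from stringr (part of the tidyverse) instead of unite; a sketch that gives an equivalent result:

# Equivalent to unite(): paste ID and burst together into a single ID column
preppedDat2 <- datTS %>%
  as_tibble() %>%
  drop_na(angle) %>%
  mutate(ID = str_c(ID, "_", burst)) %>%
  select(-burst) %>%
  prepData(coordNames = NULL)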

Now that we’ve prepared the data for the package momentuHMM, we can fit our first HMM. The function fitHMM can be used for this. The function needs several inputs:

  • data the data as prepared above
  • nbStates the number of hidden states we want to estimate
  • dist a named list, where the names of the elements refer to the column names in data that will be the response variables, and their values indicate the corresponding probability distributions. Supported distributions include norm for Gaussian distributions (e.g. for the accelerometer-derived features), gamma and weibull for distributions with values > 0 (e.g. step), and vm and wrpcauchy for the von Mises and wrapped Cauchy distributions: distributions that can be used for angular/circular features (e.g. angle). For most distributions, we need 2 parameters for each state (e.g. norm has mean and sd as parameters), yet the circular distributions can be supplied with either 1 or 2 parameters per state (the angular mean and dispersion): if providing only 1 parameter, fitHMM will assume that the angular mean of the distribution is 0.
  • Par0 is a named list containing vectors of initial state-dependent probability distribution parameters for each data stream specified in dist.

Let’s fit a 3-state HMM with 2 response variables: step and angle. We thus need to specify their distributions (let’s use gamma and wrpcauchy), and we have to set their initial values. For angle, let’s only provide 1 parameter per state, thus assuming that the angular mean is 0 (i.e. on average the animals move forward).

Let’s estimate the initial values of the parameters using the mean and standard deviation of the step lengths, and the mean cosine of angle:

mean(preppedDat$step, na.rm=TRUE)
## [1] 0.5099411
sd(preppedDat$step, na.rm=TRUE)
## [1] 0.7432717
mean(cos(preppedDat$angle), na.rm=TRUE)
## [1] 0.3835441

Now we can vary the parameters a bit from these estimates, having 1 state that is consistently higher in its parameters, 1 state roughly at these estimates, and 1 state lower.

fitHMM_stepangle <- momentuHMM::fitHMM(
  data     = preppedDat,
  nbStates = 3,
  dist     = list(step = "gamma",
                  angle = "wrpcauchy"),
  Par0     = list(step = c( 1,    .5,  .2,   # parameters for the mean step, 3 states
                            1.25, .75, .25), # parameters for sd step, 3 states
                  angle = c(0.6, 0.4, 0.2))) # parameters for angle variance

Now that we’ve fitted the HMM, we can explore it. See a summary of the fitted model by printing it to the screen:

fitHMM_stepangle
## Value of the maximum log-likelihood: -3228.898 
## 
## 
## step parameters:
## ----------------
##       state 1   state 2    state 3
## mean 2.050121 0.2745343 0.08798034
## sd   1.033226 0.1682710 0.06730777
## 
## angle parameters:
## -----------------
##                 state 1   state 2   state 3
## mean          0.0000000 0.0000000 0.0000000
## concentration 0.7781085 0.3077997 0.2936657
## 
## Regression coeffs for the transition probabilities:
## ---------------------------------------------------
##                1 -> 2    1 -> 3    2 -> 1    2 -> 3   3 -> 1    3 -> 2
## (Intercept) -1.402442 -16.49222 -3.367524 -5.303439 -31.7583 -2.728423
## 
## Transition probability matrix:
## ------------------------------
##              state 1    state 2      state 3
## state 1 8.025711e-01 0.19742885 5.520806e-08
## state 2 3.316651e-02 0.96204782 4.785668e-03
## state 3 1.513779e-14 0.06131689 9.386831e-01
## 
## Initial distribution:
## ---------------------
##      state 1      state 2      state 3 
## 3.691801e-01 6.308193e-01 5.823285e-07

Moreover, we can plot the results: both the mix of density distributions, as well as maps of the estimated states of the moving animals (using the Viterbi algorithm). Using plot(fitHMM_stepangle) will show all animals, but to show just 1 animal (here time series 13), we can use:

plot(fitHMM_stepangle, animals=13)
## Decoding state sequence... DONE

Also, we can plot the estimated probabilities that the animal is in any of the 3 states at any moment in time using plotStates:

plotStates(fitHMM_stepangle, 13)
## Decoding state sequence... DONE
## Computing state probabilities... DONE

We can decode the fitted model (i.e. estimate the most likely state at every moment in time) using the viterbi function. For example, we can check how often the cattle were in any of the 3 estimated states using:

table(viterbi(fitHMM_stepangle))
## 
##    1    2    3 
##  297 1759  103

We can also check how often a fitted hidden state occurs per annotated behaviour:

table(behaviour=datxLabels, state=viterbi(fitHMM_stepangle))
##          state
## behaviour    1    2    3
##   grazing   80 1576    0
##   walking  202   37    1
##   idle      15  146  102

We see that latent state 1 is mostly associated with walking behaviour but also partly with grazing, latent state 2 mostly with grazing, and idle behaviour is represented by hidden states 2 and 3.

4 Challenge

Challenge
The above shows how to fit a 3-state HMM with 2 response variables. Extend this model by including 1 or 2 extra response variables (i.e. the accelerometer-based features that you computed yesterday), and fit the model. Summarise and visualize the model output. For example: how do the fitted hidden states relate to the true observed behaviours (stored in datxLabels)?

For example, a 3-state HMM with 4 response variables (step, angle, VeDBA_sd and yacc_sd):

fitHMM_4x <- momentuHMM::fitHMM(
  data     = preppedDat,
  nbStates = 3,
  dist     = list(step = "gamma",
                  angle = "wrpcauchy",
                  VeDBA_sd = "norm",
                  yacc_sd = "norm"),
  Par0     = list(step = c( 1,    .5,  .2,
                            1.25, .75, .25),
                  angle = c(0.6, 0.4, 0.2),
                  VeDBA_sd = c(15, 12.5, 10,
                               5, 4.5, 4),
                  yacc_sd = c(18, 16, 14,
                              8, 7, 6)))

# Explore
fitHMM_4x
## Value of the maximum log-likelihood: -15471.83 
## 
## 
## step parameters:
## ----------------
##       state 1   state 2    state 3
## mean 1.275037 0.2495804 0.11043422
## sd   1.158870 0.1415646 0.08776665
## 
## angle parameters:
## -----------------
##                 state 1   state 2   state 3
## mean          0.0000000 0.0000000 0.0000000
## concentration 0.6217996 0.3102226 0.1899219
## 
## VeDBA_sd parameters:
## --------------------
##       state 1   state 2  state 3
## mean 9.817193 14.248176 1.826411
## sd   3.952679  2.920225 1.156986
## 
## yacc_sd parameters:
## -------------------
##        state 1  state 2  state 3
## mean 12.840551 19.55205 2.002726
## sd    6.300851  4.53290 1.365834
## 
## Regression coeffs for the transition probabilities:
## ---------------------------------------------------
##                1 -> 2    1 -> 3    2 -> 1    2 -> 3    3 -> 1    3 -> 2
## (Intercept) -1.619524 -3.022692 -2.699377 -5.679367 -1.691551 -4.142941
## 
## Transition probability matrix:
## ------------------------------
##            state 1    state 2     state 3
## state 1 0.80214140 0.15881837 0.039040235
## state 2 0.06280911 0.93400060 0.003190285
## state 3 0.15351390 0.01322887 0.833257233
## 
## Initial distribution:
## ---------------------
##      state 1      state 2      state 3 
## 4.480314e-01 5.519670e-01 1.626806e-06
plot(fitHMM_4x, animals=13)
## Decoding state sequence... DONE

plotStates(fitHMM_4x, 13)
## Decoding state sequence... DONE
## Computing state probabilities... DONE

# Get fraction of viterbi HMM states per observed behaviour
datTS %>%
  as_tibble() %>%
  drop_na(angle) %>%
  mutate(viterbi = as.factor(viterbi(fitHMM_4x))) %>%
  ggplot(aes(x=label, fill=viterbi)) +
  geom_bar(position="fill") +
  ylab("fraction") +
  labs(fill = "Viterbi HMM state")

We see that hidden state 1 is mostly related to walking, hidden state 2 is mostly related to grazing, and hidden state 3 is mostly related to being idle, yet a share of state 1 also occurs during idle behaviour.

5 Submit your last plot

Submit your script file as well as a plot: either your last created plot, or a plot that best captures your solution to the challenge. Submit the files on Brightspace via Assessment > Assignments > Skills day 12.

Note that your submission will not be graded or evaluated. It is used only to facilitate interaction and to get insight into progress.

6 Recap

Today, we’ve further explored the tsibble package to work with time series data and fitted unsupervised algorithms. Tomorrow, we will fit supervised learning algorithms using tidymodels.

An R script of today’s exercises can be downloaded here

7 Further reading