Chapter 6 Clustering
By default, the rows and columns are displayed in the order in which they are provided. As we saw in a previous section, it is easy to rearrange the rows and columns into a clustered formation using the pretty.order.rows and pretty.order.cols arguments.
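As a rough base-R picture of what a clustered ordering involves, the sketch below reorders the rows and columns of the scaled mtcars matrix by the leaf order of a hierarchical clustering. It assumes Euclidean distance and complete linkage (the defaults of dist() and hclust()); the choices superheat makes for pretty.order.rows and pretty.order.cols may differ.

```r
# Scale each column of mtcars so that variables are comparable
x <- scale(as.matrix(mtcars))

# Cluster the rows (cars) and the columns (variables) separately.
# Assumption: Euclidean distance + complete linkage (hclust defaults).
row_hc <- hclust(dist(x))
col_hc <- hclust(dist(t(x)))

# Reorder the matrix so that similar rows and columns sit next to each other
x_ordered <- x[row_hc$order, col_hc$order]

# Inspect the first few rows and columns of the reordered matrix
head(round(x_ordered[, 1:4], 2))
```

Only the order of rows and columns changes; the values themselves are untouched.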
6.1 Dendrogram
A natural addition is a dendrogram that displays the hierarchical clustering of the columns and/or rows, which you can add using the col.dendrogram and row.dendrogram arguments. If you want to apply the row or column ordering implied by the dendrogram without displaying the dendrogram itself, use the pretty.order.rows and pretty.order.cols arguments instead.
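The reason the ordering can be kept without the picture is that a dendrogram's left-to-right leaf order is simply a permutation of the rows. This base-R sketch (again assuming the dist()/hclust() defaults rather than superheat's internals) converts a row clustering into a dendrogram object and checks that its leaf order matches the ordering stored in the hclust result.

```r
# Hierarchical clustering of the scaled rows of mtcars
# (assumption: Euclidean distance, complete linkage)
x <- scale(as.matrix(mtcars))
row_hc <- hclust(dist(x))

# Convert to a dendrogram object, the structure drawn beside a heatmap
row_dend <- as.dendrogram(row_hc)

# Left-to-right order of the dendrogram leaves (row indices)
leaf_order <- order.dendrogram(row_dend)

# This is the same permutation stored in row_hc$order, so it can be used
# to reorder the rows even if the dendrogram is never drawn
identical(as.integer(leaf_order), as.integer(row_hc$order))
rownames(x)[leaf_order][1:5]
```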
6.2 Generating clusters
Grouping the rows and/or columns into a pre-specified number of clusters is a nice way to highlight structure and simplify the visualization. For example, we can group the rows into three clusters by specifying n.clusters.rows = 3. The underlying clustering algorithm is kmeans(), but you can use hierarchical clustering instead by specifying clustering.method = 'hierarchical'.
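To make the two options concrete, here is a base-R sketch of both ways of splitting the 32 cars into three row clusters: stats::kmeans() and a hierarchical tree cut with cutree(). It only approximates what superheat does internally; the distance measure, the linkage, and the kmeans() settings are assumptions.

```r
x <- scale(as.matrix(mtcars))

# k-means with three centres; kmeans() picks random starting centres,
# so a seed is set to make the result repeatable
set.seed(2016113)
km_clusters <- kmeans(x, centers = 3)$cluster

# Hierarchical alternative: build a tree, then cut it into three groups
# (assumption: Euclidean distance and complete linkage)
hc_clusters <- cutree(hclust(dist(x)), k = 3)

# Cross-tabulate the two clusterings. Cluster numbers are arbitrary labels,
# so to the extent the methods agree, each row of the table is dominated
# by a single column rather than lying on the diagonal
table(kmeans = km_clusters, hierarchical = hc_clusters)
```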
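Because kmeans() starts from randomly chosen cluster centers, the clusters can change from one run to the next. To get the same clustering every time, you must either set the seed or provide your own cluster membership vector.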
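The point about seeds can be checked with stats::kmeans() directly, used here as a stand-in for the clustering that superheat performs internally (an assumption; superheat's exact kmeans() settings are not shown in this section).

```r
x <- scale(as.matrix(mtcars))

# Two k-means runs with the same seed start from the same random centres ...
set.seed(2016113)
run1 <- kmeans(x, centers = 3)$cluster
set.seed(2016113)
run2 <- kmeans(x, centers = 3)$cluster

# ... and therefore return identical memberships
identical(run1, run2)  # TRUE

# Without resetting the seed, another run may number the clusters
# differently or even produce different groupings
run3 <- kmeans(x, centers = 3)$cluster
```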
library(superheat)
set.seed(2016113)
superheat(mtcars,
          # scale the matrix columns
          scale = TRUE,
          # generate three row clusters
          n.clusters.rows = 3)
By default, when rows or columns are clustered, their labels are replaced by a single label for each cluster (typically 1, 2, 3, …). If you would rather display the original row or column names, specify left.label = 'variable' or bottom.label = 'variable' for the left or bottom labels, respectively.
set.seed(2016113)
superheat(mtcars,
          # scale the matrix columns
          scale = TRUE,
          # generate three row clusters
          n.clusters.rows = 3,
          # label each row with its car name instead of its cluster number
          left.label = 'variable')
6.2.1 Extracting the clusters
To extract the clusters generated by superheat(), first assign its output to a variable. The cluster memberships are stored in the membership.rows and membership.cols entries of that object.
set.seed(2016113)
superheatmap <- superheat(mtcars,
                          # scale the matrix columns
                          scale = TRUE,
                          # generate three row clusters
                          n.clusters.rows = 3,
                          left.label = 'variable',
                          # suppress drawing the plot
                          print.plot = FALSE)
# extract the row clusters
superheatmap$membership.rows
##   Hornet Sportabout          Duster 360          Merc 450SE
##                   1                   1                   1
##          Merc 450SL         Merc 450SLC  Cadillac Fleetwood
##                   1                   1                   1
## Lincoln Continental   Chrysler Imperial    Dodge Challenger
##                   1                   1                   1
##         AMC Javelin          Camaro Z28    Pontiac Firebird
##                   1                   1                   1
##      Ford Pantera L       Maserati Bora           Mazda RX4
##                   1                   1                   2
##       Mazda RX4 Wag      Hornet 4 Drive             Valiant
##                   2                   2                   2
##           Merc 240D            Merc 230            Merc 280
##                   2                   2                   2
##           Merc 280C       Toyota Corona        Ferrari Dino
##                   2                   2                   2
##          Datsun 710            Fiat 128         Honda Civic
##                   3                   3                   3
##      Toyota Corolla           Fiat X1-9       Porsche 914-2
##                   3                   3                   3
##        Lotus Europa          Volvo 142E
##                   3                   3
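Once extracted, the membership vector can be explored with ordinary base-R tools. The sketch below uses a kmeans() membership vector as a hypothetical stand-in for superheatmap$membership.rows (both are named vectors of cluster labels, as the printed output above suggests); with superheat installed, you could substitute the extracted vector directly.

```r
# Hypothetical stand-in for superheatmap$membership.rows:
# a named vector of row cluster labels from k-means on the scaled data
set.seed(2016113)
membership <- kmeans(scale(mtcars), centers = 3)$cluster

# Number of cars in each cluster
table(membership)

# Car names grouped by cluster
split(names(membership), membership)

# Cluster means of selected (unscaled) variables: mpg and horsepower
cluster_means <- aggregate(mtcars[, c("mpg", "hp")],
                           by = list(cluster = membership),
                           FUN = mean)
cluster_means
```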
6.3 User-supplied clusters
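The best way to control the clustering of your matrix is to provide a pre-specified membership vector using the membership.rows and/or membership.cols arguments: this gives you full control over the groupings and, unlike k-means, does not depend on a random seed. Suppose, for our mtcars example, that we want to group the cars by their number of gears.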
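A grouping by gears needs no clustering algorithm at all, since the gear column of mtcars already assigns each car to a group. The base-R sketch below builds that vector and checks the group sizes; it assumes that a membership vector should contain one label per matrix row, in the same order as the rows.

```r
# Number of forward gears for each car (3, 4 or 5), in mtcars row order
gear_membership <- mtcars$gear

# Group sizes: 15 cars with 3 gears, 12 with 4, and 5 with 5
table(gear_membership)

# Each label lines up with the corresponding row of mtcars
head(data.frame(car = rownames(mtcars), gear = gear_membership))
```

A vector of this kind is what could then be supplied through the membership.rows argument.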