Performs clustering analysis on forest parcels based on their ecosystem service family profiles. Supports both K-means and hierarchical clustering with automatic optimal k determination via silhouette analysis.
Arguments
- data
An sf object or data.frame containing the parcels to cluster.
- families
Character vector of family column names to use for clustering (e.g., c("family_C", "family_B", "family_P", "family_S")).
- k
Integer number of clusters. If NULL (default), the optimal number of clusters is determined automatically using silhouette analysis.
- method
Character string specifying clustering method: "kmeans" (default) or "hierarchical" (Ward's linkage).
- max_k
Maximum number of clusters to test when k is NULL (default: 10).
Value
The input data with an additional integer column, cluster, indicating each parcel's cluster assignment. The result also has attributes:
- cluster_profile: Data frame with mean family values per cluster
- method: Clustering method used
- optimal_k: Optimal k if auto-determined (only when k = NULL)
- silhouette_scores: Silhouette scores for k = 2 to max_k (only when k = NULL)
If the input is an sf object, the output preserves the sf class and geometry.
Details
## Clustering Methods
- **K-means**: Fast, works well with spherical clusters, sensitive to outliers
- **Hierarchical**: More flexible cluster shapes, deterministic, slower
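The contrast between the two methods can be sketched with base R alone. This is a minimal illustration, not the package's internal code: the data are synthetic, the choice of `"ward.D2"` as the Ward variant is an assumption, and whether `cluster_parcels()` scales the family columns beforehand is not stated here.

```r
# Synthetic stand-in for parcel family scores (hypothetical data)
set.seed(42)
scores <- data.frame(
  family_C = c(rnorm(20, 20, 3), rnorm(20, 70, 3)),
  family_B = c(rnorm(20, 30, 3), rnorm(20, 80, 3)),
  family_P = c(rnorm(20, 75, 3), rnorm(20, 25, 3)),
  family_S = c(rnorm(20, 50, 3), rnorm(20, 55, 3))
)
m <- as.matrix(scores)

# K-means: result depends on random starting centers, so use several starts
km <- kmeans(m, centers = 2, nstart = 25)
cl_kmeans <- km$cluster

# Hierarchical clustering with Ward's criterion: deterministic for a given input
# ("ward.D2" is an assumption; the package may use a different Ward variant)
hc <- hclust(dist(m), method = "ward.D2")
cl_hier <- cutree(hc, k = 2)

# Cluster labels are arbitrary, so compare the partitions with a cross-table
table(kmeans = cl_kmeans, hierarchical = cl_hier)
```

On well-separated data like this the two partitions should agree up to label order; differences tend to appear when clusters are elongated or contain outliers.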
## Automatic K Determination
When k = NULL, the function tests each k from 2 to max_k and selects
the k with the highest average silhouette width. Silhouette values range from
-1 to 1, and the average width can be read as follows:
- > 0.7: Strong structure
- 0.5-0.7: Reasonable structure
- 0.25-0.5: Weak structure
- < 0.25: No substantial structure
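The selection rule described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the function's actual implementation: it uses `silhouette()` from the cluster package with K-means on synthetic data, and the names `m`, `avg_sil`, and `best_k` are hypothetical.

```r
library(cluster)  # provides silhouette()

set.seed(1)
# Hypothetical data: three well-separated groups in two dimensions
m <- rbind(
  matrix(rnorm(60, mean = 0,  sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 5,  sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 10, sd = 0.5), ncol = 2)
)
d <- dist(m)   # pairwise distances, reused for every candidate k
max_k <- 6
ks <- 2:max_k

# Average silhouette width for each candidate k
avg_sil <- sapply(ks, function(k) {
  cl <- kmeans(m, centers = k, nstart = 25)$cluster
  mean(silhouette(cl, d)[, "sil_width"])
})
names(avg_sil) <- ks

# Keep the k with the highest average silhouette width
best_k <- ks[which.max(avg_sil)]
round(avg_sil, 2)
best_k
```

Because the silhouette is undefined for a single cluster, the search starts at k = 2, which matches the range the function tests.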
## Cluster Profiles
The function computes cluster profiles (centroid values) for each family, allowing interpretation of cluster characteristics (e.g., "high production, low biodiversity" cluster).
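A per-cluster mean table of this kind can be computed with base R's `aggregate()`. The sketch below uses a small hypothetical data frame; the exact layout of the `cluster_profile` attribute returned by the function may differ.

```r
# Hypothetical parcels with two family scores and an existing cluster column
parcels <- data.frame(
  family_P = c(80, 85, 20, 25, 50),
  family_B = c(20, 15, 75, 80, 50),
  cluster  = c(1L, 1L, 2L, 2L, 3L)
)

# Mean of each family column within each cluster (one row per cluster)
profile <- aggregate(
  cbind(family_P, family_B) ~ cluster,
  data = parcels,
  FUN = mean
)
profile
```

Here cluster 1 would read as "high production, low biodiversity" and cluster 2 as the reverse, which is the kind of interpretation the profiles are meant to support.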
Examples
if (FALSE) { # \dontrun{
# Load demo dataset
data("massif_demo_units")
# Cluster parcels into 3 groups based on 4 families
result <- cluster_parcels(
  massif_demo_units,
  families = c("family_C", "family_B", "family_P", "family_S"),
  k = 3,
  method = "kmeans"
)
# View cluster assignments
table(result$cluster)
# View cluster profiles
attr(result, "cluster_profile")
# Auto-determine optimal k
result_auto <- cluster_parcels(
  massif_demo_units,
  families = c("family_C", "family_B", "family_P", "family_S"),
  k = NULL
)
attr(result_auto, "optimal_k")
attr(result_auto, "silhouette_scores")
# Use hierarchical clustering
result_hclust <- cluster_parcels(
  massif_demo_units,
  families = c("family_C", "family_B", "family_P", "family_S"),
  k = 3,
  method = "hierarchical"
)
# Visualize clusters spatially
library(ggplot2)
ggplot(result) +
  geom_sf(aes(fill = factor(cluster))) +
  scale_fill_viridis_d() +
  labs(title = "Parcel Clusters", fill = "Cluster")
} # }