Welcome to Toolkit4PED

Archetype representation and clustering for building stock


About PREDICT

PREDICT (Predictive Renovation Data Intelligence Clustering Toolchain) is a project focused on enabling intelligent renovation pathways across Europe. It aims to leverage clustering algorithms and data analytics to uncover hidden patterns in building stock performance and renovation potential.

Toolkit4PED is one of PREDICT’s core innovations, designed to cluster and visualise building data, detect outliers, and inform renovation and investment decisions with explainable AI models.

What is Toolkit4PED?

Toolkit4PED is a modular web-based platform that empowers stakeholders—energy agencies, local authorities, building owners, and analysts—to:

Cluster buildings using unsupervised algorithms (e.g. K-Means, DBSCAN, agglomerative hierarchical clustering)
Evaluate renovation groupings with visual and statistical metrics
Detect anomalies in building datasets (outliers)
Assess renovation opportunities based on Smart Readiness, Energy Performance Certificate (EPC) ratings, Energy Use Intensity (EUI), and more
Filter, visualise, and export results to support investment planning
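
As an illustration of the anomaly-detection step listed above, the sketch below flags unusual EUI values with the interquartile-range (IQR) rule. The source does not say which outlier method Toolkit4PED applies, so the IQR rule, the function name iqr_outliers, and the sample values are all assumptions made for this example.

```python
import statistics

def iqr_outliers(values, factor=1.5):
    """Return values lying more than `factor` IQRs outside the quartiles.

    Hypothetical helper: the IQR rule is one common choice, not
    necessarily the method used by Toolkit4PED.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    spread = q3 - q1
    low, high = q1 - factor * spread, q3 + factor * spread
    return [v for v in values if v < low or v > high]

# Made-up EUI readings in kWh/m2/year; one building is far above the rest.
eui = [95, 110, 120, 105, 130, 115, 480]
outliers = iqr_outliers(eui)
print(outliers)
```

A rule like this works on one feature at a time; density-based clustering such as DBSCAN can instead flag buildings that are unusual across several features at once.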

Toolkit4PED supports the creation of representative building archetypes, which underpin Renovation Wave strategies and European Green Deal compliance.

Work Package Structure

WP1: Data Harmonisation & Preprocessing
WP2: Clustering Engine Design (clustering core)
WP3: Evaluation Metrics & Visualisation
WP4: Integration with PREDICT Dashboard
WP5: Validation with Pilot Datasets
WP6: Replication & Exploitation Plan

Clustering Algorithms & Evaluation Metrics

Toolkit4PED uses unsupervised learning methods to group buildings into renovation-relevant clusters. Supported algorithms include:

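K-Means: Partitions buildings into k clusters by assigning each one to its nearest centroid; k must be chosen in advance
DBSCAN: Density-based clustering that grows clusters around core points in dense regions and labels isolated points as noise
Agglomerative: Bottom-up hierarchical clustering that starts with each building as its own cluster and repeatedly merges the two closest clusters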
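
To make the centroid-based idea concrete, here is a minimal, dependency-free sketch of K-Means (Lloyd's algorithm) on a handful of made-up buildings. The feature choice (EUI and construction year), the min-max scaling step, and the helper names are assumptions for illustration only and do not describe Toolkit4PED's internal implementation. In practice a library such as scikit-learn, which provides KMeans, DBSCAN, and AgglomerativeClustering, would typically be used instead.

```python
import math
import random

# Hypothetical buildings: (EUI in kWh/m2/year, construction year).
buildings = [
    (250, 1950), (240, 1955), (260, 1948),  # older, energy-intensive
    (80, 2010), (70, 2015), (90, 2008),     # newer, efficient
]

def min_max_scale(rows):
    """Rescale every feature to [0, 1] so no single unit dominates distances."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def kmeans(points, k, max_iterations=100, seed=0):
    """Alternate nearest-centroid assignment and centroid updates until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no assignment changed, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

labels, centroids = kmeans(min_max_scale(buildings), k=2)
print(labels)  # older and newer buildings should fall into different clusters
```

Scaling matters here because EUI and construction year are measured in different units; without it, whichever feature has the larger numeric range would dominate the distance calculation.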

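Clusters are evaluated using three validation scores, with the Elbow Method as an aid for choosing the number of clusters:

Silhouette Score: Compares how close each building is to its own cluster with how close it is to the nearest other cluster; ranges from -1 to 1, higher is better (0.5–1 is good)
Davies-Bouldin Index: Average similarity between each cluster and its most similar neighbour; the minimum is 0 and lower is better (< 0.5 is good)
Calinski-Harabasz Index: Ratio of between-cluster to within-cluster dispersion; higher is better. The index has no fixed scale and grows with dataset size, so a threshold such as > 2000 is only indicative and values should be compared on the same dataset
Elbow Method: Plots the within-cluster sum of squares (WCSS) against the number of clusters and picks the point where adding clusters stops giving a large reduction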
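
The sketch below shows how two of these quantities can be computed by hand: the mean silhouette coefficient and the WCSS used by the Elbow Method. The helper names (mean_silhouette, wcss) and the toy points are hypothetical, and the code illustrates the standard definitions rather than Toolkit4PED's own implementation. Library equivalents exist in scikit-learn (silhouette_score, davies_bouldin_score, calinski_harabasz_score).

```python
import math

def group_by_label(points, labels):
    """Collect points into a dict keyed by cluster label."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters

def wcss(points, labels):
    """Within-cluster sum of squares: squared distance of each point to its cluster mean."""
    total = 0.0
    for members in group_by_label(points, labels).values():
        centroid = [sum(v) / len(members) for v in zip(*members)]
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total

def mean_silhouette(points, labels):
    """Mean silhouette coefficient in [-1, 1]; needs at least two clusters."""
    clusters = group_by_label(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy data: two tight, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]   # labels that match the visible groups
bad = [0, 1, 0, 1, 0, 1]    # labels that mix the groups

print(mean_silhouette(points, good), mean_silhouette(points, bad))
# Elbow-style comparison: WCSS with one cluster versus two.
print(wcss(points, [0] * len(points)), wcss(points, good))
```

Running the same comparison over a range of cluster counts, and plotting WCSS against that count, gives the elbow curve described above.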

Visual panels and metrics guide users in selecting the best algorithm and filtering strategy.