Welcome to Toolkit4PED

Archetype representation and clustering for building stock

About PREDICT

PREDICT (Predictive Renovation Data Intelligence Clustering Toolchain) is a project focused on enabling intelligent renovation pathways across Europe. It aims to leverage clustering algorithms and data analytics to uncover hidden patterns in building stock performance and renovation potential.

Toolkit4PED is one of PREDICT’s core innovations, designed to cluster and visualise building data, detect outliers, and inform renovation and investment decisions with explainable AI models.

What is Toolkit4PED?

Toolkit4PED is a modular web-based platform that empowers stakeholders—energy agencies, local authorities, building owners, and analysts—to:

  • Cluster buildings using intelligent algorithms (e.g. K-Means, DBSCAN, Agglomerative)
  • Evaluate renovation groupings with visual and statistical metrics
  • Detect anomalies in building datasets (outliers)
  • Assess renovation opportunities based on Smart Readiness, EPC, EUI, and more
  • Filter, visualise, and export results to support investment planning

Toolkit4PED supports the creation of representative archetypes, which are critical for Renovation Wave strategies and EU Green Deal compliance.

Work Package Structure

  1. WP1: Data Harmonisation & Preprocessing
  2. WP2: Clustering Engine Design (clustering core)
  3. WP3: Evaluation Metrics & Visualisation
  4. WP4: Integration with PREDICT Dashboard
  5. WP5: Validation with Pilot Datasets
  6. WP6: Replication & Exploitation Plan

Clustering Algorithms & Evaluation Metrics

Toolkit4PED uses unsupervised learning methods to cluster buildings into renovation-relevant groups. Supported algorithms include:

  • K-Means: Partitions buildings by assigning each one to its nearest cluster centroid
  • DBSCAN: Density-based clustering that separates dense core points from noise
  • Agglomerative: Hierarchical clustering that repeatedly merges the closest groups
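
As a rough illustration of how the three algorithms differ in practice, the sketch below runs each of them on the same data. It assumes scikit-learn as the backend, which this page does not state; the synthetic blobs stand in for building features, and the parameter values (`n_clusters=3`, `eps=0.3`, `min_samples=5`) are illustrative choices, not Toolkit4PED defaults.

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for building features (e.g. EUI, floor area); not real data.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.5,
    random_state=0,
)

# Scale features so that distance-based algorithms treat them equally.
X_scaled = StandardScaler().fit_transform(X)

# Hypothetical parameter choices, for illustration only.
models = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "dbscan": DBSCAN(eps=0.3, min_samples=5),
    "agglomerative": AgglomerativeClustering(n_clusters=3, linkage="ward"),
}

# fit_predict returns one cluster label per row; DBSCAN uses -1 for noise.
labels = {name: model.fit_predict(X_scaled) for name, model in models.items()}

for name, lab in labels.items():
    n_clusters = len(set(lab.tolist()) - {-1})
    n_noise = int(np.sum(lab == -1))
    print(f"{name}: {n_clusters} clusters, {n_noise} noise points")
```

K-Means and Agglomerative need the number of clusters up front, while DBSCAN infers it from density and can label some buildings as noise, which is what makes it useful for outlier detection.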

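Clusters are evaluated using three internal validation scores plus the elbow heuristic:

  • Silhouette Score: Measures how well each building fits its own cluster compared with the nearest other cluster; ranges from -1 to 1 (0.5–1 is good)
  • Davies-Bouldin Index: Ratio of within-cluster scatter to between-cluster separation; lower is better, with 0 as the minimum (< 0.5 is good)
  • Calinski-Harabasz Index: Ratio of between-cluster to within-cluster dispersion; higher is better (> 2000 is strong, though values scale with dataset size and are best compared across runs on the same data)
  • Elbow Method: Detects the optimal number of clusters from the point where the within-cluster sum of squares (WCSS) stops decreasing sharply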
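
The metrics above can be computed side by side for a range of cluster counts. The following is a minimal sketch, again assuming scikit-learn (not confirmed by this page) and synthetic data; the `evaluate` helper is hypothetical and exists only for this example. WCSS is read from K-Means' `inertia_` attribute.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic stand-in for a building dataset with three natural groups.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.5,
    random_state=0,
)


def evaluate(X, k):
    """Fit K-Means with k clusters and return the validation metrics."""
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    return {
        "k": k,
        "silhouette": silhouette_score(X, model.labels_),
        "davies_bouldin": davies_bouldin_score(X, model.labels_),
        "calinski_harabasz": calinski_harabasz_score(X, model.labels_),
        "wcss": model.inertia_,  # within-cluster sum of squares
    }


results = [evaluate(X, k) for k in range(2, 7)]

# Pick the k with the highest silhouette score as one possible selection rule.
best = max(results, key=lambda r: r["silhouette"])

for r in results:
    print(
        f"k={r['k']}  silhouette={r['silhouette']:.3f}  "
        f"DB={r['davies_bouldin']:.3f}  CH={r['calinski_harabasz']:.1f}  "
        f"WCSS={r['wcss']:.1f}"
    )
print("selected k:", best["k"])
```

Plotting `wcss` against `k` gives the elbow curve; the other three scores are most informative when compared across candidate clusterings of the same dataset rather than read as absolute thresholds.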

Visual panels and metrics guide users in selecting the best algorithm and filtering strategy.

Ready to Start?