Zero-Shot Learning with Vision-Language Models for Estimating Building Energy Efficiency from Street View Images
Accepted version
Peer-reviewed
Authors
Abstract
The built environment contributes significantly to global energy use and CO2 emissions, yet large-scale energy efficiency evaluations are often hindered by cost and data inconsistencies. This study applies zero-shot learning with Vision-Language Models (VLMs) to classify building energy efficiency from street view images. The method combines image processing with natural language understanding to classify buildings into UK Energy Performance Certificate (EPC) grades from visual inputs alone. By leveraging the pre-trained knowledge in VLMs, the framework infers energy efficiency levels from descriptive attributes without training on labeled examples. A performance comparison against traditional supervised learning models shows that VLMs can effectively categorize buildings into energy efficiency classes. The results highlight the potential of VLMs as a scalable tool for building energy assessment that can inform renovation planning and sustainable urban development.
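The zero-shot step described in the abstract can be illustrated with a minimal sketch. It assumes a CLIP-style dual encoder, where an image embedding is compared against one text-prompt embedding per EPC grade; the abstract does not name the model, the prompts, or the scoring rule, so the prompt template, the temperature value, and all function names below are hypothetical. Toy vectors stand in for real encoder outputs.

```python
import math

# UK EPC grades run from A (most efficient) to G (least efficient).
EPC_GRADES = ["A", "B", "C", "D", "E", "F", "G"]


def build_prompts(grades=EPC_GRADES):
    """Return one text prompt per grade (hypothetical template, not from the paper)."""
    return [f"a street view photo of a building with EPC energy rating {g}" for g in grades]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores, temperature=0.01):
    """Turn similarity scores into probabilities (temperature is an assumed value)."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_emb, text_embs, labels=EPC_GRADES):
    """Pick the label whose prompt embedding is most similar to the image embedding."""
    sims = [cosine(image_emb, t) for t in text_embs]
    probs = softmax(sims)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Toy stand-ins for encoder outputs: one-hot "text" embeddings, one per grade,
# and an "image" embedding that lies closest to the fourth prompt.
toy_text_embs = [[1.0 if i == j else 0.0 for j in range(7)] for i in range(7)]
toy_image_emb = [0.1, 0.1, 0.2, 0.9, 0.2, 0.1, 0.0]

grade, probs = zero_shot_classify(toy_image_emb, toy_text_embs)
```

In a real pipeline the toy vectors would be replaced by embeddings from the chosen VLM's image and text encoders; no labeled building images are needed, because the grade labels enter only through the text prompts.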

