Zero-Shot Learning with Vision-Language Models for Estimating Building Energy Efficiency from Street View Images

Accepted version
Peer-reviewed

Abstract

The built environment contributes significantly to global energy use and CO2 emissions, with large-scale energy efficiency evaluations often hindered by cost and data inconsistencies. This study employs zero-shot learning with Vision-Language Models (VLMs) to classify building energy efficiency from images. Our methodology integrates advanced image processing with natural language understanding, enabling buildings to be classified into UK Energy Performance Certificate (EPC) grades based solely on visual inputs. By leveraging pre-trained knowledge in VLMs, the framework identifies energy efficiency levels using descriptive attributes without requiring prior training on labeled examples. Performance evaluation against traditional supervised learning models demonstrates that VLMs can effectively categorize buildings into energy efficiency classes. Results highlight VLMs' potential as a scalable tool for building energy assessment to inform renovation planning and sustainable urban development.
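
As a rough illustration of the zero-shot idea described in the abstract, the sketch below scores one image embedding against one text-prompt embedding per EPC grade and picks the most similar grade. It is not the authors' implementation: the function `zero_shot_classify`, the `temperature` parameter, and the toy embeddings are all hypothetical, and a real pipeline would obtain the embeddings from a pre-trained VLM rather than from hand-written vectors.

```python
import math

# UK EPC grades, from most efficient (A) to least efficient (G).
EPC_GRADES = ["A", "B", "C", "D", "E", "F", "G"]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_embedding, prompt_embeddings, temperature=0.01):
    """Return (best_label, probabilities) for one image.

    prompt_embeddings maps each label to the embedding of a text prompt
    describing that class. No labeled training images are involved.
    The temperature value is an assumption chosen for illustration.
    """
    scores = {
        label: cosine(image_embedding, emb)
        for label, emb in prompt_embeddings.items()
    }
    # Softmax over temperature-scaled similarities (shifted for stability).
    top = max(scores.values())
    exps = {
        label: math.exp((score - top) / temperature)
        for label, score in scores.items()
    }
    total = sum(exps.values())
    probs = {label: value / total for label, value in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Toy stand-ins for VLM outputs (hypothetical): one-hot "prompt" vectors
# and an "image" vector that lies closest to the grade C direction.
toy_prompts = {
    grade: [1.0 if i == j else 0.0 for j in range(len(EPC_GRADES))]
    for i, grade in enumerate(EPC_GRADES)
}
toy_image = [0.05, 0.10, 0.80, 0.20, 0.05, 0.0, 0.0]

best_grade, grade_probs = zero_shot_classify(toy_image, toy_prompts)
print(best_grade, round(grade_probs[best_grade], 3))
```

In practice the prompt for each grade would be a natural-language description of the visual attributes associated with that grade, which is where the pre-trained knowledge of the VLM enters.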

Conference Name

ASCE International Conference on Computing in Civil Engineering (i3CE 2025)

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution 4.0 International

Sponsorship

European Commission Horizon 2020 (H2020) Marie Skłodowska-Curie actions (101034337)
European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 101034337