Vision Language Model-Driven Multimodal Occlusion Analysis (VLM-MOA) Method for Construction Progress Monitoring

Accepted version
Peer-reviewed

Abstract

Accurate and timely construction progress updates are vital for effective project management. Drone-based data collection offers a cost-effective, efficient method for generating point cloud scans to visualize and track progress. However, limitations in drone flight data collection, which captures only surface-level scenes, can lead to undetected constructed elements during long intervals between flights, resulting in inaccurate progress assessments. This study introduces an innovative approach that integrates multimodal information with vision language models (VLMs) to infer construction progress. A case study in Poland utilized periodic drone-generated point cloud scans to monitor a construction project. The proposed method employs point proximity metrics and occlusion analysis, leveraging VLM-driven spatial and logical reasoning to address data gaps and infer occluded progress. Results validate the efficacy of this multimodal VLM-driven approach. Future work will focus on completing the overall workflow.

Journal Title

Computing in Civil Engineering 2025

Conference Name

Computing in Civil Engineering 2025

Publisher

American Society of Civil Engineers (ASCE)

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution 4.0 International

Sponsorship

Engineering and Physical Sciences Research Council (EP/S02302X/1)
Engineering and Physical Sciences Research Council (2728220)