Vision Language Model-driven Multimodal Occlusion Analysis (VLM-MOA) Method for Construction Progress Monitoring
Accepted version
Peer-reviewed
Abstract
Accurate and timely construction progress updates are vital for effective project management. Drone-based data collection offers a cost-effective, efficient method for generating point cloud scans to visualize and track progress. However, drone flights capture only surface-visible scenes, so elements that are constructed and subsequently covered during long intervals between flights can go undetected, resulting in inaccurate progress assessments. This study introduces an approach that integrates multimodal information with Vision Language Models (VLMs) to infer construction progress. In a case study in Poland, periodic drone-generated point cloud scans were used to monitor a construction project. The proposed method employs point proximity metrics and occlusion analysis, leveraging VLM-driven spatial and logical reasoning to address data gaps and infer the progress of occluded elements. Results validate the efficacy of this multimodal VLM-driven approach. Future work will focus on completing the overall workflow.
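To illustrate the geometric ideas the abstract mentions, the sketch below shows one plausible form of a point proximity metric combined with a simple line-of-sight occlusion test. This is an illustrative assumption, not the paper's actual pipeline: the function names (`element_status`, `nearest_distance`), the 50% coverage thresholds, and the brute-force search are all hypothetical choices, and the VLM reasoning stage that interprets occluded elements is not represented here.

```python
import math

def nearest_distance(point, cloud):
    # Brute-force nearest-neighbour distance (illustrative only;
    # a real implementation would use a spatial index such as a k-d tree).
    return min(math.dist(point, q) for q in cloud)

def point_to_segment(q, a, b):
    # Distance from point q to the line segment a-b (the sensor-to-element ray).
    ab = [b[i] - a[i] for i in range(3)]
    aq = [q[i] - a[i] for i in range(3)]
    denom = sum(c * c for c in ab)
    t = max(0.0, min(1.0, sum(aq[i] * ab[i] for i in range(3)) / denom)) if denom else 0.0
    proj = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(q, proj)

def element_status(element_points, scan, sensor, threshold=0.1):
    """Classify a design element against a drone scan (hypothetical logic).

    'detected' - enough scan points lie near the element surface
    'occluded' - scan points block the sensor's line of sight to the element
    'missing'  - neither condition holds (element plausibly not built yet)
    """
    # Point proximity metric: fraction of element points matched by the scan.
    near = sum(1 for p in element_points if nearest_distance(p, scan) <= threshold)
    if near / len(element_points) >= 0.5:
        return "detected"
    # Occlusion analysis: is there a scan point on the sensor-element ray,
    # closer to the sensor than the element itself?
    blocked = 0
    for p in element_points:
        d_elem = math.dist(sensor, p)
        for q in scan:
            if math.dist(sensor, q) < d_elem and point_to_segment(q, sensor, p) <= threshold:
                blocked += 1
                break
    if blocked / len(element_points) >= 0.5:
        return "occluded"
    return "missing"
```

An element flagged "occluded" rather than "missing" is exactly the case where, per the abstract, VLM-driven spatial and logical reasoning would be invoked to infer whether hidden progress has occurred.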

