Vision Language Model-driven Multimodal Occlusion Analysis (VLM-MOA) Method for Construction Progress Monitoring
Accepted version
Peer-reviewed
Abstract
Accurate and timely construction progress updates are vital for effective project management. Drone-based data collection offers a cost-effective, efficient method for generating point cloud scans to visualize and track progress. However, drone flights capture only surface-visible scenes, so elements that are constructed and subsequently covered during long intervals between flights can go undetected, resulting in inaccurate progress assessments. This study introduces an approach that integrates multimodal information with Vision Language Models (VLMs) to infer construction progress. In a case study in Poland, periodic drone-generated point cloud scans were used to monitor a construction project. The proposed method employs point proximity metrics and occlusion analysis, leveraging VLM-driven spatial and logical reasoning to address data gaps and infer the progress of occluded elements. Results validate the efficacy of this multimodal VLM-driven approach. Future work will focus on completing the overall workflow.
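To illustrate the geometric ideas the abstract mentions, the sketch below shows one plausible form of a point proximity metric combined with a simple line-of-sight occlusion test. This is an illustrative assumption, not the paper's actual pipeline: the function names (`element_status`, `nearest_distance`), the 50% coverage thresholds, and the brute-force search are all hypothetical choices, and the VLM reasoning stage that interprets occluded elements is not represented here.

```python
import math

def nearest_distance(point, cloud):
    # Brute-force nearest-neighbour distance (illustrative only;
    # a real implementation would use a spatial index such as a k-d tree).
    return min(math.dist(point, q) for q in cloud)

def point_to_segment(q, a, b):
    # Distance from point q to the line segment a-b (the sensor-to-element ray).
    ab = [b[i] - a[i] for i in range(3)]
    aq = [q[i] - a[i] for i in range(3)]
    denom = sum(c * c for c in ab)
    t = max(0.0, min(1.0, sum(aq[i] * ab[i] for i in range(3)) / denom)) if denom else 0.0
    proj = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(q, proj)

def element_status(element_points, scan, sensor, threshold=0.1):
    """Classify a design element against a drone scan (hypothetical logic).

    'detected' - enough scan points lie near the element surface
    'occluded' - scan points block the sensor's line of sight to the element
    'missing'  - neither condition holds (element plausibly not built yet)
    """
    # Point proximity metric: fraction of element points matched by the scan.
    near = sum(1 for p in element_points if nearest_distance(p, scan) <= threshold)
    if near / len(element_points) >= 0.5:
        return "detected"
    # Occlusion analysis: is there a scan point on the sensor-element ray,
    # closer to the sensor than the element itself?
    blocked = 0
    for p in element_points:
        d_elem = math.dist(sensor, p)
        for q in scan:
            if math.dist(sensor, q) < d_elem and point_to_segment(q, sensor, p) <= threshold:
                blocked += 1
                break
    if blocked / len(element_points) >= 0.5:
        return "occluded"
    return "missing"
```

An element flagged "occluded" rather than "missing" is exactly the case where, per the abstract, VLM-driven spatial and logical reasoning would be invoked to infer whether hidden progress has occurred.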

