Chain-of-Thought-based Knowledge Extraction from Heterogeneous Infrastructure Database for Integrated Transportation Asset Management
Accepted version
Peer-reviewed
Abstract
The fragmentation of infrastructure information systems has long been an obstacle to integrated transportation asset management (TAM). This paper presents a novel method for automatic knowledge extraction and ontology modelling from heterogeneous TAM databases using large language models (LLMs). The method adopts a Chain-of-Thought framework that decomposes the complex ontology modelling process into atomic tasks, each of which draws on the semantic understanding and reasoning capabilities of LLMs. The resulting class entities, class hierarchies, and relations form an ontology model that supports semantic interoperability across diverse TAM systems. The method was evaluated on four sets of TAM database schemas from UK road agencies. Compared with the standard ontology, the overall recall for entity generation reaches 89.5%, and the accuracies of entity classification and relation classification are 82.1% and 75.6%, respectively. These results demonstrate the effectiveness of the proposed LLM-based approach in addressing data fragmentation in transportation information systems.
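The evaluation figures above follow the usual definitions: recall is the share of entities in the standard ontology that the generated ontology recovers, and classification accuracy is the share of items assigned the correct parent class or relation type. The sketch below shows that arithmetic only. The abstract does not say how generated and reference entities are matched, so the matching rule here (`normalise`, a case- and underscore-insensitive name comparison) is an assumption. All function names and the toy asset names are hypothetical and are not taken from the UK datasets.

```python
def normalise(name: str) -> str:
    """Hypothetical matching rule: compare names case-insensitively,
    ignoring underscores and spaces (the paper's actual criterion may differ)."""
    return name.lower().replace("_", "").replace(" ", "")


def entity_recall(generated: list[str], reference: list[str]) -> float:
    """Fraction of reference-ontology entities recovered by the generated entities."""
    gen = {normalise(n) for n in generated}
    ref = {normalise(n) for n in reference}
    if not ref:
        raise ValueError("reference ontology has no entities")
    return len(gen & ref) / len(ref)


def classification_accuracy(predicted: dict[str, str], gold: dict[str, str]) -> float:
    """Share of gold items whose predicted label (parent class or relation type)
    matches the gold label; items missing from `predicted` count as wrong."""
    if not gold:
        raise ValueError("no gold labels")
    correct = sum(1 for item, label in gold.items() if predicted.get(item) == label)
    return correct / len(gold)


if __name__ == "__main__":
    # Toy, hypothetical entity names for illustration only.
    reference = ["Road", "Bridge", "Culvert", "Pavement Defect"]
    generated = ["road", "bridge", "pavement_defect", "Drainage"]
    print(entity_recall(generated, reference))  # 3 of 4 reference entities found -> 0.75

    gold = {"Bridge": "Structure", "Culvert": "Structure", "Road": "Network"}
    predicted = {"Bridge": "Structure", "Culvert": "Drainage", "Road": "Network"}
    print(round(classification_accuracy(predicted, gold), 3))  # 2 of 3 correct -> 0.667
```

Note that extra generated entities (here `Drainage`) do not lower recall. A precision figure would be needed to penalise them, and the abstract does not report one.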

