Towards automatically generating supply chain maps from natural language text
Supply chains have become increasingly global, complex, and multi-tiered. Consequently, companies have been gradually losing visibility of their supply network topology, that is, the structure formed by supply chain participants and their inter-dependencies across multiple tiers. This is a problem for firms because information about their extended supply chains is a valuable input when making supply chain decisions, such as how to improve their efficiency, resilience, or sustainability. Supply chain mapping, whereby information about the supply network structure is collected and visualised, is often cited in the literature as key to addressing the problem. Yet the challenge of acquiring the necessary information remains largely unaddressed. This thesis aims to tackle this challenge by presenting an approach for automatically generating basic supply chain maps from unstructured, natural language text using machine learning methods. The focus of this research is on automatically extracting individual buyer-supplier (“who supplies whom”) relations, a prerequisite for automating the creation of supply chain maps from text. Such text might be sourced from openly available documents, such as news articles obtained from the Web, or alternatively from privately acquired documents. This thesis focusses mainly on the former, although the results provided apply equally to the latter. A classifier for buyer-supplier relations was obtained in two steps. Firstly, a reference dataset (“corpus”) was created by having human annotators assign a “label” to each pair of organisational named entities in each sentence of the dataset. Each label indicates the type of relation between the two organisations as expressed in that sentence. Secondly, a classifier was designed and trained on the dataset. A selection of different classifier architectures was tested and compared against each other.
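As an illustration of the annotation unit described above, the following minimal sketch enumerates every pair of organisational named entities in a sentence, so that each pair can receive one label. It is not the thesis's implementation: the names `CandidatePair` and `candidate_pairs`, the placeholder company names, and the assumption that organisation mentions have already been identified by a separate named entity recogniser are all hypothetical.

```python
from itertools import combinations
from typing import List, NamedTuple


class CandidatePair(NamedTuple):
    """One unit to be labelled: a sentence plus two organisations mentioned in it."""
    sentence: str
    org_a: str
    org_b: str


def candidate_pairs(sentence: str, organisations: List[str]) -> List[CandidatePair]:
    """Return one candidate per unordered pair of distinct organisations.

    Assumes `organisations` was produced by a separate named entity
    recognition step (not shown here).
    """
    # Remove repeated mentions while keeping first-occurrence order.
    unique = list(dict.fromkeys(organisations))
    return [CandidatePair(sentence, a, b) for a, b in combinations(unique, 2)]


# Placeholder sentence and company names, for illustration only.
sentence = "Acme will supply engines to Globex and Initech."
pairs = candidate_pairs(sentence, ["Acme", "Globex", "Initech"])
for pair in pairs:
    print(pair.org_a, "/", pair.org_b)
```

A sentence mentioning n distinct organisations yields n(n-1)/2 such pairs, each of which would be labelled by the annotators or, later, by the trained classifier.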
A further part of this research extends the scope from extracting individual buyer-supplier relations to the end-to-end processing pipeline in which a collection of text documents is converted into a basic supply chain map. The end-to-end approach, including a pre-trained classifier, was validated on a large, unlabelled and previously unseen real-world dataset. The approach proposed in this study should be understood as a first step towards the vision of automating supply chain mapping from text. It is not yet an equivalent substitute for manual research. It could, however, provide an initial (partial) supply chain map or help with checking existing supply chain maps for completeness. The effectiveness of any automated approach to supply chain mapping depends on obtaining sufficiently rich data to work with in the first place. For example, in an experiment using only openly available Web data, it was possible to extract 229 distinct Boeing suppliers. Assuming a total of 13,000 Boeing suppliers, this would correspond to approximately 1.8% of Boeing’s suppliers, albeit including some of the key ones. A central contribution of this work is a method to automatically extract individual buyer-supplier relations from text. Using the proposed method, questions regarding the achievable level of agreement among the human annotators (“inter-annotator agreement”), the classification performance on the reference dataset, as well as the achieved performance on large unlabelled datasets could be addressed. A further contribution is the identification of challenges in developing an end-to-end approach for automating the complete supply chain mapping process as well as a conceptual framework for such an approach.
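The final aggregation step and the coverage figure quoted above can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline used in the thesis: the function names, the `(supplier, buyer, confidence)` tuple format, the `min_confidence` threshold, and the placeholder company names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical classifier output: (supplier, buyer, confidence score).
Relation = Tuple[str, str, float]


def build_supply_map(relations: Iterable[Relation],
                     min_confidence: float = 0.5) -> Dict[str, Set[str]]:
    """Aggregate extracted relations into a buyer -> set-of-suppliers mapping."""
    suppliers_of: Dict[str, Set[str]] = defaultdict(set)
    for supplier, buyer, confidence in relations:
        # Keep only sufficiently confident, non-self-referential relations.
        if confidence >= min_confidence and supplier != buyer:
            suppliers_of[buyer].add(supplier)
    return dict(suppliers_of)


def coverage(found: int, assumed_total: int) -> float:
    """Fraction of an assumed supplier base that was recovered."""
    return found / assumed_total


# Placeholder relations, for illustration only.
relations = [
    ("SupplierA", "BuyerX", 0.9),
    ("SupplierB", "BuyerX", 0.7),
    ("SupplierA", "BuyerX", 0.8),  # repeated mention collapses into one edge
    ("SupplierC", "BuyerX", 0.2),  # below threshold, dropped
]
supply_map = build_supply_map(relations)
print(sorted(supply_map["BuyerX"]))

# Figures from the text: 229 extracted suppliers, 13,000 assumed in total.
print(f"{coverage(229, 13000):.1%}")
```

Storing suppliers in a set means that repeated mentions of the same relation across documents count once, which matches the notion of "distinct" suppliers used in the text.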