Source-code queries with graph databases - With application to programming language usage and evolution

Urma, RG; Mycroft, A

Accepted version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/288434

Repository DOI

https://doi.org/10.17863/CAM.21647

Type

Article

Authors

Urma, RG

Mycroft, Alan

https://orcid.org/0000-0001-7013-8572

Abstract

Program querying and analysis tools are of growing importance, and occur in two main variants. Firstly, there are source-code query languages, which help software engineers to explore a system or to find code in need of refactoring as coding standards evolve. These also enable language designers to understand the practical uses of language features and idioms over a software corpus. Secondly, there are program analysis tools in the style of Coverity, which perform deeper program analysis, searching for bugs as well as checking adherence to coding standards such as MISRA. The former class are typically implemented on top of relational or deductive databases and make ad-hoc trade-offs between scalability and the amount of source-code detail held, with consequent limitations on the expressiveness of queries. The latter class are more commercially driven and involve more ad-hoc queries over program representations; nonetheless, similar pressures encourage user-visible domain-specific languages (DSLs) for specifying analyses. We argue that a graph data model and associated query language provides a unifying conceptual model and gives an efficient, scalable implementation even when storing full source-code detail. It also supports overlays, allowing a query DSL to pose queries at a mixture of syntax-tree, type, control-flow-graph or data-flow levels. We describe a prototype source-code query system built on top of Neo4j using its Cypher graph query language; experiments show that it scales to multi-million-line programs while also storing full source-code detail.
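To make the idea of a graph model with overlays concrete, here is a minimal in-memory sketch in Python. It does not use Neo4j and does not reproduce the authors' schema: the node kinds, the edge labels `AST_CHILD`, `HAS_TYPE` and `CFG_NEXT`, the `role` edge property, and the helper `calls_on_type` are all hypothetical. Syntax-tree nodes and type nodes live in one graph, and each overlay is just another edge label, so a single query can hop from the syntax-tree level to the type level. A roughly analogous Cypher pattern over the same hypothetical schema is shown in the docstring.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical schema (not the paper's): node kinds mirror syntax-tree node
# types, and every edge carries an overlay label (AST_CHILD, HAS_TYPE,
# CFG_NEXT) so one query can mix syntax-tree, type and control-flow levels.

@dataclass
class Node:
    nid: int
    kind: str                                  # e.g. "MethodInvocation", "Type"
    props: Dict[str, str] = field(default_factory=dict)

@dataclass
class CodeGraph:
    nodes: Dict[int, Node] = field(default_factory=dict)
    edges: List[Tuple[int, str, int, Dict[str, str]]] = field(default_factory=list)

    def add_node(self, nid: int, kind: str, **props: str) -> None:
        self.nodes[nid] = Node(nid, kind, props)

    def add_edge(self, src: int, label: str, dst: int, **props: str) -> None:
        self.edges.append((src, label, dst, props))

    def out(self, src: int, label: str, **match: str) -> List[Node]:
        """Targets of `label` edges leaving `src` whose edge properties match `match`."""
        return [self.nodes[d] for s, l, d, p in self.edges
                if s == src and l == label
                and all(p.get(k) == v for k, v in match.items())]

def calls_on_type(g: CodeGraph, method: str, type_name: str) -> List[int]:
    """Ids of invocations of `method` whose receiver has static type `type_name`.

    Roughly analogous Cypher over the same hypothetical schema:
      MATCH (m:MethodInvocation {name: $method})
            -[:AST_CHILD {role: 'receiver'}]->(r)-[:HAS_TYPE]->(t:Type {name: $type})
      RETURN m
    """
    hits = []
    for n in g.nodes.values():
        if n.kind != "MethodInvocation" or n.props.get("name") != method:
            continue
        for recv in g.out(n.nid, "AST_CHILD", role="receiver"):  # syntax-tree level
            for t in g.out(recv.nid, "HAS_TYPE"):                 # type overlay
                if t.kind == "Type" and t.props.get("name") == type_name:
                    hits.append(n.nid)
    return hits

# Graph for two Java statements: xs.add(1); ys.add(2);
g = CodeGraph()
g.add_node(1, "MethodInvocation", name="add")
g.add_node(2, "Identifier", name="xs")
g.add_node(3, "MethodInvocation", name="add")
g.add_node(4, "Identifier", name="ys")
g.add_node(10, "Type", name="java.util.List")
g.add_node(11, "Type", name="java.util.Set")
g.add_edge(1, "AST_CHILD", 2, role="receiver")
g.add_edge(3, "AST_CHILD", 4, role="receiver")
g.add_edge(2, "HAS_TYPE", 10)
g.add_edge(4, "HAS_TYPE", 11)
g.add_edge(1, "CFG_NEXT", 3)  # control-flow overlay: statement order

print(calls_on_type(g, "add", "java.util.List"))  # → [1]
```

Because overlays are only extra edge labels on the same nodes, adding a data-flow level would mean adding another label rather than a new table or store; a real system would of course delegate indexing and pattern matching to the graph database.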

Journal Title

Science of Computer Programming

Journal ISSN

0167-6423

Volume

97

Publisher

Elsevier

Publisher DOI

https://doi.org/10.1016/j.scico.2013.11.010

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International