Fabular: regression formulas as probabilistic programming
Borgström, J.; Gordon, Andrew D; Ouyang, L.; Russo, C.; Ścibior, A.; Szymczak, M.
POPL 2016 Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
Association for Computing Machinery
Borgström, J., Gordon, A. D., Ouyang, L., Russo, C., Ścibior, A., & Szymczak, M. (2016). Fabular: regression formulas as probabilistic programming. POPL 2016 Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 271-283. https://doi.org/10.1145/2837614.2837653
Regression formulas are a domain-specific language adopted by several R packages for describing an important and useful class of statistical models: hierarchical linear regressions. Formulas are succinct, expressive, and clearly popular, so are they a useful addition to probabilistic programming languages? And what do they mean? We propose a core calculus of hierarchical linear regression, in which regression coefficients are themselves defined by nested regressions (unlike in R). We explain how our calculus captures the essence of the formula DSL found in R. We describe the design and implementation of Fabular, a version of the Tabular schema-driven probabilistic programming language, enriched with formulas based on our regression calculus. To the best of our knowledge, this is the first formal description of the core ideas of R's formula notation, the first development of a calculus of regression formulas, and the first demonstration of the benefits of composing regression formulas and latent variables in a probabilistic programming language.
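As a rough illustration of what a hierarchical regression formula denotes, the sketch below simulates data for the lme4-style R formula `y ~ 1 + x + (1 | g)`: a global intercept, a slope on `x`, and a latent intercept for each level of the grouping factor `g`. This is not Fabular or Tabular syntax, and it is not taken from the paper. The function name `simulate`, its parameters, and the default values are all hypothetical and chosen only for illustration.

```python
import random

def simulate(xs, groups, alpha=1.0, beta=2.0, sigma_u=0.5, sigma=0.1, seed=0):
    """Draw responses for the lme4-style formula  y ~ 1 + x + (1 | g).

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Group-level term (1 | g): one latent intercept offset per level of g.
    u = {g: rng.gauss(0.0, sigma_u) for g in sorted(set(groups))}
    # Observation-level model: intercept + slope * x + group offset + noise.
    ys = [alpha + beta * x + u[g] + rng.gauss(0.0, sigma)
          for x, g in zip(xs, groups)]
    return ys, u

xs = [0.0, 1.0, 2.0, 3.0]
groups = ["a", "a", "b", "b"]
ys, u = simulate(xs, groups)
```

Setting `sigma_u=0.0` and `sigma=0.0` removes both sources of randomness, so the output collapses to the plain line `alpha + beta * x`. In the calculus the abstract describes, coefficients such as the group offsets can themselves be defined by nested regressions; this sketch shows only the simplest case, an intercept-only group term.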
Bayesian inference, linear regression, probabilistic programming, relational data, hierarchical models
Adam Ścibior received travel support from the DARPA PPAML programme. Marcin Szymczak was supported by Microsoft Research through its PhD Scholarship Programme.
External DOI: https://doi.org/10.1145/2837614.2837653
This record's URL: https://www.repository.cam.ac.uk/handle/1810/253583