Kafka, Samza and the Unix Philosophy of Distributed Data
IEEE Data Engineering Bulletin
Kleppmann, M., & Kreps, J. (2015). Kafka, Samza and the Unix Philosophy of Distributed Data. IEEE Data Engineering Bulletin, 38(4), 4-14. http://sites.computer.org/debull/A15dec/issue1.htm
Apache Kafka is a scalable message broker, and Apache Samza is a stream processing framework built upon Kafka. They are widely used as infrastructure for implementing personalized online services and real-time predictive analytics. Besides providing high throughput and low latency, Kafka and Samza are designed with operational robustness and long-term maintenance of applications in mind. In this paper we explain the reasoning behind the design of Kafka and Samza, which allow complex applications to be built by composing a small number of simple primitives – replicated logs and stream operators. We draw parallels between the design of Kafka and Samza, batch processing pipelines, database architecture, and the design philosophy of Unix.
External link: http://sites.computer.org/debull/A15dec/issue1.htm
This record's URL: https://www.repository.cam.ac.uk/handle/1810/286031