Resilient payment systems Khaled Baqer St. Edmund’s College October 2018 This dissertation is submitted for the degree of Doctor of Philosophy Resilient payment systems Khaled Baqer Summary There have been decades of attempts to evolve or revolutionise the traditional financial system, but not all such efforts have been transformative or even successful. From Chaum’s proposals in the 1980s for private payment systems to micropayments, previous attempts failed to take off for a variety of reasons, including non-existing markets, or issues pertaining to usability, scalability and performance, resilience against failure, and complexity of protocols. Towards creating more resilient payment systems, we investigated issues related to security engineering in general, and payment systems in particular. We identified that network coverage, central points of failure, and attacks may cripple system performance. The premise of our research is that offline capabilities are required to produce resilience in critical systems. We focus on issues related to network problems and attacks, system resilience, and scalability by introducing the ability to process payments offline without relying on the availability of network coverage; a lack of network coverage renders some payment services unusable for their customers. Decentralising payment verification, and outsourcing some operations to users, alleviates the burden of contacting centralised systems to process every transaction. Our secondary goal is to minimise the cost of providing payment systems, so providers can cut transaction fees. Moreover, by decentralising payment verification that can be performed offline, we increase system resilience, and seamlessly maintain offline operations until a system is back online. We also use tamper-resistant hardware to tackle usability issues, by minimising cognitive overhead and helping users to correctly handle critical data, minimising the risks of data theft and tampering. We apply our research towards extending financial inclusion efforts, since the issues discussed above must be solved to extend mobile payments to the poorest demographics. More research is needed to integrate online payments, offline payments, and delay-tolerant networking. This research extends and enhances not only payment systems, but other electronically-enabled services from pay-as-you-go solar panels to agricultural subsidies and payments from aid donors. We hope that this thesis is helpful for researchers, protocol designers, and policy makers interested in creating resilient payment systems by assisting them in financial inclusion efforts. Declaration This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except as declared in the Preface and specified in the text. It is not substantially the same as any that I have submitted, or am concurrently submitting, for a degree or diploma or other qualification at the University of Cambridge or any other University or similar institution except as declared in the Preface and specified in the text. I further state that no substantial part of my dissertation has already been submitted, or is being concurrently submitted, for any such degree, diploma or other qualification at the University of Cambridge or any other University or similar institution except as declared in the Preface and specified in the text. This dissertation does not exceed the prescribed limit of 60,000 words1. Khaled Baqer October 2018 1The word count is about 42,000 (using detex). Acknowledgements I am grateful to my family for always being supportive at the most critical moments. I could not have done it without them. I thank my friends for always encouraging me to go even further and always being supportive. I thank my funders, the Crown Prince’s International Scholarship Program (Bahrain), for supporting my academic studies. I’m grateful for all the assistance and encouragement. I thank my thesis advisor, Ross Anderson, for motivating me to not only be a better computer scientist, but a better scholar as well. His advice was crucial to guide my research, and to aspire to tackle challenging problems. I’ve learned a lot from him–not only about computer science–and I’ll always be grateful for that. I thank my undergraduate advisor, Mahendran Velauthapillai at Georgetown University, for always motivating me to give my best and encouraging me to pursue my graduate studies. His advice has been invaluable, and I’m grateful for his counsel. I thank colleagues who have made my time at the Computer Laboratory better and have always been there when most needed: Christian O’Connell, Jeunese Adrienne Payne, Laurent Simon, and Alexander Vetterl. I’m really grateful for all the great discussions and support; they have enriched my experience at the Lab. Contents 1 Introduction 13 1.1 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 1.2 Publications and contributions . . . . . . . . . . . . . . . . . . . . . . . . . 16 2 Background 19 2.1 EMV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.1.1 EMV overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.1.2 Transaction processing . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.1.2.1 Card authentication . . . . . . . . . . . . . . . . . . . . . 21 2.1.2.2 Cardholder verification . . . . . . . . . . . . . . . . . . . . 22 2.1.2.3 Transaction authorisation . . . . . . . . . . . . . . . . . . 23 2.1.3 Strengths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.1.4 Weaknesses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.1.5 Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.2 Bitcoin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.2.1 Bitcoin overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.2.1.1 Global ledger and money supply . . . . . . . . . . . . . . 29 2.2.1.2 Mining and transaction processing . . . . . . . . . . . . . 30 2.2.1.3 Disruption attacks . . . . . . . . . . . . . . . . . . . . . . 31 2.2.1.4 Security economics and incentives . . . . . . . . . . . . . . 33 2.2.2 Bitcoin’s backlog: Mempool and UTXO . . . . . . . . . . . . . . . 34 2.2.3 Novel features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 2.2.4 Strengths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.2.5 Weaknesses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 2.3 Mobile payments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 2.3.1 Mobile payments overview . . . . . . . . . . . . . . . . . . . . . . . 42 2.3.1.1 Payment operations and services . . . . . . . . . . . . . . 43 2.3.1.2 Reasons for success and remaining challenges . . . . . . . 43 2.3.2 Strengths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 2.3.3 Weaknesses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 2.4 Micropayments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 2.4.1 Micropayments overview . . . . . . . . . . . . . . . . . . . . . . . . 46 2.4.2 Strengths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 2.4.3 Weaknesses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 2.5 Summary of challenges and research goals . . . . . . . . . . . . . . . . . . 48 3 The social externalities of trust 51 3.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.3 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.4 System design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.4.1 Member registration . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.4.2 A simple threat model . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.4.3 A more realistic threat model . . . . . . . . . . . . . . . . . . . . . 56 3.4.4 Payment system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.4.4.1 Member identifier . . . . . . . . . . . . . . . . . . . . . . . 57 3.4.4.2 Victim accounts . . . . . . . . . . . . . . . . . . . . . . . 57 3.4.4.3 Token creation . . . . . . . . . . . . . . . . . . . . . . . . 58 3.4.4.4 Defining time using key rotation . . . . . . . . . . . . . . 59 3.4.4.5 Validation and value of tokens . . . . . . . . . . . . . . . . 59 3.4.5 Generating trust and reputation metrics . . . . . . . . . . . . . . . 59 3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3.5.1 Mitigating collusions and malicious members . . . . . . . . . . . . . 60 3.5.2 Mitigating Sybil attacks . . . . . . . . . . . . . . . . . . . . . . . . 61 3.5.3 Metrics for security economics . . . . . . . . . . . . . . . . . . . . . 61 3.6 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 3.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 3.8 Afternote . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 4 Bitcoin stress testing 65 4.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.3 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.3.1 Transaction priority . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.3.2 DoS targets inherent in Bitcoin . . . . . . . . . . . . . . . . . . . . 67 4.4 Data collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 4.5 Spam clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.5.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 4.5.2 Results and motifs . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 4.5.3 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 4.6 Impact on Bitcoin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 4.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 4.8 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 4.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 5 Offline payment protocols 83 5.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 5.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 5.3 System model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 5.4 Design evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 5.4.1 Basic protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 5.4.2 Evolution 1: Eliminating narrow pipes . . . . . . . . . . . . . . . . 87 5.4.3 Evolution 2: Transaction chaining . . . . . . . . . . . . . . . . . . . 88 5.5 Mitigating the risk of a shared master secret . . . . . . . . . . . . . . . . . 92 5.5.1 Evolution 3: Group key scheme . . . . . . . . . . . . . . . . . . . . 92 5.5.2 Evolution 4: Delay-tolerant Needham–Schroeder . . . . . . . . . . . 94 5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 6 Usability evaluation of offline payments 97 6.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 6.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 6.3 Background and related work . . . . . . . . . . . . . . . . . . . . . . . . . 99 6.4 Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 6.4.1 Overview of DigiTally . . . . . . . . . . . . . . . . . . . . . . . . . 101 6.4.2 DigiTally codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 6.5 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 6.5.1 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 6.5.2 Evaluation materials . . . . . . . . . . . . . . . . . . . . . . . . . . 108 6.5.3 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 6.5.4 Ethics, data collection, and privacy . . . . . . . . . . . . . . . . . . 109 6.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 6.6.1 Errors and speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 6.6.1.1 Error rates . . . . . . . . . . . . . . . . . . . . . . . . . . 112 6.6.1.2 Time on task . . . . . . . . . . . . . . . . . . . . . . . . . 113 6.6.2 SUS results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 6.6.3 Responses to open-ended questions . . . . . . . . . . . . . . . . . . 114 6.6.3.1 Perceived usefulness . . . . . . . . . . . . . . . . . . . . . 114 6.6.3.2 Perceived ease-of-use . . . . . . . . . . . . . . . . . . . . . 115 6.6.3.3 Satisfaction . . . . . . . . . . . . . . . . . . . . . . . . . . 116 6.7 Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 6.7.1 Visual cues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 6.7.2 Error recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 6.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 6.9 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 6.10 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 7 Conclusion 125 Bibliography 129 Chapter 1 Introduction The modern electronic financial system started with the credit card industry in the 1960s and became profitable in the 1980s. In the 1990s, Europay-Mastercard-Visa (EMV) evolved as the main payment protocol and standard, after prolonged disputes by major stakeholders in the payment industry, with a more recent development of contactless payments. In the same period, a surge of innovations occurred, of which PayPal emerged as the winner [5]. Not all attempts to revolutionise the financial system were successful or transformative. From Chaum’s proposals in the 1980s for private payment systems to micropayments and then cryptocurrencies, previous attempts failed to garner traction due to non-existing markets (solutions looking for problems), or issues pertaining to usability, scalability and performance, resilience against failure, and complexity of protocols. These topics are the focus of this thesis: we analyse the aforementioned systems to incorporate their strengths, explore failure issues, and attempt to mitigate them. The main components of information and computer security are confidentiality, integrity, and availability (CIA), the classic categorisation of security engineering issues that must be tackled to secure and harden systems. Although many other categories are necessary for a more thorough approach, we focus specifically in this thesis on availability; the security community has already dedicated considerable effort to system confidentiality and integrity, to preserve critical data at rest and during transit while supporting authentication and authorisation of system requests. However, as discussed by Anderson in “Security Engineering” [5], most businesses worry about system availability because it is a critical component of their profitability: a system is useless if it is not available, no matter what confidentiality and integrity features it provides; it is unlikely to generate profits, and it may lead customers to seek the services of competitors. This criticality of the network makes Denial-of-Service (DoS) attacks and connectivity failures among the most detrimental for online services. DoS attacks, for example, render online systems incapable of serving non-malicious requests by overwhelming the system with requests that deplete its resources and can cause system crashes. Such attacks can 13 make systems unavailable for prolonged periods of time. While an online service is under DoS attack, it is not uncommon for attackers to demand a “ransom” payment to cease the attack. Other network failures can have a similar effect on system availability: Domain Name System (DNS) can be poisoned with malicious entries to redirect traffic; and Border Gateway Protocol (BGP) routes can be hijacked to intercept network traffic or create network sinkholes for censorship. It is not difficult to create a payment system given online connectivity to serve requests through a front end, and a back end to process requests (ideally in atomic operations to balance debit and credit amounts without loss of data in case of interruptions). However, what is interesting for our research is what happens when a payment system experiences network failures. Such failures are not limited to payment systems; there is a lot of research on fault tolerance and delay-tolerant networks (DTNs) that expect failure and plan for it by, for example, load balancing or delayed delivery of data. We investigate and prepare for system failure to create resilient payment systems. We define system resilience as the ability to handle and recover from failure, and to temporarily degrade system processes if necessary in order to resume full system operations. We posit that network failures are inevitable, and that for systems to be resilient, offline capabilities must be included in the system design to be utilised when connectivity is unavailable. We see payment systems as critical national infrastructure that must be protected, possibly from non-local adversaries. A nation can attempt to damage another economically by targeting the victim nation’s payment networks to render them incapable of processing transactions quickly or at all. Therefore, it is important to deploy resilient payment systems that can withstand prolonged periods of large-scale attacks – attacks that are not limited to a few select servers, but that target an entire payment infrastructure, whether directly or via its underlying data networks. Towards creating more resilient payment systems, we investigate issues that impact performance and throughput, network issues, and several other weaknesses that effect scalability and resilience. Since we identified that network coverage issues and network attacks may cripple system performance, the premise of our research is that offline capabilities are required to produce resilience features in critical systems. Offline capabilities used to be standard in banking systems until the 1990s, but were progressively phased out to optimise and simplify system designs. We argue that this has gone too far. The goal of our work is to tackle network coverage and unreliable service problems by processing payments offline reliably and deterministically, and to simplify the user interface for usability. We focus on financial inclusion efforts, since the problems we investigated must be addressed to extend mobile payments to cater for the poorest demographics. 14 1.1 Thesis outline • Chapter 2 (background) : This chapter introduces the background to the research conducted, and related relevant work. The focus is on traditional financial systems, which include the payment protocol and standard (EMV) and the payment network that processes EMV transactions – the communication infrastructure and the princi- pals involved. We then discuss cryptocurrencies, focusing primarily on Bitcoin. We also discuss mobile payments and micropayments. We finish by synthesising the challenges and research goals. • Chapter 3 (the social externalities of trust): We highlight the lack of trust and reputation metrics in open-membership non-centralised systems. The lack of such metrics exposes these systems to various network attacks. We investigate how utilising a payment system, with a pseudo-currency, creates useful metrics for building trust and reputation systems. The system is designed to enable social externalities to determine the level of trust in the system. • Chapter 4 (Bitcoin stress testing): We analyse a spam campaign that occurred in July 2015, which crippled system performance by overwhelming the network with spam transactions. This resulted in transactions experiencing more delays in processing time, and incurring higher transaction fees to prioritise transactions. • Chapter 5 (offline payment protocols): We incorporate lessons learned from reviewing existing systems and previous proposals to create an offline payment system, DigiTally, that is focused on financial inclusion efforts in developing countries. The goal of DigiTally is to address network coverage and unreliable service problems by processing payments offline, reliably and deterministically. We aim to create these capabilities while simplifying the user interface and prioritising usability, simplicity, and speed. We include the details of the payment protocol created for DigiTally. • Chapter 6 (usability evaluation of offline payments): We developed a proto- type system for DigiTally, which lets users make offline payments by copying short strings of digits from one mobile handset to another. We discuss the lessons learned from field visits and initial evaluation, and present the findings and observations. • Chapter 7 (conclusion): The final chapter summarises our work and results, and discusses future work and research towards more resilient payment systems. 15 1.2 Publications and contributions I list below my contributions, and those of my colleagues, to the publications that are included in this thesis. I also list acknowledgements of funding and feedback. 1. Khaled Baqer and Ross Anderson. “Do you believe in Tinker Bell? The social externalities of trust”, in International Workshop on Security Protocols, 2015. (a) I worked on the protocol design and collaborated with Ross Anderson to create a usable protocol for Tor bridges. (b) I thank colleagues Laurent Simon and Stephan Kollmann for discussions re- garding anonymity networks. 2. Khaled Baqer, Danny Yuxing Huang, Damon McCoy, and Nicholas Weaver. “Stress- ing out: Bitcoin “stress testing””, in Proceedings of the Third Workshop on Bitcoin and Blockchain Research, 2016. (a) I worked on data analysis, as well as feature extraction and clustering. (b) Danny Yuxing Huang helped with the tasks above and was responsible for data gathering, and we both collaborated to produce the results. His PhD thesis [63] includes parts of this publication. (c) All listed authors collaborated to set the research goals and the issues to explore. (d) This work was supported by US National Science Foundation grant CNS- 1619620. 3. Khaled Baqer, Johann Bezuidenhoudt, Ross Anderson, and Markus Kuhn. “SMAPs: Short Message Authentication Protocols”, in International Workshop on Security Protocols, 2016. (a) I proposed introducing offline capabilities to produce resilience features in payment systems. I drafted the initial protocol and then collaborated with Ross Anderson to produce a research proposal to apply for a grant. Johann Bezuidenhoudt provided valuable feedback to refine the proposal. This was listed in my first-year report. (b) After the summer of 2015 (working on the Bitcoin project discussed above), Ross Anderson and I collaborated with Johann Bezuidenhoudt and Markus Kuhn to produce a more robust protocol to account for various attacks and issues. The result of this collaboration is the publication listed above. 16 4. Khaled Baqer, Ross Anderson, Jeunese Adrienne Payne, Lorna Mutegi, and Joseph Sevilla. “DigiTally: Piloting offline payments for phones”, in Proceedings of the Thirteenth Symposium on Usable Privacy and Security (SOUPS), 2017. (a) Ross Anderson provided valuable feedback to direct this research and handle the connections required for collaboration. (b) I designed and implemented DigiTally as a Java Card applet. I provisioned the overlay SIMs and the phones used in the evaluation. (c) Jeunese Adrienne Payne and I designed the study, and I was responsible for making sure the applet measured the metrics required for the evaluation. (d) I conducted the pilot study and evaluation in Kenya. Tasks included preparing the study material, session demonstrations and tutorials, answering participant questions, and maintaining the data securely upon collection at the end of the study. Staff members from Strathmore University helped with these tasks. (e) I prepared the results for analysis and collaborated with Jeunese Adrienne Payne to prepare and write the results of the research. (f) Staff members from Strathmore University helped with administrative tasks, selecting and communicating with participants (Strathmore University students), as well as organising and scheduling events and meetings. (g) I thank the following colleagues at the Computer Laboratory, University of Cambridge: Laurent Simon for providing feedback throughout the project; Alastair Beresford, Kat Krol, and Alexander Vetterl for helpful feedback on the SOUPS paper. I also thank colleagues at Strathmore University: Edwin Njeru, Nelson Mutua, and David Mutyethau for their assistance with this project. (h) This research was approved by ethics committees at the University of Cambridge and at Strathmore University. (i) This work was supported by a grant from the Bill & Melinda Gates Foundation. 5. Ross Anderson and Khaled Baqer. “Reconciling multiple objectives–politics or markets?”, in International Workshop on Security Protocols, 2017. (a) Ross Anderson was responsible for most of the ideas in this paper; I contributed sections regarding payment systems and ideas that are related to DigiTally. We also collaborated to list obstacles that suppress innovation and maintain monopolies. (b) This paper is not included in this thesis. 17 18 Chapter 2 Background We discuss in this chapter the background to the research conducted and related work. Our focus is on traditional financial systems, specifically, the Europay-Mastercard-Visa (EMV) payment protocol and standard, and the payment network that processes EMV transactions. We then investigate cryptocurrencies, focusing primarily on Bitcoin. We discuss mobile payments as well as micropayments. With regards to mobile payments, we focus primarily on payments using mobile phones as deployed in Less Developed Countries (LDCs), since our project for offline payments was aimed at mitigating some of the problems experienced in these countries. These types of payment systems are related to the research we have conducted on creating protocols for processing payments securely, and the need for lightweight protocols to minimise bandwidth consumption. We highlight common problems that we tackled during our research, including per- formance bottlenecks and network unreliability. The former is a major shortcoming of cryptocurrencies in general, and Bitcoin in particular. Network unreliability also creates resilience issues for the mobile payment systems deployed in LDCs: network disruptions prevent customers from conducting any transactions through the system operator, which is effectively a single point of failure. This is a real problem, especially where network cellular communication is unreliable and susceptible to censorship by the state, which typically either owns the network providers or has regulatory power. Network issues also arise with Bitcoin: the unavailability of network coverage means that peers cannot finalise payments because a global ledger entry is required to attest to the finality of a transaction. This, in turn, depends on the availability of candidate transactions to the miners. 19 2.1 EMV Europay-Mastercard-Visa (EMV) is a card payment protocol and standard that provides interoperability for Integrated Circuit Cards (ICCs) and Point-of-Sale (PoS) terminals worldwide. Another name for EMV is “Chip-and-PIN”: EMV smartcards contain a tamper- resistant chip and, in most countries, require a Personal Identification Number (PIN) to authenticate transactions. EMV is maintained by EMVCo., and the latest specification version 4.3 (released in November 2011) consists of four books (or volumes) [52, 53, 54, 55]1. Our analysis is largely based on these books (the second book [53] discusses cryptography in EMV). The following overview includes content that we published at SPW 2017 [7]. 2.1.1 EMV overview Card payment networks emerged from the local innovation of Diners’ Club in New York in 1950. Both credit and debit cards were first offered by single banks to local customers, then networked together, first nationally and then internationally. The protocol history really starts with the first cash machines fielded by Barclays in 1967. Early ATMs used ad-hoc mechanisms for PIN security but rapidly converged on a methodology pioneered by IBM. A bank could give each customer a PIN generated by encrypting their account number with a secret PIN Derivation Key, available in all bank ATMs and also to its central mainframe. This was adapted to provide dual control so that no member of bank staff would ever see any PIN other than their own, with a view to both minimising insider fraud and defending the bank against claims from defrauded customers (or those who pretended to be). The mechanism was to embed the cryptographic keys in tamper-resistant hardware, both in the ATM and in a Hardware Security Module (HSM) at the bank. The cards and PINs sent to customers could be sent from different data centres, or from the same centre on different days; and likewise the pairs of keys used to initialise the HSMs in ATMs [5]. The next step was to extend the communications to support interbank networking, a project driven by Visa and other card brands in the early 1980s. Further features were added, such as allowing customers to choose their own PINs. This allowed convenient worldwide ATM networking, but, in due course, problems started to appear: the process had become so complex that attacks started to emerge based on careless design [4] and feature interaction [26]. No sooner had the worst of these attacks been mitigated than the banks rolled out EMV, adding a further layer of complexity and more attacks followed [123]. The card payment system evolved from a single-bank system to an ecosystem, and assurance failed because of a combination of bloat, feature interaction, and institutional 1EMV books include about 750 pages, and start with about 50 pages of acronyms, definitions, terms, etc. that are common between the four books. The effective total number of pages (actual content specifying the EMV protocol) is about 550. This does not include specifications for contactless payments. 20 economics. As noted in [93], the implementation of EMV in Britain involves thousands of pages of specifications, as the cost of building the new EMV ecosystem was accepting backwards compatibility with numerous legacy systems. The combination of features that led, for example, to the No-PIN attack (discussed in Section 2.1.5) is scattered around this corpus. The bank brands building the ecosystem had insufficient incentive to insist on formal verification, or even adversarial review. The member banks also have conflicts of interest internally [94]: an acquiring bank that uses a less secure PIN entry device increases the risk of fraud across all local card issuers, yet it is unlikely to face more than a few percent of this fraud itself. Ultimately, the limits to security are about governance. The move from magnetic-strip-based ATMs and PoS transactions to the smartcard- based EMV involved pilots in the early 1990s, standardisation work in the late 1990s, and, finally, a roll-out in 2002, which is still not complete. This roll-out has involved substantial local customisation, and incremental changes are still being made to support new services and patch up vulnerabilities that get exploited. Even there, it can be slow; it took UK banks over five years to block the No-PIN attack [93]. 2.1.2 Transaction processing The principals in EMV transactions are the customer (payer, also called cardholder) who has an EMV-compliant card issued by a bank (the issuer), a merchant (payee) who normally has a PoS terminal that accepts the customer’s transaction and processes the payment through the merchant’s acquirer bank. Cards contain keys that are also stored in the bank’s HSMs, which are physical tamper-resistant crypto-processors. The symmetric keys, derived from the bank’s master key, are normally changed during the card personalisation process. In the following sections, the notation used is as follows: Card (C ), merchant’s PoS Terminal (T ), and Issuer (I ). 2.1.2.1 Card authentication The EMV protocol specifies three methods for card authentication2: • Static Data Authentication (SDA): The card contains a symmetric key used to create a Message Authentication Code (MAC) of the transaction data, called a Transaction Certificate (TC). The terminal must be online to contact the bank, which will verify and authenticate transactions, since the terminal does not contain the symmetric key necessary to authenticate the MAC. • Dynamic Data Authentication (DDA): Asymmetric cryptography is used in DDA to provide more capabilities than SDA. This involves a challenge-response 2See [116] for more details. 21 exchange in which the terminal sends a nonce that the card signs and returns, thereby proving that it knows the private key corresponding to the public key in its certificate [116]. However, the actual transaction is not tied to the card authentication process, and the transaction process falls back to using symmetric keys. Thus, the card details are authenticated, but the terminal cannot prove that the transaction was created by the given card (the transaction itself is not authenticated). • Combined Data Authentication (CDA): This method is similar to DDA, with the additional step that the card signs the generated TC, which is the MAC of the transaction, to associate the transaction data with the card. Thus, both the card and transaction details are authenticated using digital signatures. 2.1.2.2 Cardholder verification This process verifies the cardholder’s PIN as entered into the terminal, or a signature might be required instead of a PIN3. Note that it is also possible that the terminal does not require verification (no PIN or signature required), which is normally the case for unattended terminals, such as parking meters and vending machines, and for low-value contactless transactions. If a PIN is required, the card sends the PIN counter to the terminal to detect brute force attacks designed to guess the PIN. The PIN is entered by the cardholder and relayed to the card by the terminal. There are encrypted and unencrypted methods to relay the PIN from terminal to card, with encrypted communication requiring DDA or CDA capable cards; it is frequently assumed that the communication between the terminal and card is not intercepted, which is a critical error that paves the way for various attacks, discussed later. The card replies with a response to the PIN provided by the terminal: either a “PIN OK” message (bytes 0x9000), or a “PIN not OK” message along with the number of trials x remaining (bytes 0x63Cx). This is known as the “offline PIN” verification method; the card itself authenticates the PIN without contacting the Issuer. These messages are central to discussing Man-in-The-Middle (MiTM) attacks on EMV’s “offline PIN” verification. The process is outlined below: C → T : PIN counter T → C : PIN C → T : PIN OK (0x9000) or PIN Not OK (0x63Cx) Alternatively, using “online PIN” verification, the PIN is encrypted at the terminal, and the encrypted PIN is sent to the issuer for authentication (the card is not involved in “online PIN” verification). This is the typical transaction flow in ATMs. 3A signature might be the verification of choice for EMV in the United States with the deployment of EMV-compliant terminals still being finalised. 22 2.1.2.3 Transaction authorisation The terminal sends the transaction details to the card, including the transaction value, currency, timestamp, nonce (which is also called the Unpredictable Number (UN), since it should be random), and other transaction verification data. The card generates a MAC on the data received from the terminal, as well as a counter value stored in the card, called the Application Transaction Counter (ATC), which is incremented for each transaction, and, optionally, the Issuer Application Data (IAD), which contains proprietary data that must be sent from the card to the issuer. The MAC, ATC, and IAD are collectively called the Authorisation ReQuest Cryptogram (ARQC). The terminal then relays to the issuer the transaction data, and the ARQC obtained from the card. The issuer uses the transaction data received by the terminal to generate a new MAC that must match the MAC contained in the received ARQC. The issuer performs back-end checks to verify that the transaction can be executed, e.g., verifying that the cardholder’s account contains enough funds and that the card is not blacklisted. The issuer generates a response based on the checks performed, and sends to the terminal an Authorisation Response Code (ARC) and an Authorisation ResPonse Cryptogram (ARPC) which is a MAC of ARQC ⊕ ARC. The terminal uses the ARC value to determine whether to accept or reject the transaction, and passes ARPC and ARC to the card, which generates the same ARPC value using the ARC relayed by the terminal. The card generates a Transaction Certificate (TC), which consists of ATC, IAD, and a MAC of ARC, transaction data, ATC and IAD. The terminal sends the TC value to the issuer, which is used for processing payments. T → C : data = (value, currency, timestamp, UN) C → T : ARQC = [ATC, IAD, MAC(data, ATC, IAD)] T → I : data, ARQC I → T : ARC, ARPC = [MAC(ARQC ⊕ ARC)] T → C : ARC, ARPC C → T : TC = [ATC, IAD, MAC(ARC, data, ATC, IAD)] T → I : TC 2.1.3 Strengths The main strengths of EMV can be summarised as follows: 1. Offline capabilities: EMV transactions can be processed online and offline. In an offline transaction, the card and merchant terminal exchange data and decide whether a transaction should be accepted based on issuer-set risk parameters and configuration. This adds critical resilience features that are necessary when network 23 connectivity is unavailable. Related to this thesis, this is one of the most important strengths of any payment system we reviewed. 2. Dedicated and tamper-resistant hardware: EMV implementations provide tamper-resistant hardware for customers (cards), merchants (PoS terminals), and banks (HSMs). Smartcards are offline dedicated hardware devices used to authorise EMV transactions, which are difficult to compromise because of multiple tampering countermeasures, and are only used for payment processing. This trusted hardware guarantees the correctness of the EMV protocol, by protecting the keys used to generate MACs and signatures. Possible issues that undermine the security and trust assumptions are discussed below. 3. Dedicated network and system resilience: EMV utilises a dedicated network for clearing transactions. This network is not built on top of the Internet and is less vulnerable to Distributed-Denial-of-Service (DDoS) attacks; this provides resilience for network connections. However, we discuss in the following section issues regarding points of failure in the network. 4. Verification and auditability: The cryptography used in EMV provides the ability to create audit logs to verify transactions, with some caveats discussed later. 5. Usability: Familiarity and convenience are some of EMV’s strong points. Customers are familiar with using bank cards, finding the transaction speed acceptable. The use of trusted hardware to perform critical operations also minimises errors and, therefore, increases usability. 2.1.4 Weaknesses We discuss below issues that compromise the reliability of EMV or undermine the security guarantees provided by the trusted hardware. We focus primarily on issues around system resilience. 1. Complexity: The EMV standard has grown substantially with added features. Optimisations [4] and feature interaction [26] have led to various attacks, discussed later. Moreover, substantial local customisation and new services open up the possibility of lurking vulnerabilities. 2. Compromised hardware and lack of trusted display: Although trusted hard- ware makes it more difficult to compromise the security guarantees and assumptions in the EMV system, there are issues that undermine such guarantees and the as- sumptions on which they rest. First, the customer does not control the PIN-entry 24 device, which can compromise the transaction through, e.g., skimmers installed on the terminal. Second, because of the lack of a trusted display, the customer cannot be sure they are approving the correct transaction details, since the terminal may pass any data to the card for signature. 3. Points of failure: Although a dedicated network makes the EMV payment network less vulnerable to attacks, an authenticated transaction must still be relayed to an Issuer through the payment processor’s network (Visa or MasterCard) in order to determine if the customer has available funds, unless the value of the transaction is small. Yet on 1 June 2018, Visa’s network and infrastructure experienced “hardware failure” and started rejecting EMV transactions. Although Visa can normally process thousands of transactions per second (24,000 transactions per second [115], and even more in peak shopping periods such as Christmas), the network failed to process payments by customers in Europe [119]. 4. Maintenance and backwards compatibility: The Internet carries transaction data when a customer purchases goods or services through an online retailer, and again from online merchants and retailers to payment processors, where they are ultimately processed using the EMV protocol. The network infrastructure has to support both upgrades and backwards compatibility while safeguarding the integrity of the process as well as protecting the customer’s information. This creates conflicts. Upgrading systems to use more recent and secure standards must be carefully planned since upgrades may break backwards compatibility. For example, since the end of June 2018, the encryption used to protect transactions must comply with newer versions of TLS, yet older devices (e.g., older smartphone models running unsupported operating systems) may be rendered incapable of transmitting transaction details since they only support older versions of TLS, or what the PCI Security Standards Council calls “early TLS” (TLS 1.0 must not be used after June 2018) [61]. 5. Secure transmission of transaction data: Related to the point above, online merchants incur non-negligible costs of securely storing, processing, and transmitting payment information to be validated through the payment processor. The protocols used to secure the transmission create various possible failure points, especially if the implementations do not conform to known protocols and standards. 2.1.5 Attacks EMV weaknesses are largely due to implementation optimisations, rather than a result of the protocol design itself. These optimisations can result in exploits. We provide summaries of attacks that demonstrate how implementation errors lead to broken deployments: 25 1. No-PIN attack [93]: As discussed in Section 2.1.2.2, a terminal sends a message to the card to authenticate the PIN entered by the customer. The attacker executes a Man-in-The-Middle (MiTM) attack to remove the PIN from the message, so that the card thinks it is doing a chip-and-signature transaction. The card then sends a response to the terminal, which is interpreted as indicating the PIN was correct, even if it was not. This MiTM attack is performed in the cardholder verification stage by intercepting the query sent from the card to the terminal and replying with PIN OK (bytes 0x9000). The card, on the other hand, believes that no PIN was entered, and it simply assumes that an alternative verification method was used (e.g., signature or no verification). 2. Relay attack [48]: This attack captures transaction data at terminal A and relays it to a remote terminal, B. The victim authenticates a transaction thinking they are paying for the items they want to purchase. However, the authorisation the victim provides is for a transaction relayed by the merchant’s accomplice for a quite different purchase. This exploits EMV transaction authorisation, discussed in Section 2.1.2.3. Relaying the transaction data from terminal B via a card controlled by an accomplice to the malicious terminal A (controlled by the merchant and presented to the victim) means that the victim will authenticate the remote transaction rather than the transaction displayed to them. The lack of a trusted display means that the victims have no means of authenticating transaction details. This attack requires a low latency channel to transmit the data between the two locations. The technical means to overcome the delay in communication between the card and the remote terminal is crucial for this attack to work, since the EMV protocol defines a timeout to receive responses from either the card or the terminal. More time can be requested to delay timeouts by sending bytes 0x0060 to the terminal from the card, or from the MiTM device between the card and the terminal, to indicate a “more time requested” message; the intended non-malicious use of this message is to allow cardholders more time to enter their PIN, used in the cardholder verification stage. 3. Pre-play attack [27]: The EMV protocol specification mandates UNs as nonces, but the implementation is often defective, with terminals relying on counters, timestamps, or other non-random numbers. The card counter does not establish the freshness of the transaction; it is the UN value that indicates to the issuer that a terminal issued a new nonce. However, the issuer relies on the terminal for this, and predictable numbers can be generated on a malicious terminal. The effect is that just as the relay attack can shift transactions in space, so the preplay attack can shift them in time: a malicious terminal can get a card to authenticate a number of transactions, which are submitted for payment later. The lesson learned here is that the entity 26 relying on unpredictable values should be in control of issuing them: the bank should control the generation of random numbers, not the terminal [5]. However, if the UNs were always generated by the issuing bank, the bank would have to be online. Designing EMV so that it works offline presumably led to the decision to allow the merchant terminal to stand in for the bank in UN generation – an example of a resilience versus security tradeoff. Given that this tradeoff costs perhaps a few millions in fraud but facilitated many billions in transactions during network outages, it may have been in principle the right decision, although the implementation could have been better. The risk of preplay attacks might have been mitigated by better tamper-resistance of PIN entry devices or by better online fraud detection systems. 4. Contactless attack [51]: EMV does not require that a terminal authenticates itself to the card. This vulnerability is exploited to seek contactless EMV cards using, e.g., a smartphone or a card reader, to request transaction authorisation. The card may remain in the cardholder’s pocket or wallet and does not require cardholder verification using PIN entry. This can be seen as a digital pickpocketing attack. 2.2 Bitcoin The cryptocurrency Bitcoin [95] was announced in late 2008 on the crypto mailing list, and the first reference implementation was provided in January 2009. The creator or creators are unknown: the name Satoshi Nakamoto was used by the entity who created and announced Bitcoin, but this name is widely considered to be a pseudonym. The common feature of cryptocurrencies is that they rely on public key cryptography to create, verify, and process payments (hence the name). Cryptocurrencies surfaced after decades of attempts to establish digital currencies; previous attempts to produce a feasible solution struggled to gain adoption for various reasons including performance, fraud, and patents. Earlier attempts at digital money include Chaum’s “Blind signatures for untraceable payments” [33], W. Dai’s “B-money” [38], and Chaum et al.’s “Untraceable electronic cash” [34]. Chaum’s proposals were influential in designing non-bank and privacy-preserving systems using blind signatures. Crypto-anarchists and libertarians had sought digital money free from government control since the 1990s [82, 83, 84]. There are many aspects of Bitcoin that are not discussed in detail here as they are beyond the scope of this research. The following sections focus on the resilience of the system with regards to performance and throughput, network issues, and other issues that effect its scalability. We discuss the system and protocol as designed, and highlight the difference between theory and practice where relevant. 27 2.2.1 Bitcoin overview Bitcoin is a decentralised Peer-to-Peer (P2P) payment system. The Bitcoin payment system does not have a bank server to act as a central root of trust; this is one of the main differences between cryptocurrencies and other payment systems. One of the main problems with maintaining central payment servers, and a motivation to create decentralised payment systems, is system resilience: those servers can be vulnerable to system disruption via, e.g., DoS attacks. The promise of a decentralised system is that in order to cripple the entire system many nodes must be neutralised; at least in theory this is what Bitcoin tries to maintain (we discuss this further in Section 2.2.3). A bitcoin4 is not a physical token or currency. Instead, owning a bitcoin in effect means owning the private key5 required to spend that bitcoin. Spending a bitcoin involves changing the record of ownership, effectively changing control of a bitcoin from one key to another. To change ownership, a receipt of the modification, created by the last owner, must now record who the new owner is. This receipt is a bitcoin transaction: a signed receipt determining the next owner of a certain quantity of bitcoins, signed by the previous owner of those bitcoins. Transactions include inputs, which refer to previous unspent transaction outputs (UTXOs) to be used in the current transaction, and outputs, which are Bitcoin addresses6. If A wants to pay B, then A must sign a transaction that 1) proves that A owns all the funds input to that transaction, and 2) declares that B is now the new owner of the funds included in that transaction7. How can B be sure that A did not spend the same bitcoins to, say, C in a double-spending attack? If B accepts a signed transaction from A directly, B could not be sure that A did not cheat by sending another signed transaction to C as well. Previous payment systems used either a central bank server, or hardware tamper-resistance, or both, to make double spending hard. Bitcoin’s novelty is that there is neither. Instead, for the principals involved in a transaction (A and B in the example above) to be sure that a transaction is final, the whole network must agree on the finality of their transaction. The global ledger – the blockchain – which creates the global consensus, records all transactions considered final by the Bitcoin network. This ledger will, by definition, include all the ownership changes and the ledger itself performs the role of a bank server. Transactions can include transaction fees, which are 1) a monetary incentive to en- 4We follow the convention of referring to the protocol as Bitcoin, the currency and its units as bitcoin or BTC. 5Bitcoin uses Elliptic Curve Cryptography (ECC). 6A Bitcoin address is a 160-bit hash of a public key as well as a checksum, represented in Base58. Addresses can be exchanged out-of-band. 7We provide a simplified example here; a transaction may contain multiple inputs that may be controlled by multiple principals so long as they all sign it, and multiple outputs too. 28 courage transaction processors8 and nodes in the network to validate and propagate transactions9, and 2) a tax to demotivate spam (e.g., a high number of low-value transac- tions). Fees are the unclaimed outputs of a transaction, and the processor of the transaction can claim those outputs by assigning themselves as the new owners. Spam transactions and transaction fees are a focus of Chapter 4. Losing a record of a Bitcoin transaction is inconsequential; it is ownership of the controlling private key that matters. Therefore, a Bitcoin wallet is no more than a collection of private keys that spend bitcoins. Losing those private keys renders the bitcoins owned by those lost keys non-spendable. If the keys are stolen, then the thief can now spend the bitcoins. We now explain how global consensus is maintained, as well as the mining process and the performance issues in the Bitcoin protocol. We focus primarily on aspects of Bitcoin that are required to understand the problematic issues we have investigated in our research, which are discussed in detail in Chapter 4. 2.2.1.1 Global ledger and money supply In a decentralised system, without a known set of bank servers, who creates the global ledger and how is global consensus reached? In Bitcoin, participating nodes create a block which is a data structure containing verified transactions. Verification means checking that the signatures are valid using the public keys provided along with the transaction: the public keys must hash to the addresses of the unspent transaction outputs (UTXOs) that are inputs to that transaction. A block also includes a Merkle Tree of the verified transactions. Blocks, which are created roughly every 10 minutes, also include a reward, called the coinbase transaction, for creating the block. This special coinbase transaction has no prior history of ownership and is created by the node that produced that block. In effect the node that created the block is paying itself in the block it created, and announcing this reward to the entire network. This is how bitcoins are created and injected into the system10. A new block (bi) must include a hash of the previous block (bi−1). This chain of block hashes is known as the blockchain, and represents the global ledger, where a transaction must be included to be considered final. This blockchain includes all the historical changes of ownership of every bitcoin. The participating nodes not only verify transaction signatures, but also check that each input has not previously been spent. Thus, if a transaction authorising the transfer of funds from one owner to the next is already included 8We discuss later the role of transaction processors, called Miners. 9Transaction fees prove that the sender is incurring a cost, the assumption being that an economically rational entity would avoid incurring unnecessary fees. 10The block reward is halved every four years (approximately), and is now 12.5 BTC (about $87,500 at the current exchange rate). 29 in the blockchain, the previous owner cannot reuse the same private key to transfer the funds elsewhere, and any double-spending attempt should be rejected as invalid. 2.2.1.2 Mining and transaction processing As discussed earlier, a block contains a reward to incentivise the creation of blocks. This reward and process must be controlled, otherwise any node in the network can create blocks at will to claim rewards. The rate of creating blocks in the network must be throttled to avoid overwhelming the network with newly created blocks. Rapid block creation would render global consensus impossible to reach, since there would be no way for a node to determine which new block is the correct one to add to the blockchain. The ability to create blocks is determined and controlled using a Proof-of-Work (PoW) system that involves solving computationally difficult puzzles. PoW was proposed to fight spam [50] and was used in HashCash [14] as a DoS countermeasure. It is used in Bitcoin to ensure that a node needs to do substantial work to find a hash collision. This entails hashing the current block, which may include verified transactions, and must include the hash digest of the previous block, in order to find a hash that includes d zeros in the prefix, where d is referred to as the difficulty. The cryptographic strength of the hash function means that there is no faster way known of doing this than iterating the process with different random seed values (nonces) in the block until a correct collision is produced. This involves on average 2d−1 hash computations for a difficulty level of d. Finding such a solution entitles the successful principal to claim a block reward. This computationally intensive process of verifying transactions and creating blocks is called mining ; miners create blocks and claim rewards. Once a miner finds a new block, it is broadcast to the network, with an average propagation time of about 13 seconds [39]; thereupon the other miners start using it as the basis for finding the next block. The Bitcoin protocol recognises the longest branch of the blockchain as the correct one, creating an incentive for consensus. Bitcoin includes more monetary incentives: miners can claim transaction fees, explained in Section 2.2.1. Since a transaction remains non-final until it is included in a block, as discussed in Section 2.2.1.1, and miners can claim any unclaimed transaction outputs as transaction fees, they can select the transactions offering the highest fees to embed in a new block. They then work on the computational puzzle to create a valid block: once one miner wins this race, and their solution is accepted on the blockchain by a majority of other miners, that block will be built on and the claim to the block reward and transaction fees will become bankable. The intention behind mining, as discussed in the original Bitcoin paper [95], may have been naive. That paper includes the term “one-CPU-one-vote”, which primes the reader to appreciate the democratic values of the proposed system: the idea is that any one entity 30 would control a CPU and would, therefore, have a chance to contribute to the network and potentially earn block rewards. However, the mining ecosystem has long progressed from using CPUs for mining: the ability to parallelise hash computations led first to a move to utilise the superior capabilities of GPUs, then to field-programmable gate arrays (FPGAs), and finally Application-Specific Integrated Circuits (ASICs). ASICs produce better performance and now perform all the hashing of consequence. Another interesting development in the mining ecosystem is the cooperation between miners to create mining pools. Miners pool their resources and a pool operator sends all miners a block to hash repeatedly until a valid solution is produced, paying miners based on their contribution11. This has resulted in the centralisation of mining effort, and fierce competition between mining pools to claim block rewards by adding more hardware: the current network hash-rate (the rate at which the network can generate hashes collectively), is 35-40 tera hashes per second12. It is possible that multiple miners produce multiple valid blocks bi at epoch i. This would prevent global consensus since listening nodes will not be able to determine which block to add to the blockchain: a temporary fork can occur where the network splits by adopting multiple versions of the ledger, and this violates global consensus. Bitcoin resolves this issue by dictating that the fork that invested the most work, as determined by inspecting the blocks and their PoW, should be chosen as the correct blockchain. In practice, this scenario may not occur often. If it does occur, global consensus would be delayed until the fork is resolved. This opens up the system to disruption attacks. 2.2.1.3 Disruption attacks 1. 51% attack: If a miner creates a longer blockchain than the one in current use, then that miner can disrupt consensus as the longest blockchain is assumed to be the authoritative global ledger. This would cause previous transactions to be invalidated and considered non-final in the new alternative blockchain, and the attacking miner would claim rewards for all blocks included in the new blockchain, invalidating previous block rewards. Thus, the more difficult it is to create a block, the more difficult it becomes for an attacker to disrupt consensus and create a longer blockchain. Forks are problematic not least because they undermine Bitcoin’s finality and irreversibility claims. This is exacerbated by the fact that Bitcoin lacks any consumer protection or dispute resolution mechanisms. Nakamoto’s idea was that any coalition of miners controlling a minority of computational power (hash-rate) should not be able to out-mine a group of correctly-operating majority miners to create an 11There are a few variations of how mining pools work; we included a simplification of the most basic and earliest mining pool schemes. 12See https://www.blockchain.info/charts/hash-rate. 31 alternative blockchain [95, 21]. Three mining pools, at the time of writing, could collude to control more than 51% of the network hash-rate: BTC.com controls 27.9%, AntPool 12.4%, and SlushPool 11.1%. This undermines the claim that Bitcoin is decentralised: miners controlling the majority of the hash-rate have a higher chance of creating blocks, and therefore have a good chance to out-mine everybody else and dictate what transactions are included in the blockchain. 2. Selfish mining: There are circumstances in which miners might conduct an attack and disrupt or take over a blockchain with less than 51% of the hash rate. Active adversaries can monitor the network and strategically broadcast their blocks to claim rewards and disrupt competitors’ capabilities in what is called a selfish-mining attack [56]. An adversary broadcasts blocks when other miners are likely to be close to catching up with the adversary, and does not broadcast anything otherwise. Broadcasting a longer blockchain would invalidate non-malicious miners’ blocks and rewards, making their mining operations unprofitable. Given that miners invest their hash rate wherever the return is highest, this incentivises economically-rational miners to join an adversary who mounts a successful selfish-mining attack. Moreover, it is possible for an adversary who controls many IP addresses to connect to nodes in the Bitcoin network to control what data is relayed to those victim nodes, in what is called an “eclipse attack” [62]. Using an eclipse attack, the selfish-mining strategy can be more effective: an adversary who is well connected in the network can control which blocks are propagated in the network, censoring blocks created by other miners, and increase the chances of a successful selfish-mining attack13. 3. Double-spending attack: Using the ability to disrupt global consensus, as ex- plained above, a computationally powerful adversary can choose to double-spend their funds. This can happen when A pays B in transaction txB included in block bi and B waits for and accepts bi as final (at this point, B and A exchange goods or services based on the assumption that txB is final and irreversible). However, after the exchange, A creates two chained blocks bi ′ and bi+1 where either of those blocks includes a transaction that conflicts with txB, that pays someone else in txC using the same funds (or indeed A can reclaim the funds in txA). By Bitcoin’s rules, the longer blockchain (bi ′ and bi+1) will be chosen as the correct ledger, while bi will be discarded, rendering txB invalidated and non-final. Moreover, txB would be considered a double-spend attempt, since it conflicts with transaction txA or txC that is included in either bi ′ or bi+1. Therefore, B would provide their goods or services without being able to claim funds in txB [13, 72, 56]. The cost of disruption attacks is the computational cost of creating an alternative chain. 13See [30] for a discussion about combining eclipse and selfish-mining attacks. 32 2.2.1.4 Security economics and incentives The Bitcoin community decided, arbitrarily, to wait for six blocks in order to mitigate the risk of a fork happening in the blockchain that could compromise the global consensus (six blocks are created in approximately an hour). Regarding the double-spending attack explained above, this means that B would wait for five more blocks created after bi before considering a transaction as final. This increases the difficulty of A disrupting the consensus by creating an alternative blockchain that is longer than six blocks. In practice, users may not wait for six blocks to be created, and instead accept a transaction as valid and final once it is included in a new block. These users are then susceptible to double-spending attempts that can mount disruption attacks, discussed earlier. Even more problematic, if users accept transactions directly without waiting for a transaction to be included in a block, a double-spend attack is easier to mount: a malicious user can broadcast an alternative transaction, and if it is included in a block, it would render the original transaction invalid. It is possible to offer higher fees in the alternative transaction to incentivise miners to prioritise it for inclusion. Furthermore, it is possible that users perform off-chain transactions: creating and accepting transactions using protocols that bootstrap payment channels on the blockchain that create direct relationships to spend bitcoins, with the ability to close the payment channel and settle the balances by including the latest valid transaction on the blockchain [37]. The security-economics argument against malicious disruptions is that an economically- rational miner would use their computational capabilities to operate correctly and mine blocks to claim rewards. A counter argument is that this would not prevent an adversary whose incentives are not purely economic, or if the double-spend attack involves sums that are much larger than what a miner would be able to claim in rewards and fees. Another argument against malicious behaviour is that an adversary who operates this way would undermine the trust in the network, leading to a collapse of the value of bitcoin, which cuts the rewards a malicious miner could obtain in the future. The same counter argument of an adversary who is not economically rational applies. To mitigate disruptions in global consensus, given that the longest blockchain is considered as the global ledger and all alternatives (forks) are discarded, a miner who is economically rational will aim to maximise their profits as follows: 1. Broadcast blocks widely: this maximises the miner’s chances that their block will be added to the blockchain. All miners then add that block to the blockchain and start working on the next block. Block propagation takes 13 seconds on average [39], so a miner (or mining pool) will want to have their block accepted as correct before another valid block is propagated in the network. 33 2. Listen for block broadcasts and react immediately: a miner will stop working on their own version of the block if a valid block for that epoch is received. This demotivates working on alternative versions of the blockchain (forks). If a miner does not follow these two rules, they risk wasting computational effort producing a block that will not be adopted by the majority of the network, if the network has already progressed towards creating the next block. These incentives create the required stabilising factors to produce a consistent global ledger and achieve consensus in the network by demotivating forks and selfish behaviour [74, 106]. The monetary incentives are crucial to system stability when a competitive protocol is used to create consensus. 2.2.2 Bitcoin’s backlog: Mempool and UTXO Transactions are broadcast in the network to other peers for inclusion in a new block. Participating nodes listen on the Bitcoin P2P network14 for broadcasts of new hashes of transactions and blocks. The nodes then issue requests to their peers to retrieve previously unknown data: nodes send getblock and getrawtransaction to retrieve new blocks and transactions, respectively. The newly received data are validated, and the transactions are added to the node’s local memory in what is called the Mempool. A transaction stays in the Mempool until it is finally included in a block. However, memory pressure (or other algorithms used independently by nodes), may evict a transaction from an individual node’s Mempool before that transaction is included in the blockchain. Bitcoin transactions, as discussed earlier, consist of inputs and outputs: the inputs reference Unspent Transaction Outputs (UTXO): nodes who validate transactions sent in the network, especially miners who validate transactions to be included in a newly created block, keep track of UTXOs to quickly verify that a transaction includes unspent funds, as well as checking the signature. Due to the P2P broadcast of transactions, the Mempool and UTXO sets may not be the same for all nodes in the network. There is no global consensus for the Mempool and UTXO; it depends on a given node’s vantage point and what it receives in the P2P network. Since transactions remain in a participating node’s memory until they are finalised, this places resource pressure on the node to maintain correct operations. The pressure decreases each time miners include transactions in a new block: only then will all nodes evict transactions from their Mempool. However, a backlog of transactions remains in memory as the network works to include transactions: at the time of writing, the Mempool alone is 3 GB in size, composed of distinct and previously unseen transactions15. 14By default, Bitcoin uses TCP/8333 for communication. 15During increased non-malicious activity, or possible DoS attacks, the Mempool can increase substan- tially. In May 2017, and between November 2017 and January 2018, the Mempool increased to almost 100 GB in size. The latter period witnessed a substantial increase in the price of bitcoins, and may have 34 Requiring more resources as the Mempool and backlog grow could open up a node to a targeted attack to deplete its available resources, possibly leading to the node crashing if it cannot keep up with resource pressure. More problematic for system performance is that miners can create empty blocks, to claim block rewards without including any transactions (besides the coinbase transaction needed to claim the reward). This causes the blockchain to grow with empty blocks, as miners receive rewards without alleviating the backlog in the Mempool. Miners have no incentive to process and validate transactions other than transaction fees, which tend to be very small except at times of congestion. In fact, a rational miner could have a negative incentive to include transactions in blocks, since including any invalid transaction would result in their block being rejected. The Bitcoin backlog is a central focus of the spam campaign we examined in 2015. We provide the details of our analysis in Chapter 4. 2.2.3 Novel features One of the main novelties in Bitcoin is its attempt to produce robust global consensus of a variable number of anonymous nodes without a central server. Previous protocols for achieving distributed consensus deploy voting procedures, e.g., Paxos [75] and PBFT [32], in order to replicate the ledger among a known set of servers (closed membership). The main problem with such systems is that the known set of servers are vulnerable to disruption, such as a DoS attack. Another problem is that voting protocols do not scale well with an increase in the number of servers. Furthermore, standard voting protocols for distributed consensus assume a known set of validators (n), of which a threshold (k) must be available and operating correctly to make progress and create consensus, despite a number (f) of faulty or compromised validators. To make things more efficient, a leader is chosen in a given epoch (or remains as a leader for a few epochs) to manage tasks and messages sent by validators. Compromising the current leader cripples progress and severely degrades system efficacy. In contrast, Bitcoin promised, for the first time, a leaderless consensus protocol, without the k-of-n threshold requirement. This means that the entity (leader) announcing the progress in the network (the next block) is hard to predict among the set of all participating nodes (miners), meaning there is no single point of failure to compromise or DoS. Therefore, to stop the system from operating correctly, many participating nodes must be neutralised; this increases the work an adversary needs to perform. In practice, we discussed the aggregation of mining power in mining pools in Section 2.2.1.2; prominent miners and mining pools are well-known in the Bitcoin network, making them vulnerable targets to their competitors (other mining pools) or disruptive adversaries. been due to an increase in speculative trading transactions. 35 Despite the possibility of compromise or disruption, the protocol is open to all (open membership): anyone with sufficient computational power can start mining to produce blocks, and maintain progress and consensus in the network. However, the barrier to entry is high. If major mining pools become unavailable (due to a DoS attack, nation- state censorship, or simply working on cryptocurrencies other than Bitcoin), substantial computational power will still be needed to start mining to replace the network hash-rate. Alternatively, the entire network must wait until the difficulty d and hash-rate, discussed in Section 2.2.1.2, is globally recalibrated to reflect the decrease in mining power16. This recalibration happens every two weeks, which is a theoretical downtime for Bitcoin if the current computational power suddenly vanished. The finality of Bitcoin’s decentralised consensus comes from the substantial compu- tational power required to mine blocks by solving PoW puzzles. The assumption is that after several blocks are mined, it is substantially more difficult to create an alternative blockchain that would be longer than the currently known one. Thus, it is assumed that miners are incentivised to operate correctly and gain rewards rather than attack the network. Therefore, Bitcoin depends on whether its incentives for non-malicious behaviour produce the necessary stabilising factors to maintain global consensus. 2.2.4 Strengths The following discussion is not an exhaustive list of Bitcoin’s strengths and weaknesses. Rather, we discuss issues that are related to this thesis. Bitcoin strengths can be sum- marised as follows17: 1. Trusted hardware is not required: Bitcoin does not involve or require dedicated tamper-resistant hardware. On one hand, this cuts the barrier to entry: users need only download the necessary software (a Bitcoin client implementation) to participate. It also cuts the cost of participation (sending and receiving payments), and increases mobility in the sense of portability of data to other machines. Bitcoin relies on the strength of its protocol, and the mathematics of the cryptography, rather than the protections provided by hardware, such as smart cards or HSMs on which new attacks might be found. On the other hand, Bitcoin implementations can use hardware to resolve usability, complexity, and security issues. For example, 16Recalibration is done through software clients, measuring the rate of block creation in the network and adjusting the acceptable value for d to enforce an average block-creation time of 10 minutes. 17Note that we classify the strengths and weaknesses from a perspective of consumer protection and dispute resolution. For example, the finality and irreversibility of a transaction can be viewed as a strength by someone who requires transactions to be finalised immediately, yet from a consumer protection perspective, the ability to reverse transactions to recover from fraud and error is crucial. In Bitcoin, transaction irreversibility is viewed as a feature, but from our more traditional point of view of prioritising consumer protection, it is a bug. 36 offline hardware wallets protect private keys and create transaction signatures within the tamper-resistant device. This prevents the keys from being exposed unprotected on user devices (e.g., smartphones or laptops). However, this comes at a high cost if the hardware wallet is lost or stolen: in most cases, backups and export of the hardware-generated keys are not possible or non-trivial, and losing access to the keys means that the bitcoins controlled by those keys are forever irrecoverable. 2. Censorship-resistance: The decentralised consensus protocol means that there is no central entity, or a group of entities, that can be seized, ordered to disclose user information, or freeze user accounts. Even if major miners are unreachable, other miners can take their place and maintain the blockchain. The main assumption here is that there exists no powerful adversary controlling the entire Internet with means to disrupt all communication links upon the detection of Bitcoin traffic. 3. No single point of failure; consensus without a leader: Bitcoin’s consensus protocol provides a means of reaching consensus without knowing the next epoch’s leader, which provides enhanced resilience against network and system disruption. However, in our own research, we show that the network itself is not Bitcoin’s vulnerability, but rather it is the limited throughput that can be overwhelmed. We discuss this further in the next section, and provide the details in Chapter 4. 4. Low transaction fees (for now): Transaction fees are relatively low compared with traditional financial systems, we discuss this further in the following section. 5. Pseudonymity: Bitcoin uses pseudonymous identities (the hash of the public keys), and no personal information is required to participate in the network. This feature appealed to privacy-conscious users, at least initially. However, this point is also mentioned in Bitcoin’s weaknesses, since the level of privacy is not optimal. 2.2.5 Weaknesses We discuss several issues that hinder scaling the network as more users adopt Bitcoin: 1. Lack of offline capabilities: Payments can only be accepted online, because a transaction must be sent to miners who include that transaction in a block in a globally-verifiable public blockchain ledger. Accepting the payment also requires online connectivity; the blockchain declares that a bitcoin’s ownership has changed. This enables the recipient to verify, through the network, that they received new bitcoins. There is no means for processing a payment directly between peers in an offline transaction, including peers communicating directly through ad-hoc local channels such as Wi-Fi peering, Bluetooth, NFC, etc., to finalise a payment; a global 37 online consensus must be established. In contrast, EMV can fall back to offline processing, where terminals can accept signed transactions issued by the trusted hardware, to be processed later when system connectivity is restored. 2. Inefficiency: The process of verifying transactions and including them in the blockchain needs the equivalent of a supercomputer, due to the current difficulty of solving the computational puzzle. It requires significant power consumption, and someone to pay the bill; and this requirement is only increasing. This inefficient use of power must continue in perpetuity to maintain the blockchain in a competitive environment [76, 77]. 3. Irreversibility and finality: Once a signed transaction is included in the blockchain, and assuming the current block remains valid, that transaction is irreversible and the funds cannot be reclaimed. Bitcoin proponents think this is a feature rather than a bug: irreversibility prevents fraudulent users from reversing transactions after receiving goods or services, which is a major venue for fraud in EMV-based systems through charge-backs [6, 5]. However, since there is no customer service number to call to try to reverse mistakes, rectifying mistakes is impossible. Furthermore, Bitcoin (and similar cryptocurrencies that require a transaction to be included in a global ledger) experience delayed processing of transactions, compared with EMV transactions, which take seconds to authorise. Although a Bitcoin transaction may also take only seconds to propagate in the network once a payer signs and broadcasts it, it must be included in a block to have effect, and a block is created, on average, every ten minutes. Moreover, for reasons outlined earlier, users are advised to wait until five more blocks are created to prevent forks leading to non-finality of transactions. However, a transaction may not be included in the next block, and remains in limbo until a miner selects it from the Mempool to include it in a new block. This means that the 60-minute processing time, to ensure finality, can be seen as a lower bound, particularly at times of congestion. 4. Lack of effective privacy: The blockchain is a public ledger accessible by anyone who cares to download it from the Bitcoin network18. Bitcoin protects users’ privacy through the use of pseudonyms. This is also considered a feature, yet correlating metadata of a few transactions (timing, value, frequency, etc.) is often an easy way to link a user to a pseudonym. Researchers have already pointed out that Bitcoin has significant problems with regards to privacy [11, 101, 105]. Knowing a user’s address is sufficient to investigate the entire blockchain for all purchases made, and funds received, using that address. This can be considered a privacy downgrade when compared to the level of privacy a user of traditional financial services expects: 18The blockchain is currently about 170 GB in size. 38 information about bank transactions is only available to the payer, the banks involved, the payee, and authorities such as the police and financial regulators who get access using warrants. Other researchers investigated correlating users’ online activity and purchases, revealed through web cookies, with their Bitcoin transactions [60]. With the EU’s General Data Protection Regulation (GDPR) which went into effect in May 2018, it will be interesting to follow how that effects issues regarding data control and deletion: data on the blockchain is supposed to be immutable and cannot be deleted upon request. 5. Scalability and performance issues: Bitcoin faces scalability issues with regards to increasing the number of transactions included in each block. Each block would significantly increase in size leading to propagation and consensus issues (the blocks will be slower to propagate). There would be an increase in the rate of growth of the blockchain, which must be maintained by miners to verify transactions. It is possible for users who are not miners to use Simplified Payment Verification (SPV) and this was discussed in the original Bitcoin proposal [95], but this comes at a cost of having to trust other nodes to do the mining and thus to verify transactions: Bitcoin promised a decentralised trustless payment system without having to trust super nodes. We discuss in Chapter 4 how Bitcoin’s performance can be crippled through spam campaigns that substantially increase system resource pressure and delay transaction processing. 6. Transaction fees and a future without block rewards: Fees are only low for now; when the block rewards are substantially lower than what they are now, mining will not be profitable (unless the value of bitcoin increases substantially), and transaction fees must be increased to cover the amount of computation performed by miners. Bitcoin is bound to lose its major advantage over more expensive money transfer systems. In fact, this is already happening when the value of bitcoin increases (the value and exchange rates are volatile). Bitcoin transaction fees are equal to, if not higher than, traditional payment systems, especially when taking into account the backlog to clear payments, and the higher transaction fees a user must pay in order to prioritise their transactions. The security economics of Bitcoin incentivises correct behaviour through block rewards. However, the Bitcoin protocol dictates that the block reward is halved approximately every four years until the block reward is eliminated. As discussed in [31], a lower block reward may result in an unstable network if miners rely on transaction fees rather than block rewards. Miners would have an incentive to mine blocks in order to maximise transaction fees, rather than to work on the longest blockchain. For reasons discussed in Sections 2.2.1.2 and 2.2.1.3, this can undermine the stabilising factors necessary for Bitcoin’s global consensus. 39 7. Non-fungibility: Not all bitcoins are equal; each bitcoin differs in its ownership history. This can be a problem if some bitcoins are blacklisted due to illegal activity. This is not a problem for cash since every coin or note is fungible (serial numbers are usually ignored) [92]. There are proposals to address this non-fungibility issue for cryptocurrencies, and to provide more privacy features [87, 107]. 8. Usability issues and increased risk: A user must maintain the private keys required to spend her bitcoins. Can we expect users not to lose private keys? Compare this with requiring users to remember their 4-digit PINs for their EMV cards or their online banking passwords; both can be reset if required, whereas losing a private key destroys the value of those bitcoins that are spendable using that key. In practice, however, there are a number of exchanges and so-called online wallets that manage bitcoins on behalf of users, just like banks do (except these online wallets operate without any regulatory oversight). Users open an online wallet through a provider by creating an online account protected with only a username and password. This effectively removes the “crypto” in cryptocurrency: the accounts are no longer managed directly by the user, and the online wallet provider must be trusted to manage keys on behalf of users. This leaves the customers with no recourse as there are no consumer protection mechanisms to retrieve the funds if the provider is compromised by hackers or insiders who steal private keys, or the provider absconds with the bitcoins. Moreover, due to unfamiliarity with Bitcoin, users are apprehensive about joining a financial system they do not understand and perceive as risky if not assured by local governments that their funds are secure. This is exacerbated if the average user is only exposed to Bitcoin’s misuse for ransomware, illegal online drug sales, and other so-called “dark web” activity. 9. Barriers to entry and mobility: Although the barrier to entry to participate in the system (to send and receive payments) is low, converting fiat currency (cash or transfer from bank accounts) to bitcoins faces many hurdles and the friction is increasing: in June 2018, Wells Fargo was the latest bank to ban purchases of cryptocurrencies using its cards19. Users face obstacles to cash-in and cash-out from the system, i.e., to convert fiat into bitcoins, or bitcoins into fiat, respectively. 10. Non-compliance with KYC and AML regulations: Bitcoin’s pseudonymity is incompatible with Know-Your-Customer (KYC) and Anti-Money-Laundering (AML) regulation, which requires maintaining accurate and verifiable information regarding identities and ownership of financial assets. 19See “Wells Fargo Bans Cryptocurrency Purchases on Its Credit Cards” https://www.bloomberg.com/ news/articles/2018-06-11/wells-fargo-bans-cryptocurrency-purchases-on-its-credit-cards. 40 11. Problems with Bitcoin’s consensus: The ability to mount disruption attacks (e.g., 51% attacks and selfish mining, as discussed in Section 2.2.1.3), can undermine trust in the system’s capability to reach consensus on the global ledger. A persistent attack might confuse the network, can result in double-spends, demotivate non- malicious miners since they can lose their block rewards, lead to an exodus from the system resulting from a lack of trust and, ultimately, to the collapse of the value of the currency. This is more problematic if a motivated adversary is not economically rational; the security economics embedded in Bitcoin would then not produce the necessary stabilising factors for a global consensus. Hence, Bitcoin’s protocol can be called doubtful consensus as pointed out by Laurie in [76, 77]: one can never be sure that someone else is not secretly working on a longer alternative blockchain that can nullify the current blockchain20. 12. Network attacks: Eclipse attacks [62], as discussed in 2.2.1.3, can result in censor- ing victim nodes’ view of the network, and disrupting consensus in the network by preventing block propagation. Moreover, an adversary can hijack network traffic, by advertising new BGP prefixes, to segment the network into non-communicating quorums [12], to achieve similar results to an eclipse attack, or to hijack mining pool traffic (to earn mining pool rewards or to disrupt competitors). In P2P systems, network attacks similar to the ones highlighted here, and with the absence of a global view and coordination to establish trust and reputation metrics, demonstrate that such systems are vulnerable to attack without an authority being able to mitigate the presence of malicious nodes who can disrupt various critical components in the network and remain undetected. We discuss in Chapter 3 a scheme to create trust and reputation metrics in non-centralised open-membership systems. 2.3 Mobile payments We now discuss mobile payment systems, also referred to as mobile money, that are primarily used in developing countries. These payment systems are relevant to our work on offline payments for financial inclusion, which we discuss in Chapters 5 and 6. The type of mobile payment systems we focus on are different from payment applications in developed countries, such as Apple Pay or Android Pay which are in effect extensions of EMV ported to smartphones: they connect users to banks and the traditional financial system. A major differentiator between mobile payments and EMV extensions is that the 20We had mentioned this criticism in our Master’s thesis. The final artefact of the Master’s thesis, which was inspired by Laurie’s papers and finalised in 2014 [17], was a distributed payment system in which supernodes process transactions and achieve consensus by replicating the ledger using a Byzantine Fault Tolerance (BFT) voting protocol (we used Practical BFT (PBFT) [32]). 41 former usually offer their services primarily through the Subscriber Identity Module (SIM) that Mobile Network Operators (MNOs) provide to their customers (these systems are typically operated by MNOs), while the latter require both smartphones and conventional bank accounts, both of which are less widely used in developing countries. Since the MNO is able to program the SIM, another differentiator between mobile payment systems and payment apps is that the program and code for the former resides in the SIM. Moreover, these mobile payment systems are not directly connected to banks and the traditional financial system; the MNO (or another payment service provider) controls the customers’ balances and performs the required operations to debit or credit accounts and finalise transactions. 2.3.1 Mobile payments overview Mobile payment systems in Less Developed Countries (LDCs) rely on Short Message Service (SMS) messages or Unstructured Supplementary Service Data (USSD) sessions, supported by a SIM card issued by a phone company. If the payment operator is not an MNO, they can pay for access to a USSD channel operated by an MNO to provide their payment services. Thus, mobile payment systems require minimal connectivity in the form of GSM data transmission to communicate with the payment operator; cellular data plans connecting a device to the operator using 2G, 3G, or 4G are not required to process transactions. Therefore, mobile payments are accessible on all phones that implement the GSM standard. This low barrier to entry is crucial to provide services to users in LDCs who use “feature” phones (also called basic phones), which have minimal communication capabilities compared to smartphones; feature phones may not include a camera, Bluetooth or NFC connectivity, or any other form of ad-hoc communication such as Wi-Fi pairing21. GSMA Intelligence estimates that by 2020, the penetration of smartphones will be 57% in Africa and 66% globally (the estimate was 23% for Africa and 45% globally in 2015 when we started our research) [65]. Thus, payment operators who aim to attract many customers in LDCs must provide services that are compatible with feature phones. We focused our research to cater for users with feature phones, because our project was aimed at assisting financial inclusion efforts and the poorest demographics still widely use feature phones. About 200 mobile-phone payment systems have been set up in various LDCs, and about 20 of them have become mainstream22. The “killer app” was migrant remittances in an environment where most people had no access to conventional banking services but where mobile phones were spreading rapidly. The best-known is Kenya’s M-Pesa [64, 86] 21It is not unusual that even if a device includes, e.g., a camera that it would be damaged and unusable, and must be supported with tape. 22See GSMA’s “Mobile for Development” website (using the “Mobile Money” filter): http://www.gsma. com/mobilefordevelopment/tracker 42 whose operator Safaricom processes payments amounting to a substantial proportion of Kenya’s GDP23, becoming a larger payment service provider than any local bank, and has become the monopoly MNO. 2.3.1.1 Payment operations and services The SIM can verify a user PIN and contains keys for authenticating the customer; it can thus establish secure communication with a payment server. The server keeps customers’ transaction histories, just like in a conventional bank. Customers can check their balance and make payments using menu options on the phone’s screen, displayed by the trusted SIM card (the program residing on the SIM is called an applet). The phone number doubles as a bank account number. Cash-in and cash-out services are provided to customers by the payment operator to facilitate exchanging fiat currency into electronic float and vice versa. These services are operated by a network of mobile money agents, who earn commissions based on the services provided (e.g., withdrawing funds from the system incurs a fee). To increase their own balance, a customer provides cash to an agent who will perform a transaction through the payment processor. This will result in the customer’s electronic balance being increased by the amount they provided to the agent minus any transaction fees incurred (the agent earns a commission from the payment processor). The customer and agent are both notified of the result of the transaction. The customer can also verify their own balance at any time using their phone: the phone sends a balance-inquiry request to the payment processor. Some providers charge for inquiries and other services to rate-limit their use. To make a payment, a customer first enters the payee’s phone number (shops have numbers prominently displayed, with large shops having short codes). The customer must then enter the amount, followed by their PIN to authorise the transaction. A payment message is sent to the server; if the funds are available, they are transferred, and the merchant or other payee is informed. 2.3.1.2 Reasons for success and remaining challenges Attempts over the past ten years to set up similar phone payment systems in developed countries (such as Pingit and PayM in Britain) have generated little traction, as people in such countries already have plenty ways of paying both online and offline. There is no need to use an MNO’s service in an already competitive financial system. In LDCs, however, banking services can be unreliable or absent, forcing individuals to look for alternatives: M-Pesa started as a micro lending service, but customers used it to pay each other, since banks were shut down for a few days when M-Pesa was being rolled out. Customers saw the MNO’s service as a more reliable alternative with a lower barrier for entry: to 23See mobile payment statistics at the Central Bank of Kenya: https://www.centralbank.go.ke 43 open a bank account, customers need to provide proof of identity that many did not have and could not obtain, especially in the poorest communities – which did not have bank branches anyway. The MNO, through M-Pesa, provided an attractive and immediate alternative allowing customers to exchange cash for an electronic balance that they can then transfer to each other. A customer only needed to have a SIM provided by that MNO – and the MNOs had already built out networks of tens of thousands of agents nationwide. Later regulation for the mobile payment ecosystem had stricter requirements24 for KYC and AML [2, 86]; that came only after the MNO solidified its control over the mobile payment market, making it difficult for newcomers to undermine their monopoly. Regulation can be used after a monopoly establishes its presence to thwart attempts by newcomers to challenge it: the monopoly (Safaricom) uses regulation to accuse innovators of undermining its service, stalling the innovators’ efforts in lengthy legal battles25. Some of the reasons mobile payments were adopted quickly are shared with other forms of electronic payments. In the context of developing countries, remittances are a major use case for mobile payments: workers in the city send their savings to their families in their home village. In contrast to using and carrying cash, mobile payments are generally considered safer. The traditional methods used to send remittances included the post, and taxi and bus companies: the migrant worker would mail banknotes, or give them to a driver travelling to their village to hand over to their family in return for a fee; in both cases, theft or loss are known risks. Moreover, the availability of the mobile payment service through a customer’s phone, which can be accessed at any time, was a further reason for users to convert their money to electronic float. However, even with a high penetration of mobile usage and mobile payment subscrip- tions, cash is still pervasive even in countries that pioneered mobile payments [73]. The most likely reasons for this, which we confirmed during our visit to Kenya, are transaction fees and unreliable network coverage that renders the payment service unusable [2, 98]. The Safaricom network is claimed to cover 91% of the economy, but it only covers 80% of residential addresses – excluding many of the poorest demographics. As an example of the unreliability of the service, Safaricom’s M-Pesa was unavailable for 24-hour periods or more on various occasions [2, 120]. There is also congestion from time to time in Nairobi, which can leave people waiting at supermarket checkouts for 20 seconds awaiting payment confirmation. Another outage happened when MTN’s mobile payment services in Uganda were inaccessible for about two weeks, frustrating its users, including the agents 24We had to provide an official government ID to enable mobile payment services on our SIM when we visited Kenya to conduct our research. 25For example, see Safaricom’s battles with Equity bank over the bank’s use of overlay SIMs to circumvent the restrictions on phones (https://www.standardmedia.co.ke/business/article/ 2000135850/safaricom-loses-battle-to-block-equity-bank-s-thin-sim-card), and with BitPesa (https://www.bitpesa.co/blog/bitpesa-v-safaricom/). 44 whose wellbeing and source of income rely on providing services to earn commissions26. Customers were left helpless and could not spend their electronic funds. This is especially concerning for the poorest demographics when they need immediate and reliable access to their funds. The issues highlighted here (fees and unreliable networks) are some of the main issues we try to address in Chapters 5 and 6. We summarise in the following sections the strengths and weaknesses of mobile payment systems, as mentioned in this section and as relevant to our research. 2.3.2 Strengths 1. Dedicated and tamper-resistant hardware: As with EMV, mobile payments based on SIMs operate in a tamper-resistant environment, where all the critical operations are created within the SIM. Thus, the integrity of transactions is preserved. 2. Low barrier to entry: Mobile payments are accessible using any phone. No cellular data plan is required to enable most payment services. This substantially lowers the barrier to entry in order to make or receive payments. 3. Usability: Mobile payments use existing technologies, the mobile device and SIMs, without introducing unfamiliar hardware to the user. The payment process is as simple as sending a text. 4. Safety: Using a mobile payment service from a safe location to send funds remotely reduces the risk of theft or loss. These risks are non-negligible when an individual has to transform the funds personally, or pay unreliable taxi or bus companies to transport cash for a fee. 5. Availability: Mobile payments, when the system is online and running, are available 24/7, in contrast to bank offices which close after normal business hours. 2.3.3 Weaknesses 1. Unreliable network coverage and service disruption: When network coverage is unavailable, it locks up the customers’ funds and renders the service unusable. This can be a critical factor for adoption: if customers suffer a serious outage, they may cash-out their funds as soon as service is restored and never use the service again27. Just as with Bitcoin’s “lack of offline capabilities” weakness discussed in Section 2.2.5, mobile payments cannot be processed when the network is unavailable, as there are no means for processing payments between peers directly. 26This case was included in an unpublished report by Microsave “Mobile Money for the Poor (MM4P) – The Customer Journey/Experience Research Report” which was provided to us by Microsave. 27This happened after the mobile network in Uganda was shut down for several days during an election. 45 2. Transaction fees: Incurring a fee for every transaction is inconvenient and costly especially when considering face-to-face and low-value transactions. As discussed in many reports [86, 2, 73, 49], this is of particular concern for financial inclusion efforts aimed at the poorest communities. 2.4 Micropayments First-generation micropayments were proposed in the mid 1990s. We reviewed multiple proposals that suggest using micropayments to solve the “problem” of charging small amounts for Internet content (e.g., pay a penny for a single news article instead of a pound for the entire print edition). This included prominent designs such as PayWord and MicroMint [104], as well as NetCard [9]. Other designs aimed to solve scalability and overhead problems by introducing probabilistic schemes that do not result in incurring fees per transaction [102, 59, 70, 122]. Efficiency enhancements include outsourcing the acceptance of transactions to merchants instead of the bank (centralised provider), by creating transactions tied to the identity of the merchant as proposed in Millicent [59]. Decentralised verification alleviates the need to contact a bank to verify every transaction. The overview is based on simplified versions of PayWord, NetCard and Millicent. 2.4.1 Micropayments overview Customer C has an account with bank B, and wants to pay merchant M . Using a credit-based scheme, B creates and signs28 a stick 29 of values (v1,v2, ..., vn) C M . This stick must include the identities of both C and M as payer and payee respectively. To create the stick of digests, B creates a chain of hash digests where hi = H(vi); H is a hash digest function (e.g., SHA256), h1 is the root of the chain, and vi is the value to be hashed to create the digest; the result is a stick of hash digests ( H(v1), H(v2), ... H(vn) ) C M . B now sends the stick of hash digests to M , and sends the stick of values to C: this effectively opens a line of credit for C with M . C can now pay M by providing the values to M , where a payment is the pair (v`, `); ` is the value of the current transaction added to the number of previously used stick values ρ (ρ is 0 initially). M can then verify that the value received from C generates the same hash value which is contained in the stick of hash digests received from B, by performing a lookup and a single hash digest operation then a result comparison. Then C has n − ` credit remaining. M can immediately verify 28In the simplified protocol shown, all messages are assumed signed, and the necessary freshness indicators such as nonces and timestamps are removed for simplicity. Micropayment schemes assume a Public-Key Infrastructure (PKI) exists to authenticate public keys of the principals involved in the payments and the bank; we later list this assumption and requirements as weaknesses. 29“Stick of coins” – banks handle coins in sticks that have perhaps 50 or 100 coins rolled up in a paper or plastic wrapper. 46 that a transaction is valid, and can later send the highest payment (vµ, µ) to B to close the line of credit: Request: C → B : “I want to pay M” B → C : (v1, v2, ..., vn)CM B →M : (H(v1), H(v2), ... H(vn))CM Payment: C →M : (client’s v`, `) M : H(v`) =?= H(client’s v`) Claim payments: M → B : (vµ, µ) This offline verification scheme does not require the involvement of B in each payment (besides bootstrapping the trust and line of credit initially), since the transaction can be verified locally, and the stick is merchant-specific; C cannot double spend credit. 2.4.2 Strengths Micropayments use lightweight cryptography that can be quickly verified, along with decentralised verification to avoid points of failure: 1. Lightweight cryptography: Micropayment schemes use simple hash digests that can be computed quickly on modest CPUs. This is particularly important if the verification is performed on resource-constrained hardware. 2. Decentralised, offline verification: verification is decentralised and outsourced to the payee who can validate the payment and can prevent double-spend attempts (the stick is payee-specific). So the principals involved do not have to be connected to the bank, and can finalise the payment offline (a direct connection between a customer and merchant is sufficient to accept a payment.) This is in contrast to the cryptocurrencies discussed earlier where offline acceptance of a payment could open up the payee to a double-spend attack. Offline verification can produce efficient protocols as the bank does not need to be involved in verifying each transaction. If a decentralised verification approach is not possible, then low-value high-rate transactions could overburden the bank. Thus micropayment proposals illustrate how payee-specific lines of credit can enable offline verification, simultaneously increasing resilience and reducing server load. 47 2.4.3 Weaknesses These micropayment schemes failed to gain adoption. The main problems with the early proposals can be summarised as follows: 1. A solution looking for a problem: Early proposals did not provide users with any added benefits. Charging users for separate articles rather than access to an entire collection was not attractive. 2. Usability and unfamiliarity: Micropayments also imposed an increased cognitive overhead. Users were not ready to change the way they buy things by worrying about multiple micropayments rather than a single larger payment [112, 113]; users did not care for being “nickel-and-dimed” [114]. 3. Lack of suitable infrastructure: Key management issues are problematic. Ven- dors and brokers (using Payword’s terminology; referring to merchants and banks, respectively) need to exchange public keys or validate certificates. Furthermore, smartcard tamper-resistance in the 1990s was not optimal to provide the necessary safeguards to protect critical data. 2.5 Summary of challenges and research goals There are common problems with payment systems that we aim to address in our thesis: 1. Resilience and scalability: Most systems operate on fragile infrastructure that is vulnerable to DoS attacks and nation-state censorship. Yet payment systems normally work online, and fail to verify and accept transactions offline – that is, when communication to the payment system provider is unavailable for EMV-based systems and offline acceptance is not enabled; or, in Bitcoin, when a transaction cannot be sent to miners since online communication is unavailable. In Bitcoin, although the system is decentralised in its membership and participation, transaction processing is virtually centralised in a monolithic ledger; a more decentralised approach for processing transactions could increase system performance substantially. Micropayments may provide a useful means of outsourcing and decentralising payment verifications to the recipients, alleviating the need for always-on connectivity to a centralised verifier, and increasing system resilience while minimising verification overhead. These critical issues motivated our research. Therefore, we set out to provide mechanisms to alleviate the need to process every transaction online in order to achieve better system performance and resilience. 2. Transaction fees: In Bitcoin, transaction fees are used to demotivate spam, as well as prioritise a transaction to be processed faster (a higher transaction fee promotes 48 a transaction to be processed faster when it is held in limbo in the Mempool). However, this means that during spam campaigns, transaction fees will rise so that valuable transactions can compete with spam transactions that otherwise take up space in Bitcoin blocks. This substantially increases the cost of non-malicious participation in the network, and the fees rise along with processing times, decreasing the appeal of Bitcoin as an alternative financial system (we discuss the details in Chapter 4). With regards to mobile payment systems, payment operators charge a transaction fee for every transaction, harming the poorest demographics for whom transaction fees constitute a non-negligible share of their daily budgets. Payment operators justify transaction fees as necessary because 1) they provide a valuable service to their customers and this service adds overhead to their servers, and 2) that transaction fees rate-limit unwanted behaviour. Without fees, users can repeatedly send transactions back and forth overburdening the servers; this can scale into attacks if the process is automated resulting in high resource usage. We aim to offload the processing of transactions to users’ devices in order to alleviate the burden of processing transactions centrally. 3. Usability: Usability can be a problem if new hardware and unfamiliar operations are introduced, creating mental overhead and decreasing adoption. Therefore, we aim to maintain acceptable usability levels, by mimicking existing systems with which users are familiar, and avoid introducing new hardware that is unfamiliar and can be costly to replace if lost or stolen. We aim to avoid increasing the cognitive overhead necessary to use the system, in order to increase adoption and minimise user errors. 4. Lightweight protocols: We utilise lightweight protocols that are simple to verify and require minimal resource usage. A motivation for using such protocols is the requirement to deploy offline payments on resource-constrained devices. Moreover, through lightweight protocols, we can produce cryptographic authenticators that are exchanged by users to finalise payments, while maintaining an acceptable level of usability. The rest of this thesis is organised as follows. In Chapter 3 we study the social externalities of trust; how users can be incentivised to provide trustworthy nodes in a distributed system to help defeat service-denial attacks and nation-state censorship. This might be seen as a study on how we might crowdsource dependability. In Chapter 4 we consider how censorship and service-denial attacks might be used to undermine the Bitcoin ecosystem. This might be seen as a study of how a nation state might attempt to divide and defeat a payment system based on crowdsourcing. In Chapter 5 we take a more classic distributed-systems perspective on system resilience and scalability, by 49 introducing a new way to process payments offline without relying on the availability of network coverage. Decentralising payment verification, and outsourcing the operations to the users, can alleviate the burden of contacting centralised systems to process every transaction. We also include the usability assessment of the prototype we built for offline payments in Chapter 6. A secondary goal is to minimise the cost of providing payment systems, which then allows a payment provider to cut transaction fees. Moreover, by decentralising payment verifications that can be performed offline, we increase system resilience so users can continue making payments while a system provider is unavailable, and seamlessly maintain offline operations until a system is back online. We also use trusted hardware to address usability issues by minimising cognitive overhead and requirements from users to correctly handle critical data, and minimise the risks of data theft and tampering. Thus, we aim to produce an offline payment system that will work in less developed countries with acceptable usability levels and without introducing any unfamiliar hardware. 50 Chapter 3 The social externalities of trust We first tackle an issue discussed in the Background chapter: that the lack of trust and reputation of nodes participating in decentralised systems exposes nodes to various attacks. These include overwhelming the network with connections to control the content relayed to those victim nodes, thereby undermining global communication (eclipse attack [62]), or hijacking connections to fragment the network into non-communicating quorums [12]. We investigate in this chapter how utilising a payment system, with a pseudo-currency, can create useful metrics for building trust and reputation, and incentivise correct behaviour. Prior to working on this chapter, we reviewed micropayment proposals, as discussed in Section 2.4, since some of them are suggested as possible performance enhancements to reward participating nodes (which we review in the related work section in this chapter). We apply lessons learned from the Background chapter, such as decentralised verification discussed in Section 2.4.2. The pseudo-currency is a proxy for trust and reputation metrics. These metrics add accountability to open-membership systems, such as the Tor anonymity network and Bitcoin. We think these metrics may be a necessary building block to add resilience to such systems. The content of this chapter is adapted from our paper Do you believe in Tinker Bell? The social externalities of trust, by Khaled Baqer and Ross Anderson, which appears in the Proceedings of the International Workshop on Security Protocols 2015 [16]. 3.1 Summary In human societies, trust follows social consensus and crumbles when that is lost. However, the world of trust online is different. People trust CAs because they have to; Verisign and Comodo are dominant not because users trust them, but because merchants do. Two-sided market effects are bolstered by the hope that the large CAs are too big to fail. Proposed remedies from governments are little better; they declare themselves to be trusted and 51 appoint favoured contractors as their bishops. Academics have proposed, for example in SPKI/SDSI [103], that trust should flow from individual users’ decisions; but how can that be aggregated in incentive-compatible ways? The final part of the problem is that current CAs are not just powerful but all-powerful: a compromise can let a hostile actor not just take over your session or impersonate your bank, but “upgrade” the software on your computer. Omnipotent CAs with invisible failure modes are better seen as demons rather than as gods. Inspired by Tinker Bell, we propose a new approach: a trust service whose power arises directly from the number of users who decide to rely on it. Its power is limited to the provision of a single service, and failures to deliver this service should fairly rapidly become evident. As a proof of concept, we present a privacy-preserving reputation system to enhance quality of service in Tor, or a similar proxy network, with built-in incentives for correct behaviour. Tokens enable a node to interact directly with other nodes and are regulated by a distributed authority. Reputation is directly proportional to the number of tokens a node accumulates. By using blind signatures, we prevent the authority learning which entity has which tokens, so it cannot compromise privacy. Tokens lose value exponentially over time; this negative interest rate discourages hoarding. We demotivate costly system operations using taxes. We propose this reputation system not just as a concrete mechanism for systems requiring robust and privacy-preserving reputation metrics, but also as a thought experiment in how to fix the security economics of emergent trust. 3.2 Introduction Children know the story of Tinker Bell from JM Barrie’s 1904 play “Peter Pan; or, the boy who wouldn’t grow up”. She is a fairy who is about to fade away and die, but is revived when the actors get the audience to declare their belief in her. The underlying idea goes back at least to ancient Greek mythology: the Greek gods’ power waxed and waned depending on the number of men who sacrificed to them. More modern references include Jean Ray’s 1943 novel Malpertuis [100] puts it as (translation from French): “Men are not born of the whim or will of the gods, on the contrary, gods owe their existence to the belief of men. Should this belief wither, the gods will die.” The idea that authority emerges by consensus and evaporates when the consensus does is not restricted to mythology; democratic institutions perform a similar function. In the context of a nation state, or even a professional society, they are developed into a governance framework optimised for a combination of stability, responsiveness and the maintenance of trust. How are things online? The honest answer is “not good”. When talking of trust online 52 the first port of call is the Certification Authority (CA) infrastructure, which has many known failings. A typical machine trusts several hundred CAs, and trusts them for just about everything; if the Iranian secret police manage to hack Comodo, they can not only impersonate your bank, or take over your online banking session, they can also upgrade your software. Since an Iranian compromise caused the browser vendors to close down Diginotar, we have seen corporates moving their certificate business to the two largest players, Verisign and Comodo, in the belief that these firms are “too big to fail” (or perhaps “too interconnected to fail”). Firms hope that even if these CAs are hacked (as both have been), the browser vendors would never dare remove their root certificates because of the collateral damage it would cause. As for ordinary users, we trust Verisign not because we decided to, but because the merchants who operate websites we use decided to. This is a classic two-sided market failure. Can we expect salvation from governments? Probably not any time soon. Governments have tried to assume divine powers of their own, first during the crypto wars by attempting to mandate that they have master keys for all trust services operating in their jurisdiction, and second by trying to control authentication services. Such initiatives tend to come from the more secret parts of states rather than the most accountable parts. Can we users ourselves do better? The SPKI/SDSI proposal from Ellison, Rivest and Lampson attracted some research effort in the late 1990s and showed how every individual user could act as their own trust anchor, but the question is how to deploy such a system and scale it up; the one user-based system actually deployed in the 1990s, PGP, remains widely used in niche applications such as CERTs and anti-virus researchers, but never scaled up to mass use. The application of encrypting email suffers from strong network externalities in that I need my counterparties to encrypt their email too. This has become the norm in specific communities but did not happen for the general population. Can we scale up deployment in other applications from a club that provides a small initial user base? One case where this happens is in the Internet interconnection ecosystem, where trust among some 50,000 ASes is founded on the relationships between about a dozen Tier-1 providers, who form in effect a club; their chief engineers meet regularly at Nanog conferences and know each other well. But how could a service scale up from a few dozen users? 3.3 Motivation In this chapter we present another example for discussion. We propose an anonymous online reputation system whose goal is to let people get better quality of service from a distributed proxy service such as Tor. Our proposed new trust service has limited scope; if it works, it can provide lower latency, while if it fails its failure should be evident. The 53 more people trust it, the more effective it becomes; if people observe that it is not working and lose faith in it, then it will fade away and die. What’s more, multiple such trust services can compete as overlays on the same network. The Tor network [46] consists of volunteer relays mixing users’ traffic to provide anonymity. The list of relays is disseminated through a consensus file which includes the IP addresses of all relays. IP addresses are required to allow a user’s client (Tor software) to locally decide which relays to use to route traffic. However, an attacker (say a censor) can also download the consensus file, extract relay addresses, and block traffic by cooperating with local ISPs or using a nation-wide firewall. Thus, victims of a technically competent censor need private relays to connect to one of the publicly known relays. The private relay must not be known to the censor, or it too will be blocked. These private relays, called bridges, act as transient proxies helping victims to connect to the Tor network. Bridges are a scarce resource, yet play a critical role in connecting censorship victims to the Tor network. Therefore, we want to incentivise Tor users to run more bridges. 3.4 System design Our proposed enhancement to Tor consists of competing clubs, each managed by a club secretary. This is a design difference from using a quorum of Directory Authorities (DAs) organised as a failover cluster, as currently implemented in Tor. The club secretaries, acting as Bridge Authorities (BAs), are responsible for disseminating information regarding club members. Each secretary is supported by a community of members who use its tokens as a currency to prioritise service. Members help censored users (victims) to circumvent censorship by volunteering to act as bridges, and can claim token rewards for their help. To the secretaries, the performance of members is visible and measurable. Secretaries clear each others’ tokens, just as banks clear each others’ notes. Through private token payments, we can analyse the behaviour of nodes and determine which are actively and correctly participating in the network. Tokens are blindly-signed objects used to request services from other nodes. Tokens lose value over time, to demotivate hoarding. We discuss now the details of operation. 3.4.1 Member registration Members can join any club they choose; loyalty to a club is determined by the incentives and performance it offers. Members can participate in one club or many, volunteering for whoever provides reliable services. Members offer service to a club by broadcasting their services: “this is my key, address, etc and I’ll be available for contact between 11:00 and 11:30; send victims my way”. This process can be automated using an uncensored and trusted means of communication. We assume that most members are outside the censor’s 54 jurisdiction, though some will have ties of family or friendship to the censor’s victims. Thus, some members may be motivated by the wish to help loved ones while others are altruistic and others are revenue maximisers. Some members will be within the censored jurisdiction. We assume that keys can be exchanged successfully between members and secretaries: either the secretary publishes public keys somehow, or passes them on to new members as part of the recruitment process (about which we are agnostic). Within the censored jurisdiction, one or more designated scouts communicate with victims: we assume these are existing members. We assume the existence of innocuous store-and-forward communications channels such as email or chat; only a handful of censored jurisdictions ban Gmail and encrypted chat completely. 3.4.2 A simple threat model Suppose that Alice and Bob belong to a club organised by Samantha to provide bridge services. Alice volunteers her IP address at time t to Samantha; a victim, Victor, contacts Sam to ask for a bridge; Sam gives him Alice’s IP address and a short one-time password NA; Victor contacts Alice and presents NA; Alice shows NA to Sam, who checks it, and Alice connects Victor to the Tor network. The protocol runs: A → S : IP S → V : IP,NA V → A : NA A → S : NA S → A : OK The first problem with this simple protocol is that Samantha has to be online all the time; as she’s a bottleneck, and the censor can take down the system by running a distributed denial of service attack on her. Even without that we have two messages more in the protocol than we probably need. Our first attempt at improvement is to make the nonce NA one that Alice can check; so let Alice share a key KAS with Samantha and we construct the nonce NA by encrypting a counter k with it. The protocol is now: A → S : IP S → V : IP, {k}KAS V → A : {k}KAS Alice can now check the nonce directly, so Samantha doesn’t have to be online. The next problem is harder; it is that the censor’s shill, Vlad, can also ask for an IP address, and if Samantha gives him one, the censor will block it. This is the real attack 55 right now on Tor bridges; the censors pretend to be victims, find the bridges and block them. Countermeasures range from restricting the number of IP addresses given to any inquirer, and trying to detect Sybil inquirers using analytics; but if you have a repressed population where one percent have been coopted into working for the secret police, then telling Vlad apart from Victor is hard (at least for Samantha who is sitting safely in New York). 3.4.3 A more realistic threat model In what follows we assume that of the two representative club members, Alice is in the repressed country, while Bob is sitting safely in exile. Alice, if honest and competent, is better than average at telling Victor from Vlad, perhaps because of family ties, friends, or ethnic or religious affiliations. Alice might be undercover, or might have some form of immunity; she might be a diplomat, or religious official, or sports star. She might hand over bridge contact details to victims written on pieces of paper, or on private Twitter messages to fans. The full gamut of human communications, both online and offline, is available for members who act as scouts to get in touch with victims. We now introduce another layer of indirection into the protocol. After Bob volunteers to be a bridge, Samantha gives the scout Alice a token for her to give to a victim Victor, constructed as {k,NAS}KBS . When this is presented to Bob, he can decrypt it and recognise the counter, so he knows Samantha generated it for him, and grants bridge service to the victim. He sends it to Samantha, who can recognise it as having been generated for Alice, and can thus note that Alice managed to recruit Victor (or alternatively, if Bob’s IP address then ended up on the blacklist, that Alice recruited Vlad by mistake). Formally: B → S : IPB S → A : IPB, {k,NAS}KBS A → V : IPB, {k,NAS}KBS V → B : {k,NAS}KBS B → S : NAS NAS might be constructed in turn by encrypting a counter; but once we start encrypting a block cipher output and a counter under a wider block cipher, we are starting to get to the usability limit of what can be done with groups of digits written on a piece of paper. As AES ciphertext plus an IP address is about 50 decimal digits. In some applications, this may be all that’s possible. In others, we might assume that both scouts and victims can cut and paste short strings, so that digital coins and other public-key mechanisms can be used. In a more general design, we have to think not just about running scouts to contact victims and tell victims apart from censors, but also about scouts who are eventually 56 turned, and about clubs that fail because the club organiser is turned, or has their computer hacked by the censor, or is just incompetent. We also have to think about dishonesty: about a bridge operator or scout who cheats by inflating his score by helping non-existent or Sybil victims. How far can we get with reasonably simple mechanisms? 3.4.4 Payment system We avoid external payments, as offering cash payments as incentives to volunteers risks trashing the volunteer spirit (this is why the Tor project has always been reluctant to adopt any form of digital cash mechanism for service provision; volunteering is crucial for Tor’s operation). Furthermore, we avoid using complicated zero-knowledge protocols or creating huge log files to protect against double-spending; large audit trails cannot scale very well. We prefer a lightweight mechanism that uses blind signatures made with regularly changing keys and member pseudonyms to provide privacy and unlinkability, as well as symmetric cryptography to create data blobs verifiable by the secretary or bridge. The token reward can then be used to pay for other services in Tor. For example, club members who run successful bridge services might enjoy better quality of service by using service tokens to get priority. We now sketch a design using blind signatures rather than just shared-key mechanisms. 3.4.4.1 Member identifier After registering a member, the secretary creates a series of data blobs as the member’s identifiers. An identifier can be the result of encrypting the member’s name plus a counter or random salt with a symmetric encryption algorithm and a key known to the secretary; identifiers change constantly, and the secretary alone can link them to members other than by context. As well as knowing which members correspond to which identifiers, the secretary also notes which victims are introduced to which identifiers as possible bridges. This is to make it hard for a member to generate fake victims and claim rewards without providing them with service. It does mean the secretary is completely trusted, but secretaries compete with each other to provide effective service. We discuss all this in more detail later. 3.4.4.2 Victim accounts Secretaries maintain reputation scores not just for members but also for victims. Upon being introduced by a scout, a victim gets a default score of 1 allowing them to make a single bridge request. After a successful initial request, the victim’s account is reduced to zero for the current period. This is one of a number of rate-limiting mechanisms to prevent Vlad from draining the IP pool. 57 Secretaries use victim accounts to mitigate a few possible attacks, which we explore in more detail in the discussion section. One purpose is Vlad detection: if the members a victim learns about are not censored after a period of time – the IP addresses are not blacklisted and the scout is not turned – the victim’s account is increased; the opposite is true if a scout is arrested. Eventually, a diligent secretary should be able to identify fake victims by intersecting groups of victims and groups of censored members. It follows that the censor would have to refrain from blocking members if he wants to learn about new members. If he blocks members immediately, he betrays his shills. The conventional law-enforcement approach would be to block immediately, while the intelligence approach would be to observe quietly until all or most of the members are identified. Forcing the censor to make a strategic choice opens up all sorts of possibilities. Secretaries have an incentive to protect their members’ identities; if the censor can spot and arrest the scouts and block the bridge IP addresses, the club will be ineffective and volunteers will help other clubs instead. Some volunteers will be picky, as if they join a badly-run club their IP address will be quickly blocked in that club’s country of interest. Other volunteers may be happy to help victims in fifty countries and consider it a badge of honour to be blocked in a dozen of them. And if participation is rewarded not just with honour but with improved quality of service, then this will mostly come from jurisdictions in which volunteers are not blocked. 3.4.4.3 Token creation Account payment mechanisms can be replaced by blind tokens at one or more stages in the process. In the simplest implementation, we can reward Bob for providing bridge services with tokens that offer better quality of service from Tor nodes. Given the way Tor works, this requires some form of anonymous payment or certification. So when Bob services a request from Victor, Bob can now use Alice’s nonce NAS to request an anonymous token from Samantha rather than just banking the credit. He does this by generating a well-formed token CB, blinds it with a multiplier, and sends it to Samantha, who generates a blind signature [33] and returns it. With simple RSA blinding, where e is the public signature verification exponent, d the private signing exponent and n the public modulus, we have (the first four steps are carried from the protocol in Section 3.4.3): B → S : IPB S → A : IPB, {k,NAS}KBS A → V : IPB, {k,NAS}KBS V → B : {k,NAS}KBS B → S : NAS, reCB (mod n) S → B : rCdB (mod n) 58 Bob now unblinds CB by dividing out the blinding factor r. This token is unlinkable and can be used for interacting with Tor nodes. The token CB includes a random number, generated by Bob, to detect double-spending. Unlinkable tokens can now be used to request other services by embedding them in the request; for example, if victims can handle public-key mechanisms, Victor might use such a token to request bridge service from Bob. However, it is in prioritising anonymous service requests that blind tokens really come into their own. 3.4.4.4 Defining time using key rotation As noted earlier, we assume that new public signature-verification keys are announced for each future epoch. (It is possible to use an identity-based signature scheme by setting the public key for epoch i to be the value of i). We can simplify matters if we define a time interval as an epoch; this enables us to avoid using timestamps in token protocols (we’d prefer to avoid the complexity of using partially blinded signatures that contain timestamps). Changing public keys frequently also reduces the amount of state that must be retained to detect double spending, a known problem with blind payment systems. If tokens are used only by Tor nodes with high-quality network service, this is probably a reasonable simplifying measure. We note in passing that nodes have an incentive to pay attention to the signature traffic, to ensure that a cheater does not pass them a stale token. 3.4.4.5 Validation and value of tokens Tokens expire if they were issued using a signing key that was retired or revoked; a signing key may be deemed retired if it was first used a certain number of epochs ago. We propose that the number of epochs since the token’s signing key was first used is a deflator, which will decrease the value of a token. The formula used might be exponential, to represent a negative rate of interest; it is not clear that it matters all that much whether deflation is exponential or linear. Club secretaries can refresh tokens with new ones of an appropriately lower value, according to rules we will discuss later, and will reject double spend attempts. Expired tokens will also be recognised and rejected by all nodes, limiting the volume of data needed to detect double-spending. 3.4.5 Generating trust and reputation metrics Each secretary acts as a bank and keeps members’ accounts: each member’s balance is the amount of correctly performed identifiable service plus the number of tokens claimed or refreshed in each epoch. These balances act as a proxy for reputation. The balances also deflate over time, but significantly less quickly than tokens in circulation. For example, if a token loses 20% of its value at each epoch when in circulation, it might lose only 5% when 59 banked. Thus a member wanting to maximise its reputation has to do useful work and bank tokens promptly. The history of members’ bank balances may also be made available to other members; there are second-order issues here about the potential identifiability of members involved in particular campaigns, so whether the secretary publishes member account history or smoothed metrics derived from it would depend on the application. The overall effect is that the network as a whole can take a view on how much it trusts particular members. The more tokens Bob has in the bank, the higher his reputation. Note that there is little incentive for a group of colluding nodes to manipulate the reputation system by trading among themselves; they achieve nothing except to decrease their original endowment of tokens by the amount of time these tokens are not in a bank. Another advantage of using a rapidly depreciating currency is that we can use the reputation system itself as an indicator of member liveness. We want to avoid sending requests to offline nodes; yet if we contact each node directly to request proof of liveness (for example with an ICMP packet), we open up a denial-of-service attack where a malicious node saturates the system with liveness checks. A real-time reputation system may help us avoid this. 3.5 Discussion The system proposed here is concerned with enabling a group of users to maintain situational awareness in a censorship avoidance system. Each club of users can set up their own bridge authority maintained by a club secretary; an authority trusted by more users will become larger. In equilibrium we hope that clubs would settle each others’ tokens, just as banks clear each others’ notes. 3.5.1 Mitigating collusions and malicious members There are a number of possible ways in which members might behave improperly. Alice and Bob might collude, so that Bob pretends to service Sybil victims invented by Alice; Samantha can mitigate the risks of this to some extent by randomising the allocation of IP addresses to scouts, and running appropriate analytics. The worst case is if the target country’s intelligence service manages to hack Samantha’s computer; then it is game over. For that reason we want multiple competing clubs. We note that while some nation states have had multiple competing intelligence services, it is rarer to find multiple competing counter-intelligence organisations; perhaps our construction can find other uses. The next most severe attack might be if the target country’s intelligence service manages to subvert Alice and most of her fellow in-country members, and uses their bridge resources to make innocuous network connections, rather than blacklisting them, thereby denying the resources to censorship victims and denying both Bob and Samantha 60 knowledge that Alice has been subverted1. While a normal censor will act as a policeman and block bridge IP addresses, a more strategic adversary might prefer to leave them be and exhaust the resource. Detecting and defeating such attacks requires further channels of information. Ultimately we rely once more on the facts that there are multiple clubs, and multiple channels of communication between censorship victims and their family, friends or co-religionists in exile. 3.5.2 Mitigating Sybil attacks Members can claim tokens by fabricating victims, but those members end up “burning” their victims’ account balances (and their email identities) if they don’t use the identifiers. Recall that victims are assigned to members randomly, and Samantha can run analytics to determine which scouts and which bridges have outlying patterns of victims. Diagnosis is not always going to be straightforward; Samantha might suspect that some victims are bogus, or that some members refuse to service victims, but it may in fact be the case that some victim group cannot contact members due to censorship or DoS attacks. It does little good if multiple members collude to exchange each others’ identifiers; they end up burning their fabricated victims’ identities, as long as there are honest members acting correctly and serving genuine victims, and the secretaries assign victims (clients) to members (bridges) randomly while monitoring performance and correct operations. This restricts the victims’ ability to select members, which is similar to TorCoin, through its TorPath mechanism, where an “assignment server” assigns a Tor path to clients (we provide more detail in Section 3.6). Therefore, members are better off acting correctly. 3.5.3 Metrics for security economics In this proposal, we attempt to incentivise correct behaviour, to generate metrics to identify which nodes are most interconnected, and empower nodes to shift trust to other nodes. In other words, we facilitate the mechanisms required to democratise trust and power, by empowering participating nodes to vote (transact using tokens) for the node that deserves their trust. Moreover, by creating a system to generate useful metrics, this design can be used to facilitate research on the security economics of users’ interactions, inspired by [1, 45]. 1Such subversion might involve a national-scale malware implementation programme; see for example Gamma’s “Project Turkmenistan” disclosed on Wikileaks. 61 3.6 Related work In their paper On the economics of anonymity [1], the authors argue that the actions of interacting nodes in the network must be visible for informative decision-making about malicious behaviour. Through the trust and reputation metrics introduced in this chapter, we can now understand node interactions better while preserving privacy. A thorough and insightful discussion based on practical experience is provided in [45], whose authors state that in order to provide the mechanisms for verifiable transactions and reputation ratings in anonymity systems, they needed to retrofit appropriate metrics. In fact, those authors already wondered whether it might be possible to create a reputation currency that might “expire, or slowly lose value over time”. The redesigns that the authors discuss in [45] were originally introduced in [43] and [41]. In [43], the authors integrate into Mix networks (anonymous remailers such as Mixmaster [88]) the role of witnesses: semi-trusted nodes that act as referees to service rejection or abnormal behaviour (but this introduces multiple trust bottlenecks and can be abused if the witnesses are compromised). In [41], the authors decided to drop the witness construction. Network paths are constructed in cascades; if one node in the path fails, every node in the path is rated negatively. Without proof-of-service-failure to pinpoint the node responsible, this design is vulnerable to an adversary joining multiple correctly-operating cascades, perhaps in order to route traffic to other adversary-owned cascades (which would de-anonymise users). Similarly, Free Haven [44] uses reputation to reward correctly-operating servers that store data and fulfil their contracts. The reward is a higher reputation that allows a server to store its own data with other servers (so long as no issues occur when validating service contracts). In fact, Free Haven is one of the first designs to use reputation as a form of currency. Moreover, the stamp-trading system discussed in [90] suggested that reputation built through proofs of providing services can be used as a currency to facilitate node interaction. In the context of anonymity networks, there have been many proposals to rate and incentivise service-providing nodes (Tor relays). Most of these designs rely on bandwidth verification. For example, in [42], the authors suggest granting priority to the traffic of high-bandwidth nodes using gold stars. However, this can profile relays that also run clients by isolating the gold star traffic from the ordinary variety. More traditional approaches include XPay [35] and PAR [10] which use digital cash to reward relays. A more novel approach is discussed in BRAIDS2 [67] which employs a similar architecture to the one proposed in this chapter. BRAIDS uses partially blind signatures to embed timestamps in tickets. This allows the Directory Authority (DA) in a BRAIDS-enabled system to expire tickets based on timestamps. In contrast, we expire tokens by key rotation, which enables tokens to be unlinkable. BRAIDS nodes create relay-bound tickets that can be 2We initially designed our system without knowledge of BRAIDS then amended this chapter to refer to it, but did not set out to design an improvement to BRAIDS. 62 verified by the relay; this increases efficiency, but limits the freedom to transact, as a relay has to contact a DA to issue new tickets for other relays. If a Tor path includes a relay that refuses service, the node must create new relay-bound tickets (generating another request to DA), since the original tickets will not be accepted by another relay. In LIRA [68], the authors attempt to increase efficiency by introducing a probabilistic micropayment protocol into their design (in fact, the authors use a similar construction to MicroMint [104]). In rBridge [121], the authors aimed to reward users using a credit system based on how long information about a Tor bridge remains secret from an adversary (measured by its reachability and non-censorship). This credit system can then be used to obtain information about another bridge if the original bridge is censored. Other authors were inspired by cryptocurrencies: In [69], high-bandwidth relays are awarded with Shallots which are redeemable for Priority Passes that can be used to classify and prioritise Tor traffic. In TorCoin [58], TorPaths are used to verify paths constructed in Tor, and can be viewed as an enhancement for Tor’s bandwidth verification. Another approach that aims to preserve users’ privacy was proposed in [23] that uses Proof-of-Work shares as micropayments for relays. Relays can then submit those shares to a mining pool and claim rewards in bitcoins [95]. This provides anonymity for users but not for relays if they use the pure Bitcoin protocol. Trust and reputation metrics are used for various reasons. For example, Advogato [78] was used to create attack-resistant metrics to correctly rate user-generated content. Another prominent example is Google’s PageRank [99], which rates web pages based on how many other pages point to it and creates a reputation rating for how reliable a page is (essentially, the pointers to a page are votes for that particular page but are weighted recursively by their own reputation). A common problem with most proposals for incentivising Tor relays by using bandwidth verification schemes, or user-generated feedback, is that the authors do not discuss Sybil attacks which involve the adversary creating many circuits to their own relays to game the system. We attempt to mitigate this through the random assignment of victims to members, while monitoring performance to observe correct behaviour (a task given to a trusted secretary in our system). This is similar to the trusted role of “assignment servers” used in TorCoin (as discussed above); a TTP is responsible for assigning tasks and monitoring performance to evaluate members. A further issue is that an overlay on Tor that enables some club of users to enjoy priority service would risk a substantial decrease in the size of the anonymity set. In our proposal, a large number of clubs can each have a secretary acting as their own DA, and the secretaries can clear each others’ tokens. 63 3.7 Conclusion In this chapter, we sketched a design for an anonymous reputation system that can provide a quality-of-service overlay for an anonymity system like Tor or an open-membership system like Bitcoin [95]. Unlike most electronic trust services today, it has the right incentives for local and democratic trust management. Groups of users can each establish their own token currency to pay for forwarding services, and the nodes that work hardest can acquire the highest reputation, enabling them to get still more work. Groups can clear each others’ tokens. And finally, any group that fails to compete will find that its failure becomes evident; its users can desert it for other groups and it can just fade away. 3.8 Afternote After contacting lead developers of anonymity systems, we learned that the reservations they have for introducing payment and reward schemes to their systems stem mainly from non-technical reasons. The main concern is that introducing such schemes to purely voluntary systems would change the dynamics from nodes operating for the benefit of all others, to a predatory system that motivates adversaries to accumulate rewards while possibly producing damaging effects to these systems. Thus, we decided not to pursue the goal of deploying similar schemes to build trust and reputation metrics. However, we note that the system proposed in this chapter can alleviate network attacks discussed in Section 2.2.5, by ranking participating nodes and creating a reputation score for them in order to mitigate malicious activity. Moreover, proposals incentivising Tor bridges include using cryptocurrencies which we discussed earlier (e.g., as proposed in [23] using schemes similar to those used by Bitcoin mining pools), to reward nodes. Therefore, we focused next on investigating performance issues in cryptocurrencies that create critical bottlenecks and detrimental effects that undermine system performance, scalability and resilience. We discuss the details in the next chapter. 64 Chapter 4 Bitcoin stress testing We discuss in this chapter a few issues with Bitcoin’s throughput and performance. We analyse a spam campaign that occurred in July 2015, which crippled system performance by overwhelming the network with spam transactions. This resulted in Bitcoin transactions experiencing more delays in processing time (reducing the reliability and efficacy of the payment system), and incurring higher transaction fees to prioritise transactions (increasing the cost of participation in the network). This shows that Bitcoin requires more research to add resilience against undesirable content being included in the blockchain, which take valuable space within blocks away from non-malicious transactions waiting in limbo to be processed and finalised. The content of this chapter is adapted from our paper Stressing out: Bitcoin “Stress Testing”, by Khaled Baqer, Danny Yuxing Huang, Damon McCoy, and Nicholas Weaver, which appears in the Proceedings of the Third Workshop on Bitcoin and Blockchain Research (2016), affiliated with the International Conference on Financial Cryptography and Data Security [19]. 4.1 Summary We present an empirical study of a recent spam campaign (a “stress test”) that resulted in a DoS attack on Bitcoin; its goal was to understand the methods spammers used and the impact on Bitcoin users. To this end, we used a clustering-based method to detect spam transactions. We then validated the clustering results and generated a conservative estimate that 385,256 (23.41%) out of 1,645,667 total transactions were spam during the 10-day period at the peak of the campaign. We show the impact as increasing non-spam transaction fees from $0.11 to $0.17 (per kilobyte of transaction) on average, and increasing delays in processing non-spam transactions from 0.33 to 2.67 hours on average. We estimate the direct cost of this spam attack in terms of increased mining fees at $49,000; an attacker prepared to spend this much could DoS Bitcoin in 2015. 65 4.2 Introduction The Bitcoin network [95] was subjected to a major spam campaign during the summer of 2015 that caused degraded performance. The likely intent of the incident (advertised as a “stress test”) was to flood Bitcoin with spam transactions, in order to expose its vulnerability to spam and DoS attacks, and to garner support for a proposed change to increase the number of transactions that the Bitcoin network can verify, which was then approximately 3 transactions per second. DoS attacks against Bitcoin had been theorised. However, to date there has been little empirical analysis of DoS attacks launched directly against Bitcoin. 4.3 Background We already gave a high-level description of the Bitcoin protocol in Chapter 2. The main components of a transaction, relevant to our analysis here, are the transaction ID (txid), the inputs to the transaction (vin), and the outputs (vout). A transaction includes inputs that reference one or more unspent transaction outputs (UTXOs). Transactions are broadcast to peers in the Bitcoin P2P network, and the transaction propagates through the entire network in about 13 seconds [39]. Received transactions are checked and maintained in a node’s own local memory pool (Mempool), where they remain in limbo until confirmed and included in a block. Although a node may maintain unconfirmed transactions for a long period of time, memory pressure may cause a node to evict older entries from the Mempool if it grows sufficiently large. Nodes also maintain an unspent transaction output set to easily verify inputs to newly received transactions. Any increase in the UTXO set adds memory pressure on nodes which hold the UTXO set in RAM. Unlike with the Mempool, this pressure cannot be relieved by eviction, but requires changing the node’s implementation. 4.3.1 Transaction priority In the reference implementation, a Bitcoin miner calculates a transaction priority P and uses this to determine which transactions to include in the block. To calculate P , the node considers all inputs to the transaction as well as its size. P is defined in Bitcoin as∑n i=0(valuei × agei) ÷ S, where n is the number of inputs to the transaction, value is the value of input i (in Satoshis1), age is defined as the difference between the current block’s height and the input’s block height, and S is the transaction’s size. The value of P determines a transaction’s fate, and there are three possibilities: 11 Satoshi = 10−8 BTC. 66 1. Include transactions in the high-priority section of a block (50 KB); no transaction fee is necessary. The following conditions must be satisfied, the transaction must be: • smaller than 1 KB • all output values are at least 0.01 BTC • P is high as determined by valuei and agei 2. Transactions that pay fees are prioritised by highest mBTC per KB. 3. The remaining transactions are maintained in the Mempool until one of the two conditions above is satisfied. In the latter case, age is the determining factor for P since everything else is constant. It is of particular note that miners prioritise for higher fees. 4.3.2 DoS targets inherent in Bitcoin Spam can be detrimental to the Bitcoin network by outcompeting legitimate transactions for inclusion in a block, delaying other transactions. We define the following types of spam: 1. Fan-out: Transactions that split a few inputs into many outputs occupy space in the blocks and also increase the UTXO set. 2. Fan-in: Transactions which absorb a large number of inputs reduce the UTXO set but still occupy substantial space in the blocks. 3. Dust output: Transactions that create very small “dust” outputs convey a trivially small amount of value but occupy the same amount of resources in the network. The spam campaigns in the “stress test” target one or more aspects of the Bitcoin environment, including the block size limit, the UTXO set, and the computational cost of verification. All these limited resources represent potential targets. The primary publicly-stated motivation behind the stress test campaign was to provide a justification for raising the Bitcoin block size limit before organic demand limits the ability of Bitcoin to process payments. The current Bitcoin block size of 1 MB globally supports less than 3 Bitcoin transactions per second. Since this is three orders of magnitude lower than Visa’s sustained rate of 150M transactions per day (and peak processing ability of 24,000 transactions per second) [115], it was argued that Bitcoin payment processing was insufficient to meet the ambitions of the Bitcoin community. The spammers claimed that they wanted to demonstrate the impact of this limit by squeezing out normal transactions. 67 Raising the block size, however, opens up a different DoS vulnerability: a long-term growth DoS on the Blockchain itself. Since the Blockchain records all previous transactions, an attacker could perform low-fee transactions simply to take up space. Thus if Bitcoin raised the block limit to 20 MB, and an attacker can cheaply consume 10 MB of data per block, this would cause the Blockchain to increase in size by half a terabyte a year. Since valid transactions can only spend unspent outputs, most full Bitcoin nodes keep the UTXO set in memory to speed validation. The memory requirements for the UTXO set are solely based on the number of unspent outputs, so the inclusion of dust outputs in the stress test adds memory pressure to the UTXO set. A better-designed Bitcoin node should not have this vulnerability. Another DoS attack occurred on October 7 and 8, which also put a significant amount of pressure on the Mempool memory, raising the Mempool to nearly a GB, with a transaction backlog of nearly a week. Since a large number of nodes run on Raspberry Pi devices and other constrained systems, this large Mempool managed to crash over 10% of all Bitcoin nodes2. Most of the spam itself, however, was of low priority. Such spam does not put pressure on block inclusion, but neither does it cost the spammer any bitcoins; transactions that are never confirmed do not incur a cost for the sender. An inadvertent CPU DoS occurred due to a mining-pool’s “cleanup” block, a single 1 MB transaction that served to remove a massive number of unspent transactions sent to crackable “Brain wallet” addresses (which use a passphrase, instead of private keys, to create Bitcoin addresses and spend bitcoins). Other nodes required substantial CPU time to validate this block, as the current implementation required O(n2) time to validate a transaction. There may be other CPU DoS possibilities inherent in the Bitcoin protocol that attackers can exploit. Another DoS is inherent in “transaction malleability”. Someone can take a valid transaction, permute it so it has a different txid, and broadcast that modified transaction to the network. If the attacker’s transaction is accepted into the blockchain, this can disrupt wallet services, hardware wallets, and other systems tracking txids to determine when a transaction commits to the blockchain. Recently, an attacker performed this DoS “because I am able to do it.”3 Finally, a later (failed) spam campaign attempted to flood the network with invalid transactions, perhaps intending either a traffic DoS or a CPU DoS. The “money drop”, a public release of private keys by one of the purported instigators of the stress test, seems intended to cause a big race which would cause a large number of “double-spend” transactions. This did not produce a meaningful disruption of the network, although it was probably intended to introduce computational load. 2https://www.reddit.com/r/Bitcoin/comments/3ny3tw/with_a_1gb_mempool_1000_nodes_are_ now_down 3https://bitcointalk.org/index.php?topic=1198032.msg12579271 68 One aspect not encountered during the stress test was the effect of filtering valid but spammy transactions. The introduction of spam filters, if an unknown attacker continued a longer-term DoS attempt, could in itself be a DoS. If the attacker adapts to the filters, eventually the filters will either fail to stop the spam or incur false positives. Even a small false positive rate might be disruptive: could a payment network tolerate a 1-2% transaction failure rate due to spam filters? 4.4 Data collection In our study, we set up a server connected to a public-facing network. We installed Bitcoin Core 0.11 and kept it running between June 19 and September 23, 2015. We collected three main data sets using Bitcoin daemon’s JSON-RPC interface. 1. Bitcoin Blockchain: On September 23, we downloaded the entire blockchain using the getblock and getrawtransaction methods. This returned details for all blocks and transactions, such as the timestamps of blocks, the timestamps at which we received the transactions, the number of transaction inputs and outputs, as well as the input and output amount. We stored the data as plain-text JSON strings. As a result, the total data size is 350 GB. 2. Mempool: Between June 19 and September 23, the getrawMempool method was invoked every minute. This returned a list of unconfirmed txids currently in the Mempool. These would be either committed to the blockchain or later discarded by the P2P network. We saved this list of txids, along with the timestamp of the RPC call, on the Hadoop file system. During this period, we captured 12 million distinct txids in the Mempool, amounting to 250 GB of plain-text data. 3. Unconfirmed transactions: For every unconfirmed transaction that we obtained, we immediately looked up the transaction details using the getrawtransaction method, since the Mempool could discard the transaction at any moment. To optimise for speed and storage, we ignored transactions we had previously seen. Finally, we saved all the transaction details, along with the data collection timestamp, on Hadoop. Between June 19 and September 23, we captured 1.3 TB of unconfirmed transactions in plain text. The total volume of data collected is 2 TB, which we saved as plain-text JSON strings on the Hadoop file system and analysed with Spark. We summarise our data sets in Table 4.1. As we collected data using only a single node, our perspective of the P2P network, and thus the transactions in the Mempool, is potentially biased. In particular, network 69 Table 4.1: Data sets. All data sets cover a period between June 19 and September 23. Data Period Size Blockchain Between Jan 9, 2009 and Sept 23, 2015 350 GB Memory pool Between June 19 and Sept 23, 2015 250 GB Unconfirmed transactions Between June 19 and Sept 23, 2015 1.3 TB Table 4.2: Transaction features Feature Notation Description Inputs I Number of inputs Outputs O Number of outputs Ratio R I ÷ O Priority P Value-weighted measurement Size S Size (bytes) Size and Ratio S × R Emphasise fan-in and fan-out Fees F Value of unclaimed outputs Coin days destroyed CDD Coin age and spending velocity Value V Total output value Fees to values ratio F ÷ V Emphasise fee differences propagation takes time. For transactions in the Mempool, the timestamps that we observed may be later than the originating timestamps. Furthermore, whether a transaction is relayed is up to individual nodes. A transaction created a few hops away is not guaranteed to reach our node. It is, however, beyond the scope of this research to adjust for such biases. We assume that our observation of the network is largely consistent with the rest of the network. 4.5 Spam clustering We use an unsupervised machine learning method, k-means clustering, to find similarities and evaluate our findings. This is not necessarily a perfect filter, but, as we manually verify, this does efficiently detect the spam transactions in the “stress test”. To use k -means clustering, we create a multi-dimensional vector representing features of a Bitcoin transaction. We include in Table 4.2 the list of features and follow up with defining features that were not previously discussed. R is necessary to highlight the difference between fan-in and fan-out transactions. We further highlight this difference by multiplying the size of the transaction by its ratio (otherwise, transactions with clear differences in R are clustered together based on similarities in S). We include another property to highlight the velocity of spending 70 bitcoins represented as CDD4. This feature gives more weight to older coins, and can be calculated as ∑n i=0(valuei × agei). Unlike P , CDD does not consider S, age is measured in number of days rather than blocks (an estimate of 144 blocks are produced each day), and value is in bitcoins. 4.5.1 Methodology Since spam campaigns may not link transactions and addresses, parsing the blockchain to look for linked transactions might be a futile process. Our approach is different: we cluster transactions based on their motifs (trends in the Bitcoin network), and disregard transactions’ identifying information (output addresses, txid, etc.). Our main assumptions at this stage echo those required for machine learning algorithms: a pattern exists, we cannot mathematically point out differences in patterns (without data visibility), and we have a large trove of data to show the patterns exist. We assume motifs do exist because spam requires construction in-bulk to have a measurable effect on the network. Thus spammers naturally create large numbers of transactions that “look similar”. We also expect that such groups of transactions may have different motifs compared with normal Bitcoin behaviour, since spammers want to minimise the cost and maximise the impact, producing different types of transactions (e.g., very high fan-out or dust output) that particularly stress the network. What we seek is a high-level interpretation of the data into distinct clusters that we can then use to label transactions as spam and validate our results. Thus, to investigate our main goal of identifying spam motifs, we consider the entire Bitcoin network as an entity, rather than analysing features of a transaction independently from network norms. The latter process relies heavily on what features should be considered to identify spam, which might assign more weight to some features while disregarding others that are more influential. We use k -means clustering, as provided in Spark’s machine learning library (MLlib). k -means clustering is a type of machine learning algorithm for unsupervised learning. This algorithm is particularly useful to cluster similar data together when it is non-trivial to define similarity using the unlabelled data. Similarity of vectorised data is determined using k -means by minimising the Within-Cluster Sum of Squares (WCSS); the data is matched to the cluster centroid with the closest mean. The following equation is used to iterate over the data to get optimal cluster centroids in order to minimise WCSS: min ∑k i=1 ∑n x∈Si ‖x− µi‖2, where k is the number of clusters, x is the data element (in vector form), Si is the set containing n elements, and µi is the mean of Si (i.e., the mean of all the elements in vector form that are contained in Si). 4This feature is used by Bitcoin block explorers, see for example: https://blockr.io 71 To reproduce the results discussed in this chapter, the following properties of k-means must be considered: the number of clusters k was set to 10, the number of maxIterations was set to 100, and initializationMode was set to random. The silhouette coefficient measures the homogeneity of the data in a cluster. This is performed by measuring the average dissimilarity (defined in terms of distance between data elements) between a given element within its cluster, and comparing the result with the average dissimilarity between that same element and elements of another cluster considered to be the next best-fit. However, in our case, our aim is to show general transaction motifs, rather than to show detailed transaction differences or find anomalies. We arrive at k = 10 after testing multiple values for k to show enough visibility of transaction patterns. If we choose k = 11 for example, we obtain a new cluster where the average of transaction outputs is 8 rather than 11 (as shown in cluster 9 in Table 4.3). Instead, we accept that the clustering algorithm groups these transactions together in cluster 9, given that they are similar in other features. Conversely, with k < 10, clusters contain transactions that differ in most of their features; this does not enable us to inspect the clusters to easily determine which of them fit our definitions of spam. With k = 10, we see the “outliers” visible in a dedicated cluster (cluster 8 in Table 4.3), whereas with k < 10 these outliers are included in other clusters that do not match well. The initial step for processing data was weeding out some transactions that alter the clustering results. To set a starting point, we create two checks to filter transactions. First, we check if the transaction creates dust output (we explain this check in detail later). The second check determines if the transaction’s fan-out ratio is unusual (a threshold is set at 0.3). The rationale for these two checks is as follows: If a fan-in transaction creates dust output, then it qualifies as spam, otherwise it is minimising the set of UTXOs that must be maintained to verify transactions. Moreover, if a fan-out is unusual, this is enough to qualify a transaction for clustering, and we later determine if the transaction is spam by inspecting clustering results, and checking for dust outputs in clusters that seem to contain normal transactions. We analyse confirmed transactions that occurred between June 24 and July 17, 2015. The total number of transactions in this epoch is 3,321,429. To obtain k-means clusters, we perform k-means training on all transactions that were confirmed during the July spam campaign epoch, that occurred between July 7 and 17, the total number of transactions in this training epoch is 1,645,667. Using the cluster centroids from the spam epoch, we analyse the pre-spam epoch to validate our results. 4.5.2 Results and motifs We now discuss motifs found in more than 1.6M transactions that occurred during the spam epoch. Table 4.3 shows each cluster centroid’s features. As discussed earlier, these 72 Table 4.3: Cluster centroids (confirmed transactions) C TXs I O R P S F CDD V 0 48K 1.35 46 0.06 0.74 1.8K 0.0004 0.195 4.06 1 28 4.4K 1 4.4K 0.001 645K 0.04 0.06 0.0 2 896 106 1 103 0.17 16K 0.001 0.34 0.13 3 20 1.1K 1 1.1K 0.0008 162K 0.01 0.012 0.0 4 13.5K 31 1 31 0.04 4.7K 0.0002 0.02 0.006 5 16 1.4 13 0.15 535K 668 0.0004 25K 1K 6 9.5K 20 17 19 0.4 3.5K 0.0004 0.14 1.4 7 425K 1.1 2 0.8 1 224 0.0001 0.022 1.43 8 2 1 19 0.05 136M 787 0.0002 740K 3K 9 117K 1.2 11 0.14 72.43 561 0.0002 2.7 6.5 Table 4.4: Standard deviation of selected features (confirmed transactions) C I O R P S F CDD V 0 4 104 0.77 27 3.6 0.002 17 40 1 1.2K 0 1.2K 0 176 0.012 0.05 0 2 43 0.2 35 2 6 0.0005 4 1.8 3 403 0 403 0 60 0.004 0.02 0 4 8 0.1 8 0.8 1.2 0.0002 0.5 0.24 5 1 7 0.1 0.35M 0.38 0.0001 26K 1.2K 6 2 0.4 2 1.65 0.35 0.0002 0.5 4 7 0.4 0.9 0.4 9 0.1 0.0002 0.2 15 8 0.0 0.5 0 3M 0.02 0 0.2M 748 9 0.5 6 0.2 2K 0.2 0.9µ 177 70 centroids are the result of optimising WCSS, and are represented as the means of the values of all transactions in the corresponding cluster. Table 4.4 shows the standard deviation of the cluster centroids5. 1. Fan-in. Clusters 2 and 4 include about 14K fan-in transactions. The pattern is distinct: large I and one O (in rare cases O is for two addresses). The transactions vary in S due to variations in I, and a notable distinction is in CDD. Cluster 2 includes larger values for CDD, which indicates that the inputs are not used for rapid transfer of value. Moreover, these transactions may not have been used as spam per se, but are rather part of tumblers or mixers where a large number of inputs are collated into single outputs and the chain continues, in order to mix coins together and obtain relatively better privacy. These transactions involve long chains of many inputs to a single address, the last address then transfers funds to multiple outputs in fan-out transactions, and so on. A large number of fan-in transactions impact the Mempool, but minimise the UTXO set. 2. Fan-out. The fan-out pattern involves one or two addresses sending funds to many 5The notation used in the tables corresponds to the notation used for the transaction features defined earlier. Note that both tables include rounded values, while attempting to maintain distinctions for small values with the minimum amount of rounding necessary. For better presentation, we omit some features. 73 addresses, as shown in Clusters 0, 5, 8 and 9; the total number of transactions in these clusters is about 165K. These transactions increase the UTXO set. This pattern was dominant in the clustering results; it resulted in multiple clusters for fan-out transactions that differ in features other than R. A low value for CDD indicates a fast movement of coins. Note that Cluster 0 includes transactions that have a single address sending small amounts to more than 3K addresses. 3. Unable-to-decode. With 425K transactions, Cluster 7 includes the largest num- ber of transactions. The distinct feature of most of these transactions is a one-to-one mapping: one address sending to a single output that cannot be decoded. Moreover, the fees paid for these transactions (which are collected by miners since the output cannot be decoded) equal the default fee value of 0.1 mBTC per KB. Another feature of this cluster is the zero value for CDD (and low P ), which indicates rapid movement of bitcoins. 4. Dust. The final motif of the analysed spam campaign is the dust transactions we had previously discussed. Cluster 7 contains non-spam transactions; normal transactions are matched to this cluster since they look similar to unable-to-decode transactions (low values for most features). It is not straightforward to visually inspect the cluster samples and determine if they are indeed spam. Therefore, we parse the transactions in this cluster to determine which of them fit our definition of dust spam. We explain in a later section how we parse the results to find dust spam transactions. 5. UTXO cleanup. Clusters 1 and 3 include “clean-up” transactions, created by miners to collate spam transactions to minimise the UTXO, thereby decreasing the spam impact on the network. The output addresses value of these transactions may be zero, meaning that all the inputs are collected as fees by the miner who includes the transaction in a block. Clean-up transactions include “Brain wallet” addresses (discussed earlier). These two clusters are not categorised as spam, and the transactions are a consequence of the spam campaign. The number of inputs to these transactions range between 1K and 5K (resulting in a large standard deviation). Note that clusters 5 and 8 contain few transactions due to their unusually high P . Cluster 8, which contains only two transactions, is indeed interesting and earns its unique cluster: along with high P , the values of these transactions are around 2,500 and 3,995 bitcoins (that is almost $0.6M and $0.96M in respectively). Both transactions include a generous fee of 0.002 BTC. In summary Clusters 0, 2, 4, 6, 7, and 9 correspond to our definition of Bitcoin spam, including dust transactions and unusual ratios, while clusters 1 and 3 are a consequence of spam and not-spam motifs. 74 4.5.3 Validation It is important to note that we lack an external source to create ground truth for our results. Without a labelled data set, or a third-party spam list, we cannot measure the clustering results to be spam more accurately than matching the results to our definitions of spam. In order to find dust transactions, we check if P is low (less than 57M) and whether the transaction creates any outputs of 0.1 mBTC (about $0.02), which is the default fee value. We consider this a conservative estimate of the dust transactions involved in the spam campaign, and at the same time we consider the 0.01 BTC normally involved in dust checks to be too large. We also applied clustering to transactions that occurred in the pre-spam epoch6, between June 24 and July 7 (after filtering for dust and unusual ratios). The results are discussed in the next section, where we see a difference in the intensity of motifs before and during the spam epoch. This validates our clustering results: we find that the centroids obtained from training k-means, using the spam epoch data, can also detect spam patterns in non-spam epochs. 4.6 Impact on Bitcoin We now describe the effects of spam campaigns on the Bitcoin network, especially on users who send non-spam transactions, and the miners. For the users, we measure the change in transaction fees and transaction delays (i.e., the time between when we first observe a transaction in the Mempool and when the transaction is committed to the blockchain). A large amount of spam is likely to increase the backlog of unconfirmed transactions. As a result, transactions are delayed for longer time periods. With more intense competition, senders pay higher fees, in the hope that their transactions will be included sooner. For the miners, we measure the corresponding increase in the block reward. Figure 4.1 shows the clustering results in the non-spam and spam epochs. Note that in the pre-spam epoch (before July 7), clustering results show Cluster 1 transactions (UTXO-cleanup motif). This does not mean that miners were cleaning up spam; these transactions are similar to UTXO-cleanup transactions in terms of high I and low O, and similar P values. To highlight periods of the spam campaign, we measure the number of unconfirmed transactions in the Mempool, which indicates the amount of backlog in the network. Every minute, we take a snapshot of the Mempool and count the number of unconfirmed transactions. We take the average on a daily basis and plot the result in Figure 4.2. 6We used the cluster centroids obtained from clustering transactions in the spam epoch to cluster transactions in the non-spam epoch. 75 20 15 -0 6- 30 20 15 -0 7- 05 20 15 -0 7- 10 20 15 -0 7- 15 0 50000 100000 150000 200000 250000 Nu m be r o f t ra ns ac tio ns in b lo ck ch ai n not identified as spam cluster 0 cluster 1 cluster 2 cluster 3 cluster 4 cluster 5 cluster 6 cluster 7 cluster 9 Figure 4.1: A stacked bar chart showing the number of transactions per day in the blockchain. Note that the spam period is from July 7 to 17. Each major spike in the graph refers to a period of significant backlog. The first spike, which happened between July 7 and 17, corresponds to the spam campaign in our study. There are sporadic spikes between July and August, but we do not have sufficient insight on the cause. Finally, a spike appeared around September 13, when an anonymous group conducted another stress-test on the network with their “money drop” (as discussed earlier). As a result, a large number of transactions were created to compete for the free bitcoins, although only a few of them would be included in the blockchain eventually. Such a deluge of transactions caused the second backlog in Figure 4.2. We do not, however, consider these transactions to be spam. Focusing on the mid-July spam period, we next examine the number of transactions that were committed to the blockchain. We are interested in how each block allocates its scarce 1 MB of real-estate space to spammers and non-spammers. As shown in Figure 4.3, the number of transactions surged during the spam epoch. Between a quarter to half of the daily transactions have been identified as spam. As a baseline comparison, we also show that the number of spam transactions before the spam period is significantly lower, with the exception of June 30. Based on anecdotal evidence, some users were attempting to stress-test the Bitcoin network on a small scale, which resulted in a brief rise in spam7. 7http://motherboard.vice.com/read/wikileaks-is-now-a-target-in-the- massive-spam-attack-on-bitcoin 76 20 15 -0 6- 21 20 15 -0 7- 05 20 15 -0 7- 19 20 15 -0 8- 02 20 15 -0 8- 16 20 15 -0 8- 30 20 15 -0 9- 13 0 50000 100000 150000 200000 Av er ag e nu m be r o f t ra ns ac tio ns in m em po ol e ve ry d ay Figure 4.2: The average number of unconfirmed transactions in the Mempool every day. 20 15 -0 6- 30 20 15 -0 7- 05 20 15 -0 7- 10 20 15 -0 7- 15 0 10 20 30 40 50 60 70 To ta l t ra ns ac tio n fe es pe r d ay (B TC ) not identified as spam identified as spam Figure 4.3: A stacked bar chart showing the total amount of transaction fees every day. 77 For the non-spammers, the spam period was a time when both transaction fees and delays were higher than normal. We show the comparison in Figures 4.4 and 4.5. On average, the delays in processing non-spam transactions increases by 7 times, from 0.33 to 2.67 hours. Likewise, the average non-spam transaction fees also surged, increasing from 45 to 68 Satoshis for every byte of transactions (or from $0.11 to $0.17 per kilobyte of transactions)–an uptick of 51%. While non-spammers suffered, miners slightly benefit from the fee hike. As shown in Figure 4.3, miners were earning twice their normal fee-based revenue during the mid- July spam period, as compared with the non-spam period. However, even on days with maximum fees, the amount of extra mining income from fees was less than 1% of the block reward (which is 25 BTC per block, and about 3,600 BTC per day). The total transaction fees that spam transactions paid amounted to 201 BTC (or about $49,000) over a 10-day time period, a modest sum that caused a rather noticeable disruption to the network. 4.7 Discussion The spam campaign happened when the Bitcoin network was divided on a critical com- ponent of the protocol: the block size limit of 1 MB. The result was a fork between two camps: some want to raise the limit while others refuse to alter the rules set by Satoshi Nakamoto (Bitcoin’s creator). We can speculate that the spammer was motivated to launch a DoS attack to demonstrate the fragility of Bitcoin’s resilience if the block size limit is not raised. This is supported by an earlier spam campaign, where an online Bitcoin wallet service claimed responsibility under the pretext of “stress testing”. With regards to the methodology proposed in this chapter, we do not suggest that this model can be used to prevent Bitcoin spam completely, nor should it be used as such. It was used to measure and analyse spam after the fact, without the spammers being aware that they are being measured. Spammers can learn from this research what heuristics and features we used, to alter their motifs and adapt accordingly. Although we used dust checks to validate our results, this is not a foolproof measurement to accurately validate spam in Bitcoin. It is trivial to create a transaction that does not generate any dust outputs. However, if a transaction does not create dust, then the clustering algorithm matches that transaction to a cluster that highlights other features, particularly differences in ratio. We use dust validation when there is a possibility a cluster contains normal transactions. Since we defined unusual fan-out ratios to constitute spam transactions (and they are gathered in distinct clusters), along with our conservative measurement of dust, we believe that the results we had shown earlier provide a good estimate of the July spam campaign. 78 20 15 -0 6- 23 20 15 -0 6- 30 20 15 -0 7- 07 20 15 -0 7- 14 20 15 -0 7- 21 20 15 -0 7- 28 10-1 100 101 102 Av er ag e tr an sa ct io n de la y pe r d ay (H ou rs ) identified as spam not identified as spam Figure 4.4: Average transaction delay between when a transaction appears in the Mempool and when it is committed to the blockchain. 20 15 -0 6- 23 20 15 -0 6- 30 20 15 -0 7- 07 20 15 -0 7- 14 20 15 -0 7- 21 20 15 -0 7- 28 0 20 40 60 80 100 120 Av er ag e tr an sa ct io n fe es pe r d ay (S at os hi s/ by te ) identified as spam not identified as spam Figure 4.5: Average transaction fees per transaction per day. Note that the fees are normalised against the size of each transaction. 79 It is important to note that the July 2015 spam campaign on Bitcoin would be infeasible on many altcoins8 since they may use a different model for transaction fees. For example, Litecoin charges the mintxfee for each small output. Bitcoin could adopt a similar model, or a dynamic model for fees, possibly using clustering results to observe spam patterns, and change mintxfee accordingly. 4.8 Related work In their Bitcoin SoK paper [28], Bonneau et al. highlight open research questions and discuss issues with Bitcoin regarding stability and scalability, ranging from Bitcoin forks and network analysis to incentivising correct behaviour and adding resilience with proposed changes. The authors highlight penny flooding, as discussed in [91], which is related to our discussion of dust transactions. In the Appendix of the extended version of their SoK paper, the authors discuss in more detail Bitcoin’s stability and transaction validity. The extended version includes a discussion outlining options to overcome the drawbacks of maintaining the entire UTXO to process new transactions: using a statefile, updated incrementally with new data, it is possible to more efficiently retrieve transactions for verification using a transaction’s hash in O(log M), where M is the number of unspent transactions. It also possible to further minimise the data structure required to validate transactions using hash-based authenticated data structures as proposed in [81]. Becker et al. [22] discuss the possibility of denying service to the Bitcoin network using a virtual protest: protestors join forces to collectively execute a DoS attack by overwhelming the network and depleting precious block space (using transactions that are much larger than normal Bitcoin transactions). If a protest is underway, this could frustrate non-protestors and decrease faith in the resilience of Bitcoin to process transactions in a timely manner (Bitcoin’s main features include processing payments in minutes, as well as low transaction fees). This attack was called “Occupy Bitcoin” by Kroll et al. [74]. Our clustering approach is different from previous work aiming to find patterns in Bitcoin: we strip away identifying information (such as txid, addresses, etc.), and cluster transaction features in order to determine patterns, rather than linking transactions together to de-anonymise users. For example, Meiklejohn et al. [85] cluster transactions based on determining shared authority properties, while using transaction features similar to those used here. Other empirical research on detrimental effects on the Bitcoin network measures the lifetime of Bitcoin exchanges [89], through analysing daily transaction volumes and how exchange breaches affect survival time. Running a profitable exchange logically results in a 8Alternative cryptocurrencies (called altcoins) are similar to Bitcoin that utilise different protocols and have their own blockchains. 80 more lucrative target, hence a breach is more likely which leads to the eventual shutdown of an exchange. Another approach is to parse online forums to obtain data indicators of possible attacks on the network, as was done in [117]. 4.9 Conclusion This chapter presented an empirical study of a spam based “stress test” DoS attack against Bitcoin. Using our clustering based approach we find that 385,256 (23.41%) out of 1,645,667 total Bitcoin transactions were spam during a 10-day period at the peak of the spam campaign. We also show that this attack had a negative impact on non-spam transactions, increasing average fees by 51% (from 45 to 68 Satoshis/byte) and processing delay by 7 times (from 0.33 to 2.67 hours). This shows that an adversary who is willing to spend modest amounts of bitcoin (we estimated $49,000) to pay higher fees, can DoS Bitcoin. Follow up DoS attacks against Bitcoin have used other methods, such as “money drops”, and transaction malleability to degrade the operation of Bitcoin. We point out that changes to Bitcoin’s minimum fees could mitigate some of the spam motifs we witnessed. Our results show that exploration into Bitcoin transaction spam filtering techniques, and other Bitcoin DoS mitigation approaches, merit further investigation. 81 82 Chapter 5 Offline payment protocols In this chapter, we include the details for the offline payment protocol we created for a project, called DigiTally, in which we tried to use short message authentication protocols to produce a payment protocol that can be used offline to finalise payments. The motivation was to create a resilient payment protocol, that is not susceptible to issues raised in Chapter 2, and does not require network coverage to exchange messages with a centralised online server. To do that, we designed the protocol to preserve the integrity of transactions, by creating short authentication codes that can be exchanged between participants. Moreover, similar to the social trust proposal (Chapter 3) and micropayment schemes (Section 2.4.2), we aim to provide means for decentralised verification of payments to increase performance and reduce points of failure. The content of this chapter is adapted from our paper SMAPs: Short Message Au- thentication Protocols, by Khaled Baqer, Johann Bezuidenhoudt, Ross Anderson, and Markus Kuhn, which appears in the Proceedings of the International Workshop on Security Protocols 2016 [18]. 5.1 Summary There is a long history of authentication protocols designed for ease of human use, which rely on users copying a short string of digits. Historical examples include telex test keys and early nuclear firing codes; familiar modern examples include prepayment meter codes and the 3-digit card verification values used in online shopping. In this chapter, we show how security protocols that are designed for human readability and interaction can fail to provide adequate protection against simple attacks. To illustrate the problem, we discuss an offline payment protocol and explain various problems. We work through multiple iterations, or “evolutions”, of the protocol in order to get better tradeoffs between security and usability. We discuss the limitation of verifying such protocols using BAN logic. 83 Our aim is to develop human-friendly protocols that can be used in constrained offline environments. We conclude that protocol designers need to be good curators of security state, and also pay attention to the interaction between online and offline functions. We suggest that delay-tolerant networking might be a future direction of evolution for protocol research. 5.2 Introduction Mobile payment systems have transformed the life of people in less developed countries (LDCs), bringing a convenient means of exchange and store of value to people living far from any conventional banking service. For rural people, phone payments enable everything from pensions to farm subsidies to be paid directly and efficiently, reducing the possibility for corruption and extortion. Even for people living in large cities, phone payments have greatly cut the cost of financial services. A further key application is remittances: migrant workers who have moved from the countryside to the city can send cash home to relatives. However, the currently deployed systems, such as M-Pesa in Kenya, use SMS or USSD as their carrier, and they stop when the network does. This leaves hundreds of millions of the world’s very poorest people stranded. Living off-network in mountains, forests and small islands, they still have to use cash for transactions in their village, and are fully exposed to the depredations of dishonest officials. There are four major limitations of current phone payments. First, both merchant and customer have to be online, so they don’t work in areas without network coverage. Second, the cost of the SMS is unnecessary when half the customer’s transactions are with the same village merchant. Third, existing systems are mostly locked in to a particular telco and bank as they rely on SMS or USSD. This lock-in increases costs and prevents inter-scheme payments, which is a problem for migrant workers. Fourth, most payment systems operate online. Resilience is not considered thoroughly, at least not beyond providing redundant servers to process requests. When networks go down because of power cuts or congestion, or rural networks close at night because base stations depend on solar panels and have no batteries, payment services cease. Electronic purses for offline payments exist in the academic literature, and are also fielded, whether as standalone systems or EMV extensions (e.g., Net1 [3] and Germany’s Geldkarte1). However, they have not been implemented by LDC mobile providers, who can install SIM-toolkit applets. There are both engineering and business issues; we are less concerned with the latter here, as they vary widely between countries. The engineering issue is that existing purse systems are designed for complex messaging between the purses (e.g., between the purse in a smartcard and another purse embedded 1https://en.wikipedia.org/wiki/Geldkarte 84 in a parking meter or ATM) while phone-to-phone payments conducted in the absence of a network or direct link (NFC, Bluetooth, infrared, etc.) must by default rely on users copying numbers between their phones. The protocols must therefore be redesigned for usability, which means minimising the number of digits that need to be typed while supporting robust error handling and recovery. In this chapter, we show how security protocols that are designed for human readability and interaction can end up vulnerable to simple attacks. To illustrate the problem, we discuss our attempt to design an offline payment protocol, and the various problems we had to deal with. We present multiple iterations, or “evolutions”, that the protocol went through as we sought to get reasonable tradeoffs between security and usability. Our target demographic is phone users in LDCs, many of whom are illiterate. 5.3 System model The concept of operations is similar to existing phone payments, except that payments also work when the phone is offline. Sam is a GSM SIM card issuer who may also operate the offline payment system described in this chapter, which allows payments between SIM purses in areas with intermittent network coverage. In the following examples, Alice will play the role of a paying customer and Bob will be the shop owner receiving payment, but these roles can be reversed. Sam issues SIMA to Alice and SIMB to Bob. The SIMs of Alice and Bob interact with each other only via human decimal number entry and comparison. Therefore, this protocol has to ensure that SIMA and SIMB agree on their transaction history, in spite of the low entropy of the manually-executed challenge– response round trips. The SIMs send records of past payments back to Sam whenever they detect network coverage, for reconciliation between their bank accounts. In some applications, the network may be intermittent rather than absent, and so our protocols enable payment networks to support delay-tolerant authentication (see Section 5.5.2). Where both Alice and Bob have smartphones that can communicate directly (Bluetooth, Wi-Fi, NFC, etc.), a very capable offline payment system can be built with conventional tools. However, many poor people use simple GSM phones that cannot communicate with each other directly in the absence of a GSM network. Then the only way for SIM-toolkit applets to communicate across phones is that their users copy sequences of displayed decimal digits. Numerical transactions are already familiar from other systems such as airtime purchases and prepayment utility meters. It is well known, for example, that even illiterate people can cope with twenty-digit sequences provided these are arranged as five groups of four digits, which is enough for a 64-bit ciphertext, and is used in prepayment meters [8]. 85 5.4 Design evolution The following setup is required in each iteration of the protocol. SIMA and SIMB are identified in the protocols by unique, human-recognisable decimal numbers A and B respectively, typically their phone numbers. Sam embeds an individual symmetric 128-bit private key KA into SIMA and KB into SIMB, which are tamper-resistant to some extent. (Sam could generate these keys from a master key KS using a key-derivation function, such as KA = hKS(A), KB = hKS(B)). These per-card keys are each known only to Sam and to one single SIM card. 5.4.1 Basic protocol The basic payment protocol proceeds as follows2: 1. Alice agrees to pay Bob X and each of them enters both this amount and the other party’s phone number into their phones. 2. Bob chooses a 4-digit nonce NB and forms a 4-digit MAC C (using the shared secret key K) of B and X. He tells Alice the values (NB, C) where C = MACK(B,A,X,NB) mod 10 4 (5.1) with MAC being a 64-bit message-authentication-code function. 3. Alice verifies C. She now believes that she and Bob agree on the amount X and the payer and payee phone numbers A and B (with probability 0.999). 4. Alice authorises the transaction by entering her PIN; her purse decrements its balance by X and generates a 4-digit nonce plus a 4-digit MAC to bind her name and nonce to the data contained in the challenge C: (NA, R) where R = MACK(A,NA, C,NB, B) mod 10 4 (5.2) 5. Bob enters NA and R into his purse, and checks it increments by X. Verification. We then analyse this protocol via the Burrows–Abadi–Needham (BAN) logic [29]. We idealised the protocol as: B −→ A : {B,A,X,NB}K (= C) A −→ B : {A,NA, C,NB, B}K (= R) 2The steps performed by the users include the agreement on the amount X, the visual or verbal exchange of codes as displayed by the device, and the entry of codes into a device. Generating nonces and verifying MACs are performed by the SIMs. This applies to subsequent versions of the protocol. 86 We wished to prove that Bob the shopkeeper should trust the payment amount X, i.e., B |≡ X. This can only be deduced using jurisdiction rule, for which we need B |≡ A |⇒ X (B believes A has jurisdiction over X) and B |≡ A |≡ X (B believes A believes X). The former follows from our trust in the software, which has the property that it only creates ciphertexts that start with its own identifier. The value X is contained in the challenge C (but see section 5.4.2). The latter, that B |≡ A |≡ X, must be deduced using the nonce verification rule from ]R (R is fresh) and B |≡ A |∼ X (B believes A uttered X). Now ]R follows from the fact that R = h(K;A,NA, C,NB, B) and contains the nonce NB Bob just generated. B believes A uttered X from the software constraint already mentioned. In our original design, we did not include NB in Bob’s challenge, and as a result could not get the protocol to verify. This is in effect what happens with EMV, which in consequence is vulnerable to the preplay attack [27]. In our initial design, we also wondered whether C should contain Alice’s phone number too: C = MACK(B,A,X,NB). The BAN analysis showed this is superfluous. However, we include it for error-detection, as discussed later. Despite the fact that the protocol verified, there was an attack. 5.4.2 Evolution 1: Eliminating narrow pipes A crooked merchant Bob can perform the following attack against Alice: 1. Alice agrees on price X and Bob receives Alice’s phone number A. 2. Bob now chooses a higher price X ′. 3. Bob then repeatedly feeds X and X ′ to his SIMB, which will generate a new nonce NB each time and output a MAC C. Bob continues until he finds a colliding pair with the same MAC (X,NB) and (X ′, N ′B) such that MACK(A,X,NB, B) ≡ MACK(A,X ′, N ′B, B) ≡ C mod 104. This will take just a few dozen trials for a 4-digit MAC. 4. Bob aborts all the trial transactions except for the last one for (X ′, N ′B). 5. Bob then gives (NB, C) to Alice, but asks his own SIMB to proceed with the last one, using N ′B and X ′. This way, Alice makes a payment for X, but Bob receives one for X ′ > X, violating conservation of money. 87 The vulnerability is that R only includes C, not X, and while C depends on X, C does not have enough entropy to prevent an online collision attack by Bob against SIMB that allows X ′ to replace X. Once this is noticed, the fix is easy: include the amount X in the input to the payment MAC R, rather than the challenge C. Such attacks are beyond the scope of BAN, which was only intended for protocols involving cryptographically long nonces and MACs, and does not keep track of guessing entropy or collision risks. Some more modern cryptographic verification tools (e.g., CryptoVerif [25]) can quantify such probabilities. Anyway, the repaired protocol now runs: 1. Alice agrees to pay Bob X. 2. Bob chooses a 4-digit nonce NB, forms a 4-digit MAC C with B and X, and sends Alice: (NB, C) where C = MACK(B,A,X,NB) mod 10 4 (5.3) 3. Alice verifies C to ensure they agree on payer, payee and amount. 4. Alice authorises the transaction by entering her PIN; her purse decrements its balance by X and generates a 4-digit nonce plus a 4-digit MAC: (NA, R) where R = MACK(A,NA, X,NB, B) mod 10 4 (5.4) 5. Bob enters NA and R into his purse, and checks it increments by X. The verification proceeds as before, although by now we might place less reliance on a BAN-logic verification of a short message authentication protocol. 5.4.3 Evolution 2: Transaction chaining The revised protocol still raised security concerns: 1. Bob could try to add money to his SIM card by faking transactions with fake customers and just guessing the response R. Each attempt will succeed only with probability 10−4, and repeated attempts can be blocked by a retry counter. We can also mix up R with NA. However, the detailed design is far from trivial. For example, Bob could connect SIMB to a PC to automate an attack, and interleave guessing attempts with real payments from a customer SIM he controls. In the worst-case attack, Bob is the local payment service agent, and has hundreds of customer SIM cards in stock. We need to make the fraud probability acceptably low without making the authorisation code too long, or otherwise making operations overly complex or fragile. 88 2. Alice can similarly fake a transaction with probability 10−4, but this may be less of a concern in practice, as Alice cannot repeat thousands of transaction attempts against SIMB without collaboration by Bob; while a colluding Bob could just run the attack without Alice (as above). 3. Bob can also try to fake transactions with real customers A, by keeping a record of their MACK(A,NA, X,NB, B) replies. In such a fake transaction, Bob can choose A and NA, and if the real Alice has already paid n times in the past for a regularly bought item of fixed price X, then Bob has enough data to be able to look up a valid pair (NB, R) to complete a fake transaction with probability n · 10−4. The last attack is of particular concern if Alice makes daily purchases of the same price X for years, as the probability of Bob being able to fake a transaction response R from SIMA approaches one. Our solution is to establish a payment session between Alice and Bob that maintains additional shared entropy stored in both SIMA and SIMB. We include this state in the MAC inputs, such that knowledge of past MAC responses no longer helps Bob guess future ones. We also replace NA and NB with MACs of the transaction data that only the bank Sam can verify. Then even if a fake transaction is accepted by a SIM, it will still be spotted when one of the parties eventually uploads it to Sam. Most phone payment transactions in LDCs are to familiar recipients. A villager will do most of their shopping at the one village store; a migrant worker will make most remittance payments to his wife or perhaps his mum back in his home village. So the obvious next evolution is to look for ways in which subsequent payments can be made easier. This has been a familiar strategy in established online banking systems for about 30 years now. If the security of payments authenticated using short codes must depend on maintaining as much shared security state as possible, then why not use a hash chain of all transactions in the current session, rather than just the current transaction context? Let a payment session be a hash chain maintained in both cards. Its state is kept in each card as a 256-bit value Si, along with a transaction counter i. When A and B start a new series of transactions, both their SIMs initialise this session as a hash state of T0 := (A,B) (5.5) S0 := H(0, T0) (5.6) i := 0 (5.7) where H is a collision-resistant hash function with 256-bit output (e.g., SHA-256 or SHA-3). The transaction counter i records how many payments have been committed in this session. Alice and Bob now should have the same value of S0. 89 SIMA and SIMB also set up a shared transaction key KAB. At this stage we can assume it is simply derived from their phone numbers using a key derivation function and the universal shared secret K: KAB = hK(A,B). In Section 5.5 we discuss options for mitigating the risk of a shared master secret. Once the payment session is established, Alice can pay Bob as follows: 1. Alice and Bob agree on a price X. 2. Alice and Bob then select the payment session to be used, and their SIMs retrieve not just A and B but also the hash chain values (i, Si). 3. SIMA checks that X is within limits agreed with Sam, and that Alice’s on-card purse value MA has enough funds for the transaction, presumably MA −X ≥ 0. It aborts the transaction otherwise. Likewise, SIMB checks whether both X and its purse value MB +X are within limits agreed with Sam, and aborts if not. (Transaction limits agreed with Sam may be variable, and depend on for example X, i, A, or B.) 4. Bob’s SIMB generates and displays a challenge of nc,1 deterministic digits that Alice and Sam can verify, nc,2 deterministic digits that Sam can verify, as well as nc,3 digits of random entropy to help ensure that payment sessions do not repeat – an nc = (nc,1 + nc,2 + nc,3)-digit decimal challenge message Ci+1 = E nc hKAB (Si) [(MACKAB(i, Si, X) mod 10 nc,1) ‖ (MACKB(i, Si, X, FB) mod 10 nc,2) ‖ NB] (5.8) which Alice copies into her phone. Here 0 ≤ NB < 10nc,3 is a number picked by SIMB and ‖ is concatenation of decimal digits. Enc is a pseudo-random permutation over Z10nc (see [24]), to ensure that adversaries cannot be certain about the function of individual digits without knowing KAB. FB is optional other information that can be included and that SIMB will eventually transmit to Sam (but not to SIMA) such as MB or a timestamp, added as an entropy source and as forensic evidence of disputed transactions. 5. Alice’s SIMA then calculates an nr = (nr,1 + nr,2 + nr,3)-digit response Ri+1 = E nr hKAB (Si,Ci+1) [(MACKAB(i, Si, X, Ci+1) mod 10 nr,1) ‖ (MACKA(i, Si, X, Ci+1, FA) mod 10 nr,2) ‖ NA] (5.9) containing nr,1 digits that Bob can verify, nr,2 digits that Sam can verify, and nr,3 digits NA chosen by SIMA. FA is optional other information that SIMA will record and eventually transmit to Sam, such as MA or a timestamp. 90 6. SIMA then updates its non-volatile state and on-card transaction records: MA := MA −X (5.10) i := i+ 1 (5.11) Ti := (Si−1, X, Ci, Ri) (5.12) Si := H(i, Ti) (5.13) 7. SIMA finally displays its response message (5.9). 8. Bob types Ri+1 into his phone, which reports the success or otherwise of the trans- action. The point of hash chaining is twofold: • Even if someone gets lucky and guesses a correct authentication code Ri for an individual transaction, this will still leave the underlying security state Si inconsistent between Alice and Bob, between whom subsequent transactions will then fail with high probability. • The entropy of the security state Si makes it less likely that a query to MACKAB is ever repeated. This reduces the risk that knowledge of past values of R and C can help an attacker to predict future such values, and use these to set up fake transactions. There are various attacks to consider, for example Bob can complete a fake transaction on SIMB with probability 10 −nr,1 , or duplicate sessions to double-spend Alice’s responses. To fine-tune such risks, we can vary the number of digits allocated in the exchanged messages C and R between the first transaction and later transactions in a session, for example: nc,1 nc,2 nc,3 nr,1 nr,2 nr,3 first transaction 1 1 3 6 3 3 later transactions 1 1 1 4 3 0 In addition, we need to limit the number of failed transactions: • A SIM can keep a record of failed transactions and can ensure that only a fraction of all transaction attempts per session are allowed to fail. For example, a payment session can be terminated once 5 of the last 10 transaction attempts have failed. Such retry limits should be implemented only on a per-session basis, to reduce the risk of denial-of-service attacks, where an adversary deliberately exhausts a SIM-wide retry limit. 91 • If retry limits apply per session, we then also need to limit the total number of sessions that a SIM can participate in, to well below 10nr,1 , e.g., a few thousand. This way, Bob cannot iterate fake transactions across many non-existing customers. The exact limits and number of authentication-code digits can be tuned based on risk analysis and usability studies. They could also be made variable, though the usability consequences would have to be tested carefully. Likewise, the content of FA and FB remain for further study. 5.5 Mitigating the risk of a shared master secret 5.5.1 Evolution 3: Group key scheme If smartcards were perfectly tamper-resistant, then the initial design would be largely complete at this point. Smartcards are certainly more resistant than twenty years ago, when large-scale compromises of pay-TV cards were pretty well an annual occurrence. This is partly due to technological improvements such as side-channel countermeasures, top-layer sensor meshes, and randomised place-and-route; and partly due to minimising the value that can be extracted by cloning a single card. In EMV, for example, the keys authenticate transactions on a single account, so the extractable value is typically in the low tens of thousands of dollars, rather than the millions that could be made from cloning a pay-TV card. As long as reverse engineering a card costs tens of thousands of dollars and the attacker needs perhaps a dozen identical cards to work with, this is probably enough. The overlay SIMs used in our field trial are not certified to EMV standards, though the vendor assures us that an EMV compliant overlay product will be available in due course. So in the short term we might deploy a two-tier system in which merchants get a high-security EMV-certified SIM card containing a master key while customers get a medium-security overlay SIM card containing a derived key, namely their phone number or card serial number encrypted under the master key to provide a diversified key. This is the approach used for some 20 years in Net1 and for a decade in Geldkarte. It would thus have the advantage of being familiar to banks and their insurers, with a zero loss history and the comfort that comes from reusing existing standards and business processes. If anyone seeking to monetise a break of a card needs to extract money from merchants (as the other users are too dispersed and too poor) then the Geldkarte approach could make perfect business sense. However, a substantial volume of LDC phone payments are migrant remittances, where neither the sender nor the receiver is a merchant. A second approach is combinatorial keying, which was proposed in the 1990s in the context of satellite TV, where it was known as “broadcast encryption”, and attempted to 92 give each pay-TV subscriber a set of keys of which some suitable subset could enable her to decrypt each programme. If a card were cloned, then that particular subset could be blacklisted without affecting the great majority of other subscribers; only a small number of subscribers with the same subset of keys would have to be issued with replacement cards. A similar idea can be adapted here, where the communications are many-to-many. For example, we can divide all users into d = 100 key groups, give everyone 100 out of 5050 keys, and thus require an attacker who wanted to impersonate any user to clone 100 cards rather than just one. In more detail: Sam generates a set of shared 128-bit group keys, in the form of an upper triangular matrix of d(d+ 1)/2 keys (Ki,j)0≤i≤j gB. (5.17) KgA,gB or KgB ,gA will be available in both SIMA and SIMB, and both can use Kg to select it. Any other SIMU picked at random (for reverse engineering) by someone who does not know Kg will contain it only with probability 2d −1 − d−2 (i.e., 2d−1 if gA 6= gB and d−1 if gA = gB). This is straightforward to do if each user gets a SIM card issued by a payment service provider that is also an MNO. Where the two are different, and especially if the payment service runs on an overlay SIM independent of the phone company’s SIM, it is not so straightforward, because of the need to establish a trustworthy link between the user’s phone number and her key group. This can be done by an online protocol if the phone is online as the overlay SIM is first fitted, but otherwise requires copying quite a few digits. 93 5.5.2 Evolution 4: Delay-tolerant Needham–Schroeder In many applications of interest, both Alice and Bob will get network connectivity from time to time as they travel to town or through areas with mobile service. There is significant research in more general store-and-forward mechanisms or delay-tolerant networks for extending service in LDCs. Even in the few cases where one of the parties does not ever travel to town, connection can be established via a data mule. For example, if Bob is old and housebound, while his daughter Alice works in the big city and sends him money, a neighbour who travels past Bob’s hut can take data to and from a point of network presence. This should let us establish authenticated key exchange, although it appears to be unexplored territory from the protocols community’s point of view. An initial idea is to turn the bug in the Needham-Schroeder protocol [96] into a feature. The fact that Bob has no guarantee of the freshness in Alice’s initial exchange with Sam, and that she can therefore pass on a key packet to him a month or even a year afterwards, can be used to create a delay-tolerant version of the protocol. In order to preserve the original notation of [96], we reverse the roles in this iteration whereby Alice (A) is the merchant travelling to the city, and Bob (B) is the housebound villager who requires relayed messages from Sam (S). We assume that Alice and Bob have just done the first transaction in a new payment session, and want Sam’s help to set up a KAB that is present only in their SIM cards and thus cannot be compromised if any other card is reverse-engineered. 1. Alice catches the boat into town, and SIMA detects a connection. SIMA automatically uploads transaction logs to Sam (to update shadow accounts for reconciliation). She also sends a request to authenticate with any new counterparty, in this case Bob3: A→ S : A,B,NA 2. Sam sends KAB, via SMS or USSD, encrypted with the shared key KAS: S → A : {NA, B,KAB, {KAB, A}KBS}KAS 3. When Alice returns to the village, SIMA can display, during transactions discussed below, the authenticated shared key: A→ B : {KAB, A}KBS 4. The final two steps of the Needham–Schroeder protocol are performed offline between the transacting parties, by showing authenticated nonces to obtain agreement: B → A : {NB}KAB A→ B : {NB − 1}KAB 3The notation here follows the original Needham–Schroeder protocol [96]. 94 Note that an implementation would be likely to optimise this somewhat, for usability reasons. The final two steps can be reduced to challenges and responses of a few digits each using the methods in earlier sections, but the key packet {KAB, A}KBS is harder. If KAB were a 128-bit key that alone would require copying 40 decimal digits. Users of prepayment utility meters (including illiterate users) regularly copy 20-digits strings that represent DES-encrypted commands, so perhaps 40 digits can be done as a very occasional procedure. More likely, an operator might take the view that 30 digits are enough; an 80-bit key would not be the weakest link. Again, this requires real testing in the field. Where users interact with merchants directly, the merchant can perhaps help (subject to worries about the merchant ripping off vulnerable customers). But where a helpful neighbour is acting as a data mule for an elderly and housebound Bob, and carrying messages scribbled on bits of paper, 40 digits may well be too much. More research is needed about how to integrate online payments, offline payments, data mules, and delay-tolerant networking in general. The value of this research is not just to extend and enhance payment systems per se, but also all sorts of other electronically enabled services, from pay-as-you-go solar panels to agricultural subsidies and payments from aid donors. To evaluate such systems we need to gain insight into what transactions are taking place online versus offline; which transactions are critical (e.g., large transactions or cash-outs); incentives for contacting back-end systems; incentives for carrying data on behalf of other users; and realistic threat models. Should we assume that everyone in a village would happily conspire to cheat a bank, or that the headman or store owner in a village might be happy to be responsible for the village’s good behaviour, or that villagers might vouch for each other? 5.6 Conclusion Phone payments have been transformative in Africa, South Asia and elsewhere, helping millions of the very poor to raise themselves from poverty, and helping the less poor too. Extending them to areas without network service will continue this process but will also give disproportionate benefit to the very poor as the existing networks have been rolled out first to the less poor regions. The Gates Foundation advertised for a technology to do this and we received funding from them to build a prototype. The mission is to design an offline electronic purse system optimised for use in less developed countries. Unlike existing systems such as Geldkarte and Proton4, our design aims to maximise usability in off-network transactions by minimising the number of digits 4See https://en.wikipedia.org/wiki/Proton_(bank_card) and https://web.archive.org/web/ 20110511165455/http://www.atosworldline.be/index/en_US/5118014/5126207/Proton.htm 95 a user has to speak, hear and type, while providing robust recovery mechanisms for the inevitable errors and making sure there is not any scalable attack that is large enough to care about. We discuss the prototype and initial usability evaluation in the next chapter. In this chapter we have set out the evolution of our proposed core payment protocol. There are a number of lessons for protocol researchers. First, when we start dealing with authentication protocols with short strings of digits, our intuitions about security can fail us, and formal verification does not always help. Our first attempt at a protocol was vulnerable to a simple collision attack, because we used the entropy that was visible rather than all the entropy available. Worse, the defective protocol verified just fine using the BAN logic. That does not imply that BAN is useless (as we did find another design flaw) but just that it is not enough in this context. We leave to future work whether other verification tools can find such flaws. Second, when dealing with very short authentication codes, transaction chaining can be useful, and is a reasonable next step once we have learned to use all the available entropy. It had already appeared a generation ago in EFTPOS systems in Australia, where MAC residue was used for key updating, but then apparently had been forgotten. Third, when a system is offline part of the time, then the interaction between the offline component and the online one bears careful study. If much of the world will have to live with delay-tolerant networking, then it is time to start thinking about how to incorporate delay tolerance into the infrastructure, rather than its being an occasional hack that developers will often get wrong. This may seem counter-intuitive to some providers, given that banks in the developed world spent the first half of the 1990s ripping out offline capabilities from ATM systems, and the second half reducing merchant floor limits for credit-card transactions. Yet it is noteworthy that when they introduced EMV in the 2000s, offline capabilities reappeared, and they did so by placing carefully calibrated amounts of trust in largely tamper-resistant smartcards and terminals. A lot of services are going to have to coexist with network outages. That means, we suggest, that the next challenge for the protocols community will be to work out ways to build support for delay-tolerant networking into the infrastructure. We hope the examples here will help bootstrap the discussion. 96 Chapter 6 Usability evaluation of offline payments We built a prototype system for our project, DigiTally, that implements the offline payment protocol we discussed in the previous chapter1. This project aimed to create an offline payment system that works on cheap phones to alleviate the problems stemming from the lack of network coverage. Design goals included prioritising usability while avoiding unfamiliar hardware. Another design goal we aimed for is to create a simple and resource-efficient protocol that works on resource-constrained hardware, such as SIMs and feature phones (not smartphones). We also prioritised speed (quick results of transaction processing), and aimed to minimise or mitigate user errors. We use overlay SIMs to load our DigiTally code into a tamper-resistant environment to safely execute cryptographic operations, and protect the DigiTally protocol. Using overlay SIMs allows us to have a portable solution that works on any phone (including smartphones), that can be inserted into a user phone so that the DigiTally application is accessible from a SIM menu. The process is familiar to mobile payment systems users since those systems are also accessible through SIM menus. The content of this chapter is adapted from our paper DigiTally: Piloting offline payments for phones, by Khaled Baqer, Ross Anderson, Jeunese Adrienne Payne, Lorna Mutegi, and Joseph Sevilla, which appears in the Proceedings of the Thirteenth Symposium on Usable Privacy and Security (SOUPS 2017) [20]. 1The DigiTally project webpage is available at https://www.cl.cam.ac.uk/~kabhb2/DigiTally/ and archived at https://web.archive.org/web/20171008121605/https://www.cl.cam.ac.uk/~kabhb2/ DigiTally/. 97 6.1 Summary Mobile payments support a range of services in many less developed countries including everyday payments, migrant remittances, credit, tax collection, and welfare benefits. These services depend entirely on the mobile phone network as their carrier, so they stop where the network does. This leaves millions of the very poorest people stranded – people living in remote areas where there is little to no network service. It also leaves urban users at the mercy of network congestion. We developed a prototype system, DigiTally, which lets users make offline payments by copying short strings of digits from one mobile handset to another. Offline payments are already used for electricity (both in prepayment meters and pay-as-you-go solar); can we extend them into a general-purpose payment system, to increase service resilience in the face of network congestion or outage, and provide service to currently excluded areas? We report the results of a preliminary study with an early prototype of DigiTally, tested on participants from a university in Nairobi, Kenya. The code-sharing process presented a possible usability challenge. To explore this and other aspects of our first prototype, DigiTally was introduced to Kenyan participants in order to resolve any major issues before a later field trial. We discuss the lessons learned from our field visits and initial usability evaluation; we hope that this contribution is helpful for researchers and policy makers interested in mobile payments and financial inclusion. We also present our findings and observations. We found that, although offline payments involve copying codes in both directions between the payer’s phone and the payee’s, the extra workload was acceptable to most users. 6.2 Introduction Because of the strong positive effect on development, the Bill & Melinda Gates Foundation called for innovations that could increase the uptake of mobile payments2. One of the largest impediments is that current systems operate entirely online; both the payer and the payee have to be able to communicate with the payment system server for a payment to be completed. This excludes millions of people living in remote areas with no network service; such people make up 10-40% of the population depending on the country. It also makes payments harder in the event of network congestion (we have observed 30-second delays in down-town Nairobi). Additionally, where the payment service operator is not the same firm as the Mobile Network Operator (MNO), charges become an issue. The main contribution is to describe a preliminary study that took place at Strathmore University in Nairobi, in September 2016. We set out to establish whether DigiTally was 2Enable Universal Acceptance of Mobile Money Payments: https://gcgh.grandchallenges.org/ challenge/enable-universal-acceptance-mobile-money-payments-round-14. 98 usable in three different environments: a coffee shop, a campus bookshop, and a cafeteria, and by students from a range of backgrounds. These students were experienced users of M-Pesa and thus able to compare it with DigiTally; we also got assessments from the checkout staff. Further contributions are as follows: we analyse a security technology and offer lessons learned from a preliminary usability study. This chapter should be of interest to researchers interested in development, and also to those interested in evaluating payment systems; both communities can benefit from our insights. Documenting the study in one publication should help them and others interested in this type of research. We discuss the background and related work in Section 6.3. We describe the technology in Section 6.4, and we discuss our method and results in sections 6.5 and 6.6. We describe our observations in Section 6.7, and provide more discussion in Section 6.8. Finally, we present our conclusions in Section 6.10. 6.3 Background and related work We discussed mobile payment systems in Section 2.3. That section is relevant to understand how mobile payments work and what principals are involved in processing transactions. Jack and Suri discussed the economics of M-Pesa in [66], highlighting issues that affected system uptake, including liquidity and network reliability. Zimmerman and Baur discuss the challenges facing financial inclusion efforts [124], including network coverage and reliability, liquidity, complexity of user interfaces and payment processes, the lack of dispute resolution, and the lack of customer protection against fraud. Dupas et al. report that a significant proportion of the participants in their study listed fraud, unreliable services, and transaction fees as issues [49]. The goal of our work is to tackle the network coverage and unreliable services problems by processing payments offline reliably and deterministically, and to simplify the user interface as much as possible by mimicking familiar mobile payment systems. Furthermore, we aim to decrease transaction fees to encourage users to process transactions electronically. The purpose of this research was to investigate possible usability challenges to be tackled in subsequent prototypes. Offline payment systems have already been implemented, such as Geldkarte3 in Germany and Net1/UEPS [3] in South Africa, but require dedicated devices or unfamiliar hardware that can be costly to replace if lost or stolen. Similar systems might perhaps be implemented on modern smartphones if both the payer and the payee had them. However, most users in less developed countries still use feature phones that do not have NFC, Bluetooth, or cameras; these phones cannot communicate data automatically in the absence of a network. We also minimise the assumptions we make regarding what features are available on users’ phones. 3Geldkarte website: https://geldkarte.de/. 99 Our goal is to design a system that operates without relying on such features, in case some of them are inoperable. For example, if we rely on a camera and it is damaged, the entire system is useless until the user has their camera fixed. By operating within these constraints, we are able to design a system that works on all low-end mobile devices; our project aims to provide a solution for the poorest demographics. The use of feature phones emerged as a requirement during trips to rural Kenya. Users in Busia county, for example, specifically requested solutions that work on feature phones. One person commented: “Don’t give us [systems that work on] smartphones. We don’t have those things and we don’t know how to use them.” When asked what kind of phones users owned, all but one person (in a group of more than 20), held up a feature phone; the non-feature phone was an old BlackBerry. Reliance on feature phones (called ‘kabambes ’ in Kenya) is due to both battery life and cost (in terms of price as well as maintenance). With such constraints, the only way to transfer information offline is for one phone to display it, and for a human to type it into the other phone (a similar approach used in device-pairing methods). Kainda et al. [71] looked at the tradeoff of usability and security with regards to different device-pairing methods, using out-of-band channels, as it applies to various device usage or capability restrictions. Their results show that typing strings (“copy & enter”) ranked first in terms of the aforementioned tradeoff. They also recommend that system designers take into account factors that affect that tradeoff including user conditioning, user motivation, security failures, and attentiveness. Moreover, transferring value by copying digits is well established in other applications, such as prepayment electricity meters [8]. 6.4 Technology An early design goal we set was that DigiTally must not require users to operate any unfamiliar hardware, and that the transaction flow should be as close as possible to the familiar one. This leads immediately to the challenge of programming feature phones: different operating systems make it difficult to implement and test applications for various models, leading to large costs for maintenance and support. To minimise these costs within our constraints, the practical approach is to program the SIM card in the user’s phone. SIMs were designed to host multiple applets in secure containers, which can prevent one applet from accessing other applets’ data. A SIM provides a secure environment that we can control and that is compatible with all mobile devices adhering to the Global System for Mobile Communications (GSM) and smartcard interoperability standards. This means that we can program our system on a SIM, and insert it into any device that accepts a SIM. The user can move the SIM from one device to another, allowing for portability. SIMs also provide valuable built-in features, including 100 Figure 6.1: An overlay SIM (top) and a regular SIM (bottom). The top part of the overlay SIM can be peeled off and stuck on top of a regular SIM atomic operations and rollback mechanisms, as well as the ability to store secure tokens and cryptographic keys in a tamper-resistant chip. Feature phones normally have a single SIM slot already taken by the SIM issued by the Mobile Network Operator (MNO). MNOs do not generally let anyone else program their SIMs, but there is a new technology that can be used to circumvent this restriction. This is the overlay SIM (or sticker SIM): a SIM card only 120 microns thick that can be inserted between the regular SIM card and the phone. Figure 6.1 shows an overlay SIM and a regular SIM. Overlay SIMs were developed to support low-cost mobile roaming. They can also be used as a proof-of-concept prototyping environment, and to bypass MNOs’ restrictions on devices if necessary. We used the programmable overlay SIM as a regular SIM, inserted in the single SIM slot. The same overlay SIM can also be stuck on top of an existing non-overlay SIM, and our system will work seamlessly on the device while allowing it to still work as a phone4. 6.4.1 Overview of DigiTally We designed and developed DigiTally as a Java Card applet accessed through a user’s phone (Figure 6.2a). This applet can be loaded on an overlay SIM or a regular SIM. We chose to use a regular SIM for this study (for reasons we discuss in Section 6.5.4 item 2). As discussed earlier, having an overlay-SIM ready applet enables deployment on feature phones even if the MNO chooses not to install DigiTally on their own regular SIM. Rather than asking participants to use their own phones, we provided participants with dedicated phones (Nokia 130), preloaded with a SIM that included the DigiTally applet. 4Overlays SIMs are used in China for roaming purposes, and were used in Kenya by a local bank to provide financial services to their users and to contest the local phone-payment monopoly. Overlay SIMs work on top of standard carriers’ SIMs. 101 The applet includes a menu that mimics existing mobile payment systems to capitalise on users’ familiarity with them. The applet offers the user options such as Balance, Send Money, and Receive Money (Figure 6.2b). Figures 6.3 and 6.4 illustrate the steps required to complete a transaction. The icon for the applet displayed in Figure 6.2a is enforced through the mobile OS. When a single SIM is present, selecting the SIM icon will display the applets that reside on that SIM. When multiple SIMs are present, such as in the case of using an overlay SIM on top of a regular SIM, selecting the SIM icon directs the user to two different menu options that represent the applet names in the two SIMs. Note that there could be more than one menu option displayed after selecting the SIM icon, if one or both of the SIMs offer more than one applet. There are two main differences between traditional mobile payment systems and DigiTally. The first difference is that the SIM in an offline payment system stores the balance as a value counter, and the payment protocol changes this local balance in the SIM, whereas a traditional mobile payment system merely sends requests to a backend system to process operations on the user’s balance. The second difference is the Receive Money option: in the absence of an online payment server, the recipient must be involved in completing a transaction. In traditional mobile payment systems, the recipient is not actively involved; they merely get an SMS saying how much money has arrived and from where. In offline payments, both phones must be involved in a transaction. The protocol is designed so that the parties learn immediately whether the transaction has failed or completed. Note that DigiTally does not require data to be sent over the network and can, therefore, work completely offline. For the purposes of this chapter, we focus on the codes exchanged by the users to complete transactions. We include below a summary and overview of the protocol discussed in Chapter 5 and as implemented in the DigiTally applet. 6.4.2 DigiTally codes We now outline the stages required to complete a DigiTally transaction. We provide a simplified version of the protocol to emphasise the generation of codes required to complete a DigiTally transaction. The prototype implementation included the full protocol as discussed in Sections 5.4.3 and 5.5.1 to implement transaction chaining and group key scheme. Moreover, with regards to the length of codes, we chose to implement and evaluate 8-digit codes for the challenge (Code1) and response (Code2) 5. There are two codes involved in completing a transaction. To simplify the discussion, we will assume that Alice is paying Bob. 5Refer to Section 5.4.3 for a discussion about the length of the codes. 102 1. Code1: After the payee B (Bob) has entered into his SIM the transaction amount X and the phone number of the payer A (Alice)6, his SIM generates a random nonce (NB), and then computes a MAC on A, B, NB, X and the log of previous transactions ` between the two parties7. This MAC and the nonce together make up the 8-digit Code1, which is shown on the payee’s device (Figure 6.2c). Alice similarly enters into her phone the amount X and Bob’s number B8; it prompts her for Code1, which she enters (Figure 6.2d). If the two parties agree on X, ` and each others’ identity, then Alice’s phone accepts Code1. If there’s a disagreement – whether due to attempted cheating or an honest mistake – her phone will generate an error9. 2. Code2: If Code1 is correct, Alice’s SIM card decrements her account’s available balance by the transaction amount X and generates Code2 to authenticate the transaction. Code2 is also 8 digits long; it consists of a 4-digit nonce NA generated by Alice’s SIM and 4 digits from a MAC on A, B, X, ` and NB. Alice then shows or tells Code2 to Bob (Figure 6.2e). He enters Code2 into his phone, and, if it is valid, his SIM increments his balance by X and a transaction log is displayed on his phone (Figure 6.2f). A similar transaction log is shown to Alice to confirm the completion of the transaction (decrementing her balance by X). This is the simplest payment protocol we could devise that enables value to be transferred from Alice’s card to Bob’s by copying 8 digits in one direction and 8 digits in the other. Its security was analysed in the previous chapter. Our focus now is usability, and there is useful precedent from prepayment meters. The STS prepayment electricity meters widely used in Kenya transfer value using a 20-digit number, presented as five groups of four digits [8]. A householder buys codes from an ATM or sales agent and copies them into their electricity meter; codes can also be bought online, using M-Pesa. Thus, we know that our prospective customers can copy digits, and that even illiterate people can use a phone. The object of the study is to establish whether the DigiTally transaction flow is similarly usable. Some care is necessary to ensure robustness in the face of mistakes. The DigiTally implementation includes various segments that execute atomic operations (enforced by the Java Card platform) to ensure that critical components of the payment protocol are fully executed (or reverted to their initial state if a problem occurs). This provides the ability 6In our trial, user identities are randomly generated numbers, each simulating a phone number. 7If no previous transactions exist, then the first transaction initialises the relationship between the two parties. 8Alice and Bob can pick each others’ names from a menu. The first transaction stores the contact’s details, and subsequent transaction can later retrieve information from the locally saved contacts (on the phone or SIM). 9For example, if Alice enters $4 on her device, and Bob enters $5, then Bob’s Code1 will generate an error on Alice’s device. The same thing happens if the wrong phone number is selected. 103 to create reliable checkpoints. We checkpoint the transaction when Code1 is successfully generated so that a transaction can be recovered if, for example, a merchant accidentally generates a second, new Code1 before the customer replies to the first one. Additionally, as Alice’s balance is decremented prior to displaying Code2 on the payer’s device, we have to ensure that if she is interrupted (e.g., by a flat battery), her SIM can still retrieve Code2 later (Code2 is saved in the final checkpoint of the transaction, and can be retrieved from the “last transaction” log). Checkpointing was also tested (see Section 6.7.2). DigiTally codes do not have to be kept secret, since they can be used only once and only by the payer and payee in a specific transaction. There is no added risk if other users observe the transaction or overhear the codes being exchanged verbally. This is in contrast to mobile payment systems that rely on secret codes that compromise the users’ security if intercepted, and impose a burden on the payer to deliver them to the recipient out-of-band. Users of feature phones cannot use a secure messaging service, so probably have to use voice or SMS and hope for the best. As well as the (small) technical fraud risk, such codes can create anxiety and make dispute resolution problematic. 6.5 Method The usability evaluation of DigiTally was conducted by the author of this thesis from the University of Cambridge (UK) and researchers from Strathmore University (Kenya), in early September 2016. Participants from Strathmore University (hereafter referred to as “the university”) performed real-world transactions using DigiTally and then answered questions about their experience. As well as open-ended questions, we make use of the System Usability Scale (SUS), which is a standardised tool widely used for measuring usability. We provide the details below. 6.5.1 Participants We wanted to test the technology as part of a pilot, and so we needed participants who could reveal weaknesses in our design. The aim was to maximise usability before introducing DigiTally in a larger-scale field test with target users. As an initial trial to establish any major issues with the DigiTally system itself, we recruited Kenyan participants from the university. The participants were recruited following their registered interest in the study in response to an advertisement. Potential participants were interviewed with an aim to achieve diversity in terms of demographics, and to establish that participants would be able to give us detailed feedback to inform major re-designs before a field trial with a representative sample of rural users. 104 (a) Selecting DigiTally applet from the fea- ture phone’s application menu (b) DigiTally menu shown on a feature phone (c) Bob’s phone displays Code1 (d) Alice enters Code1 into her phone (e) Alice’s phone displays the response code (Code2), given to Bob to authorise the pay- ment (f) Bob’s phone displays the transaction log after accepting Code2 which finalises the transaction Figure 6.2: DigiTally user interface. 105 Start Select Receive Money Verify PIN Select contact A Enter amount X Generate Code1 C = MACK(A,B,NB, X, `) Code1 = NB ‖ C Input Code2 R = MACK(Code1, NA, A,B,X, `) Valid Code2? Tries remain? Increment balance Display logDisplay error End no yes no yes Figure 6.3: Recipient’s (Bob B) transaction steps (this flow chart illustrates the simplified protocol; see Section 6.4.2 which discusses the prototype implementation). 106 Start Select Send Money Verify PIN Select contact B Enter amount X Input Code1 C = MACK(A,B,NB, X, `) Valid Code1?Tries remain? Generate Code2 and decrement balance Display log R = MACK(Code1, NA, A,B,X, `) Code2 = NA ‖ R End Display error no yes no yes Figure 6.4: Payer’s (Alice A) transaction steps (this flow chart illustrates the simplified protocol; see Section 6.4.2 which discusses the prototype implementation). 107 Twelve students and seven merchant staff were recruited as participants to use DigiTally for five days. Merchants were recruited from the university’s cafeteria, bookshop, and a local coffee shop. According to Sauro [109], Vizri [118], and Nielsen and Landauer [97], very few participants are needed for early phase usability studies such as this, since adding more users tends to uncover the same issues, with “five” often referenced as the “magic number” of participants. Confidence intervals may be wide as a result, but the average SUS score should be stable [108]. For wider coverage, we included a greater variety of participants, which required larger numbers than the proposed “magic number 5”. This is in line with more cautious estimates of 10-20 participants for a usability test (e.g., Faulkner [57] and Macefield [80]). In total, we had 19 participants for this preliminary study. There were 7 female and 5 male students, who were studying a range of topics in different faculties, including finance, law, and information technology. As for the merchants, the cafeteria included 1 female and 2 male staff members; the bookshop had 1 female and 2 males; and the local coffee shop had 2 females (only one of whom was responsible for processing DigiTally transactions). Each merchant was given one feature phone to process DigiTally payments; staff members in the cafeteria and bookshop shared the device to process payments, while the coffee shop had one person operating the feature phone. Because we did not provide cash-in and cash-out services (to convert physical currency to DigiTally balances, and vice versa), participants were motivated to use DigiTally to spend the balance they had in their SIMs. 6.5.2 Evaluation materials Questionnaires and surveys are one of the most widely used methods for measuring attitudes [47]. They often involve asking participants to select one of a number of points on a rating (Likert) scale [111]. Good scales are valid (in that they represent the intended construct), and reliable (giving measurement consistency) [40, 111]. Standardised usability questionnaires and surveys are preferred because they are quantifiable, are economical to reuse, and allow for replicability of findings [111]. The most popular standardised survey for measuring attitudes towards usability is the SUS, favoured due to its brevity and being free to use [110]. The items in this survey factor into two sub-constructs of usability: system learnability (ease-of-use for new users), and system usability (defined as ease-of-use more generally) [110]. It was important to use feature phones for this study rather than the smartphones that many students normally use. Participants were thus given identical feature phones to test the basic usability of the technology first; a field test with representative users in rural Kenya who already use feature phones would be the next step. We asked participants to complete a short pen and paper survey, which included the 108 SUS, to make an initial assessment of DigiTally. Here, we asked participants for three free-text responses, to get insights into how they viewed DigiTally and where we might need to improve. These were: 1. What did you like about DigiTally? 2. What did you dislike about DigiTally? 3. Additional comments 6.5.3 Procedure Students and on-campus merchants at the university were invited to try DigiTally and to provide their opinions of their experience of the system. This involved using dedicated feature phones to transact using the DigiTally applet, which was installed on each phone’s SIM. The students’ phones were preloaded with a DigiTally balance of Ksh 2000 (about $19.50) that could be used to make transactions with merchant participants who accepted DigiTally. The merchants (cafeteria, bookshop, and a local coffee shop) were given phones that had a zero balance on their DigiTally applet. Before the trial, participants gave informed written consent and were shown how DigiTally works. After the trial, participants were asked if they were willing to complete a survey about their experience, which was entirely voluntary. This survey consisted of the SUS and the three open-ended questions outlined in Section 6.5.2 (Evaluation Materials). Each day towards the end of business hours, the researchers visited each merchant to check their balances and reimburse them, in cash, the amount they had in their DigiTally balance (after transferring the DigiTally balance to another phone, and thus resetting a merchant’s balance to zero at the time of reimbursement). Each merchant signed a form indicating the date and amount of reimbursement. The reimbursement forms were necessary to create a detailed accounting of the money for our funders and auditors. 6.5.4 Ethics, data collection, and privacy This study was approved by ethics committees at both universities involved in this research (one in the UK and the other in Kenya). To address the various concerns raised by the ethical review boards in both universities, we changed the design of the study to minimise any possible risks: 1. Financial risks: to address the risk of possible loss of personal funds, we provided the participants with a preloaded amount into their DigiTally applet (as explained in Section 6.5.3). We decided not to provide cash-in and cash-out services to eliminate the possibility that participants would use their own funds to increase their DigiTally balances. 109 2. Privacy risks: we were asked for assurance that the overlay SIM would not com- promise the privacy of participants by interfering with the participants’ regular SIMs. Although we initially intended to use an overlay SIM on top of a regular SIM, we decided instead to use the overlay as a regular SIM and provide users with dedicated devices. The reason for this decision is that proving that an overlay SIM is not compromising users’ privacy was out of the scope of this project. Note that the overlay SIM (and the DigiTally applet) does not compromise users’ privacy in our implementation, since it operates as an independent sandboxed applet and does not interfere with other applets’ data if there are any installed. Therefore, we consider the preliminary study to have no significant ethical implications. There were no known risks to participants (financial, psychological, emotional, or phys- ical). Participants were required to provide informed consent before starting the study. The informed consent form outlined what the study involved from the participant, how long it would take, our commitment to confidentiality, and their rights as a participant. Participants were reminded, verbally and in the consent form, that their participation was entirely voluntary, and that they may choose to not answer particular or all questions and may withdraw from the study at any point without having to provide a reason, and without fear of penalty from the researchers. This study did not involve deliberately deceiving participants in any way. This study also did not involve recruiting participants from vulnerable groups, such as children, patients, people with learning disabilities, people engaged in illegal activities, or people in custody. Participants were fully debriefed and their questions were answered throughout the study. Additionally, participants were given contact information for the lead researchers from both universities should they have any further enquiries. The data we used for our evaluation includes the survey results, as well as the perfor- mance and transaction data processed by the SIM. By collecting these data, we were able to analyse performance and usability issues as we discuss in Results (Section 6.6). We collected the following: 1. Error rates (code-entry errors, and wrong PIN input) 2. Number of transactions 3. Number of attempts to unlock the SIM 4. Total amounts for all transactions (spent and received) 5. Transaction duration times The SIM also stored the full log of the last transaction, which is overwritten when a 110 new transaction is successfully processed10. This log contains Code2 which can be retrieved after the transaction is completed. Recall that Code2 can be used only once and with the right recipient (the identity of the recipient is one of the inputs required to generate the code), but a user might want to retrieve Code2 in case their battery goes flat before exchanging the code with the recipient. The collected data were stored in the SIM card (a trusted tamper-resistant element). DigiTally does not leak any information outside this trusted element. Therefore, per- formance data are inaccessible without the correct authenticator. Only pre-configured devices, programmed to provide the correct authenticator, had access to the SIMs to obtain the performance measurements. Each SIM was programmed to lock itself after a certain number of failed attempts to access the contents of the trusted tamper-resistant environment. At the completion of the preliminary study, we retrieved the phones from all the participants to extract the performance data from the SIMs and save it for later analysis. All the stored data were encrypted after extraction. With the exception of participant names, no Personally Identifiable Information (PII) were collected in the study (such as date of birth, national ID numbers, etc.), as they were irrelevant to the scope of the study. We carefully explained to all participants what data were being collected, by demonstrating many full transactions on two phones and explaining the performance variables to be measured for the purposes of the preliminary study. 6.6 Results The following sections report on the actual error rates and transaction times, as well as the SUS results. We present descriptive statistics to help summarise the data and to show any emerging patterns. We make no claims about statistical significance, which would be reserved for inferential statistics. This would not be appropriate since the tested population do not represent the intended users of DigiTally, and were recruited in order to reveal any major usability issues with the technology itself. We thus make no claims about generalisability of the results. We also discuss the open-ended results and the lessons learned through the feedback obtained from the users. We discuss the features that participants considered to be the most important, and indicate issues that may need to be re-evaluated to improve usability. Note that there is a difference between the total number of transactions completed by students, and the total number of transactions completed by merchants. This is due to the fact that students moved funds between their phones, perhaps to test the system and 10This log is informational only, and is different from the cryptographic parameters securely stored on the SIM and used in the payment protocol as discussed in Section 6.4.2. Refer to Chapter 5 for more details. 111 No. of transactions PIN errors Code1 errors Code2 errors Total code errors Average time (seconds) S1 30 2 0 8 (26.7%) 8 (26.7%) 30.9 S2 28 3 9 (32.1%) 0 9 (32.1%) 24.4 S3 18 3 2 (11.1%) 8 (44.4%) 10 (55.6%) 28.1 S4 22 0 9 (40.9%) 1 (4.6%) 10 (45.5%) 44.9 S5 29 1 1 (3.5%) 0 1 (3.5%) 24.2 S6 26 1 0 1 (3.9%) 1 (3.9%) 54.3 S7 26 0 1 (3.9%) 0 1 (3.9%) 50.9 S8 28 0 5 (17.9%) 0 5 (17.9%) 32.4 S9 10 0 0 4 (40.0%) 4 (40.0%) 28.8 S10 29 1 0 0 0 37.1 S11 8 2 0 0 0 42.1 S12 22 0 5 (22.7%) 0 5 (22.7%) 38.9 Table 6.1: The frequency and types of errors made by student participants and the percentage of each participant’s transactions that their errors affected No. of transactions PIN errors Code1 errors Code2 errors Total code errors Average time (seconds) M1 58 2 0 4 (6.8%) 4 (6.8%) 40 M2 61 0 0 6 (9.8%) 6 (9.8%) 43.9 M3 58 2 0 4 (6.8%) 4 (6.8%) 69.8 Table 6.2: The frequency and types of errors made by merchant participants and the percentage of each participant’s transactions that their errors affected in some cases to demonstrate it to their friends; moving funds between devices back and forth did not incur any costs in terms of transaction fees. 6.6.1 Errors and speed Students completed an average of 23 transactions. The highest number of transactions completed by one participant was 30; the lowest was 8. For merchants, the average number of transactions was 59. The highest number of transactions completed by one merchant was 61; the lowest was 58. Tables 6.1 and 6.2 display a summary of the Error Rates and Time on Task, which should be considered against the number of transactions each participant successfully completed. 6.6.1.1 Error rates DigiTally captured the number of errors participants made while trying to complete the task. Student participants made the most errors when entering the first code (Code1) presented to them by the merchant. The number of errors here ranged from 0 to 9, with 7 of the 12 student participants making this error. These were non-critical errors since they do not prevent successful completion of the transaction. However, future trials will need to test the ease with which users can recover from this and other errors. For Code2, the number of errors ranged from 0 to 8, with 5 of the 12 student participants making this error. The merchant participants made the most errors when entering Code2 which is required to authorise and complete the transaction. The number of errors here ranged from 4 to 6. 112 Merchant M2’s Code2 errors were higher because they included errors experienced during training. We adjusted the number in the table to reflect the errors experienced during the study without those experienced during the training. Merchants made no Code1 errors; this type of error would occur if they used DigiTally to make payments instead of receiving (the merchants made a single daily payment using DigiTally for the reimbursement of transactions, see Section 6.5.3). We include PIN errors in Tables 6.1 and 6.2 for completeness. However, we do not consider entering PINs to be a major problem, at least no more than in any other system that uses them. For clarity, we report (in brackets) the proportion of errors relative to the number of transactions completed as a percentage (for non-zero values). 6.6.1.2 Time on task DigiTally recorded the time on task for each participant. The transaction timer starts when a menu option is selected (Send Money or Receive Money), and the timer is stopped before the transaction log is displayed to the user (the transaction log includes the transaction’s duration time). The average time for students to complete a transaction was 36.4 seconds. Their average completion times ranged from 24.2 seconds to 54.3 seconds. It is worth noting that a large number of errors by a participant did not necessarily translate into longer average time spent on transactions. For the merchant participants, the average time to complete a transaction was 51.23 seconds. Their average completion times ranged from 40 to 69.8 seconds. Merchants’ transaction times are larger since a merchant can start a transaction, give Code1 to the payer, then complete a few tasks to serve the customer until Code2 is entered. Such tasks include having to prepare food at the same time as processing DigiTally payments, which is the case for merchant M3. For merchants dedicated to processing payments (cashier roles), the transaction times are lower (M1 and M2). 6.6.2 SUS results The SUS is not diagnostic: it will not reveal specific problems, but it does give an idea about overall ease-of-use, and whether significant changes might be needed. To give a clear idea of the results, the SUS is calculated so as to provide a score out of 100. However, the SUS Score is not a percentage. A score of 68 actually falls at the 50th percentile (i.e., the average SUS score is 68). If the score is below 68, there are likely to be serious usability problems that need tackling. A score of 80.3 or higher is ideal. The average SUS score for DigiTally was 78.8, which is considered “Good”, and would be given a “B+” grade. The lowest score was 50, which is considered “Poor”, and is equivalent to a “F” grade. This score was given by a merchant and was the only score 113 considered “Poor” by SUS standards. The highest score was 100, which is the best possible, and is equivalent to an “A+” grade. Eight participants gave the equivalent of an “A+” grade. For merchant participants, the average SUS score was 71.4, which is considered “Good” and equivalent to a “C+” grade; for student participants, the average SUS score was 83.1, which falls just short of being considered “Excellent”, and is equivalent to an “A” grade. Although the SUS is intended to be a measure of ease-of-use, Lewis and Sauro argue that it can also be used as a measure of learnability (using items 4 and 10 of the SUS) [79]. As with calculating the SUS score, learnability can be calculated to give a score ranging from 0-100. The average learnability for this initial trial was 82.9, which falls just short of “Excellent” and would be given an “A” grade. The remaining 8 items are what Lewis and Sauro call a measure of usability [79]. For the current initial trial, this score was 77.8, which is also considered “Good” and would be given a “B+” grade. 6.6.3 Responses to open-ended questions Participants’ answers to the three open-ended questions were categorised into several themes. This qualitative data analysis involved two researchers independently coding user comments to identify common themes, supported by quotations. These researchers then came together to assess agreement and categorised themes based on a well-established definition of usability (ISO-9241), which consists of effectiveness (usefulness), efficiency (ease-of-use), and satisfaction. 6.6.3.1 Perceived usefulness 1. Money saving. Most participants mentioned the benefit of there being no transac- tion fees. They pointed out that this would be useful for those in poor communities, as well as being attractive to price-sensitive customers and merchants. Aside from helping users avoid transaction fees, one participant mentioned that it also meant they did not have to use a smartphone, making DigiTally even more cost-friendly. 2. Network independent transactions (interoperability). Many mentioned the benefit of not having to rely on network coverage. For some, this made the DigiTally transaction process seem more reliable. It also makes it more predictable as users do not have to wait for a confirmation SMS, which with M-Pesa can take up to 30 seconds. 3. Security. The general consensus was that DigiTally seemed very secure. The codes were a major factor behind this perception. While recognising this benefit, a few participants suggested that the codes were too long, and recommended that the developers consider shortening them. 114 4. Money tracking. Participants liked being able to review their last transaction and balance. For some, however, this was not enough. It was suggested that this feature would be more useful if the user could review all previous transactions. 6.6.3.2 Perceived ease-of-use 1. Ease. Participants perceived DigiTally to be simple and easy to use. Some clarified that DigiTally was easy to use after a learning period. One merchant described the process as cumbersome; this merchant was both serving customers and taking payments. Other merchant participants were either less busy or were dedicated cashiers. One participant also stated that DigiTally might be harder to use for the elderly and those with poor eyesight. 2. Learnability. DigiTally was most often praised for its learnability; in general, participants felt like DigiTally was easy for a first-time user to understand. Even the busy merchant (M3) who had complained that the system was cumbersome said she had no difficulty training a staff member to use it. The same merchant also warned that learnability might be lower for some, including the target population, where there might be more illiteracy and less education more generally. Another merchant was curious to know how money would be deposited in a production DigiTally system (this merchant is already an M-Pesa agent). 3. Speed. Participants found transacting with DigiTally to be relatively fast, once they knew the process. Two out of the eight merchants said that DigiTally had too many steps and was time-consuming compared to cash or M-Pesa. Requiring as much effort from the merchant as from the customer to give and receive codes was problematic when the merchant had a lot of customers and needed to perform other tasks at the same time. One merchant also did not like having to search for the customer in their contacts list. One participant pointed out that the speed of transaction would be less of an issue in rural areas. 4. Errors and recoverability. One participant stated that it is hard to make errors that would cause the user to lose money, and that they found errors easy to rectify. Many others did not agree. In general, error reset was considered too difficult. Although the process for error recovery was perceived as cumbersome, none of the users had to go through it during the trial. See Section 6.7.2 for more details. 5. Cashlessness. Two participants noted that by using DigiTally they would not need to carry cash around. This was considered convenient. It was also noted that merchants do not have to find the exact change when dealing with customers using 115 DigiTally. This benefit is shared with other electronic payment systems; Kenyans already like M-Pesa as it dispenses with the risk and the inconvenience of cash. 6. Codes. Although many did not have much of a problem with the codes, finding them short enough and necessary for security, some stated that they did not like them, mostly because the code or the process was too long or awkward. Having to exchange and input codes was also perceived as an opportunity to make errors. On the other hand, one of DigiTally’s perceived advantages was the deterministic nature of the transactions, because exchanging the codes provided immediate feedback that a transaction was completed, without needing to wait for an SMS. 7. Distance. Sending money to a remote payee was identified as a potential problem. It was pointed out that errors could be more likely and recovering from them would be harder if users were trying to complete a transaction at a distance. 6.6.3.3 Satisfaction 1. Likeability. Participants liked using DigiTally, using words like “happy”, “good”, “best”, “seamless”, “enjoy”, “enthusiasm”, “encouraging”, “smart”, and “satisfied” to describe their experience. Comments suggest that likability could be improved by targeting issues associated with the length of the codes and with recovering from errors. 2. Other services. In addition to a merchant who wanted to know how to do cash-in and cash-out transactions, one student participant also indicated that they would like to be able to do this using DigiTally. We did not provide cash-in and cash-out services during the trial. 6.7 Observations We describe in the following sections our observations during the preliminary study. These observations were documented during regular visits to merchants to answer their questions, and during daily visits to reimburse merchants for their DigiTally transactions (as discussed in Section 6.5.3). We highlight two important observations, visual cues and the error-recovery process, and discuss the lessons learned. 116 Figure 6.5: The cafeteria cashier (recipient) using the phone on the left, while a student participant (payer) displays Code2 through the window to authorise the payment Figure 6.6: Participants displaying the back of their phones (yellow DigiTally label shown on the back) 117 6.7.1 Visual cues We observed that participants chose to display the codes so that the other participant could see them and enter them into their own device. None of the participants indicated that they exchanged the codes verbally, and some stressed that visual exchange is easier. In crowded areas, a verbal exchange of the codes could lead to misheard digits. The cafeteria is a crowded environment with long lines of customers, where a window separates the cashier and the customer, and a small opening at the bottom of the window allows exchanging cash or passing cards (Figure 6.5). In this environment, we observed multiple DigiTally transactions where the only verbal exchange was to acknowledge that a transaction was completed by saying “it’s OK ”, and in some cases just a nod. When we mentioned this observation, most participants agreed that “it’s much easier”, and gave the example of the cafeteria. In general, once participants had some experience with the system, we noticed that they would always show the code and never speak it even in quiet areas. In the transaction flow people developed, visual cues were used to complete each major step, starting with the customer declaring that they want to use DigiTally by showing a yellow label on the back of the phone that identifies the participant by name (Figure 6.6). However, we should not neglect the fact that some users in the target population for financial inclusion might be visually impaired. 6.7.2 Error recovery We designed error recovery to prompt the user for three dummy inputs (any sequence of digits) to ensure that they were fully aware that they were indeed resetting the saved transaction data; in our implementation, a wrong reset could lead to inconsistent states on the SIMs involved. Inconsistent states currently require a hard reset of the relationship between the two users by deleting the contact information and creating a new contact on each phone. Before each prompt for Code2, the recipient is shown a confirmation of the transaction details. After the dummy inputs are provided, the session is reset by discarding Code1 as well as any intermediate results that were saved for validating Code2. Now the recipient (merchant) can start a fresh transaction: the next time a transaction is initiated between the same two parties, a new Code1 is generated. Even with default-text entry enabled in the applet to fill in the previously entered code, participants requested an easier way to rectify a mistake. As error recovery was performed only by one student participant during the actual trial, most were likely evaluating the training they received rather than anything they actually did. Two users needed to reset a transaction during training. One participant commented: “For the error issue: I think you should make it simpler to correct the error 118 and provide instructions on how to correct it once the error is made since you will not be around to show them. For example, when an error of different amounts that needs one to put code 1 three times, I think you should write ‘repeat code 1 three times to correct the error’.” The other participant agreed: “If there’s a simpler way of resetting a wrongly transacted code DigiTally will be better.” None of the other participants, including the merchants, had to use this error-recovery method, likely due to the training component instructing users to agree on the amount before proceeding with the transaction. We emphasised the importance of this step; perhaps the burden of going through the error recovery process helped motivate careful exchange of the codes. Our prototype error-recovery mechanism needs a redesign. A simple alternative would be a menu option to reset transaction data, requiring authentication with the user’s PIN rather than requiring them to enter dummy inputs. 6.8 Discussion This chapter reports the results of a preliminary study with an early prototype of DigiTally, tested on participants from a university in Nairobi. We described how DigiTally involves sharing two 8-digit codes to protect users from unauthorised payments while at the same time allowing them freedom from reliance on network coverage. This has an advantage that the payment service operator has no marginal transaction costs and can offer zero fees for some transactions. However, the code-sharing process has always presented a possible usability challenge. To test this and other aspects of this early prototype, DigiTally was introduced to Kenyan participants in order to identify and resolve any major usability issues before a later field trial with a more representative sample of service users. The errors made in sharing codes suggest a need to make recovery from this type of error more intuitive. Nevertheless, most participants completed the task with few code-sharing errors. The average speed for every student was less than a minute, and overall the average transaction speed was close to half a minute. Given our observations of participants and the comments they made, it seems that the process, overall, seemed straightforward and fast. The SUS scale gives insight into the extent to which DigiTally’s usability might inhibit its use. One of the lessons learned was the need to demonstrate clearly to users: (a) how errors can be avoided; and (b) how the codes prevent cheating. Observations with initial trial users and open-ended answers following the trial indicate that users’ main fear is the difficulty of recovery from errors. This is already an issue for M-Pesa when people send payments to the wrong phone number (it is a well-known problem, and one of the project researchers experienced it first-hand). Recoverability is especially important in the initial trials and adoption phase of a new system because expert help will not be available in remote villages; users’ ability to figure out how to 119 recover from mistakes may well be the difference between adopting and rejecting DigiTally. Higher SUS scores tend to predict loyalty and word-of-mouth recommendations [109]. Users with scores over 80 (cf. the average score of 83.1 for student participants) are called “Promoters” because they are more likely to recommend a system, while users with a score below 60, called “Detractors”, are more likely to say negative things about a system. DigiTally is in a position to create “Promoters”, especially for student participants. There are various things we can do to make DigiTally more usable for busy merchants. For example, we can give merchants a smartphone app that reads the customer’s phone number from a QR code on a sticker on their phone, display Code1, and read Code2 from their phone screen. This way the merchant does no more work than with M-Pesa. Two of the merchants preferred DigiTally to M-Pesa because of speed; DigiTally does not force a cashier to wait for up to half a minute while a payment confirmation makes its way through Nairobi’s congested mobile network. DigiTally takes more keystrokes, but the outcome is then immediate. The fact that DigiTally has zero marginal costs also means that it can be offered as a zero-fee payment mechanism between friends and family, like personal cheques in the UK. Loan clubs, such as Rotating Savings and Credit Associations (ROSCAs) or savings clubs, and money-guards (used to enforce savings through the commitment of funds), whether formal or informal, are an important part of the financial ecosystem in many less developed countries. See Collins et al. [36] and Banerjee and Duflo [15] for more information about financial inclusion and the financial tools used by the poorest demographics (living on less than $2 a day). Given our relatively high SUS, learnability, and usability scores, the challenge is encouraging first time use. The factor most apparently affecting first time use, based on free-text responses, was the 8-digit codes. The most frequent suggestion was to remove or reduce them. Although many participants were familiar with M-Pesa, and thus sharing an 11-digit phone number, this is potentially less time-consuming and error-prone because it is the same number every time for the recipient (their own phone number) and a single time entry for the sender. It also requires no input from the recipient, so a merchant can focus on other tasks. Participants perceived DigiTally to be secure, and the codes were the reason behind this perception. As mentioned in Section 6.6.3.1 item 3, participants asked if these codes can be shortened. However, this would not be possible without compromising the security of the system (as discussed in Chapter 5). A few participants also requested a better money tracking tool: some participants liked that statistics about their transactions were available (e.g., amounts sent and received as explained in Section 6.5.4), and some users requested the ability to view more than just the last transaction (as discussed in Section 6.6.3.1 item 4). This is not a critical issue, as we can engineer the system to include more 120 transactions as allowed by hardware constraints and storage capacity. Because migrant remittances are a key application of mobile payment systems (such as M-Pesa), some of our participants discussed the possibilities of remote transactions (see Section 6.6.3.1 item 4). We designed DigiTally primarily for face-to-face offline transactions and not for remote transactions that would require a medium to exchange the codes. However, if such a medium exists, then DigiTally codes can be exchanged online to process remote transactions. Codes can be exchanged using SMS or over the phone; they could even be exchanged by post. DigiTally can be viewed as a platform to do the cryptographic operations required to process payments, on users’ phones rather than on a centralised server, and systems can be built on top of DigiTally to perform remote transactions – relying on the DigiTally SIM to do the cryptography. DigiTally was praised for being quick, easy to use, and easy to learn. Kenyans are already familiar with M-Pesa, which DigiTally was designed to mimic; that may have helped directly, while other factors our participants liked, such as being able to review transactions, may have been features that they would like to see in M-Pesa. Therefore, the results in this chapter do not measure how DigiTally would be perceived by users who have never used any mobile payment system. But M-Pesa, like most mobile payment systems, cannot work offline, and therefore fails to provide service to the poorest communities. Just as M-Pesa appealed most strongly to people with phones but no payment cards, so also DigiTally should appeal most strongly to people with no network service at all. This was confirmed when we made two field trips to scout possible sites for a second- round field trial. When we visited a small town near Nairobi with good network coverage, stakeholders were interested in playing with the system, but saw its main benefit as being potentially programmable so that it could support their specific applications. When we visited Busia, a rural community near Lake Victoria with very poor network coverage, stakeholders were delighted, and played with it for hours. They considered DigiTally to be just what they needed to solve their problems with network coverage and reduce transaction costs. 6.9 Limitations In this section, we discuss various limitations and possible future work for offline payments in general and DigiTally in particular11. 1. Loss of device: loss of the device, and the SIM it includes, may lead to loss of funds. The dispute resolution and consumer protection mechanism we envision 11This section is not included in our paper [20]. The content is relevant to a large-scale deployment, and some of the issues discussed were raised after project completion. 121 for DigiTally includes refunding the customer with the last known balance, after processing received transactions from other customers if any are available (that can effect the last known balance, and provide proof that the customer’s balance must be either increased or decreased based on recent activity). However, this may create an opportunity for fraudulent claims which we discuss below. 2. Fraud: a user can claim loss of a device to claim back funds even though those funds were already spent. However, we envision that this kind of fraud can be rate-limited; a simple verification to check how many times a user claims loss of a device can reject the user’s application for excessive claims and terminate the user’s account (prevent the user from registering as a client in the future to prevent fraudulent claims; this can be an effective deterrent). Moreover, social factors may impose self-regulation to avoid negative reputation within a community. 3. Fragility of the overlay SIM: the adhesive on the overlay SIM we used (as provided by Taisys) can wear off if the overlay is regularly removed and reapplied in the device. However, the overlay SIM is 1) used solely for prototyping, and a production system would likely embed the DigiTally code in a regular SIM, which requires the cooperation of the SIM provider, and 2) as we designed the prototype, we envisioned that the overlay SIM would be applied correctly once and there is no reason to remove and reapply the overlay SIM. We mention the fragility of the overlay SIM as a possible limitation for the following reason: during our research visit to Kenya, we were informed that the leading MNO instructs their customers to remove the overlay SIM, provided by the MNO’s competitors, if that customer wants to proceed with in-shop customer service. If the customer does not remove the overlay, then they are denied service. This creates a venue for the leading MNO to undermine its competitors by forcing customers to comply (and would ultimately lead to customers choosing not to utilise the services of the MNO’s competitors). However, we stress that this problem is vendor-specific: Taisys, or another vendor, may introduce to the market a more robust overlay SIM that can be easily and regularly removed and reapplied. 6.10 Conclusion We designed and developed an early prototype of DigiTally, an offline phone payment system, and tested it on Kenyan participants in a preliminary study. In addition to error rates, transaction speed data, and SUS scores, we reported on supporting data from demonstration sessions (observed behaviours and comments), and free-text written responses at the end of the study. Our results indicate that participants found DigiTally 122 easy to use and that they liked key aspects of the system, including its perceived security and that it did not require network coverage to process payments. We demonstrated that DigiTally can be used for making payments without network coverage or transaction fees. Some specific technical improvements are needed, most notably the process of recovery from errors. We discovered that while DigiTally is slightly less convenient to use than existing mobile payment systems, the added burden is not excessive; people in areas with poor network coverage are eager to use it. Furthermore, the burden mostly falls on merchants, and busy merchants are likely to have enough money to buy better terminals. There is a realistic prospect of developing DigiTally into a workable system that will extend mobile payments to the millions of people who are currently excluded from the world of electronic payments. However, this was not an option in Kenya because of a change in the business environment. Just after we completed the field trial, Airtel decided to minimise their mobile payment services in Kenya turning the M-Pesa operator, Safaricom, into the dominant player; and Equity Bank, with whom we had had discussions, stopped issuing overlay SIM cards to its customers. That left Safaricom as the only feasible deployment route, and they were not interested. In the absence of competition, they say their way forward is in marketing gambling apps to existing customers rather than in extending service to the rural poor. 123 124 Chapter 7 Conclusion The core theme of our research was creating more resilient payment systems. To this end, we investigated several issues related to security engineering in general, and payment systems in particular. We highlighted resilience features in existing payment systems that provide network and system resilience features (through offline capabilities and dedicated networks), usability features (e.g., familiarity and simplicity), dedicated tamper-resistant hardware (to prevent loss of critical data and tampering with the protocol implementation), censorship-resistance features included in decentralised systems, low barriers of entry that can lower the friction required for participation, and decentralised transaction processing to alleviate the performance bottleneck resulting from central transaction processing. We highlighted the lack of trust and reputation metrics in open-membership systems; the lack of such metrics exposes such systems to various network attacks. We investigated how using a payment system, with a pseudo-currency, creates useful metrics for building trust and reputation systems. We aimed to incentivise correct behaviour, to generate metrics to identify which nodes are most interconnected, and empower nodes to shift trust to other nodes. In other words, we facilitate the mechanisms required to democratise trust and power, by empowering participating nodes to vote for nodes that deserve their trust. We consider this a key building block to enable cooperation in open-membership systems that cannot prevent malicious activity through centralised operators. We reviewed performance bottlenecks in Bitcoin that can cripple system performance if the network is exposed to spam campaigns. Our goal was to understand performance and resilience in cryptocurrencies. Although these are decentralised by design (at least theoretically), there are failure points that require more research. We were the first to highlight how a spam campaign on Bitcoin could degrade system performance leading to higher processing delays at the cost of a fairly modest sum in transaction fees. A large amount of spam can increase the backlog of unconfirmed transactions, leading to delays. To cope, senders pay higher fees, and so does the attacker – but the attacker also leverages the higher fees paid by others. We found that over 20% of Bitcoin transactions were 125 spam during the 10-day period at the peak of the campaign, which increased delays in processing non-spam transactions from 0.33 to 2.67 hours on average. The attack cost about $49,000 to mount. The stated motivation behind the stress test campaign was to provide a justification for raising the Bitcoin block size limit. This was achieved in due course with a Bitcoin fork. Adversaries that are richer and more determined might disrupt the network more, and for longer than 10 days. The experience of this attack gives some idea what might be expected. This leads to the unfortunate result that Bitcoin is losing its initial appeal for low-value transactions and micropayments, since transaction fees are currently higher and less competitive compared to traditional, and more familiar, payments systems. In addition, Bitcoin uses substantial amounts of energy, yet alternatives for decentralised consensus are unable to completely prevent attacks. We are not convinced that it is the way forward for low-value payments. We therefore built on our experience to create an offline payment system, DigiTally, to extend financial inclusion efforts in developing countries. Offline payments are already used for electricity (both in prepayment meters and pay-as-you-go solar), we aimed to extend them into a general-purpose payment system, to increase service resilience in the face of network congestion or outage, and provide service to currently excluded areas. The goal of DigiTally was to tackle the network coverage and reliability problems by processing payments offline reliably and deterministically. We aimed to create these capabilities while simplifying the user interface and prioritising for usability, simplicity and speed, in order to avoid usability problems that can be a hindrance for uptake. We analysed and discussed the offline payment protocol we created for DigiTally. The main motivation for using short message authentication protocols was to construct a lightweight protocol that can be used offline to finalise payments, while prioritising for resource-constrained devices and maintaining an acceptable level of usability. We discussed the limitation of verifying such protocols using BAN logic. We stress that protocol designers need to be good curators of security state, and also pay attention to the interaction between online and offline functions. We designed the protocol to preserve the integrity of transactions, by creating short authentication codes that can be exchanged between participants to get better tradeoffs between security and usability; our aim was to develop usable human-friendly protocols that can be used in constrained offline environments. We aimed to maximise usability in off-network transactions by minimising the number of digits a user has to speak, hear and type, while providing robust recovery mechanisms for the inevitable errors and making sure there isn’t any scalable attack that is large enough to care about. We then developed a prototype system, which lets users make offline payments by copying short strings of digits from one mobile handset to another. DigiTally involves sharing two 8-digit codes to protect users from unauthorised payments while freeing them 126 from reliance on network coverage. The payment service operator will typically have no marginal transaction costs and can offer zero fees for some transactions. One of DigiTally’s perceived advantages was the deterministic nature of the transactions; exchanging the codes provided immediate feedback that a transaction was completed, without needing to wait for an SMS. In addition to error rates, transaction speed data, and SUS scores, we reported on supporting data from demonstration sessions (observed behaviours and comments) and free-text written responses at the end of the study. We discussed the lessons learned from our field visits and initial usability evaluation conducted in Nairobi, Kenya. We found that, although offline payments involve copying codes in both directions between the payer’s phone and the payee’s, the extra workload was acceptable to most users. More research is needed to integrate online payments, offline payments, and delay- tolerant networking in general. The value of this research is not just to extend and enhance payment systems, but also all sorts of other electronically enabled services, from pay-as-you-go solar panels to agricultural subsidies and payments from aid donors. We hope that this thesis is helpful for researchers and policy makers interested in creating resilient payment systems, and that our research and fieldwork can inform and assist future financial inclusion efforts. 127 128 Bibliography [1] Alessandro Acquisti, Roger Dingledine, and Paul Syverson. On the economics of anonymity. In International Conference on Financial Cryptography and Data Security, pages 84–102. Springer, 2003. [2] Carlisle Adams. Have money, will travel: A brief survey of the mobile payments landscape. Office of the Privacy Commissioner of Canada, 2013. [3] Ross Anderson. UEPS–a second generation electronic wallet. In European Symposium on Research in Computer Security, pages 411–418. Springer, 1992. [4] Ross Anderson. Why cryptosystems fail. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), pages 215–227. ACM, 1993. [5] Ross Anderson. Security Engineering – A guide to building dependable distributed systems. John Wiley & Sons, 2008. [6] Ross Anderson. Risk and privacy implications of consumer payment innovation. In “Consumer Payment Innovation in the Connected Age”, pages 29–30, 2012. [7] Ross Anderson and Khaled Baqer. Reconciling multiple objectives–politics or markets? In International Workshop on Security Protocols, pages 144–156. Springer, 2017. [8] Ross Anderson and Johann Bezuidenhout. Cryptographic credit control in pre- payment metering systems. In IEEE Symposium on Security and Privacy (SP), pages 15–23. IEEE, 1995. [9] Ross Anderson, Charalampos Manifavas, and Chris Sutherland. NetCard–A practical electronic-cash system. In International Workshop on Security Protocols, pages 49–57. Springer, 1997. [10] Elli Androulaki, Mariana Raykova, Shreyas Srivatsan, Angelos Stavrou, and Steven M. Bellovin. PAR: Payment for anonymous routing. In Privacy Enhancing Technologies Symposium, pages 219–236. Springer, 2008. 129 [11] Elli Androulaki, Ghassan O. Karame, Marc Roeschlin, Tobias Scherer, and Srdjan Capkun. Evaluating user privacy in Bitcoin. In International Conference on Financial Cryptography and Data Security, pages 34–51. Springer, 2013. [12] Maria Apostolaki, Aviv Zohar, and Laurent Vanbever. Hijacking Bitcoin: Routing attacks on cryptocurrencies. In IEEE Symposium on Security and Privacy (SP), pages 375–392. IEEE, 2017. [13] Moshe Babaioff, Shahar Dobzinski, Sigal Oren, and Aviv Zohar. On Bitcoin and red balloons. In Proceedings of the 13th ACM conference on electronic commerce, pages 56–73. ACM, 2012. [14] Adam Back. Hashcash–A denial of service counter-measure, 2002. [15] Abhijit Banerjee and Esther Duflo. Poor economics: A radical rethinking of the way to fight global poverty. Public Affairs, 2011. [16] Khaled Baqer and Ross Anderson. Do you believe in Tinker Bell? The social externalities of trust. In International Workshop on Security Protocols, pages 224– 236. Springer, 2015. [17] Khaled Baqer and Ron Steinfeld. Distributed payment systems. 2014. [18] Khaled Baqer, Johann Bezuidenhoudt, Ross Anderson, and Markus Kuhn. SMAPs: Short Message Authentication Protocols. In International Workshop on Security Protocols, pages 119–132. Springer, 2016. [19] Khaled Baqer, Danny Yuxing Huang, Damon McCoy, and Nicholas Weaver. Stressing out: Bitcoin “stress testing”. In Third Workshop on Bitcoin and Blockchain Research, affiliated with the International Conference on Financial Cryptography and Data Security, pages 3–18. Springer, 2016. [20] Khaled Baqer, Ross Anderson, Jeunese Adrienne Payne, Lorna Mutegi, and Joseph Sevilla. Digitally: Piloting offline payments for phones. In Thirteenth Symposium on Usable Privacy and Security (SOUPS 2017), pages 131–143, Santa Clara, CA, 2017. USENIX Association. ISBN 978-1-931971-39-3. URL https://www.usenix. org/conference/soups2017/technical-sessions/presentation/baqer. [21] Simon Barber, Xavier Boyen, Elaine Shi, and Ersin Uzun. Bitter to better–how to make Bitcoin a better currency. In International Conference on Financial Cryptog- raphy and Data Security, pages 399–414. Springer, 2012. 130 [22] Jo¨rg Becker, Dominic Breuker, Tobias Heide, Justus Holler, Hans Peter Rauer, and Rainer Bo¨hme. Can we afford integrity by proof-of-work? Scenarios inspired by the Bitcoin currency. In The Economics of Information Security and Privacy, pages 135–156. Springer, 2013. [23] Alex Biryukov and Ivan Pustogarov. Proof-of-work as anonymous micropayment: Rewarding a Tor relay. In International Conference on Financial Cryptography and Data Security, pages 445–455. Springer, 2015. [24] John Black and Phillip Rogaway. Ciphers with arbitrary finite domains. In Cryptog- raphers’ Track at the RSA Conference, pages 114–130. Springer, 2002. [25] Bruno Blanchet. CryptoVerif: Computationally sound mechanized prover for cryp- tographic protocols. In Dagstuhl seminar “Formal Protocol Verification Applied”, page 117, 2007. [26] Mike Bond and Ross Anderson. API-level attacks on embedded systems. IEEE Computer, 34(10):67–75, 2001. [27] Mike Bond, Omar Choudary, Steven J. Murdoch, Sergei Skorobogatov, and Ross Anderson. Chip and Skim: cloning EMV cards with the pre-play attack. IEEE Symposium on Security and Privacy (SP), 2014. [28] Joseph Bonneau, Andrew Miller, Jeremy Clark, Arvind Narayanan, Joshua A. Kroll, and Edward W. Felten. SoK: Research perspectives and challenges for Bitcoin and cryptocurrencies. IEEE Symposium on Security and Privacy (SP), 2015. [29] Michael Burrows, Martin Abadi, and Roger M. Needham. A logic of authentication. Proceedings of Royal Society, pages 233–271, 1989. [30] Vitalik Buterin. Selfish mining: A 25% attack against the Bit- coin network. 2013. URL https://bitcoinmagazine.com/articles/ selfish-mining-a-25-attack-against-the-bitcoin-network-1383578440/. [31] Miles Carlsten, Harry Kalodner, S. Matthew Weinberg, and Arvind Narayanan. On the instability of Bitcoin without the block reward. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 154–167. ACM, 2016. [32] Miguel Castro and Barbara Liskov. Practical Byzantine fault tolerance. In OSDI, volume 99, pages 173–186, 1999. [33] David Chaum. Blind signatures for untraceable payments. In Advances in cryptology, pages 199–203. Springer, 1983. 131 [34] David Chaum, Amos Fiat, and Moni Naor. Untraceable electronic cash. In Pro- ceedings on Advances in cryptology, pages 319–327. Springer-Verlag New York, Inc., 1990. [35] Yao Chen, Radu Sion, and Bogdan Carbunar. XPay: Practical anonymous payments for Tor routing and other networked services. In Proceedings of the 8th ACM workshop on Privacy in the electronic society, pages 41–50. ACM, 2009. [36] Daryl Collins, Jonathan Morduch, Stuart Rutherford, and Orlanda Ruthven. Port- folios of the poor: how the world’s poor live on $2 a day. Princeton University Press, 2009. [37] Kyle Croman, Christian Decker, Ittay Eyal, Adem Efe Gencer, Ari Juels, Ahmed Kosba, Andrew Miller, Prateek Saxena, Elaine Shi, Emin Gu¨n Sirer, Dawn Song, and Roger Wattenhofer. On scaling decentralized blockchains. In International Conference on Financial Cryptography and Data Security, pages 106–125. Springer, 2016. [38] Wei Dai. B-money. 1998. URL http://www.weidai.com/bmoney.txt. [39] Christian Decker and Roger Wattenhofer. Information propagation in the Bitcoin network. In IEEE Thirteenth International Conference on Peer-to-Peer Computing (P2P), pages 1–10, 2013. [40] Jamie DeCoster. Scale construction notes. Department of Psychology Uni- versity of Alabama, 2000. URL https://www.researchgate.net/profile/ Jamie_Decoster2/publication/264874105_Scale_Construction_Notes/links/ 540f0f740cf2f2b29a3dccbb.pdf. [41] Roger Dingledine and Paul Syverson. Reliable MIX cascade networks through reputation. In International Conference on Financial Cryptography and Data Security, pages 253–268. Springer, 2002. [42] Roger Dingledine and Dan S. Wallach. Building incentives into Tor. In International Conference on Financial Cryptography and Data Security, pages 238–256. Springer, 2010. [43] Roger Dingledine, Michael J. Freedman, David Hopwood, and David Molnar. A reputation system to increase mix-net reliability. In International Workshop on Information Hiding, pages 126–141. Springer, 2001. [44] Roger Dingledine, Michael J. Freedman, and David Molnar. The free haven project: Distributed anonymous storage service. In Designing Privacy Enhancing Technologies, pages 67–95. Springer, 2001. 132 [45] Roger Dingledine, Nick Mathewson, and Paul Syverson. Reputation in P2P anonymity systems. In Workshop on economics of peer-to-peer systems, volume 92, 2003. [46] Roger Dingledine, Nick Mathewson, and Paul Syverson. Tor: The second-generation onion router. Technical report, Naval Research Lab Washington DC, 2004. [47] Ronald G. Downey and Craig V. King. Missing data in Likert ratings: A comparison of replacement methods. The Journal of general psychology, 125(2):175–191, 1998. [48] Saar Drimer and Steven J. Murdoch. Keep your enemies close: Distance bound- ing against smartcard relay attacks. In Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium, SS’07, pages 7:1–7:16, Berkeley, CA, USA, 2007. USENIX Association. ISBN 111-333-5555-77-9. URL http: //dl.acm.org/citation.cfm?id=1362903.1362910. [49] Pascaline Dupas, Sarah Green, Anthony Keats, and Jonathan Robinson. Challenges in banking the rural poor: Evidence from Kenya’s western province. Technical report, National Bureau of Economic Research, 2012. [50] Cynthia Dwork and Moni Naor. Pricing via processing or combatting junk mail. In Advances in Cryptology, pages 139–147. Springer, 1993. [51] Martin Emms, Budi Arief, Leo Freitas, Joseph Hannon, and Aad van Moorsel. Harvesting high value foreign currency transactions from EMV contactless credit cards without the PIN. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 716–726. ACM, 2014. [52] EMVCo. EMV – Integrated Circuit Card Specification for Payment Systems, Book 1: Application Independent ICC to Terminal Interface Requirements, 4.3 edition, 2011. [53] EMVCo. EMV – Integrated Circuit Card Specification for Payment Systems, Book 2: Security and Key Management, 4.3 edition, 2011. [54] EMVCo. EMV – Integrated Circuit Card Specification for Payment Systems, Book 3: Application Specification, 4.3 edition, 2011. [55] EMVCo. EMV – Integrated Circuit Card Specification for Payment Systems, Book 4: Cardholder, Attendant, and Acquirer Interface Requirements, 4.3 edition, 2011. [56] Ittay Eyal and Emin Gu¨n Sirer. Majority is not enough: Bitcoin mining is vulnerable. In International Conference on Financial Cryptography and Data Security, pages 436–454. Springer, 2014. 133 [57] Laura Faulkner. Beyond the five-user assumption: Benefits of increased sample sizes in usability testing. Behavior Research Methods, Instruments, & Computers, 35(3): 379–383, 2003. [58] Mainak Ghosh, Miles Richardson, Bryan Ford, and Rob Jansen. A TorPath to TorCoin: proof-of-bandwidth altcoins for compensating relays. In Privacy Enhancing Technologies Symposium, 2014. [59] Steve Glassman. The Millicent protocol for inexpensive electronic commerce. 1995. URL http://www.research.digital.com/SRC/millicent. [60] Steven Goldfeder, Harry Kalodner, Dillon Reisman, and Arvind Narayanan. When the cookie meets the blockchain: Privacy risks of web payments via cryptocurrencies. Proceedings on Privacy Enhancing Technologies (PoPETs), 2018. [61] Laura Gray. Are you ready for 30 June 2018? saying goodbye to SSL/early TLS. 2017. URL https://blog.pcisecuritystandards.org/ are-you-ready-for-30-june-2018-sayin-goodbye-to-ssl-early-tls. [62] Ethan Heilman, Alison Kendler, Aviv Zohar, and Sharon Goldberg. Eclipse at- tacks on bitcoin’s peer-to-peer network. In 24th USENIX Security Symposium (USENIX Security 15), pages 129–144, Washington, D.C., 2015. USENIX Asso- ciation. ISBN 978-1-931971-232. URL https://www.usenix.org/conference/ usenixsecurity15/technical-sessions/presentation/heilman. [63] Danny Yuxing Huang. Using Crypto-currencies to Measure Financial Activities and Uncover Potential Identities of Actors Involved. PhD thesis, UC San Diego, 2017. URL https://escholarship.org/uc/item/6jk6w02v. [64] Nick Hughes and Susie Lonie. M-PESA: mobile money for the “unbanked” turning cellphones into 24-hour tellers in Kenya. Innovations: Technology, Governance, Globalization, 2(1-2):63–81, 2007. [65] GSMA Intelligence. The Mobile Economy–Africa 2016. London: GSMA, 2016. URL https://www.gsma.com/mobileeconomy/africa/. [66] William Jack and Tavneet Suri. Mobile money: The economics of M-PESA. Technical report, National Bureau of Economic Research, 2011. [67] Rob Jansen, Nicholas Hopper, and Yongdae Kim. Recruiting new Tor relays with BRAIDS. In Proceedings of the 17th ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 319–328. ACM, 2010. 134 [68] Rob Jansen, Nicholas Hopper, and Yongdae Kim. LIRA: Lightweight incentivized routing for anonymity. In Network and Distributed System Security Symposium (NDSS), 2013. [69] Rob Jansen, Andrew Miller, Paul Syverson, and Bryan Ford. From onions to shallots: Rewarding Tor relays with TEARS. In Privacy Enhancing Technologies Symposium, 2014. [70] Stanis law Jarecki and Andrew Odlyzko. An efficient micropayment system based on probabilistic polling. In International Conference on Financial Cryptography and Data Security, pages 173–191. Springer, 1997. [71] Ronald Kainda, Ivan Flechais, and A.W. Roscoe. Usability and security of out-of- band channels in secure device pairing protocols. In Fifth Symposium on Usable Privacy and Security (SOUPS 2009), page 11. ACM, 2009. [72] Ghassan O. Karame, Elli Androulaki, and Srdjan Capkun. Double-spending fast payments in Bitcoin. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), pages 906–917. ACM, 2012. [73] Jake Kendall, Johanna Godoy, Robert Tortora, and Jan Sonnenschein. Payments and money transfer behavior of Sub-Saharan Africans. 2012. [74] Joshua A. Kroll, Ian C. Davey, and Edward W. Felten. The economics of Bitcoin mining, or Bitcoin in the presence of adversaries. In Proceedings of WEIS, volume 2013, 2013. [75] Leslie Lamport. The part-time parliament. ACM Transactions on Computer Systems (TOCS), pages 133–169, 1998. [76] Ben Laurie. Decentralised currencies are probably impossible–but let’s at least make them efficient. 2011. URL https://www.links.org/files/ decentralised-currencies.pdf. [77] Ben Laurie. An efficient distributed currency. 2011. URL https://www.links.org/ files/distributed-currency.pdf. [78] Raph Levien. Attack-resistant trust metrics. In Computing with Social Trust, pages 121–132. Springer, 2009. [79] James R. Lewis and Jeff Sauro. The factor structure of the system usability scale. In International conference on human centered design, pages 94–103. Springer, 2009. 135 [80] Ritch Macefield. How to specify the participant group size for usability studies: a practitioner’s guide. Journal of Usability Studies, 5(1):34–45, 2009. [81] G. Maxwell. Merkle tree of open transactions for lite mode?, 2011. URL https: //bitcointalk.org/index.php?topic=21995.0. [82] Timothy May. The crypto anarchist manifesto. Effector On line, 1992. [83] Timothy May. The cyphernomicon: cypherpunks FAQ and more. Version 0.666, September, 1994. [84] Timothy May. Crypto anarchy and virtual communities. Crypto Anarchy, Cyberstates, and Pirate Utopias, pages 65–79, 2001. [85] Sarah Meiklejohn, Marjori Pomarole, Grant Jordan, Kirill Levchenko, Damon McCoy, Geoffrey M. Voelker, and Stefan Savage. A fistful of Bitcoins: characterizing payments among men with no names. In Proceedings of the 2013 Conference on Internet Measurement Conference (IMC), pages 127–140. ACM, 2013. [86] Cynthia Merritt. Mobile money transfer services: the next phase in the evolution of person-to-person payments. Journal of Payments Strategy & Systems, 5(2):143–160, 2011. [87] Ian Miers, Christina Garman, Matthew Green, and Aviel D. Rubin. Zerocoin: Anonymous distributed e-cash from Bitcoin. In IEEE Symposium on Security and Privacy (SP), pages 397–411. IEEE, 2013. [88] Ulf Mo¨ller, Lance Cottrell, Peter Palfrader, and Len Sassaman. Mixmaster protocol– version 2. 2003. URL https://gnunet.org/mixmaster-spec. [89] Tyler Moore and Nicolas Christin. Beware the middleman: Empirical analysis of Bitcoin-exchange risk. In International Conference on Financial Cryptography and Data Security, pages 25–33. Springer, 2013. [90] Tim Moreton and Andrew Twigg. Trading in trust, tokens, and stamps. In Proceed- ings of the First Workshop on Economics of Peer-to-peer Systems, 2003. [91] Malte Mo¨ser and Rainer Bo¨hme. Trends, tips, tolls: A longitudinal study of Bitcoin transaction fees. In Second Workshop on Bitcoin Research, affiliated with the International Conference on Financial Cryptography and Data Security, Puerto Rico, 2015. [92] Malte Mo¨ser, Rainer Bo¨hme, and Dominic Breuker. Towards risk scoring of Bitcoin transactions. In International Conference on Financial Cryptography and Data Security, pages 16–32. Springer, 2014. 136 [93] Steven J. Murdoch, Saar Drimer, Ross Anderson, and Mike Bond. Chip and PIN is broken. In IEEE Symposium on Security and Privacy (SP), pages 433–446. IEEE, 2010. [94] Steven J. Murdoch, Mike Bond, and Ross Anderson. How certification systems fail: Lessons from the ware report. IEEE Security & Privacy, 10(6):40–44, 2012. [95] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system. 2008. [96] Roger M. Needham and Michael D. Schroeder. Using encryption for authentication in large networks of computers. Communications of the ACM, 21(12):993–999, 1978. [97] Jakob Nielsen and Thomas K. Landauer. A mathematical model of the finding of usability problems. In Proceedings of the INTERACT’93 and CHI’93 conference on Human factors in computing systems, pages 206–213. ACM, 1993. [98] Julius Okello. Effectiveness and challenges of using mobile money service in the implementation of the social assistance grants for empowerment programme in Uganda. IMTFI Final Report, 2015. [99] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999. [100] Jean Ray. Malpertuis, volume 142. Marabout, 1962. [101] Fergal Reid and Martin Harrigan. An analysis of anonymity in the Bitcoin system. Springer, 2013. [102] Ronald L. Rivest. Electronic lottery tickets as micropayments. In International Conference on Financial Cryptography and Data Security, pages 307–314. Springer, 1997. [103] Ronald L. Rivest and Butler Lampson. SDSI–a simple distributed security infras- tructure. Crypto, 1996. [104] Ronald L. Rivest and Adi Shamir. PayWord and MicroMint: Two simple micro- payment schemes. In International workshop on security protocols, pages 69–87. Springer, 1996. [105] Dorit Ron and Adi Shamir. Quantitative analysis of the full Bitcoin transaction graph. In International Conference on Financial Cryptography and Data Security, pages 6–24. Springer, 2013. 137 [106] Ayelet Sapirshtein, Yonatan Sompolinsky, and Aviv Zohar. Optimal selfish mining strategies in Bitcoin. In International Conference on Financial Cryptography and Data Security, pages 515–532. Springer, 2016. [107] Eli Ben Sasson, Alessandro Chiesa, Christina Garman, Matthew Green, Ian Miers, Eran Tromer, and Madars Virza. Zerocash: Decentralized anonymous payments from Bitcoin. In IEEE Symposium on Security and Privacy (SP), pages 459–474. IEEE, 2014. [108] Jeff Sauro. A practical guide to the system usability scale: Background, benchmarks & best practices. Measuring Usability, 2011. [109] Jeff Sauro. How to measure learnability. 2013. URL http://www.measuringu.com/ blog/measure-learnability.php. [110] Jeff Sauro and James R. Lewis. When designing usability questionnaires, does it hurt to be positive? In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 2215–2224. ACM, 2011. [111] Jeff Sauro and James R. Lewis. Quantifying the user experience: Practical statistics for user research. Morgan Kaufmann, 2016. [112] Clay Shirky. The case against micropayments. OpenP2P O’Reilly, 2000. [113] Clay Shirky. Fame vs fortune: Micropayments and free content. First published Sep, 5:2003, 2003. [114] Clay Shirky. Why small payments won’t save publishers. 2009. URL http://www. shirky.com/weblog/2009/02/why-small-payments-wont-save-publishers. [115] Manny Trillo. Stress test prepares VisaNet for the most wonderful time of the year, 2013. URL http://www.visa.com/blogarchives/us/2013/10/10/ stress-test-prepares-visanet-for-the-most-wonderful-time-of-the-year/ index.html. [116] Jordi van den Breekel, Diego A. Ortiz-Yepes, Erik Poll, and Joeri de Ruiter. EMV in a nutshell. Technical report, 2016. [117] Marie Vasek, Micah Thornton, and Tyler Moore. Empirical analysis of denial-of- service attacks in the Bitcoin ecosystem. In International Conference on Financial Cryptography and Data Security, pages 57–71. Springer, 2014. [118] Robert A. Virzi. Refining the test phase of usability evaluation: How many subjects is enough? Human factors, 34(4):457–468, 1992. 138 [119] Visa. Visa service disruption. 2018. URL https://www.visaeurope.com/newsroom/ news/visa-service-disruption. [120] N. Walubengo. The mobile money apocalypse; what would hap- pen to your money? 2012. URL http://pesatalk.com/ the-mobile-money-apocalypse-what-would-happen-to-your-money. [121] Qiyan Wang, Zi Lin, Nikita Borisov, and Nicholas Hopper. rBridge: User reputation based Tor bridge distribution with privacy preservation. In Network and Distributed System Security Symposium (NDSS), 2013. [122] David Wheeler. Transactions using bets. In International Workshop on Security Protocols, pages 89–92. Springer, 1997. [123] Paul Youn, Ben Adida, Mike Bond, Jolyon Clulow, Jonathan Herzog, Amerson Lin, Ronald L. Rivest, and Ross Anderson. Robbing the bank with a theorem prover. Technical report, University of Cambridge, Computer Laboratory, 2005. URL https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-644.pdf. [124] Jamie M. Zimmerman and Silvia Baur. Understanding how consumer risks in digital social payments can erode their financial inclusion potential. Technical report, World Bank, Washington DC, 2016. URL https://openknowledge.worldbank.org/ bitstream/handle/10986/24568/Understanding000inclusion0potential.pdf. 139