Strategies to Roam Enzymatic Fitness Landscapes in Ultrahigh-Throughput
Ultrahigh-throughput screening of enzymes enables us to explore large regions of sequence space, the theoretical universe of all possible enzyme variants for a given protein function. Each theoretical enzyme has a fitness value: highly active variants form peaks and inactive enzymes form valleys, which together shape the topology of a fitness landscape. Enzyme function is as yet virtually impossible to predict from sequence alone, and finding fitness peaks often relies on random, iterative cycles of sampling and selecting variants of interest from sequence space, a technique better known as directed evolution. Sequence space is vast (20^N for a protein of N residues), and the probability of finding sparse peaks of high fitness, that is, enzyme variants that catalyse a biotransformation of interest with high efficiency, depends strongly on the throughput of the screening approach. Increasing the throughput of enzyme screening has been greatly aided by the development of water-in-oil emulsion droplet technology, which miniaturises the reaction vessel to a single droplet of ~pL volume and thus enables roughly 1 billion parallel reaction vessels per mL, each of which assesses a unique enzyme for product formation. Droplet technology has, however, been limited by its detection methods, which often rely on the fluorescence of the enzymatic product. Protein kinases, which catalyse the post-translational phosphorylation of protein side chains in signalling cascades, are an example of a protein class that has remained elusive to droplet-based ultrahigh-throughput screening due to a lack of suitable fluorescent readouts. Likewise, most small molecules of interest are not fluorescent, and droplet-based evolution of enzymes that produce value-added molecules would greatly benefit from a generalisable, label-free method for detecting non-fluorescent small molecules in droplets. In this dissertation, I outline a bead-based droplet assay used to probe human protein kinases in ultrahigh-throughput.
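To give a feel for the numbers quoted above, the following is a minimal sketch of the arithmetic behind the size of sequence space (20^N) and the number of droplets per mL. The function names, the exact 1 pL droplet volume and the choice of N = 10 randomised positions are illustrative assumptions, not values taken from this work.

```python
# Illustrative arithmetic only: names and example values are assumptions.

PL_PER_ML = 1e9  # 1 mL = 10^9 pL


def sequence_space_size(n_positions: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences when n_positions residues can each be
    any of alphabet_size amino acids (20^N for the canonical set)."""
    return alphabet_size ** n_positions


def droplets_per_ml(droplet_volume_pl: float) -> float:
    """Number of droplets of the given volume (in pL) that fit in 1 mL."""
    return PL_PER_ML / droplet_volume_pl


def fraction_sampled(n_screened: float, n_positions: int) -> float:
    """Upper bound on the fraction of sequence space covered when
    n_screened variants are tested, assuming every variant is unique."""
    return min(1.0, n_screened / sequence_space_size(n_positions))


if __name__ == "__main__":
    n = 10  # hypothetical number of randomised positions
    print(f"Sequence space for N={n}: {sequence_space_size(n):.3e}")
    print(f"Droplets per mL at 1 pL: {droplets_per_ml(1.0):.1e}")
    print(f"Best-case coverage with 1e9 variants: {fraction_sampled(1e9, n):.2e}")
```

Even under the optimistic assumption that every droplet holds a unique variant, a billion droplets would cover only about one hundred-thousandth of the space spanned by ten fully randomised positions, which is why throughput weighs so heavily on the chance of finding sparse fitness peaks.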
The archetypal MKK-ERK cascade is essential to all complex life and functions to communicate decisions on cell fate. Randomising multiple residues of MKK1 simultaneously and testing >500,000 MKK1 variants for phosphorylation of the cognate partner ERK2 gave unique insight into the multiple solutions, or fitness peaks, that exist for catalysing phosphate transfer in this protein signalling family. The randomised residues show a high degree of connectivity: substitutions at one position shape the sequence preference at other positions through non-linear magnitude epistasis, establishing the importance of epistatic effects in human protein kinase pathways. In an effort to create a label-free screening approach that is more generalisable to other protein classes, an aptamer-based sensor for tryptophan was created. Tryptophan is not detectable in droplets by conventional fluorescence-activated droplet sorting or absorbance-activated droplet sorting, and as such exemplifies how aptamers can serve to detect the enzymatic formation of an otherwise non-fluorescent molecule in droplets. Tryptophan synthase, which condenses L-serine and indole to create tryptophan, was evolved using the sensor, probing the vastness of fitness space for a C-C bond-forming reaction in ultrahigh-throughput to discover biocatalysts of industrial relevance.