Amyloidogenic proteins in the SARS-CoV and SARS-CoV-2 proteomes

The phenomenon of protein aggregation is widespread and associated with a wide range of human diseases. Our knowledge of the aggregation behaviour of viral proteins, however, is still rather limited. Here, we investigated the distribution of aggregation-prone regions in the SARS-CoV and SARS-CoV-2 proteomes. An initial analysis using a panel of sequence-based predictors suggested the presence of multiple aggregation-prone regions in these proteomes, and revealed an enhanced aggregation propensity in some SARS-CoV-2 proteins. We then studied the in vitro aggregation of predicted aggregation-prone regions in the SARS-CoV-2 proteome, including the signal sequence peptide and fusion peptide 1 of the spike protein, a peptide from the NSP6 protein (NSP6-p), the ORF10 protein, and the NSP11 protein. Our results show that these peptides and proteins form aggregates via a nucleation-dependent mechanism. Moreover, we demonstrated that the aggregates of NSP11 are toxic to mammalian cell cultures. These findings provide evidence for the aggregation of proteins in the SARS-CoV-2 proteome.

Highlights
The SARS-CoV and SARS-CoV-2 proteomes contain proteins harbouring aggregation-prone regions.
The SARS-CoV-2 proteome tends to be more aggregation prone than the SARS-CoV proteome.
Accessory proteins tend to be more aggregation prone than structural and non-structural proteins.
The proteins ORF10 and NSP11 of SARS-CoV-2 can form amyloid aggregates.
The signal sequence and fusion peptide 1 of spike can form amyloid aggregates.

viral proteins may help the virus in hijacking the replication machinery of a host cell [3,4], (ii) the aberrant aggregation of viral proteins may represent an additional mechanism by which viruses damage the host cells [5,6], (iii) the viral particles can trigger the misfolding and aggregation of host proteins, a process that for some viruses has been linked with the onset of Alzheimer's disease [7].
Some viral proteins are known to form amyloid aggregates that are implicated in viral pathogenesis. A remarkable example is that of the protein PB1 of the influenza A virus, which forms one of the three subunits of the viral polymerase. During the early stages of viral infection, PB1 accumulates in its monomeric form, but then converts into an amyloid-like form at a later stage of infection [8,9]. As a disordered protein, PB1 switches from a random conformation to an α-helical or

which can be divided into three classes: structural, accessory, and non-structural proteins (Figure 1) [17]. Our results identify non-structural proteins (NSPs), which play a crucial role during the initial and transitional phases of the viral life cycle in the host cell, as particularly aggregation prone. We also compared the aggregation propensities of the SARS-CoV-2 proteins with those of the SARS-CoV proteins. We then focused our analysis on specific protein regions of SARS-CoV-2 by investigating the signal sequence peptide and fusion peptide 1 of the spike protein, full-length ORF10, NSP6-p (residues 91-112 of NSP6), and NSP11. All these protein regions exhibited APRs in our in silico analysis. To test these predictions, we carried out experimental investigations of these protein regions using fluorescence and spectroscopy methods. We further employed atomic force microscopy (AFM) to visualize the morphology of the resultant aggregates. In addition, we investigated the cytotoxicity of NSP11 aggregates on different mammalian cell lines.

Abundance of amyloidogenic regions in the SARS-CoV and SARS-CoV-2 proteomes
The tendency to self-assemble into amyloid structures is an intrinsic property of proteins and depends on the presence of APRs within their amino acid sequences [18,19]. This tendency is in competition with that of self-assembling to form functional complexes [20]. To investigate this competition, we analyzed the tendency of SARS-CoV and SARS-CoV-2 proteins to form amyloid aggregates using different computational prediction methods. We employed a combination of three individual predictors (FISH Amyloid, AGGRESCAN, and FoldAmyloid) and a meta-predictor, MetAmyl, to analyse the presence of aggregation-prone regions. Further, the hydrophobic residues involved in the formation of aggregates are the main constituents of the insoluble bodies in and around cells [21,22]. Therefore, we also used an additional server, CamSol, to predict the hydrophobic and hydrophilic regions in the proteins of both viruses. APRs predicted by the different servers are abundant in both proteomes, and all proteins are found to contain at least one APR (Tables S1-S6).
To compare the amyloidogenic propensity of the SARS-CoV-2 and SARS-CoV proteomes, we calculated the mean predicted percentage amyloidogenic propensity (PPAP) from the aggregation-prone regions obtained from four servers (MetAmyl, FISH Amyloid, AGGRESCAN, and FoldAmyloid) for both viruses (Figure 2a,b). Numerous proteins, particularly the accessory proteins of SARS-CoV-2 (Figure 2a), were observed to be more amyloidogenic than their counterparts in SARS-CoV (Figure 2b).
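As a rough illustration of how a per-protein value such as the mean PPAP could be computed, the sketch below assumes that the PPAP for one predictor is the percentage of residues falling inside its predicted APRs, and that the mean is a simple average over predictors. This definition and the function names are assumptions made for illustration and may differ from the calculation used for Figure 2. The example ranges are the ORF10 APRs quoted in this paper for AGGRESCAN and FISH Amyloid (38-residue protein); only two of the four predictors are shown.

```python
def percent_in_aprs(aprs, length):
    """Percentage of residues covered by APRs.

    aprs   -- list of 1-based, inclusive (start, end) residue ranges
    length -- number of residues in the protein
    Overlapping ranges are counted once.
    """
    covered = set()
    for start, end in aprs:
        covered.update(range(start, end + 1))
    # Discard positions that fall outside the sequence.
    covered = {pos for pos in covered if 1 <= pos <= length}
    return 100.0 * len(covered) / length


def mean_ppap(aprs_by_predictor, length):
    """Average the per-predictor percentages (assumed definition of mean PPAP)."""
    values = [percent_in_aprs(aprs, length) for aprs in aprs_by_predictor.values()]
    return sum(values) / len(values)


# ORF10 APRs as reported in the text for two of the predictors.
orf10_aprs = {
    "AGGRESCAN": [(1, 9), (11, 20), (25, 38)],
    "FISH Amyloid": [(23, 27), (31, 35)],
}
orf10_mean = mean_ppap(orf10_aprs, length=38)
```

Representing APRs as residue sets keeps overlapping predictions from being counted twice.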
In SARS-CoV-2, among the structural proteins, FoldAmyloid predicts the membrane and envelope proteins to be more amyloidogenic than the nucleocapsid and spike proteins (Figure 2c and Table S1), and the accessory proteins to contain a greater abundance of APRs (Table S2). Except for ORF9b, all of these proteins show a FoldAmyloid profile value above the threshold (Figure 2d). The functions of these proteins are less well characterized, though they may be involved in enhancing virulence [17]. As shown in Table S3, the NSPs contain several APRs. These 16 NSPs perform diverse roles, such as evading the host immune system, protection from host defence mechanisms, virus replication, and spreading of the infection [17]. In the FoldAmyloid analysis, after plotting the average profile value for each protein (Figure 2e), NSP4 and NSP6 show profile values above the cut-off, which indicates a highly amyloidogenic nature.
Furthermore, to gain insights into the possible cleavage of APRs, 20S proteasome cleavage sites within the entire SARS-CoV-2 proteome were predicted with the NetChop 3.1 server (Tables S7-S9). The reason for identifying these sites, in the context of our study, is twofold. First, predicted sites residing inside APRs suggest that, owing to aggregation, the proteasome system may not be able to cleave the viral proteins successfully. Second, the proteasome could cleave the viral proteins and release amyloidogenic regions inside the host cell. In the case of accessory proteins, we found many cleavage sites located in APRs (Tables S7-S9).
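The overlap between predicted cleavage sites and APRs can be sketched as a simple interval check. The function name and the example positions below are hypothetical and chosen only for illustration; they are not values taken from Tables S7-S9, and the sketch does not call NetChop itself.

```python
def sites_in_aprs(cleavage_sites, aprs):
    """Return the cleavage-site positions that fall inside any APR.

    cleavage_sites -- iterable of 1-based residue positions
    aprs           -- list of 1-based, inclusive (start, end) ranges
    """
    return [
        site
        for site in cleavage_sites
        if any(start <= site <= end for start, end in aprs)
    ]


# Hypothetical predicted sites and APR ranges for one protein.
example_sites = [3, 10, 26, 40]
example_aprs = [(1, 9), (11, 20), (25, 38)]
inside = sites_in_aprs(example_sites, example_aprs)
```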

Aggregation-prone regions in structural proteins
Four structural proteins of SARS-CoV-2 participate in the virion assembly and packaging processes and in providing structure to the virus [23]. We analyzed the APRs of these structural proteins (Table S1). One of these proteins, spike, is a heavily glycosylated transmembrane protein whose N-terminal S1 domain harbours receptor binding sites for the host cell, and whose C-terminal S2 domain mediates the fusion between virion and host cells [24]. It has sev- predicted by all the four servers to be amyloidogenic. We note that a recent cryo-EM study revealed a hinge motion of S1, which takes the virus from its active state to its inactive state [25].
To investigate a possible relationship between Covid-19 and Alzheimer's disease [7], we performed a multiple sequence alignment (MSA) of the Aβ42 peptide with spike (Figure S1a). Interestingly, we observed that Aβ42 contains an amyloidogenic region (residues 26-31, SNKGAI) similar to one detected in spike (residues 968-973, SNFGAI). By inspecting frames in the open state (Figure S1b), we observe that this APR can be exposed in the active state, which suggests a possible mechanism by which the virus could influence the aggregation of Aβ42.
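The similarity between the two hexapeptides can be checked with a position-by-position comparison. This is only a minimal sketch of an identity count between two equal-length motifs; it is not the multiple sequence alignment procedure used for Figure S1a, and the function name is an assumption.

```python
def motif_identity(a, b):
    """Count identical positions between two equal-length motifs.

    Returns (number of matches, percent identity).
    """
    if len(a) != len(b):
        raise ValueError("motifs must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, 100.0 * matches / len(a)


# Abeta42 residues 26-31 versus spike residues 968-973, as quoted in the text.
matches, percent = motif_identity("SNKGAI", "SNFGAI")
```

The two motifs differ only at the third position (K in Aβ42, F in spike).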
The membrane protein M gives shape to the virus, promotes viral membrane curvature, and binds the nucleocapsid-RNA complex during virus packaging. Residues 20-38, 51-57, 60-71, 80-97, and 139-145 of M are the commonly predicted APRs. The envelope protein E is a transmembrane protein with an ion channel activity that facilitates the assembly and release of viral particles. The amyloid-forming propensity of a 9-residue stretch (VY-VYSRVK) of E has been reported previously for SARS-CoV [14]. The nucleocapsid protein N is the proteinaceous part of the viral nucleocapsid. It interacts with viral RNA and helps its packaging into the virion. It also interacts with other viral proteins such as the membrane protein and NSP3. According to our analysis, AGGRESCAN detected a total of 18% amyloidogenic regions in SARS-CoV-2 N and only 16% in SARS-CoV N. Similarly, FoldAmyloid predicted 11% amyloidogenic regions in SARS-CoV N, ~3% less than in SARS-CoV-2 N, which contains a total of 14% amyloidogenic regions.

Aggregation-prone regions in accessory proteins
The coronavirus genome codes for proteins termed accessory, which are multifunctional proteins that play an important role in modulating the host response to virus infection, such as down-regulation of interferon pathways, the release of proinflammatory cytokines and chemokines, and the induction of autophagy. According to the predictors used in this study, all accessory proteins have multiple aggregation-prone regions.

Aggregation-prone regions in non-structural proteins
SARS-CoV-2 has 16 non-structural proteins, and we identified APRs in all of them ( [26]. NSP5 is a serine-like protease that catalyzes the remaining eleven cleavage events of the Orf1ab polyprotein. Our data suggest that there are several short stretches of APRs in NSP5. NSP12 is an RNA-dependent RNA polymerase, and NSP7 and NSP8 function as its processivity clamps. NSP12 also has very short stretches of predicted APRs. NSP10 is a cofactor for NSP16, which protects viral RNA from host antiviral measures. NSP13 is an RNA helicase in which residues 224-228, 292-298, 355-360, 543-547, and 571-575 were the most commonly predicted APRs. NSP14 is a methyltransferase that adds a 5' cap to viral RNA and is involved in proofreading of the viral genome by virtue of its 3'-5' exonuclease activity. NSP15 is an endoribonuclease that plays a defensive role against host attacks.

Experimental analysis of SARS-CoV-2 proteins and protein regions
After the computational prediction of APRs in the SARS-CoV and SARS-CoV-2 proteomes, we investigated the in vitro aggregation behaviour of various proteins and peptides. For this purpose, we chose pH 7.4 and a temperature of 37 °C, and traced the aggregation process using the fluorescent dye thioflavin T (ThT), which interacts with amyloid fibrils and gives a maximum emission peak at ~490 nm upon binding to β-sheets in amyloid fibrils [27,28]. As protein aggregation has been shown to occur via a nucleation-polymerization mechanism [29,30], we studied this reaction using ThT fluorescence (λmax at 490 nm) in the presence of a fixed volume of incubated samples (25 µM). Before initiation of the aggregation reaction (at 0 hours of incubation), the peptides were treated with hexafluoroisopropanol (HFIP) to start from monomeric conformations.
We then employed AFM to gain insights into the morphological features of the aggregates. In this way, we studied the in vitro aggregation potential of the spike signal peptide and fusion peptide 1, the ORF10 protein, NSP6-p, and the NSP11 protein of SARS-CoV-2. Additionally, residues 131-180 of the NSP1 protein (NSP1-p) were not predicted to contain any APR by any of the software used. The CamSol results also suggest a highly hydrophilic character for these residues. Therefore, we used NSP1-p as a negative control in this study. In accordance with our bioinformatic analysis, we found that the NSP1-p peptide does not form amyloid-like aggregates and does not show any change in the ThT assay.

Spike signal peptide
Spike plays a key role in the receptor recognition and cell membrane fusion process. In its mature form, spike contains four regions (a signal sequence, ectodomain, transmembrane domain, and endodomain) and is further divided into the S1 and S2 subunits. The 12-residue N-terminal signal sequence (SP) directs spike to its destination in the viral membrane [31]. CamSol calculated SP to be poorly soluble, and all aggregation prediction servers identified it as an APR (Table S1).

Spike fusion peptide 1
Spike contains a fusion peptide (FP1) of 15-20 residues in the S2 subunit that helps the virus penetrate the host cell membrane [31,32]. All the predictors used in this study identified APRs in this region. We then analysed the aggregation of FP1 in vitro and found an increase in ThT fluorescence intensity (Figure 4a). After a 3.5-hour incubation, ThT fluorescence had increased ~6-fold in comparison with a freshly dissolved peptide sample. As monitored by ThT fluorescence at 490 nm, the aggregation of FP1 reached saturation in about 3 hours with a T1/2 of about 1 hour (Figure 4b). Tapping-mode AFM, used to investigate the morphological features of the aggregates in this study, revealed the presence of numerous entangled filaments (Figure 4c,d). These differently sized filamentous FP1 aggregates, obtained after 96 hours, have a height distribution peaking at ~6 nm (Figure 4e). For FP1, the ThT fluorescence readings were maintained at saturation while preparing the samples for AFM imaging.
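A half-time such as T1/2 is commonly read off a sigmoidal ThT curve. The sketch below is a hedged illustration: it assumes a Boltzmann-type sigmoid with an initial plateau `a1`, a final plateau `a2`, a midpoint `t_half`, and a steepness parameter `dt`, and it estimates the half-time from sampled data by linear interpolation at the midpoint of the first and last readings. The functional form, the parameter names, and the synthetic data are assumptions for illustration, not the fitting procedure or data of this study.

```python
import math


def boltzmann(t, a1, a2, t_half, dt):
    """Sigmoid rising from a1 (initial) to a2 (final) with midpoint at t_half."""
    return a2 + (a1 - a2) / (1.0 + math.exp((t - t_half) / dt))


def half_time(times, signal):
    """Time at which the signal first crosses the midpoint between its
    first and last readings, found by linear interpolation."""
    mid = (signal[0] + signal[-1]) / 2.0
    for i in range(1, len(times)):
        lo, hi = signal[i - 1], signal[i]
        if lo < mid <= hi:
            frac = (mid - lo) / (hi - lo)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("signal never crosses its midpoint")


# Synthetic curve (hypothetical values): ~6-fold rise with a midpoint at 1 hour.
times = [0.1 * i for i in range(41)]  # 0 to 4 hours
signal = [boltzmann(t, a1=1.0, a2=6.0, t_half=1.0, dt=0.2) for t in times]
estimated_t_half = half_time(times, signal)
```

The interpolation estimate depends on the first and last readings lying on the plateaus; a least-squares fit of the sigmoid would be more robust for noisy data.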

ORF10 protein (full-length)
The SARS-CoV-2 ORF10 protein is predicted to be translated into a 38-residue protein that does not have significant homology with any known protein. Although the evidence for the presence of ORF10 in SARS-CoV-2 infected cells is limited [33,34], it has been reported to have a high dN/dS ratio (nonsynonymous over synonymous substitution rate) of 3.82 and to be positively evolving [35]. In our analysis, it was predicted to contain multiple APRs. AGGRESCAN located residues 1-9, 11-20, and 25-38 as APRs, while FISH Amyloid detected only residues 23-27 and 31-35 as APRs (see Table S2 for all APRs).
In in vitro experiments, ORF10 showed a ~7-fold increase in ThT fluorescence after 11 days of constant stirring at 1000 rpm and incubation at 37 °C (Figure 5a). It showed a constant increase in ThT fluorescence (at 490 nm) up to 150 hours. The aggregation kinetics of ORF10 showed no detectable lag phase, proceeding directly to a prolonged growth phase. The T1/2 of this process is calculated to be ~30 hours (Figure 5b). In contrast with the fibrillar aggregates of the other peptides studied in this report, ORF10 formed amorphous aggregates, as visualized by AFM (Figure 5c-e). The ThT fluorescence of the ORF10 protein aggregates was maintained at saturation while preparing the samples for AFM imaging.

NSP6-p
The SARS-CoV-2 NSP6 protein is an essential antagonist of the host immune system. It antagonizes IFN-I signaling by blocking TANK binding kinase 1 (TBK1) to suppress interferon regulatory factor 3 (IRF3) phosphorylation more efficiently than the SARS-CoV and MERS-CoV NSP6 proteins [36]. Although NSP6 accommodates multiple transmembrane regions, the region of residues 91-112 is of particular importance since it lies outside the membrane and can thereby interact with host proteins [37]. This region exhibited APRs with all predictors (Table S3). The intrinsic solubility of this region is also calculated to be low by the CamSol server. As an experimental validation, we studied the aggregation of the NSP6-p peptide (residues 91-112 of NSP6) in buffered conditions. Its aggregation is detected by ThT fluorescence at ~40 hours (Figure 6a). A kinetic analysis revealed a ~15-hour lag phase followed by an exponential phase that attains the final plateau phase (Figure 6b).

NSP11 protein (full-length)
NSP11 of SARS-CoV-2 is a 13-amino-acid peptide cleaved from polyprotein 1a. The first 9 residues of NSP11 are similar to the first 9 residues of NSP12 (RdRp) [38]. To identify the amyloid-forming residues and the aggregation propensity of NSP11, we used four aggregation prediction servers. AGGRESCAN predicted the region 6-13, FISH Amyloid predicted 6-10, and MetAmyl predicted 8-13 in NSP11 as APRs (Table S3). ThT dye binds to NSP11 amyloid fibrils (192-hour incubated sample), and its fluorescence intensity increases compared with freshly dissolved NSP11 monomers (Figure 7a). Furthermore, according to a kinetic analysis of NSP11 aggregation, following a lag phase of ~45 hours, the process reached a plateau phase after 110 hours (Figure 7b) (Figure 7d,e).
We then monitored the changes in secondary structure during the aggregation of NSP11 by far-UV CD spectroscopy. The far-UV CD spectrum of monomeric NSP11 is typical of disordered proteins, as also reported in our previous study [39], with a strong negative band near 200 nm (Figure 7c). However, after incubating NSP11 at 37 °C in phosphate buffer (20 mM phosphate, 50 mM NaCl, pH 7.4) at 1000 rpm for 192 hours, we observed the appearance of a weak negative band, visible as a shoulder around 218 nm, suggesting the presence of β-sheet secondary structure elements in the aggregated NSP11 protein (Figure 7c). This result indicates that, upon aggregation, NSP11 converts from a disordered state to a conformation with considerable secondary structure.

Toxic effects of NSP11 aggregates on the viability of mammalian cells
Studies on the amyloid aggregation of viral proteins may reveal an additional way in which viruses can damage host cells. We therefore investigated whether β-sheet-rich amyloid fibrils of SARS-CoV-2 NSP11 could be cytotoxic to human cells. We performed the MTT assay, a colorimetric cell viability test, to assess the effect of the aggregates on the SH-SY5Y and HepG2 cell lines. The cells were treated separately with varying concentrations of NSP11 monomers (used as control) and amyloid aggregates (192 hours). No significant cell death was observed after 24 hours (Figure 8a,c). However, when the treatment was extended to 72 hours, cell death at the highest aggregate concentration was comparatively enhanced (Figure 8b,d). We also observed that the percent cell viability is reduced more in HepG2 cells than in SH-SY5Y cells, suggesting that the aggregates are comparatively more toxic to the liver cell line. HepG2 cell viability remained around 74%, while SH-SY5Y cell viability remained around 86%. These results thus suggest that NSP11 aggregates are cytotoxic at relatively high concentrations.
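Percent viability in an MTT assay is conventionally obtained by normalizing the absorbance of treated wells to that of untreated control wells. The sketch below assumes this conventional normalization with an optional blank subtraction; the formula, the function name, and the absorbance values are assumptions for illustration and are not taken from this study.

```python
def percent_viability(a_treated, a_control, a_blank=0.0):
    """Percent viability from MTT absorbance readings.

    a_treated -- absorbance of wells treated with monomers or aggregates
    a_control -- absorbance of untreated control wells
    a_blank   -- absorbance of cell-free blank wells (optional)
    """
    denominator = a_control - a_blank
    if denominator <= 0:
        raise ValueError("control absorbance must exceed the blank")
    return 100.0 * (a_treated - a_blank) / denominator


# Hypothetical readings that would correspond to ~74% and ~86% viability.
hepg2 = percent_viability(a_treated=0.74, a_control=1.00)
sh_sy5y = percent_viability(a_treated=0.86, a_control=1.00)
```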

Discussion and Conclusions
This study was inspired by a series of reports that have associated viral infections with protein misfolding diseases of the central nervous system (CNS), including Alzheimer's disease and related dementias [40]. The Ljungan virus was detected in the hippocampus of Alzheimer's disease patients [41], and HIV was found to affect the levels of Aβ42 in the cerebrospinal fluid [42]. An epidemiological study recently linked herpes infections with an increase in the risk of dementia [43], consistent with other reports in which HSV-2 was associated with a decline in temporal cognitive abilities of aged individuals [44] and HSV-1 was linked to Alzheimer's disease through the facilitation of the formation of amyloid-like structures in neural stem cells and bioengineered brain tissue cultures [45]. Furthermore, the H5N1 influenza virus was reported to induce acute neurological signs such as motor disturbances and mild encephalitis in animal models, resulting in CNS disorders of protein aggregation like Alzheimer's and Parkinson's diseases [46].
In coronaviruses, SARS-CoV has been reported to target mouse brains [47] and was isolated from brain tissues of a Covid-19 patient with severe CNS symptoms [48]. Another case study showed the development of polyneuropathy in SARS-infected patients after the first outbreak in 2004 [49]. It was then reported that SARS-CoV can enter the brain via the olfactory bulb and that intracranial inoculation can cause extensive neuronal infection leading to death [50].
Likewise, SARS-CoV-2 was associated with severe neurologic manifestations, including acute cerebrovascular diseases, skeletal muscle injury, impaired consciousness, and acute hemorrhagic necrotizing encephalopathy [51,52]. The presence of SARS-CoV-2 in the cerebrospinal fluid of patients has been linked with meningitis and meningoencephalitis [53,54], and neurons infected with SARS-CoV-2 displayed altered distribution of the tau protein [55].
AGGRESCAN is based on an aggregation-propensity scale for natural amino acids derived from in vivo experiments. It also assumes that short and specific sequences within the protein can regulate protein aggregation. It gives a hot spot area (HSA) score for residues susceptible to forming aggregates [56]. FoldAmyloid calculates the probability of backbone-backbone hydrogen bond formation and efficiently classifies amyloidogenic peptides. It identifies as amyloidogenic the residues scoring above 21.4, the threshold assumed by the server [57]. MetAmyl is a meta-predictor that combines the strengths of different individual predictors (PAFIG, SALSA, Waltz, and FoldAmyloid). It builds a logistic regression model and gives a score that is interpreted as the probability for a fragment to form an amyloid fibril [58]. FISH Amyloid is a machine learning prediction method based on the presence of a segment with the highest score for co-occurrence of residue pairs [59]. Additionally, CamSol was used to predict the hydrophilic and hydrophobic regions of proteins. It calculates the intrinsic solubility of proteins, which is inversely related to their aggregation propensity. CamSol assigns a value to each amino acid: values below -1 represent insoluble residues, and values above +1 represent soluble residues [60]. Predicted regions of five or more amino acids were considered potentially amyloidogenic. Furthermore, 20S proteasome cleavage sites across the SARS-CoV-2 proteome were mapped in silico with the NetChop 3.1 web server.
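The rule of keeping only predicted regions of five or more residues can be sketched as a run-length filter over a per-residue score profile. The sketch below is an illustration under stated assumptions: it takes a list of per-residue scores, a threshold (21.4 is the FoldAmyloid value quoted above), and a minimum length of five. The function name and the example profile are hypothetical, and the sketch does not reproduce the internal scoring of any of the servers.

```python
def call_aprs(scores, threshold, min_len=5):
    """Return 1-based, inclusive (start, end) ranges in which the score stays
    above the threshold for at least min_len consecutive residues."""
    regions = []
    start = None
    for i, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = i  # a new run begins here
        else:
            # The run (if any) ended at residue i - 1.
            if start is not None and i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(scores) - start + 1 >= min_len:
        regions.append((start, len(scores)))
    return regions


# Hypothetical profile: runs of 5, 4, and 6 residues above the threshold.
profile = [20.0] * 3 + [22.0] * 5 + [20.0] * 2 + [23.0] * 4 + [20.0] + [25.0] * 6
predicted = call_aprs(profile, threshold=21.4)
```

In this example the four-residue run is discarded because it is shorter than the minimum length.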

Preparation of the samples for the aggregation assays. Peptides were dissolved in 100% HFIP to remove pre-existing aggregates and left to evaporate at room temperature overnight, resulting in dry peptide films. These peptide films were then dissolved according to their hydrophobic character, in the solvent recommended by GenScript and Thermo Scientific, USA, before incubation for the aggregation experiments, as detailed in Table 1. Monomeric samples for the aggregation assays were then incubated at 37 °C with constant stirring (1000 rpm) on an Eppendorf ThermoMixer C.
where A1 indicates the initial fluorescence, A2 the final fluorescence,

Atomic force microscopy (AFM).
We obtained AFM images of aggregated fibrils using tapping-mode AFM (Dimension Icon from Bruker). We carried out the measurements by depositing a twenty-fold diluted aggregated solution on a freshly cleaved mica surface for each of the peptides. After incubating for 1 h, the surfaces were rinsed three times with deionized water, dried at room temperature overnight, and finally, images were recorded.

Data availability: All data are contained within the manuscript or as supporting information.
and computational analysis predictions. TB, KG, SKK, and PK analyzed data and wrote the manuscript.

Conflict of interests:
All the authors declare that there is no conflict of interests.
Smeyne, Highly pathogenic H5N1 influenza virus can enter the central nervous system and induce neuroinflammation and neurodegeneration, Proceedings of the National