Tuberculous meningitis has high mortality, linked to excessive inflammation. However, adjunctive anti-inflammatory corticosteroids reduce mortality by only 30%, suggesting that inflammatory pathophysiology causes only a subset of deaths. In Vietnam, the survival benefit of anti-inflammatory corticosteroids was most pronounced in patients with a C/T promoter variant in the leukotriene A_{4} hydrolase (

Tuberculous meningitis is a serious infection of the lining of the brain, which affects over 100,000 people a year. Without treatment, it is always fatal; even with proper antibiotics, about a quarter of patients do not survive and many survivors have permanent brain damage. Overactive inflammation is thought to contribute to this process. Corticosteroid drugs, which dampen the inflammatory process, are therefore often used during treatment. However, they reduce mortality by only 30%, suggesting that only some people benefit from them.

Two recent studies have linked the genetic makeup of individuals who have tuberculous meningitis to how they respond to corticosteroids. There were, in particular, differences in the LTA4H gene that codes for an inflammation-causing protein. According to these results, only individuals carrying high-inflammation versions of the LTA4H gene would benefit from the treatment. Yet a third study did not find any effect of the genetic background of patients.

All three papers used frequentist statistics to draw their conclusions, only examining the percentage of people who survived in each group. Yet, this type of analysis can miss important details. It also does not work well when the number of patients is small, or when the effectiveness of a drug varies during the course of an illness.

Another method, called Bayesian statistics, can perform better under these limitations. In particular, it takes into account the probability of an event based on prior knowledge – for instance, that the risk of dying varies smoothly with time.

Here, Whitworth et al. used Bayesian statistics to reanalyse the data from these studies, demonstrating that death rates were correlated with the type of LTA4H gene carried by patients. In particular, corticosteroid treatment worked best for people with the high inflammation versions of the gene. However, regardless of genetic background, corticosteroids were not effective if patients were extremely sick before being treated.

The work by Whitworth et al. demonstrates the importance of using Bayesian statistics to examine the effectiveness of medical treatments. It could help to design better protocols for tuberculous meningitis treatment, tailored to the genetic makeup of patients.

Tuberculous meningitis (TBM) is the most severe form of tuberculosis. Despite effective antimicrobial therapy, it results in 20–25% mortality in HIV-negative individuals and ~40% mortality in HIV-positive individuals (

To further these findings, two new studies of the association of

Bayesian posterior probabilities comparing the two cohorts are shown (probability that mean of starred group is higher, ** > 0.99, *** > 0.999, all other comparisons, not significant). See also Figure S1 for probability differences for each GCS.

 | Vietnam | Indonesia |
---|---|---|
Patients (n) | 439 | 376 |

Both studies used, as the primary metric of significance testing, Cox regression modelling, an approach that assumes that the ratio of hazard rates between groups is constant throughout the observed period (

Contrast of ‘95% significant’ in Bayesian and frequentist paradigms (

Bayes: ‘A is significantly greater than B’ = Posterior probability that A greater than B is at least 0.95.

Frequentist: ‘A is significantly greater than B’ = For any circumstance where A is at most B, the probability of getting data in this critical region, as we did, was at most 0.05.

Therefore:

We expect 1 in 20 of Bayesianly (95%) significant results to be truly negative and therefore false positives;

We expect up to 1 in 20 of truly negative results to be frequentistly significant (at the 95% level) and therefore false positives.

Therefore (assuming all positives are at 95% level):

In the frequentist paradigm, the expected number of false positive results is proportional to the number of comparisons done on true negatives;

In the Bayesian paradigm, the expected number of false positive results is proportional to the number of apparent positive results, and unaffected by any vast number of accompanying apparent negative results.
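The contrast above can be illustrated with a small simulation. The sketch below is not the model used in this paper: it assumes a toy setting in which the true effect is drawn from a standard Gaussian prior and a single Gaussian observation is made, and the function name `simulate_false_positives` is hypothetical. It estimates (a) the fraction of Bayesianly significant results that are truly negative and (b) the fraction of truly negative results that are frequentistly significant.

```python
import random
from statistics import NormalDist


def simulate_false_positives(n_experiments=100_000, seed=0):
    """Toy model: true effect theta ~ N(0, 1); one observation x ~ N(theta, 1)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    z95 = std_normal.inv_cdf(0.95)  # one-sided 5% critical value

    bayes_positives = bayes_false = 0
    true_negatives = freq_false = 0
    for _ in range(n_experiments):
        theta = rng.gauss(0.0, 1.0)
        x = rng.gauss(theta, 1.0)
        truly_negative = theta <= 0.0

        # Conjugate posterior: theta | x ~ N(x / 2, variance 1 / 2),
        # so P(theta > 0 | x) = Phi(x / sqrt(2)).
        posterior_prob_positive = std_normal.cdf(x / 2 ** 0.5)
        if posterior_prob_positive >= 0.95:
            bayes_positives += 1
            bayes_false += truly_negative

        # Frequentist one-sided z-test of "theta <= 0" at the 5% level.
        if truly_negative:
            true_negatives += 1
            freq_false += x > z95

    return bayes_false / bayes_positives, freq_false / true_negatives


bayes_rate, freq_rate = simulate_false_positives()
print(f"truly negative among Bayesian positives:  {bayes_rate:.3f}")
print(f"frequentist positives among true negatives: {freq_rate:.3f}")
```

Under these assumptions both fractions stay at or below 1 in 20, but they are fractions of different things: the Bayesian one is relative to the apparent positives, the frequentist one relative to the comparisons made on true negatives.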

Therefore, we used a Bayesian approach to analyse data from the two cohorts (

The severity grade-specific analyses, coupled with temporal analyses made possible by Bayesian methods, reveal that the

The age range of patients was similar in the Vietnam and Indonesia cohorts with Indonesia patients tending to be younger (

Frequency of GCS values, shown on the Y-axis as a percentage of the total cohort (n = 376 Indonesia, n = 439 Vietnam). Bayesian posterior probabilities of significant differences between Vietnam (VN) and Indonesia (IN) for mean GCS comprising Grade 2: VN > IN, P = 0.99996 (GCS 15, VN > IN, P = 0.99999; GCS 11–14, VN < IN, P ranging from 0.99985 (GCS 13) to 0.98 (GCS 14); rest of the values non-significant); for GCS comprising Grade 3: VN > IN, P = 0.01 (GCS 4, 0.98; GCS 9, 0.04; all others not significant).

TBM disease severity classification.

Because the Indonesia cohort was skewed towards more severe disease on presentation, one explanation for the lack of an

If

Definitions and usages.

Definitions.

Abbreviated and example usages.

p_{A}, was 30% absolute greater than the corresponding probability p_{B} for group B (that is, p_{A} = p_{B} + 0.3, and not p_{A} = p_{B} × 1.3).
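To make the distinction between an absolute and a relative difference concrete, the following minimal sketch uses a hypothetical baseline value of p_{B} = 0.5; the function names are illustrative only.

```python
def absolute_increase(p_b, delta):
    """p_A = p_B + delta (the convention used in this paper)."""
    return p_b + delta


def relative_increase(p_b, fraction):
    """p_A = p_B * (1 + fraction) (NOT the convention used in this paper)."""
    return p_b * (1.0 + fraction)


p_b = 0.5  # hypothetical probability for group B
p_a_absolute = absolute_increase(p_b, 0.3)  # about 0.80
p_a_relative = relative_increase(p_b, 0.3)  # about 0.65
print(round(p_a_absolute, 2), round(p_a_relative, 2))
```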


In the original Vietnam study, the TT genotype was associated with survival and the CC and CT genotypes had similarly increased mortality over TT (

Survival probability over all grades in Vietnam (

Bayesian posterior probabilities comparing the two cohorts (probability that starred group is higher, * > 0.95, ** > 0.99, *** > 0.999, all other comparisons, not significant). Comparisons within each cohort yielded no significant differences in


When we stratified the Vietnam patients by grade and

In sum, our analysis revealed that in Vietnam,

Bayesian analysis found that, in the overall Indonesia cohort, survival of the TT patient group was higher than non-TT though falling short of significance (maximum probability 0.92) (

Since Grade 2 patients constitute the bulk of the Indonesia cohort (75.5%), why was the

Thus, Bayesian analysis revealed an

Why might

We confirmed by Bayesian analysis that within each cohort, mortality risk increased with grade severity (

Mean posterior survival probability curves (coloured lines) overlaid by Kaplan-Meier survival plots (black lines) for Vietnam (

Comparison of survival curves (

In sum, these analyses show that the inherently higher mortality associated with more severe disease on presentation was sharply accentuated in Indonesia. Indonesia Grade 2 patients experienced a mortality risk similar to that of Vietnam Grade 3 patients, and Indonesia Grade 3 patients experienced far greater mortality. This higher mortality could potentially explain the finding that the

The finding in two independent cohorts in Vietnam collected from 2001 to 2004 and 2011–2015 that a common functional human variant was associated with responsiveness to adjunctive glucocorticoid treatment in TBM represented an ideal example of pharmacogenomics, coming as it did from mechanistic understanding of the underlying reason (

When we analysed the Vietnam cohort separated by grade severity, we saw that there was indeed a relationship between

Why was the

Panels A and B are comparable to Figure S2B of

Finally, in addition to providing guidance for TBM pharmacogenomic approaches, we hope that our analyses highlight the unique value of Bayesian methods for providing guidance for other complex diseases with difficult treatment decisions. The vital importance of defining the patient populations and subgroups which will benefit the most from specialised interventions and treatments is increasingly appreciated (

The anonymized patient cohort data used here has been previously described in detail (

Patient cohorts were compared overall as well as stratified into disease severity groups based on the TBM grade and by

Comparisons of age and time to median mortality were done using comparisons of arithmetic means of distributions, allowing Bayesian model choice between the following distribution families: Gaussian, log-Gaussian, Student, log-Student, Gamma, inverse-Gamma, Gamma-power. For age, log-Gaussian had overall the highest posterior probability; for time to median mortality, inverse-Gamma was preferred.

We thank R Troll for evaluating the two published cohort studies, realizing that Bayesian analysis could provide answers and initiating the collaboration with the Bayesian statistician RS.

No competing interests declared

Data curation, Formal analysis, Writing - original draft, Writing - review and editing

Software, Writing - review and editing

Data curation, Writing - review and editing

Data curation, Writing - review and editing

Data curation, Writing - review and editing

Data curation, Writing - review and editing

Writing - review and editing

Writing - review and editing

Methodology, Writing - review and editing

Methodology, Writing - review and editing

Methodology, Writing - review and editing

Data curation, Software, Formal analysis, Supervision, Methodology, Writing - original draft, Project administration, Writing - review and editing

Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Visualization, Writing - original draft, Project administration, Writing - review and editing

Excel spreadsheets with all patient data are included.

We give here a brief comparison of Bayesian methods with the more commonly encountered frequentist methods familiar to all scientists.

For context we consider a situation where we have collected some data _{0} (that _{1} (that

We describe first the primary characteristics of each method, and then consider the consequences for the further properties they display.

The Bayesian method answers the question:

What is the probability that

giving answers of the form

The probability given the data that

or (with the same meaning)

The posterior probability that

In order to answer this question, the Bayesian method requires as input (a) the data

Note that it is not possible to say ‘We know nothing about the unknown parameters’, and indeed this is never the case (e.g. if the unknown parameter is probability of survival at one year, then we know it is between 0 and 1, and unlikely to be either 0 or 1 exactly). However, one can almost always (and in particular in the present context) give distributions which leave wide open the range of values the parameters could take.

As a result, a Bayesian analysis, in addition to reporting the results, also reports the priors used, and what difference changing the priors makes to the results — in this case see sections 2.3 and 5 of Appendix 2 for these points. If priors are chosen to permit a broad range of parameter values (as in the present paper), it is usually the case that changing them to any other such prior has little effect on the results, so long as there is a reasonable quantity of data.
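The insensitivity to the choice among broad priors can be seen in a much simpler setting than the lifetime model of this paper. The sketch below assumes a single survival probability with a conjugate Beta prior and hypothetical counts of survivors and deaths; it compares the posterior means obtained under three broad priors for a small and a larger data set.

```python
def beta_posterior_mean(prior_a, prior_b, survivors, deaths):
    """Posterior mean of a survival probability under a Beta(prior_a, prior_b) prior."""
    return (prior_a + survivors) / (prior_a + prior_b + survivors + deaths)


# Three broad priors on a probability (illustrative choices, not those of the paper).
BROAD_PRIORS = {
    "uniform Beta(1, 1)": (1.0, 1.0),
    "Jeffreys Beta(0.5, 0.5)": (0.5, 0.5),
    "Beta(2, 2)": (2.0, 2.0),
}


def posterior_mean_spread(survivors, deaths):
    """Largest difference in posterior mean across the broad priors."""
    means = [
        beta_posterior_mean(a, b, survivors, deaths)
        for a, b in BROAD_PRIORS.values()
    ]
    return max(means) - min(means)


spread_small = posterior_mean_spread(survivors=3, deaths=1)     # hypothetical, 4 patients
spread_large = posterior_mean_spread(survivors=150, deaths=50)  # hypothetical, 200 patients
print(f"spread with   4 patients: {spread_small:.4f}")
print(f"spread with 200 patients: {spread_large:.4f}")
```

With only four patients the posterior means differ noticeably between priors; with two hundred they agree to within less than one percentage point.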

The frequentist method aims to (a) avoid having to consider what was known before collecting the data, and (b) control the probability of deciding H_{1} is true if in fact H_{0} is true (the ‘type I error rate’). To achieve this aim it answers a more complicated question; we give a version of it specific to a particular level of frequentist ‘confidence’:

Did the data

If the answer is Yes, this fact is usually encoded in a statement such as

We are 96% (frequentist-)confident that

or just

Here the

In doing this the frequentist method treats H_{0} differently from H_{1}, and thereby replaces the Bayesian’s prior with an asymmetry on the hypotheses that doesn’t correspond to any prior.

The frequentist’s avoidance of using a prior and emphasis on controlling the type I error rate together incur a significant cost.

First, it is an absolute requirement of the frequentist method that the critical region be specified

The Bayesian method has no such restriction: a data set may be analysed multiple times

However, both methods can potentially be biased by omitting to report findings which are not to the investigator’s liking. In this paper, the only important omission is that analysis was also done of the HIV-positive patients in Vietnam, and this was not reported in this paper. This was because, although it tended to confirm similar findings to those from the HIV-negative cohort, there were some points that were less clearcut and harder to interpret, and its publication therefore awaits completion of a clinical trial in which the benefit of dexamethasone is being examined for HIV-positive patients of all three genotypes by randomizing them to get dexamethasone or not (

Second, repeated analyses at different stages of data collection and/or analyses of multiple subsets of the patients

Third, frequentist analysis fails to respect common-sense real-world laws of inference. For example, it is often the case that a frequentist analysis will find that it is e.g. 95%-confident that

Fourth, frequentist analysis gives priority to controlling the type I error rate (i.e. the probability that it concludes H_{1} holds if in fact H_{0} holds), and almost never controls the probability that it concludes that H_{0} holds if in fact H_{1} holds. (Some frequentist analyses do report ‘power calculations’, which calculate the probability that it concludes that H_{0} holds if in fact H_{1} holds, e.g. the part of H_{1} where H_{0} and H_{1} identically.

We adopt the Bayesian paradigm. (For those who haven’t encountered Bayesian methods before, a comparison with frequentist hypothesis testing is given in Appendix 1.) Accordingly we define below a probabilistic generative model for patient lifetime

We then collect a data set of values

Since this distribution is hard to visualise, we draw samples of

An example resulting set of survival probability curves is shown in

The true survival probability curve is shown in green, with the Kaplan-Meier plot for the generated data in black. In blue are shown many samples from the posterior distribution on the survival probability curve, calculated from

The true hazard rate curve is shown in green. In blue are shown many samples from the posterior distribution on the hazard rate curve, calculated from

Now, given a set of such curves, we can calculate the mean posterior survival probability (resp. hazard rate) at each time point and plot the posterior mean survival probability (resp. hazard rate) against time. Similarly we can find the 2.5% and 97.5% centiles. Examples of the posterior mean and centile curves for survival probability corresponding to
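The pointwise summary described above can be sketched as follows. The posterior samples are replaced here by a hypothetical stand-in (exponential survival curves with Gamma-distributed rates), since the actual samples come from the MCMC procedure described later; the helper names `percentile` and `summarise_curves` are illustrative.

```python
import math
import random


def percentile(sorted_values, q):
    """Percentile q (0-100) of pre-sorted values, by linear interpolation."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    low_value = sorted_values[lower]
    return low_value + (position - lower) * (sorted_values[upper] - low_value)


def summarise_curves(curves):
    """Pointwise mean and 2.5% / 97.5% centiles over a set of sampled curves."""
    mean_curve, lower_curve, upper_curve = [], [], []
    for values_at_t in zip(*curves):  # one tuple of sampled values per time point
        ordered = sorted(values_at_t)
        mean_curve.append(sum(ordered) / len(ordered))
        lower_curve.append(percentile(ordered, 2.5))
        upper_curve.append(percentile(ordered, 97.5))
    return mean_curve, lower_curve, upper_curve


# Hypothetical stand-in for posterior samples: S(t) = exp(-rate * t),
# with the rate drawn from a Gamma distribution (mean 0.002 per day).
rng = random.Random(0)
times = list(range(0, 271, 30))  # days
curves = []
for _ in range(1000):
    rate = rng.gammavariate(20.0, 0.002 / 20.0)
    curves.append([math.exp(-rate * t) for t in times])

mean_curve, lower_curve, upper_curve = summarise_curves(curves)
for t, m, lo, hi in zip(times, mean_curve, lower_curve, upper_curve):
    print(f"day {t:3d}: mean {m:.3f}  (2.5% {lo:.3f}, 97.5% {hi:.3f})")
```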

The true survival probability curve is shown in green, with the Kaplan-Meier plot corresponding to the generated data in black. In blue is the posterior mean survival probability against time, calculated from

Further, given two such sets of survival probability or hazard rate curves (such as the survival probability curves shown in

The Kaplan-Meier plot for subset 1 is in solid black and that for subset 2 in dot-dashed black. Since there were many more patients in subset 2 than in subset 1, we expect greater variance in the inferred survival probabilities for subset 1 than for subset 2.
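Given two independent sets of sampled survival curves, the posterior probability that one subset has higher survival than the other at a given time can be estimated as the fraction of sample pairs in which it does. The sketch below assumes hypothetical stand-in samples (exponential survival with Gamma-distributed rates) rather than the posterior samples of this paper, and uses the star thresholds quoted in the table captions.

```python
import math
import random


def prob_first_higher(curves_a, curves_b):
    """At each time point, the fraction of (A, B) sample pairs with A above B."""
    n_pairs = len(curves_a) * len(curves_b)
    probabilities = []
    for values_a, values_b in zip(zip(*curves_a), zip(*curves_b)):
        wins = sum(a > b for a in values_a for b in values_b)
        probabilities.append(wins / n_pairs)
    return probabilities


def stars(probability):
    """Significance stars: * > 0.95, ** > 0.99, *** > 0.999."""
    if probability > 0.999:
        return "***"
    if probability > 0.99:
        return "**"
    if probability > 0.95:
        return "*"
    return ""


def sample_curves(rng, mean_rate, times, n_samples=300):
    """Hypothetical stand-in for posterior samples of a survival curve."""
    curves = []
    for _ in range(n_samples):
        rate = rng.gammavariate(20.0, mean_rate / 20.0)
        curves.append([math.exp(-rate * t) for t in times])
    return curves


rng = random.Random(0)
times = list(range(30, 271, 30))  # days
subset_1 = sample_curves(rng, mean_rate=0.002, times=times)
subset_2 = sample_curves(rng, mean_rate=0.004, times=times)

probabilities = prob_first_higher(subset_1, subset_2)
for t, p in zip(times, probabilities):
    print(f"day {t:3d}: P(subset 1 > subset 2) = {p:.3f} {stars(p)}")
```

Comparing all pairs of samples is valid here only because the two sets are drawn independently, which corresponds to the independent priors on the subsets discussed later.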

In order to carry out this process, we need to specifically define the lifetime distribution model

We suppose that there exist an unknown number

Let

Then the patient’s time of death is given by

In particular

We now drop the subscripts

Thus we will set

Here

Note that we have here a distribution which has both a discrete and a continuous part, so that

By way of very approximate intuition:

We specify the priors on the parameters in two stages. First, we specify their general form, and second we choose specific values for the hyperparameters that then specify a unique prior.

The total number

The prior for the parameters

We take the prior on

We take the prior on

We take the prior on each of the parameters

Similarly for parameters

Specific values were chosen for the hyperparameters by varying them and showing those users overseeing the analysis (namely LW, MT, PE, LR, of whom PE and LR are infectious diseases clinicians) the resulting distributions of samples of survival curves and hazard rates, then letting them choose the most appropriate prior given their background experience. The prior chosen was intentionally uninformative and very wide, while still being centred on the clinician-expected survival curves and hazard rates.

The specific parameter values chosen were as follows:

These result in the following depicted distributions for

Thus the parameters

In particular, when comparing two subsets A and B of patients, we use the same prior to infer the posterior distribution of

The basic Gamma model is a frequently used model of a failure mode that goes through several stages of failing, each with an exponentially distributed lifetime, before final failure occurs. The additional effect of the parameter
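The staged-failure interpretation of the basic Gamma model can be checked numerically. The sketch below covers only the basic Gamma model with an integer number of stages, not the additional parameter mentioned above; the stage count and rate are hypothetical. It compares lifetimes built as a sum of exponentially distributed stage durations with direct draws from the corresponding Gamma distribution.

```python
import random
from statistics import fmean, pvariance


def staged_lifetime(rng, n_stages, rate):
    """Time to final failure: sum of n_stages exponential stage lifetimes."""
    return sum(rng.expovariate(rate) for _ in range(n_stages))


rng = random.Random(0)
n_stages, rate = 3, 0.1  # hypothetical: 3 stages, mean 10 days per stage
n_draws = 20_000

staged = [staged_lifetime(rng, n_stages, rate) for _ in range(n_draws)]
# random.gammavariate takes (shape, scale), so scale = 1 / rate.
direct = [rng.gammavariate(n_stages, 1.0 / rate) for _ in range(n_draws)]

# Both should have mean n_stages / rate = 30 and variance n_stages / rate**2 = 300.
print(f"staged: mean {fmean(staged):.2f}, variance {pvariance(staged):.1f}")
print(f"direct: mean {fmean(direct):.2f}, variance {pvariance(direct):.1f}")
```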

In particular, we specifically used independent priors on the parameters of the subsets being compared because:

When offered the choice, the users (namely LW, MT, PE, LR, and GT, of whom PE, LR, and GT are infectious diseases clinicians, with GT having extensive experience with TB meningitis patients) indicated unanimously that this correctly represented their prior beliefs;

Because we restrict our use of posterior probabilities to those of the form

If they did not believe the priors on the relevant disjoint subsets were independent, the clinicians involved would find it very hard to specify exactly how similar each pair of subsets being compared should be expected to be.

We introduce additional variables

We initialise the parameters

A thorough review of all the following methods is available either in

The key point is that if we resample each variable by a method that satisfies detailed balance, and given other weak conditions which are here fulfilled, Feller’s theorem (

Sampling from the posterior was done by the MCMC technique of Gibbs sampling, that is, sampling from the following distributions palindromically:

_{j}.

10,000 samples were drawn from the posterior for each subset of the data considered (e.g. for Indonesian TT patients). The first 1500 samples were discarded and the remainder kept for analysis. To check that the software was correct we undertook two types of check:

The inference code was reviewed for bugs by someone (RFS) other than its author (JC); the bugs found were fixed after RFS and JC had conferred to reach agreement on them.

Multiple sets of synthetic data were generated (for which the true values of
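The sampling scheme and the synthetic-data check can be illustrated on a far simpler model than the lifetime model used here. The sketch below is an assumption-laden toy: Gaussian data with unknown mean and precision, conjugate priors chosen for illustration, and a palindromic sweep (mean, precision, mean). It draws 10,000 samples, discards the first 1500, and checks that the known true values of the synthetic data are recovered.

```python
import random
from statistics import fmean


def gibbs_normal(data, n_samples=10_000, burn_in=1_500, seed=0):
    """Gibbs sampler for x_i ~ N(mu, 1/tau) with conjugate priors (toy example)."""
    rng = random.Random(seed)
    n = len(data)
    sum_x = sum(data)
    prior_mean, prior_precision = 0.0, 1e-4  # broad Gaussian prior on mu
    prior_shape, prior_rate = 0.01, 0.01     # broad Gamma prior on tau
    mu, tau = 0.0, 1.0                       # arbitrary initial values

    def draw_mu(current_tau):
        precision = prior_precision + n * current_tau
        mean = (prior_precision * prior_mean + current_tau * sum_x) / precision
        return rng.gauss(mean, precision ** -0.5)

    def draw_tau(current_mu):
        shape = prior_shape + n / 2.0
        rate = prior_rate + 0.5 * sum((x - current_mu) ** 2 for x in data)
        return rng.gammavariate(shape, 1.0 / rate)  # second argument is a scale

    samples = []
    for _ in range(n_samples):
        # Palindromic sweep: each step resamples one variable from its
        # full conditional distribution given the other.
        mu = draw_mu(tau)
        tau = draw_tau(mu)
        mu = draw_mu(tau)
        samples.append((mu, tau))
    return samples[burn_in:]  # discard the burn-in period


# Synthetic data with known true values, as a check on the sampler.
data_rng = random.Random(42)
true_mu, true_sd = 5.0, 2.0
synthetic = [data_rng.gauss(true_mu, true_sd) for _ in range(50)]

kept = gibbs_normal(synthetic)
posterior_mean_mu = fmean(mu for mu, _ in kept)
posterior_mean_tau = fmean(tau for _, tau in kept)
print(f"kept {len(kept)} samples; mean mu {posterior_mean_mu:.2f} (true {true_mu}), "
      f"mean tau {posterior_mean_tau:.3f} (true {1 / true_sd ** 2})")
```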

In addition we checked for convergence of the Markov chains by starting them from different random initial values of

If the two distributions are identical (as they should be up to uncertainty caused by the non-infinite number of samples drawn during the MCMC runs), then at each time the probability that the ‘red’ distribution is greater than the ‘green’ (see

One particular example of Bayesian inference in the Results section is slightly tricky to interpret, so we discuss it specifically here. It corresponds precisely to the comparison of TT and non-TT genotypes in Grade 1 Indonesia patients.

We refer to

See also

We now collect the TT patients’ data: there is, however, only 1 TT patient, who survives until 1 year before being censored. A single patient moves the posterior only a little away from the prior (just as a single coin toss landing heads would not convince you that a coin was biased): this shifts the posterior for the TT group upwards to the red lines, mean (solid) and centiles (dot-dash).

On the other hand, when we collect the non-TT patients’ data, there are 33 of them, so they have a bigger effect, both raising the mean and narrowing the 95% posterior confidence interval to the corresponding green plots. Even though these 33 patients survive less well than the single TT patient, they lift the posterior mean more than the single TT patient does, and the green 95% posterior confidence interval is much narrower than the red one.
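The same behaviour can be reproduced in a deliberately simplified setting. The sketch below assumes a single one-year survival probability with a Beta(2, 2) prior instead of the full lifetime model, and a hypothetical count of 27 survivors among the 33 non-TT patients (the actual count is not used here); only the group sizes of 1 and 33 come from the text above.

```python
def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, variance ** 0.5


def update(prior, survivors, deaths):
    """Conjugate Beta update with observed survivors and deaths."""
    return prior[0] + survivors, prior[1] + deaths


prior = (2.0, 2.0)                             # illustrative broad prior
tt = update(prior, survivors=1, deaths=0)      # the single TT patient survives
non_tt = update(prior, survivors=27, deaths=6) # hypothetical outcome for 33 patients

for label, params in [("prior", prior), ("TT", tt), ("non-TT", non_tt)]:
    mean, sd = beta_mean_sd(*params)
    print(f"{label:7s} mean {mean:.3f}, sd {sd:.3f}")
```

In this toy version the observed survival is 100% for the TT patient and about 82% for the non-TT patients, yet the non-TT posterior mean ends up higher and its spread much smaller, because 33 patients outweigh the prior far more than one patient can.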

Finally, analogous to

See also

Of course, in most examples in the paper, there are more patients in both groups being compared, and we are more likely to get a more definite conclusion.

Specifically for comparisons of TT and non-TT subsets, where the subsets consist of very different numbers of patients, there is particular scope for otherwise unexpected sensitivity to the choice of uninformative priors. To assess this we initially checked, for one such comparison, the effect of using a different prior, namely:

As can be seen by comparing this with

The effect of this on a comparison of a small subset and a large subset would be expected to be to shift the posterior on the small subset upwards in the early period and downwards in the late period compared with the large subset, increasing the significance of the early comparison if the small subset survived better than the large subset at that time, and reducing it if in the other direction (and vice versa at late times).

In the case of the comparison shown in

We remark, however, that even with this significant change in the prior, the inferred comparison probabilities change remarkably little. We have therefore not reported detailed comparisons of alternative priors throughout the results.

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Your description of the potential role of the LTA4H TT variant of Leukotriene A4 Hydrolase in mediating outcomes of tuberculosis meningitis has important implications for developing precision medicine approaches for this and other related inflammatory diseases. The use of Bayesian analysis in your study allowed for a detailed investigation of the contribution of this genotypic variant and has also pointed to other factors that may mediate treatment and disease outcomes, thus creating potential new avenues of research.

Thank you for submitting your article "A Bayesian analysis of the association between Leukotriene A4 Hydrolase genotype and survival in tuberculous meningitis" for consideration by

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

As the editors have judged that your manuscript is of interest, but as described below that additional analyses are required before it is published, we would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). First, because many researchers have temporarily lost access to the labs, we will give authors as much time as they need to submit revised manuscripts. We are also offering, if you choose, to post the manuscript to bioRxiv (if it is not already there) along with this decision letter and a formal designation that the manuscript is "in revision at

Summary:

Two Tuberculosis meningitis (TBM) studies conducted in Vietnam and Indonesia to assess the role of polymorphisms in the leukotriene A4 hydrolase (LTA4H) gene, implicated in modulating inflammation, yielded divergent conclusions. The submission by Whitworth and colleagues utilizes a Bayesian approach to further investigate the data associated with these studies in an attempt to reconcile the observations. The underlying premise is that the Indonesian cohort was skewed towards more severe disease at presentation, which nullified the association of the LTA4H genotype with patient survival in the Indonesian cohort.

1) By using a Bayesian approach in the analysis of both previous studies the authors are able to reconcile, in part, the divergent findings of an impact of LTA4H polymorphisms on TBM survival.

2) An unexpected finding was the protective effect of the LTA4H TT genotype on TBM response to glucocorticoid therapy in Vietnamese grade 3 cases. In contrast, in the Indonesian study the impact was found for grade 2 but not grade 3 patients.

3) The lack of effect by the TT genotype in the Indonesian Grade 3 TBM patients could be, as shown in the discussion, due to individuals who are genetically hyper susceptible to uncontrolled inflammation independently of LTA4H, differences in treatment and/or access to top-quality care.

The authors propose that LTA4H genotyping together with data on disease severity could be used to identify TB meningitis patients most likely to benefit from adjunctive glucocorticoid treatment. This is an important study and could shed light on the discordant results obtained from the studies conducted in Vietnam and Indonesia.

Essential revisions:

1) The authors state that the higher mortality in Indonesia was driven by a higher percentage of patients with Grade 2 disease (% with Grade 3 disease being equal in the two countries). They state that the LTA4H TT genotype effect does not extend beyond Grade 2 in Indonesia. In Vietnam though, the effect of TT is most pronounced in Grade 3. In people living with HIV, the TT effect was present in people with Grade 1 or 3 disease, but not with Grade 2 disease. The latter is dismissed as “likely spurious”. However, another explanation is that disparate results in different countries, different stages of disease, and different populations suggests that LTA4H is less useful than other factors (other genetics, environment, etc) in determining disease severity and possibility of treatment response with steroids. Can the authors provide a convincing argument otherwise?

2) Further to the point above, the MRC grade is generally thought to have good association with outcomes and thus, the similar mortality for Indonesia Grade 2 non-TT patients and Vietnam Grade 3 non-TT patients clearly suggests that grade and LTA4H genotyping may not always be able to identify TB meningitis patients most likely to benefit from adjunctive glucocorticoid treatment. Therefore, this conclusion should be toned down in the abstract and the manuscript. Perhaps LTA4H genotyping and mortality risk for each MRC grade (local to that setting) is most correlative of who is most likely to benefit from adjunctive glucocorticoid treatment? Importantly, it is still not clear why a grade-specific mortality difference was noted between the two cohorts. Was it related to the difference in critical / hospital care for these cohorts? Or other aspects? The authors speculate this but do not provide sufficient data, please address this.

3) In Indonesia, the average time to death was 8 days, and in Vietnam it was 50 days. The authors describe LTA4H TT effect as occurring over short or long periods of time, so this adds the element of longitudinal assessment and resulting effect to the mix. What effect would LTA4H play late in disease, once a patient is already on multidrug therapy as well as steroids, from a mechanistic point of view?

4) Was any analysis performed to test whether the patients removed from the cohorts did not skew the population? That is, were the demographic characteristics of the patients excluded from this analysis similar to those included in the study?

5) Provide p values for comparisons across the three sub-populations in Table 1 and Table 2 as well as Supplementary file 1?

6) While substantial details are presented in supplementary methods, a brief introduction (in simple language) on Bayesian methods would be useful for the reader to better understand the main methodology utilized in this study. For example, the statement in subsection “LTA4H TT genotype association with survival becomes stronger with increasing disease severity in Vietnam HIV-negative patients”, "Importantly, the model and priors used allowed us to incorporate our pre-existing knowledge that mortality risk to a population of TBM patients varies smoothly with time, rather than occurring at a number of discrete times common to all patients as is implied by the maximum likelihood solution illustrated by a Kaplan-Meier plot." helps understand the basis for the use of Bayesian analysis.

7) In most clinical trials, assumptions, sample size calculations and outcome measures are predefined to limit bias related to post-hoc analysis. Can the authors present some information on how bias was prevented? If this was not feasible, this should be listed as a study limitation.

8) These data are insightful, but as stated above, it is still possible that another genotype or local characteristic of the population (Vietnam versus Indonesia) is confounding the results. For example, one of the co-authors of the current study has demonstrated that cerebral tryptophan metabolism (under strong genetic influence) is important for the outcome of TB meningitis (Lancet Infect Dis. 2018). Can the authors analyse the contribution of other genetic factors based on currently available genotypic information from these cohorts or from other studies?

9) The authors suggest that “LTA4H genotyping together with disease severity assessment may target glucocorticoid therapy to patients most likely to benefit from it,” but they provide no road map for operationalising the finding. In whom should this genotyping be done? What role does the test result have in decisions regarding provision of steroids? Please provide a decision framework or a tool for use of this test plus data that a clinician would have.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "A Bayesian analysis of the association between Leukotriene A4 Hydrolase genotype and survival in tuberculous meningitis" for further consideration by

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

Reviewers have accepted all revisions and explanations but request the following modifications:

You reword your conclusions to discuss additional limitations. For example, the statement, "Thus, LTA4H TT efficacy was limited by other factors that cause mortality. These factors appear independent of severity grade on presentation, and if they exceed a threshold (represented by about ~ 40% mortality) then the beneficial effect of LTA4H TT is lost." This is largely speculative and lacking in detail. What are these proposed factors? Carefully couching the conclusions in context of all limitations will enhance the value of your work and appeal to a broader audience who may be interested in these additional factors.

Essential revisions:

1) The authors state that the higher mortality in Indonesia was driven by a higher percentage of patients with Grade 2 disease (% with Grade 3 disease being equal in the two countries). They state that the LTA4H TT genotype effect does not extend beyond Grade 2 in Indonesia. In Vietnam though, the effect of TT is most pronounced in Grade 3. In people living with HIV, the TT effect was present in people with Grade 1 or 3 disease, but not with Grade 2 disease. The latter is dismissed as “likely spurious”. However, another explanation is that disparate results in different countries, different stages of disease, and different populations suggests that LTA4H is less useful than other factors (other genetics, environment, etc) in determining disease severity and possibility of treatment response with steroids. Can the authors provide a convincing argument otherwise?

First, we have decided to remove the HIV cohort data from the paper so as to not distract from the core finding of the paper, namely that the use of Bayesian methods has enabled us to see that the LTA4H TT genotype does improve survival in the context of dexamethasone in Indonesia also, and that this effect is affected by mortality driven by other factors. An ongoing dexamethasone trial in HIV-positive patients who will be assessed for

Second, we have no data on whether the

Finally, we respectfully differ from the reviewers’ suggestion that

2) Further to the point above, the MRC grade is generally thought to have good association with outcomes and thus, the similar mortality for Indonesia Grade 2 non-TT patients and Vietnam Grade 3 non-TT patients clearly suggests that grade and LTA4H genotyping may not always be able to identify TB meningitis patients most likely to benefit from adjunctive glucocorticoid treatment. Therefore, this conclusion should be toned down in the abstract and the manuscript. Perhaps LTA4H genotyping and mortality risk for each MRC grade (local to that setting) is most correlative of who is most likely to benefit from adjunctive glucocorticoid treatment? Importantly, it is still not clear why a grade-specific mortality difference was noted between the two cohorts. Was it related to the difference in critical / hospital care for these cohorts? Or other aspects? The authors speculate this but do not provide sufficient data, please address this.

We are puzzled by the reviewers’ reasoning that “… the similar mortality for Indonesia Grade 2 non-TT patients and Vietnam Grade 3 non-TT patients clearly suggests that grade and LTA4H genotyping may not always be able to identify TB meningitis patients most likely to benefit from adjunctive glucocorticoid treatment.” In both cohorts, there is a clear association between MRC grade and survival with the Indonesia cohort faring worse at every grade. This finding came as a surprise to all of us because we had all thought that the difference in overall survival between the cohorts was simply because Indonesia had more severe Grade patients. This surprising result first became apparent when we analysed the TT versus non-TT data in Figure 2. To validate this, we performed the Grade-specific survival analyses in Figure 3 and then the direct head-on comparison of grade-specific survival in the two cohorts in Figure 4. If we take all these results together, the most parsimonious conclusion is that the TT genotype provides a benefit up to a point of no return, i.e. up to Grade 2 in Indonesia. After that it cannot help, for the reasons described in the paper and reiterated in the following paragraph. It is not surprising therefore, that in the absence of the TT genotype, Indonesia Grade 2 patients have a similar mortality to Grade 3 Vietnam patients, just as they do in the overall cohort (Figure 3).

In response to the reviewers’ further queries, we have ruled out that there was a hidden increased severity in Grade 3 Indonesia as reflected by the GCS scores, which are sensitive to smaller changes in severity in Grades 2 and 3. While Grade 2 disease in Indonesia was somewhat more severe than in Vietnam, Grade 3 disease was less severe. These new analyses are presented in Supplementary file 1 and in the Discussion.

Furthermore, we don’t think that we have downplayed the genotype-independent mortality factors. Rather, we have discussed them as extensively as we can in the Discussion. The Abstract also explicitly specifies that genotype-based therapy has its limitations: “However, its benefit is nullified in the most severe cases where other factors cause early mortality.” Based on our understanding of the levels of care available in Indonesia and Vietnam, our suspicion is that some or most of these differences are attributable to the ability to provide expert intensive respiratory and other critical care support at the study site hospitals. We know that all patients in Vietnam were cared for in hospitals with critical care expertise whereas in Indonesia this was only possible for a small fraction of the cohort. However, this is a hypothesis that can only be addressed by patient-chart level review and by determination of the mortality rates for other critical illnesses at the same participating hospitals. We do not have those data, and thus have to leave this as an unproven hypothesis. Therefore, all that we can say is what is already in the Discussion: “The more likely possibility is that better ancillary care was possible in Vietnam where all patients were enrolled into a clinical trial versus only 17% in Indonesia (Thuong et al., 2017; van Laarhoven et al., 2017). Optimized respiratory support, in particular, would be essential to keep patients alive through the early high-risk stage in order to allow for anti-inflammatory effects of corticosteroids to benefit the TT patients.”

3) In Indonesia, the average time to death was 8 days, and in Vietnam it was 50 days. The authors describe LTA4H TT effect as occurring over short or long periods of time, so this adds the element of longitudinal assessment and resulting effect to the mix. What effect would LTA4H play late in disease, once a patient is already on multidrug therapy as well as steroids, from a mechanistic point of view?

As shown in Figure 2, the

4) Was any analysis performed to test whether the patients removed from the cohorts did not skew the population? That is, were the demographic characteristics of the patients excluded from this analysis similar to those included in the study?

This information is now included in Supplementary file 1. We have decided that a formal analysis is not appropriate, given that the patients were excluded because they had key information missing. We knew the ages of all the patients who were excluded, and the Bayesian comparison probability of the mean of the excluded distribution being greater than that of the included one was not significant (P 0.85, inverse-Γ preferred distribution).
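As an illustration of what a Bayesian comparison probability of this kind expresses, the following minimal sketch estimates the posterior probability that the mean of one group exceeds the mean of another by Monte Carlo sampling. It is not the analysis described in the response: it assumes a normal model with a noninformative prior rather than the inverse-Γ distribution mentioned above, and the ages, function names and number of draws are hypothetical.

```python
import math
import random
import statistics


def sample_posterior_mean(data, rng):
    """Draw one posterior sample of the group mean.

    Assumption (not from the paper): normal likelihood with a
    noninformative prior, so the variance is inverse-gamma distributed
    and the mean is normal given the variance.
    """
    n = len(data)
    xbar = statistics.fmean(data)
    s2 = statistics.variance(data)  # sample variance, n - 1 denominator
    shape = (n - 1) / 2
    rate = (n - 1) * s2 / 2
    # If G ~ Gamma(shape, scale=1/rate), then 1/G ~ Inverse-Gamma(shape, rate).
    sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
    return rng.gauss(xbar, math.sqrt(sigma2 / n))


def prob_mean_greater(group_a, group_b, draws=20000, seed=0):
    """Estimate P(mean of group_a > mean of group_b | data)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        if sample_posterior_mean(group_a, rng) > sample_posterior_mean(group_b, rng):
            hits += 1
    return hits / draws


# Hypothetical ages, for illustration only (not study data).
excluded_ages = [31, 45, 38, 52, 29, 41, 36]
included_ages = [33, 40, 35, 47, 30, 39, 34, 42, 37]

p = prob_mean_greater(excluded_ages, included_ages)
print(f"P(mean excluded > mean included) is approximately {p:.2f}")
```

Unlike a frequentist p value, the number returned here is read directly as a probability: values near 0.5 indicate no evidence of a difference in either direction, while values near 0 or 1 indicate a clear difference.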

5) Provide p values for comparisons across the three sub-populations in Table 1 and Table 2 as well as Supplementary file 1?

Bayesian comparison probabilities have now been computed for these comparisons for each line of Table 1 and Table 2 and described in the Materials and methods. The original Supplementary file 1, which pertained to the HIV-positive patients has been removed. (The new Supplementary file 1 contains the information described in the point 4 response above.)

6) While substantial details are presented in supplementary methods, a brief introduction (in simple language) on Bayesian methods would be useful for the reader to better understand the main methodology utilized in this study. For example, the statement in subsection “LTA4H TT genotype association with survival becomes stronger with increasing disease severity in Vietnam HIV-negative patients”, "Importantly, the model and priors used allowed us to incorporate our pre-existing knowledge that mortality risk to a population of TBM patients varies smoothly with time, rather than occurring at a number of discrete times common to all patients as is implied by the maximum likelihood solution illustrated by a Kaplan-Meier plot." helps understand the basis for the use of Bayesian analysis.

Thank you for encouraging us to describe this methodology further. We have added this information in Appendix 1, comparing frequentist and Bayesian paradigms and refer to it at the end of the paragraph discussing Bayesian analysis (Introduction).
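To make the contrast with a Kaplan-Meier plot concrete, the sketch below computes a Kaplan-Meier estimate by hand. It shows that the estimated survival changes only at the observed death times and is flat in between, which is the "discrete times" behaviour that the quoted sentence contrasts with a smoothly varying mortality risk. The follow-up times are hypothetical and the function is an illustrative assumption, not code from the study.

```python
def kaplan_meier(times, died):
    """Return [(time, survival)] at each distinct observed death time.

    times: follow-up time for each patient (for example, in days)
    died:  True if the patient died at that time, False if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, d in zip(times, died) if ti == t and d)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # The estimate drops only when a death is observed.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical follow-up data, for illustration only (not study data).
times = [2, 5, 5, 8, 12, 20, 30, 30]
died = [True, True, False, True, False, True, False, False]

curve = kaplan_meier(times, died)
for t, s in curve:
    print(f"day {t}: estimated survival {s:.3f}")
```

Between two consecutive death times the estimate implies zero mortality risk, and at each death time it implies a sudden jump. A Bayesian model with a prior that favours smooth hazards avoids this artefact, which matters most when the number of patients is small.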

7) In most clinical trials, assumptions, sample size calculations and outcome measures are predefined to limit bias related to post-hoc analysis. Can the authors present some information on how bias was prevented? If this was not feasible, this should be listed as a study limitation.

A major strength of Bayesian analysis is that bias due to post-hoc subgroup analysis doesn't arise, unless one only provides a selected subset of the results that suit one’s thesis. In response to the reviewers’ comment, we have included in the Introduction the following sentence:

“Finally, relevant to this re-analysis of completed clinical studies, Bayesian paradigms have less potential for bias arising from post-hoc analysis (Appendix 2).” We have also included details of this aspect of Bayesian analysis in Appendix 2 section 1.3. Since we have now decided to withhold the HIV-positive data for the time being, we state in that section that we left out the HIV-positive data, with the following lines:

“In this paper, the only important omission is that analysis was also done of the HIV-positive patients in Vietnam, and this was not reported in this paper. This was because, although it tended to confirm similar findings to those from the HIV-negative cohort, there were some points that were less clear cut and harder to interpret, and its publication therefore awaits completion of a clinical trial in which the benefit of dexamethasone is being examined for HIV-positive patients of all three genotypes by randomizing them to get dexamethasone or not.”

8) These data are insightful, but as stated above, it is still possible that another genotype or local characteristic of the population (Vietnam versus Indonesia) is confounding the results. For example, one of the co-authors of the current study has demonstrated that cerebral tryptophan metabolism (under strong genetic influence) is important for the outcome of TB meningitis (Lancet Infect Dis. 2018 May;18(5):526-535). Can the authors analyse the contribution of other genetic factors based on currently available genotypic information from these cohorts or from other studies?

This is a good idea but beyond the scope of this study which was focused on the Indonesia

9) The authors suggest that “LTA4H genotyping together with disease severity assessment may target glucocorticoid therapy to patients most likely to benefit from it,” but they provide no road map for operationalising the finding. In whom should this genotyping be done? What role does the test result have in decisions regarding provision of steroids? Please provide a decision framework or a tool for use of this test plus data that a clinician would have.

This is an interesting idea. Upon reflection, however, we think it is premature to suggest a decision algorithm until further information becomes available. In particular, we await the results of the ongoing trial randomizing CC and CT patients to dexamethasone or placebo. As with the HIV-positive patients, the analysis methods and computational programs developed for this study will be invaluable in analyzing the results of this trial.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

Reviewers have accepted all revisions and explanations but request the following modifications:

You reword your conclusions to discuss additional limitations. For example, the statement, "Thus, LTA4H TT efficacy was limited by other factors that cause mortality. These factors appear independent of severity grade on presentation, and if they exceed a threshold (represented by about ~ 40% mortality) then the beneficial effect of LTA4H TT is lost." This is largely speculative and lacking in detail. What are these proposed factors? Carefully couching the conclusions in context of all limitations will enhance the value of your work and appeal to a broader audience who may be interested in these additional factors.

We thank the reviewers for their careful review and are very pleased that they have approved our revisions.

We have made the following changes in response to their one outstanding request (Discussion section):

1) We have explicitly stated that we do not know for certain what the cause of excess Grade 3 LTA4H deaths in Indonesia is.

2) We have more clearly divided up the possible causes into non-genetic, ancillary care related causes and

We hope this meets with the reviewers’ approval. The changes are here for the reviewers’ convenience:

“Thus the beneficial effect of dexamethasone to the