Wednesday, May 29, 2013

WHY PUBLISHED SIGNIFICANCE VALUES ARE (MOSTLY) LIES


Jim Wood

In my recent post advocating the abandonment of NHST (null-hypothesis significance testing), I skimmed over two important issues that I think need more elaboration: the multiple-test problem and publication bias (also called the “file-drawer” bias).  The two are deeply related and ought to make everyone profoundly uncomfortable about the true meaning of achieved significance levels reported in the scientific literature.

The multiple-test problem is wonderfully illustrated by this cartoon, xkcd's 'Significant', by the incomparable Randall Munroe.  The cartoon was the subject of Monday's post here on MT (so you can scroll down to see it), but I thought it worthy of further reflection.  (If you don't know Munroe's cartoons, check 'em out.  In combination they're like a two-dimensional "Big Bang Theory", only deeper.  I especially like "Frequentists vs. Bayesians".)


xkcd: Frequentists vs. Bayesians

If you don't get the "Significant" cartoon, you need to study up on the multiple-test problem.  But briefly stated, here it is:  You (in your charming innocence) are doing the Neyman-Pearson version of NHST.  As required, you preset an α value, i.e. the highest probability of making a Type 1 error (rejecting a true null hypothesis) that you're willing to accept as compatible with a rejection of the null.  Like most people, you choose α = 0.05, which means that if your test were to be repeated 20 times on different samples or using different versions of the test/model, you'd expect to find about one "significant" test even when your null hypothesis is absolutely true.  Well, okay, most people seem to find 0.05 a sufficiently small value to proceed with the test and reject the null whenever p < 0.05 on a single test.  But what if you do multiple tests in the course of, for example, refining your model, examining confounding effects, or just exploring the data-set?  If, for example, you do 20 tests on different colors like the jellybean scientists, then there's a quite high probability of getting at least one result with p < 0.05 even if the null hypothesis is eternally true.  If you own up to the multiplicity of tests – or, indeed, if you're even aware of the multiple-test problem – then there are various corrections you can apply to adjust your putative p values (the Bonferroni correction is undoubtedly the best-known and most widely used of these).  But if you don't, and if you only publish the one "significant" result (see the cartoon), then you are, whether consciously or unconsciously, lying to your readers about your test results.  Green jellybeans cause acne!!  (Who knew?)
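To make the arithmetic concrete, here's a minimal sketch (in Python, using the cartoon's numbers) of the family-wise error rate for 20 tests and the corresponding Bonferroni adjustment.  It assumes the tests are independent, which real batteries of tests rarely are, so treat it as an illustration only.

```python
# Family-wise error rate for 20 independent tests of a true null at alpha = 0.05,
# plus the Bonferroni-adjusted per-test threshold. Illustrative only: real test
# batteries are rarely independent.
alpha = 0.05
n_tests = 20

p_at_least_one = 1 - (1 - alpha) ** n_tests
print(f"P(at least one p < {alpha} when all nulls are true) = {p_at_least_one:.2f}")  # ~0.64

bonferroni = alpha / n_tests
print(f"Bonferroni per-test threshold = {bonferroni}")  # 0.0025
```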

If you do multiple tests and include all of them in your publication (a very rare practice), then even if you don’t do the Bonferroni correction (or whatever) your readers can.  But what if you do a whole bunch of tests and include only the “significant” result(s) without acknowledging the other tests?  Then you’re lying.  You’re publishing the apparent green jellybean effect without revealing that you tested 19 other colors, thus invalidating the p < 0.05 you achieved for the greenies.  Let me say it again: you’re lying, whether you know it or not.

This problem is greatly compounded by publication bias.  Understandably wanting your paper to be cited and to have an impact, you submit only your significant results for publication.  Or your editor won’t accept a paper for review without significant results.  Or your reviewers find “negative” results uninteresting and unworthy of publication.  Then significant test results end up in print and non-significant ones are filtered out – thus the bias.  As a consequence, we have no idea how to interpret your putative p value, even if we buy into the NHST approach.  Are you lying?  Do you even know you’re lying?

This problem is real and serious.  Many publications have explored how widespread the problem is, and their findings are not encouraging.  I haven’t done a thorough review of those publications, but a few results stick in my mind.  (If anyone can point me to the sources, I’d be grateful.)  One study (in psychology, I think) found that the probability of submitting significant test results for publication was about 75% whereas the probability of submitting non-significant results was about 5%.  (The non-significant results are, or used to be, stuck in a file cabinet, hence the alternative name for the bias.)  Another study of randomized clinical trials found that failure to achieve significance was the single most common reason for not writing up the results of completed trials.  Another study of journals in the behavioral and health sciences found that something like 85-90% of all published papers contain significant results (p < 0.05), which cannot even remotely reflect the reality of applied statistical testing.  Again, please don’t take these numbers too literally (and please don’t cite me as their source) since they’re popping out of my rather aged brainpan rather than from the original publications.

The multiple-test/publication-bias problem is increasingly being seen as a major crisis in several areas of science.  In research involving clinical trials, it’s making people think that many – perhaps most – reported results in epidemiology and pharmacology are bogus.  Ben Goldacre of “Bad Science” and “Bad Pharma” fame has been especially effective in making this point.  There are now several groups advocating the archiving of negative results for public access; see, for example, the Cochrane Collaboration.  But in my own field of biological anthropology, the problem is scarcely even acknowledged.  This is why N. T. Longford, the researcher cited in my previous post, called the scientific literature based on NHST a “junkyard” of unwarranted positive results.  Yet more reason to abandon NHST.

Tuesday, May 28, 2013

Who, me? I don't believe in single-gene causation! (or do I?). Part IV. Do we need the probabilistic hypothesis?

In the earlier posts in this series on the nature of genetic causation we showed that, while people would routinely say that they don't 'believe' in genetic determinism or single-gene causation, there are clear instances (that everyone recognizes) in which specific genetic variants do seem to have essentially deterministic effects relative to some outcome -- cystic fibrosis is one example, but there are many others.  The data aren't perfect -- there is always measurement noise and other errors, and prediction is rarely perfect -- but in this situation if an individual has a specific single-locus genotype s/he is essentially destined to have the trait it codes for.  Going back to Mendel's famous peas, there is a wealth of animal and plant data for such essentially perfect predictive genotypes.

Yet even there, there are subtleties, as we discussed in earlier installments.  Much of the time, the identified genotype at the 'causal' gene does not predict the outcome with anything resembling certainty.  There is usually evidence for factors that explain this other than lab errors, including variants in other genes, environmental exposures, and so on.

Since most traits seem to be complex far beyond these single-gene subtleties, and are affected by many genes (not to mention environmental exposures), genetics has moved to multilocus causal approaches, of which the most autopilot-like are the omics ones like GWAS (and expression profiling, microbiomics and many others).  These approaches rest on exhaustively enumerative technology and statistical (inherently and explicitly probabilistic) analysis to find associations that might, upon experimental follow-up, turn out to have a mechanistic causal basis (otherwise, the idea is that the association is a statistical fluke or reflects some unmeasured but correlated variable).  In this context, everyone sneers at the very idea of genetic determinism.  But might that attitude be premature, even for complex traits?

Last week we tried to explain how probabilistic causation comes into the picture. Essentially, each tested variable--each spot along the genome--is screened for variants that are more common in cases than in controls (or in individuals with larger rather than smaller values of some quantitatively measured trait; taller, higher blood pressure, etc.).  But the association is rarely complete: there are cases who don't have the test variant, and unaffected controls who do.  That is a strange kind of 'causation' unless, like quantum mechanics, we invoke fundamental probabilism--but what would be the evidence for that and what does it mean?

The probability hypothesis in genetic causation
As the astronomer Pierre Laplace is famously quoted as saying to Napoleon when asked why his astronomical theory didn't invoke God's causal hand, "Sire, I had no need of that hypothesis." Likewise, we should ask whether, how, or where we have need for the probability hypothesis in genetic causation.  Is probabilism actually the best explanation for what we observe?  If so, how do we find it?

The conclusion is that a genetic variant that passes a statistical significance test (a deeply problematic notion in itself) has some truly causative effect, but one that is in some way probabilistic rather than deterministic.  We assign a 'risk' of the outcome to the presence of the risk factor.  Sample survey methods, which is what these studies rely on, essentially assume probabilistic outcomes, so of course the results look that way.  But how can a risk factor be probabilistic?  We know of some essentially random processes, like mutation, but how do genetic alleles act probabilistically?  One issue is that statistical analysis is fundamentally about repeatable observations....but what does that mean?

In our previous installment, we likened views that seem to be held about multi-genic causation that result from such studies to a conceptual extension of single-gene causal views, but now transformed into single-genometype causation.  But there are serious problems because genomewide, every individual we sample (or who has ever lived, or ever will live) is genomically unique!  And each person is either a case or a control, or has a specific blood pressure level, or whatever.  In that sense, in our actual data, we do not have probabilistic outcomes!  We have one, and only one, specific outcome for each individual.  Yet we don't want to confess to believing in genetic (or genomic) determinism, so we look at one site at a time and assign probabilistic risks to its variants, as if they were acting alone.

Last time we tried to explain serious problems we see with treating each site in the genome (as in, say, GWAS analysis) separately and then somehow trying to sum up the site-specific effects, each treated probabilistically, to get some personalized risk estimate.  We routinely, and blithely, overlook the fundamental issue of the important, usually predominant, effects of environmental factors (most of which are unknown or not accurately estimable).  But, forgetting that huge issue for the moment: surprisingly, maybe deterministic notions of causation are not so far off the mark after all.

The case for determinism 
The problem is that, since in survey-sample (epidemiological) research everyone's genotype is unique, we're forced to use probability sampling methods that decompose the genome into separate units, treating it as a sack of loose variants that we can sort through and whose effects we can measure individually.  But to answer the basic questions about genomic epistemology, we do not need to rely so fundamentally on such approaches.  Unlike epidemiology, in which each individual is genomically unique, we actually, and even routinely, do have observations on essentially unrestricted numbers of replications of essentially the exact same genometype!  And that evidence shows a very high level of genetic predictive power, at least relative to given environments.  We have thousands upon thousands of inbred plants and animals, and they have a very interesting tale to tell.

A week or so ago we posted about the surprising level of trait variation due to environmental variation among inbred animals.  But in a standardized environment most inbred strains have characteristic traits that are manifest uniformly, or at least at far higher than a few-percent frequency.  This is true of simple inbred strains, or of strains based on selection by investigators for some trait, followed by inbreeding.  (We also have some suggestive but much less informative data on human twins, for whom we have only two observations of each pair's unique genotype, confounded by notoriously problematic environmental issues.)

Inbred strains provide close to true replications of genotype-phenotype observations.  There is always going to be some variation, but the fact that inbred strains can be very well characterized by their traits (in a standardized  environment) reveals what is essentially a very high level of genetic--or genometypic--determinism!  So, is the sneering at genetic determinism misplaced?

Informative exceptions
A trait that, even in a standard environment in inbred animals, appears in only a fraction of the animals is particularly interesting in this light.  Such a trait might perhaps aptly be described in probabilistic terms.  We have repeated observations, but only some positive outcomes.  If the environment is really being held constant, then such traits give us good reason to investigate what it is that adds the probabilistic variation.  Here we actually have a focused research question, to which we might get epistemically credible answers, unlike the often total lack of such controllable focus in omic-scale analysis.

So, when the Jackson Labs description of a strain says the mice 'tend to become hypertensive' or 'tend to lose hearing by 1 year of age', that seems hardly environmental, and we can ask what it might mean.  Where might such outcome variation come from?  Somatic mutation and the continued influx of germline mutation over generations of the 'same' strain in the lab may account for at least some of it.  Unfortunately, to test directly for somatic mutational effects, one would have to clone many cells from each animal, and that would introduce other sources of variation.  But there's a way around this, at least in part: we can compare the fraction of the variable outcome found in my lab's C57 mice to the fraction of that outcome in the C57s in your lab; this gives you observations on the inbred germline several generations apart, during which some mutations may have accumulated.  If the percentages are similar, they may represent proper, truly stochastic probabilities, and we could try to see why and where those effects arise.  If they differ, and your environments really are similar, then mutation would be a suspect.  Such effects could be evaluated by simulation and some direct experimental testing.  It may not be easy, but at least we can isolate the phenomenon, which is not the case in natural populations.
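For what it's worth, here's a minimal sketch of that between-lab comparison.  The counts are entirely made up; the point is only to show the kind of arithmetic involved and how much the sample size limits what you can conclude.

```python
# Hypothetical counts of a variable outcome (say, hearing loss by 1 year)
# in two labs' C57 colonies; the numbers are invented for illustration.
labs = {"my_lab": (18, 100), "your_lab": (24, 100)}  # (affected, total)

for name, (affected, total) in labs.items():
    frac = affected / total
    # Rough binomial standard error, to gauge whether the two fractions are
    # distinguishable at these sample sizes.
    se = (frac * (1 - frac) / total) ** 0.5
    print(f"{name}: fraction affected = {frac:.2f} +/- {2 * se:.2f}")

# Similar fractions would be consistent with a genuinely stochastic process;
# clearly different ones (with environments truly matched) would point toward
# accumulated mutation or some other lab-specific factor.
```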

Not an easy rescue from GWAS and similar approaches!
So, Aha!, you say, inbreeding shows us that genotypes do predict traits, and now all we have to do is use these mice to dissect the causal genes.  Won't that in a sense rescue GWAS approaches experimentally?  Unfortunately, not so!

Interestingly, and wholly consistent with what we've been saying: if you intercross just two inbred strains, each with its own unvarying alleles, so that at most you have two different alleles at any spot in the genome (at those sites where the two strains differ), you immediately regenerate life-like complexity.  Each animal differs, and the intercross population has a typical distribution of trait values (e.g., a normal distribution).  And if you want to identify the genotypic reasons for each animal's trait, you no longer have clonal replicates but must resort back to statistical mapping methods involving the same kinds of false replicability assumptions, and you get the same kinds of complex probabilistic effects that we see in genomewide association studies in natural populations.  You don't find a neat set of simple additive effects that account for things.
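Here's a toy simulation of that point, with invented numbers: many loci differing between two inbred strains, each contributing a small additive effect, plus a little environmental noise.  Nothing about the loci, effect sizes, or noise level is meant to be realistic.

```python
# Toy F2 intercross: two inbred strains differing at many loci, each locus
# contributing a small additive effect (all values invented for illustration).
import numpy as np

rng = np.random.default_rng(1)
n_loci, n_f2 = 200, 1000

# Per-locus effect of carrying the "strain B" allele.
effects = rng.normal(0, 0.1, n_loci)

# Each F2 animal inherits 0, 1, or 2 strain-B alleles per locus (1:2:1 ratio).
genotypes = rng.binomial(2, 0.5, size=(n_f2, n_loci))

# Trait = sum of additive effects + a little environmental noise.
trait = genotypes @ effects + rng.normal(0, 0.5, n_f2)

print(f"mean = {trait.mean():.2f}, sd = {trait.std():.2f}")
# A histogram of `trait` looks roughly normal: every F2 animal is genetically
# unique, and the neat two-strain design has regenerated "life-like"
# continuous variation.
```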

In a sense, despite the simple and replicable genometypic determinism of the parental strains, each intercross offspring has its own unique genometype and trait. This means that the integrated complexity is built into the genometype and not as a rule dissectable into simple causative sites (though as in humans, the occasional strong effect may be found).

Yet, if you took each animal in this now-variable intercross population and inbred it (in effect, cloned it repeatedly), you would recapture, for each new lineage, its specific fixed and highly predictable phenotype!  And among these animals, inbred from the varying intercross animals with their individually unique genometypes, you would find the 'normal' distribution of trait values.  Each animal's genometype would be highly predictive and determinative, but across the set you would have the typical distribution of trait values of the intercross population.

That is, a natural population can be viewed as a collection of highly deterministic genometypes with trait effects distributed as a function of the population's reproductive history.

This strong evidence for genometypic determinism might seem to rekindle dreams of personalized genometype prediction, and rescue omics approaches, but it doesn't do that at all.  That's because in itself even a 2-way intercross mapping does not yield simple (or even much simplified) genomic answers for the individuals'  specific trait variants.

Another kind of data is revealing in this light -- transgenic animals: the same transgene typically has different effects in different inbred host strains (e.g., C57, DBA, FVB, ... mice), with the strain-specific effect largely fixed within each strain.  The obvious reason is unique sets of variants in each host strain's background genometype.

Is it time to think differently and stop riding the same horse?  There's no reason here to think anything will make it easy, but there are perhaps nuggets that can be seized on and that could lead to informative new methods and concepts.  Perhaps we simply have to force ourselves to stop thinking of a genome as a sackful of statistical units, and instead treat it as an integrated whole.

At least, inbreeding reveals both the highly predictive, and hence plausibly determinative, power of genometypes, and provides enormous opportunity for clever research designs to explore genomic epistemology without having to rely on what are the basically counter-factual assumptions of statistical survey analysis.  We can try to work towards an improved actual theory of genetic causation.

But we can't dream of miracles.  Environments could vary, even in the lab, as they do in life, and in that sense all bets are off, a topic about which we've posted before.  There seems to be zero hope of exhaustive enumeration of DNA variation and of all non-DNA factors, especially in human populations.

At least, things to think about
We hope we are not just luxuriating in philosophical daydreaming here.  There is a lot at stake in the way we think about causation in nature, both intellectually and practically in terms of public resources, public health, disease prediction, and so on. It's incumbent upon the field to wrestle with these questions seriously and not dismiss them because they are inconvenient to think about.

In some ways, the issues are similar to those involved in quantum physics and statistical mechanics, where probabilities are used and their nature often assumed and not questioned, the net result being what is needed rather than an enumeration of the state of each specific factor (e.g., pressure, in the case of the ideal gas law).  The human equivalent, if there is one, might be not to worry about whether causation is ultimately probabilistic or deterministic, nor to have to estimate each element's probabilistic nature specifically, but to deal with the population--with public health--rather than personalized predictions.  But this goes against our sense of individuality and the research sales pitch of personalized genomic medicine.

At least, people should be aware of the issues when rather hyperbolic claims are made to the public about the miracles genomics promises to deliver, based on statistical survey epistemology.  Genomes may be highly predictive and determinative, at least in specific environments, but the lack of repeatable observations in natural populations raises serious questions about the meaning or usefulness of assuming genetic determinism.

Monday, May 27, 2013

xkcd on significance testing


Significant


xkcd: Significant

Jim Wood's May 15 post, "Let's Abandon Significance Tests", illustrated.  As he says, "the real importance of the cartoon has to do with the whole issue of publication bias: only the green jelly bean test gets written up, submitted, and published (and turned into a press release). Hence (since the other tests remain invisible) no multiple-test correction is ever done or even suggested by editors or reviewers, who don't know about the other 19 tests. This is a serious problem that arguably didn't get enough attention in my post."

Friday, May 24, 2013

Who, me? I don't believe in single-gene causation! (or do I?). Part III. Probabilistic multifactor causation--what do we mean?

In the first two posts in this series we've discussed the notion of single-gene causation.  We don't mean the usual issues about genetic determinism per se, which are often discussions about deeper beliefs rather than biological ones.  We are asking what it means biologically to say gene X causes disease Y.  Is it ever right to use such language?  Is it closer to right in some cases than others?

As we've tried to show, even those cases that are considered 'single gene' causation usually aren't quite.  A gene is a long string of nucleotides with regions of different (or no known) function.  Many different variants of that gene can arise by mutation, so neither the CFTR nor the BRCA1 gene as a whole actually causes cystic fibrosis or breast cancer.  Instead, what seems to be meant in these cases is that someone carrying the variant has the disease because (literally) of that variant.  Of course, that implies that the rest of the gene is working fine, or at least is not responsible for the disease.  That is, only the particular short deletion or nucleotide substitution causes the disease and no other factors are involved.  Even with BRCA and cancer, the 'real' cause is the cellular changes that the BRCA1 variant allows to arise by mutation.  Thus, if you think at all carefully about it, it's clear that the idea that gene X causes disease Y is a fiction most if not all of the time, but this can seem like splitting hairs, because there are clear instances that seem reasonable to count as 'single gene' causation.  We'll come back to this.

In this sense, then, everyone 'believes in' single-gene causation.  But most of us would say that this is only when the case is clear....that is, we believe in single-gene causation when there is single-gene causation, but not otherwise.  That's a rather circular and empty concept.  Better to say that everyone believes that single-gene causation applies sometimes.

More often these days, we know that there is not one nucleotide, nor even one single gene, that causes a trait by itself.  There are many different factors that contribute to the trait.  But here we get into trouble, unless we mean multiple univariate causation, in which there are n causes circulating in the population, but each case arises because of only one of them, carried by the affected individual and the effect is due, in the usual deterministic sense, to that single factor in that person. Type 1 diabetes, nonsyndromic hearing loss, and the vision problem retinitis pigmentosa are at least partial examples.  But this doesn't seem to be the general situation.  Instead, what one means by multifactorial causation is that each of the 'causes' contributes some amount to risk, to the overall result, usually expressed as a risk or probability, of the outcome.  This seems clearly unlike deterministic, billiard-ball causation, but it's not so clear that that's how people are thinking.

Here are some of the relevant issues:
Penetrance
A commenter on our first post in this series likened the idea of contributory or probabilistic causes to 'penetrance', a term that assigns a probability that a given genotype would be manifest as a given phenotype.  That is, the likelihood of having the phenotype if you've got the causal variant.  In response to that comment, we referred to the term as a 'fudge factor', and this is what we meant: The term was first used (as far as we remember) in Mendelian single-locus 'segregation' analysis, to test whether a trait (that is, its causal genotype) was 'dominant' or 'recessive' relative to causing qualitative traits like the presence or absence of a disease.  The penetrance probability was one of the parameters in the model, and its value was estimated.

An important but unstated assumption was that the penetrance was an inherent, fixed property of the genotype itself: wherever the genotype occurred in a family or population, its penetrance--chance of generating the trait--was the same. There was no specific biological basis for this (that is, a dominant allele that is only dominant sometimes!) and it was basically there to allow models to fit the data.  For example, if even one Aa person didn't have the dominant (A-associated) trait, then the trait could simply not fit a fully dominant model, but there could be lots of reasons, including lab error, for such 'non-penetrance'.

A more modern approach to variable outcomes in individuals with the same genotype in question, which leads us to the issues we are discussing here and can be extended to quantitative traits like stature or blood pressure, is that the probability of an outcome, given that the carrier has a particular genotype, is not built into the genotype itself, but is context dependent.  That is what is implicitly assumed to give the genotype its probabilistic causal nature.  An important part of that context is the rest of the person's genome.

The retrospective/prospective epistemic problem
Deterministic causation is easy:  you have the cause, you have its effect.  Probabilistic causation is elusive, and what people mean is first, that they estimated a fractional excess of given outcomes in carriers of the 'cause' compared to non-carriers (e.g., a genotype at a particular site in the genome).  This is based on a sample, with all the (usually unstated) issues that implies, but is expressed as if there were a simple dice-roll: if you have the genotype the dice are loaded in one way, compared to the dice rolled for those without the genotype.  Again, as in penetrance, although one would routinely say this is context-dependent, each genetic 'effect size' is viewed essentially as inherent to the genotype--the same for everyone with the genotype.  If you do this kind of work you may object to what we just said, but it is a major implicit justification for increased sample size, meta-analysis and other strategies.

The unspoken thinking is subtle, and built into the analysis, but essentially invokes context-dependence with the escape valve of the ceteris paribus (all else being equal) explanation: the probability of the trait that we estimate to be associated with this particular test variant is the net result of outcomes averaged over all the other causes that may apply; that is, averaged over all other contexts.  As we'll see below, we think this is very important to be aware of.

But there's another fact: when it comes to such other factors, our data are retrospective.  We see outcomes after people with, and without, the genotype have been exposed to all the 'other' factors that applied in the past--even the ones we don't know about or haven't measured--across the range of contexts of the people we've sampled in our study.  But the risk, or probability, we associate with the genotype is by its very nature prospective.  That is, we want to use it to predict outcomes in the future, so we can prevent disease, etc.  But that risk depends on the genotype bearers' future exposures, and we have literally no way, not even in principle, to know what those will be, even if we have identified all the relevant factors, which we rarely have.

In essence, such predictions assume no behavior changes, no new exposures to environmental contaminants and the like, and no new mutations in the genome.  This is a fiction to an unknowable extent!  The risks we are giving people are extrapolations from the past.  We do this whether we know that's what we're doing or not; with current approaches that's what we're stuck with and we have to hope it's good enough.  Yet we know better!  We have countless precedents that environmental changes have massive effects on risk.  E.g., the risk of breast cancer for women with a BRCA mutation varies by birth cohort, with risk increasing over time, as shown by Mary-Claire King et al. in a Science paper in 2003. Our personalized genomic probabilities are actually often quite ephemeral.

The problem of unidentified risk factors and additive models
Similarly, we know that we are averaging over all past exposures in estimating genotype-specific risks.  We know that much or even most of that consists of unidentified or unmeasured factors.  That makes future projection even more problematic.  And the more factors, the more ceteris paribus simply doesn't apply, not just in some formal sense, but in the important literal sense:  we just are not seeing the given test variant at a given spot in the genome enough times, that is, against all applicable background exposures, to have any way to know how well-represented the unmeasured factors are.

The problem of unique alleles and non-'significant' causes
A lot of our knowledge about genetic causation in the human sense is derived from large statistical survey samples, such as genomewide mapping studies.  Because so many individual genomic variant sites (often now in the millions, sometimes the billions) are being tested, we know that by chance we may see more frequent copies of a given variant in cases than we see in controls.  Only some of the time would the variant be truly causal (that is, in some mechanistic sense), because we know that in sampling, unequal distributions arise just by chance.

In order not to be swamped by such false-positive signals, we are typically forced to use very conservative statistical significance tests.  That essentially means we intentionally overlook huge numbers of genetic variants that may in fact contribute mechanistically to the trait, but only to a small extent.  This is a practical matter, but unless we have mechanistic information we currently have no way to deal with effects too small or too rare, no matter that they may be the bulk of the genomic causation we are trying to understand (the evidence for this is seen, for example, in the 'missing' heritability that mapping studies can't account for).  So, for practical reasons, we essentially define rare, small effects as not real, even if overall they constitute the bulk of genomic causation!
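As a rough, purely illustrative sketch of the cost of that conservatism: assume a million independent tests, a Bonferroni-style cutoff, a normally distributed test statistic, and a truly causal variant whose expected statistic is z = 3 (comfortably "significant" at p < 0.05 if it were the only test).  All of those numbers are assumptions chosen for the example.

```python
# Why conservative genome-wide thresholds discard small real effects.
# The number of tests and the true effect size here are illustrative.
from scipy.stats import norm

n_tests = 1_000_000
alpha_genomewide = 0.05 / n_tests          # ~5e-8, a Bonferroni-style cutoff
z_cut = norm.isf(alpha_genomewide / 2)     # two-sided critical value, ~5.45

z_true = 3.0                               # expected statistic of a small real effect
power = norm.sf(z_cut - z_true) + norm.cdf(-z_cut - z_true)
print(f"chance of surviving the genome-wide cutoff: {power:.3f}")  # well under 1%
```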

Many genetic variants are unique in the population, or at least so rare that they occur only in one person or, perhaps, a few of the person's very close relatives--or never appear even in huge mapping samples, even if they turn up in the clinic or among people who seek 'personalized genomic' advice.  Yet if some single-gene causal variants exist--as we've seen--there is no reason not to think that similar variants with essentially single-site causal effects, arising as new or very recent mutations, must be very numerous.

A unique variant found in a case might suggest 100% causation--after all, you see it only once but in an affected person, don't you?  Likewise, if in an unaffected, you might attribute 100% protection to it.  Clearly these are unreliable if not downright false inferences.  Yet most variants in this world are like that--unique, or at least very rare.  So we are faced with a real epistemic problem.

Such variants are very hard to detect if probabilistic evidence is our conceptual method, because probabilistic evidence relies, as significance tests do, on replication of the observation.  Some argue that what we must do is use whole-genome sequence in families or other special designs, to see if we can find them by looking for various kinds of, again, replicated observation.  Time will tell how that turns out.

The intractability of non-additive effects (interactions), even though they must generally occur
We usually use additive models when considering multiple risk factors.  Each factor's net effect, as seen in our sample, is estimated using the ceteris paribus assumption that we're seeing it against all possible backgrounds, and we estimate an individual's overall risk as a combination, such as the sum, of these independently estimated effects.  But the factors may interact--indeed, it is in the nature of life that components are cooperative (in the MT sense: co-operative), and they work only because they interact.

Unfortunately, it is simply impossible to do an adequate job of accounting for interactions among many risk factors.  If there are, say, 100 contributing genes (most complex traits involve hundreds, it appears), then there are (100 x 99)/2 = 4950 possible interactions just counting those that involve only two factors.  And interactions need not be merely multiplicative: Factor A squared times Factor B may be the way things work.  Worse, usually many--often tens--of gene products interact to bring about a function such as altering gene expression.
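The counting itself is easy to check, and it gets worse fast as the order of interaction grows:

```python
# Counting possible interactions among 100 contributing genes.
from math import comb

n_genes = 100
print(comb(n_genes, 2))    # 4950 two-way interactions
print(comb(n_genes, 3))    # 161700 three-way interactions
print(comb(n_genes, 10))   # ~1.7e13 ten-way interactions
```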

We simply have no way to estimate or understand such an open-ended set of interactions.  So our 'probabilistic' notions of multi-factor causation are simply inaccurate, to an unknown if not unknowable extent.  Yet we feel these factors really are, in some sense, causative.  This is not just a practical problem that should plague genotype-based risk pronouncements (and here we're ignoring environmental factors, to which the same things apply), but a much deeper issue in the nature of knowledge.  We should be addressing it.

What about additive effects--does the concept even make sense?  Genes almost always work by interaction with other genes (that is, their coded proteins physically interact).  So, suppose the protein coded by G1 binds to that coded by G2 to cause, say, the expression of some other gene.  Variants in G1 or G2 might generally add their effects to the resulting outcome, say, the level of gene expression, which, if quantitative, can raise the probability of some other outcome, like blood pressure or the occurrence of diabetes.  So the panoply of sequence variants in G1, and similarly in G2, would generate a huge number of combinations, and this might seem to generate some smooth distribution of the resulting effect.  It seems like a fine way to look at things, and in a sense it's the usual way.  But suppose a mutation makes G2 inactive; then G1 cannot have its additive effect--it can't have any effect.  There is, one could say, a singularity in the effect distribution.  This simple hypothetical shows why additive models are not just abstractions, but must be inaccurate to an unknowable extent.
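Here is a toy version of that hypothetical, with invented effect sizes and an invented frequency for the inactivating G2 mutation; it only illustrates the shape of the problem, not any real gene pair.

```python
# Additive G1 + G2 contributions to expression, except that a null G2
# genotype abolishes the pathway's output entirely (all numbers invented).
import numpy as np

rng = np.random.default_rng(7)
n = 10_000

g1_effect = rng.normal(1.0, 0.3, n)      # per-individual G1 contribution
g2_effect = rng.normal(1.0, 0.3, n)      # per-individual G2 contribution
g2_null = rng.random(n) < 0.05           # 5% carry an inactivating G2 mutation

expression = np.where(g2_null, 0.0, g1_effect + g2_effect)

print(f"fraction stuck at zero: {g2_null.mean():.2f}")
print(f"mean of the rest:       {expression[~g2_null].mean():.2f}")
# Most of the distribution is smooth and roughly additive, but there is a
# spike ("singularity") at zero that no additive model of G1 and G2 effects
# can reproduce.
```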

The concept of a causal vector:  pseudo-single gene causation thinking?
Now, suppose we identify a set of some number, say n, of causal factors each having some series of possible states (like a spot in the genome with two states, say, T and C) in the population.  From this causal pool, each person draws one instance of each of the n factors.  Whether this is done 'randomly' or in some other way is important....but largely unknown. 

This set of draws for each person can be considered his/her 'vector' of risks: [s1, s2, s3, ..., sn].  Each person has such a vector, and we can assemble them into a matrix or list of the vectors in the actual population, in the possible population of draws that could in principle occur, or in the set we have captured in our study sample.  Associated with this vector, each person also has his/her outcome for a trait we are studying, like diabetes or stature.  Overall, from this matrix we can estimate the distribution of resulting outcome values (the fraction 'affected', or the average blood pressure and its amount of variation).  We can look at the averages for each independent factor (this is what studies typically do).  We can thus estimate the factor-specific risk, and so on, as represented in the sample we have.
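A minimal sketch of that bookkeeping, on simulated data, shows what the factor-by-factor averaging looks like; the factors, their frequencies, and the rule generating the outcome are all invented for the example.

```python
# Matrix of risk vectors: rows are people, columns are n binary risk factors.
# The outcome depends on all factors jointly (an invented logistic-style rule),
# but we then summarize it one factor at a time, as studies typically do.
import numpy as np

rng = np.random.default_rng(3)
n_people, n_factors = 5_000, 10

vectors = rng.binomial(1, 0.3, size=(n_people, n_factors))
risk = 1 / (1 + np.exp(-(vectors.sum(axis=1) - 4)))
outcome = rng.random(n_people) < risk

# Factor-specific "risk": outcome frequency in carriers vs non-carriers of
# each factor -- i.e., averaged over all the other factors.
for j in range(n_factors):
    carriers = outcome[vectors[:, j] == 1].mean()
    non_carriers = outcome[vectors[:, j] == 0].mean()
    print(f"factor {j}: {carriers:.2f} in carriers vs {non_carriers:.2f} in non-carriers")
```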

However, no sample, nor even any whole population, can include all possible risk vectors.  If there were only 100 genetic sites (rather than the hundreds or thousands commonly found in GWAS), each with the 3 genotype states any person can have (AA, Aa, aa), there would be 3 to the hundredth power (about 5 x 10 to the 47th) possible genotypes!  Each person's genotype is unique in human history.  Our actual observations are only a minuscule sampling--and the next generation's genotypes (whose risk we purport to predict) will be different from our generation's.  The ceteris paribus assumption of multivariate statistical analysis that justifies much of the analysis is basically that this totally trivial sample captures the essential features of this enormous possible background variation.
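The arithmetic is easy to check:

```python
# 100 sites, 3 genotype states each.
n_sites = 100
n_genotypes = 3 ** n_sites
print(f"{n_genotypes:.2e}")          # ~5.15e+47 possible 100-site genotypes
print(f"{n_genotypes / 1e11:.2e}")   # per human ever born (roughly 10^11), still ~5e+36
```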

Each person's risk vector is unique, and we don't observe nearly all possible vectors (even those that could be associated with our trait).  But since we know that each person is unique, how can we treat their risk probabilistically?  As we've tried to describe, we assign a probability, and its associated variation among people, to each risk factor, essentially from the set of those risk vectors that fall into the 'affected' category.  But that has nothing to do with whether the individual factor whose risk we're providing to people is deterministic, once you know the rest of the person's vector, or is itself acting probabilistically.

If the probabilistic nature of our interpretation means that for your state at the test site, over all the backgrounds you might have and that we've captured in our studies, this fraction of the time you would end up with the trait, then that is actually an assumption that the risk vector is wholly deterministic.  Or that any residual probabilism is due to unmeasured factors.

Surprisingly, while this is not exactly invoking single-gene causation, it's very similar thinking.  It uses other people's exposure and results to tell us what your results will be, and is in a way at the heart of reductionist approaches that assume individual replicability of causal factors. Your net risk is essentially the fraction of people with your exact risk vector who would have the test outcome.  The only little problem with that is that there is nobody else with the same set of risk factors!

It may be that nobody believes in single-gene causation literally, but in practice what we're doing is to some extent effectively, conceptually, and perhaps even logically similar.  It is a way of hoping for fixed, billiard-ball (or, at least, highly specific distributional) causation: given the causal vector, the outcome will occur.  In the same way that single-gene causation isn't really single genes (i.e., a mutation in a specific nucleotide in the CFTR gene is treated causally as if it were the whole gene, even though we know that's not true), we tend now to treat a vector of causes, or more accurately perhaps each person's genome as a deterministic genetic cause.  The genome replaces the gene, in single-cause thinking.  That, in essence, is what 'personalized genomic prediction' really is promising.

Essentially, the doubt--the probabilistic statement of risks--arises only if we consider each factor independently.  After going through the effort to genotype on a massive scale, the resulting risk estimate--even if we tried to estimate the range of outcomes consistent with it (the variance around the point estimate)--would in most cases be of very little use, since what each of us cares about is not the average risk but our own specific risk.  There are causal factors so strong that these philosophical issues don't matter, and our predictions are, if not perfect, adequate for making life and death decisions about genetic counseling, early screening, and so on.  But that is the minority of cases, and it is what addicts us to the approach.

This is, to us, a subtle but slippery way of viewing causation that, because it is clothed in multi-causal probabilistic rhetoric, seems conceptually different from an old-fashioned single-gene causation view, but that leads us to avoid facing what we know are the real, underlying problems in our notions of epidemiological and genetic causation.  It's a way of justifying business as usual--of assuming that scaling up will finally solve the problem--and of avoiding the intimidating task of thinking more deeply about the problem if, or when, more holistic effects in individuals, not those averaged over all individuals, are at work.

What we have discussed are basically aspects of the use of multiple variable statistics, not anything we have dreamed up out of nowhere.  But they are relevant, we believe, to a better understanding of what may be called the emergent traits of biological creatures, relative to their inherited genomes.

Let's think again about genetic determinism!
On the other hand, we actually do have rather clear evidence that genetic determinism, or perhaps genomic determinism, might be true after all.  Not only that, we have the kind of repeatable observations of unique genotypes that you want to have if you accept statistical epistemology!  And we have such evidence in abundance.

But this is for next time.

Thursday, May 23, 2013

Who, me? I don't believe in single-gene causation! (or do I?). Part II. Probabilistic causation--what do we mean?

Yesterday we discussed notions of determinism and complex causation, and the denial that every self-respecting scientist makes: that s/he doesn't believe in deterministic causation.  They don't 'believe' it because, in fact, most of the time--with some possible exceptions in the physical sciences--determinism is at best inaccurate, and to an unknown extent.  Knowing a putatively causal factor doesn't lead to anything close to perfectly accurate prediction (in biology).  Yet of course we believe this is a causal rather than mystic world, and naturally want to identify 'causes', especially of things we wish to avoid, so that we can avoid them.  Being only human, accepting often rather unquestioningly the methods of science, and for various other reasons, we tend to promise miracles to those who will pay us to do the research to find them.

It is annoying to some readers of MT that we're constantly harping on the problem of determinism (and exhaustive rather than hypothesis or theory-driven approaches to inferring to causation).  Partly this annoyance is to be expected, because our criticisms do challenge vested interests.  And partly because if people had better ideas (even if one granted that our points are cogent), they might be trying them (if they thought they were fundable in our very competitive funding system). 

But largely, as a scientific community, we don't seem to know what better to do.  And the issues about causation in this context seem very deep, and we think not enough people are thinking seriously enough about them, for various pragmatic and other reasons.  They aren't easy to understand.  If we're right, then the critical test or experiment, or truly new concept, just doesn't seem to be 'in the air' the way, say, evolution was in Darwin's time, or relativity in Einstein's.

Yesterday we described various issues having to do with understanding 'single-gene' diseases like Cystic Fibrosis or familial breast cancer, or smoking-induced lung cancer.  Our point was that causation isn't as simple as 'the' gene for disease, or 'the' environmental risk factor.  Indeed, instead of deterministic billiard-ball like classic cause-effect views, we largely take a sample-based estimation approach, and what we generally find is that (1) multiple factors--genes or environmental--are associated with different proportions of test outcomes in those exposed compared to those not exposed to a given factor, (2) few factors by themselves account for the outcome in every exposed person, (3) multiple different factors are found to be associated with a given outcome, and yet (4) the known factors do not account for all the instances of that outcome.

In the case of genetics, our particular main interest, there are reasons to think that inherited factors cause our biological traits, normal as well as disease, but when we attempt to identify any such factors by our omics approaches (such as screening the entire genome to find them), most of the apparent inherited causation remains unidentified.  So we have at least two different kinds of issue:  First, causation seems probabilistic rather than deterministic, and second, the methods themselves, rather than the causal landscape, affect or even determine what we will infer.  For example, when we use methods of statistical inference, such as significance testing, we essentially define influences that don't reach significance in our data as not being real.

Even though we have believable evidence that the collection of a great many minor factors account, somehow, for the bulk of the outcomes of interest, we keep investing large amounts of resources in approaches that are not getting us very far.  It is this which, we think, justifies continual attempts to pressure people to think more creatively and at least spend resources in ways that might have a better chance of inspiring somebody.  We try not to go overboard in voicing our comments, but of course that's what a blog is for--a forum for opinion, hopefully rooted in fact.

What the current research approach gets most of the time is not statistical associations between factor and outcome that can then be shown to be deterministic in a billiard-ball mechanistic sense, that is, in which a statistical exploration leads to a 'real' kind of cause.   That does happen and we mentioned some instances yesterday, but it's far from the rule.  The general finding is some sort of fuzzy probability that associates the presence of the putative causal factor--say, a given allele at a gene--and an outcome, such as a disease.  So, the explanation, which we think is fair to characterize as a retreat, is to provide what sounds like a knowing answer that the cause really is 'probabilistic'.  But what does this actually mean, if in fact it isn't just blowing smoke?  It is in this context that, we think, the explanations are in a relevant sense in fact deterministic, even if expressed in terms of probabilities or 'risk'.

And besides, everybody knows that causation is probabilistic!  So there!
Now what on earth does it mean to say that causation is probabilistic?  In technical terms, one can invoke random processes and, often (sometimes not in a self-flattering way) relate this to the apparently inherent probabilistic nature of quantum physics.  In my amateurish way, I'd say that the latter replaced classical billiard-ball notions of causation which were purely deterministic (if you had perfect measurements) with truly probabilistic causation even if you had perfect measurement.

Probably those who invoke the everyone-knows sense of causation being probabilistic are referring to our imperfect identification or measurement of contributing factors.  But then, to make predictions based on genotypes, we need to know the 'true' probability of an outcome given both the genotype and the degree of imperfect measurement.  The latter is estimated from data, but almost always under the ceteris paribus notion--that is, the probability of cancer given a particular BRCA1 mutation is p, averaging over the whole range of other genetic and environmental variants that carriers of the observed genetic variant are individually exposed to.

But we must think carefully about where that p comes from.  It is treated exactly as if we were flipping a coin, but with probability p of coming up Heads rather than a probability of 1/2.  For quantitative traits like stature or blood pressure, some probability would be assigned to each possible trait value of a person carrying the observed genetic variant, as if these values were distributed in some fashion, say, like the usual bell-shaped Normal distribution around the mean, or expected, effect on the trait for those who carry the variant.  Her probability of being 5'3" tall might be .35, for example, but only .12 of being 5'6".
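Concretely, and with invented numbers (a genotype-specific mean of 5'4" and a standard deviation of 2.5 inches), this is all the bell-curve picture amounts to:

```python
# Genotype-conditional trait probabilities under an assumed normal
# distribution; the mean, SD, and bin width are invented for illustration.
from scipy.stats import norm

mean_in, sd_in = 64.0, 2.5   # carriers assumed to center on 5'4" (64 in)
for height in (63, 66):      # 5'3" and 5'6"
    p = norm.cdf(height + 0.5, mean_in, sd_in) - norm.cdf(height - 0.5, mean_in, sd_in)
    print(f"P(height within 0.5 in of {height} in) = {p:.2f}")
# These are the tidy, genotype-conditional probabilities the bell-curve
# picture hands us.
```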

Sounds fine.  But how well known, or even how real, are such assumed, fixed, underlying probabilities?

Plausibility is not the same as truth
The probabilistic nature of smoking and lung cancer, or even of BRCA1 variants and breast cancer, can be accounted for in meaningful ways related to the random nature of mutations in genes.  Of course, we could be wrong even here, since there may be other kinds of causal processes, but at least we don't feel a need to understand every molecular event that may generate a mutation.  We don't need to get into the molecular details or quantum mechanical vagaries to believe that we have a reasonable understanding of the association between BRCA1 mutations and breast cancer.  In such situations, probabilistic causation makes reasonable sense, even if it doesn't lead to a clean, closed, classically deterministic billiard-ball account.

But that is not the usual story.  When we face what seems to be truly complex causation, in which many factors contribute, as in environmental epidemiology or GWAS-like omics studies, we face something different.  What do we do in these situations?  Generally we identify many different possible causal factors, such as points along the genome or dietary constituents, and estimate their individual statistical association with the outcome of interest (diabetes or Alzheimer's disease, etc.).  First, we usually treat them as independent, and do our best to avoid being confused by confounding; for example, we only look at variants along the genome that are themselves not associated with each other by what is called linkage disequilibrium.  Then, we see, factor by factor, whether its variation is statistically associated with the outcome, as described in our third paragraph above.  We base this on a statistical cutoff criterion like a significance test (not our subject today, but again, see Jim Wood's earlier post on MT).
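Schematically, and with simulated genotypes in which no variant has any true effect, the factor-by-factor screen looks something like the sketch below; the sample sizes, allele frequency, and cutoff are all invented for illustration.

```python
# Factor-by-factor screening: test each variant's allele counts in cases vs
# controls and keep whatever clears a preset cutoff. No true effects are
# simulated here, so any "hit" is a fluke.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(11)
n_cases = n_controls = 1_000
n_variants = 500
cutoff = 0.05 / n_variants   # Bonferroni-style per-variant threshold

case_geno = rng.binomial(2, 0.3, size=(n_cases, n_variants))
ctrl_geno = rng.binomial(2, 0.3, size=(n_controls, n_variants))

hits = []
for j in range(n_variants):
    case_alleles = case_geno[:, j].sum()
    ctrl_alleles = ctrl_geno[:, j].sum()
    table = [[case_alleles, 2 * n_cases - case_alleles],
             [ctrl_alleles, 2 * n_controls - ctrl_alleles]]
    _, p, _, _ = chi2_contingency(table)
    if p < cutoff:
        hits.append((j, p))

print(f"'significant' variants: {hits}")
```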

Given a 'significant' result, we then estimate the effect of the factor in probabilistic ways, such as the probability of the outcome if you carry the genetic variant (for quantitative traits, like stature or blood pressure, there are different sorts of probabilistic effect-size estimates).  Well, this certainly is different from billiard-ball determinism, you say.  Isn't it a modern way to deal with causation, and doesn't that make it clear that careful investigators are in fact not asserting determinism or single-gene causation?  What could be clearer?

In fact, we think that, conceptually, what is happening is basically an extension of single-cause, or even basically deterministic thinking.  As one way to see this, if you identify a single risk factor, such as the CFTR gene in which some known variants are very strongly associated with cystic fibrosis, then if you have a case of CF and you find a variant in the gene, you say that the variant caused the disease.  But for many if not most such variants, the attribution is based on an assumption that the gene is causal and that therefore, in this particular instance the variant was the cause.  There are some rather slippery issues, here, but they get clearer (we think) and even much more slippery, when it comes to multiple-factor, including multi-gene, causation.  We will deal with that tomorrow.

Wednesday, May 22, 2013

Who, me? I don't believe in single-gene causation! (or do I?). Part I. What does it mean?

We were told in no uncertain terms the other day that no one believes in single gene causation anymore.  Genetic determinism is passé, and everyone knows that most traits are complex, caused by multiple genes, gene x environment interaction, or if they're really sophisticated, epigenetics or the microbiome. But is this the view that investigators actually follow?  That's not so clear.

We all throw around the word 'complex' as if we actually believe it and perhaps even understand it.  Of course, (nearly) everyone recognizes that some traits are 'complex', meaning that one can't find a single clear-cut deterministic cause, the way being hit on the side of the head with a baseball bat by itself can send one out for the count.  But in fact the hunt for the gene (or, all right, if you insist, genes) 'for' a trait is still on.  You name your trait: cancer, diabetes, IQ, ability to dunk a basketball, or to get into Harvard without a Kaplan course, and someone's still looking for the gene that causes it.

It is this push to find genes for traits, despite all sorts of denials, that fuels the GWAS and similar fires.  Caveats notwithstanding (and usually offered just to provide technical escape lest one is wrong), that is what the promised 'personalized genomic medicine' in its various forms and guises is all about.

So let's take a careful, and hopefully even thoughtful look at the idea of genetic causation.  We are personally (it must be obvious!) quite skeptical of what we think are excessive claims of genetic determinism (or, now, microbiomial determinism), but they are still being made, so let's tease a few of them apart.

Single gene causation does exist, at least sometimes (doesn't it?)!
First, though, what do we mean by the word 'causation'?  Generally, we think people mean that gene X or risk factor Y is sufficient to cause trait Z.  But it might also mean that gene X or risk factor Y is a necessary but not sufficient cause of trait Z.  The baseball bat might have caused your concussion, but in fact someone had to swing it.

There are well-documented single risk factors, genetic and otherwise, that everyone accepts 'cause' some disease in a very meaningful sense.  Examples are some alleles (variant states) of the CFTR gene and Cystic Fibrosis (CF), BRCA1 and 2 variants and breast cancer, or smoking and lung cancer.  Having the alleles associated with CF or breast cancer, or being a long-time smoker, does put people at high risk of disease.

But, for these and other examples, there are usually healthy people walking around with serious mutations in the gene, or heavy smokers who enjoyed their cigarettes well into old age.  Gene X or factor Y aren't sufficient to cause trait Z.  There are also hundreds of genetic variants found in patients that are assumed to be causal, but for elusive reasons (for example, mutations in non-coding regions near the CFTR coding regions that have no known function).  What is it about gene X that causes trait Z?  We don't know, but gene X looks damaged in this person, so it must be causal.

The CF case is interesting.  This is an ion channel disease.  Ion channels are gated openings on the cell surface that pass sodium, potassium, calcium and other ions into and out of the cell in response to local circumstances.  CF is characterized by abnormal passage of chloride and sodium through ion channels, causing thick, viscous mucus and secretions, primarily in the lungs but with involvement of the pancreas as well.  If the ion channel is badly built, or doesn't get to the surface of cells lining various organs like the pancreas and lungs, then the cell cannot control its water content, secretion, or absorption, and the person with the malfunctioning channels has CF.  Again, gene X causes trait Z.

But there are gradations in channel malfunction, and gradations in severity of the disease, and we have no way to know how many people are walking around with variants but no actual disease.  Here, we can say that when it happens, CFTR mutations do cause the trait in the usual way.  But what about when there are mutations but no disease?  Gene X doesn't cause the disease after all?  Or disease and none of the known causal mutations?  Wait, we thought gene X caused the disease?  Could we be assuming single gene causation, and looking only at the CFTR gene, rather than at many other aspects of the genome that may affect ion channels in the same cells or, indeed, may cause the trait in a way we could understand if we but identified them?  This is an open question--but it applies to many other purportedly single-gene diseases.  Gene X and some other gene/s, or some environmental factor cause the disease in at least some instances.  Is it simple or isn't it?

The BRCA story is also interesting.  A BRCA1 variant associated with disease does not lead directly to cancer.  Instead, BRCA1 is a gene that detects and repairs genomic mutations in breast (and other) cells.  If you have a dysfunctional BRCA1 genotype, you are at risk of some single breast cell acquiring a set of mutations that don't get detected and repaired.  What causes those mutations?  Some happen when cells divide, so the activity of breast cells affects the rate of mutational accumulation.  Other lifestyle factors do as well (parity, age of childbearing, lactation, and apparently things like diet and exercise).  And a person with a causal BRCA mutation lives perfectly healthfully for decades, which, if you think in classical Mendelian terms, would not happen if s/he had a 'bad' gene.  BRCA doesn't exactly cause cancer, but it allows it to be caused.  Gene X plus time plus environmental risk factors cause the disease.  Yet we all talk as if it's a single gene, BRCA1 or 2, that causes the cancer.

The obvious non-genetic instance, smoking and lung cancer, is similar but not exactly the same.  Smoking is, among other things, a mutagen: it damages genes.  So one reason for the association, if not the major one, is that the mutations caused by smoke can damage genes in lung cells that lead those cells to proliferate out of control.  The reason the risk is probabilistic -- that is, a smoker doesn't have a 100% chance of getting lung cancer -- is that it's impossible to know how many or which mutations a given person's smoking has led to.  In fact, smoking is only an indirect cause, since it is mutant genes in lung cells that, after accumulating in an unlucky way, start the tumor.  Still, in this case, knowing how much a person has smoked allows one to estimate, in some probabilistic way, the relative risk of lung cancer due to enough mutations having arisen in at least one lung cell.  Yet many who smoke don't get cancer, and many who don't smoke get cancer.  Since smoking and a few other such risk factors (e.g., exposure to asbestos and some toxic chemicals) have strong effects, even if probabilistic, everyone is generally comfortable with thinking of them as causal.  Risk factor Y causes trait Z.  But, in fact, risk factor Y plus time plus unlucky mutations cause trait Z.

Deterministic genotype 'pseudo-single-gene' causation?
More problematic are 'complex' traits, which clearly are not due to simple single-gene variants--they don't follow patterns of trait appearance in families that would be consistent with simple Mendelian inheritance.  The number of such traits is legion, and it is driving the GWAS industry, as we have commented many times.  They are typically common in the population.  They are complex because we feel--know, really--that many different factors combine somehow to generate the risk.  Most instances are not due to one factor alone, though some may be, in the sense of BRCA and CFTR.  So we do something like genomewide association studies to try to identify all the potentially causal parts of the genome (and, similarly, but less definitively as a rule, lifestyle factors, too).  Here, we'll assume that all of the genomic variants that might contribute are known and can be typed in every person (this is, as everyone knows, far, far from being true at present, if it's even possible).

Advocates can deny any element of single-gene thinking in GWAS reports, where hundreds of loci are claimed to have been found, but these are treated as causal, and major journals are filled to the gills with papers with titles to the effect of "five novel genes for xyz-itis".  This is the slippery slope of simple causation thinking.

If multiple factors contribute, what we know is that most do so only probabilistically, in the above senses.  That there are other things at work is rather obvious for the many traits that, even in those at high risk, don't arise until later or even late in life.  And, in reality, it is nearly always true that cases of the disease are associated with individually unique combinations of the risk factors.  So your personalized risk is computed as some kind of combination, like the sum, of your estimated risk at each of the putative sites, R = R1 + R2 + R3 + ..., where at each site one allele is given the minimal risk (or, perhaps, zero) and the other allele a risk estimated from the difference in prevalence of the trait in cases vs controls.  This might be considered the very opposite of single-gene causation, but conceptually it's pretty much the same, because it treats your aggregate as a single kind of risk score, as if it were acting as a unit.  The idea would be completely analogous to the risk associated with a specific variant at the CFTR or BRCA gene.  Your genotype as a whole would be viewed essentially as a single cause.
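To make that arithmetic concrete, here is a minimal sketch in Python of the kind of additive score this logic implies.  The site names, per-allele risk increments, and genotype below are invented purely for illustration; they come from no real study.

```python
# A minimal, illustrative sketch of the additive risk logic described above.
# Site names and per-allele risk increments are invented, not from any real study.

# Extra risk attributed to each copy of the "risk" allele at a site, e.g. estimated
# from the difference in prevalence of the trait in cases vs controls.
per_site_risk = {
    "site_1": 0.02,
    "site_2": 0.01,
    "site_3": 0.05,
}

# A hypothetical person's genotype: 0, 1, or 2 copies of the risk allele at each site.
genotype = {
    "site_1": 1,
    "site_2": 0,
    "site_3": 2,
}

def additive_risk_score(genotype, per_site_risk, baseline=0.0):
    """Sum the per-allele increments: R = R1 + R2 + R3 + ..."""
    return baseline + sum(
        copies * per_site_risk[site] for site, copies in genotype.items()
    )

print(additive_risk_score(genotype, per_site_risk))  # 0.12 for this toy genotype
```

The point of the sketch is only that, however many sites are involved, the whole genotype is collapsed into one number and treated as if it acted as a unit.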

These are examples of causal rhetoric.  But these causes are probabilistic.  What does that mean?  It means that you are not 100% protected from getting the trait, nor 100% doomed.  Your fortune is estimated by some number in between.  We call it a probability, estimated either from the presence of one risk factor or from the fixed set that you inherited.  But what does that probability mean, how do we arrive at the value, and how reliable is it?  Indeed, how often is it that we can't even know how reliable it is?

These are topics for tomorrow.

Tuesday, May 21, 2013

Microbiomes R Us -- another form of science marketing

The microbiome hits the big time 
A piece by food writer/journalist Michael Pollan, "Some of My Best Friends Are Germs", was the cover story of the New York Times Magazine on Sunday.  Pollan says the interest he developed in fermented foods while he was writing his latest book -- beer, kimchi, cheeses -- naturally led to an interest in the fermentation that goes on in our large intestines with the help of resident microbes, and this led him to think generally about the interaction between microbes and us.

The current estimate is that microbial cells in and on our body outnumber our own cells ten to one.  The Human Microbiome Project, funded by the NIH as yet another Big Science 'do it all and think about it later' investment, was launched in 2008 with the goal of sequencing the microbes in the nasal passages, oral cavities, skin, and the gastrointestinal and urogenital tracts of a fairly small sample of men and women.  The project was completed last spring with sequences from samples from more than 240 people ('completed' meaning the authors can now go to the press and demand even more money because 'more research is needed' before we understand anything...).

But that's not the end.  In a drive to document the microbiome of America, the American Gut Project will sequence your microbiome for $99.  It's an open-source, open-access project that allows participants to compare their data with data from people around the world.

Why?  Because the microbiome is the new genetics.  Forget (our own) genes, the idea is that we are who we are, healthy or sick, because of the microbes we share our bodies with. Well, of course, it's the microbes' genes, so it's really just more genetics.  As Pollan puts it,
To the extent that we are bearers of genetic information, more than 99 percent of it is microbial. And it appears increasingly likely that this “second genome,” as it is sometimes called, exerts an influence on our health as great and possibly even greater than the genes we inherit from our parents.
Our microbiome may make us fat or keep us thin, predispose us to diabetes or heart disease or asthma and allergy. And, the microbiome apparently influences our immune system and trains it in how it responds to the world, and may be responsible for the increase in autoimmune diseases in the West. Indeed, Pollan writes of "an impoverished 'Westernized microbiome'" -- some researchers suggest that the microbiota in our gut should be restored to look more like the microbiomes of people who eat less processed food and take fewer antibiotics (and, by the way, are more likely to die early of infectious diseases than we are). 

But, happily for us, changing our second genome is going to be a lot easier than changing our first one.  We do it all the time, through our diet, the medicines we take, the people (and their microbes) we come into contact with.  Happily for Big Pharma and Big Food, once we know which microbes are best for us, they'll be able to sell you any number of products to reverse these changes and reduce your chances of getting all those diseases geneticists have been trying to find the causes of for so long.  You'll have personalized microbiomalgenomic medicine.  Enough to put another smile on Francis Collins' funds-securing face!

There has been a lot of talk lately about 'fecal transplants' which are just what they sound like -- the transfer of fecal bacteria from healthy people into the colons of unhealthy people, primarily people with Clostridium difficile infections, intestinal infections resistant to antibiotics.  And yes, there is a lot of information on the web for those who want to learn to do this kind of self-medicating at home.  

Overpromising yet?
Pollan (whose qualification for writing this piece is that he is a food journalist whose writing sells well) says a few times that people involved in researching the microbiome don't want to make the same mistake that geneticists did with the Human Genome Project, overpromising the extent to which their work will lead to the cure for everything that ails us.  But if, as Pollan claims, the microbiome community is actually talking about the Grand Unified Theory of Chronic Disease, there's apparently a fine line between hype and overpromising -- not to mention borrowing the kind of line physics uses to justify the Large Hadron Collider.  And there are certainly plenty of 'probiotic' options in the grocery store these days, so someone's jumping on the bandwagon.

All satire aside, the point isn't that we doubt that there's a connection between microbes, health and disease.  Nor that there are indeed lots of nuggets of grain in the bin being described.  Instead, it's that these are early days yet, and the whole approach to this project is already sounding too reductionist for words.  Microbiome researchers are in danger of mimicking the genome project not only by overpromising and over-enumerating, but also by reducing everything to their favorite microbe (where we once read 'gene', we now read 'microbe').  The bugs for the ability to play the bass or baseball, score well on IQ tests, or tolerate abuse with equanimity may be right around the corner.

Enough Just-So stories yet?
Indeed, the Just-So storytellers are already out in full force. Why are there complex carbohydrates in human breast milk, if babies can't digest them?  As Pollan puts it, "Evolutionary theory argues that every component of mother's milk should have some value to the developing baby or natural selection would have long ago discarded it as a waste of the mother's precious resources." So, it turns out they are there for a particular gut bacterium that breaks them down and uses them.  
“Mother’s milk, being the only mammalian food shaped by natural selection, is the Rosetta stone for all food,” says Bruce German, a food scientist at the University of California, Davis, who researches milk. “And what it’s telling us is that when natural selection creates a food, it is concerned not just with feeding the child but the child’s gut bugs too.”
And we evolved this commensal relationship with microbes because they evolve so much faster than we do, and so can quickly evolve mechanisms to cope with new kinds of toxins in our environment and so forth.

And, the bacterium that causes ulcers and perhaps some stomach cancers, Helicobacter pylori, is an endangered species, writes Pollan.  But his informants tell him it shouldn't be.  H. pylori also has beneficial roles to play in our stomachs -- preventing acid reflux by regulating acidity, for example, which Pollan suggests they do to render the stomach inhospitable to competing microbes, or regulating levels of an appetite hormone -- and because they do these good things, they should be nurtured rather than killed off.  Why do they do both good and bad?  Well, they do the bad stuff when we're middle-aged or older, so Pollan's informant suggests "this microbe’s evolutionary role might be to help shuffle us off life’s stage once our childbearing years have passed."

Of course, among other curious aspects of this scenario, how something evolved to kill us off once we're no longer contributing genes to the human gene pool is not explained. In order for this to work, the fitness of the microbe that could do this would have to be increased by killing us, its host, and it doesn't work that way.

Stripping it down to the truth
Again, fine, it seems quite likely that our microbiome does make contributions to our health and disease.  That's interesting enough.  For various theoretical reasons, rapidly dividing microbes do present interesting evolutionary challenges, and there's no doubt that we, and our genomes, must respond successfully if we are to persist.  If infection, broadly defined, has early negative effects that depend on the host's genotype, then selection favoring the bacteria, and likewise selection favoring human resistance, can both be strong.  Culture and climate and habits also contribute to this potentially very dynamic evolutionary mix.  Infectious diseases with strong effects on survival, like malaria and HIV, have clearly demonstrable effects of this kind.

But that is not the same as invoking specific selective stories for complex, ephemeral, varying fluxes of bacteria, which must coexist with, as well as keep alive, a host so that they themselves can stay alive.  It's not the same as inventing pat, closed Just-So stories about how this or that effect must have evolved, stories that ignore subtleties we know are applicable, including the range and mobility of humans, modes of transmission, population size and so on.  We have had a difficult, and often unsuccessful, time working out clear-cut examples of natural selection at work in humans, though our evolved defenses against malaria (which is not caused by bacteria but involves many of the same evolutionary issues) may be the best example.

So why can't researchers, writers, the rest of us just concentrate on figuring out how and when microbes are harmful or beneficial, without the hyperbole, the suggestion that microbes now do everything that genes did not long ago, and the made-up stories about how this all evolved? 

This whole microbiome thing needs to slow down and let the science catch up.

Monday, May 20, 2013

Retirement harmful to health or... an uncertainty principle?

Years and years:  but who's counting?
Breaking news!  As reported by the BBC ("Retirement Harmful to Health"): "...the chances of becoming ill appear to increase with the length of time spent in retirement."  Even more astonishing, the effect is the same for men and women. 
The study, published by the Institute of Economic Affairs (IEA), a think tank, found that retirement results in a "drastic decline in health" in the medium and long term.
The IEA said the study suggests people should work for longer for health as well as economic reasons.
This is of course just as astonishing as the fact that having more birthdays increases your lifespan (someone must have won a Nobel prize for that discovery! or at least got a headline story in the NY Times Science supplement).

Retirement is, of course, highly correlated with aging, as is length of retirement, and aging, in turn, is highly correlated with ill health.  Further, people still working but already in ill health are more likely to retire than people healthy and still able to work well into old age.  And, since the report considers mental as well as physical health, it's also relevant that people with an ill spouse may be more likely to retire, which may increase their chances of becoming depressed.  So if this study had reached any conclusion other than that retirement is correlated with ill health, that would have been worthy of headlines.

It turns out that the background to the report treats the question in a relatively nuanced way, even if the conclusions are much less nuanced. E.g., from the report:
...evidence suggests that poorer health increases the likelihood of retirement. When looking at health and retirement it is therefore very difficult to separate cause from effect. In addition, a plethora of variables that cannot be observed are likely to bias results in any empirical studies -- and it is difficult to predict the direction of the bias.
Further, "Theoretically, the impact of retirement on health is far from certain."  "Other mechanisms by which retirement can affect health appear equally ambiguous."  "...an observed correlation between retirement and health says nothing about causation."  "Overall, the most methodologically convincing research on the health effects of retirement is rather mixed. This is likely to be due to researchers employing different research strategies and data."

But they do report, from interview data with 7,000-9,000 people after varying numbers of years of retirement, more self-reported mental illness, more prescription drug usage, more diagnosed physical problems, and so on among retired people than among those still working, and they finally conclude that retirement is harmful to health.  Indeed, the report is titled "Work Longer, Live Healthier", so there's no missing their point.

It's true that other studies have found positive effects of retirement, but the authors write, "The results have been cross-checked against the methodologies used in earlier research studies, and it has been found that the positive impact of retirement on health found in earlier studies is, at the very least, partly due to shortcomings in that research."

Well, in fact it's hard to know how to do a study that would properly answer the retirement/health question.  When poor health can 'cause' retirement and retirement can (says this report) 'cause' poor health, how do cause and effect get teased out? 

A study comparing cases and controls would be problematic; can 'controls' who are still working be assumed to match retired 'cases' if the variable being measured (health) can affect whether someone works or retires? And, aging and ill health are already highly correlated, as are retirement and aging.  So disentangling cause from effect is inherently difficult.  
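To see why this is so hard, here is a toy simulation (a sketch only, in Python, with made-up probabilities) in which retirement has no causal effect on health at all, yet the retired group still looks sicker than the working group, simply because poor health makes people more likely to retire.

```python
import random

# Toy simulation: retirement has NO causal effect on health here, but poor
# health makes retirement more likely.  All probabilities are invented.
random.seed(1)

n = 100_000
retired_ill = retired_total = 0
working_ill = working_total = 0

for _ in range(n):
    ill = random.random() < 0.20          # 20% of people are in poor health
    p_retire = 0.60 if ill else 0.30      # the ill are more likely to retire
    retired = random.random() < p_retire
    if retired:
        retired_total += 1
        retired_ill += ill
    else:
        working_total += 1
        working_ill += ill

print("ill among retired:", retired_ill / retired_total)   # ~0.33
print("ill among working:", working_ill / working_total)   # ~0.13
```

A naive comparison of the two groups would 'find' that retirement is bad for health, even though in this toy world it does nothing at all; that is the reverse-causation trap the report itself acknowledges.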

And several other things.
The risks and histories clearly involve cultural and lifestyle factors.  These change all the time, and indeed are affected by stories like the current one, which in itself might lead readers not to retire, because they'll think they're committing suicide if they do.  And what about all sorts of other factors like smoking history, involvement in wars and economic crashes and their harmful effects, and who knows what else that affects our health and our attitudes on a daily basis?  One thing is certain: our exposure to those things in the future, which would affect health after retirement, is uncertain in principle.  Like any study that purports to project results into an unknown future environment, this one can have no easily knowable implications for years beyond the immediate future, at best.

And, if you have a relative who died early from, say, cancer or a coronary, you may be driven to retire early to have a chance at enjoying life before your number comes up, if you think your relative's experience reflects your own vulnerabilities.  Or, conversely, if your father lived to 110, you might think you will too, so why not sock away a few extra years' pension funds, publish some more astonishing research papers, or whatever.  That is, irrelevant or at least unmeasured factors can confound this type of study.  Even knowing that you don't have to retire may affect what you decide.  Or seeing what happens to your peers as they drop out of the office and/or off their perch.

The analogy with quantum mechanics: the Heisenberg principle
In a sense, what we see here, at least potentially, is something like the phenomenon in quantum mechanics in which an electron or photon exists as a wave, until you measure it.  Then, it collapses to a point, but because you've measured, say, its location, you can no longer measure its momentum.  The reason is that the very act of observing and measuring it changes its behavior.  This, loosely speaking, is the Heisenberg uncertainty principle.

Here, too, there is a quite similar-seeming uncertainty principle:  the very act of doing the study and publishing its results will affect the future course of the very people whose future you're trying to predict with your data.  The relevant behavior of the people you studied, and others who read the research, is affected by the fact that you did the study.  How that alters behavior is uncertain and basically not knowable.

But the bad news that retirement is harmful to people's health is good news for governments looking to save money.  Raising the age at which people can begin to draw their pensions is one way to save a lot of money, because people will contribute to pension funds for more years and draw on them for fewer.  We just wish the evidence were sturdier.