Ignoring the challenge the world is facing with respect to trust in “science”, or the reporting of science in the media, there is a real challenge for science. No, this isn’t the replication crisis, fake or flawed studies, or AI/bots outputting “research papers”; it is the challenge individuals face in understanding and applying research to themselves.
The Challenges
Statistical vs Clinical/Meaningful Significance
Fundamentally science is about being less wrong, and within this, understanding the chance that an observed result is a spurious one. This leads us down the route of statistics - neither a strong point nor something I enjoy greatly, but I will do my best here to be both accurate and helpful. To be overly simplistic, statistical significance boils down to the chance that we would have seen a result this large if there were no real effect in the population sampled (we will get back to that in a second). For the record, thresholds are usually less than 5% or less than 1% (P values of <0.05 or <0.01) - don’t ask why, this is just the convention.
Whilst this is great - we are relatively confident that the results we saw were “real”, so to speak - the question most non-researchers (clinicians, practitioners, or indeed people living their lives like you are whilst reading this) ask is: “so what?” That’s usually not because they don’t understand, so much as that the clinical, or perhaps “meaningful”, significance may not be apparent (or present). For example, a new drug improving blood pressure by 5mmHg (ignore the units here, they’re for completeness; this is just the unit for blood pressure) probably doesn’t have a meaningful, clinical or real world impact for someone with high blood pressure - by definition these people would need to decrease blood pressure by somewhere between 10 and 20mmHg to go from high to normal blood pressure. Don’t get too hung up on the numbers as much as the concept here.
In short: just because we saw a “real” effect doesn’t make it particularly meaningful or useful in the real world, particularly when considering it in the context of costs (opportunity costs, financial costs etc). For more on lenses and decision making read this.
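To make the distinction concrete, here is a minimal sketch with invented numbers: a large sample makes a tiny blood pressure change statistically significant, even though the change itself is clinically trivial. (All figures are hypothetical; the t statistic is computed by hand from the stdlib only.)

```python
import random
import statistics

random.seed(42)

n = 2000  # large samples make tiny effects "significant"
control = [random.gauss(150, 12) for _ in range(n)]   # systolic BP, mmHg
treated = [random.gauss(148, 12) for _ in range(n)]   # true effect: -2 mmHg

mean_diff = statistics.mean(treated) - statistics.mean(control)

# Welch's t statistic by hand (stdlib only)
var_c = statistics.variance(control)
var_t = statistics.variance(treated)
t = mean_diff / ((var_c / n + var_t / n) ** 0.5)

print(f"mean difference: {mean_diff:.1f} mmHg, t = {t:.1f}")
# |t| is well above ~2, so the result is statistically significant,
# yet a ~2 mmHg drop is far short of the 10-20 mmHg needed to move
# someone from high to normal blood pressure.
```

The point of the sketch: with enough participants, almost any non-zero effect clears the significance threshold, so significance alone tells you nothing about whether the effect matters.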
Field vs Lab
I don’t think there’s much debate that “the lab is not the field”; the challenge, though, is that this can call into question the utility of lab data in the field at times. What do I mean? Well, how applicable is a very sterile lab scenario to real world happenings?
Many coaches or practitioners groan something to this effect regularly. Similarly, I have heard the groans of reviewers on articles about a lack of control of variables in real world studies.
To be very clear: this isn’t an “either/or”, it’s pointing out the absolute necessity for “both/and” in the scientific realm.
We absolutely need the mechanistic and granular lab work, but it needs to go hand in hand with real world, bigger data sets to allow for an understanding of efficacy in “stress tested” situations, so to speak. For the individual (the reader in this case) the real world data is probably more helpful, even with its blemishes and messiness - it’s probably going to be closer to the reality of the situation for you.
Sample Population
In the ideal world, the sample we take for any given study represents the population we want to apply the research to. The challenge is that getting a truly representative sample is difficult, and numbers swiftly become large, thus ballooning costs, timelines etc. Representative is indeed a relative term, but probably the most publicised non-representative issue is around biological sex. That is, research done only on men and applied to women (let’s not even talk about people who fall into other categories such as “differences in sex development”). A recent example here is beetroot juice (aka dietary nitrate), which was almost universally considered to be effective (Group A in the Australian Institute of Sport’s supplement categories, suggesting it has good evidence for efficacy), but more recently was found to be of no benefit to females (unlike males) or, worse yet, detrimental to performance!
A “non-representative sample” shouldn’t mean you discard the study straight away; it really depends on how the sample is or isn’t representative. For instance, much of the training literature is done on college students, as they’re available on campus and willing (or willing enough) to participate. These studies probably shouldn’t inform too much of what elite athletes do, or perhaps even elderly populations. In fact, I’d go as far as to say that I don’t know how representative my 20 year old self would be of who I am currently from a training intervention standpoint. Again, that is not to wholly discard these studies, but to interpret them as such (that huge VO2max improvement is probably not coming for me, for example, though there may be merit in the training nonetheless).
This problem is also apparent in the wearable industry, classically with PPG sensors (the lights used to read your heart rate on your favourite wearable). In essence, these devices and the algorithms that allow them to sense pulse rate are usually developed and validated on light skin, meaning those with darker skin or tattoos can see impaired performance or inaccurate data.
The Group vs Its Constituents
Now we get into something that is not talked about as frequently: the individuals in the groups. The good news: you are not average (you’re welcome). The problem: this means that your response to whatever intervention will not be the average either, though that’s mostly how we report it or think about it.
As a result of all of the noise that is physiology, we use a large enough sample to have a low chance of spurious results (see the statistical significance discussion above); this sample allows us to be relatively certain that what we see in the group is a true effect.
To be overly simplistic, if we studied a group of restaurant goers, on average people would love my local Thai restaurant (I may be an outlier - I really love it). However, in that group there may be one or two people who absolutely hate it - probably because they use so much coriander, aka cilantro (there is a known genetic difference whereby the herb tastes like soap!). So you see, then, that to the individuals who hate coriander, the results of the average opinion are not helpful! However, if you were visiting my local area, the restaurant reviews are a great place to start.
This is akin to research and group averages: they’re a great starting point and are probably more right than wrong.
That said, individual responses, or outcomes, are all that the individual in question really cares about. A classic example is caffeine - I know I am on record saying something to the effect of “it helps everyone for everything”. Sleep notwithstanding, this is not necessarily the case, with some suggestion that certain genotypes (of the genes that code for caffeine metabolism in this case) do not benefit, or may even see a detrimental effect of caffeine on performance.
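The group-versus-individual gap is easy to show with numbers. A toy illustration (the response values are invented) of a group-level benefit hiding individual non-responders:

```python
import statistics

# Hypothetical change in time-trial performance (%) after caffeine,
# one value per participant. Invented for illustration.
responses = [2.1, 1.8, 3.0, 2.5, 1.2, 2.8, 1.9, -0.6, 2.2, -1.1]

group_mean = statistics.mean(responses)
non_responders = [r for r in responses if r <= 0]

print(f"group mean change: {group_mean:+.1f}%")       # positive on average
print(f"individuals who got worse: {len(non_responders)}")
```

The published headline would be the positive group mean; the two people who got worse are invisible in it, yet their own response is the only number that matters to them.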
The Solution
Measure it.
The world of biohacking can be quite negatively viewed (and to be honest, this is earned to a degree), but at its origins and core the primary concept is helpful: test something - intervene - test again (or track consistently prior to and through the intervention). This is no sales pitch for the latest tracker, though I do love wearables; it’s about being able to measure something where it is relevant, in a free living situation (with all the messiness that comes along with that), and thus understanding the efficacy of the intervention for you in your context.
This may be blood tests, a diary, a spreadsheet (boy do I love a spreadsheet) or a wearable. It could even be multiple. But you need some way to understand and quantify impact where it matters to you.
One thing to consider, especially in cases that are a little noisier from a data standpoint and thus harder to be sure of the intervention’s efficacy even with changes in some marker, is removal and re-intervention. By this I mean: test - intervene - test - remove intervention - test - intervene - test, and so on as needed, to try to tease out the effect of the intervention. This may sound impractical, but remember science is a verb not a noun. Likewise, we all get to choose how we engage with our health and performance; my only hope is that these choices are active, mindful ones, not ones made by default.
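The remove-and-re-intervene idea above is sometimes called an “ABAB” single-subject design. A minimal sketch, with invented daily sleep scores standing in for whatever marker you track:

```python
import statistics

# Invented daily sleep scores across alternating phases:
# A = baseline / intervention removed, B = intervention on.
phases = {
    "A1": [72, 70, 74, 71],  # baseline
    "B1": [78, 80, 77, 79],  # intervention on
    "A2": [73, 71, 72, 70],  # intervention removed
    "B2": [79, 78, 81, 80],  # intervention back on
}

on = phases["B1"] + phases["B2"]
off = phases["A1"] + phases["A2"]
diff = statistics.mean(on) - statistics.mean(off)

print(f"average score on vs off: {diff:+.1f}")
# If the score rises each time the intervention is on and falls each
# time it is removed, the effect is more plausibly real *for you* than
# a single before/after comparison would suggest.
```

Even a spreadsheet version of this (one column per phase, compare the averages) does the job; the code is just the same arithmetic written out.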
Reference List
Grgic J, Grgic I, Pickering C, Schoenfeld BJ, Bishop DJ, Pedisic Z. Wake up and smell the coffee: caffeine supplementation and exercise performance-an umbrella review of 21 published meta-analyses. Br J Sports Med. 2020 Jun;54(11):681-688. doi: 10.1136/bjsports-2018-100278. Epub 2019 Mar 29. PMID: 30926628.
Barreto G, Esteves GP, Marticorena F, Oliveira TN, Grgic J, Saunders B. Caffeine, CYP1A2 Genotype and Exercise Performance: A Systematic Review and Meta-analysis. Med Sci Sports Exerc. 2023. doi: 10.1249/MSS.0000000000003313.
Grgic J, Pickering C, Del Coso J, Schoenfeld BJ, Mikulic P. CYP1A2 genotype and acute ergogenic effects of caffeine intake on exercise performance: a systematic review. Eur J Nutr. 2021 Apr;60(3):1181-1195. doi: 10.1007/s00394-020-02427-6. Epub 2020 Nov 2. PMID: 33137206.
Guest N, Corey P, Vescovi J, El-Sohemy A. Caffeine, CYP1A2 Genotype, and Endurance Performance in Athletes. Med Sci Sports Exerc. 2018 Aug;50(8):1570-1578. doi: 10.1249/MSS.0000000000001596. PMID: 29509641.
Barreto G, Grecco B, Merola P, Reis CEG, Gualano B, Saunders B. Novel insights on caffeine supplementation, CYP1A2 genotype, physiological responses and exercise performance. Eur J Appl Physiol. 2021 Mar;121(3):749-769. doi: 10.1007/s00421-020-04571-7. Epub 2021 Jan 5. PMID: 33403509.
Eriksson N, Wu S, Do CB, et al. A genetic variant near olfactory receptor genes influences cilantro preference. Flavour. 2012;1:22. doi: 10.1186/2044-7248-1-22.
Ortiz de Zevallos J, Hogwood AC, Kruse K, De Guzman J, Buckley M, Weltman AL, Allen JD. Sex differences in the effects of inorganic nitrate supplementation on exercise economy and endurance capacity in healthy young adults. J Appl Physiol. 2023;135(5):1157-1166.
Hogwood AC, Ortiz de Zevallos J, Kruse K, De Guzman J, Buckley M, Weltman A, Allen JD. The effects of inorganic nitrate supplementation on exercise economy and endurance capacity across the menstrual cycle. J Appl Physiol. 2023;135(5):1167-1175.