Channel: veloclinic

Do Cyclists Cluster into Phenotypes?


This post is a quick one asking whether the GC OpenData project cyclists cluster into distinct phenotypes. I used the 2nd and 3rd principal component scores from the 3-component model (see Part 2 and Part 3).

The short answer is that it doesn’t look that way, to me anyway. I don’t see any worthwhile clustering to suggest that a single category label such as sprinter or all-arounder would describe relative abilities better than the radar charts. If needed, cutoffs could be chosen in some statistically sound manner, but in this case it’s better to just let the data speak for itself.
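One statistically sound way to check for cluster structure is a silhouette analysis: a score near zero says no set of category labels beats the continuous picture. Here is a sketch with scikit-learn, using simulated stand-ins for the PC2/PC3 scores rather than the actual GC OpenData values:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Simulated stand-in for the PC2/PC3 scores: one unstructured blob,
# i.e. the "no phenotypes" case. Swap in the real score matrix to test it.
scores = rng.normal(size=(500, 2))

# Silhouette near 0 suggests no meaningful cluster structure;
# values approaching 1 indicate well-separated clusters.
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(scores)
    print(k, round(silhouette_score(scores, labels), 3))
```

On an unstructured blob like this, every choice of k scores poorly, which is the quantitative version of "no worthwhile clustering."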

 

The post Do Cyclists Cluster into Phenotypes? appeared first on veloclinic.


Accessing the Golden Cheetah OpenData


This post is a first for what I hope will be a productive new direction for sharing my tinkering and research. Over the past month or so I dove back into Python and am learning how to use Jupyter notebooks and Github. For those not in the know (such as me up until very recently), Jupyter notebooks are a hybrid between an internet browser and an integrated development environment. The point is that you can share both the underlying code and the output in one user- and internet-friendly way. For someone coming from zero computer science or data science background the process is a bit of a muther-fugger, but it is doable with the help of internet searches, free MOOCs, and the generosity of people willing to help one through some really dumb/embarrassing and time-consuming mistakes (see my reported (non)-issue for the OpenData library).

The idea behind sharing:

  1. The Golden Cheetah OpenData project is intended to foster open collaboration.
  2. My personal progress has always been far faster when I’ve engaged the Flock for feedback.
  3. Maybe what I do will be useful to someone else.

Below is a Github gist of the notebook I am working on in order to access the OD ride files to set up a data set for subsequent analysis.
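As a self-contained taste of the kind of processing involved (not the notebook itself), here is a hedged sketch: assuming a downloaded ride file is a CSV with per-second `power` samples, which is roughly the OpenData layout, pandas can turn it into mean-maximal power values for later analysis. The CSV is inlined as a string so the example runs standalone:

```python
import io
import pandas as pd

# Hypothetical stand-in for one downloaded OpenData ride file (1 Hz samples).
ride_csv = "secs,power\n" + "\n".join(f"{s},{200 + (s % 30)}" for s in range(120))

ride = pd.read_csv(io.StringIO(ride_csv))

def mean_max_power(power: pd.Series, duration_s: int) -> float:
    """Best average power over any contiguous window of `duration_s` samples."""
    return power.rolling(duration_s).mean().max()

# Mean-maximal power at a few durations -- the raw material for PD-model fits.
mmp = {d: mean_max_power(ride["power"], d) for d in (5, 30, 60)}
print(mmp)
```

Doing this per ride and taking the elementwise maximum across rides gives the season MMP curve used in the later modeling posts.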


Exploring interval detection with the ruptures library

What is an interval?


It’s like porn: difficult to define, but you know it when you see it…

Let’s take a vote:

Example 1

or

Example 2

Flock, do your thing and make answers.


EPO (sea) Levels the Playing Field; Who Would’ve Thought

Cobo Athlete Biological Passport Visualization and Discussion


It’s blood data, so you know I have to take a look. The UCI broke its long biopassport sanction drought by going after a retired rider on a minor team best known for blowing things up at the Vuelta. The upshot is they handed the red (formerly gold) jersey to Froome, who rode said Vuelta after a failed sale to Bruyneel and got shuffled into the starting roster when Sky ran out of other riders to slot in the race.

Above are my recreations of what the ABP would look like based on using the rolling mean and 2.3 SDs (99th percentile cuts) up to but excluding the plotted time point, i.e. what the ABP software would have pinged/not pinged at the time of the sample. See my work here: https://drive.google.com/file/d/1DHANbgQbOv3tbTsEOmh4uxKzAjiVK0vJ/view?usp=sharing
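The expanding-window calculation is simple enough to sketch directly. The Hgb values below are illustrative, not Cobo's data; the point is that each time point's limits use only the samples strictly before it:

```python
import numpy as np

def abp_style_thresholds(values, z=2.3, min_points=2):
    """Expanding-window z-score limits: mean +/- z*SD of all samples
    strictly before each time point, mimicking what the software would
    have flagged in real time (NaN until enough prior samples exist)."""
    lo, hi = [], []
    for i in range(len(values)):
        prior = values[:i]
        if len(prior) < min_points:
            lo.append(np.nan)
            hi.append(np.nan)
            continue
        m, s = np.mean(prior), np.std(prior, ddof=1)
        lo.append(m - z * s)
        hi.append(m + z * s)
    return np.array(lo), np.array(hi)

hgb = [14.8, 15.1, 14.6, 15.2, 16.9]   # illustrative values only
lo, hi = abp_style_thresholds(hgb)
flagged = [bool(v < l or v > h) for v, l, h in zip(hgb, lo, hi)]
print(flagged)
```

Only the last, jumpy value lands outside its own history's limits, which is the flavor of flag the real software would produce.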

For the rationale of how a z-score model (the plotted thing above) effectively replicates the ABP once you get more than a handful of points see our paper:

This assumption is supported by previous work showing that z-score thresholds generated from an individual athlete’s data alone converge with the ABP model thresholds and demonstrate comparable classification performance once both models are trained on sufficient baseline data (Sottas et al., 2007).  https://www.frontiersin.org/articles/10.3389/fphys.2018.00643/full

Also, note that the plot for reticulocytes shows the square-root values. This transformation is necessary to normalize the distribution so that the SD gives the same percentiles above and below the mean. And sub-note that the ABP software uses Hgb and Off-score but not the reticulocyte percentage. The reticulocytes, however, are a go-to for the experts reviewing the data, as they are not affected by the plasma volume swings that occur with things like a grand tour.

Disclaimer: I am not making promises that I didn’t make any transcription errors, but I did my best.

The UCI/WADA summed up its position as:

The ABP in the case at hand is based on the Expert Panel’s initial evaluation of 38 valid samples,12 the documentation of which was included as evidence in the UCI’s submissions. As reported by the Expert Panel, the main important abnormalities in the Rider’s profile are (i) the significant variability of haemoglobin concentration (HGB) with a 95% sequence abnormality, (ii) the variability of reticulocytes (0.28-1.43%), which, according to the Expert Panel, is “above the physiological range, with 99% sequence abnormality” and (iii) the 92% sequence abnormality of the OFF-score values, including several high values (122 in sample 46, 121 in sample 17, 120 in sample 18, 117 in samples 16, 31, 38, 42 and 47). see: https://www.uci.org/docs/default-source/clean-sport-documents/anti-doping-tribunal/uci-adt-03.2018-uci-v.-mr-juan-jos–cobo-acebo.pdf

So what I find interesting, assuming that I didn’t screw things up, is that the software, in real time, would not have pinged a beyond-threshold value until the low Hgb on July 16, 2012, which doesn’t seem to feature in the decision. Instead, their only sanctionable finding was:

… wait for it

… wait for it

they didn’t actually have one as far as I can tell according to WADA rules, which are kindly summarized for ABP data in the “Factual Background” section:

Haematological data is considered atypical if 1) a haemoglobin (HGB) and/or OFF-score (OFFS) marker value falls outside the expected intra-individual ranges, with outliers corresponding to values out of the 99%-range (0,5 – 99,5 percentiles) (1:100 chance or less that this result is due to normal physiological variation) or 2) when sequence deviations (a longitudinal profile or marker values) are present at specificity of 99,9% (1:1000 chance or less that this is due to normal physiological variation).


See, the issue is that the most wildly abnormal thing in Cobo’s passport is the 0.28 reticulocyte percentage value on 9/26/2019. To get that low you are talking about coming off of old-school doses of EPO or a full bag or two of packed red blood cells. However, according to my most recent reading of the WADA code, the reticulocyte values can only be used as supporting evidence. The main line of evidence from the ABP must be either the Hgb or Off-score, with values or sequence outside of 99% or 99.9% respectively. Cobo only got to 95% and 92% sequence abnormality on the Hgb and Off-score respectively.

This case then raises two questions:

  1. Is “expert (gestalt) opinion” now good enough to sanction a rider?
  2. How does Cobo’s full-tilt doping not flag the passport?

Question 1: hopefully someone with knowledge of the legal side can chime in.

Question 2, the answer is this:

If you dope, dope consistently. Why? The model only knows what you teach it. Again, from the Frontiers paper:

Interestingly, neither the performances (Figure 3C) nor the parameter estimates (Figures 3D,E) for “doped” 2008 fell outside the prediction intervals. This result highlights a limitation in “passport-type” detection methods in which the “doped” 2007 data were included in the model training and biased the means and increased the variance such that the “doped” 2008 performances and parameter estimates were not statistically detected.

(Image reproduced for educational purposes only.)

It’s a problem that I’ve wondered about quite publicly for a while without much response from the official anti-doping community, and one I more formally illustrated with the easter-egg line/figure in the Frontiers paper. Now we have a real, live, in-the-flesh illustration. Satisfying to be proven right by a GT winner, but sad for cycling/sports etc.

Cheers.


Athlete Biological Passport Standard Deviation?


When I drill down into data it’s no surprise when small errors and discrepancies pop up. But the thing that jumped out when drilling down into the Cobo ABP data http://veloclinic.com/cobo-athlete-biological-passport-visualization-and-discussion/ was that the Z-score model was apparently producing far wider cut-offs than the actual ABP. Jeroen Swart pointed this out to me (hopefully I’m not getting him in trouble by dragging him into this).

Doing some quick cut-off hacking, I found the Z-score model converged with the ABP output on the example he tweeted if I used 1 standard deviation cut-offs rather than 2.3 standard deviation cut-offs. See my Cobo post for the quick rationale why the Z-score and ABP models should basically converge given sufficient data points.

Interest piqued, I grabbed some published examples; the first to take on is Zorzoli and Rossi, Figure 2: https://onlinelibrary.wiley.com/doi/full/10.1002/dta.173

Figure reproduced for educational purposes only.

So first off, the numbers are hard to read due to poor resolution and overlap, so I did my best to reproduce them. Not getting the numbers quite right can affect the work below. Then I plotted the figure data with the Z-score model with 1 standard deviation cut-offs overlaid to see how they compare.

From the plots it’s clear that the models converge very closely on the OFF score and Reticulocyte % and fairly well on the Hgb.

This convergence is a problem for the paper because the paper uses this figure as an example of a likely doped profile:

ABP profile of an athlete considered as suspicious

And states:

In these profiles the Bayesian adaptive model has identified the Hb or Off‐hr score abnormal with a 99% probability (either for the single measurement as a function of previous results or for the complete sequence) or with normal or lower levels of probability.

Meaning that the figure is showing points that are outside the 99th percentile, i.e. outside 2.3 standard deviations.

Recall, however, that I am using 1 standard deviation (~68% central coverage) as the cut-offs for the Z-score model, and that the Z-score and ABP models should converge.
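The SD-to-percentile arithmetic at stake here can be checked directly with scipy's normal distribution; nothing below is specific to the ABP software, it's just textbook values:

```python
from scipy.stats import norm

# One-sided: ~2.33 SD sits at the 99th percentile, 1 SD at the ~84th.
print(round(norm.cdf(2.33), 3))
print(round(norm.cdf(1.0), 3))

# Central coverage: +/-2.3 SD spans ~98% of values, +/-1 SD only ~68%.
print(round(2 * norm.cdf(2.3) - 1, 3))
print(round(2 * norm.cdf(1.0) - 1, 3))
```

So a figure whose limits match 1 SD cut-offs is drawing far tighter bounds than any 99%-probability claim implies, which is the discrepancy at issue.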

Given that the ABP software is not publicly available, I can’t confirm what it does statistically to generate the figure used by Zorzoli and Rossi, but I can show my work for the Z-score model: https://drive.google.com/file/d/1YqcRHieehucKumXG9QhcjAsC34Vnd7OH/view?usp=sharing

The question is whether this is a one-off issue of stat hacking in a couple of figures used for “illustrative” purposes, or whether the ABP black-box output has not been sufficiently vetted/replicated?

Or is something else entirely going on? For example, the published literature on the ABP says the cut-offs are based on specificity rather than probability; is there some undisclosed doping “prevalence” being passed into the ABP model which happens to work out to probability cut-offs with tighter bounds?

Don’t know, either way interesting…

Thanks for paying attention, cheers.


Development and field validation of an omni-domain power-duration model


Michael J. Puchowicz, Jonathan Baker & David C. Clarke (2020) Development and field validation of an omni-domain power-duration model, Journal of Sports Sciences, DOI: 10.1080/02640414.2020.1735609

Purpose: To validate and compare a novel model based on the critical power (CP) concept that describes the entire domain of maximal mean power (MMP) data from cyclists.

Methods: An omni-domain power-duration (OmPD) model was derived whereby the rate of Wʹ expenditure is bound by maximum sprint power and the power at prolonged durations declines from CP log-linearly. The three-parameter CP (3CP) and exponential (Exp) models were likewise extended with the log-linear decay function (Om3CP and OmExp). Each model bounds Wʹ using a different nonconstant function, Wʹeff (effective Wʹ). Models were fit to MMP data from nine cyclists who also completed four time-trials (TTs).

Results: The OmPD and Om3CP residuals (4 ± 1%) were smaller than the OmExp residuals (6 ± 2%; P < 0.001). Wʹeff predicted by the OmPD model was stable between 120–1,800 s, whereas it varied for the Om3CP and OmExp models. TT prediction errors were not different between models (7 ± 5%, 8 ± 5%, 7 ± 6%; P = 0.914).

Conclusion: The OmPD offers similar or superior goodness-of-fit and better theoretical properties compared to the other models, such that it best extends the CP concept to short-sprint and prolonged-endurance performance.

 

 



Three ICU transfers and one death reported in hydroxychloroquine treatment group


https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7102549/

An under-discussed part of the Gautret 2020 study is the somewhat buried data that the hydroxychloroquine treatment group had 6 participants “lost to follow-up” (a term referring to participants dropped or dropping out of the study) while the control group had no participants lost to follow-up. The reasons for loss to follow-up in the hydroxychloroquine treatment group included 3 transfers to the ICU and 1 death.

I have just sent out the email below to Dr Raoult to seek clarification:

Dr Raoult,
I read your study on hydroxychloroquine treatment of COVID-19 with great interest.
The reported results are very promising regarding viral clearance, but I am concerned about 4 participants lost to follow-up in the treatment group: 3 ICU transfers and 1 death. I am writing to clarify whether I am reading the study correctly. This information indicates that the treatment group had 1 death and the control group had no deaths; is this correct? Also, were all ICU transfers “lost to follow-up”? That is, were there 3 ICU transfers in the treatment group but none in the control group?
If my reading of this study is correct, then does this not indicate a risk of a more severe COVID-19 course with hydroxychloroquine treatment? I realize that the groups were not well matched and the older age in the treatment group may explain the severe outcomes. However, given the potential harm suggested by this aspect of the study, is caution not warranted?
Sincerely,
Mike

I would prefer to wait for clarification from the author before pushing this issue publicly. However, given the fast moving nature of the COVID-19 response and White House guidance pushing this drug I feel an early alert for caution is necessary.


Peronnet and Thibault: the source model under the hood of WKO4/WKO5


I’ve raised this issue before with Andy Coggan: he has not given proper attribution to Peronnet and Thibault as the source model for his own model.

The Peronnet and Thibault model is a reasonable model that is an extension of Ward-Smith, which is a modification of Lloyd, which is an extension of Hill.

Hill:

P(t) = AWC / t + MAP

AWC is anaerobic work capacity

MAP is maximal aerobic power

Lloyd:

P(t) = AWC / t * (1-exp(-t/tau)) + MAP

tau is the time constant of the exponential function. In plain terms, what this function says is that AWC is not instantly available but increases exponentially with time, so that the maximum power is constrained.

Ward-Smith:

P(t) = AWC / t * (1-exp(-t/tau)) + MAP * (1-exp(-t/tau2))

tau2 is again a time constant of the exponential function. In plain terms Ward-Smith again said that MAP is not instantly available but instead must increase with oxygen kinetics. If you do the math:

tau = AWC/Pmax (almost; there is technically a tiny MAP contribution, since the exponential function does not start from zero, so that starting value should be subtracted from Pmax).

Peronnet Thibault:

P(t) = AWC / t * (1-exp(-t/tau)) + MAP * (1-exp(-t/tau2)); t <= Tmap

P(t) = AWC / t * (1-exp(-t/tau)) + MAP * (1-exp(-t/tau2)) - a*ln(t/Tmap); t > Tmap

a is the slope of the decline in MAP, which decreases log-linearly starting at t > Tmap. Tmap is the longest duration that MAP can be sustained.
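The two branches above can be implemented directly. A minimal sketch, with illustrative parameter values that are not fitted to any rider:

```python
import math

def peronnet_thibault(t, AWC=20000.0, MAP=400.0, tau=12.0, tau2=30.0,
                      a=35.0, Tmap=420.0):
    """Peronnet-Thibault power-duration curve (illustrative parameters).
    Below Tmap the curve is the Ward-Smith form; above Tmap the aerobic
    term declines log-linearly with slope a."""
    p = (AWC / t) * (1 - math.exp(-t / tau)) + MAP * (1 - math.exp(-t / tau2))
    if t > Tmap:
        p -= a * math.log(t / Tmap)
    return p

# The two branches meet at t = Tmap, where the log term is zero.
print(round(peronnet_thibault(420.0), 1))
print(round(peronnet_thibault(60.0), 1))    # shorter effort, higher power
print(round(peronnet_thibault(3600.0), 1))  # longer effort, lower power
```

Substituting FRC for AWC, FTP for MAP, and TTE for Tmap gives the WKO form, which is the whole point of the comparison below.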

Coggan WKO4/5:

P(t) = FRC / t * (1-exp(-t/tau)) + FTP * (1-exp(-t/tau2)); t <= TTE

P(t) = FRC / t * (1-exp(-t/tau)) + FTP * (1-exp(-t/tau2)) - a*ln(t/TTE); t > TTE

As noted above, tau can be substituted with FRC/Pmax. The practical difference between the two models is that P&T specified Tmap as a fixed parameter (420 s) while it is a fitted parameter in WKO. (As an aside, to be consistent with oxygen kinetics, which are indeed described by an exponential rise in response to high-intensity exercise, the model should actually use the integral of the function above if the modeler was working from first principles as claimed. Unfortunately, when the integral function is used the model does not perform well.)

To illustrate that the models are in fact mathematically the same here is an overlay of P&T on top of the output from WKO:

To show that there are actually two models here, I offset P&T slightly:

As an example of how this model can get janky with very low AWC and long tau2 (causing the anaerobic power to fall off before the aerobic power ramps up):

Looks a bit ridiculous, with the model output (red) actually dipping, which is clearly not physiological, as MMP is by definition monotonically decreasing. Likewise, the sharp inflection at Tmap indicates very sub-maximal long-duration data.

To be clear, I am not intentionally putting bad numbers into the model to make it look bad; rather, I am just reproducing the curves off a WKO5 help page:

Here are some links to a spreadsheet with the P&T model and a ppt with these last two images overlaid.

Note, there is nothing wrong with using source models and prior work. It is what we did to arrive at a not-very-different end, source functions cited and all.





