Channel: veloclinic

Tour de France Performance Trends (2008-2015, 2017)


With 10 years (almost, as 2016 is missing) of Tour de France finishing climbs on the spreadsheets, it seemed time to look back and see if there was an interesting trend.

First, to normalize the performances I used the Martin model, assuming a Froome-sized rider of 67 kg and about 50% of ideal drafting for a CdA of 0.3. Then I used Townsend’s critical power altitude-correction polynomial and normalized the power to a mean mid-climb altitude of 1208 m.
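For the curious, here is a rough sketch of the kind of power-balance and altitude-normalization calculation involved. The function names and default values are illustrative only; the actual Martin model includes additional terms (drivetrain losses, wheel rotation), and the Townsend polynomial coefficients are not reproduced here.

climb_power <- function(speed_ms, grade, mass_rider = 67, mass_bike = 7,
                        crr = 0.0035, cda = 0.3, rho = 1.0, g = 9.81) {
  # simplified power balance: gravity + rolling resistance + aero drag
  # (cda = 0.3 already reflects roughly 50% of ideal drafting, per the text)
  m <- mass_rider + mass_bike
  p_gravity <- m * g * speed_ms * sin(atan(grade))
  p_rolling <- m * g * speed_ms * cos(atan(grade)) * crr
  p_aero    <- 0.5 * rho * cda * speed_ms^3
  p_gravity + p_rolling + p_aero
}

# placeholder for the Townsend critical power altitude correction; the real
# version is a polynomial giving the fraction of sea-level power at a given altitude
altitude_factor <- function(alt_m) rep(1, length(alt_m))

# normalize a performance at altitude alt_m to the reference mid-climb altitude of 1208 m
normalize_power <- function(p_obs, alt_m, ref_alt = 1208) {
  p_obs * altitude_factor(ref_alt) / altitude_factor(alt_m)
}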

Taking a look at the normalized data:

We’ve got a curvilinear-looking plot that is a bit too stretched for the CP model but can be transformed to linear with a log x-axis:

So a log-linear model it is. However, I was curious whether we could do better by accounting for some other variables such as placing, which stage it was, or the length of the stage. So I ran a step-wise regression in R using the leaps package:
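A minimal sketch of that selection step, assuming a hypothetical data frame climbs with columns power_wkg, time_s, length_km, place, and stage (the real column names may differ):

library(leaps)

# best-subset selection over the candidate predictors, scored by BIC
subsets <- regsubsets(power_wkg ~ log(time_s) + length_km + place + stage,
                      data = climbs, nvmax = 4)
summary(subsets)$bic          # BIC of the best model at each size
plot(subsets, scale = "bic")  # which predictors each BIC-ranked model keeps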

What we see here is that our best model based on the Bayesian Information Criterion, which balances model fit against over-parameterization, includes log(Time) as expected and length, with stage a borderline (plus/minus) inclusion. Paradoxically, length was positively correlated with climbing power, so with some hesitation I kept it in but dropped place and stage from the model.

Fitting the model:

Things look pretty good. The black line is the model, the faint gray line is the confidence interval of the model and the dull red line the prediction interval.
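A sketch of that fit and the two interval bands, continuing with the same hypothetical climbs data frame:

fit <- lm(power_wkg ~ log(time_s) + length_km, data = climbs)

new_data <- data.frame(time_s = seq(600, 3600, by = 60),
                       length_km = mean(climbs$length_km))
ci <- predict(fit, new_data, interval = "confidence")  # uncertainty of the model line
pi <- predict(fit, new_data, interval = "prediction")  # spread of individual climbs

plot(climbs$time_s, climbs$power_wkg, log = "x",
     xlab = "Duration (s)", ylab = "Normalized power (W/kg)")
lines(new_data$time_s, ci[, "fit"])                   # model
lines(new_data$time_s, ci[, "lwr"], col = "gray")     # confidence interval
lines(new_data$time_s, ci[, "upr"], col = "gray")
lines(new_data$time_s, pi[, "lwr"], col = "red")      # prediction interval
lines(new_data$time_s, pi[, "upr"], col = "red")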

With model in hand we can calculate the residuals of each performance and compare across years:

The black line is the mean, the box and whiskers the quartiles, and the circles outlier points. As far as trends go, maybe the performances are more homogeneous, but otherwise the trend is effectively flat, with year not being significant (P = 0.15).

I’m left with two or three take-away questions here:

1. Why are finishing climbs faster at the end of longer stages? My guess is that the longer the slog up to the final climb, the more cautiously the GC contenders race until they get there… maybe?

2. Where are the motors?

3. Why are bikes nearly twice the price (a top-shelf 2008 Madone was $7,700 vs. $13,000 for a 2017 Madone)? I would expect 10 years of technological advancement in a man-plus-machine sport to show up.

The good news, and I’ll call it that even though my popularity seems to hinge on playing “does it dope”, is that over the past 10 years there does not seem to have been the emergence of some new doping rocket fuel. I’m by no means naive enough to see this as evidence of a clean sport. But it does at least support the idea that the illicit march has stalled for now.



“The Critical Power Model as a Potential Tool for Anti-doping”

Functional PCA of the Golden Cheetah Power Duration Data


With 2,445 athlete seasons (inclusion criteria: at least 100 power files per season and PD data out past 2 hours), it makes sense to let the data speak without a predetermined narrative. One tool that works without a priori assumptions is principal component analysis. The basic idea is to start with the data mean, then explain as much variability as possible in as few components as possible. For time-series data (which is sort of the case for PD data) you can use functions to achieve the same goal; i.e., start with no assumptions, find the mean function, then explain the variability in a minimal number of orthogonal functions.

Fortunately R has a package, fdapace, that makes this quite easy once you figure out how to format the data. So here we go:

(edited to add figure with transparency)

Start with a mess of data.

Run your functional PCA.
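Roughly, the call looks like this (the long-format data frame pd with columns id, secs, and watts is a stand-in for however your MMP data are actually organized):

library(fdapace)

inputs <- MakeFPCAInputs(IDs = pd$id, tVec = pd$secs, yVec = pd$watts)
fpca   <- FPCA(inputs$Ly, inputs$Lt)

fpca$cumFVE   # cumulative fraction of variance explained (the scree information)
plot(fpca)    # default diagnostics: design plot, mean function, scree plot, first eigenfunctions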

The scree plot shows that 90% of the variability can be explained by eigenfunction 1. That’s a lot explained. Eigenfunction 2 explains an additional 5%, and by the time we get to eigenfunction 3 it’s down to 2.5% of the variability (and I would argue it could be dropped).

The mean function is straightforward: it is the mean of all the data and looks believable. In the final box, lower right, are the first 3 eigenfunctions.

Zooming in on these, the black line is eigenfunction 1; it is all positive, so it acts like a “gain” function and is roughly proportional to power. The red line is eigenfunction 2, which captures the anti-correlation between all-out sprint and endurance ability, sort of a twitchedness function. And eigenfunction 3 is a bit of a weird one, tied to sprint endurance or the lack thereof.

To visualize that more easily we can take the mean function and add/subtract each eigenfunction. Here is eigenfunction 1, showing that 90% of the variability in 2,445 PD curves is a simple “gain” function, i.e., most of the difference is just that some people are better than others.
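A sketch of how such a plot can be built from the FPCA output above (same fpca object as in the earlier sketch; the one-standard-deviation score is just a convenient “typical” loading):

grid <- fpca$workGrid
mu   <- fpca$mu
phi1 <- fpca$phi[, 1]
s1   <- sd(fpca$xiEst[, 1])   # spread of athlete scores on eigenfunction 1

plot(grid, mu, type = "l", lwd = 2, xlab = "Duration (s)", ylab = "Power (W)")
lines(grid, mu + s1 * phi1, lty = 2)  # loading positively: better across the board
lines(grid, mu - s1 * phi1, lty = 2)  # loading negatively: worse across the board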

Eigenfunction 2 shows the trade-off between sprint and endurance ability. Basically it’s the “are you a sprinter or are you everyone else” function.

And last, eigenfunction 3, which I would argue could be dropped, is maybe a sprint endurance function.

So what does this mean?

My take is that out to 2 hours, 95% of performance variability can be empirically explained with just 2 parameters (which is incredibly parsimonious): a gain function that indicates overall ability, and a twitchedness function that indicates sprint vs endurance phenotype. I need to cross-validate this claim, but it makes sense that such a simple parameterization (i.e., a sufficiently stiff model) should be very robust, which I think is key to dealing with PD data that is generally fairly shitty due to submaximal spans and serial auto-correlation.

The way you would apply this, then, is to run the FPCA on your reference data set, then introduce the new PD data set of interest and see how strongly (and in which direction) it loads on eigenfunction 1 (ability) and eigenfunction 2 (twitchedness). From there it would be easy to interpret whether gains/losses are due to changes in ability or twitchedness.

 


Part 2: Functional Principal Component Analysis of the Golden Cheetah Power Duration Data


After the first post on FPCA of the Golden Cheetah Open Data, Dan Connelly (@djconnel) pointed out that since the FPCA uses basis functions, the fit should improve after taking the log of power. Going back through the initial attempt, there was heteroskedasticity in the residuals, with errors increasing at long durations. Sure enough, things improved after taking the log, so here is the updated post. The cleaned-up code will be put up by Mark Liversedge in keeping with the open-data collaborative approach.

Getting back to the post at hand, the GC data has 2,445 data sets with at least 100 power files and MMP extending past 7,200 seconds. These files were used for the analysis, all others were excluded. I ran the analysis in R using the fdapace package. The purpose of functional principal component analysis is to describe time series data in the fewest number of independent functions.
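The only change from the first attempt is the log transform on power before the FPCA call (again using the hypothetical pd data frame from the earlier sketch), with fits exponentiated back to watts for plotting:

inputs   <- MakeFPCAInputs(IDs = pd$id, tVec = pd$secs, yVec = log(pd$watts))
fpca_log <- FPCA(inputs$Ly, inputs$Lt)

# back-transform the mean function to watts for display
plot(fpca_log$workGrid, exp(fpca_log$mu), type = "l",
     xlab = "Duration (s)", ylab = "Power (W)")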

The data:

The results of the analysis:

The scree plot shows that PC1 (the first function) accounts for about 75% of the variability in the data. PC2 is still notable at 15% of the variability, and PC3 falls off to 2.5%. For our purposes we stop after PC3 because from there the improvements in fit are trivial and the functions become less intuitive or interpretable in terms of what they represent performance-wise.

The mean function has the familiar sigmoid shape of the power-duration curve in cyclists. It is the starting point for the fits, to which PC1, PC2, and PC3 are added or subtracted to optimize the fit.

Looking at PC1, 2, and 3 may or may not be intuitive at first glance, so mode-of-variance plots are generated to help visualize how each changes the fit.

PC1 essentially raises or lowers the curve in a manner that is slightly greater at shorter than longer durations. PC1 can be understood as an overall ability: higher PC1 indicates superior performance ability and lower PC1 indicates lower performance ability.

PC2 captures the anticorrelation between sprint and endurance ability, or twitchedness (the ratio between fast-twitch and slow-twitch motor units). Higher PC2 values indicate greater slow-twitch and endurance ability, while lower values indicate greater fast-twitch and sprint ability.

PC3 appears to capture what might be called sprint endurance. Higher values indicate poorer sprint endurance (possibly anaerobic glycolysis or W′) relative to all-out sprint and pure endurance, while lower values indicate superior sprint endurance relative to all-out sprint and endurance abilities.

Internal validation of the fits looks very good:

Note that the mean percent residual is near zero and flat with no weird hops or kinks. The blue line is 2 standard deviations of the percent residuals of model fits using the PC1, PC2, and PC3, while the red line uses just PC1, and PC2.
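One way the percent residuals could be computed, assuming the observed log-power curves have been interpolated onto the FPCA work grid as a matrix obs_log (rows = athletes; that helper step is omitted here). The reconstruction is just the mean plus the score-weighted eigenfunctions:

K <- 3
recon_log <- sweep(fpca_log$xiEst[, 1:K] %*% t(fpca_log$phi[, 1:K]), 2, fpca_log$mu, "+")
pct_resid <- 100 * (exp(obs_log) - exp(recon_log)) / exp(recon_log)

# mean and ~2 SD envelope of the percent residuals by duration
plot(fpca_log$workGrid, colMeans(pct_resid), type = "l", ylab = "% residual")
lines(fpca_log$workGrid, colMeans(pct_resid) + 2 * apply(pct_resid, 2, sd), col = "blue")
lines(fpca_log$workGrid, colMeans(pct_resid) - 2 * apply(pct_resid, 2, sd), col = "blue")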

An outlier plot….

And how do the fits actually look on the individual level (below are 120 fits selected by random in R to give you some sense of how the individual fits look):

These fits look fantastic, particularly when you consider that the model describes a span from 1–7,200 s with just 3 fitted parameters (and would work nearly as well with just 2).

Some limitations of this approach (among others) are that we are not explicitly specifying a physiologic model, so the model will at times produce physiologically implausible fits (for example, a span where power increases at longer durations) for some poor-quality data points. There isn’t really a way to prevent that other than maybe to throw a warning for non-monotonic predictions. Also, the quality of the model depends on the data set used to generate it. It may not generalize well to athlete phenotypes not represented in the reference data set. Likewise, interpretation of the PC values is relative to the reference data set. For example, you can say an athlete is in the 90th percentile for PC1, but not that the athlete has a critical power of 300 watts. The other thing I should have done here is to reserve a subset of the data for cross-validation, or write a loop to check a leave-one-out cross-validation.

 

 


The small volume blood transfusion study


This study, Time Trial Performance Is Sensitive to Low-Volume Autologous Blood Transfusion by Bejder et al, has drawn some attention with the shocking finding that “ABT of only ~135 ml of RBCs is sufficient to increase mean power in a 650 kcal cycling time trial by ~5% in highly trained men.”

But you have to dive into a couple of details; the most relevant is that they took out 900 ml (2 units) of whole blood (washed it, spun it, stored it) and then, 4 weeks later, transfused back half a unit prior to a time trial (TT). The result of the TT alone should cause some concern about the study design:

The first thing to notice is that TT performance was not better in the low-volume transfusion condition; it was the same as placebo (223 vs 224 W). And note that the baseline for the transfusion condition is lower than placebo (213 vs 223 W). The baselines should not be different, as this is a cross-over design study, so it’s the same subjects going through both conditions.

The authors report a true but somewhat misleading key finding, “The mean power was increased in time trials from before to after transfusion (P<0.05) in BT (213±35 vs. 223±38 W; mean±SD) but not in PLA (223±42 vs. 224±46 W),” and come to the wrong conclusion “CONCLUSION ABT of ~135 ml of packed RBCs increased mean power on a 650 kcal cycling time trial by ~5%.”

What they should be pointing out is something like, “eh we sort of screwed up the study design and we made the subjects anemic and didn’t allow sufficient time for the anemia to correct before giving back 1/4 of the RBCs we took out.”

 

Check out the reticulocyte percentage prior to the transfusion: 1.62, still well up from the control values (1.07, 1.18, 1.14), so the body is still churning away making new RBCs to try to correct the anemia.

 

Sure enough, the direct measure of Hb mass is still down by 5% (45 g) just prior to the low-volume transfusion (or down 35 g compared to the placebo condition), which correlates well with the baseline TT power being down by 5%.

What the authors therefore found is that 5% ANEMIA results in a 5% decrement in TT performance. Makes sense.

So my question here is this: Did the authors actually get the participants to a super-physiologic state with their low-volume transfusion?

Backing up now to do some math:

The Hb concentration of the withdrawn blood was 150 g/L and they took out 900 ml, which works out to 135 g. They transfused back 1/4 of that, about 34 g.
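For anyone checking the arithmetic:

hb_conc   <- 150                     # g/L, Hb concentration of the withdrawn blood
withdrawn <- 0.9                     # L, the 900 ml of whole blood removed
hb_removed  <- hb_conc * withdrawn   # 135 g of Hb taken out
hb_returned <- hb_removed / 4        # ~34 g given back in the half-unit transfusion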

Hmm…

So a low-volume blood transfusion “doping” study never actually dopes the subjects but only just gets them back to normal physiology.

The problem with that (besides my grammar here) is that the effect of correcting anemia may not represent what happens when Hb mass is raised to a super-physiologic level. It makes sense that there will be some benefit, but it also makes sense that the potential benefit saturates somewhat when going super-physiologic versus correcting anemia.

Likewise, these “highly trained men” are only putting out about 220 watts for about 40 minutes. It makes sense that the benefit for a true elite athlete might saturate somewhat compared to these (sub-cat 4/5) participants.

How much benefit a half-unit transfusion would give out in the real world is a question not really addressed by this study. Maybe it indicates a ceiling to the effect around 5% or less.


Part 3: Reference ranges for the FPCA model Golden Cheetah open data


This is the third post on applying functional principal component analysis to the open GC project power duration data. Part 2 explains the approach and the resulting model. The purpose of this post is now to make the model and data useful.

The model itself outputs a score for each principal component. That score is hard to interpret in isolation other than maybe looking at change over time for a given athlete. One way to make the score easier to interpret is to convert it to a percentage. To do this the first step was to look at the distributions of the scores and see if they were normal:

 

On the first principal component (PC1) the scores are skewed a little. I did try some basic transformations but they didn’t get the data looking sufficiently more normal to warrant the added complexity of converting (in my opinion anyway). PC2 and PC3 looked fine. So to get a percentage we can convert each PC to a z-score so that the mean is 0 and the standard deviation is 1. From there the pnorm function in R will give the percentile of the z-score. The percentile is still relative to the reference data set, but it is intuitive. In terms of the scores, PC1 gives the overall ability compared to the mean function. PC2 gives the ratio of type 1 (endurance) to type 2 (sprint dominance). For ease of interpretation, below I report both type 1 and type 2 separately even though one can be solved from the other: type1 = 1 – type2. PC3 is similarly a ratio that I interpret as anaerobic endurance vs sprint and endurance, which seems most straightforward to report as a single anaerobic value.
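A minimal sketch of that conversion, assuming the score matrix comes from the FPCA object of Part 2 (and noting that the sign of each eigenfunction, and therefore which side of PC2 maps to type 1 vs type 2, is arbitrary):

scores <- fpca_log$xiEst[, 1:3]   # athlete scores on PC1-PC3
z      <- scale(scores)           # mean 0, SD 1 per component
pctile <- pnorm(z)                # percentile of each athlete on each PC, relative to the reference set

type1 <- pctile[, 2]              # endurance side of PC2 (depends on the fitted sign convention)
type2 <- 1 - type1                # sprint-dominance side, type1 = 1 - type2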

To see how this looks, below are sets of 3 randomly selected fits with the power-duration data in light gray circles, the fit line in red, and the mean reference function in black. Next to each PD plot is a corresponding radar chart showing overall ability and then the balance of relative abilities. Note that the radar chart is a bit imperfect in a couple of ways. The first is that it mixes the overall ability percentile with ratio percentiles. What I mean by that is that type 1 and type 2 show the individual’s relative balance given their overall ability rather than their specific ability versus others. For example, an athlete may be .99 on type 2 yet still be a terrible sprinter if their overall ability is very low. The other issue is treating anaerobic as independent from type 1 and 2. For most people this shouldn’t matter, as the magnitude of scores on PC3 is much lower than on PC2. But for outliers with a very high PC3 (anaerobic) it should squeeze in both type 1 and 2 on the radar chart a little, and vice versa. I think this issue could be corrected with some math, but since PC2 accounts for 15% of the variability and PC3 only 2.5%, the correction would likely be trivial in cases other than extreme outliers on PC3. For most people the model and percentile values should be quite usable as is.

 

(update: it looks like we should be able to implement the model in GC as an Rchart)


Do Cyclists Cluster into Phenotypes?


This post is a quick one asking whether the GC Open Data Project cyclists cluster into distinct phenotypes. I used the 2nd and 3rd principal component scores from the 3-component model (see Part 2 and Part 3).

The short answer is that it doesn’t look that way, to me anyway. I don’t see any worthwhile clustering to suggest that a single category label such as sprinter or all-arounder would describe relative abilities better than the radar charts. If needed, cutoffs could be chosen in some statistically sound manner. But in this case it’s better to just let the data speak for itself.
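For what it’s worth, a quick way to eyeball this, assuming the same FPCA scores as before: plot the standardized PC2/PC3 cloud and check whether a k-means within-cluster sum of squares curve shows any elbow (a smooth decline suggests one continuous cloud rather than distinct phenotypes).

pc23 <- scale(fpca_log$xiEst[, 2:3])
plot(pc23, xlab = "PC2 (z-score)", ylab = "PC3 (z-score)", pch = 16, col = rgb(0, 0, 0, 0.2))

wss <- sapply(1:6, function(k) kmeans(pc23, centers = k, nstart = 20)$tot.withinss)
plot(1:6, wss, type = "b", xlab = "k", ylab = "total within-cluster SS")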

 


Accessing the Golden Cheetah OpenData


This post is a first for what I hope will be a productive new direction for sharing my tinkering and research. Over the past month or so I have dived back into Python and am learning how to use Jupyter notebooks and Github. For those not familiar (such as me up until very recently), Jupyter notebooks are a hybrid between an internet browser and an integrated development environment. The point is that you can share both the underlying code and the output in one user- and internet-friendly way. For someone coming from zero computer science or data science background the process is a bit uv a muther-fugger, but it is doable with the help of internet searches, free MOOCs, and the generosity of people willing to help one through some really dumb/embarrassing and time-consuming mistakes (see my reported (non-)issue for the OpenData library).

The idea behind sharing:

  1. The Golden Cheetah OpenData project is intended to foster open collaboration.
  2. My personal progress has always been far faster when I’ve engaged the Flock for feedback.
  3. Maybe what I do will be useful to someone else.

Below is a Github gist of the notebook I am working on in order to access the OD ride files to set up a data set for subsequent analysis.



Exploring interval detection with the ruptures library

What is an interval?


It’s like porn: difficult to define, but you know it when you see it…

Let’s take a vote:

Example 1

or

Example 2

Flock, do your thing and make answers.


EPO (sea) Levels the Playing Field; Who Would’ve Thought

Cobo Athlete Biological Passport Visualization and Discussion


It’s blood data, so you know I have to take a look. The UCI broke its long biopassport sanction drought by going after a retired rider on a minor team best known for blowing things up at the Vuelta. The upshot is they handed the red (formerly gold) jersey to Froome, who rode said Vuelta after a failed sale to Bruyneel and got shuffled into the starting roster when Sky ran out of other riders to slot into the race.

Above are my recreations of what the ABP would look like using the rolling mean and 2.3 SDs (99th percentile cuts) calculated up to but excluding the plotted time point, i.e., what the ABP software would have pinged/not pinged at the time of the sample. See my work here: https://drive.google.com/file/d/1DHANbgQbOv3tbTsEOmh4uxKzAjiVK0vJ/view?usp=sharing

For the rationale of how a z-score model (the plotted thing above) effectively replicates the ABP once you get more than a handful of points see our paper:

This assumption is supported by previous work showing that z-score thresholds generated from an individual athlete’s data alone converge with the ABP model thresholds and demonstrate comparable classification performance once both models are trained on sufficient baseline data (Sottas et al., 2007).  https://www.frontiersin.org/articles/10.3389/fphys.2018.00643/full

Also, note that the plot for reticulocytes shows the square-root values. This transformation is necessary to normalize the distribution so that the SD gives the same percentiles above and below the mean. And sub-note that the ABP software uses Hgb and the OFF-score but not the reticulocyte percentage. The reticulocytes, however, are a go-to for the experts reviewing the data, as they are not affected by the plasma volume swings that occur with things like a grand tour.
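Here is a sketch of the prospective limit calculation described above (my own simplification, not the actual ABP software): for each sample the mean and SD come only from the samples before it, with ±2.3 SD cuts, and the reticulocyte percentage is square-root transformed first.

zscore_limits <- function(x, k = 2.3, min_n = 3, transform = identity) {
  # x: marker values in chronological order
  y <- transform(x)
  upper <- lower <- rep(NA_real_, length(y))
  for (i in seq_along(y)) {
    if (i > min_n) {                      # need a few baseline samples first
      m <- mean(y[1:(i - 1)])
      s <- sd(y[1:(i - 1)])
      upper[i] <- m + k * s
      lower[i] <- m - k * s
    }
  }
  data.frame(value = y, lower = lower, upper = upper)
}

# e.g. zscore_limits(hgb) for haemoglobin, zscore_limits(retic_pct, transform = sqrt) for reticulocytes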

Disclaimer: I am not making promises that I didn’t make any transcription errors, but I did my best.

The UCI/WADA summed up its position as:

The ABP in the case at hand is based on the Expert Panel’s initial evaluation of 38 valid samples,12 the documentation of which was included as evidence in the UCI’s submissions. As reported by the Expert Panel, the main important abnormalities in the Rider’s profile are (i) the significant variability of haemoglobin concentration (HGB) with a 95% sequence abnormality, (ii) the variability of reticulocytes (0.28-1.43%), which, according to the Expert Panel, is “above the physiological range, with 99% sequence abnormality” and (iii) the 92% sequence abnormality of the OFF-score values, including several high values (122 in sample 46, 121 in sample 17, 120 in sample 18, 117 in samples 16, 31, 38, 42 and 47). see: https://www.uci.org/docs/default-source/clean-sport-documents/anti-doping-tribunal/uci-adt-03.2018-uci-v.-mr-juan-jos–cobo-acebo.pdf

So what I find interesting, assuming that I didn’t screw things up, is that the software, in real time, would not have pinged a beyond-threshold value until the low Hgb on July 16, 2012, which doesn’t seem to feature in the decision. Instead, their only sanctionable finding was:

… wait for it

… wait for it

they didn’t actually have one as far as I can tell according to WADA rules, which are kindly summarized for ABP data in the “Factual Background” section:

Haematological data is considered atypical if 1) a haemoglobin (HGB) and/or OFF-score (OFFS) marker value falls outside the expected intra-individual ranges, with outliers corresponding to values out of the 99%-range (0,5 – 99,5 percentiles) (1:100 chance or less that this result is due to normal physiological variation) or 2) when sequence deviations (a longitudinal profile or marker values) are present at specificity of 99,9% (1:1000 chance or less that this is due to normal physiological variation).


See, the issue is that the most wildly abnormal thing in Cobo’s passport is the 0.28 reticulocyte percentage value on 9/26/2019. To get that low you are talking about coming off old-school doses of EPO or a full bag or two of packed red blood cells. However, according to my most recent reading of the WADA code, the reticulocyte values can only be used as supporting evidence. The main line of evidence from the ABP must be either the Hgb or Off-score, with values or sequence outside of 99% or 99.9% respectively. Cobo only got to 95% and 92% sequence abnormality on the Hgb and Off-score respectively.

This case then raises two questions:

  1. Is “expert (gestalt) opinion” now good enough to sanction a rider?
  2. How does Cobo’s full-tilt doping not flag the passport?

Question 1, hopefully someone with legal side knowledge can chime in on.

Question 2, the answer is this:

If you dope, dope consistently. Why? The model only knows what you teach it, again from the frontiers paper:

Interestingly, neither the performances (Figure 3C) nor the parameter estimates (Figures 3D,E) for “doped” 2008 fell outside the prediction intervals. This result highlights a limitation in “passport-type” detection methods in which the “doped” 2007 data were included in the model training and biased the means and increased the variance such that the “doped” 2008 performances and parameter estimates were not statistically detected.

(Image reproduced for educational purposes only.)

It’s a problem that I’ve wondered about quite publicly for a while without much response from the official anti-doping community, and more formally illustrated with the easter-egg line/figure in the Frontiers paper. Now we have a real, live, in-the-flesh illustration. Satisfying to be proven right by a GT winner, but sad for cycling/sports etc.

Cheers.


Athlete Biological Passport Standard Deviation?


When I drill down into data it’s no surprise when small errors and discrepancies pop up. But the thing that jumped out when drilling down into the Cobo ABP data http://veloclinic.com/cobo-athlete-biological-passport-visualization-and-discussion/ was that the Z-score model was apparently producing far wider cutoffs than the actual ABP. Jeroen Swart pointed this out to me (hopefully I’m not getting him in trouble by dragging him into this).

Doing some quick cutoff hacking, I found the Z-score model converged with the ABP output on the example he tweeted if I used 1 standard deviation cutoffs rather than 2.3 standard deviation cutoffs. See my Cobo post for the quick rationale why the Z-score and ABP models should basically converge given sufficient data points.

Interest piqued, I grabbed some published examples and the first to take on is Zorzoli and Rossi Figure 2 https://onlinelibrary.wiley.com/doi/full/10.1002/dta.173

Figure reproduced for educational purposes only.

So first off, the numbers are hard to read due to poor resolution and overlap, so I did my best to reproduce them. Not getting the numbers quite right could affect the work below. Then I plotted the figure data with the Z-score model with 1 standard deviation cutoffs overlaid to see how they compare.

From the plots it’s clear that the models converge very closely on the OFF score and reticulocyte % and fairly well on the Hgb.

This convergence is a problem for the paper because the paper uses this figure as an example of a likely doped profile:

ABP profile of an athlete considered as suspicious

And states:

In these profiles the Bayesian adaptive model has identified the Hb or Off‐hr score abnormal with a 99% probability (either for the single measurement as a function of previous results or for the complete sequence) or with normal or lower levels of probability.

Meaning that the figure is showing points that are outside the 99th percentile, i.e., outside roughly 2.3 standard deviations.

Recall, however, that I am using 1 standard deviation (about 68% coverage) as the cutoffs for the Z-score model, and that the Z-score and ABP models should converge.

Given that the ABP software is not publicly available, I can’t confirm what it does statistically to generate the figure used by Zorzoli and Rossi, but I can show my work for the Z-score model: https://drive.google.com/file/d/1YqcRHieehucKumXG9QhcjAsC34Vnd7OH/view?usp=sharing

The question is whether this is a one-off issue of stat hacking in a couple of figures used for “illustrative” purposes, or whether the ABP black-box output has not been sufficiently vetted/replicated.

Or is something else entirely going on? For example, the published literature on the ABP says the cutoffs are based on specificity rather than probability; is there some undisclosed doping “prevalence” being passed into the ABP model which happens to work out to probability cutoffs with tighter bounds?

Don’t know, either way interesting…

Thanks for paying attention, cheers.


Development and field validation of an omni-domain power-duration model


Michael J. Puchowicz, Jonathan Baker & David C. Clarke (2020) Development and field validation of an omni-domain power-duration model, Journal of Sports Sciences, DOI: 10.1080/02640414.2020.1735609

Purpose: To validate and compare a novel model based on the critical power (CP) concept that describes the entire domain of maximal mean power (MMP) data from cyclists.

Methods: An omni-domain power-duration (OmPD) model was derived whereby the rate of W′ expenditure is bound by maximum sprint power and the power at prolonged durations declines from CP log-linearly. The three-parameter CP (3CP) and exponential (Exp) models were likewise extended with the log-linear decay function (Om3CP and OmExp). Each model bounds W′ using a different nonconstant function, W′eff (effective W′). Models were fit to MMP data from nine cyclists who also completed four time-trials (TTs).

Results: The OmPD and Om3CP residuals (4 ± 1%) were smaller than the OmExp residuals (6 ± 2%; P < 0.001). Wʹeff predicted by the OmPD model was stable between 120–1,800 s, whereas it varied for the Om3CP and OmExp models. TT prediction errors were not different between models (7 ± 5%, 8 ± 5%, 7 ± 6%; P = 0.914).

Conclusion: The OmPD offers similar or superior goodness-of-fit and better theoretical properties compared to the other models, such that it best extends the CP concept to short-sprint and prolonged-endurance performance.

 

 


Visualizing U.S. COVID-19 Data: Are we flattening the curve?


A critical question when looking at the daily rate of new US COVID-19 cases is whether we are flattening the curve. I haven’t found an adequate visualization that accounts for a major confounder: Is the rate of new case growth slowing, or is our ability to test saturating?

To get a handle on the question, the plot below shows new deaths / new cases with the CDC assumption that the lag between diagnosis and death is 13 days. This ratio is plotted as a percent (red) against the cumulative confirmed cases. Cases confirmed less than 13 days ago are assumed active (blue) and plotted by extrapolating out the most recent deaths/cases percent value. (Edit: I changed to a 10-day lag given that most testing results are probably still not same-day.)
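A sketch of how those two quantities could be computed, assuming a hypothetical daily data frame us with columns new_cases and new_deaths in date order and the 10-day lag:

lag_days <- 10
n <- nrow(us)

# deaths today divided by cases diagnosed lag_days ago, as a percent (the red line)
ratio_pct <- rep(NA_real_, n)
idx <- (lag_days + 1):n
ratio_pct[idx] <- 100 * us$new_deaths[idx] / us$new_cases[idx - lag_days]

# cases confirmed within the last lag_days are assumed still active (the blue span)
cum_cases <- cumsum(us$new_cases)
active <- cum_cases - c(rep(0, lag_days), head(cum_cases, -lag_days))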

To interpret this chart, we need to consider what would happen under a couple of different scenarios:

  1. New infection rates are actually decreasing; the red deaths/cases ratio would probably trend downward toward some expected case fatality rate, and the remaining active cases in blue would decrease. This would indicate a decreasing span between the current number of cumulative cases and what that number was 13 days ago, and that testing is catching up so that you get a realistic rather than an inflated case fatality rate.
  2. New infection rates are not decreasing but testing can’t keep up; the blue span of active cases would decrease but the deaths/cases ratio would go up. The increase in the deaths/cases ratio would be the tip-off that testing is falling behind, artificially shrinking the blue span.
  3. New infection rates are not decreasing but testing is catching up; the deaths/cases ratio would decrease but the blue span would continue to increase. The decrease in deaths/cases would be the clue that testing is catching up despite the widening span of active cases.

In effect the blue span of active cases tells us how far we still have to go, while the red deaths/cases ratio tells us whether this distance is over or under representative.

Scenario #1 is what we hope for and would allow better forecasting of the likely deaths. Scenario #2 would indicate that deaths are likely to be worse than might be expected from the data at face value. Scenario #3 is not good, but at least a more realistic forecast of the likely deaths could be made.

 



Three ICU transfers and one death reported in hydroxychloroquine treatment group


https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7102549/

An under-discussed part of the Gautret 2020 study is the somewhat buried data showing that the hydroxychloroquine treatment group had 6 participants “lost to follow up” (a term referring to participants dropped or dropping out of the study) while the control group had no participants lost to follow up. The reasons for loss to follow up in the hydroxychloroquine treatment group included 3 transfers to the ICU and 1 death.

I have just sent out the email below to Dr Raoult to seek clarification:

Dr Raoult,
I read your study on hydroxychloroquine treatment of COVID-19 with great interest.
The reported results are very promising regarding viral clearance but I am concerned about 4 participants lost to follow up in the treatment group; 3 ICU transfers and 1 death. I am writing to clarify whether I am reading the study correctly. This information indicates that the treatment group had 1 death and the control group had no deaths, is this correct? Also, were all ICU transfers “lost to follow up”? That is, were there 3 ICU transfers in the treatment group but none in the control group?
If my reading of this study is correct, then does this not indicate a risk for a more severe COVID-19 course with hydroxychloroquine treatment? I realize that the groups were not well matched and the older age in the treatment group may explain the severe outcomes. However, given the potential harm suggested by this aspect of the study, is caution not warranted?
Sincerely,
Mike

I would prefer to wait for clarification from the author before pushing this issue publicly. However, given the fast moving nature of the COVID-19 response and White House guidance pushing this drug I feel an early alert for caution is necessary.



Peronnet and Thibault; the source model under the hood of WKO4/WKO5

$
0
0

I’ve raised the issue before with Andy Coggan. He has not given proper attribution to Peronnet and Thibault as the source model for his own model.

The Peronnet and Thibault model is a reasonable model that is an extension of Ward-Smith, which is a modification of Lloyd, which is an extension of Hill.

Hill:

P(t) = AWC / t + MAP

AWC is anaerobic work capacity

MAP is maximal aerobic power

Lloyd:

P(t) = AWC / t * (1-exp(-t/tau)) + MAP

tau is the time constant of the exponential function. In plain terms, what this function says is that AWC is not instantly available but increases exponentially with time, so that the maximum power is constrained.

Ward-Smith:

P(t) = AWC / t * (1-exp(-t/tau)) + MAP * (1-exp(-t/tau2))

tau2 is again a time constant of the exponential function. In plain terms Ward-Smith again said that MAP is not instantly available but instead must increase with oxygen kinetics. If you do the math:

tau = AWC/Pmax (almost, there is technically a tiny MAP contribution since the exponential function does not start from zero so that starting value should be subtracted from Pmax).

Peronnet Thibault:

P(t) = AWC / t * (1-exp(-t/tau)) + MAP * (1-exp(-t/tau2)); t </= Tmap

P(t) = AWC / t * (1-exp(-t/tau)) + MAP * (1-exp(-t/tau2)) – a*Ln(t/Tmap); t > Tmap

a is the slope of the decline in MAP, which decreases log-linearly starting at t > Tmap. Tmap is the longest duration that MAP can be sustained.

Coggan WKO4/5:

P(t) = FRC / t * (1-exp(-t/tau)) + FTP * (1-exp(-t/tau2)); t </= TTE

P(t) = FRC / t * (1-exp(-t/tau)) + FTP * (1-exp(-t/tau2)) – a*Ln(t/TTE); t > TTE

As noted above, tau can be substituted with FRC/Pmax. The practical difference between the two models is that P&T specified Tmap as a fixed parameter (420 s) while it is a fitted parameter in WKO. (As an aside, to be consistent with oxygen kinetics, which are indeed described by an exponential rise in response to high-intensity exercise, the model should actually use the integral of the function above if the modeler were working from first principles as claimed. Unfortunately, when the integral function is used the model does not perform well.)
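Writing the Peronnet-Thibault form out directly from the equations above makes the equivalence easy to check; in the WKO parameterization AWC, MAP, and Tmap become FRC, FTP, and TTE. The parameter values below are arbitrary, just for illustration.

pt_power <- function(t, AWC, MAP, tau, tau2, a, Tmap) {
  p <- AWC / t * (1 - exp(-t / tau)) + MAP * (1 - exp(-t / tau2))
  ifelse(t > Tmap, p - a * log(t / Tmap), p)   # log() is the natural log, i.e. Ln
}

t <- 1:7200
plot(t, pt_power(t, AWC = 20000, MAP = 300, tau = 25, tau2 = 30, a = 30, Tmap = 420),
     type = "l", log = "x", xlab = "Duration (s)", ylab = "Power (W)")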

To illustrate that the models are in fact mathematically the same here is an overlay of P&T on top of the output from WKO:

To show that there are actually two models here, I offset P&T slightly:

As an example of how this model can get janky with very low AWC and long tau2 (causing the anaerobic power to fall off before the aerobic power ramps up):

Looks a bit ridiculous with model output (red) actually dipping, which is clearly not physiological as MMP is by definition monotonically decreasing. Likewise, the sharp inflection at Tmap indicates very sub-maximal long duration data.

To be clear, I am not intentionally putting bad numbers into the model to make it look bad; rather, I am just reproducing the curves off a WKO5 help page:

Here are some links to a spreadsheet with the P&T model and a ppt with these last two images overlaid.

Note, there is nothing wrong with using source models and prior work. It is what we did to arrive at a not very different end, source functions cited and all.

