I wake up at 7AM, alarm clock blaring. As I get to my feet, I look around frantically. Just a moment earlier, I had been hunched in a bunker, preparing for rocket launch as the bomb sirens blared.
I had been dreaming.
Blinking, I realize I am in my bedroom. And the rocket sirens? My alarm clock.
My tensed shoulders relax and I exhale.
After going to the bathroom and grabbing coffee, I sit down at my computer, beginning my morning ritual of checking Twitter–and my Oura ring’s sleep tracking data.
Throughout the entire sleep cycle, the Oura ring had been tracking every heartbeat and every hand movement. And because the heart’s activity is modulated by the vagus nerve, so the theory goes, the Oura ring can track brain activity by tracking heart activity. And according to company claims, by tracking the brain’s activity via heart activity, it can track sleep stages.
As I look down at my Oura ring’s data about my sleep stages on my smart phone, I see a depressingly familiar sight:
- A sleep-scape pockmarked by white spikes, indicating night-time waking events–I was awake for nearly 2 hours when I thought I had been sleeping
- 16 minutes of total REM sleep out of more than eight hours in bed “asleep”
- An impressive near-3 hours of deep sleep. Well, that’s nice at least
Great. I have a serious sleep disorder.
Then I look at my overnight heart rate:
Not bad, except for the frequent gigantic spikes. (My wife claims that I often engage in somnolent, heated invective against dream opponents.)
Was that me thrashing about? Awake or asleep? Was I punching the air in the face? I muse.
Then I remember my dream and see that while I was dreaming, the ring recorded me as awake.
Hmm, I think, my brow furrowed.
This calls for PubMed
So I start trawling through PubMed. Here is what I found.
The Oura ring has been compared to polysomnography–the gold standard in sleep staging. While the company boasts that the ring is “scientifically validated” for sleep staging, we should use that term rather loosely. Here, scientifically validated just means scientifically studied. The ring actually pretty much sucks for sleep staging.
Here is a graph from a “validation” paper (link):
On the X-axis is the gold standard of polysomnography, and on the Y-axis is the deviation. N3 is deep sleep, and REM is, well, REM. What we see for N3 deep sleep is a nearly 200 minute range of deviation around the actual gold standard value. And these blue dots are not clustering around the 0 on the Y-axis with just a few outliers. No, most of the blue dots are significant outliers.
The same goes for REM sleep. In fact, for two of the subjects the ring reported 3 hours less REM sleep than they actually got. If one of those subjects was me, and I was receiving such values on a consistent basis, then my sleep architecture might in reality be dysfunctional for getting too much REM rather than too little.
In other words, without knowing which blue dot I am, I have no idea how good or bad my sleep actually is.
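To make the blue-dot problem concrete, here is a minimal Python sketch of how a deviation plot like this is commonly summarized: an average bias plus "limits of agreement" (a Bland-Altman style summary, which I am assuming is roughly what the figure shows). The five pairs of numbers are made up for illustration and are not taken from the study.

```python
from statistics import mean, stdev

def agreement_summary(device_minutes, psg_minutes):
    """Return (bias, lower_limit, upper_limit) for device minus PSG, in minutes."""
    # Per-subject deviation: what the device reported minus the gold standard.
    diffs = [d - p for d, p in zip(device_minutes, psg_minutes)]
    bias = mean(diffs)   # average over- or under-estimate
    sd = stdev(diffs)    # sample standard deviation of the deviations
    # Conventional 95% limits of agreement: bias +/- 1.96 standard deviations.
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical REM minutes for five subjects (illustrative only).
psg_rem = [90, 100, 110, 120, 130]
ring_rem = [150, 40, 170, 60, 130]

bias, lower, upper = agreement_summary(ring_rem, psg_rem)
print(f"bias: {bias:.1f} min, limits of agreement: {lower:.1f} to {upper:.1f} min")
```

In this toy example the average bias is exactly zero, yet the limits of agreement span roughly two hours in either direction. A device can look unbiased on average and still be badly wrong for any one person.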
From the abstract of the above study, we see rather meager figures:
“From EBE analysis, ŌURA ring had a 96% sensitivity to detect sleep, and agreement of 65%, 51%, and 61%, in detecting “light sleep” (N1), “deep sleep” (N2 + N3), and REM sleep, respectively. Specificity in detecting wake was 48%.”
Specificity in detecting wake was 48%! If this were a medical test, it would never be approved by the FDA.
A specificity of 48% means that when someone is actually awake according to polysomnography, the ring correctly scores them as awake only 48% of the time. The other 52% of the time, it records them as asleep.
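For anyone who wants to see where figures like these come from, here is a small Python sketch of the epoch-by-epoch arithmetic. The epoch counts are hypothetical: I picked them only so that they reproduce the 96% sensitivity and 48% specificity quoted in the abstract. They are not the study's raw data.

```python
def ebe_metrics(sleep_as_sleep, sleep_as_wake, wake_as_wake, wake_as_sleep):
    """Epoch-by-epoch metrics, treating polysomnography (PSG) as the truth.

    Each argument is a count of epochs: the first word is the PSG label,
    the last word is the label the device assigned.
    """
    # Sensitivity: share of true sleep epochs the device scored as sleep.
    sensitivity = sleep_as_sleep / (sleep_as_sleep + sleep_as_wake)
    # Specificity: share of true wake epochs the device scored as wake.
    specificity = wake_as_wake / (wake_as_wake + wake_as_sleep)
    # A different question: of the epochs the device scored as sleep,
    # what share were actually wake?
    awake_given_scored_asleep = wake_as_sleep / (wake_as_sleep + sleep_as_sleep)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "awake_given_scored_asleep": awake_given_scored_asleep,
    }

# Hypothetical night: 800 true sleep epochs and 100 true wake epochs.
metrics = ebe_metrics(sleep_as_sleep=768, sleep_as_wake=32,
                      wake_as_wake=48, wake_as_sleep=52)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Note that specificity is conditioned on truly being awake. The chance of being awake when the device says asleep is a separate number, and it depends on how much of the night was actually spent awake.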
That is horrible.
But is it reliably bad? We don’t even know that.
In a recent interview with Matthew Walker, podcaster Peter Attia asked whether, once a baseline is established, the Oura ring is at least reliable for predicting changes in sleep. A user of the ring, Peter presumably wanted to be reassured that the ring data had some utility. Without elaborating–and I suspect to assuage Peter’s fears–Dr. Walker responded coolly in the affirmative.
But even this is not known. From my searches, nobody has ever actually scientifically studied how reliable the ring is from night to night versus polysomnography. That is to say, nobody knows whether the biases the ring shows on one night for one user are replicated the following night. Nobody knows whether what it reports as sleep is any more than a very rough estimate that changes substantially from night to night based on factors that are irrelevant to sleep.
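Here is a rough Python sketch of the kind of check that, as far as I can tell, has not been published: take one person's ring and polysomnography values across several nights and ask whether the ring's error stays put. Every number below is invented for illustration, and the function name is my own.

```python
from statistics import mean, stdev

def nightly_bias(ring_minutes, psg_minutes):
    """Ring minus PSG for each night, in minutes."""
    return [r - p for r, p in zip(ring_minutes, psg_minutes)]

# Hypothetical REM minutes for one user over four nights (illustrative only).
psg_rem = [100, 100, 100, 100]   # true REM is identical every night
ring_rem = [20, 80, 40, 100]     # the ring's estimate swings widely

bias = nightly_bias(ring_rem, psg_rem)
print("bias per night:", bias)
print("mean bias:", mean(bias))
print("night-to-night spread (SD):", round(stdev(bias), 1))
```

If the bias were the same every night, changes in the ring's numbers would mirror changes in real sleep. In this made-up case the ring shows large swings while true REM never changes, which is exactly the scenario nobody has ruled out.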
The bitter truth: all sleep trackers suck
What about compared to my other sleep tracking device: the Garmin Fenix 5S?
2 hours and 57 minutes of REM sleep! Or 11-fold more REM sleep than my Oura ring.
19 minutes deep sleep! Or 8-fold less deep sleep.
8 minutes awake! Or 14-fold fewer minutes awake.
Which one is right? The answer: they both suck. Because it turns out that many wrist sleep trackers have been validated as well. And they all suck. According to one study, the Fitbit Charge 2 is actually better than the Oura ring. Here are its data:
It still really sucks.
Again, the question isn’t even what the average agreement between these sleep trackers and polysomnography is. The question is WHICH BLUE DOT ARE WE?
Even if these trackers are, say, 60% accurate on average, that doesn’t mean they are going to be 60% accurate for us. They could be much worse for us than average. Or better. How would we know?
We cannot trust the sleep tracker’s data independent of data from a sleep lab. We cannot even trust it to be biased in a consistent manner. Because those data do not exist either.
The science is clear: if you want to track your sleep, go to a sleep lab
Sleep trackers have the veneer of science. Thus we think the results they report are meaningful. But just because someone has studied a given sleep tracker does not mean that the sleep tracker is reliable. It might be (and in the case of the Oura ring is) shown to be terribly unreliable.
The science of HR tracking of sleep phases is not merely weak. In fact, at the current stage of technology, the science shows that these trackers are demonstrably not reliable.
So unless you have access to a sleep lab that you can use for several nights over a period of time, you have zero idea how accurate your sleep tracker is for you. It might be accurate or it might be terribly inaccurate.
Nocebo is a health risk for using the Oura ring
Companies like Oura that offer sleep tracking should also be very clear about the serious if not disqualifying limitations of their technology. And I now believe that devices with sleep tracking should give the option to users to disable the sleep tracking feature.
Given the evidence of demonstrated nocebo from biomarker tracking in multiple scientific studies, the option to disable sleep tracking on these devices would be prudent indeed.
According to the above study, sleep trackers like the Oura ring can exert nocebo effects that affect our cognition and potentially our health. In the above study’s case, the nocebo affected cognition.
According to other studies, simply receiving genetic data causes our body’s physiology to change in the direction of what that genetic data would predict.
But because nocebo can also affect immunity and a diverse range of other physiological processes (for example, see Jo Marchant’s book Cure or take a look at the research of Harvard professor Ted Kaptchuk), nocebo from sleep tracking technology has the potential to cause chronic harm through its effects on stress or immunity.
When we wake up feeling good, look at our sleep tracker data and see we have had a terrible night of sleep, we might suddenly start not feeling so good–and the ring data itself might be inducing these effects.
We should demand that Oura give the option to disable sleep tracking
Therefore, given these potential negative effects on a broad scale for users receiving negative (and substantially false) sleep data, the Oura ring company should include the option to disable the display of sleep tracking data altogether.
Why do I keep using my Oura ring? Because nighttime HRV, resting heart rate, and temperature are awesome. But the sleep staging? Not so much.
Besides, even if we find out our sleep is in fact actually terrible, despite keeping a consistent schedule, etc.–what can we actually do about it? It is questionable to what degree the data–even if they were not fatally flawed–are even actionable.
Share this post and demand that Oura make sleep staging a feature that can be disabled. For many of us, it should be.
Eternal graduate student,
The myth continues to be perpetuated that while the Oura ring may not be valid on an individual level, it can track directional changes from baseline in a way that provides meaningful insight on sleep.
Changes from baseline might indicate real changes in something–sure–but they might not represent actual changes in sleep.
While I cited studies showing that the Oura ring does a poor job compared to gold standard sleep studies, there have been zero studies conducted on the correlation between changes in sleep tracker output over time and gold standard sleep studies. The assertion that the Oura ring reliably tracks meaningful changes in sleep quality relative to baseline over time rests on an assumption completely unsupported by any data at all.