Revisiting the maintenance of wakefulness test: from intra-/inter-scorer agreement to normative values in patients treated for obstructive sleep apnea.
Pierre Tankéré, Jacques Taillard, Marc-Antoine Armeni, Thierry Petitjean, Christian Berthomier, Mélanie Strauss, Laure Peter-Derex. Published in: Journal of Sleep Research (2023)
The Maintenance of Wakefulness Test is widely used to objectively assess sleepiness and to make safety-related decisions, but its interpretation is subjective and its normative values remain debated. Our work aimed to determine normative thresholds in subjectively non-sleepy patients with well-treated obstructive sleep apnea, and to assess intra- and inter-scorer variability. We included the maintenance of wakefulness tests of 141 consecutive patients with treated obstructive sleep apnea (90% men; mean (SD) age 47.5 (9.2) years; mean (SD) pre-treatment apnea-hypopnea index 43.8 (20.3) events/h). Sleep onset latencies were independently scored by two experts. Discordant scorings were reviewed to reach a consensus, and half of the cohort was double-scored by each scorer. Intra- and inter-scorer variability was assessed using Cohen's kappa at mean sleep latency thresholds of 40, 33, and 19 min. Consensual mean sleep latencies were compared between four groups defined by subjective sleepiness (Epworth Sleepiness Scale score < 11 versus ≥ 11) and residual apnea-hypopnea index (< 15 versus ≥ 15 events/h). In well-treated non-sleepy patients (n = 76), the consensual mean (SD) sleep latency was 38.4 (4.2) min (lower normal limit [mean − 2 SD] = 30 min), and 80% of them did not fall asleep. Intra-scorer agreement on mean sleep latency was high, but inter-scorer agreement was only fair (Cohen's kappa 0.54 for the 33-min threshold and 0.27 for the 19-min threshold), resulting in a change of latency category in 4%–12% of patients. A higher sleepiness score, but not the residual apnea-hypopnea index, was significantly associated with a lower mean sleep latency. Our findings suggest a higher normative threshold (30 min) than is usually accepted in this context and emphasise the need for more reproducible scoring approaches.
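The agreement analysis described above places each patient's mean sleep latency into a category at a given threshold and compares the two scorers with Cohen's kappa. The normative limit is the mean minus two standard deviations: 38.4 − 2 × 4.2 = 30.0 min. The Python sketch below shows both calculations. It rests on several assumptions. A patient is taken to be "at or above threshold" when the latency is ≥ the cut-off, because the abstract does not say how ties are handled. The latencies are made-up illustration values, not study data. The helper names `dichotomize`, `cohen_kappa`, and `lower_normal_limit` are hypothetical.

```python
"""Hypothetical sketch: inter-scorer kappa on dichotomized MWT latencies and a mean - 2 SD limit.

Assumptions (not taken from the study): latencies are in minutes, a patient is
labelled 1 when the mean sleep latency is >= the threshold and 0 otherwise, and
the example latencies below are invented for illustration only.
"""


def dichotomize(latencies, threshold):
    """Return 1 where the mean sleep latency is >= threshold (min), else 0."""
    return [int(msl >= threshold) for msl in latencies]


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length sequences of category labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    # Observed agreement: share of patients both scorers put in the same category.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each scorer's marginal category frequencies.
    p_e = sum((a.count(k) / n) * (b.count(k) / n) for k in set(a) | set(b))
    if p_e == 1.0:
        # Kappa is undefined here (both scorers used one identical category);
        # returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def lower_normal_limit(mean_msl, sd_msl):
    """Lower normal limit defined as mean - 2 SD (minutes)."""
    return mean_msl - 2 * sd_msl


if __name__ == "__main__":
    # Invented per-patient mean sleep latencies from two scorers (not study data).
    scorer_a = [40.0, 35.5, 28.0, 18.0, 40.0, 31.0]
    scorer_b = [40.0, 32.0, 28.5, 20.5, 40.0, 34.0]
    for threshold in (40, 33, 19):
        k = cohen_kappa(dichotomize(scorer_a, threshold), dichotomize(scorer_b, threshold))
        print(f"{threshold}-min threshold: kappa = {k:.2f}")
    # Summary statistics reported for well-treated non-sleepy patients.
    print(f"lower normal limit = {lower_normal_limit(38.4, 4.2):.1f} min")
```

With these invented values, the 19-min comparison gives a kappa of 0 even though the scorers agree on five of six patients. When one category is rare, chance agreement is already high. This is a general property of kappa and is not a result of the study.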