We used to wonder how accurate the calorie count on treadmills and ellipticals was. (The answer: not very.)

Now we have more sophisticated devices that track our every move, but the question is still the same: How much can we trust the numbers?

Researchers at Japan’s National Institute of Health and Nutrition recently ran an impressively detailed study to compare 12 different wearable devices, including the Fitbit Flex, Jawbone UP24, Misfit Shine, Garmin Vivofit and Withings Pulse O2, to two widely accepted methods of estimating calorie burn.

The results, published in the May issue of JAMA Internal Medicine, offer some interesting insights.

The best part of this experiment, from my perspective, is that the 19 volunteers in the study wore all 12 wearables at the same time.

There’s a picture in the article of what that looked like, which is pretty hilarious. There were six wrist bands, four waist-mounted devices, and two pocket devices.

One part of the experiment had the subjects spend 24 hours in a metabolic chamber, following a standardized protocol that included “3 meals, deskwork, watching TV, housework, treadmill walking, and sleeping.” Meanwhile, their caloric expenditure was measured by carefully monitoring the chamber’s temperature, gas composition, and so on.

On average, the subjects burned 2,093 calories during this 24-hour period. Here’s how far the various wearables deviated from that value:

[Graph: how far each wearable’s estimate deviated from the metabolic chamber measurement. Credit: JAMA Internal Medicine]

There are consistent differences between the devices. For example, the Jawbone and Garmin devices underestimated calorie burn by a couple hundred calories on average, while the Fitbit and Misfit both overestimated.

The second part of the experiment took the subjects out of the lab and into the field.


The subjects drank “doubly labeled water,” which contains rare isotopes of both hydrogen and oxygen; calorie burn can be estimated by tracking how quickly those isotopes are eliminated in your urine.

The subjects then spent the next 15 days living their normal lives—while wearing all 12 devices at all times—and collecting their urine for later analysis.

They were allowed to take the devices off “when bathing, special activities in which wearing the devices would be difficult, or when charging the battery.”

Over those 15 days, the subjects burned an average of 2,314 calories per day, according to the doubly labeled water. Here’s how the wearables stacked up, in the same order as the previous graph:

[Graph: how far each wearable’s estimate deviated from the doubly labeled water measurement. Credit: JAMA Internal Medicine]

Again, the Jawbone and Garmin produce lower values than the Fitbit and Misfit, so that seems like a consistent finding.

Overall, the wearables seem to be underestimating calorie burn this time. That’s not surprising: over a 15-day period, the devices are likely to spend more time off the body for charging, bathing, and those other “special activities.”

The conclusion of the study is that “the findings presented herein suggest that most wearable devices do not produce a valid measure of total energy expenditure.” From a research perspective, where you need to know the actual number of calories being burned, that may be true.

But for real-world use, where the most important thing is getting consistent relative results so you can tell whether you’re burning more or fewer calories compared to a previous day, the results actually seem pretty decent.

It wasn’t just that the averages were pretty close: Statistical analysis showed that for the most part, the devices correctly ranked the individual participants (i.e., if doubly labeled water showed that you were the third-highest calorie burner among the subjects, so did most of the wearable devices).
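To make the ranking idea concrete, here is a minimal Python sketch. The numbers are hypothetical, not taken from the study, and the Spearman rank correlation used here is just one common way to measure rank agreement; the statistic the researchers actually used isn't specified above. The point is only that a device can miss the true calorie count by a wide margin and still put people in nearly the right order.

```python
def ranks(values):
    """Rank values from highest (1) to lowest. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(x, y):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical daily calorie burn for five people (illustration only).
reference = [2650, 2400, 2310, 2150, 1980]  # e.g. doubly labeled water
wearable = [2380, 2200, 2050, 2100, 1790]   # e.g. one wrist device

# Average error: negative means the device underestimates.
mean_bias = sum(w - r for w, r in zip(wearable, reference)) / len(reference)
rank_agreement = spearman(reference, wearable)

print(f"Average error: {mean_bias:.0f} calories per day")
print(f"Rank agreement (Spearman): {rank_agreement:.2f}")
```

In this made-up example the device reads about 194 calories low on average, yet only two neighboring people swap places in the ranking, giving a rank correlation of 0.9.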

For self-tracking, that’s probably good enough.

The article How Accurate Is Your Wearable’s Calorie Count? originally ran on RunnersWorld.com.