
The Measurement Stack

Every Iron Within Operations retainer runs on the same four-layer measurement stack. Daily inputs. Weekly aggregates. Monthly recalibrations. Quarterly recompetes. It is not proprietary, it is not complicated, and the entire thing fits in a Google Sheet. The reason it works is not that any single layer is clever; it is that all four layers run together, and skipping any one of them quietly breaks the other three.

What follows is the full stack, what each layer measures, why each layer matters, and what fails when an operator tries to run without one of them. At the end is a short rant about commercial tracking apps that should be read with appropriate skepticism.

Layer 1 — Daily inputs

The daily layer is five numbers: four captured within five minutes of waking, and one logged within a minute of the end of each training session. The point is consistency, not completeness. Five numbers logged every day for a year are worth more than fifteen numbers logged inconsistently for six weeks.

The five are: sleep duration (hours, to the nearest quarter), sleep quality (a 1-5 subjective score, where 3 is normal), training session RPE (logged per top set, 1-10), stress (1-5 subjective score, where 3 is a typical day), and body weight (taken at the same time of day, ideally post-bathroom, pre-food).
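If you mirror the sheet in code, a daily row is small enough to write out in full. A minimal sketch: the five measurements are the stack's, but the field names and types are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class DailyLog:
    """One row of the daily layer: five numbers, nothing else."""
    date: str                   # ISO date, e.g. "2024-03-17"
    sleep_hours: float          # duration, to the nearest quarter hour
    sleep_quality: int          # 1-5 subjective, 3 = normal
    top_set_rpe: float | None   # 1-10 per top set; None on rest days
    stress: int                 # 1-5 subjective, 3 = a typical day
    body_weight: float          # same time of day, post-bathroom, pre-food
```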

Sleep duration is the most predictive single number across our client base. It correlates with HRV, with readiness, with session quality, with body composition, and with mood. If a wearable is in play we cross-reference its sleep estimate against the subjective number; if they diverge by more than 45 minutes we trust the subjective number, because the wearable thinks lying still in the dark counts as sleep.
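The cross-reference rule is mechanical enough to write down. One sketch of it, assuming hours as the unit; taking the wearable's estimate when the two numbers agree within 45 minutes is our default here, not a hard rule.

```python
def trusted_sleep_hours(subjective: float, wearable: float | None) -> float:
    """Cross-reference the wearable's sleep estimate against the
    subjective log. Past 45 minutes of divergence, the subjective
    number wins: the wearable thinks lying still in the dark is sleep."""
    if wearable is None or abs(subjective - wearable) > 0.75:  # 0.75 h = 45 min
        return subjective
    return wearable  # close enough to agree; default to the wearable
```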

Sleep quality matters because duration alone misses the difference between a fragmented 8 hours and a clean 7. The 1-5 score is fast, costs nothing, and gives us a second axis on the same variable.

RPE on top sets is the single best feedback signal training gives you. It tells you what the prescribed load is actually costing your body, week over week. RPE drift — top sets creeping from 8 to 9 to 9.5 across a block — is the earliest indicator that recovery has fallen behind volume.
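Drift like that is easy to flag automatically. A sketch of one way to do it; the 1-point threshold and the first-week-to-last-week comparison are illustrative choices, not a fixed cutoff.

```python
def rpe_drift(weekly_top_set_rpe: list[float], threshold: float = 1.0) -> bool:
    """Flag top-set RPE creeping upward across a block (e.g. 8 -> 9 -> 9.5)
    while the prescription is unchanged: the earliest sign that recovery
    has fallen behind volume."""
    if len(weekly_top_set_rpe) < 3:
        return False  # need at least three weeks to call it a trend
    rising = all(later >= earlier
                 for earlier, later in zip(weekly_top_set_rpe,
                                           weekly_top_set_rpe[1:]))
    return rising and (weekly_top_set_rpe[-1] - weekly_top_set_rpe[0]) >= threshold

rpe_drift([8.0, 9.0, 9.5])  # True: the classic drift pattern
```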

Stress is logged daily because it is the variable that determines whether the training prescription is conservative enough. A 4 or 5 day signals that recovery has to absorb both training load and life load; if stress is high three days in a row, the system recommends a session modification.
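That trigger is simple enough to mechanize. A minimal sketch, assuming the 1-5 daily stress scores described above:

```python
def recommend_session_modification(stress_log: list[int]) -> bool:
    """True if stress has been 4 or 5 for three consecutive days,
    the trigger for modifying the next session."""
    return any(all(s >= 4 for s in stress_log[i:i + 3])
               for i in range(len(stress_log) - 2))
```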

Body weight is the daily input people most want to skip. We log it anyway. Not because the daily reading matters — single-day variance is noise — but because the seven-day rolling average is signal, and the only way to get the average is to capture the daily.
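The average itself is a few lines. A sketch, assuming a simple trailing window over the daily readings:

```python
def rolling_average(daily_weights: list[float], window: int = 7) -> list[float]:
    """Trailing 7-day average of daily body weight: the daily reading
    is noise, the rolling average is the signal."""
    return [sum(daily_weights[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(daily_weights))]
```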

Skipping the daily layer: you have no raw material. The other three layers are aggregates of this one. No daily inputs, no system.

Layer 2 — Weekly aggregates

Every Sunday, the daily numbers get rolled up into four weekly numbers. Weekly training volume (total tonnage, or total hard sets per muscle group, depending on the block design). Average sleep duration. Average readiness composite (a simple mean of sleep quality, stress inverse, and subjective recovery score). Training-stress balance — a comparison of acute load (last 7 days) against chronic load (last 28 days), which gives a rough read on whether you are accumulating fatigue or freshening up.
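The two computed aggregates look roughly like this in code. Two caveats: inverting stress as 6 minus stress, and expressing the training-stress balance as an acute-over-chronic ratio, are concrete choices we are making for the sketch; the stack itself only specifies a simple mean and a comparison.

```python
from statistics import mean

def readiness_composite(sleep_quality: int, stress: int, recovery: int) -> float:
    """Simple mean of sleep quality, stress inverse, and the subjective
    recovery score, all on 1-5 scales. Inverting stress as 6 - stress
    is one concrete choice, not gospel."""
    return mean([sleep_quality, 6 - stress, recovery])

def training_stress_balance(daily_load: list[float]) -> float:
    """Acute load against chronic load, expressed here as a ratio:
    above 1.0 you are accumulating fatigue, below 1.0 you are
    freshening up."""
    acute = mean(daily_load[-7:])     # last 7 days
    chronic = mean(daily_load[-28:])  # last 28 days (or what history exists)
    return acute / chronic
```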

The Sunday review takes 30 minutes. The first fifteen are mechanical: open the sheet, sum the columns, look at the trends. The second fifteen are the part nobody does on their own: one honest question and one decision. The honest question is "if I removed my own ego, what would I tell a friend who showed me these numbers?" The decision is one change for next week.

The discipline of one change is non-negotiable. Operators who run a Sunday review and come out the other side with five changes are not running a system; they are running an opinion. The system only works if it changes one variable at a time, because that is the only way to know what the change did.

Skipping the weekly layer: this is the most common failure mode. Operators capture daily inputs religiously and never aggregate them. The daily log becomes a journal — interesting, decorative, useless. No weekly review, no feedback loop, no system.

Layer 3 — Monthly recalibration

Every four weeks, the weekly numbers get rolled into a one-page monthly recalibration. It looks at the four-week arc, not the snapshot: PR progression on key lifts (or session quality on accessories), body composition trend (4-week rolling weight average, plus measurements or photos if relevant), recovery trajectory (4-week HRV trend, average sleep, average readiness composite), and behavioral adherence (sessions completed versus planned, Sunday reviews done versus skipped, sleep targets hit versus missed).
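A sketch of the rollup, assuming each week arrives as a dict of Sunday aggregates; the key names are illustrative, not a prescribed schema.

```python
from statistics import mean

def monthly_recalibration(weeks: list[dict]) -> dict:
    """Roll four Sunday aggregates into the one-page monthly view."""
    return {
        "weight_trend": mean(w["avg_weight"] for w in weeks),
        "avg_sleep": mean(w["avg_sleep"] for w in weeks),
        "avg_readiness": mean(w["readiness"] for w in weeks),
        "session_adherence": (sum(w["sessions_done"] for w in weeks)
                              / sum(w["sessions_planned"] for w in weeks)),
        "reviews_done": sum(w["review_done"] for w in weeks),  # out of 4
    }
```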

The monthly call between client and us is built around this page. The structure: 15 minutes reviewing what the data says, 10 minutes discussing what we think it means, 10 minutes deciding what changes for the next four weeks. Same one-change rule applies. Sometimes the answer is no change — the block is working, hold the line. Saying "hold the line" on a call you are paying for feels strange the first time and right by the third.

Skipping the monthly layer: you catch tactical changes weekly but miss block-level drift. A program that is producing top-set PRs every week can simultaneously be eroding recovery capacity over six weeks, and that erosion is only visible at the monthly view. We have caught this on multiple clients who were thrilled with their weekly numbers and silently digging a hole that would cost them the next block.

Layer 4 — Quarterly recompete

Every 12 to 13 weeks, we run a quarterly recompete. The name is intentional. The original Audit established a baseline; the recompete re-runs a lightweight version of that audit and compares current data against the baseline.

For an athlete-client, the recompete is a meet, a test session, or a defined performance battery. For a founder-client running this as a business operator, the quarterly recompete is a KPI review — top-line revenue, gross margin, hours spent in the gym versus at the desk, and how training has tracked against business load.

The output is a 4-8 page recompete document. It compares the four priority gaps the Audit surfaced against current performance on those same gaps. It identifies new gaps that have surfaced. It defines the next block's optimization target. And it asks the only question that matters at the quarter mark: is this engagement still earning its retainer?

Skipping the quarterly layer: the engagement drifts. The first two months feel productive, the third month is comfortable, the fourth month is comfortable for the wrong reasons. The quarterly recompete is the forcing function that prevents comfortable mediocrity from extending into a year-long retainer that should have been ended at month four.

Tools we use, and why

The whole stack runs on a Google Sheet or a Notion database. That is a deliberate choice and we are tired of defending it. Sheets and Notion are free, exportable, durable, and your data leaves the relationship with you if the engagement ends. None of this is true of commercial tracking apps.

The commercial tracking ecosystem has, over the last five years, optimized for the wrong variables. The apps that win in the App Store win because they have streaks, gamification, ad budgets, and viral loops. They do not win because they help operators run a feedback loop. Several popular apps still cannot export a clean CSV. Several others lock the operator's data behind a subscription such that ending the subscription ends access to two years of training history. Several others have UIs designed around what looks good in a 30-second TikTok, which is precisely the wrong design constraint for a tool you are supposed to sit with for thirty minutes every Sunday afternoon, for ten years.

We have nothing against wearables for what they do well — sleep estimates, HRV trends, resting heart rate. Use one if you have one. The pet peeve is not the wearables; the pet peeve is the people who use the wearable's app as their measurement layer and never aggregate the data into anything they actually look at on a Sunday. The wearable feeds the daily layer. The daily layer feeds the weekly review. The weekly review is the system. The wearable alone is not.

The closing line

The point of measurement isn't data — it's the willingness to course-correct on a Tuesday because Sunday's numbers said so.
