In a 2005 working paper, later published in Management Science, the economists Cade Massey and Richard Thaler asked a simple question about the NFL draft. Consider every pair of consecutive players taken at the same position: the first and second quarterbacks drafted in a given year, the second and third offensive linemen, and so on. How often does the earlier pick go on to have a better career than the one taken right after him? If teams can accurately evaluate prospects, the earlier pick should win the majority of those comparisons. If teams are guessing, the rate should sit around 50%.
Measured across the 1991-2002 drafts, the earlier pick came out ahead 53% of the time when performance was tracked by career starts, and 51% when tracked by Pro Bowl selections. "The chance that a player is 'better than the next guy,'" they wrote, "remains very close to chance."
Two decades and more than $100 million a year in collective scouting spending later, we can assume that this number has improved, right?
We reran the test on the 2006-2020 drafts using pVAR as the performance metric. pVAR is a career-value metric that combines per-snap grading, approximate value, and individual honors such as All-Pro selections; it plays a role similar to WAR in baseball. (For more detail, see our methodology writeup.) Applied to Massey and Thaler's within-position pairing, the earlier pick wins 52.9% of the time, nearly identical to the 2005 result. Drop the same-position restriction and compare every pair of consecutive overall picks in a draft class, and the earlier pick's win rate falls to 50.2%. Even with hundreds of millions of dollars at stake, the league's ability to correctly order consecutive picks sits within a few points of random.
The top line tracks within-position accuracy year by year; the bottom tracks overall accuracy. Both stay flat. Neither trends upward across fifteen draft classes, and splitting the sample into an early era (2006-2012) and a late era (2013-2020) produces no difference. Whatever skill teams have at ordering adjacent picks, it has not improved.
A methodology note before moving on. The 2006-2020 sample contains 152 tied pairs where two adjacent picks produced identical pVAR. Most are late-round picks who both ended their careers with zero pVAR: 132 of the 152 are zero-zero pairs, and 91% of the ties come from rounds 5 through 7. Counting those ties as correct, as if the earlier pick still "won," lifts the overall rate from 50.2% to 52.2%, which happens to land almost exactly on Massey-Thaler's figure. We do not count them. When both players contributed nothing, declaring the earlier pick the winner is a measurement artifact, not a finding.
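The pairwise test, with the tie handling just described, can be sketched in a few lines. The data here is hypothetical; in the real analysis each tuple would be a drafted player's position, overall pick, and career pVAR.

```python
from collections import defaultdict

def adjacent_pair_win_rate(players):
    """Share of within-position adjacent pairs where the earlier pick
    out-produced the later one. Tied pairs are excluded, not credited."""
    by_pos = defaultdict(list)
    for pos, pick, pvar in players:
        by_pos[pos].append((pick, pvar))
    wins = losses = 0
    for picks in by_pos.values():
        picks.sort()  # order each position group by overall draft slot
        for (_, earlier), (_, later) in zip(picks, picks[1:]):
            if earlier > later:
                wins += 1
            elif earlier < later:
                losses += 1
            # equal pVAR is a tie and counts toward neither side
    return wins / (wins + losses)

# Hypothetical mini-draft: (position, overall pick, career pVAR)
demo = [
    ("QB", 2, 12.0), ("QB", 10, 109.0),  # inversion: the later QB was far better
    ("RB", 4, 30.0), ("RB", 35, 11.0),   # correct order
    ("WR", 150, 0.0), ("WR", 220, 0.0),  # zero-zero tie, dropped
]
print(adjacent_pair_win_rate(demo))  # → 0.5 (1 win, 1 loss, tie excluded)
```

Counting the zero-zero pair as a win for the earlier pick would report 2 of 3 instead, which is exactly the inflation described above.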
The scatter shows why the adjacent-pick number sits where it does. It plots every player drafted between 2006 and 2020 by their draft position (horizontal axis) against their career pVAR (vertical axis). The teal curve is the expected career value at each pick, smoothed across all years. Everything else is variance. A handful of top-ten picks produced almost nothing. A handful of late-round picks became stars. Most of this story is about the cloud, not the curve.
The simplest explanation is that projecting 21-year-old college players into NFL careers involves irreducible uncertainty. How a player develops physically, adapts to a professional system, responds to coaching, and avoids injury is not the kind of variance that more film study or combine data can eliminate.
Unsatisfied with the conclusion that the league has not gotten any better at evaluating prospects, we looked for other ways to measure how well-ordered draft picks actually are.
Several things stand out about the Massey and Thaler methodology. The first is that pairwise win rates discard magnitude entirely. The 2017 draft put Mahomes eight picks behind Trubisky at the same position. By Massey and Thaler's test that is a single loss, scored the same as any other positional inversion in any other class. But the gap between Mahomes's and Trubisky's careers is nearly a factor of ten in pVAR, and most of the expected value of drafting well comes from the tails rather than from the middle of the distribution. A measure that cannot tell a generational miss apart from a routine one will not detect league-wide improvement unless that improvement is spread uniformly across all picks, which is not how scouting improves when it improves at all.
The second issue is that pairwise comparisons are a local metric applied to a global question. They ask whether teams can pick between two similar players, not whether the overall ordering of a draft class is monotone. These come apart more often than intuition suggests. Consider two toy drafts of ten players each, where the "career rank" is the rank each player would have if we reordered the class by eventual career value.
| Pick | Draft A: career rank | Draft B: career rank |
|---|---|---|
| 1 | 5 | 2 |
| 2 | 6 | 1 |
| 3 | 3 | 4 |
| 4 | 4 | 3 |
| 5 | 1 | 6 |
| 6 | 2 | 5 |
| 7 | 9 | 8 |
| 8 | 10 | 7 |
| 9 | 7 | 10 |
| 10 | 8 | 9 |
Draft A gets six of nine adjacent pairs correct, a 67% hit rate that looks strong by the adjacent-pair yardstick. But the class's best player went fifth, the second-best went sixth, and picks #7 and #8 produced the two worst careers in the class. The Spearman correlation between draft order and career rank is only 0.52. In decision-relevant terms, a team that traded up to pick #1 received the fifth-best player when the actual best was still available at pick #5.
Draft B gets only four of nine adjacent pairs right, a 44% hit rate, worse than a coin flip by the same yardstick. But every player finished within one rank of where they were drafted. The top two picks delivered the top two careers, just in swapped order. Spearman's ρ is 0.94. This is the draft every front office wants, and the adjacent-pair metric scores it as worse than guessing.
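Both metrics on the two toy drafts can be checked directly. The sketch below uses the standard rank-difference formula for Spearman's ρ, which is exact here because the toy rankings are tie-free permutations.

```python
def adjacent_accuracy(ranks):
    """Share of adjacent pick pairs where the earlier pick has the better
    (lower) career rank."""
    pairs = list(zip(ranks, ranks[1:]))
    return sum(a < b for a, b in pairs) / len(pairs)

def spearman_rho(ranks):
    """Spearman correlation between draft order (1..n) and career rank,
    via 1 - 6*sum(d^2) / (n*(n^2-1)); valid because there are no ties."""
    n = len(ranks)
    d_sq = sum((pick - rank) ** 2 for pick, rank in enumerate(ranks, start=1))
    return 1 - 6 * d_sq / (n * (n * n - 1))

draft_a = [5, 6, 3, 4, 1, 2, 9, 10, 7, 8]  # career rank by pick, Draft A
draft_b = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]  # career rank by pick, Draft B

print(round(adjacent_accuracy(draft_a), 2), round(spearman_rho(draft_a), 2))  # → 0.67 0.52
print(round(adjacent_accuracy(draft_b), 2), round(spearman_rho(draft_b), 2))  # → 0.44 0.94
```

Draft A wins the local contest; Draft B wins the global one.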
The point is that "are teams good at ordering the draft" and "do teams win their close local calls" are different questions, and they can point in opposite directions. The second is what Massey and Thaler answered. The first is what we are actually interested in.
A third issue is worth naming. Using games started or Pro Bowl selections as the performance signal, as Massey and Thaler did, mixes talent with opportunity. A first-round bust gets far more starts than a comparably talented late-round pick because coaches are incentivized to play their high picks, which inflates the apparent relationship between draft order and performance. pVAR is built from per-snap grading and individual honors, so it is less exposed to this feedback loop, but the general caution applies to any metric that uses playing time as an input.
One thing Massey and Thaler did get right is the within-position framing. When they compare "adjacent" picks, they do not mean picks #15 and #16 overall. They mean the first and second quarterback taken, or the third and fourth offensive lineman taken, regardless of where those players sit in the overall draft order. In their 1991-2002 sample, the average distance between two adjacent same-position picks is 8.26 overall slots, a number that lines up almost exactly with the 7.8-pick average move in top-two-round trades.

The within-position restriction matters because it isolates the comparison teams actually make. Two consecutive overall picks at different positions are not really being evaluated against each other; the ordering between them reflects positional need and board management as much as prospect quality. Two same-position picks separated by eight slots are a real head-to-head call: every team between them considered and passed on the later player in favor of someone else, and the team that eventually took him presumably would have taken him earlier if it could. The within-position test gives teams more signal to work with, not less. That is what makes the 52% result striking rather than trivial: scouts get their considered, competitive, same-position judgments right barely more often than chance.
None of this undermines the headline Massey-Thaler result that high picks are overvalued relative to later ones in surplus-value terms. That finding rests on the salary side of the ledger as much as on the performance side. What it does suggest is that the adjacent-pair test is the wrong instrument for the question we care about here, whether teams are getting better at ordering draft classes, which the original paper was not designed to address. To answer that question we need to look at the full ordering of each draft, how much the top of the class matters relative to the middle, and how teams spend their picks across positions. The rest of this piece does exactly that.
Beyond Adjacent Picks
Adjacent picks are the hardest case for any draft evaluation to get right. The two prospects usually look alike on paper, and teams have limited information to separate them. If teams have any ordering skill at all, it should show up more clearly as the gap widens, where the comparison shifts from "which of these two similar prospects is marginally better" to "which of these two very different prospects is the stronger player."
The signal rises, but slowly. Ten picks apart lifts accuracy from 50% to 52.5%, barely above the coin-flip line. Twenty picks apart: 54.6%. Accuracy passes 60% only once the gap reaches about 50 picks, and passes 69% at 100. Translated into round-level terms: a first-rounder beats a third-rounder 74% of the time, and beats a late-round pick closer to 90%.
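The gap-based curve comes from the same pairwise machinery, just binned by distance instead of adjacency. A minimal sketch, with a hypothetical four-pick class standing in for the real data:

```python
def win_rate_at_gap(draft, gap):
    """Of all pairs of picks exactly `gap` slots apart, how often does the
    earlier pick out-produce the later one? Ties excluded."""
    wins = losses = 0
    for pick, pvar in draft.items():
        later = draft.get(pick + gap)
        if later is None:
            continue  # no pick exactly `gap` slots later
        if pvar > later:
            wins += 1
        elif pvar < later:
            losses += 1
    return wins / (wins + losses)

# Hypothetical class: overall pick -> career pVAR
demo_class = {1: 50.0, 2: 10.0, 3: 30.0, 4: 5.0}
print(round(win_rate_at_gap(demo_class, 1), 2))  # → 0.67 (2 of 3 pairs)
print(win_rate_at_gap(demo_class, 2))            # → 1.0  (both 2-apart pairs)
```

In the real analysis, sweeping `gap` from 1 to 100 across all fifteen classes produces the rising curve described above.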
The same pattern appears when the data is organized by tier rather than by exact gap.
Ordering Accuracy by Draft Tier
How often the earlier tier outproduces, 2006–2020
The diagonal cells, where both picks come from the same tier, all sit near 50%. Step off the diagonal and the accuracy climbs quickly. A Top 10 pick beats a Round 3 pick 79% of the time, and a Round 6-7 pick 89% of the time. Both views tell the same story from different angles. The draft separates categories of prospects reliably. It cannot separate neighbors.
The Full Draft Order
Adjacent-pair accuracy is a narrow test. It compares two picks at a time. A broader measure is Spearman rank correlation, which evaluates the entire ordering of a draft class at once. It is widely used to compare two rankings of the same items.
The idea is straightforward. Take every player in a given draft class and assign each one two ranks: a draft rank (1 for the first pick, 2 for the second, and so on) and an outcome rank (1 for the player with the highest career pVAR, 2 for the next, and so on). Spearman's coefficient, usually written with the Greek letter ρ (rho), measures how closely the two rankings match. If they are identical, ρ equals 1.0. If they have no relationship at all, ρ equals 0. If the ordering is exactly reversed, ρ equals -1.0.
A few concrete examples make the scale tangible. Imagine a draft class of five players, taken in the order A, B, C, D, E.
- If the career-value order turns out identical (A, B, C, D, E), ρ = 1.0.
- If A and B swap at the top (B, A, C, D, E), ρ = 0.9.
- If the fourth pick turns out to be the best player of the group (D, A, B, C, E), ρ = 0.4.
- If the order is random (over many trials), ρ averages 0.
- If it is exactly reversed (E, D, C, B, A), ρ = -1.0.
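The bullet values can be verified in a few lines. This sketch uses the rank-difference formula, exact here because neither ranking has ties.

```python
def spearman_rho(draft_order, outcome_order):
    """Spearman correlation between draft rank and career rank for a
    tie-free class: 1 - 6*sum(d^2) / (n*(n^2-1))."""
    n = len(draft_order)
    career_rank = {p: i + 1 for i, p in enumerate(outcome_order)}
    d_sq = sum((i + 1 - career_rank[p]) ** 2 for i, p in enumerate(draft_order))
    return 1 - 6 * d_sq / (n * (n * n - 1))

drafted = ["A", "B", "C", "D", "E"]
print(spearman_rho(drafted, ["A", "B", "C", "D", "E"]))  # → 1.0
print(spearman_rho(drafted, ["B", "A", "C", "D", "E"]))  # → 0.9
print(spearman_rho(drafted, ["D", "A", "B", "C", "E"]))  # → 0.4
print(spearman_rho(drafted, ["E", "D", "C", "B", "A"]))  # → -1.0
```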
Applied to the 2006-2020 drafts, full-draft ρ averages 0.47. Restricted to rounds 1 through 4, where most front-office evaluation time is concentrated, ρ drops to 0.41. The full-draft figure benefits from something teams do not have to be especially skillful to capture: the obvious talent gap between a first-rounder and a seventh-rounder. Once that gap is taken out, the remaining correlation reflects the harder decisions teams make in the competitive middle of the draft. Those decisions produce a weaker signal than the headline number suggests.
Neither line trends upward across the fifteen years.
Spearman weights every pair of picks equally, but the draft is not played that way. A missed call on the first overall pick matters more than a missed call on the seventieth. A more decision-relevant measure is the top-K hit rate: of the first K players selected in a given draft class, how many end up among the top K producers of that class by career pVAR? Averaged across the 2006-2020 drafts, 25% of the first 10 drafted players turn out to be among the true top 10 by pVAR. The rate is 43% for the top 32 (round 1), and 64% for the top 100. Teams do much better than random at every level; the random baseline for top-10 overlap is about 4%. But even at the most scrutinized tier of the draft, three-quarters of the players who turn out to be the year's ten highest producers come from outside the first ten picks.
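The top-K hit rate reduces to a set intersection, and the random baseline cited above falls out of simple arithmetic. A sketch with illustrative data:

```python
def top_k_hit_rate(picks_in_order, pvar, k):
    """Of the first k players drafted, what share finished in the class's
    true top k by career pVAR?"""
    drafted_top = set(picks_in_order[:k])
    true_top = set(sorted(pvar, key=pvar.get, reverse=True)[:k])
    return len(drafted_top & true_top) / k

# Hypothetical four-player class
order = ["a", "b", "c", "d"]
value = {"a": 5.0, "b": 1.0, "c": 9.0, "d": 3.0}
print(top_k_hit_rate(order, value, 2))  # → 0.5 ("a" hit, "b" missed)

# Random baseline: two random k-subsets of an n-player class share k*k/n
# players in expectation, i.e. a hit rate of k/n.
k, n = 10, 250
print(k / n)  # → 0.04, the ~4% baseline cited above
```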
The 2017 draft is a concrete illustration of both measures at once. Mitchell Trubisky went second overall. Patrick Mahomes went tenth. Trubisky has produced 12 pVAR in his career; Mahomes has produced 109. Trubisky was traded twice and is now a backup. Mahomes has won three Super Bowls. One pair of picks captures both kinds of failure at once: an ordering scramble (the tenth pick turning out to be the class's most valuable player) and a top-10 miss (Trubisky inside the first ten by draft order, nowhere close to the top ten by career value). Most draft classes rearrange things in smaller but comparable ways.
2017 First Round: Draft Order vs Career Value
First-round picks only. See the full 2017 redraft →
Evaluation Difficulty by Position
Ordering accuracy is not uniform across positions. The table below reports three complementary measures, each asking a slightly different question about a position's orderability.

- "% Correct Order" is the within-position adjacent-pair accuracy (excluding ties) across the full 2006-2020 sample. It answers the local question: do teams get consecutive same-position picks in the right order?
- "ρ (Top 100)" is the Spearman rank correlation between draft order and career pVAR among that position's top-100 picks, averaged across draft classes. It answers the holistic question: do teams rank a position's pool of top prospects correctly as a group?
- "Top-3 Hit" is the fraction of a position's first three top-100 picks in a given year that turn out to be among the three top pVAR producers at that position that year, averaged across classes. It answers a different question again: are teams good at identifying the elite tier, regardless of whether they get the internal ordering right?
| Position | Comparisons | % Correct Order | ρ (Top 100) | Top-3 Hit |
|---|---|---|---|---|
| QB | 149 | 56.4% | 0.28 | 64% |
| DE | 320 | 55.3% | 0.36 | 51% |
| TE | 198 | 55.1% | 0.35 | 71% |
| LB | 461 | 53.1% | 0.28 | 38% |
| DT | 292 | 52.7% | 0.43 | 51% |
| S | 279 | 52.7% | 0.33 | 56% |
| OL | 587 | 52.6% | 0.40 | 44% |
| DB | 413 | 52.1% | 0.43 | 51% |
| WR | 453 | 51.4% | 0.35 | 47% |
| RB | 295 | 51.2% | 0.36 | 53% |
Quarterback leads the adjacent-pair column at 56.4%, narrowly ahead of defensive end and tight end. This looks paradoxical at first. Quarterback is also the position with the widest range of career outcomes, which would suggest it should be the hardest to rank rather than the easiest.
The paradox dissolves once you notice that wide variance makes adjacent comparisons easier, not harder. The gap between the best quarterback in a draft class and the fifth-best tends to be enormous. The gap between the first wide receiver taken and the fifth is usually small. Ordering two players at a time is easier when their outcomes are far apart, and harder when they are bunched together.
The Spearman column tells a different story. QB is tied with linebacker at the bottom, ρ = 0.28, well behind interior linemen (0.43), defensive backs (0.43), and offensive linemen (0.40). The same bimodal distribution that helps QB win adjacent comparisons on average produces a scrambled overall ranking. Teams get the local direction right more often than not, but the elite quarterback can show up anywhere in the first 100 picks, which weakens the correlation between draft order and eventual value across the pool. Interior linemen and defensive backs look more predictable: the best performer tends to come early, the worst tends to come late, and the ordering in between runs closer to monotonic.
The Top-3 Hit column sorts the positions differently again. Quarterback rebounds to 64%, second only to tight end at 71%. Even though teams cannot order the full top-100 pool of quarterbacks, they do identify the elite group reasonably well: on average, two of the first three quarterbacks drafted each year end up among the three highest-producing quarterbacks from that class. The same holds for tight end, another position with widely spread outcomes. Linebacker is worst by every measure, including the weakest top-3 hit rate (38%). Teams are not only bad at ordering linebackers, they are bad at picking the right ones at the top of the position group. Offensive line is an interesting inversion: decent Spearman (0.40) but below-average top-3 hit (44%). The overall ranking is reasonable because the flat distribution makes the slope of value vs draft position consistent, but the elite offensive lineman is hard to pick out within that flat distribution.
The distribution of first-round outcomes by position makes the shape of each group's variance visible.
The QB panel is bimodal. Most first-round quarterbacks end up in one of two clusters: a pile of busts near zero pVAR, or a cluster of stars above 50. Only 5% of first-round quarterbacks land in the expected range for their draft slot. Offensive line produces a roughly normal bell curve. Wide receiver and running back outcomes cluster low. These shapes determine how hard each position is to rank. QB is not easier to rank because teams know more about quarterbacks. It is easier because most QB classes contain clear stars and clear busts, with little in the middle to confuse the ordering.
Running back, wide receiver, and defensive back sit at the bottom of the accuracy table, none above 52.1%. The talent gap between the first and fifth receiver in a typical class is often small enough that ordering them correctly requires distinctions teams have not been able to draw reliably. Offensive line is an interesting case. It has the lowest bust rate of any first-round position, and its outcomes form a roughly normal distribution. But within-position ordering accuracy is 52.6% overall and 52.4% in the top 100 picks. Teams reliably identify first-round offensive linemen as good prospects. They are not reliably picking the best ones first. Low variance helps with a team's floor at the position. It does not help with ordering.
First-Round Capital and Return
If teams cannot improve ordering accuracy within a position, the remaining strategic lever is capital allocation. Draft picks function as a currency: a first-round pick is worth more than a third-round pick, which is worth more than a sixth-round pick, because of the expected career value each slot provides. Every first-round pick used on a running back is a first-round pick not used on a quarterback or a guard. Deciding which positions to spend the most valuable picks on is a strategic choice, somewhat separate from the ordering question.
First-round allocation has shifted meaningfully between the 2006-2015 era and the 2016-2025 era.
Running back share of first-round picks fell from 7.2% to 4.4%. This tracks the analytical consensus that running back production is largely interchangeable: a useful running back can be found in the fourth round with reasonable frequency, which makes a first-round pick at the position an inefficient use of capital. Quarterback share rose from 8.1% to 10.9% and wide receiver from 11.3% to 13.4%, both consistent with the league's continued shift toward the passing game. Defensive tackle and safety both declined. Offensive line edged up from 18.8% to 20.0% and defensive end held at 14.1%, remaining the two largest first-round shares in both eras.
The year-by-year view shows the shape of each trend. Running back declined steadily across the period. Quarterback capital fluctuated from year to year based on which teams happened to hold top picks. Offensive line capital barely moved.
Whether any of these allocation shifts paid off depends on the return each position actually produced. The metric we use for return is value-over-expected, abbreviated VOE. For a given draft pick, VOE is the gap between the pVAR the player actually produced and the pVAR the average player at that draft slot produces. A VOE of +10 means the player outperformed his slot by ten pVAR points. A VOE of -10 means he underperformed by the same amount. Averaged across all first-round picks at a position, VOE indicates whether first-round spending on that position tends to pay off (positive), break even (around zero), or get wasted (negative).
The "Steal %", "Bust %", and "As Expected %" columns in the table below use specific thresholds. A steal is a pick whose VOE exceeds +10, meaning the player substantially outperformed his draft slot. A bust has a VOE worse than -10, meaning substantial underperformance. An "as expected" pick has a VOE within five points of zero. The three percentages do not add to 100 because picks with a VOE between five and ten in either direction fall in a "slight miss" zone that does not qualify for any of the three buckets.
| Pos | Rd 1 Picks | Avg VOE | Steal % | Bust % | As Expected % |
|---|---|---|---|---|---|
| OL | 88 | +5.3 | 45% | 32% | 16% |
| WR | 53 | +1.9 | 38% | 34% | 23% |
| QB | 44 | -0.2 | 41% | 43% | 5% |
| S | 29 | -0.4 | 34% | 41% | 10% |
| RB | 31 | -2.3 | 19% | 48% | 10% |
| DE | 64 | -3.9 | 31% | 45% | 9% |
| DT | 46 | -4.3 | 28% | 50% | 9% |
| DB | 57 | -6.7 | 28% | 46% | 16% |
| TE | 14 | -6.9 | 14% | 43% | 7% |
| LB | 54 | -7.3 | 19% | 54% | 15% |
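The bucket definitions above can be written out directly. One assumption to flag: the text does not specify whether the boundaries at 5 and 10 are strict or inclusive, so the cutoffs below are our reading.

```python
def voe_bucket(voe):
    """Classify a pick's value-over-expected into the table's buckets.
    Boundary handling (strict at 10, inclusive at 5) is assumed, not
    specified in the text."""
    if voe > 10:
        return "steal"          # substantially outperformed the slot
    if voe < -10:
        return "bust"           # substantially underperformed
    if abs(voe) <= 5:
        return "as expected"    # within five points of slot expectation
    return "slight miss"        # |VOE| between 5 and 10: none of the three

print(voe_bucket(12.0))   # → steal
print(voe_bucket(-11.5))  # → bust
print(voe_bucket(3.0))    # → as expected
print(voe_bucket(-7.5))   # → slight miss
```

The "slight miss" zone is why the three table columns do not sum to 100%.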
Offensive line is the only position with a meaningfully positive average VOE. Every other position is negative or close to zero, meaning first-round picks at those positions underperform their draft slot on average. Linebacker is the worst at -7.3, with 54% of first-round linebackers busting outright. Teams appear to have correctly reduced their running back spending in the later era. But first-round spending on linebacker and defensive back has remained elevated despite poor returns over fifteen draft classes.
The 12.6-point gap between the best positional return (offensive line at +5.3) and the worst (linebacker at -7.3) has held steady across all fifteen classes. It has not closed.
Can Teams Detect Strong Classes?
Ordering correctly within a position is one question. Reading the overall strength of a position's class in a given year is another. Even if a team cannot rank the third-best quarterback against the fifth, it might still detect that one year's quarterback class is deeper than another year's and shift capital accordingly.
If teams can read class strength reliably, the total career value produced by a position's class in a given year should correlate with the total draft capital teams spent on the position that year. Years with deep classes should see more early-round investment. Years with thin classes should see less.
The chart below tests this. Each small panel plots one position. The horizontal axis is the total career value produced by the entire class at that position in a given year. The vertical axis is the total draft capital teams spent on the position that year, weighted by slot value so that a first overall pick counts for more than a third-round pick. Each dot represents one draft year. The r-value shown in each panel is the Pearson correlation between capital spent and value produced: higher means a tighter coupling, closer to zero means teams spent about the same on that position regardless of how strong the class turned out.
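Each panel of the chart reduces to one Pearson correlation per position. A sketch with invented year-by-year numbers for a single position (the real inputs are the class pVAR totals and slot-weighted capital described above):

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Illustrative values only: one dot per draft year for one position
class_pvar    = [120, 95, 140, 80, 110, 60, 130]  # value the class produced
capital_spent = [30, 22, 35, 20, 28, 15, 33]      # slot-weighted capital spent

r = pearson_r(class_pvar, capital_spent)
print(round(r, 2))  # → 0.99: spending tracked class strength in this toy case
```

A position whose spending ignores class strength would show a near-zero r with the same calculation.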
Quarterback shows the tightest relationship. In years when the quarterback class turned out well, teams had spent more capital on quarterbacks. A few defensive positions show moderate correlations. Linebacker, running back, and defensive back show almost no relationship: teams spend roughly the same draft resources on those positions whether the class is deep or thin.
Offensive line is a special case. Teams invest heavily in offensive line every year, and that steady investment produces the best first-round return of any position. But the spending does not vary with class strength. Offensive line works for teams not because they correctly identify the strong OL classes, but because they are always buying.
Implications
The ordering problem looks close to a hard ceiling. Accuracy numbers have not moved in thirty years despite enormous growth in scouting resources, analytics staff, and combine measurement. The most plausible explanation is that a large share of the variance in how college players translate to the NFL is simply unpredictable from pre-draft information. Better scouting can sharpen the edges. It cannot close the gap.
Capital allocation does not look like a hard ceiling. The running back correction of the past decade shows teams can respond to evidence when the evidence is clear enough and the correction is cheap enough. The persistence of first-round spending on linebacker and defensive back, two positions with poor returns for fifteen straight years, suggests the correction is incomplete.
Quarterback belongs in its own category. 84% of first-round quarterbacks are clear steals or clear busts, and only 5% land in the expected range. A first-round quarterback is a structurally different bet than a first-round offensive lineman. The expected value can be similar. The shape of the outcome distribution is not.
Teams that treat positional return and class depth as active inputs to their draft strategy, rather than drafting primarily by positional need, have room to gain. Ordering the players correctly may not be possible. Choosing which positions to spend on, and when, mostly still is.
For more on how pVAR is computed, see Introducing pVAR. For the expected value curve and model comparisons, see The Draft Pick Value Curve. To explore individual draft classes, see the season-by-season analysis.
