Authors: Vidit Tiwari, Navdeep Sirigiri
We’re using a dataset from Pro Football Reference, which includes ~12,000 player-season entries and around 35 features. It covers standard NFL stats (like targets, carries, and touchdowns), fantasy metrics (like points per game and rank), and advanced features like player grades and red zone usage from 2004 onward.
This matters because millions of people play fantasy football, and being able to spot breakout players or avoid busts can give a fantasy manager a significant edge.
- FantPos: Player’s fantasy position (RB, WR, etc.)
- Age: Age of the player during the season
- Tgt, Rec, Yds, TD: Receiving stats
- Att, Yds, TD: Rushing stats
- FantPt, PPR: Total and PPR fantasy points scored
- PosRank, OvRank: Player’s position and overall fantasy ranking (target)
To prepare the dataset for analysis, we first addressed symbolic annotations in player names: asterisks (*) indicating Pro Bowl selections and plus signs (+) indicating All-Pro honors. These symbols come from Pro Football Reference itself, which embeds such accolades directly into player names. To preserve this information, we extracted the symbols into two binary columns (is_probowl and is_allpro) and cleaned the Player column to ensure consistent, identifier-friendly names for joins and tracking.
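The extraction step described above could look roughly like the following pandas sketch. The helper name extract_honors and the sample rows are hypothetical; only the Player, is_probowl, and is_allpro column names come from the text.

```python
import pandas as pd


def extract_honors(df: pd.DataFrame) -> pd.DataFrame:
    """Split Pro Bowl (*) and All-Pro (+) markers out of the Player column."""
    out = df.copy()
    names = out["Player"].astype(str)
    # regex=False so "*" and "+" are matched literally, not as regex operators.
    out["is_probowl"] = names.str.contains("*", regex=False).astype(int)
    out["is_allpro"] = names.str.contains("+", regex=False).astype(int)
    # Drop the markers and surrounding whitespace so names are safe for joins.
    out["Player"] = names.str.replace(r"[*+]", "", regex=True).str.strip()
    return out


# Made-up names, used only to exercise the three possible cases.
sample = pd.DataFrame({"Player": ["Player One*", "Player Two*+", "Player Three"]})
cleaned = extract_honors(sample)
```

Computing both flags before cleaning the name matters here: once the symbols are stripped, the honors can no longer be recovered from the column.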
Ambiguous columns like Yds, Yds.1, and TD.1 stemmed from the original website’s formatting for multiple stat types (e.g., passing, rushing), so we renamed them to Pass_yd, Rush_yd, and Rush_TD for clarity and usability in downstream analysis.
Team abbreviations were standardized (e.g., SDG to LAC, STL to LAR) to align with current franchise naming conventions, accounting for relocations that would otherwise fragment team-based aggregations.
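As a minimal sketch, the column renames and team standardization described above can be expressed as two lookup tables. The mappings below contain only the examples named in the text, so a complete version would need more entries; the function name standardize and the sample rows are hypothetical.

```python
import pandas as pd

# Renames taken from the text; other ambiguous columns are not covered here.
COLUMN_RENAMES = {"Yds": "Pass_yd", "Yds.1": "Rush_yd", "TD.1": "Rush_TD"}
# Only the two relocations mentioned in the text.
TEAM_RENAMES = {"SDG": "LAC", "STL": "LAR"}


def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the column renames, then map old team codes to current ones."""
    out = df.rename(columns=COLUMN_RENAMES)
    # Codes absent from the mapping are left unchanged.
    out["Tm"] = out["Tm"].replace(TEAM_RENAMES)
    return out


# Made-up rows for illustration.
raw = pd.DataFrame(
    {"Tm": ["SDG", "STL", "TEN"], "Yds": [0, 0, 0], "Yds.1": [60, 10, 0], "TD.1": [1, 0, 0]}
)
tidy = standardize(raw)
```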
Most missing statistics were filled with zeros, on the assumption that they reflect zero performance rather than truly missing data, since the original site omits a stat when no play occurred. For example, the 2PM column (two-point conversions made) was often NaN because most players never record this stat. For the OvRank column, we instead imputed the maximum value for the given year, since every NaN in this column belonged to a player who fell below a baseline rank set by Pro Football Reference. We used the maximum because we treated all players below the threshold as equal in rank.
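The two imputation rules above could be sketched as follows. The function name impute_missing and the sample values are hypothetical. The order of the steps is the important part: OvRank has to be filled with the per-year maximum before the blanket zero fill, or its missing values would become 0, which would wrongly read as the best possible rank.

```python
import numpy as np
import pandas as pd


def impute_missing(df: pd.DataFrame, rank_col: str = "OvRank", year_col: str = "Year") -> pd.DataFrame:
    out = df.copy()
    # Unranked players share the largest (worst) rank observed in their season.
    year_max = out.groupby(year_col)[rank_col].transform("max")
    out[rank_col] = out[rank_col].fillna(year_max)
    # Remaining numeric gaps are treated as "no play occurred", i.e. zero.
    numeric_cols = out.select_dtypes("number").columns
    out[numeric_cols] = out[numeric_cols].fillna(0)
    return out


# Made-up values for illustration.
raw = pd.DataFrame(
    {
        "Year": [2019, 2019, 2019, 2020, 2020],
        "OvRank": [1.0, 50.0, np.nan, 3.0, np.nan],
        "2PM": [np.nan, 1.0, np.nan, np.nan, 2.0],
    }
)
filled = impute_missing(raw)
```

One edge case this sketch does not handle specially: a season in which every OvRank is missing has no maximum, so those rows fall through to the zero fill.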
Finally, we created a Next_PosRank column by applying a group-wise shift based on player and position, so that each row carries the player’s position rank from the following season. This mirrors the year-to-year progression of the NFL season and supports predictive modeling of future fantasy performance.
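A minimal sketch of the group-wise shift, assuming rows are keyed by Player, FantPos, and Year (the function name add_next_posrank is hypothetical). The sample reuses the A.J. Brown PosRank values for 2019 to 2022 from the table in this post.

```python
import pandas as pd


def add_next_posrank(df: pd.DataFrame) -> pd.DataFrame:
    """Attach each player's following-season PosRank to the current row."""
    # Sort first so that shift(-1) really means "next season" within a group.
    out = df.sort_values(["Player", "FantPos", "Year"]).copy()
    out["Next_PosRank"] = out.groupby(["Player", "FantPos"])["PosRank"].shift(-1)
    return out


# Rows deliberately out of order to show that sorting matters.
seasons = pd.DataFrame(
    {
        "Player": ["A.J. Brown"] * 4,
        "FantPos": ["WR"] * 4,
        "Year": [2021, 2019, 2022, 2020],
        "PosRank": [32, 9, 4, 9],
    }
)
shifted = add_next_posrank(seasons)
```

The last season for each player has no successor, so its Next_PosRank is NaN and the row cannot be used for training. A plain shift also pairs non-consecutive seasons when a player misses a year, and grouping by name assumes names are unique. Both are simplifications of this sketch.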
Index | Rk | Player | Tm | FantPos | Age | G | GS | Cmp | Att | Pass_yd | Pass_TD | Int | Rush_Att | Rush_yd | Y/A | Rush_TD | Tgt | Rec | Rec_yd | Y/R | Rec_TD | Fmb | FL | TD.3 | 2PM | 2PP | FantPt | PPR | DKPt | FDPt | VBD | PosRank | OvRank | -9999 | Year | is_probowl | is_allpro | Next_PosRank |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3804 | 34 | A.J. Brown | TEN | WR | 22 | 16 | 11.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 60.0 | 20.0 | 1.0 | 84.0 | 52.0 | 1051.0 | 20.21 | 8.0 | 1.0 | 0.0 | 9 | 0.0 | 0.0 | 165.0 | 217.1 | 220.1 | 191.1 | 36.0 | 9 | 34.0 | BrowAJ00 | 2019 | 0 | 0 | 9.0 |
601 | 29 | A.J. Brown | TEN | WR | 23 | 14 | 12.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 106.0 | 70.0 | 1075.0 | 15.36 | 11.0 | 2.0 | 1.0 | 12 | 0.0 | 0.0 | 178.0 | 247.5 | 251.5 | 212.5 | 51.0 | 9 | 29.0 | BrowAJ00 | 2020 | 1 | 0 | 32.0 |
1331 | 102 | A.J. Brown | TEN | WR | 24 | 13 | 13.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 2.0 | 10.0 | 5.0 | 0.0 | 105.0 | 63.0 | 869.0 | 13.79 | 5.0 | 0.0 | 0.0 | 5 | 0.0 | 0.0 | 118.0 | 180.9 | 183.9 | 149.4 | 0.0 | 32 | 0.0 | BrowAJ00 | 2021 | 0 | 0 | 4.0 |
3134 | 13 | A.J. Brown | PHI | WR | 25 | 17 | 16.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 145.0 | 88.0 | 1496.0 | 17.00 | 11.0 | 2.0 | 2.0 | 11 | 0.0 | 0.0 | 212.0 | 299.6 | 304.6 | 255.6 | 91.0 | 4 | 13.0 | BrowAJ00 | 2022 | 1 | 0 | 8.0 |
2509 | 20 | A.J. Brown | PHI | WR | 26 | 17 | 17.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 158.0 | 106.0 | 1456.0 | 13.74 | 7.0 | 2.0 | 2.0 | 7 | 0.0 | 0.0 | 184.0 | 289.6 | 294.6 | 236.6 | 51.0 | 8 | 20.0 | BrowAJ00 | 2023 | 1 | 0 | 17.0 |
This pie chart displays the distribution of fantasy football positions in our dataset, with wide receivers making up the largest group. This suggests that WRs are the most common fantasy players, which could influence drafting depth and positional strategies.
This box plot shows the distribution of fantasy points by position, with quarterbacks having the highest median point totals. This suggests that quarterbacks are generally the most valuable fantasy players, followed by running backs.
## Fantasy Points Vs Year Histogram
This histogram shows how the number of fantasy points scored has changed over the years. It reveals an upward trend, indicating that players have been scoring more fantasy points in recent seasons.
This heatmap highlights how the top 10 NFL teams distribute Fantasy Points across positions, revealing which teams are especially strong at specific roles like QB, RB, TE, or WR. It helps identify team-position combinations that consistently produce high fantasy value.
Our goal is to build a regression model that predicts an NFL player’s final fantasy position ranking for the upcoming season, using their performance statistics from the previous season.
Next_PosRank (next season’s fantasy ranking within position) is the response variable we aim to predict.
We chose Next_PosRank because it offers a normalized, position-specific measure of a player’s fantasy value. This makes it especially useful for fantasy football managers when planning draft strategies or identifying breakout candidates within specific roles (e.g., WR, RB, QB).
This is a regression problem.
Although ranks are ordinal, we treat them as continuous for prediction purposes, since we aim to estimate the exact or near-exact rank value rather than classify into broad tiers.
We also measure performance not just by whether the exact rank is predicted, but by how close the prediction is to the actual outcome.
We use Mean Absolute Error (MAE) as our primary evaluation metric.
We chose MAE because:
We constructed a baseline linear regression model to predict the Next_PosRank (the next position rank) of a player using FantPt, Age, and FantPos.
The model used the following features:
- FantPt (Fantasy Points): Quantitative
- Age: Quantitative
- FantPos (Fantasy Position): Nominal

The target variable is Next_PosRank, a numeric measure indicating a player’s projected future ranking within their position.
Feature | Type | Description |
---|---|---|
FantPt | Quantitative | Continuous numerical feature |
Age | Quantitative | Continuous numerical feature |
FantPos | Nominal | Categorical feature (e.g., QB, RB) |
- The quantitative features (FantPt, Age) were passed through without transformation using "passthrough".
- The nominal feature (FantPos) was encoded using One-Hot Encoding via OneHotEncoder, with handle_unknown="ignore" to avoid errors from unseen categories during testing.
- A ColumnTransformer was used to apply these transformations.
- The preprocessing was combined with a LinearRegression model inside a Pipeline.

A final MAE of 9.45 means that, on average, the model’s predicted player ranking is off by about 9 to 10 ranks.
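The preprocessing and model steps listed above map onto scikit-learn roughly as follows. This is a sketch, not the exact training script: the step names, the function build_baseline, and the toy rows are assumptions, and the toy numbers are made up purely to show the pipeline running.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

FEATURES = ["FantPt", "Age", "FantPos"]


def build_baseline() -> Pipeline:
    preprocess = ColumnTransformer(
        [
            # Quantitative features are used as-is.
            ("num", "passthrough", ["FantPt", "Age"]),
            # Unseen positions at predict time encode as all zeros instead of raising.
            ("pos", OneHotEncoder(handle_unknown="ignore"), ["FantPos"]),
        ]
    )
    return Pipeline([("preprocess", preprocess), ("model", LinearRegression())])


# Toy training rows (made-up values).
train = pd.DataFrame(
    {
        "FantPt": [300.0, 150.0, 220.0, 90.0, 260.0, 120.0],
        "Age": [27, 24, 29, 31, 25, 23],
        "FantPos": ["QB", "RB", "WR", "RB", "QB", "WR"],
        "Next_PosRank": [3, 20, 12, 40, 6, 35],
    }
)
model = build_baseline().fit(train[FEATURES], train["Next_PosRank"])

# "TE" never appeared in training; handle_unknown="ignore" keeps this from failing.
new_player = pd.DataFrame({"FantPt": [200.0], "Age": [26], "FantPos": ["TE"]})
pred = model.predict(new_player)
```

Keeping the encoder inside the Pipeline means it is fit only on training data, which avoids leaking category information from the test set.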
The reported metrics demonstrate that the model’s predictions are generally accurate, with 58.60% of predictions within ±10 ranks, which is highly valuable for fantasy managers. The positional accuracy also shows strong performance, particularly for quarterbacks (64.75%), indicating the model is effective across most positions while offering opportunities for further improvement in others.
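For reference, MAE and the "within ±k ranks" accuracy can be computed as in the sketch below. The function names and sample values are hypothetical, and treating the tolerance as inclusive (an error of exactly k counts as a hit) is an assumption.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred) -> float:
    """Average absolute gap between predicted and actual rank."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def accuracy_within(y_true, y_pred, k: float) -> float:
    """Share of predictions whose absolute error is at most k ranks."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred) <= k))


# Made-up ranks; the absolute errors are 3, 7, 12, and 17.
actual = [5, 12, 30, 44]
predicted = [8, 19, 42, 61]

mae = mean_absolute_error(actual, predicted)      # 9.75
within_5 = accuracy_within(actual, predicted, 5)    # 0.25
within_10 = accuracy_within(actual, predicted, 10)  # 0.5
within_15 = accuracy_within(actual, predicted, 15)  # 0.75
```

Per-position figures follow from applying the same functions to the subset of rows for each FantPos.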
To better capture the underlying patterns in the data and enhance the model’s ability to predict player ranks, we introduced several new features:
- Total_TD: Total touchdowns, summed across categories (rushing, passing, etc.).
- Touches: Combines rushing attempts and receptions, which are crucial for fantasy production.
- YardsPerTouch: Measures efficiency per opportunity; more efficient players tend to produce more.
- CatchRate: Reflects how often a player successfully catches a target.
- TD_PG, RecYards_PG, RushYards_PG: Normalizing stats per game accounts for variability in games played, which is especially important due to injuries.
- FantPt_PerTouch: Measures fantasy productivity per touch.
- TeamYearRank: Ranks a player’s fantasy points within their team for that season, reflecting how much they contributed to the team.

We used LogisticAT (ordinal logistic regression, all-threshold variant) as the model algorithm. We selected it because our target is ordinal: LogisticAT is designed for ordered outcomes, making it more appropriate here than standard regression or unordered classification models.
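The engineered features listed above could be derived along these lines. The exact formulas are assumptions inferred from the feature descriptions and the column names in the cleaned table, not the original code; safe_div and add_features are hypothetical helpers. The first sample row uses A.J. Brown's 2019 stats from the table, and the second is a made-up teammate with no production, included to exercise the zero-denominator case.

```python
import numpy as np
import pandas as pd


def safe_div(num: pd.Series, den: pd.Series) -> pd.Series:
    """Element-wise division that returns 0 where the denominator is 0."""
    return (num / den.replace(0, np.nan)).fillna(0.0)


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["Total_TD"] = out["Pass_TD"] + out["Rush_TD"] + out["Rec_TD"]
    out["Touches"] = out["Rush_Att"] + out["Rec"]
    out["YardsPerTouch"] = safe_div(out["Rush_yd"] + out["Rec_yd"], out["Touches"])
    out["CatchRate"] = safe_div(out["Rec"], out["Tgt"])
    out["TD_PG"] = safe_div(out["Total_TD"], out["G"])
    out["RecYards_PG"] = safe_div(out["Rec_yd"], out["G"])
    out["RushYards_PG"] = safe_div(out["Rush_yd"], out["G"])
    out["FantPt_PerTouch"] = safe_div(out["FantPt"], out["Touches"])
    # Rank 1 = highest fantasy scorer on that team in that season (assumed definition).
    out["TeamYearRank"] = out.groupby(["Tm", "Year"])["FantPt"].rank(ascending=False, method="min")
    return out


rows = pd.DataFrame(
    {
        "Tm": ["TEN", "TEN"], "Year": [2019, 2019], "G": [16.0, 0.0],
        "Pass_TD": [0.0, 0.0], "Rush_TD": [1.0, 0.0], "Rec_TD": [8.0, 0.0],
        "Rush_Att": [3.0, 0.0], "Rush_yd": [60.0, 0.0],
        "Tgt": [84.0, 0.0], "Rec": [52.0, 0.0], "Rec_yd": [1051.0, 0.0],
        "FantPt": [165.0, 0.0],
    }
)
features = add_features(rows)
```

Guarding every ratio against a zero denominator matters in this dataset, because missing stats were filled with zeros and many players have no touches or targets in a season.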
To find the optimal hyperparameter (alpha), we used cross-validation on the training set and selected the value that minimized the Mean Absolute Error (MAE) on validation folds. The best-performing alpha was 100.0, which likely provided the right balance between model complexity and regularization.
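The alpha search described above can be sketched with scikit-learn's GridSearchCV, scored by negative MAE. The helper tune_alpha, the candidate grid, and the synthetic data are all hypothetical. To keep the example self-contained, Ridge is used here as a stand-in estimator that also exposes an alpha parameter; it is not the model used in this project.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV


def tune_alpha(estimator, X, y, alphas, cv=5):
    """Return the alpha with the lowest cross-validated MAE, and that MAE."""
    search = GridSearchCV(
        estimator,
        param_grid={"alpha": list(alphas)},
        scoring="neg_mean_absolute_error",  # scikit-learn maximizes, hence the sign flip
        cv=cv,
    )
    search.fit(X, y)
    return search.best_params_["alpha"], -search.best_score_


# Synthetic data purely for demonstration.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=60)

candidate_alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
best_alpha, best_mae = tune_alpha(Ridge(), X, y, candidate_alphas)
```

Assuming LogisticAT comes from the mord package, the same helper should accept it in place of Ridge, for example tune_alpha(LogisticAT(), X_train, y_train, candidate_alphas), with integer rank labels as the target.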
Metric | Baseline Model | Final Model | Improvement |
---|---|---|---|
MAE | 9.45 | 8.86 | Lower error |
Accuracy Score | 0.02 | 0.02 | No change |
Accuracy ±5 ranks | 33.12% | 34.56% | Higher |
Accuracy ±10 ranks | 58.77% | 65.52% | Higher |
Accuracy ±15 ranks | 78.25% | 85.12% | Higher |
WR ±10 ranks | 56.89% | 59.88% | Higher |
TE ±10 ranks | 56.77% | 63.23% | Higher |
RB ±10 ranks | 64.34% | 66.43% | Higher |
QB ±10 ranks | 71.07% | 74.84% | Higher |
FB ±10 ranks | 0.00% | 0.00% | No change |
The final model improved on most key metrics. While the raw accuracy score remains low (0.02) due to the challenging nature of exact rank prediction, the model substantially improved on meaningful ordinal metrics: accuracy within 10 ranks rose from 58.77% to 65.52%, and accuracy within 15 ranks rose from 78.25% to 85.12%. Accuracy within 10 ranks increased for every position except FB, with the largest gain for TE (56.77% to 63.23%) and the highest overall accuracy for QB (74.84%), where contextual features like per-game efficiency and role-adjusted ranking appear to have helped. Because the dataset contains few FBs, the FB results rest on too few test cases to be meaningful.