Can Data Win Your Fantasy League?

Authors: Vidit Tiwari, Navdeep Sirigiri

Dataset Introduction and Question Identification

Data Overview

We use a dataset from Pro Football Reference with roughly 12,000 player-season entries and about 35 features. It covers standard NFL stats (targets, carries, touchdowns), fantasy metrics (points per game, rank), and advanced features such as player grades and red zone usage, for seasons since 2004.

Main Question

Can we predict a player’s final fantasy football rank for the next season using prior-year stats and player grades?

This matters because millions of people play fantasy football, and being able to spot breakout players or avoid busts can give a fantasy manager a real edge.

Key Columns

FantPos: Player’s fantasy position (RB, WR, etc.)
Age: Age of the player during the season
Tgt, Rec, Yds, TD: Receiving stats
Att, Yds, TD: Rushing stats
FantPt, PPR: Total and PPR fantasy points scored
PosRank, OvRank: Player’s position and overall fantasy ranking (target)

Data Cleaning and Exploratory Data Analysis

Data Cleaning

To prepare the dataset for analysis, we first addressed symbolic annotations in player names—specifically asterisks (*) indicating Pro Bowl selections and plus signs (+) indicating All-Pro honors. These symbols were part of the original data collection process from Pro Football Reference, where such accolades are embedded directly into player names. To reflect this, we extracted them into two binary columns (is_probowl and is_allpro) and cleaned the Player column to ensure consistent, identifier-friendly names for joins and tracking.
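A minimal sketch of how this extraction might look in pandas. The player names are hypothetical placeholders, and the exact string handling is an assumption rather than our verbatim cleaning code.

```python
import pandas as pd

# Hypothetical names carrying the '*' (Pro Bowl) and '+' (All-Pro) markers.
raw = pd.DataFrame({"Player": ["Alpha Back*+", "Beta Receiver*", "Gamma End"]})

# Flag each honor before the markers are stripped from the name.
raw["is_probowl"] = raw["Player"].str.contains("*", regex=False).astype(int)
raw["is_allpro"] = raw["Player"].str.contains("+", regex=False).astype(int)

# Remove trailing markers and stray whitespace so names join cleanly.
raw["Player"] = raw["Player"].str.replace(r"[*+]+$", "", regex=True).str.strip()

print(raw)
```

Creating the flags before cleaning the name matters: once the symbols are stripped, the information is gone.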

Ambiguous columns like Yds, Yds.1, and TD.1 stemmed from the original website’s formatting for multiple stat types (e.g., passing, rushing), so we renamed them to Pass_yd, Rush_yd, and Rush_TD for clarity and usability in downstream analysis.

Team abbreviations were standardized (e.g., SDG to LAC, STL to LAR) to align with current franchise naming conventions, accounting for relocations that would otherwise fragment team-based aggregations.
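The renaming and team standardization can be sketched as follows. The toy values are made up, and the mapping contains only the two relocations named above; a full mapping would need every relocated franchise in the data.

```python
import pandas as pd

# Made-up rows using the ambiguous column names described above.
df = pd.DataFrame({
    "Tm": ["SDG", "STL", "TEN"],
    "Yds": [4200.0, 0.0, 0.0],
    "Yds.1": [35.0, 1100.0, 60.0],
    "TD.1": [0.0, 9.0, 1.0],
})

# Give each ambiguous stat column an explicit name.
df = df.rename(columns={"Yds": "Pass_yd", "Yds.1": "Rush_yd", "TD.1": "Rush_TD"})

# Partial mapping (assumption): only the two examples from the text.
TEAM_MAP = {"SDG": "LAC", "STL": "LAR"}
df["Tm"] = df["Tm"].replace(TEAM_MAP)

print(df)
```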

Most missing statistics were filled with zeros, on the assumption that they reflect zero production rather than truly missing data, since the original site omits a stat when no play occurred. For example, the 2PM column was often NaN because most players never record that stat in a season. For the OvRank column we imputed the maximum value for the given year, since every NaN in this column belonged to a player who fell below a baseline rank set by Pro Football Reference. We used the maximum because we treat all players below the threshold as equal in rank.
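A small sketch of the two imputation rules, using made-up values. The `stat_cols` list is a hypothetical subset; in practice it would cover every counting stat.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Year": [2019, 2019, 2019, 2020, 2020],
    "2PM": [np.nan, 1.0, np.nan, np.nan, 2.0],
    "OvRank": [34.0, 120.0, np.nan, 29.0, np.nan],
})

# Unranked players get the worst (largest) rank recorded in their season.
season_max = df.groupby("Year")["OvRank"].transform("max")
df["OvRank"] = df["OvRank"].fillna(season_max)

# Remaining gaps in counting stats are read as "never happened", i.e. zero.
stat_cols = ["2PM"]  # hypothetical subset of the stat columns
df[stat_cols] = df[stat_cols].fillna(0)

print(df)
```

The order matters here: filling OvRank with zeros first would erase the NaN markers that the per-season maximum relies on.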

Finally, we created a Next_PosRank column by applying a group-wise shift based on player and position—mirroring the year-to-year progression of the NFL season—to support predictive modeling of future fantasy performance.
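The group-wise shift can be sketched like this. The A.J. Brown ranks are taken from our sample rows; the second player is hypothetical, and sorting by year before shifting is an assumption about how the step should be ordered.

```python
import pandas as pd

df = pd.DataFrame({
    "Player": ["A.J. Brown"] * 3 + ["Hypothetical RB"] * 2,
    "FantPos": ["WR"] * 3 + ["RB"] * 2,
    "Year": [2021, 2019, 2020, 2020, 2019],
    "PosRank": [32, 9, 9, 5, 12],
})

# Sort chronologically so shift(-1) pulls the *following* season's rank.
df = df.sort_values(["Player", "FantPos", "Year"])
df["Next_PosRank"] = df.groupby(["Player", "FantPos"])["PosRank"].shift(-1)

print(df)
```

Each player's last season in the frame gets NaN because there is no later row to pull from. A plain shift also assumes consecutive seasons: if a player misses a year, the row before the gap would be paired with a season two years later unless that case is handled separately.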

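| | Rk | Player | Tm | FantPos | Age | G | GS | Cmp | Att | Pass_yd | Pass_TD | Int | Rush_Att | Rush_yd | Y/A | Rush_TD | Tgt | Rec | Rec_yd | Y/R | Rec_TD | Fmb | FL | TD.3 | 2PM | 2PP | FantPt | PPR | DKPt | FDPt | VBD | PosRank | OvRank | -9999 | Year | is_probowl | is_allpro | Next_PosRank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3804 | 34 | A.J. Brown | TEN | WR | 22 | 16 | 11.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 60.0 | 20.0 | 1.0 | 84.0 | 52.0 | 1051.0 | 20.21 | 8.0 | 1.0 | 0.0 | 9 | 0.0 | 0.0 | 165.0 | 217.1 | 220.1 | 191.1 | 36.0 | 9 | 34.0 | BrowAJ00 | 2019 | 0 | 0 | 9.0 |
| 601 | 29 | A.J. Brown | TEN | WR | 23 | 14 | 12.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 106.0 | 70.0 | 1075.0 | 15.36 | 11.0 | 2.0 | 1.0 | 12 | 0.0 | 0.0 | 178.0 | 247.5 | 251.5 | 212.5 | 51.0 | 9 | 29.0 | BrowAJ00 | 2020 | 1 | 0 | 32.0 |
| 1331 | 102 | A.J. Brown | TEN | WR | 24 | 13 | 13.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 2.0 | 10.0 | 5.0 | 0.0 | 105.0 | 63.0 | 869.0 | 13.79 | 5.0 | 0.0 | 0.0 | 5 | 0.0 | 0.0 | 118.0 | 180.9 | 183.9 | 149.4 | 0.0 | 32 | 0.0 | BrowAJ00 | 2021 | 0 | 0 | 4.0 |
| 3134 | 13 | A.J. Brown | PHI | WR | 25 | 17 | 16.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 145.0 | 88.0 | 1496.0 | 17.00 | 11.0 | 2.0 | 2.0 | 11 | 0.0 | 0.0 | 212.0 | 299.6 | 304.6 | 255.6 | 91.0 | 4 | 13.0 | BrowAJ00 | 2022 | 1 | 0 | 8.0 |
| 2509 | 20 | A.J. Brown | PHI | WR | 26 | 17 | 17.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 158.0 | 106.0 | 1456.0 | 13.74 | 7.0 | 2.0 | 2.0 | 7 | 0.0 | 0.0 | 184.0 | 289.6 | 294.6 | 236.6 | 51.0 | 8 | 20.0 | BrowAJ00 | 2023 | 1 | 0 | 17.0 |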

Univariate Analysis

Pie chart of Position distribution

This pie chart displays the distribution of fantasy football positions in our dataset, with wide receivers making up the largest group. This suggests that WRs are the most common fantasy players, which could influence drafting depth and positional strategies.

Bivariate Analysis

Points vs Position BoxPlot

This box plot shows the distribution of fantasy points by position, with quarterbacks having the highest median point totals. This suggests that quarterbacks are generally the most valuable fantasy players, followed by running backs.

Fantasy Points vs Year Histogram

This histogram shows how the number of fantasy points scored has changed over the years. It reveals an upward trend, indicating that players have been scoring more fantasy points in recent seasons.

Heat Map of Score output per team

This heatmap highlights how the top 10 NFL teams distribute Fantasy Points across positions, revealing which teams are especially strong at specific roles like QB, RB, TE, or WR. It helps identify team-position combinations that consistently produce high fantasy value.
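The table behind a heatmap like this one could be built with a pivot, sketched below on made-up data with placeholder team codes. Ranking teams by total fantasy points is an assumption about what "top" means, and `TOP_N` is set to 2 only to keep the toy output small.

```python
import pandas as pd

# Made-up player-season rows with placeholder team codes.
df = pd.DataFrame({
    "Tm": ["T1", "T1", "T1", "T2", "T2", "T3"],
    "FantPos": ["QB", "TE", "WR", "QB", "WR", "RB"],
    "FantPt": [400.0, 200.0, 150.0, 350.0, 212.0, 300.0],
})

TOP_N = 2  # the analysis above uses 10

# Assumption: "top" teams are those with the most total fantasy points.
top_teams = df.groupby("Tm")["FantPt"].sum().nlargest(TOP_N).index

# Rows are teams, columns are positions, cells are summed fantasy points.
heat = df[df["Tm"].isin(top_teams)].pivot_table(
    index="Tm", columns="FantPos", values="FantPt", aggfunc="sum", fill_value=0
)

print(heat)
```

The resulting frame can be passed directly to a plotting function such as seaborn's `heatmap`.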

The Prediction Problem

Our goal is to build a regression model that predicts an NFL player’s final fantasy position ranking for the upcoming season, using their performance statistics from the previous season.

Response Variable

Next_PosRank (next season’s fantasy ranking within position) is the response variable we aim to predict.

We chose Next_PosRank because it offers a normalized, position-specific measure of a player’s fantasy value. This makes it especially useful for fantasy football managers when planning draft strategies or identifying breakout candidates within specific roles (e.g., WR, RB, QB).

Type of Prediction

This is a regression problem.
Although ranks are ordinal, we treat them as continuous for prediction purposes, since we aim to estimate the exact or near-exact rank value rather than classify into broad tiers.

We also measure performance not only by whether a prediction hits the exact rank, but by how close it lands to the actual outcome.

Evaluation Metric

We use Mean Absolute Error (MAE) as our primary evaluation metric.
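For reference, MAE is the average of the absolute differences between predicted and actual ranks, MAE = (1/n) Σ |yᵢ − ŷᵢ|. A short sketch with illustrative numbers (the predicted ranks are invented for the example):

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between actual and predicted ranks."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

actual = [9, 32, 4, 8]        # actual next-season position ranks
predicted = [12, 20, 10, 8]   # invented predictions

mae = mean_absolute_error(actual, predicted)
print(mae)  # 5.25: errors of 3, 12, 6 and 0 averaged over four players
```

Because the errors are not squared, the result reads directly in units of rank positions.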

We chose MAE because:

Baseline Model Predicting Next Position Rank

Model Description

We constructed an ordinal regression model to predict Next_PosRank (a player's position rank in the following season) using FantPt and Age.

Features Used

The model used the following features:

The target variable is Next_PosRank, a numeric measure indicating a player’s projected future ranking within their position.

Feature Types Summary

| Feature | Type | Description |
|---|---|---|
| FantPt | Quantitative | Continuous numerical feature |
| Age | Quantitative | Continuous numerical feature |
| FantPos | Nominal | Categorical feature (e.g., QB, RB) |
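One common way to prepare a nominal feature like FantPos alongside numeric ones is one-hot encoding. The sketch below uses `pandas.get_dummies` on made-up rows; it illustrates the idea and is not necessarily the exact encoding used in our pipeline.

```python
import pandas as pd

# Made-up feature rows matching the three feature types above.
X = pd.DataFrame({
    "FantPt": [165.0, 310.5, 98.2],
    "Age": [22, 27, 30],
    "FantPos": ["WR", "QB", "RB"],
})

# One-hot encode the nominal column; numeric columns pass through unchanged.
X_encoded = pd.get_dummies(X, columns=["FantPos"], dtype=int)

print(X_encoded)
```

Each position becomes its own 0/1 column, so the model never treats position labels as if they had a numeric order.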

Encoding and Preprocessing

Model Evaluation

A final MAE of 9.45 means that, on average, the model’s predicted position rank is off by roughly 9 to 10 ranks.

Positional Accuracy within ±10 Ranks:

The reported metrics show that 58.60% of predictions land within ±10 ranks of the actual position rank, a margin that is still useful to fantasy managers choosing between players in the same tier. Accuracy is strongest for quarterbacks (64.75%), while the other positions leave room for improvement.
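The "within ±k ranks" accuracy can be computed overall and per position as sketched here. The ranks are invented for illustration, and `within_k` is a hypothetical helper name.

```python
import pandas as pd

# Invented actual and predicted position ranks.
preds = pd.DataFrame({
    "FantPos": ["QB", "QB", "WR", "WR", "WR"],
    "actual": [5, 20, 9, 32, 4],
    "predicted": [8, 35, 15, 30, 20],
})

def within_k(frame, k=10):
    """Share of rows whose prediction is at most k ranks from the actual rank."""
    return float(((frame["actual"] - frame["predicted"]).abs() <= k).mean())

overall = within_k(preds)
by_position = {pos: within_k(grp) for pos, grp in preds.groupby("FantPos")}

print(overall, by_position)
```

Reporting the metric per position guards against a strong overall number hiding a weak position with few players.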

Final Model

Feature Engineering

To better capture the underlying patterns in the data and enhance the model’s ability to predict player ranks, we introduced several new features:

Modeling Algorithm and Hyperparameter Selection

We used LogisticAT (the all-threshold variant of ordinal logistic regression, as implemented in the mord library) as the modeling algorithm. We selected it because our target is ordinal: LogisticAT is designed for ordered outcomes, which makes it more appropriate than standard regression or unordered classification models.

To find the optimal hyperparameter (alpha), we used cross-validation on the training set and selected the value that minimized the Mean Absolute Error (MAE) on validation folds. The best performing alpha was 100.0, which likely provided the right balance between model complexity and regularization.
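The selection loop can be sketched as below. To keep the example dependent only on NumPy, closed-form ridge regression stands in for LogisticAT; this is a swapped-in model, not the one we used. The data is synthetic, and the alpha grid and helper names are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y.mean() - X.mean(axis=0) @ w

def cv_mae(X, y, alpha, n_folds=5, seed=0):
    """Mean validation MAE across shuffled K folds for a single alpha."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    scores = []
    for i, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w, b = ridge_fit(X[train], y[train], alpha)
        scores.append(np.mean(np.abs(X[val] @ w + b - y[val])))
    return float(np.mean(scores))

# Synthetic stand-in data: 200 rows, 3 features, noisy linear target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([5.0, -3.0, 0.0]) + rng.normal(scale=4.0, size=200)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]  # hypothetical grid
results = {a: cv_mae(X, y, a) for a in alphas}
best_alpha = min(results, key=results.get)

print(results, best_alpha)
```

The same seed is used for every alpha so that each candidate is scored on identical folds, which keeps the comparison fair. Only the training set should enter this loop; the held-out test set is scored once, after alpha is fixed.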


Final Model vs. Baseline Model

| Metric | Baseline Model | Final Model | Improvement |
|---|---|---|---|
| MAE | 9.45 | 8.86 | Lower error |
| Accuracy Score | 0.02 | 0.02 | No change |
| Accuracy ±5 ranks | 33.12% | 34.56% | Higher |
| Accuracy ±10 ranks | 58.77% | 65.52% | Higher |
| Accuracy ±15 ranks | 78.25% | 85.12% | Higher |
| WR ±10 ranks | 56.89% | 59.88% | Higher |
| TE ±10 ranks | 56.77% | 63.23% | Higher |
| RB ±10 ranks | 64.34% | 66.43% | Higher |
| QB ±10 ranks | 71.07% | 74.84% | Higher |
| FB ±10 ranks | 0.00% | 0.00% | No change |

The final model improved on nearly every key metric. The raw accuracy score stays low at 0.02 because predicting an exact rank is hard, but the model improved substantially on the more meaningful ordinal metrics: accuracy within ±10 ranks rose from 58.77% to 65.52%, and accuracy within ±15 ranks rose from 78.25% to 85.12%. Accuracy within ±10 ranks also increased for every position with enough data, including QB (71.07% to 74.84%), where contextual features like per-game efficiency and role-adjusted ranking were particularly impactful. Fullbacks (FB) are the exception: the dataset contains so few of them that we could not evaluate that category reliably.