This Fall, I’m playing in an NFL Pick’em contest with some friends from college. Each week, I predict the outcome of every NFL game according to Yahoo’s point-spread. For example, during Week 1 when the Patriots hosted the Texans, Yahoo assigned the Texans a seven-point handicap. My prediction came down to whether the Patriots would beat the Texans by more than seven points, or whether I thought the Texans would lose by no more than seven points (or win outright). This post is about how I will be using data to make each of the 256 predictions over the course of the season.

I remembered reading a blog post by then-statistics professor Michael Lopez about evaluating the outcomes of sporting events against the gambling market.

I’d also read about Michael Beuoy’s Inpredictable, which publishes team rankings based upon the implications of the sports wagering market. The thesis is that the wagering market is an efficient enough economy to derive the relative strengths and weaknesses of each team.

The methodology for the implied rankings is on the Inpredictable, but the basis for the rankings is on Generic Points Favored (GPF). That is, the number of points a team would be favored by against an average opponent on a neutral field.

With a fairly minimal amount of code, I have been using this data to enter my predictions each week. This is a brief overview of how to do so:

Scrape the rankings table from Inpredictable

This creates a dataframe that looks like this:

Team GPF oGPF dGPF W-L Projected Seed pSOS fSOS
LA 7.9 5.6 2.3 3-0 0.996 -3.2 -0.4
KC 5.0 8.1 -3.2 3-0 0.936 3.1 -0.1
LAC 4.4 2.8 1.5 1-2 0.699 2.9 -1.3
PIT 4.3 3.2 1.1 1-1-1 0.538 1.9 1.3
BAL 4.1 2.3 1.8 2-1 0.667 -2.0 1.3
NE 3.7 3.3 0.4 1-2 0.717 1.4 -1.0

Now that I have the up-to-date market-based team ratings, all that’s left is to compare the ratings of teams that are playing each other in the coming week. To do so, I loaded a schedule table, which looks like this:

Week Date Time Away Home
1 September 6 8:20 PM ATL PHI
1 September 9 1:00 PM PIT CLE
1 September 9 1:00 PM CIN IND
1 September 9 1:00 PM TEN MIA
1 September 9 1:00 PM SF MIN
1 September 9 1:00 PM TB NO

I combined the ratings with the schedule data using the dplyr package, and calculated each matchup’s offensive, defensive, and overall ratings delta:

This applies the market ratings to all matchups throughout the season. At this point, the table looks like this:

Date Away Home Favorite Margin
September 20 NYJ CLE NYJ 0.6
September 23 NO ATL ATL 0.1
September 23 CIN CAR CAR 0.1
September 23 NYG HOU HOU 1.8
September 23 TEN JAC JAC 6.4
September 23 SF KC KC 7.0

Since I am only interested in the coming week, I created a visualization with ggplot2 to quickly illustrate the predictions I will make. I wrap this into a function called getMarketRatings, which takes the NFL Schedule week as an input, and provides a match-up ratings plot, and table as output:

The table is displayed within RStudio, while the plot is saved to my computer, and looks like this:

Alt text