Behind the Tennis Recruiting Rankings
by Dallas Oliver, 13 July 2016
Since August 2005, Tennis Recruiting has put out weekly rankings of American junior boys and girls. This week marks the 569th consecutive week with Tennis Recruiting rankings.
Rankings are front and center on the TennisRecruiting.net website, and we field questions every week about how our rankings work. Today we describe our ranking system in some detail, providing insight into its interesting mathematical properties - and hopefully addressing these questions for everyone.
Ranking System Overview
The basis of our rating and ranking system is the Bradley-Terry model. Bradley-Terry calculates rating values for each player, and these rating values have interesting mathematical properties:
- If you consider ratings for any pair of players, the ratio of those ratings produces an Expected Win Percentage (EWP).
- Summing up all the EWPs for a player gives you his/her Expected Number of Wins.
- If you consider any player's past results used to calculate a rating, the player's Expected Number of Wins in those matches will be exactly equal to his/her Actual Number of Wins.
That sounds a bit technical, so what does it all mean? Let's take each of these three points in turn.
(1) If you consider ratings for any pair of players, the ratio of those ratings produces an Expected Win Percentage (EWP).
Consider two fictitious players - Jane Doe and Wendy Indigo. Doe is rated 1,000, while Indigo has a 3,000 rating. When these two players meet, Jane Doe's EWP will be:
1,000 ÷ (1,000 + 3,000) = 0.25, or 25%
Likewise, Indigo's EWP will be:
3,000 ÷ (1,000 + 3,000) = 0.75, or 75%
Given two player ratings, we can easily come up with an Expected Win Percentage for each player.
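The arithmetic above can be sketched in a few lines of Python. The function name `expected_win_pct` is ours, chosen for illustration; it is not taken from the Tennis Recruiting system.

```python
def expected_win_pct(rating, opponent_rating):
    """Expected Win Percentage for the player rated `rating` (hypothetical helper)."""
    return rating / (rating + opponent_rating)


doe, indigo = 1000, 3000
print(expected_win_pct(doe, indigo))   # 0.25
print(expected_win_pct(indigo, doe))   # 0.75
```

Only the ratio of the two ratings matters: ratings of 10 and 30 would give the same 25% and 75%.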
(2) Summing up all the EWPs for a player gives you his/her Expected Number of Wins.
We now consider four matches that Jane Doe has against opponents where she has a 75% EWP. The win percentage for each match is 75%, or 0.75, so the Expected Number of Wins in those four matches will be:
0.75 + 0.75 + 0.75 + 0.75 = 3.0
Even though Jane Doe is a favorite in all four matches, we expect only three total wins.
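As a small sketch of why three wins is the expectation, the snippet below sums the four EWPs and also computes the chance of a clean sweep. The sweep figure assumes the four matches are independent of each other, which is our assumption for illustration and not something the rating system itself states.

```python
ewps = [0.75, 0.75, 0.75, 0.75]

expected_wins = sum(ewps)
print(expected_wins)        # 3.0

# Assumption: match outcomes are independent, so the chance of
# winning all four is the product of the individual EWPs.
sweep_chance = 1.0
for ewp in ewps:
    sweep_chance *= ewp
print(sweep_chance)         # 0.31640625
```

Under that assumption, a player favored at 75% in every match sweeps all four less than a third of the time.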
(3) If you consider any player's past results used to calculate a rating, the player's Expected Number of Wins in those matches will be exactly equal to his/her Actual Number of Wins.
This statement points out the true power of the Bradley-Terry model. If Jane Doe goes 8-6 in 14 matches, then the ratings for all players will be set such that the sum of Jane's EWPs for those 14 matches will be exactly 8.0. Note that this property holds for all players in the system.
The next section walks you through a more complete scenario ...
Let's take a look at a year's worth of results for fictitious player Jane Doe. Jane has an 8-6 record, and, to keep things simple, let's again assume that her raw rating is 1,000.
The following table shows information about Jane's record against her 14 fictitious opponents over the past year:
|W-L ||Opponent ||Raw Rating ||EWP ||Expected Wins |
|W ||Bertha Red ||818 ||55% ||0.55 |
|W ||Dolly Blue ||538 ||65% ||0.65 |
|W ||Fay Green ||538 ||65% ||0.65 |
|W ||Hanna Yellow ||333 ||75% ||0.75 |
|W ||Josephine Orange ||333 ||75% ||0.75 |
|W ||Laura Purple ||333 ||75% ||0.75 |
|W ||Nadine Brown ||176 ||85% ||0.85 |
|W ||Paulette White ||53 ||95% ||0.95 |
|L ||Sally Black ||19,000 ||5% ||0.05 |
|L ||Vicky Pink ||5,666 ||15% ||0.15 |
|L ||Wendy Indigo ||3,000 ||25% ||0.25 |
|L ||Andrea Gray ||1,222 ||45% ||0.45 |
|L ||Chantal Magenta ||1,222 ||45% ||0.45 |
|L ||Erin Cyan ||333 ||75% ||0.75 |
|8-6 || || || ||8.0 |
In the table, the first two columns are self-explanatory with win-loss outcome and opponent name. We shade the wins and losses in the first column based on the EWPs - darker green indicates a stronger win, while darker red indicates more unexpected losses.
The third column in the table shows raw player ratings, while the fourth and fifth columns show Jane's EWP and Expected Number of Wins, respectively, against each opponent. The first eight rows of the table show Jane's wins (which have the green shadings in the left-most column), while the next six rows show her losses (which have the red shadings).
The final row of the table illustrates the fact that the Expected Wins equals the Actual Wins. Note in Column 1 that Jane went 8-6. If we sum up her Expected Wins in Column 5, we get:
0.55 + 0.65 + 0.65 + 0.75 + 0.75 + 0.75 + 0.85 + 0.95 + 0.05 + 0.15 + 0.25 + 0.45 + 0.45 + 0.75 = 8.0
Again, the Expected Number of Wins (8.0) is exactly equal to the Actual Number of Wins (8).
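The table can be checked directly from the raw ratings. The sketch below recomputes each EWP from Jane's rating of 1,000 and the opponent ratings listed above; the variable and function names are ours. Because the ratings in the table are rounded to whole numbers, the recomputed sum agrees with 8.0 to rounding, not to the last decimal place.

```python
def expected_win_pct(rating, opponent_rating):
    """Expected Win Percentage from two raw ratings (hypothetical helper)."""
    return rating / (rating + opponent_rating)


jane_rating = 1000
# (result, opponent, raw rating) copied from the table above
record = [
    ("W", "Bertha Red", 818), ("W", "Dolly Blue", 538),
    ("W", "Fay Green", 538), ("W", "Hanna Yellow", 333),
    ("W", "Josephine Orange", 333), ("W", "Laura Purple", 333),
    ("W", "Nadine Brown", 176), ("W", "Paulette White", 53),
    ("L", "Sally Black", 19000), ("L", "Vicky Pink", 5666),
    ("L", "Wendy Indigo", 3000), ("L", "Andrea Gray", 1222),
    ("L", "Chantal Magenta", 1222), ("L", "Erin Cyan", 333),
]

actual_wins = sum(1 for result, _, _ in record if result == "W")
expected_wins = sum(expected_win_pct(jane_rating, r) for _, _, r in record)
print(actual_wins, round(expected_wins, 1))   # 8 8.0
```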
We hope this section provided an overview of how our ratings and rankings work - and hints at how they can be used. The next section shows some of the advantages of our rating system.
We spent a lot of time exploring various rating systems to use for the Class Rankings at Tennis Recruiting, and we are very happy with the Bradley-Terry system that we have now. We believe that our ranking system has a number of advantages:
- The ratings are more predictive than any other system we have explored.
- The ratings have interesting mathematical properties with the EWPs.
- The system is straightforward to implement, producing ratings and rankings in a reasonable amount of time that are easy to verify.
Again, let's take these one at a time ...
(1) The ratings are more predictive than any other system we have explored.
As a company, we are happy with the ratings and rankings produced by the Bradley-Terry model. As a predictor of wins and losses - which we believe is a solid measure for any system - it consistently outperforms other models we have evaluated.
While it is impressive that our favorites win 78% of the time, we think it is even more impressive that tournament upsets are in line with our EWPs. As you can see in this analysis of last year's USTA National Championships, the actual win percentages are in line with our expected win percentages.
(2) The ratings have interesting mathematical properties with the EWPs.
This article has spent time discussing the EWPs and how the Expected Wins perfectly match Actual Wins. We have taken advantage of these interesting properties in several ways - including our forecasts for junior tournaments like the Asics Easter Bowl in April.
We also make use of these properties by producing EWP tables - like the one in the example section above for Jane Doe - for every single player in our system. These EWP tables are already available to college coaches, and we are working on making them available to our premium subscribers as well ... stay tuned.
(3) The system is straightforward to implement, producing ratings and rankings in a reasonable amount of time that are easy to verify.
Although the computation of the Bradley-Terry ratings would be hard to do without the help of a computer, the implementation of the algorithm is straightforward. Our system uses an iterative process: each week, the system starts by assigning all players a rating value of 1,000. (Note that when all players have the same rating, the system would expect all players to have equal numbers of wins and losses.) The system then determines which ratings need to be adjusted up or down based on the differences between their Expected and Actual Wins. A player whose Expected Wins is below Actual Wins needs to be adjusted up - and vice versa. After modifying all the player ratings - which modifies the Expected Wins - we check again. This process is repeated over and over until the Expected Wins equals the Actual Wins for all players - each iteration of the process brings player ratings closer and closer to their correct values.
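The iterative process described above can be sketched as follows. This is a minimal illustration, not the implementation running on the website: the article does not say how far each rating is moved per pass, so the update rule here (multiply each rating by Actual Wins divided by Expected Wins, a classical way of fitting Bradley-Terry) is our assumption, as are the function names, the three sample players, and the rescaling that keeps the average rating at 1,000. Every sample player has at least one win and one loss, which sidesteps the undefeated and winless cases that the plain model handles poorly.

```python
from collections import defaultdict


def expected_wins(matches, ratings):
    """Sum each player's EWPs over every match he or she played."""
    totals = defaultdict(float)
    for winner, loser in matches:
        pair_total = ratings[winner] + ratings[loser]
        totals[winner] += ratings[winner] / pair_total
        totals[loser] += ratings[loser] / pair_total
    return totals


def fit_ratings(matches, start=1000.0, tol=1e-6, max_iter=10_000):
    """Fit raw ratings from a list of (winner, loser) pairs."""
    actual = defaultdict(int)
    for winner, loser in matches:
        actual[winner] += 1
        actual[loser] += 0          # register players who only appear as losers
    ratings = {player: start for player in actual}

    for _ in range(max_iter):
        expected = expected_wins(matches, ratings)
        # Assumed update rule: scale by Actual / Expected wins, so a rating
        # rises when Expected < Actual and falls when Expected > Actual.
        updated = {p: ratings[p] * actual[p] / expected[p] for p in ratings}
        # Only rating ratios matter, so rescale to keep the average at `start`.
        scale = start * len(updated) / sum(updated.values())
        updated = {p: r * scale for p, r in updated.items()}
        biggest_change = max(abs(updated[p] - ratings[p]) for p in ratings)
        ratings = updated
        if biggest_change < tol:
            break
    return ratings


# Hypothetical results: A goes 3-2, B goes 3-3, C goes 2-3.
matches = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C"),
           ("C", "A"), ("B", "C"), ("B", "C"), ("C", "B")]
ratings = fit_ratings(matches)
check = expected_wins(matches, ratings)
for player in sorted(ratings):
    print(player, round(ratings[player]), round(check[player], 3))
```

Once the loop stops, each player's Expected Wins printed in the last column should agree with that player's Actual Wins, which is the same check shown for Jane Doe.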
As we mentioned above, it is easy to verify that the answer produced by our system is correct. As we showed for Jane Doe in the previous section, we can add up the EWPs for any player, and the sum should equal the player's Actual Wins.
Arguing that a player is rated or ranked "wrong" would be the same thing as arguing that the player's win percentage or USTA PPR total is wrong. All metrics are what they are. One could argue that a player record is incomplete or incorrect (which is easily addressed) or that a certain metric is not a good measure of a player's quality, but those arguments are different from claiming that the answer is incorrect.
We close by addressing several questions.
(1) Does your rating system predict the winner for any given match?
The Bradley-Terry model says nothing about which player will win. It expects the higher-rated player to win more often than not, but it also expects the lower-rated opponent to win some percentage of the time. Unless the EWP is 100%, the model does not choose a winner.
Even in the case where the higher-rated player has a 90% EWP, the system makes no definitive claims. Imagine 1,000 matches where the higher-rated player has a 90% EWP. The system expects the lower-rated player to win 100 of those matches. Which 100? The system has no idea.
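A quick simulation makes the same point. The sketch below plays 1,000 hypothetical matches in which the favorite has a 90% EWP, treating each match as an independent random draw (our simplifying assumption). The seed value is arbitrary and only makes the run repeatable.

```python
import random

rng = random.Random(2016)   # arbitrary fixed seed so the run is repeatable
favorite_ewp = 0.90
n_matches = 1000

# The underdog wins a match when the random draw lands at or above the favorite's EWP.
upsets = [match for match in range(n_matches) if rng.random() >= favorite_ewp]

print(len(upsets), "upsets; expected about", round(n_matches * (1 - favorite_ewp)))
print("first few upset match numbers:", upsets[:5])
```

The upset count lands near 100, but changing the seed changes which matches are the upsets, and nothing in the ratings says which ones they will be.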
Predicting which matches the underdog will win would be like predicting a roll of 6 on a standard die. Obviously the 6 will come up one-sixth of the time, but it is impossible to know which rolls.
One interesting thing to note is that every time an underdog with a 10% EWP wins, that result becomes part of the official record - at which point the ratings are recalculated and adjusted based on what actually happened.
Passing judgement on any rating or ranking algorithm simply because a higher-rated player loses is misplaced - there will always be upsets, and there is always some percentage chance the lower-rated player will win.
(2) Can I independently implement Bradley-Terry and reproduce your ratings?
The Bradley-Terry model was first developed in 1952, and variants of it have been used in many different sports. Almost every implementation includes variations on the "vanilla" model - in particular because the vanilla implementation has issues with undefeated players, winless players, and players with very short records.
We have experimented with a variety of techniques to address these problems, and we are satisfied with the implementation that is live on our website today.
As we mentioned above, it is difficult to produce the rankings without the aid of a computer. But there are many websites that discuss implementations of Bradley-Terry for the technically savvy.
(3) The rating numbers you list for the Jane Doe example do not seem to match up with the numbers you list in your forecasts. Why the difference?
We list raw ratings in those tables, and the numbers can fluctuate wildly. Note in that table that the lowest rating is 53 while the highest rating is 19,000. That difference is due to the EWP properties. There are players who have a 99.99% chance of beating an opponent, and those raw rating differences are astronomical - and kind of depressing for the underdog.
For these reasons, we transform the raw ratings using a logarithmic scale to more palatable Power Ratings. You can see examples of Power Ratings for players in our analysis of 2016 Wimbledon by clicking here.
(4) What data do you use to come up with these numbers?
The Tennis Recruiting database has many years of data from USTA and ITF junior tournaments. The current rankings we display on the TennisRecruiting.net website use the past twelve months of data from tournaments outlined in our FAQ.
Why twelve months? We use that time frame because most tournaments are annual events, and the twelve-month window allows players to replace matches that fall off their records with matches of similar quality. For example, the twelve-month window ensures that highly-ranked American high school players will always have results from the most recent USTA National Championships from Kalamazoo or San Diego counting towards their ratings and rankings.
(5) This whole article seems to be about ratings. What are the rankings?
As we discuss in this ratings and rankings article, rankings are simply orderings of players by ratings. At Tennis Recruiting, we rank by graduation year, so we create one rank list for each class for the boys and girls.
We have answered many questions about our rankings over the years via email, but this article is our attempt at a first-class description of what we do and how we do it. We welcome your comments below ...
More Ranking Articles
What Is An Upset?
Tennis Recruiting is a website that rates and ranks junior tennis
players. One of the questions we get most often from our users is,
"What exactly is an upset?" There are many possible
definitions of an upset - this article explores the question and
puts forward an answer.
An Overview of Ratings and Rankings
Tennis Recruiting is a website that rates and ranks junior tennis
players, and because of that, we field many questions about how to
interpret our lists. Questions like, "Are rankings better than
ratings? Which is more important?" Or, "Since your system
ranks by graduation year, are you able to compare players from
different classes?" This article addresses the simple distinction
between ratings and rankings.
Revisiting the US Open Forecast
The US Open is the premier event on American soil, and so Tennis
Recruiting pulled out its heat maps once again to forecast the
tournament. There were lots of upsets (Wait, Serena lost?), but let's
take a look at how our predictions played out...