Reproducing the Elo rating algorithm in Ruby is a little challenge that I took upon myself recently for a small hack. The same Elo rating system that was scrawled upon the glass in "The Social Network" as the algorithm that Mark Zuckerberg used on rank people on FaceMash. As an exercise, here's a pass at it with a little BDD thrown in for good measure.
A new file - elo.rb:
require 'minitest/spec'
require 'minitest/autorun'
describe "result" do
describe "when both players start with rating of 100 and k factor of 30" do
it "must return 115 for the winner" do
new_rating.must_equal 115
end
end
end
Starting a new algorithm is always tricky. We know that the Elo rating system is essentially concerned with assigning ratings to players based on the outcome of their games or matches. In fact, it is widely used as a chess ranking algorithm. At a first glance then, I thought I might want to jump in and start modelling some objects - players, games, ratings, player ranking. But that sounds like a big bite. A lot can go wrong with hunches.....bunches of hunches and hunches in bunches.
Small steps then - what are we doing? The big picture. The major output we want is a new rating! Our first test tries to check that the new rating is equal to 115. Why 115? Because the excellent online elo calculator told me so when I typed a bunch of numbers in there.
The numbers
To get some numbers, I first read wikipedia's entry on the Elo rating system. The algorithm relies on each player having a rating, each game delivering a score to each player and the mysterious "k" value.
I decided, rather arbitrarily, to assume that in my two player match, each player starts with a rating of 100. We could, equally have started with 0, but 100 felt right. Similarly, each player starts with a k value of 30. Using any method you fancy (the aforementioned online elo calculator, or even, working it out by hand), a winner in the above scenario, would have a new rating of 115.
Of course, running our code fails miserably:
Loaded suite elo
Started
E
Finished in 0.001000 seconds.
1) Error:
test_0001_must_return_115_for_the_winner(ResultSpec::WhenBothPlayersStartWithRatingOf100AndKFactorOf30Spec):
NameError: undefined local variable or method `new_rating' for #<ResultSpec::WhenBothPlayersStartWithRatingOf100AndKFactorOf30Spec:0x28a8830>
elo.rb:8:in `block (3 levels) in <class:TestElo>'
1 tests, 0 assertions, 0 failures, 1 errors, 0 skips
Test run options: --seed 62642
Player
We can get this to pass by declaring new_rating = 115. That would be very baby steps. I'm going to go a little faster. We are dealing with a "new rating". What has a rating? A player has a rating. Let's go with that for a moment and create a Player class.
class Player
attr_accessor :rating
end
describe "result" do
describe "when both players start with rating of 100 and k factor of 30" do
it "must return 115 for the winner" do
winner = Player.new
new_rating = winner.rating
new_rating.must_equal 115
end
end
end
Run this and we still fail. However, we fail because our new rating is not 115 as we expect in our test.
1) Failure:
test_0001_must_return_115_for_the_winner(result::when both players start with rating of 100 and k factor of 30) [elo.rb:19]:
Expected: 115
Actual: nil
Getting rid of our nil is easy - we can define a new Player to default their rating to 100 if it isn't specified.
class Player
attr_accessor :rating
def initialize(rating=100)
@rating = rating
end
end
And our test starts to produce some more meaningful failure
1) Failure:
test_0001_must_return_115_for_the_winner(result::when both players start with rating of 100 and k factor of 30) [elo.rb:19]:
Expected: 115
Actual: 100
Current rating, New rating
Implied in all this Elo business is the idea that at some point, we will calculate the new rating based on a game between our players. To that end, we want the ability to update a player's rating based on a game outcome. That sounds like a pretty tall order. It starts me thinking towards game objects and....tooooooo complicated. Let's rather put some functionality into Player in the meantime - just to get it working.
class Player
attr_accessor :rating
def initialize(rating=100)
@rating = rating
end
def calculate_new_rating
@rating = 115
end
end
describe "result" do
describe "when both players start with rating of 100 and k factor of 30" do
it "must return 115 for the winner" do
winner = Player.new
winner.calculate_new_rating
new_rating = winner.rating
new_rating.must_equal 115
end
end
end
Putting the above code together, our tests pass! Great - now all we have to do is write the code to actually do the calculation.
Calculating Elo
Looking back through Wikipedia and translating the algorithm into Ruby, we might end up with something like the following:
def calculate_new_rating
new_rating = rating + k_value * (score - expected_outcome)
end
The k value, as we mentioned earlier, is going to be 30. Score is either 1 or 0 (win or lose - we ignore draws for now). That just leaves us with expected outcome.
Expected rating
We need to have expected outcome in place before we can calculate our actual new rating based on the outcome of our game. Time to code up the equation for expected outcome. I'm going to put it into Player for the moment. It might not actually belong there in the end, because an expected outcome sounds like something that should belong to a Game. But it is a start. Simply copying the formula from Wikipedia, we get the following piece of code:
def expected_outcome
1.0/(1 + 10**((opponent_rating - @rating)/400))
end
Things to note
- We are raising to a power - in Ruby we use ** to achieve this.
- We need the opponent's rating
- We should probably think about floating point arithmetic
We've meandered away from our tests at this point. Let's get back to them and start testing our expected outcome, we should have written this first really:
describe "expected outcome" do
it "must return 0.5 when ratings are equal"
winner = Player.new
winner.expected_outcome.must_equal 0.5
end
end
Even if you don't know the Elo algorithm inside and out - this makes sense - given two opponents who are the same level, the chances that one will win should be 0.5. The algorithm is fairly well documented though, and if you are inclined, you can hunt around and find tables of expected outcomes based on differences between opponent ratings. Running the above crashes - an undefined variable 'opponent_rating'. I'll make it a parameter
def expected_outcome(opponent_rating)
....
Which gives us a wrong number of arguments error. We need to update our test and pass in an opponent rating
describe "expected outcome" do
it "must return 0.5 when ratings are equal"
winner = Player.new
winner.expected_outcome(100).must_equal 0.5
end
end
Passing tests. Good. Now we can flesh out that new rating algorithm. Feeling confident, I'm changing a fair amount in one go here, let's hope it works:
def calculate_new_rating(score, opponent_rating)
@rating = rating + 30* (score - expected_outcome(opponent_rating))
end
and fixing our test
it "must return 115 for the winner" do
winner = Player.new
winner.calculate_new_rating(1, 100)
new_rating = winner.rating
new_rating.must_equal 115
end
Wow - it's good! Let's discuss the code a bit - I changed the calculate_new_rating method to take two parameters - a score and the opponent rating. Then I substituted 30 in for the k_value - at some point this should become a constant perhaps, or a parameter. I also set our rating instance variable upon calculating the new rating. This means that any time I call calculate_new_rating on a Player, it will update that player's rating to be their new rating. I guess I could start running around and adding in variables for current rating, old rating and such, but this feels a bit cleaner to me - you have a rating - as soon as a game is played, your rating should change to reflect that. I am passing in score - currently 1. In our implementation, it will only ever be 1 or 0 - games can be won or lost, and never tied (we are, after all, implementing a facemash like algorithm - there are no draws).
Improvements
This code is a first draft - a good working example - it has its drawbacks though, which become apparent when you start adding in more tests, and especially when you test the very real scenario of calculating the ratings of two players, one after the other. I'll leave you to play with it a bit and see if you can spot the gotchas. I'll pick it up again next time.
Comments
Post a Comment