Predict Scoring FAQ

Frequently asked questions on the Predict scoring system.

Here is everything you wanted to know about the Predict competition’s scoring, in FAQ style. Have a question that’s not answered here? Post it in Discussion and I’ll add it to this list.

First things first: How did my predictions turn into points?

Here’s the simple version. We score you on how much closer to the eventual outcome you were than the crowd at the time of each of your predictions. You gain points for being more right than the crowd and lose them for being more wrong. Both the penalty for being wrong and the gain for being right increase quickly: the further you are from the crowd, the more you stand to lose if you’re wrong, and the more you stand to gain if you’re right.

Results are weighted by the time from your first prediction to the end of the question. So the longer the question and the earlier you make a prediction, the more total points you can gain or lose. If you jump in at the end of a question you won’t gain or lose much, and short questions are much easier, so they won’t swing scores much either. But nail a season-long forecast from the beginning and you will rack up the points.

Now for the nitty gritty version.

The process of scoring works like this. We go back in time and step through every hour a question was open, from the start of the question to its finish. At each hour we take each user’s most recent prediction up to that point and find the median of all of them, which we call the “crowd prediction”.
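As a sketch of that step (the helper name and data shape here are my own, not the site’s actual code), the hourly crowd prediction is just the median over each user’s latest probability:

```python
from statistics import median

def crowd_prediction(latest_predictions):
    """Median of each user's most recent prediction at a given hour.

    `latest_predictions` maps user -> probability. Both this helper
    name and the dict shape are hypothetical illustrations.
    """
    return median(latest_predictions.values())

# Three users' most recent predictions at some hour:
print(crowd_prediction({"alice": 0.70, "bob": 0.55, "carol": 0.80}))  # 0.7
```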

For every user who made a prediction at that point or earlier, we compare their most recent prediction at that point to the crowd prediction at that point, using something called a Brier score. Brier scoring is the most common form of scoring for probabilistic forecasts. You can read more about Brier scoring for all the details, but for our purposes I’ll just say this: one reason we use a Brier score is that it’s what is called a proper scoring rule, which means that if an event would truly happen 70% of the time, you can’t do better than predicting 70%. You can’t game the system by putting 100% for a set of 70% questions and end up with more points than if you had put 70% for all of them. Brier scores also heavily penalize overconfidence. If you predict 100% or 0% and you’re wrong, you will pay a steep price. So gamble at your own risk. Brier scores essentially measure the error of your prediction, so the further off you are the higher your Brier score.
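To make the proper-scoring-rule claim concrete, here is an illustration (not the site’s actual code) of the binary Brier score and why an honest 70% beats an overconfident 100% on a true 70% event:

```python
def brier(p, outcome):
    # Binary Brier score: squared error between probability p and the outcome (0 or 1)
    return (p - outcome) ** 2

def expected_brier(p, true_rate):
    # Expected Brier score for predicting p on an event that truly happens
    # true_rate of the time (lower is better)
    return true_rate * brier(p, 1) + (1 - true_rate) * brier(p, 0)

# For a 70% event, honest 70% beats overconfident 100% in expectation:
print(round(expected_brier(0.7, 0.7), 4))  # 0.21 -- the minimum
print(round(expected_brier(1.0, 0.7), 4))  # 0.3
# And a wrong 100% pays the steepest possible price:
print(brier(1.0, 0))  # 1.0
```

No prediction other than 0.7 can beat 0.7 in expectation on a true 70% event, which is exactly what makes the rule “proper.”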

We subtract the user’s Brier score at that hour from the crowd’s Brier score at that hour to get the net Brier score, then multiply that net score by a fixed number of points to make it more legible. In our case we are using the equivalent of 10 points per day, meaning that for every 0.1-point edge you hold over the crowd in Brier score for a day, you’ll gain a point. These hourly points are added up, so you gain or lose more for predicting earlier and on longer questions.
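Putting the pieces together, a minimal sketch of the accumulation step, assuming hour-by-hour probability series and that the 10-points-per-day rate is applied as 10/24 per hour (the function names and signatures are hypothetical):

```python
def brier(p, outcome):
    # Binary Brier score: squared error of probability vs. outcome (0 or 1)
    return (p - outcome) ** 2

def score_user(user_preds, crowd_preds, outcome, points_per_day=10):
    """Sum the user's hourly Brier edge over the crowd, scaled so that
    holding a 0.1 Brier edge for a full day is worth 1 point (0.1 * 10).

    `user_preds` and `crowd_preds` are hour-by-hour probabilities;
    this is an illustrative sketch, not the site's actual implementation.
    """
    points_per_hour = points_per_day / 24
    total = 0.0
    for user_p, crowd_p in zip(user_preds, crowd_preds):
        # Lower Brier is better, so crowd minus user is positive
        # when the user beats the crowd
        net = brier(crowd_p, outcome) - brier(user_p, outcome)
        total += net * points_per_hour
    return total

# One full day at 0.9 vs. a crowd at 0.7, and the event happens:
print(round(score_user([0.9] * 24, [0.7] * 24, outcome=1), 4))  # 0.8
```

Note how the same hourly edge held over a longer question simply adds more terms to the sum, which is where the early-and-long-horizon reward comes from.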

OK, I think that makes sense. So why don’t I have a rank on the leaderboard? I made predictions!

When I introduced Predict in October, I noted that Phil Tetlock, one of the foremost proponents of forecasting competitions like this one, wrote in Superforecasting that: “Forecasts must have clearly defined terms and timelines. They must use numbers. And one more thing is essential: we must have lots of forecasts.”

That last point is key. With a small number of predictions we can’t really judge how good a forecaster someone is; they might not have known what they were doing and simply gotten lucky. So you need to have predicted on at least five completed questions before you will appear on the leaderboard.

As you’ll see, even with five questions the results are still subject to randomness. As of this writing, the top of the leaderboard is filled with users who put 100% or 0% for their questions. They got them right so they got a lot of points, but as we just saw, that’s not a sound strategy over the long run. Predicting 100% or 0% and getting it wrong will lose you a lot of points. But if you manage to get lucky in a small sample (or you have a crystal ball and get it right every time) you will do quite well.

As we add more questions and users keep making predictions, that should take care of itself. Meanwhile, nobody will be able to squat at the top of the leaderboard because you have to keep making predictions to gain more points.

I get that longer questions are worth more. But doesn’t that mean questions about individual games are worth very little, and season-long questions will matter a ton?

Yep, that’s by design. Very short questions won’t swing your score much, but they’re a good exercise in calibration, i.e. getting used to what a 60% chance really means and the price you pay for saying 100% when it doesn’t happen. So don’t sweat those too much; approach them as practice that can earn you a few points here and there.

The longest questions, meanwhile, are the hardest and matter the most. If you can predict the Lakers’ playoff chances accurately over a whole season, or who is going to go where in the draft from the start of the college season, that’s worth a lot more as proof of what you know and when you know it. And you’ll be rewarded accordingly.

What’s with the usernames? Where did they come from?

The usernames are the user’s first name plus some random digits at the end. Hopefully this strikes the right balance between anonymity for those who want it and some way to verify that a person is who they say they are. Ideally we will add more ways of verifying identity as we move forward.

Again, have a question that’s not answered here? Post it in Discussion and I’ll add it to this list.
