How to rank a million images with a crowdsourced sort

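I'd like to rank a collection of landscape images by making a game in which site visitors rate them, in order to find out which images people find the most appealing.

What would be a good method of doing that?

  • Hot-or-Not style? I.e. show a single image and ask the user to rank it from 1-10. As I see it, this lets me average the scores, and I would just need to make sure that I get an even distribution of votes across all the images. Fairly simple to implement.
  • Pick A-or-B? I.e. show two images and ask the user to pick the better one. This is appealing because there is no numerical ranking, it's just a comparison. But how would I implement it? My first thought was to do it as a quicksort, with the comparison operations provided by humans, and once it completes, simply repeat the sort ad infinitum.

How would you do it?

If you need numbers, I'm talking about one million images, on a site with 20,000 daily visits. I'd imagine a small proportion might play the game; for the sake of argument, let's say I can generate 2,000 human sort operations a day! It's a non-profit website, and the curious can find it through my profile :)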


You may want to go with a combination.

First phase: Hot-or-Not style (although I would go with a 3-option vote: Sucks, Meh/OK, Cool!)

Once you've sorted the set into the 3 buckets, I would select two images from the same bucket and ask "Which is nicer?"

You could then use an English Soccer system of promotion and demotion to move the top few "Sucks" into the Meh/OK region, in order to refine the edge cases.
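A minimal sketch of that promotion step, assuming each bucket is a plain list of image ids and that `score` (a hypothetical callable) returns each image's within-bucket result, for example its net A-or-B wins. Relegating the weakest images of the upper bucket in the same pass is an added assumption borrowed from the soccer analogy.

```python
def promote_and_relegate(lower, upper, score, k=3):
    """Swap the top k images of `lower` with the bottom k of `upper`.

    `score` maps an image id to its within-bucket score (higher is better).
    Returns the new (lower, upper) buckets as lists.
    """
    lower_sorted = sorted(lower, key=score, reverse=True)
    upper_sorted = sorted(upper, key=score, reverse=True)
    # Never move more images than either bucket holds.
    k = min(k, len(lower_sorted), len(upper_sorted))
    if k == 0:
        return list(lower), list(upper)
    promoted = lower_sorted[:k]    # best of the lower bucket
    relegated = upper_sorted[-k:]  # worst of the upper bucket
    new_lower = lower_sorted[k:] + relegated
    new_upper = upper_sorted[:-k] + promoted
    return new_lower, new_upper
```

Running this periodically between the "Sucks" and "Meh/OK" buckets, and again between "Meh/OK" and "Cool!", keeps refining the images near the boundaries.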

I don't like the Hot-or-Not style. Different people would pick different numbers even if they all liked the image exactly the same. Also I hate rating things out of 10, I never know which number to choose.

Pick A-or-B is much simpler and more fun. You get to see two images, and comparisons are made between the images on the site.

Ranking 1-10 won't work, everyone has different levels. Someone who always gives 3-7 ratings would have his rankings eclipsed by people who always give 1 or 10.

A-or-B is more workable.

Pick A-or-B is the simplest and least prone to bias; however, each human interaction gives you substantially less information. I think that, because of the bias reduction, Pick is superior, and in the limit it provides you with the same information.

A very simple scoring scheme is to have a count for each picture. When someone gives a positive comparison increment the count, when someone gives a negative comparison, decrement the count.

Sorting a 1-million integer list is very quick and will take less than a second on a modern computer.
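A minimal sketch of this counting scheme, assuming image ids are strings and everything fits in memory; the names `record_comparison` and `ranking` are made up for illustration.

```python
from collections import defaultdict

# Net score per image id; images never compared stay at 0.
scores = defaultdict(int)

def record_comparison(winner_id, loser_id):
    """Apply one human vote: +1 for the preferred image, -1 for the other."""
    scores[winner_id] += 1
    scores[loser_id] -= 1

def ranking(image_ids):
    """Return image ids sorted from highest to lowest net score."""
    return sorted(image_ids, key=lambda i: scores[i], reverse=True)

# Three example votes on hypothetical image ids.
record_comparison("lake", "carpark")
record_comparison("lake", "hill")
record_comparison("hill", "carpark")
print(ranking(["carpark", "hill", "lake"]))
```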

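That said, the problem is rather ill-posed: even if all 20,000 daily visitors rated one image each, it would take 50 days to show each image only once, and at 2,000 comparisons a day (two images per comparison) it would take about 250 days.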

I bet though you are more interested in the most highly ranked images? So, you probably want to bias your image retrieval by predicted rank - so you are more likely to show images that have already achieved a few positive comparisons. This way you will more quickly just start showing 'interesting' images.
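One way to bias retrieval like this is weighted random sampling. The sketch below assumes `scores` maps each image id to its net vote count; shifting the weights so the lowest-scored image keeps weight 1 is an arbitrary choice that still lets unrated images appear now and then.

```python
import random

def pick_image(scores, rng=random):
    """Pick one image id, favouring images with higher net scores.

    `scores` maps image id -> net vote count. Weights are shifted so the
    lowest-scored image has weight 1 and can still occasionally be shown.
    """
    ids = list(scores)
    lowest = min(scores.values())
    weights = [scores[i] - lowest + 1 for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible for testing.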

Most naive approaches to the problem have some serious issues. The worst is how bash.org and qdb.us display quotes - users can vote a quote up (+1) or down (-1), and the list of best quotes is sorted by the total net score. This suffers from a horrible time bias - older quotes have accumulated huge numbers of positive votes via simple longevity even if they're only marginally humorous. This algorithm might make sense if jokes got funnier as they got older but - trust me - they don't.

There are various attempts to fix this - looking at the number of positive votes per time period, weighting more recent votes, implementing a decay system for older votes, calculating the ratio of positive to negative votes, etc. Most suffer from other flaws.

The best solution - I think - is the one that the websites The Funniest, The Cutest, The Fairest, and Best Thing use - a modified Condorcet voting system:

The system gives each one a number based on, out of the things that it has faced, what percentage of them it usually beats. So each one gets the percentage score NumberOfThingsIBeat / (NumberOfThingsIBeat + NumberOfThingsThatBeatMe). Also, things are barred from the top list until they've been compared to a reasonable percentage of the set.

If there's a Condorcet winner in the set, this method will find it. Since that's unlikely, given the statistical nature, it finds the one that's the "closest" to being a Condorcet winner.
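The percentage score described above can be sketched as follows. The `wins` and `losses` dictionaries and the `min_comparisons` cutoff are assumptions for illustration; an absolute count stands in here for "a reasonable percentage of the set".

```python
def win_ratio_ranking(wins, losses, min_comparisons=5):
    """Rank items by wins / (wins + losses), best first.

    `wins` and `losses` map item id -> count. Items with fewer than
    `min_comparisons` total comparisons are barred from the list.
    """
    ranked = []
    for item in set(wins) | set(losses):
        won = wins.get(item, 0)
        lost = losses.get(item, 0)
        total = won + lost
        if total >= min_comparisons:
            ranked.append((won / total, item))
    ranked.sort(reverse=True)
    return [item for _, item in ranked]
```

Because the score is a ratio rather than a running total, an old image gains nothing from simply having been around longer.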

For more information on implementing such systems the Wikipedia page on Ranked Pairs should be helpful.

The algorithm requires people to compare two objects (your Pick-A-or-B option), but frankly, that's a good thing. I believe it's very well accepted in decision theory that humans are vastly better at comparing two objects than they are at abstract ranking. Millions of years of evolution make us good at picking the best apple off the tree, but terrible at deciding how closely the apple we picked hews to the true Platonic Form of appleness. (This is, by the way, why the Analytic Hierarchy Process is so nifty...but that's getting a bit off topic.)

One final point to make is that SO uses an algorithm to find the best answers which is very similar to bash.org's algorithm to find the best quote. It works well here, but fails terribly there - in large part because an old, highly rated, but now outdated answer here is likely to be edited. bash.org doesn't allow editing, and it's not clear how you'd even go about editing decade-old jokes about now-dated internet memes even if you could... In any case, my point is that the right algorithm usually depends on the details of your problem. :-)

As others have said, ranking 1-10 does not work that well because people have different levels.

The problem with the Pick A-or-B method is that the results are not guaranteed to be transitive (A can beat B and B can beat C, yet C can beat A). Nontransitive comparison operators break sorting algorithms. With quicksort, in this example, the letters not chosen as the pivot will be incorrectly ranked against each other.

At any given time, you want an absolute ranking of all the pictures (even if some/all of them are tied). You also want your ranking not to change unless someone votes.

I would use the Pick A-or-B (or tie) method, but determine ranking similar to the Elo ratings system which is used for rankings in 2 player games (originally chess):

The Elo player-rating system compares players' match records against their opponents' match records and determines the probability of the player winning the matchup. This probability factor determines how many points a player's rating goes up or down based on the results of each match. When a player defeats an opponent with a higher rating, the player's rating goes up more than if he or she defeated a player with a lower rating (since players should defeat opponents who have lower ratings).

The Elo System:

  1. All new players start out with a base rating of 1600
  2. WinProbability = 1/(10^(( Opponent’s Current Rating–Player’s Current Rating)/400) + 1)
  3. ScoringPt = 1 point if they win the match, 0 if they lose, and 0.5 for a draw.
  4. Player’s New Rating = Player’s Old Rating + (K-Value * (ScoringPt–Player’s Win Probability))

Replace "players" with pictures and you have a simple way of adjusting both pictures' rating based on a formula. You can then perform a ranking using those numeric scores. (K-Value here is the "Level" of the tournament. It's 8-16 for small local tournaments and 24-32 for larger invitationals/regionals. You can just use a constant like 20).
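The four steps above translate fairly directly into code. This is a sketch with made-up function names, using the constant K of 20 suggested above; storage of the ratings is left out.

```python
BASE_RATING = 1600  # step 1: rating given to every new picture

def expected_score(rating, opponent_rating):
    """Step 2: probability that `rating` beats `opponent_rating`."""
    return 1.0 / (10 ** ((opponent_rating - rating) / 400.0) + 1.0)

def elo_update(rating_a, rating_b, score_a, k=20):
    """Steps 3 and 4: return the new (rating_a, rating_b).

    score_a is 1 if picture A was picked, 0 if B was picked, 0.5 for a tie.
    """
    expected_a = expected_score(rating_a, rating_b)
    expected_b = expected_score(rating_b, rating_a)
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1 - score_a) - expected_b)
    return new_a, new_b
```

For two new pictures, `elo_update(BASE_RATING, BASE_RATING, 1)` moves the winner up by 10 points and the loser down by 10, since each had a 0.5 win probability.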

With this method, you only need to keep one number for each picture which is a lot less memory intensive than keeping the individual ranks of each picture to each other picture.

EDIT: Added a little more meat based on comments.

The defunct web site whatsbetter.com used an Elo style method. You can read about the method in their FAQ on the Internet Archive.

These equations from Wikipedia make it simpler and more effective to calculate Elo ratings. The algorithm for images A and B would be simple:

  • Get $N_e$, $m_A$, $m_B$ and the ratings $R_A$, $R_B$ from your database.
  • Calculate $K_A$, $K_B$, $Q_A$, $Q_B$ using the number of comparisons performed ($N_e$), the number of times each image was compared ($m$), and the current ratings:

$K = \frac{800}{N_e + m}$

$Q_A = 10^{R_A/400}$

$Q_B = 10^{R_B/400}$

  • Calculate $E_A$ and $E_B$:

$E_A = \frac{Q_A}{Q_A + Q_B}$

$E_B = \frac{Q_B}{Q_A + Q_B}$

  • Set the score $S$ for each image: 1 for the winner, 0 for the loser, and 0.5 each for a draw.
  • Calculate the new rating for both using:

$R' = R + K (S - E)$

  • Update the new ratings $R_A$, $R_B$ and counts $m_A$, $m_B$ in the database.
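The steps in this list might look like the sketch below. It assumes the K-factor takes the form K = 800 / (Ne + m), which is the variable-K formula given on Wikipedia's Elo page, and it leaves the database reads and writes to the caller; the function name is hypothetical.

```python
def elo_update_q(r_a, r_b, m_a, m_b, n_e, s_a):
    """One comparison between images A and B using the Q-value form of Elo.

    ASSUMPTION: K = 800 / (n_e + m), so n_e + m must be positive.
    s_a is 1 if A won, 0 if B won, 0.5 for a draw.
    Returns ((new_r_a, new_m_a), (new_r_b, new_m_b)).
    """
    # Per-image K: images compared more often move less per vote.
    k_a = 800.0 / (n_e + m_a)
    k_b = 800.0 / (n_e + m_b)
    q_a = 10 ** (r_a / 400.0)
    q_b = 10 ** (r_b / 400.0)
    # Expected scores; these always sum to 1.
    e_a = q_a / (q_a + q_b)
    e_b = q_b / (q_a + q_b)
    s_b = 1.0 - s_a
    new_a = (r_a + k_a * (s_a - e_a), m_a + 1)
    new_b = (r_b + k_b * (s_b - e_b), m_b + 1)
    return new_a, new_b
```

Note that with a different K for each image the points gained by the winner no longer equal the points lost by the loser.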

I like the quick-sort option but I'd make a few tweaks:

  • Keep the "comparison" results in a DB and then average them.
  • Get more than one comparison per view by giving the user 4-6 images and having them sort them.
  • Select what images to display by running qsort and recording and trimming anything that you don't have enough data on. Then when you have enough items recorded, spit out a page.
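The first two tweaks can be sketched together: one user-sorted set of 4-6 images yields a pairwise result for every pair in it, and those results can be tallied and averaged. The in-memory `Counter` below stands in for the DB, and the function names are made up.

```python
from collections import Counter
from itertools import combinations

# (winner, loser) -> number of times that outcome was recorded.
win_counts = Counter()

def record_ordering(ordered_ids):
    """Store one user's best-to-worst ordering as (winner, loser) pairs.

    combinations() keeps input order, so the first id of each pair is
    always the one the user ranked higher.
    """
    pairs = list(combinations(ordered_ids, 2))
    win_counts.update(pairs)
    return pairs

def preference(a, b):
    """Fraction of recorded votes that ranked a above b (None if no data)."""
    a_over_b = win_counts[(a, b)]
    b_over_a = win_counts[(b, a)]
    total = a_over_b + b_over_a
    return a_over_b / total if total else None
```

An ordering of n images gives n(n-1)/2 comparisons from a single page view, so 5 images yield 10.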

The other fun option would be to use the crowd to teach a neural-net.

I know this question is quite old but I thought I'd contribute

I'd look at the TrueSkill system developed at Microsoft Research. It's like ELO but has a much faster convergence time (looks exponential compared to linear), so you get more out of each vote. It is, however, more complex mathematically.

http://en.wikipedia.org/wiki/TrueSkill

Wow, I'm late in the game.

I like the ELO system very much, but like Owen says, it seems to me that you'd be slow to build up any significant results.

I believe humans have much greater capacity than just comparing two images, but you want to keep interactions to the bare minimum.

So how about you show n images (n being any number you can visibly display on a screen; this may be 10, 20 or 30, maybe depending on the user's preference) and get them to pick which they think is best in that lot. Now back to ELO. You need to modify your rating system, but keep the same spirit. You have in fact compared one image to n-1 others. So you do your ELO rating n-1 times, but you should divide the change of rating by n-1 to match (so that results with different values of n are coherent with one another).

You're done. You've now got the best of all worlds. A simple rating system working with many images in one click.
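A sketch of that pick-one-of-n update, assuming `ratings` is a dict of image id to Elo rating and K is a constant 20. Scaling the losers' changes by n-1 as well as the winner's is one reading of the description above.

```python
def pick_best_update(ratings, shown_ids, winner_id, k=20):
    """Update `ratings` in place after a user picks one image out of n."""
    n = len(shown_ids)
    if n < 2 or winner_id not in shown_ids:
        raise ValueError("need at least two images, including the winner")
    changes = {winner_id: 0.0}
    for loser in shown_ids:
        if loser == winner_id:
            continue
        # Standard Elo win probability of the winner against this loser.
        gap = ratings[loser] - ratings[winner_id]
        expected_win = 1.0 / (10 ** (gap / 400.0) + 1.0)
        # One Elo update per loser, scaled down by n-1.
        change = k * (1.0 - expected_win) / (n - 1)
        changes[winner_id] += change
        changes[loser] = -change
    # Apply everything at the end so each change uses pre-vote ratings.
    for image_id, change in changes.items():
        ratings[image_id] += change
```

With three new images at 1600 each, the picked image gains 10 points in total and the other two lose 5 each.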

If you prefer using the Pick A or B strategy I would recommend this paper: http://research.microsoft.com/en-us/um/people/horvitz/crowd_pairwise.pdf

Chen, X., Bennett, P. N., Collins-Thompson, K., & Horvitz, E. (2013, February). Pairwise ranking aggregation in a crowdsourced setting. In Proceedings of the sixth ACM international conference on Web search and data mining (pp. 193-202). ACM.

The paper describes the Crowd-BT model, which extends the well-known Bradley-Terry pairwise comparison model to the crowdsourced setting. It also gives an adaptive learning algorithm to improve the time and space efficiency of the model. You can find a Matlab implementation of the algorithm on GitHub (but I'm not sure if it works).
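For orientation, here is a sketch of the plain Bradley-Terry model that Crowd-BT builds on. It is not the paper's Crowd-BT algorithm (there is no per-annotator reliability and no adaptive pair selection); it fits the strengths with the standard minorization-maximization iteration, and the data layout is an assumption.

```python
def bradley_terry(wins, items, iterations=200):
    """Estimate Bradley-Terry strengths from pairwise counts.

    `wins[(a, b)]` is the number of times a was preferred over b.
    Returns a dict item -> strength, normalised to sum to 1; under the
    model, P(a beats b) = strength[a] / (strength[a] + strength[b]).
    """
    strength = {i: 1.0 / len(items) for i in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            total_wins = 0
            denom = 0.0
            for j in items:
                if j == i:
                    continue
                games = wins.get((i, j), 0) + wins.get((j, i), 0)
                if games == 0:
                    continue  # never compared: no information
                total_wins += wins.get((i, j), 0)
                denom += games / (strength[i] + strength[j])
            # Items with no comparisons keep their current strength.
            updated[i] = total_wins / denom if denom else strength[i]
        norm = sum(updated.values())
        strength = {i: value / norm for i, value in updated.items()}
    return strength
```

An image that never wins a comparison ends up with strength 0 here; a small prior count per pair is a common way to soften that.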