Cost-Effective HITs for Relative Similarity Comparisons
Similarity comparisons of the form “Is object a more similar to b than to c?” are useful for computer vision and machine learning applications. Unfortunately, an embedding of n points is specified by n³ triplets, making collecting every triplet prohibitively expensive. Noting this difficulty, other researchers have investigated more intelligent triplet sampling techniques, but they do not study their effectiveness or their potential drawbacks. Although it is important to reduce the number of collected triplets, it is also important to understand how best to display a triplet collection task to a user. In this work we explore an alternative display for collecting triplets and analyze its monetary cost and speed. We propose best practices for creating cost-effective human intelligence tasks for collecting triplets. We show that rather than changing the sampling algorithm, simple changes to the crowdsourcing UI can lead to much higher quality embeddings. We also provide a dataset as well as the labels collected from crowd workers.
To build rich embeddings, we must ask our Mechanical Turk workers many questions. Though it pays to get the most “bang for the buck” by asking questions that yield as much information as possible, we must still take user effort into consideration. Rather than ask one “triangle” question at a time (“Select either b or c”), we present a grid of items to the user and ask them to select the k food items that taste most similar to the probe. An example is shown above, and a sketch of how one grid answer expands into triplets follows below.
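For concreteness, here is a minimal sketch of the expansion. If a worker selects the k items most similar to the probe out of a grid of g, the answer implies that the probe is more similar to each selected item than to each unselected one, so a single screen yields k(g − k) triplets. The function name and food labels below are illustrative, not part of the released dataset.

```python
from itertools import product

def grid_to_triplets(probe, selected, unselected):
    """Expand one grid answer into triplet constraints.

    Selecting the k items most similar to the probe implies that the
    probe is more similar to every selected item than to every
    unselected item, yielding k * (g - k) triplets per screen.
    """
    return [(probe, s, u) for s, u in product(selected, unselected)]

# Example: a grid of 8 items with 2 selected -> 2 * 6 = 12 triplets.
triplets = grid_to_triplets(
    "caesar_salad",
    selected=["greek_salad", "cobb_salad"],
    unselected=["brownie", "steak", "sushi", "pancakes", "ramen", "tiramisu"],
)
print(len(triplets))  # 12
```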
Food Dataset
We collected a food dataset from Amazon Mechanical Turk. This dataset contains 100 images of food sourced from Yummly.com; see our paper for details. We also distribute 190,376 triplet answers from human workers, which represent about 39% of the total possible triplets that we could collect.
FAIR USE STATEMENT: This dataset contains copyrighted material under the educational fair use exemption to U.S. copyright law.
Download the Food-100 Dataset
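Assuming the triplet answers are distributed as a simple delimited text file, loading them might look like the sketch below. The file name and column layout here are hypothetical; check the actual download for the real format.

```python
import csv

# Hypothetical loader: assumes a CSV named "all-triplets.csv" with one
# answer per row: probe, more-similar item, less-similar item.
# (Both the file name and the column order are assumptions.)
def load_triplets(path="all-triplets.csv"):
    with open(path, newline="") as f:
        return [tuple(row) for row in csv.reader(f)]

triplets = load_triplets()
print(f"loaded {len(triplets)} triplet answers")
```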
Embedding Quality
To demonstrate the impact of our proposed question UI, we collected two embeddings of food from our Yummly dataset. Both of the embeddings below cost $5.10 to collect, but their quality is markedly different.
See the full-size pictures and explore the final embedding results by visiting our embedding explorer!
The embedding below, collected with individual triplet questions, cost $5.10, but the result is not very good, with poor separation and little structure. Salads are strewn about the right half of the embedding, and a steak lies within the dessert area. From our experiments, we know that an embedding of such low quality would have cost us less than $0.10 to collect using our grid strategy.
The embedding below was collected with our grid UI: it took only 408 screens, yet yielded 19,199 triplets. It shows good clustering behavior, with desserts gathered into the top left; the meats are close to each other, as are the salads. From our experiments, we know that an embedding of this quality would have cost us $192.00 to collect using individually-sampled triplets.
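For intuition about how an embedding is fit from such answers, the sketch below optimizes a 2-D embedding with a simple hinge loss on squared distances, taking a gradient step for each violated triplet (a, b, c), i.e. whenever a is not sufficiently closer to b than to c. This is a generic stand-in, not the specific embedding algorithm used to produce the figures above; the function name and hyperparameters are illustrative.

```python
import numpy as np

def embed_triplets(triplets, n_items, dim=2, margin=1.0, lr=0.01,
                   epochs=50, seed=0):
    """Fit a Euclidean embedding so that for each triplet (a, b, c),
    item a ends up closer to b than to c, using per-triplet gradient
    steps on a hinge loss over squared distances."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(n_items, dim))
    for _ in range(epochs):
        for a, b, c in triplets:
            d_ab = X[a] - X[b]
            d_ac = X[a] - X[c]
            # Triplet is violated if ||a - b||^2 + margin > ||a - c||^2.
            if d_ab @ d_ab - d_ac @ d_ac + margin > 0:
                X[a] -= lr * 2 * (d_ab - d_ac)  # pull a toward b, push from c
                X[b] += lr * 2 * d_ab           # pull b toward a
                X[c] -= lr * 2 * d_ac           # push c away from a
    return X

# Toy usage: item 0 should end up closer to item 1 than to item 2.
X = embed_triplets([(0, 1, 2)], n_items=3)
```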
Papers
2020
A Metric Learning Reality Check. European Conference on Computer Vision (ECCV), Glasgow, Scotland, 2020.

2017
Conditional Similarity Networks. Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017.

2016
Bayesian Representation Learning with Oracle Constraints. International Conference on Learning Representations (ICLR), San Juan, PR, 2016.

2015
Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences. International Conference on Computer Vision (ICCV), Santiago, Chile, 2015. (*Equal contribution)

Learning Concept Embeddings with Combined Human-Machine Expertise. International Conference on Computer Vision (ICCV), 2015.

PlateClick: Bootstrapping Food Preferences Through an Adaptive Visual Interface. International Conference on Information and Knowledge Management (CIKM), Melbourne, 2015.

2014
Cost-Effective HITs for Relative Similarity Comparisons. Human Computation and Crowdsourcing (HCOMP), Pittsburgh, 2014.