Cost-effective HITs for Relative Similarity Comparisons

Similarity comparisons of the form “Is object a more similar to b than to c?” are useful for computer vision and machine learning applications. Unfortunately, an embedding of n points is specified by O(n³) triplets, which makes collecting every triplet expensive. Noting this difficulty, other researchers have investigated more intelligent triplet sampling techniques, but they do not study the effectiveness or potential drawbacks of those techniques. Although it is important to reduce the number of collected triplets, it is also important to understand how best to present a triplet collection task to a user. In this work we explore an alternative display for collecting triplets and analyze its monetary cost and speed. We propose best practices for creating cost-effective human intelligence tasks (HITs) for collecting triplets. We show that rather than changing the sampling algorithm, simple changes to the crowdsourcing UI can lead to much higher-quality embeddings. We also provide a dataset as well as the labels collected from crowd workers.
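To make the cubic growth concrete, assume (our own framing, not necessarily the paper's exact accounting) that one triplet question consists of a probe a together with an unordered pair {b, c} drawn from the remaining n − 1 items. The number of distinct questions is then:

```latex
T(n) = n \binom{n-1}{2} = \frac{n(n-1)(n-2)}{2} = \Theta(n^3)
```

Under this counting, doubling the number of items multiplies the number of possible questions by roughly eight.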

[Figure: grid2choose1 UI example]

[Figure: grid UI example]

To build rich embeddings, we must ask our Mechanical Turk workers many questions. Although it pays to get the most “bang for the buck” by asking questions that yield as much information as possible, we must still account for the effort each question demands of the worker. Rather than asking one “triangle” question at a time (“Select either b or c”), we present a grid of items and ask the worker to select the k food items that taste most similar to the probe item. An example is shown above.
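One grid answer carries many triplets at once. If the worker picks the k items most similar to the probe, then, under the natural reading of that instruction, every selected item is closer to the probe than every unselected item, so a grid of n items yields k·(n − k) triplets. The sketch below expands one answer this way. The function name `grid_answer_to_triplets`, the item labels, and the 16-item, 4-selection screen are hypothetical choices for illustration, not the paper's interface code.

```python
from itertools import product

def grid_answer_to_triplets(probe, grid, selected):
    """Expand one grid answer into (probe, closer, farther) triplets.

    Assumes every selected item is judged more similar to the probe
    than every unselected item shown on the same screen.
    """
    chosen = set(selected)
    if not chosen.issubset(grid):
        raise ValueError("every selected item must appear in the grid")
    near = [item for item in grid if item in chosen]
    far = [item for item in grid if item not in chosen]
    # Each (selected, unselected) pair gives one triplet:
    # the probe is closer to `n` than to `f`.
    return [(probe, n, f) for n, f in product(near, far)]

# Hypothetical screen: 16 items in the grid, the worker picks 4.
grid = [f"food_{i:02d}" for i in range(16)]
triplets = grid_answer_to_triplets("food_probe", grid, grid[:4])
print(len(triplets))  # 48 = 4 * (16 - 4)
```

A single-triplet question is the special case n = 2, k = 1, which yields exactly one triplet per screen.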

Food Dataset

We collected a food dataset using Amazon Mechanical Turk. This dataset contains 100 images of food sourced from Yummly.com; see our paper for details. We also distribute 190,376 triplet answers from human workers, which represent about 39% of all possible triplets for these 100 images.
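The 39% figure can be checked with a short calculation. This sketch assumes that the number of possible triplets is counted as one question per probe image and unordered pair of the remaining images; the helper name `total_triplets` is ours. Under that assumption the computed coverage matches the figure quoted above.

```python
from math import comb

def total_triplets(n_items: int) -> int:
    # Assumed counting: one question per probe item and unordered
    # pair of the remaining n_items - 1 items.
    return n_items * comb(n_items - 1, 2)

n_images = 100
collected = 190_376

total = total_triplets(n_images)
coverage = collected / total
print(total)              # 485100
print(f"{coverage:.1%}")  # 39.2%
```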

[Figure: example images from the Yummly food dataset]

FAIR USE STATEMENT: This dataset contains copyrighted material used under the educational fair use exemption to U.S. copyright law.
Download the Food-100 Dataset

Embedding Quality

To demonstrate the impact of our proposed question UI, we collected two embeddings of food from our Yummly dataset. Both of the embeddings below cost $5.10 to collect, but their quality is markedly different.

See the full-size pictures and explore the final embedding results by visiting our embedding explorer!

The embedding below, collected without our grid UI, also cost $5.10, but its quality is poor: it shows weak separation and little structure. Salads are strewn across the right half of the embedding, and a steak lies within the dessert area. From our experiments, we know that an embedding of such low quality would have cost less than $0.10 to collect using our grid strategy.

[Figure: low-quality embedding]
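Visual inspection is one way to judge an embedding. A common numeric complement, which we include here as an illustration and not as the evaluation used for the figures, is triplet agreement: the fraction of held-out triplets (a, b, c), meaning “a is more similar to b than to c”, for which a lies closer to b than to c in the embedding. The function name, the coordinates, and the triplets below are all made up.

```python
from math import dist

def triplet_agreement(embedding, triplets):
    """Fraction of (a, b, c) triplets where a is closer to b than to c."""
    if not triplets:
        raise ValueError("need at least one triplet")
    satisfied = sum(
        dist(embedding[a], embedding[b]) < dist(embedding[a], embedding[c])
        for a, b, c in triplets
    )
    return satisfied / len(triplets)

# Hypothetical 2-D embedding and held-out human answers.
embedding = {
    "cake": (0.0, 0.0),
    "pie": (0.5, 0.2),
    "steak": (4.0, 3.0),
    "salad": (-3.0, 4.0),
}
held_out = [
    ("cake", "pie", "steak"),
    ("cake", "pie", "salad"),
    ("steak", "pie", "salad"),
    ("salad", "steak", "cake"),
]
print(triplet_agreement(embedding, held_out))  # 0.75
```

Higher agreement means the embedding respects more of the human judgments; a poorly structured embedding like the one above would be expected to score lower on the same held-out set.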

The embedding below, collected with our grid UI, required 408 screens and yielded 19,199 triplets. It shows good clustering behavior, with desserts gathered in the top left. The meats are close to each other, as are the salads. From our experiments, we know that an embedding of this quality would have cost $192.00 to collect using individually sampled triplets.

[Figure: high-quality embedding]
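The per-screen yield and cost of the grid-collected embedding follow directly from the figures quoted above. This is plain arithmetic on those numbers; the variable names are ours.

```python
# Figures quoted above for the grid-collected embedding.
cost_usd = 5.10                    # total payment to workers
screens = 408                      # grid screens answered
triplets = 19_199                  # triplets produced by those screens
single_triplet_cost_usd = 192.00   # quoted cost of equal quality with individually sampled triplets

triplets_per_screen = triplets / screens
cost_per_screen = cost_usd / screens
cost_per_triplet = cost_usd / triplets
savings_factor = single_triplet_cost_usd / cost_usd

print(f"{triplets_per_screen:.1f} triplets per screen")  # 47.1
print(f"${cost_per_screen:.4f} per screen")              # $0.0125
print(f"${cost_per_triplet:.6f} per triplet")            # $0.000266
print(f"{savings_factor:.1f}x cost ratio")               # 37.6x
```

Roughly 47 triplets per screen is far more than the single triplet a one-question-per-screen UI can return, which is where the cost advantage comes from.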

Papers

2020

A Metric Learning Reality Check

Musgrave, Kevin; Belongie, Serge; Lim, Ser-Nam

European Conference on Computer Vision (ECCV), Glasgow, Scotland, 2020.

2017

Conditional Similarity Networks

Veit, Andreas; Belongie, Serge; Karaletsos, Theofanis

Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017.

2016

Bayesian representation learning with oracle constraints

Karaletsos, Theofanis; Belongie, Serge; Rätsch, Gunnar

International Conference on Learning Representations (ICLR), San Juan, PR, 2016.

2015

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

Veit*, Andreas; Kovacs*, Balazs; Bell, Sean; McAuley, Julian; Bala, Kavita; Belongie, Serge

International Conference on Computer Vision (ICCV), Santiago, Chile, 2015, (*Equal Contribution).

Learning Concept Embeddings with Combined Human-Machine Expertise

Wilber, Michael; Kwak, Iljung; Kriegman, David; Belongie, Serge

International Conference on Computer Vision (ICCV), 2015.

PlateClick: Bootstrapping Food Preferences Through an Adaptive Visual Interface

Yang, Longqi; Cui, Yin; Zhang, Fan; Pollak, John; Belongie, Serge; Estrin, Deborah

International Conference on Information and Knowledge Management (CIKM), Melbourne, 2015.

2014

Cost-Effective HITs for Relative Similarity Comparisons

Wilber, Michael; Kwak, Sam; Belongie, Serge

Human Computation and Crowdsourcing (HCOMP), Pittsburgh, 2014.