This paper presents our work on “SNaCK,” a low-dimensional concept embedding algorithm that combines human expertise with automatic machine similarity kernels. The two parts are complementary: human insight can capture relationships that are not apparent from an object’s visual similarity, while the machine relieves the human from having to exhaustively specify many constraints.
As input, our SNaCK algorithm takes two sources of information:
1. Several “relative similarity comparisons.” Each triplet has the form \((a,b,c)\), meaning that in the low-dimensional embedding \(Y\), \(Y_a\) should be closer to \(Y_b\) than it is to \(Y_c\). Experts can gather many of these constraints cheaply via crowdsourcing.
2. Feature-vector representations of each point. For instance, such features could come from HOG, SIFT, a deep-learned CNN, word embeddings, and so on.
SNaCK then generates an embedding that satisfies both classes of constraints.
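To make the triplet semantics concrete, here is a small Python sketch (not part of the SNaCK API; the helper name and the toy data are ours) that measures how many constraints a given embedding satisfies:

import numpy as np

def satisfied_fraction(Y, triplets):
    """Fraction of triplets (a, b, c) with ||Y_a - Y_b|| < ||Y_a - Y_c||."""
    a, b, c = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    d_ab = np.linalg.norm(Y[a] - Y[b], axis=1)
    d_ac = np.linalg.norm(Y[a] - Y[c], axis=1)
    return float(np.mean(d_ab < d_ac))

rng = np.random.default_rng(0)
Y = rng.standard_normal((100, 2))               # a toy 2-D embedding
triplets = rng.integers(0, 100, size=(500, 3))  # random (a, b, c) indices
print(satisfied_fraction(Y, triplets))          # about 0.5 for random data

A random embedding satisfies roughly half of all triplets, so a useful embedding should score well above that on held-out constraints.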
@conference{concept-embeddings,
title = {Learning Concept Embeddings with Combined Human-Machine Expertise},
author = {Michael Wilber and Iljung S. Kwak and David Kriegman and Serge Belongie},
url = {http://vision.cornell.edu/se3/wp-content/uploads/2015/09/main.pdf},
year = {2015},
date = {2015-12-13},
booktitle = {International Conference on Computer Vision (ICCV)}
}
A Python implementation of SNaCK is freely available. View the SNaCK code and documentation on GitHub. If you are using Anaconda on Linux or Mac OS X, SNaCK is easy to install:
$ conda install snack
Otherwise, please follow the instructions in the README file on GitHub.
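Once installed, usage looks roughly like the sketch below. The exact signature of snack.snack_embed (argument order, weight parameters) is an assumption on our part; consult the README on GitHub for the authoritative API.

import numpy as np
import snack

N, D = 1000, 128
X = np.random.randn(N, D)            # one feature vector per point
triplets = np.array([[0, 1, 2],      # "point 0 is closer to 1 than to 2"
                     [3, 4, 5]])
# Assumed call: embed into 2-D, weighting the triplet (t-STE) term and the
# feature-similarity (t-SNE) term; the weights and parameter order here are
# placeholders, not the documented API.
Y = snack.snack_embed(X, 500.0, triplets, 500.0)
print(Y.shape)                       # expected: (N, 2)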
An IPython Notebook that contains SNaCK and Food-10k can be launched from the project page.
Data
Download the Food-10k dataset here: Food-10k.tar.xz. This dataset includes 10,000 Yummly IDs of foods and 958,479 triplet constraints collected using crowdsourcing.
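As a starting point, here is a hypothetical sketch for loading the triplets into the index-based form SNaCK expects. The file names and the one-record-per-line layout are assumptions on our part; check the archive's own documentation for the actual format.

import numpy as np

# Hypothetical layout: one Yummly ID per line in "ids.txt", and three
# whitespace-separated IDs per line in "triplets.txt".
with open("ids.txt") as f:
    ids = [line.strip() for line in f if line.strip()]
index = {yummly_id: i for i, yummly_id in enumerate(ids)}

triplets = []
with open("triplets.txt") as f:
    for line in f:
        a, b, c = line.split()
        triplets.append((index[a], index[b], index[c]))
triplets = np.array(triplets)   # expected shape: (958479, 3)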
The CUB-200 dataset can be found at the Caltech-UCSD Birds-200-2011 webpage. In our experiments, we used the “Birdlets” subset, consisting of the following 14 classes: