- COCO-Text: Dataset for Text Detection and Recognition COCO-Text is a new large-scale dataset for text detection and recognition in natural images.
Version 1.0 of the dataset is out!
63,686 images, 173,589 text instances, 3 fine-grained text attributes.
This dataset is based on the MSCOCO dataset.
Text localizations as bounding boxes
Text transcriptions for legible text
Multiple text instances per image
More than 63,000 images
More than 173,000 text instances
Text instances categorized into machine ...
- Learning Visual Clothing Style with Heterogeneous Dyads ‘What outfit goes well with this pair of shoes?’ To answer this type of question, one has to go beyond learning visual similarity and learn a visual notion of compatibility across categories. In this paper, we propose a novel learning framework to help answer this type of question.
With the rapid proliferation of smart mobile devices, ...
- Concept Embeddings with SNaCK This paper presents our work on “SNaCK,” a low-dimensional concept embedding algorithm that combines human expertise with automatic machine similarity kernels. Both parts are complementary: human insight can capture relationships that are not apparent from the object’s visual similarity and the machine can help relieve the human from having to exhaustively specify many constraints.
- Assistive Technology for the Visually Impaired The contemporary urban environment is brimming with rich visual cues that provide valuable directional and informational content to sighted individuals. The goal of this project is to make these visual cues universally accessible in a variety of real-world domains.
- Microsoft COCO Microsoft COCO is a new image recognition and segmentation dataset that will be released in Summer 2014. Microsoft COCO has several features:
Recognition in Context
Multiple objects per image
More than 300,000 images
More than 2 Million instances
For more details, see http://mscoco.org/
- Computer Vision Methods For Coral Reef Assessment Across the world, coral reefs, with their delicate ecological balance, are suffering from the effects of climate change and pollution. In order to influence decision makers to take action, accurate large-scale monitoring systems need to be in place. With the proliferation of digital cameras and automatic acquisition systems, scientists around the world are already ...
- Visipedia Visipedia, short for “Visual Encyclopedia,” is a network of people and machines that is designed to harvest and organize visual information and make it accessible to anyone anywhere. Visipedia machines can learn from experts how to discover and classify animals, plants and objects in images. Communities of scientists and interested citizens may use Visipedia software ...
- GrOCR GrOCR is an ongoing research project for word recognition in unconstrained images. The name is derived from the original impetus of the project, OCR for reading text on products found in grocery stores. While the focus of the project has moved beyond just that domain, the name has remained the same.
More info on: http://vision.ucsd.edu/~kai/grocr/
- Perception of Reflectance We design and implement a comprehensive study of the perception of gloss. This is the largest study of its kind to date, and the first to use real material measurements. In addition, we develop a novel Multi-Dimensional Scaling (MDS) algorithm for analyzing pairwise comparisons. The data from the psychophysics study and the MDS algorithm is ...
- Facial Attractiveness and Relative Ranking Automatic evaluation of human facial attractiveness is a challenging problem that has received little attention from the computer vision community. Here, we approach beauty from a relative ranking perspective. Our training data are faces sorted based on an individual’s personal preference, which we use to learn how to rank novel faces according to that person’s ...
- Urban Tribes Image sharing via social networks has produced exciting opportunities for the computer vision community in areas including face, text, product and scene recognition. In this work we turn our attention to group photos of people at different social events. People can guess plenty of implicit information from the visual aspect of a group of people, ...
- Image-based Geolocalization Image-based Geo-localization is a relatively new and challenging problem in Computer Vision. It is simply defined as: given a photo, where was it taken? In this project, we are interested in: 1) localizing a ground-level image with an aerial imagery gallery by cross-view image matching 2) designing a human-in-the-loop system to learn and match the ...
- Cost-effective Perceptual Similarity Spaces Recently in machine learning, there has been a growing interest in collecting human similarity comparisons of the form “Is a more similar to b than to c?” Each answer provides a constraint, $d(a, b) < d(a, c)$, for some perceptual distance function $d$. In computer vision, perceptually-based embeddings can be constructed from many of these ...
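To make the triplet constraint $d(a, b) < d(a, c)$ concrete, here is a minimal sketch that counts how many human answers a given embedding agrees with. It is an illustration only, not the project's method: the Euclidean choice of $d$, the function names `euclidean` and `satisfied_fraction`, and the toy coordinates are all assumptions made for this example.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points (an assumed choice of d)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def satisfied_fraction(embedding, triplets, dist=euclidean):
    """Fraction of triplets (a, b, c) for which d(a, b) < d(a, c) holds.

    `embedding` maps an item name to its coordinates; each triplet encodes
    the answer "a is more similar to b than to c".
    """
    if not triplets:
        return 0.0
    hits = sum(
        1
        for a, b, c in triplets
        if dist(embedding[a], embedding[b]) < dist(embedding[a], embedding[c])
    )
    return hits / len(triplets)


# Hypothetical toy data: three items placed in a 2-D embedding.
embedding = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (3.0, 4.0)}

# Two answers that contradict each other, so only one can be satisfied.
triplets = [("a", "b", "c"), ("a", "c", "b")]

fraction = satisfied_fraction(embedding, triplets)
print(fraction)
```

A score like this is one simple way to compare candidate embeddings against collected comparisons; the higher the fraction, the more answers the embedding respects.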