COCO-Text: Dataset for Text Detection and Recognition

The COCO-Text V2 dataset is out. Check out our brand new website!

Check out the ICDAR2017 Robust Reading Challenge on COCO-Text

COCO-Text is a new large-scale dataset for text detection and recognition in natural images.
Version 1.3 of the dataset is out!
63,686 images, 145,859 text instances, 3 fine-grained text attributes.
This dataset is based on the MSCOCO dataset.

  • Text localizations as bounding boxes
  • Text transcriptions for legible text
  • Multiple text instances per image
  • More than 63,000 images
  • More than 145,000 text instances
  • Text instances categorized into machine printed and handwritten text
  • Text instances categorized into legible and illegible text
  • Text instances categorized into English script and non-English script



COCO-Text annotations 2017 v1.4
63,686 images, 145,859 text instances (training: 43,686 images / 118,309 text instances; validation: 10,000 images / 27,550 text instances; test: 10,000 images / no public annotations)

By downloading the annotations, you agree to our Terms of Use.


COCO-Text tools 2016 v1.3
Data API v1.1, Evaluation API v1.3 (update: make sure to use Evaluation API version 1.3)

Also download the images, COCO tools, and object annotations from the MSCOCO website.

Terms of Use

The annotations in this dataset belong to the SE(3) Computer Vision Group at Cornell Tech and are licensed under a Creative Commons Attribution 4.0 License.



COCO-Text Explorer

Su, Philip

Cornell University CS Department MEng Report, 2016.

COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images

Veit, Andreas; Matera, Tomas; Neumann, Lukas; Matas, Jiri; Belongie, Serge

arXiv preprint arXiv:1601.07140, 2016.

Annotations and API

For every image, we annotate each text region with an enclosing bounding box. For legible text we aim for one bounding box per word, i.e. an uninterrupted sequence of characters separated by a space. For illegible text we aim for one bounding box per continuous text region, e.g. a sheet of paper. For the details of our crowdsourced annotation procedure, please see the COCO-Text paper.

Please first download the annotations as well as the API.


The COCO-Text API assists in loading and parsing the annotations in COCO-Text. For details, see the coco_text_Demo IPython notebook.

getAnnIds             Get ann ids that satisfy given filter conditions
getImgIds             Get img ids that satisfy given filter conditions
loadAnns               Load anns with the specified ids.
loadImgs               Load imgs with the specified ids.
loadRes                 Load algorithm results and create API for accessing them.

The annotations are stored using the JSON file format. The annotations format has the following data structure:

{
"info"            :   info,
"imgs"            :   [image],
"anns"            :   [annotation]
}

info {
"version"         :   str,
"description"     :   str,
"author"          :   str,
"url"             :   str,
"date_created"    :   datetime
}

image {
"id"              :   int,
"file_name"       :   str,
"width"           :   int,
"height"          :   int,
"set"             :   str      # 'train' or 'val'
}
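As a rough illustration of the structure above, the following Python sketch parses a tiny stand-in for an annotation file and groups image ids by their "set" field. The sample values and the helper name `image_ids_by_set` are made up for illustration. The sketch also assumes that "imgs" is a list of image structs, as written above; if the downloaded file stores images in a different container (for example a mapping keyed by id), the loop would need adjusting.

```python
import json
from collections import defaultdict

# Hypothetical miniature annotation file following the documented layout.
# All values are placeholders, not real COCO-Text data.
SAMPLE_JSON = """
{
  "info": {"version": "example", "description": "toy sample", "author": "n/a",
           "url": "n/a", "date_created": "n/a"},
  "imgs": [
    {"id": 1, "file_name": "a.jpg", "width": 640, "height": 480, "set": "train"},
    {"id": 2, "file_name": "b.jpg", "width": 500, "height": 375, "set": "val"},
    {"id": 3, "file_name": "c.jpg", "width": 640, "height": 427, "set": "train"}
  ],
  "anns": []
}
"""


def image_ids_by_set(dataset):
    """Group image ids by the value of each image's "set" field."""
    groups = defaultdict(list)
    for img in dataset["imgs"]:
        groups[img["set"]].append(img["id"])
    return dict(groups)


dataset = json.loads(SAMPLE_JSON)
splits = image_ids_by_set(dataset)
print(splits)
```

For a real file, `json.loads(SAMPLE_JSON)` would be replaced by `json.load` on the opened annotation file.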

Each text instance annotation contains a series of fields, including an enclosing bounding box, category annotations, and transcription.

annotation {
"id"              :   int,
"image_id"        :   int,
"class"           :   str     # 'machine printed' or 'handwritten' or 'others'
"legibility"      :   str     # 'legible' or 'illegible'
"language"        :   str     # 'english' or 'not english' or 'na'
"area"            :   float,
"bbox"            :   [x,y,width,height],
"utf8_string"     :   str,
"polygon"         :   []
}
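To make the annotation fields concrete, here is a minimal Python sketch that selects annotations by their category fields without using the COCO-Text API. The function `filter_annotations` and the sample records are hypothetical; the field names and category values are taken from the listing above. The filter is passed as a dict rather than as keyword arguments because "class" is a reserved word in Python.

```python
# Hypothetical annotation records using the documented field names.
SAMPLE_ANNS = [
    {"id": 10, "image_id": 1, "class": "machine printed", "legibility": "legible",
     "language": "english", "area": 1200.0, "bbox": [10.0, 20.0, 60.0, 20.0],
     "utf8_string": "STOP", "polygon": []},
    {"id": 11, "image_id": 1, "class": "handwritten", "legibility": "illegible",
     "language": "na", "area": 400.0, "bbox": [100.0, 50.0, 40.0, 10.0],
     "utf8_string": "", "polygon": []},
    {"id": 12, "image_id": 2, "class": "machine printed", "legibility": "legible",
     "language": "not english", "area": 900.0, "bbox": [5.0, 5.0, 30.0, 30.0],
     "utf8_string": "café", "polygon": []},
]


def filter_annotations(anns, required, image_id=None):
    """Return ids of annotations whose fields equal every entry in `required`.

    If `image_id` is given, only annotations of that image are considered.
    """
    selected = []
    for ann in anns:
        if image_id is not None and ann["image_id"] != image_id:
            continue
        if all(ann.get(key) == value for key, value in required.items()):
            selected.append(ann["id"])
    return selected


legible_printed = filter_annotations(
    SAMPLE_ANNS, {"class": "machine printed", "legibility": "legible"}
)
legible_in_image_1 = filter_annotations(
    SAMPLE_ANNS, {"legibility": "legible"}, image_id=1
)
print(legible_printed, legible_in_image_1)
```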

COCO-Text Evaluation API

The COCO-Text Evaluation API assists in computing localization and end-to-end recognition scores with COCO-Text. For details, see the coco_text_Demo IPython notebook.
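Localization scoring generally involves comparing a predicted box with a ground-truth box. A common overlap measure for this is intersection over union (IoU). The sketch below computes it for boxes in the [x,y,width,height] format used by the annotations. This page does not state which matching criterion or threshold the Evaluation API applies, so the function `iou_xywh` is an illustrative assumption and not a description of the API's internals.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


ground_truth = [10.0, 10.0, 100.0, 50.0]
detection = [20.0, 10.0, 100.0, 50.0]   # same size, shifted 10 px to the right
overlap = iou_xywh(ground_truth, detection)
print(round(overlap, 4))
```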

The results format mimics the format of the ground truth as described above. Each result produced by an algorithm is stored in its own struct. The results struct must contain the id of the image for which it was generated. A single image can have multiple results associated with it. All results for the whole dataset are aggregated in a list of result structs. The entire list of structs is stored as a single JSON file.

result {
"image_id"        :   int,
"utf8_string"     :   str,
"bbox"            :   [x,y,width,height],
"score"           :   float
}
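A results file in this format can be produced with the standard `json` module. The sketch below assumes a detector that outputs corner coordinates (x1, y1, x2, y2), converts them to [x,y,width,height], and writes the whole list as one JSON file. The detections, the helper `make_result`, and the output file name are all hypothetical.

```python
import json
import os
import tempfile


def make_result(image_id, text, corners, score):
    """Build one result struct from a corner-format box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = corners
    return {
        "image_id": int(image_id),
        "utf8_string": str(text),
        "bbox": [x1, y1, x2 - x1, y2 - y1],  # [x, y, width, height]
        "score": float(score),
    }


# Made-up detections: (image id, transcription, corner box, confidence).
detections = [
    (1, "STOP", (10, 20, 70, 40), 0.93),
    (1, "exit", (100, 50, 180, 80), 0.41),
    (2, "café", (5, 5, 45, 25), 0.77),
]
results = [make_result(*d) for d in detections]

# All results for the dataset go into a single JSON list.
out_path = os.path.join(tempfile.gettempdir(), "coco_text_results_example.json")
with open(out_path, "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False)

# Read the file back to confirm it round-trips.
with open(out_path, encoding="utf-8") as f:
    reloaded = json.load(f)
print(len(reloaded), reloaded[0])
```

Writing with `ensure_ascii=False` and UTF-8 encoding keeps non-ASCII transcriptions readable in the file; the default escaped form would parse to the same strings.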