Thursday, August 7, 2014

Text Classification on ICDAR

We evaluated our text detection algorithm on the ICDAR benchmark. More information can be found here: http://dag.cvc.uab.es/icdar2013competition

We ran our algorithm on Challenge 1, Task 2. Our text detection results can be found on this webpage:
https://dl.dropboxusercontent.com/u/20022261/reports/text_segmentation_benchmark_ICAR.html

The official baseline excludes characters that are merged, counting them as missed results. We do not make that distinction; instead, we compute precision and recall as:

Precision = tp / (tp + fp)
Recall = tp / (tp + mr)

where tp is the true positive count, fp is the false positive count, and mr is the number of missed detections. Under these definitions, the accuracy of the other algorithms is as follows:


              precision   recall   f-measure
  (others)    0.9252      0.9515   0.9384
              0.8566      0.9094   0.8830
              0.8544      0.8689   0.8617
              0.7708      0.9737   0.8722
              0.8248      0.9638   0.8943
  Ours        0.9083      0.8837   0.8960
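
For reference, here is a minimal Python sketch of how these metrics can be computed from raw counts; the function name and the example counts are illustrative and not taken from our actual evaluation:

def prf(tp, fp, mr):
    """Compute precision, recall, and f-measure from raw counts.

    tp: true positives, fp: false positives, mr: missed detections.
    """
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + mr) if tp + mr > 0 else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f_measure

# Example usage with made-up counts:
print(prf(tp=900, fp=90, mr=120))  # -> (0.909..., 0.882..., 0.895...)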


Format for reading the segmentation of a graphic design

For each image named filename.png, I will create a text file named filename.txt in the following format to store the detection information:

4
480 640
345  TL 71231 71232 71234 ...
509  TL 87643 87723 87724 ...
5675 GD 35443 35444 35445 ...
6578 IM 45064 45065 45066 ...

The first line indicates the number of elements. The second line indicates the height and the width of the image. Each subsequent line describes one element. Entries in each line are separated by spaces.

The first entry in each line indicates N, the number of pixels in that segment; the second entry indicates the type of the segment (TL: Text Line, GD: Graphic Design, IM: Natural Image). This is followed by N integers, each indicating the position of a pixel in the segment. These integers are 1-based, column-major linear indices of the pixels. For example, in a 640x480 (horizontal) image, the top left corner is 1, the bottom left corner is 480, the top right corner is 640*480-480+1=306721, and the bottom right corner is 640*480=307200.
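
A minimal Python sketch of a reader for this format follows; the function names are placeholders, and the code assumes the file is well formed as described above. It also shows how to convert the 1-based, column-major linear indices back to (row, column) coordinates:

def read_segmentation(path):
    """Parse a filename.txt segmentation file in the format described above."""
    with open(path) as f:
        lines = [line.split() for line in f if line.strip()]
    num_elements = int(lines[0][0])          # first line: number of elements
    height, width = map(int, lines[1])       # second line: image height and width
    segments = []
    for entry in lines[2:2 + num_elements]:
        n_pixels = int(entry[0])             # N, the number of pixels in the segment
        seg_type = entry[1]                  # TL, GD, or IM
        indices = [int(t) for t in entry[2:2 + n_pixels]]
        segments.append((seg_type, indices))
    return height, width, segments

def linear_to_row_col(index, height):
    """Convert a 1-based, column-major linear index to a 0-based (row, col) pair.

    With height 480: index 1 -> (0, 0) (top left), 480 -> (479, 0) (bottom left),
    306721 -> (0, 639) (top right), 307200 -> (479, 639) (bottom right).
    """
    zero_based = index - 1
    return zero_based % height, zero_based // height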

Meeting on August 6th, 2014

After seeing the text detection results, we discussed the following points during our meeting:

1- Aseem presented a fuzzy matching technique. In his approach, the elements of a graphic design are finalized only after a query is presented. He proposed a graph matching technique (with an MRF flavor) that performs a matching between the query elements and the graphic elements. Some graphic elements can be merged or discarded during this process.

2- We discussed how to wrap up the work and settled on the following TODO list.

TODO:
1- Finalize text detection and report it on the ICDAR dataset.
2- Finalize graphics annotation.
3- Perform training for graphics detection.
4- Run the algorithm on a larger dataset.
5- Write up a description of how our system works.
6- Write up the technical details of the algorithm for filing a patent.
7- Make a presentation and a poster for the last two days.
8- Clean up the code so that it is more readable.
9- Put the detection pieces back into the pipeline and make a final demo.
10- Output detections in an easily readable format that Aseem can later experiment with.