We have a user interface for providing per-pixel labels for text regions. We have annotated 107 images so far. Both our annotation and the text detection output group text regions into text lines.
The next step is to come up with a measure for text detection accuracy. We discussed a few measures:
- Ignore the text group labels to get a zero-one map indicating text pixels. Compute the intersection area, false-positive area, and missed area.
- Normalize these areas by text size, so that large text regions do not dominate the score.
- In a graph framework, measure similarity over the connected-component graph rather than over pixels: compare which edges are on and which are off in the two graphs.
- To take the text group labels into account as well, perform a bipartite matching between text regions and score each group against its matched group using one of the above measures.
- The Rand index (http://en.wikipedia.org/wiki/Rand_index) combines aspects of the above techniques.
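As a rough sketch of the first measure, assuming the annotation and the detector output are available as binary masks of the same shape (the array representation is an assumption, not part of our pipeline yet):

```python
import numpy as np

def pixel_overlap(gt, pred):
    """Compare two binary text masks pixel-wise.

    gt, pred: arrays of the same shape, nonzero = text pixel.
    Returns (intersection, false_positive, missed) pixel counts.
    """
    gt = np.asarray(gt, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    intersection = np.logical_and(gt, pred).sum()   # detected text that is real text
    false_positive = np.logical_and(~gt, pred).sum()  # detected, but not annotated
    missed = np.logical_and(gt, ~pred).sum()          # annotated, but not detected
    return int(intersection), int(false_positive), int(missed)
```

The three counts can then be turned into precision/recall-style scores, or normalized by text size as suggested above.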
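The group-level matching could look something like the brute-force sketch below, where `sim[i][j]` is the similarity of ground-truth region `i` and detected region `j` under one of the pixel measures (the names and the exhaustive search are illustrative; for realistic region counts a Hungarian-algorithm implementation such as `scipy.optimize.linear_sum_assignment` would replace the permutation loop):

```python
from itertools import permutations

def match_regions(sim):
    """Exhaustive maximum-weight bipartite matching for small inputs.

    sim: n x m similarity matrix, assuming n <= m.
    Returns (assignment, total_score) where assignment[i] is the
    detected region matched to ground-truth region i.
    """
    n, m = len(sim), len(sim[0])
    best_score, best = float("-inf"), None
    for perm in permutations(range(m), n):  # try every assignment
        score = sum(sim[i][perm[i]] for i in range(n))
        if score > best_score:
            best_score, best = score, perm
    return best, best_score
```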
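For reference, the plain Rand index between two labelings counts, over all pixel pairs, how often the labelings agree on same-group versus different-group membership. A minimal (quadratic, illustration-only) version:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand index between two labelings of the same pixels.

    labels_a, labels_b: equal-length sequences of group labels.
    Returns the fraction of pixel pairs on which the labelings agree
    about being in the same group or in different groups.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```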
We also discussed strategies for grouping connected components:
- We will focus on graph partitioning for now, even though allowing overlapping regions may later improve performance. For graph partitioning we have a number of techniques:
- One could merge components in a pairwise fashion.
- One could think of a RANSAC-style technique that scores text proposals; those proposals could potentially overlap.
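The pairwise-merging idea could be sketched with a union-find structure, where the `compatible` predicate is a hypothetical stand-in for whatever pairwise cue we end up using (e.g. similar component height and small horizontal gap):

```python
def group_components(components, compatible):
    """Greedily group connected components via pairwise merging.

    components: list of component descriptors.
    compatible(a, b): hypothetical predicate, True if two components
    likely belong to the same text line.
    Returns a list of groups, each a list of indices into components.
    """
    parent = list(range(len(components)))

    def find(i):  # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(components)):
        for j in range(i + 1, len(components)):
            if compatible(components[i], components[j]):
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(components)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

A toy usage, with components as `(x, height)` pairs and compatibility defined as being horizontally close with similar height: `group_components([(0, 10), (12, 10), (100, 10)], lambda a, b: abs(a[0] - b[0]) < 20 and abs(a[1] - b[1]) <= 2)` groups the first two components together and leaves the third alone.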