A Protocol for Performance Evaluation of Algorithms for Text Segmentation from Graphics-Rich Documents

Liu Wenyin and Dov Dori

Faculty of Industrial Engineering and Management, Technion – Israel Institute of Technology, Haifa, Israel

liuwy@ie.technion.ac.il, dori@ie.technion.ac.il

We propose an objective, comprehensive, and complexity-independent metric for evaluating the performance of text segmentation algorithms. The metric comprises a positive set and a negative set of indices at both the character level and the character-string (text) level. It evaluates how accurately an algorithm detects the location, width, height, orientation, skew, string length, and fragmentation of both characters and strings. Each ground truth character is assigned a Segmentation Difficulty (SD) value, and the performance indices are normalized with respect to the character SD, which makes them independent of the ground truth complexity. The evaluation thus provides an overall, objective, and comprehensive measure of the text segmentation capability of algorithms designed for this task.

Keywords: Performance Evaluation, Text Segmentation, Segmentation Difficulty, Document Analysis and Recognition
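
As a rough illustration of what normalizing an index by Segmentation Difficulty could look like, the sketch below computes an SD-weighted average of per-character detection quality. The abstract does not give the formula, so everything here is an assumption made for illustration: the names `CharResult` and `sd_normalized_index`, the `quality` score in [0, 1], and the weighted-average form itself. It should not be read as the protocol's actual definition.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CharResult:
    """One ground truth character (hypothetical structure, not from the paper)."""
    sd: float       # assumed Segmentation Difficulty weight, must be positive
    quality: float  # assumed detection quality of this character, in [0, 1]


def sd_normalized_index(results: Sequence[CharResult]) -> float:
    """SD-weighted mean of detection quality (an assumed form of normalization).

    Dividing by the total SD keeps the index in [0, 1] regardless of how many
    characters the ground truth contains or how difficult they are.
    """
    total_sd = sum(r.sd for r in results)
    if total_sd <= 0:
        raise ValueError("total Segmentation Difficulty must be positive")
    return sum(r.sd * r.quality for r in results) / total_sd


# Usage: an easy character detected perfectly, a hard one detected half as well.
example = [CharResult(sd=1.0, quality=1.0), CharResult(sd=3.0, quality=0.5)]
index = sd_normalized_index(example)
```

Under this assumed form, harder characters weigh more in the index, and an algorithm that detects every character perfectly scores 1 on any ground truth, which is one way to read the claim of independence from ground truth complexity.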


GREC'97 program