A String Based Method to Recognize Symbols and Structural Textures in Architectural Plans

Josep Lladós, Gemma Sánchez and Enric Martí
Computer Vision Center - Dep. Informàtica.
Universitat Autònoma de Barcelona, 08193 Bellaterra (Barcelona), Spain

E-mail: josep@cvc.uab.es, gemma@cvc.uab.es, enric@cvc.uab.es

Abstract:

One of the promising areas within the field of graphics recognition is the interpretation of maps and plans. An accurate vectorization constitutes a first approach to solve this goal. But vectorization only gives the segments constituting the document and their geometrical attributes. Interpreting a document requires an additional stage: understanding the document in terms of its structural elements. Usually maps and plans contain three types of elements: symbols, structural textures and dimensioning elements. This work focuses on two of these three entities: symbols and textures, proposing a structural method that is able to recognize both of them. This method has been applied to architectural drawings, either printed or hand drawn, consisting of floor plan sketches.

Symbols represent building elements in the plan such as doors, windows, furniture, etc. They have an a priori defined pattern and their recognition is performed by a matching against a set of models stored in a library. In contrast, textures do not represent particular entities but show regions with a particular meaning: tiled floors, level, solid regions, etc. Textures do not have a fixed pattern but are characterized by a repetitive element, called texel, and a structural rule. Interestingly, the problem of texture discrimination in map documents usually assumes a previously defined texture (hatched patterns in the most cases). In this work the only a priori assumed constraint is that texture texel is a polygon with a regular structure but its particular shape and structure will also be inferred in run time.

Since maps and plans are documents mainly composed by lines, attributed graphs are an efficient structure to represent them. In this work the input document is scanned and vectorized obtaining an attributed graph structure. The graph nodes represent the characteristic points of the document (junctions, end points or corner points), and the graph edges represent the lines between characteristic points. Starting from this structure a Region Adjacency Graph (RAG) is computed. The RAG nodes are the regions, i.e. minimal closed loops, of the former graph and the edges are the neigbouring relations between loops. A given graph region (RAG node) is represented by an attributed cyclic string containing the sequence of graph edges defining the region. Thus, the similarity between two regions can be computed in terms of the string edit distance between them. This is the basis of our algorithm. Notice that, since symbols as well as textures can be seen as a set of regions with a particular arrangement, the basis for the recognition is a matching between these regions and those stored in the library of patterns, for symbols, or a matching between neighbouring regions, for textures. The use of string matching techniques offers the advantage that, since a measure of similarity is given, the method can also be applied to disturbed documents such as hand drawn, unaccurately vectorized or insufficient-resolution scanned documents.

Symbols are recognized in terms of an error-tolerant subgraph isomorphism that computes the minimum cost edit sequence which transforms a model RAG, representing the symbol to recognize, to an input RAG, representing the document. This distance measure is considered to be the weighted sum of the costs of edit operations (insertion, deletion and substitution of RAG nodes and edges) to transform one graph to the other. These operations are defined for our RAGs. Notice that, since RAG nodes are represented as strings, the RAG edit operations must be defined in terms of string edit distance. The matching algorithm is a state-space search algorithm based on an algorithm. It expands a search tree such that each state in the tree represents a partial match from a subset of model regions to a subset of input regions. The generation of successor states is guided by the cost of edit operations. That is, the state with the minimum cost is expanded at each step by mapping a new model region to every input region not yet used in this partial match. In this state expansion only the neighbouring regions of matched regions are considered as candidates.

To discriminate textured areas, a hierarchical clustering is computed for the regions of the RAG representing the input document. Initially every region is a cluster. A random weight is associated to each cluster, and the clusters with the highest weight are marked as survivors. Each non-survivor cluster is associated to the more similar neighbouring survivor under a given threshold value. This similarity measure between regions is defined in terms of a weighted sum of the string edit distance between their boundary strings, as stated above, and the difference of their areas. The same method can be used to discriminate textures consisting of isolated elements regularly placed. In this case, a Voronoi tessellation of the input graph must be previously computed. Afterwards, a RAG is also obtained where regions represent Voronoi polygons. Finally, the same clustering procedure stated above is applied to this RAG representing the Voronoi tessellation and, thus, isolated element based textures discriminated.

In summary, an interpretation method for architectural floor plans is proposed. The input line document is represented in terms of its closed loop structure by an RAG. Attributed strings are used to represent the boundaries, and thus the shape, of the region (RAG nodes) boundaries. Using this representation, the similarity between two regions can be computed in terms of string matching procedures. This is the basis to develop two recognition goals within an input document: building elements, by means of graph matching methods, and structural textures, by means of clustering methods. The use of attributed strings to represent the regions of the input document offers two main advantages: the algorithms developed for symbol recognition and texture discrimination have a common basis, and the method also works for disturbed documents because string edit distance is an error-tolerant measure.

Key words: string matching, graph matching, structural textures, hierarchical clustering, plan documents.