HOUGH TRANSFORM APPLICATION TO DIGITIZE RECTANGULAR SPATIAL OBJECTS ON AEROSPACE IMAGERY

Miroshnichenko S.Yu., Titov V.S., Dremov E.N., Mosin S.A. Hough Transform Application to Digitize Rectangular Spatial Objects on Aerospace Imagery. Abstract. The paper describes the method of development for the remote sensing data processing to speed up the digitizing workflow. The method is designed to digitize rectangular objects using their approximate spatial positions and provides an automatic estimation of the orientation and aspect ratio. The paper contains a formal statement of the problem of digitizing an object with the desired geometric shape using it’s apriori known spatial position on a source image. The method creates polygonal representations of rectangular spatial objects from one or a few reference points set by an operator. It is based on source image’s pixels clustering using spectral bands as a feature space. The following Hough transform incorporates local direction of intensity gradient to estimate object’s orientation and reduce computational complexity together with low-pass filtering within an accumulation process to improve robustness. It is shown that the developed method can be modified to digitize objects of any analytically described shape. The method is designed to allow easy user interaction without any significant delays and to provide transparent and predictable control of an output object’s polygon size. To investigate the developed method a test dataset with more than 700 rectangular objects was used. The root-mean-square error of object’s points positioning, mean rotation error in polar coordinates and a Jaccard index were used to measure a precision of the digitized objects. The experiment results demonstrate that digitizing workflow is accelerated by 25–40% using the software implementing the developed method without a significant precision loss.


Introduction. Current stage of Earth remote sensing systems' (RSS)
development is marked with the continuous growth of their quantity.Modern RSS sensors gain better spatial resolution and a wider range of spectral channels.Together these facts turn to a rapidly growing stream of remote sensing data (RSD) that should be processed and georeferenced to have sufficient precision for further use [1,2].RSD processing workflow also should be accelerated to prevent airborne and space imagery lose their actuality.Digital maps are one of the RSD secondary processing main products.Operators create and update digital maps using interactive software instruments (interactive method) that do not imply any assistance to recognize objects or detect their boundaries.An operator digitizes image by finding and recognizing objects using his skills and experience.He defines each object's shape, spatial position, orientation, type and then sequentially places vertexes of a polygon that bounds the object from others and an underlying surface.An operator's actions within the confines of digitizing workflow are guided by editorial and technical requirements that change due to spatial scale and a resulting map's field of application.These requirements regulate object's level of detail, acceptable shape variants, a minimum dis-tance between neighboring objects, etc. for each thematic layer the map contains.A digitizing workflow is laborious; the precision of the result directly depends on operator's skills, experience, concentration and gradually reduces during work time due to his tiredness caused by monotony and diligence of performed actions.
There are some software products for automated RSD processing that provide wizards to create individual rules and step-by-step sequences for spatial objects digitizing and recognition [3][4][5][6].These products are based on object-oriented detection [7] and are aimed to accelerate digital maps creation workflow, but, in practice, in spite of declared features their range of application comes down to greenery and hydrography objects digitizing and calculative and estimative problems solution.The limited range of applications results from the fact that automated RSD processing software has a limited ability to follow editorial and technical requirements and to transform objects' polygonal representations to a demanded geometric shape.
Figure 1 shows the result of space image automated digitizing comparing to the interactive method.Source image on Figure 1a is separated to clusters depending on contour and texture features (Figure 1b) and used further to extract the buildings and structures subset (Figure 1d) with kNN method [8]. Figure 1c displays ground truth polygons of the same objects that were created in the interactive mode according to common editorial and technical requirements.Figure 1c displays ground truth polygons of the same objects created in the interactive mode according to common editorial and technical requirements.The automated result is an object's ''draft'' (Figure 1e shows a difference from the interactively created ground truth polygons) rather than a digital map's item and is only suitable to estimate the number of objects of the desired type, their spatial positions, occupied area, etc.For now, Jaccard index is a de facto standard quantitative measure of RSD automated recognition.The results of automated digitizing require per-object revision and correction to satisfy the minimum precision demands and lead to even greater operator's efforts than a map's interactive creation from scratch.
Results of investigations and competitions devoted to automated RSD processing demonstrate domination of approach based on deep learning artificial neural networks.Modern neural network architectures combined with GPGPU computing technologies allow to find practical solutions of the following space imagery recognition problems: to detect objects of a priori specified type (for instance roads in [10] and vehicles in [11][12]) or several classes (up to 10 in [9,[13][14][15][16]), to find and count certain kinds of animals [17][18].However, we should mention that all developments in the field of deep learning neural networks are in a stage of experimental investigation far from being incorporated into commercial software products and for now are available only for experts in machine learning.Another approach to decrease time costs for RSD processing lies in the development of automated software tools designed to be easily integrated into existing digitizing workflow and help users to perform the most laborious and tedious operations.Also, these tools can transform objects' polygons to the desired shape according to editorial and technical requirements.
ERDAS IMAGINE software contains the ''Region grow'' tool that in combination with a set of vector cleanup operations provide a usercontrolled automated digitizing [19].However, this tool has significant computational complexity (a created polygon appears through at least a couple of seconds after the user action).A vector cleanup process performs after the digitizing was finished for the whole layer and requires more operator's efforts to control output precision compared to the object-to-object approach.Feature Analyst contains a toolset for the assisted extraction that allows a user to digitize building by merely dragging a rectangle that encompasses it [20].The shortcoming of this tool is that the object must be oriented strictly along the screen axes.Our paper focuses on the RSD digitizing process speedup by development and investigation of a new method designed to create rectangular polygonal representations with automatic estimation of orientation and aspect ratio based on the corresponding objects' approximate position.
2. Problem formalization.The essence of the digitizing process is to create a bounding polygon v for each recognized object on the source image I .The bounding polygonal representation (further  polygon) is the polygon separating the object from others and an underlying surface.A priori information referring to the object is obtained by a preliminary recognition by an operator and includes the following: -set of reference points (Figure 2a) representing an approximate spatial position of the object to be described by the polygon v ; -object's shape s that guides a selection of the corresponding digitizing method.
Hence, automated digitizing the object with the rectangular shape sq s is described by:   , , where t  is a threshold value regulating a size of a produced rectangular polygon.Source image I contains K spectral bands, and each is described by a discrete brightness field with linear sizes x M N : where    f  is a vector of brightness values in the pixel   , x y .Reference points of R are set in the source image local coordinate system and can be readily transformed into geodetic or projected coordinates using appropriate matrix [1]: Object polygon v (Figure 2c) is described by a closed sequence of points in the source image local coordinate system: where v N  is a number of points in v , which for a rectangular object equals four.
To get a ground truth digital map, we have to make a parallel projection of every terrain object to an underlying surface with a normal vector n  .
For a perspective image (created with a sensor whose optical axis is set by a vector o  having a non-zero angle with n  ) without a digital surface model (DSM) of sufficient spatial resolution, we encounter coordinate distor-tions shown at Figure 3a.For that reason, the paper considers only orthorectified images which have measurement properties [1] and can be used to create digital maps directly (Figure 3b).Polygon ( 4) of a rectangular object contains four intersection points of four pairwise parallel and pairwise perpendicular straight lines (Figure 4).Coordinates of intersection points are calculated by a solution of the following system of linear equations: x y r r x y r r x y r r x y r r (5)

Developed method. The following requirements guided the method development process:
 the method has to implement functional (1) with the input parameters described by ( 2), (3) and output has the form of (4);  software tool based on the developed method should allow comfortable interaction without any significant (clearly visible by an operator) delays.Hence a time of a single execution of (1) should be less than 500 мs on a workstation with Intel Core i3 processor. the method should provide transparent and predictable control of an output polygon size using a threshold: a higher t value corresponds to a greater area of a produced polygon.
Choice of Hough transform [21] to implement a rectangular objects' digitizing is explained by fusion of its ability to detect straight lines forming an output rectangles' sides with acceptable computational complexity providing an interactive operation mode for a developing software tool.However, a classic Hough transform based straight lines detector combined with edge detection [22] as an object boundaries extractor do not grant to an operator a transparent model to control sizes of output polygon (4) using a threshold value t .
To satisfy the mentioned above requirements, we have chosen a twostage scheme consisting of the following sequentially performed functions: 1.The function   c f  creates a cluster c representing digitized ob- ject's boundaries visible on a source image I (Figure 2b).A function output depends on a reference points set R and a threshold value t : 2. The function   sq f  converts a previously obtained cluster c to a rectangular polygon v using a Hough transform (figure 2c): We have knowingly divided the functional (1) into functions ( 6) and ( 7), besides the advantages described above, to gain an ability to convert a detected on an image cluster to any analytically described shape by merely replacing function (7) with the desired type of Hough transform.

Object detection.
A starting point of function ( 6) is a reflection of a reference points set R into a set   0 С of initial cluster values (Figure 5a): We used maximum brightness difference to incorporate all K spectral bands of the image (2) as a feature space to calculate distance   d  : Further feature space extension by edges, textures or spectrums on the one hand increases object detection precision but on the other hand  complicates distance function (10) and requires preliminary computations to obtain the mentioned features that in total do not comply with the interactivity requirement.
Iterations number of cluster-growing function (9) doesn't have predefined limit.Growing process stops in case no new image pixels can be added to any cluster within a current iteration (Figure 5с): The final operation of function ( 6) is a merge   cm f  of clusters, containing in a set С , into a resulting cluster c (Figure 5d).Clusters pair , i n c c merging requires them to have joint or adjacent pixels: If a resulting set С after the merge ( 12) contains more then one item (i.e., the threshold value is not enough to cover feature space distances (10) between all reference points to produce a single cluster), there are three options of action: 1. Accept a cluster of С with the maximum area as a result of the function ( 6):   , , arg max 2. Let the result of ( 6) contain all produced clusters and transform each of them into a rectangular polygon using (7).
3. Continue an iterative execution of the function ( 9) and increase threshold value t every time the growing process stops due to condition (11) fulfill and a cluster set   m С contains more than one item.The developed method uses the first option because it allows an operator to concentrate on single object detection with sufficient visual control of the workflow together with the least computational complexity.

Object transformation to a rectangular polygon.
To construct rectangular polygon (7), we have to find a solution of four linear equations systems (5), which, in turn, requires to obtain parameters For interactivity reasons, an accumulator plane   h  is filled only by pixels   , x y situated on the external boundary of the cluster c : O p , that do not meet the interactivity requirement.
To obtain a linear asymptotic function, we reduced the number of accumulator plane   h  dimensions by calculating the parameter  directly from a source image as a mode

 
Mo  of gradient vector angle  [22,23] near the cluster's external boundary (15) within a range of   Now when we have a fixed parameter  any pixel   where   G w   is a Gauss function [24], applied as a low-pass filter to improve Hough transform robustness, and, to be more specific, to reduce an adverse effect of image's discreteness and  parameter calculation error on an output polygon precision.
Figure 6a illustrates a direct accumulation of r values similarly to (14) that leads to a 4 r parameter calculation error and the following incorrect position of one of the building's walls.Figure 6b shows an accumulation of low-pass blurred values ( 17) that provides more robust boundaries detection of a digitized object. Parameters , , , , r r r r  obtained through ( 16) and ( 18) are subsequently used to find a solution of linear equation system (5) and to create a resulting polygon in a form (4).

Method implementation.
The implementation of our method has the following features and particular qualities to comply with requirements to interactivity and control predictability described in Section 3: 1. Cluster creation (6) uses only a part of the source image that is visible to an operator and has a screen zoom level.Thus sizes of a processed image part never exceed a monitor resolution that at the moment in most cases is equal to FullHD.A preliminary pyramid computation [25] provides a fast image overviewing for zoom levels that is less the original.If a working area has a zoom level more then 100%, a source image will not be scaled for processing as standard scaling algorithms would supply no additional data.
2. Only K available spectral bands of the source image I form a feature space to distinguish an object from its neighbors and the underlying surface.A cluster stores current mean brightness values for (10) as its attributes and updates them on adding new pixels.
3. Mode of gradient vector's angle ( 16) is computed by Sobel operator combining a sufficient edge detection precision with a low computational cost compared to other operators and Gaussian derivatives [26,27].To further maximize performance a module and an angle of gradient vector are obtained only in the neighborhood (15) of current cluster's external boundary.A histogram with a step equal to 1 deg is applied to select a mode value (we chose the step value empirically).Histogram accumulates pairs ''gradient angle''→''gradient module'' with the following selection of an angle value coheres with the maximum module sum.
4. To reduce an operator's need to interact with a visual interface of software tool implementing our method we transferred a threshold value control to a wheel of a mouse manipulator.Figure 7 illustrates a threshold value effect on a resulting cluster.A further increase of the value over shown on Figure 7d is redundant and leads to cluster growth beyond object's visible boundaries.After one polygon was created, a current threshold value can be applied to digitize another similar object without adjusting to further reduce the number of an operator's actions.
5. During the digitizing, the external boundary is displayed together with corresponding object's polygon as shown on Figure 8 to improve visual perception of the process by an operator and to enhance the comfort of his work.
6.The reference points' setup process must be straightforward and transparent for an operator.It is recommended to set a point for every homogeneous region of an object starting from a larger one or a central one (in case regions sizes are close).After a starting polygon has appeared operator chooses one of the following options: -increases threshold value in case the polygon doesn't cover the whole object, and there is no clear border between its covered and uncovered regions; -places addition reference points in case there is a clear border between the polygon and uncovered areas (mostly one point per region); -decreases threshold value in case the polygon covers some regions that do not belong to a digitizing object.
An operator performs the described actions until the polygon's position and aspect ratio would match a digitizing object.For instance, the building with a span-roof contains two regions due to different sunlight reflection angles of its parts (the left object on Figure 8).To digitize that object an operator should set two reference points to each side of the roof.Figure 8 displays samples of different objects' digitizing with the developed method.We set up to three reference points to detect them: -one point per sample for buildings on Figure 8b, central and right on figure 8d and the larger one on Figure 8a; -two points per sample were used for instances on Figure 8e, 8f, the left on Figure 8d, and the smaller one on Figure 8a; -three points were required to digitize just the object on Figure 8c.The developed method, on the one hand, digitizes objects that consist of several visually distinctive parts (Figure 8c, 8e, 8f) and on the other hand, correctly detects objects that are fractionally hidden or shaded by others with compensation of minor clustering errors.
4. Experiment and results.Within the experiment, we compared an output map precision and time consumption to create it using software based on the developed method and one of the most popular interactive digitizing toolsets of GIS MapInfo [28].Dataset is represented by urban imagery with a spatial resolution of 0.5 m/pixel where mostly comprises multi-apartment residential development and industrial zones, and was used to prepare a ground truth digital map containing 777 rectangular buildings and structures (38% from all objects on the selected territory according to OpenStreetMap data [29]).Figure 10a shows a histogram of objects' sizes on this map.
We engaged two experts to carry out the experiment whose aim was to create a map similar to the ground truth using alternately the interactive toolset and the software based on the developed method.During the process experts did not have to spend time finding objects to digitize themselvesthey used the specific markup layer made from the ground truth map that indicates the desired buildings by crosses as shown on Figure 9.

Precision criterions.
Quantitative measure of precision for digital maps, produced by experts, is represented by the following criterions: 1. Root-mean-square error (RMSE) of object's points positioning which is expressed in pixels and equals to a Euclidian distance to the nearest point of the corresponding ground truth object.
2. Mean rotation error (MRE) expressed in degrees is measured as a minimum angle between straight lines forming the current object and the corresponding ground truth one.
3. Jaccard index (is dimensionless, expressed in percentages) represents the ratio of intersection area of a created object and a corresponding ground truth one to their union's area.
RMSE describes an absolute object positioning error comprising four types of distortions: aspect ratio mismatch, spatial shift, scaling and rotation errors.The last distortion is quantitatively expressed by MRE which was separated from RMSE to estimate a substitution precision of gradient angle mode (16) instead of the rotation angle.The algorithm of the created digital map precision estimation contains the following steps: 1. Find a corresponding object on the ground truth map for each one on the created map by a criterion of maximum intersection area (has to be nonzero).
2. If the ground truth object was found  calculate values of RMSE, MRE, and Jaccard index and write them to the created object's semantic fields to have the ability to analyze the results visually.
3. Calculate mean values of precision criterions for the created map in a whole.

Experiment results.
In total experts created four digital maps within the experiment (two -with the interactive toolset, and two  with the developed method).Table 1 demonstrates mean RMSE, MRE, and Jaccard index values together with digitizing workflow duration.Duration of the digitizing workflow decreased by 39.9% and 26.3% for the first and the second expert respectively.Error values of the digi-tal map created by the second expert using the developed method although are slightly greater comparing to the interactive one, but still are within the range of acceptable values.The first expert gained a more significant time economy but exceeded maximum RMSE and MRE values that for our experiment are set to 2 pixels (mean error of object's border visual detection in an original image scale without zooming) and 1 degree (determined by the a quantization of histogram used to calculate a gradient angle mode).Both experts used two reference points per object on the average.Within the experiment, we also investigated dependencies of RMSE (Figure 10b), MRE (Figure 10c) and Jaccard index (Figure 10d) from a polygon's size to reveal features of the developed method.
RMSE has a direct dependency from a polygon's size.For objects smaller then 15 pixels RMSE value is 25% below the mean both for the interactive method and for the developed one.Similarly, RMSE value is 25% above the mean for objects with sizes higher than 39 pixels because operators subjectively pay less attention to a precision of a more large object.A higher spread of RMSE values for these objects is explained by their less quantity (not enough to get a smooth graph).
A discreetness of the source image explains an MRE reverse dependency from a polygon's size.The positioning error of 1 pixel for an object with the size of 10 pixels causes a rotation error of 6.3 degrees.In contrast, the same positioning error for an object with the size of 50 pixels causes a rotation error of 1.3 degrees.Higher MRE values of the developed method for objects with sizes above 18 pixels appear due to quantization of histogram used to calculate a gradient angle mode (16).Mean Jaccard index value directly depends on polygon's size similar to MRSE that is also caused by a source image's discreetness.A higher error of the interactive method is specified by a capability to reach a sub-pixel precision by digitizing objects on a source image zoomed above the original scale (an operator magnifies the image in 2-5 times and manually sets points of an output polygon between vertexes of image's discrete grid using his skills and experience).

Conclusion and discussion.
We developed a new method for remote sensing data processing designed to digitize rectangular objects using approximate spatial positions providing automatic orientation and aspect ratio estimation.The experimental investigation of the method has shown workflow acceleration by 2540% facing a slight precision reduction.In a shortcomings list, we should mention 20% higher MRE value comparing to the interactive method and an impossibility to reach a sub-pixel precision of a resulting polygon's points positioning.

Fig. 1 .
Fig. 1.The result of space image's (a) automated buildings digitizing (b, d) comparing to the interactive one (c) using Jaccard Index, visually represented on (e)

Fig. 2 .
Fig. 2. Automated digitizing process: reference points (а) are used to detect corresponding objects (b) and then to transform them to a required (rectangular in our case) shape (c)

Fig. 3 .
Fig. 3. Creating a polygon from a perspective image without a DSM (a) and from an orthorectified image (b)

Fig. 4 .
Fig. 4. Polygon of a rectangular object contains four intersection points of four pairwise parallel and pairwise perpendicular straight lines

Fig. 5 .
Fig. 5.An illustration of a reference points reflection into the set of initial clusters (a), their growth (b shows the result of 16th iteration, c -is the final 62nd iteration) and merging to the resulting cluster (d)Next, an iterative cluster-growing function an object represented by a previously detected cluster c .As said before the mathematical basis for function(7)  implementation is a Hough transform that converts source image's I coordinate field to an accumulator plane   h  whose quantized dimensions represent parameters , r  of a straight line equation in Hesse normal form.Each pixel   , x y appears as a curve in a plane   h  by calculating the parameter r from all values n  of the independent parameter  within a range   an external perimeter of the cluster c .To reach a precision of object's spatial position and orientation, sufficient for cartographic products, we should set a quantity N  of parameter's  quants in proportion to c p so, that a rotation of radius vector from coordinate system origin to any cluster boundary pixel by an angle   of its ending to a distance not exceeding one pixel.Hence N  is a function of c p and furthermore an asymptotic of (14) corresponds to   2 с

, x y
of cluster's boundary transforms to a single value of r .Values 1 4 r r  are calculated through two one-dimensional accumulators   what corresponds to a particular pair of straight lines:

Fig. 6 .
An illustration of direct values accumulation (a) that leads to a parameter calculation error and to the subsequent incorrect position of building's wall compared to an accumulation of low-pass blurred values (b) that provides more robust boundaries detection Filled up accumulators are used to select two pairs of parameter valcorresponds to a maximum integral weight in view of a distance between straight lines:

Fig. 8 .
Fig. 8.A visualization of objects' digitizing with the developed method (output polygons' borders are black and the cluster borders are white)

Fig. 9 .
Fig. 9.A specific markup layer that indicates the desired objects by crosses Jaccard index represents a complete relative similarity measure of a created polygon to a ground truth one.The algorithm of the created digital map precision estimation contains the following steps:1.Find a corresponding object on the ground truth map for each one on the created map by a criterion of maximum intersection area (has to be nonzero).2.If the ground truth object was found  calculate values of RMSE, MRE, and Jaccard index and write them to the created object's semantic fields to have the ability to analyze the results visually.3.Calculate mean values of precision criterions for the created map in a whole.4.2.Experiment results.In total experts created four digital maps within the experiment (two -with the interactive toolset, and two  with the developed method).Table1demonstrates mean RMSE, MRE, and Jaccard index values together with digitizing workflow duration.Duration of the digitizing workflow decreased by 39.9% and 26.3% for the first and the second expert respectively.Error values of the digi-

Fig. 10 .
Fig. 10.Histogram of objects polygons' sizes on the experimental dataset (a), dependencies graphs of RMSE (b), MRE (c) and Jaccard index (d) from a polygon's size