Analysis of Compression Techniques for Stereoscopic Images

Virtual Reality (VR) and Augmented Reality (AR) Head-Mounted Displays (HMDs) have emerged in recent years and are gaining popularity in many industries. HMDs are generally used in entertainment, social interaction, and education, but their use for work is also increasing in domains such as medicine, modeling, and simulation. Despite the recent release of many types of HMDs, two major problems are hindering their widespread adoption in the mainstream market: extremely high costs and user experience issues [1]. The illusion of a 3D display in HMDs is achieved with a technique called stereoscopy. In applications of stereoscopic imaging, data transfer rates and, in mobile applications, storage quickly become a bottleneck. Therefore, efficient image compression techniques are required. Standard image compression techniques are not suitable for stereoscopic images because of the discrete differences that occur between the compressed and uncompressed images: the loss in lossy image compression may blur the minute differences between the left-eye and right-eye images that are crucial for the illusion of 3D perception. However, various coding techniques can be adapted to stereoscopic images to achieve more efficient coding. The stereo image compression techniques found in the literature use the discrete wavelet transform and a morphological compression algorithm applied to the transform coefficients. This paper provides an overview and comparison of available techniques for the compression of stereoscopic images, as no technique is yet accepted as best on all criteria. We tested the techniques with people who are potential users of HMDs and would therefore be exposed to these techniques. We also focused our research on low-priced, consumer-grade HMDs that are available to a larger population.


Introduction.
Stereoscopy (stereovision) is a technique for creating an illusion of depth in an image. It relies on stereopsis (binocular depth perception), which is based on the difference between the images seen by the left and the right eye (Figure 1). Such image pairs are called stereoscopic image pairs [2,3].
The images contain vast amounts of data, and the price of extra realism in stereo displays is the doubling of data (due to the simultaneous existence of two images), which causes a bottleneck in the data flow. A 3D image is perceived only with the use of hardware or spectacles for viewing stereoscopic images, so 3D information is not recorded explicitly but only implicitly, in the difference between the two images. A technical limitation of this kind of display is that the refresh rate of the display must be synchronized with the glasses, which may also be done wirelessly.
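As a rough illustration of the data volume involved (assuming, purely for the sake of example, 1280 × 720 pixels per view, 24 bits per pixel, and 60 frames per second; these figures are not taken from the study), the raw rate of an uncompressed stereo stream is:

```latex
R = 2 \times W \times H \times b \times f
  = 2 \times 1280 \times 720 \times 24\,\text{bit} \times 60\,\text{s}^{-1}
  = 2\,654\,208\,000\ \text{bit/s} \approx 2.65\ \text{Gbit/s}
```

The factor of 2 is the cost of the second view; disparity-based coders try to recover much of it by exploiting the similarity between the two views.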
It is known that, because of its imperfections, the human eye cannot distinguish the entire color gamut available on modern displays. The question therefore arises whether it makes sense to keep all these shades of color if the human eye cannot see them. In addition, image content contains redundancy, especially between successive video frames. Due to limitations in data transfer, online loading, and synchronization, compression is a key component of modern communication and data services. There are a number of standard image compression techniques, for example JPEG [4] and MPEG [5,6]. Standard image compression techniques cannot exploit the correlation between the left and right images of a stereo pair and the information they share. With standard techniques, each image of a stereo pair would have to be compressed separately, which doubles the bandwidth needed for data transmission [7,8]. Because of the discrete differences that occur between the uncompressed and compressed images, the illusion can be disrupted: the vital minute differences between the two images may be altered, and standard image compression techniques have no provision to preserve them. Since standard compression techniques are not suited to this task, dedicated techniques for the compression of stereoscopic images have been developed. The stereo image compression techniques found in the literature use the discrete wavelet transform and a morphological compression algorithm. The wavelet nature of the algorithm and the proposed disparity compensation provide reconstructed images without blocking artifacts and with fewer annoying ringing effects. One main focus of research in stereo image coding has been disparity estimation, a technique used to reduce the coding rate by exploiting the redundancy in a stereo image pair. These are the reasons why the analyzed compression techniques are significant and stand out from standard compression techniques such as MPEG, JPEG, and JPEG 2000 [5]. In addition, many of the compression techniques proposed over time are proprietary and therefore not easily available. We therefore decided to limit our research to those whose source code, executables, and/or results were available to us. We intend to extend this research and to implement more of the compression techniques proposed for stereoscopic imaging.
A wavelet is a mathematical function used for digital signal processing and image compression. In signal processing, wavelets are used to recover weak signals from noise. They are also useful for X-ray and magnetic resonance imaging in medical applications. In internet communication, they are used to compress images on a large scale.
In this paper, three techniques for the compression of stereoscopic images that are most frequently mentioned in the literature [9], and whose results are available to us, will be analyzed:
1. Stereoscopic image compression using the discrete wavelet transform and coding based on the morphological representation of coefficients, with disparity estimation within the morphological coder, known as the Dense disparity map algorithm.
2. Stereo image compression based on quadtree analysis and the morphological representation of the wavelet coefficients, known as the Disparity compensated residual algorithm.
3. Stereo image compression based on MRF (Markov Random Field) analysis for the assessment of disparities and morphological coding.
These techniques are applied to pairs of stereoscopic images. Figure 2 shows a pair of stereo images [10] used to test and compare the compression techniques in this paper.
The compression techniques mentioned above were compared using the objective quality-assessment methods found in the analyzed literature, as well as the subjective ratings of test subjects. There are different methods for assessing the quality of stereoscopic images, and of digital images in general; the most common are objective and subjective methods. Objective methods estimate image quality with respect to some defined parameter. Subjective methods, on the other hand, are based on human judgments of image quality: respondents evaluate quality according to personal impressions and observations, as was done in this paper. The interviewees viewed the pictures using a virtual reality head-mounted display and rated their quality on a scale from 1 to 5. The testing process is described in more detail in the rest of the paper. The paper consists of six sections. The first section details the goal and subject of the research. The second section gives an overview of techniques for the compression of stereoscopic images. The third section gives an overview of devices used to display stereoscopic images. In the fourth section, the above techniques are compared, and the results obtained by compression are evaluated with objective and subjective metrics for image quality assessment. The fifth section gives a detailed description of the subjective analysis and an analysis of the results obtained. The last (sixth) section presents the conclusions of this research.
2. Review of the selected compression techniques for stereoscopic images.
This section presents an overview of techniques for the compression of stereoscopic images and describes the notion of disparity in stereovision.

Disparity in stereovision.
The problem of finding the points of a stereo pair that correspond to the same point of a 3D object is called the correspondence problem. The problem is simplified if the cameras are coplanar. The distance between the two points of the stereo pair that correspond to the same point of the scene is called disparity. Estimating this distance (the disparity vector, DV) is very important because the target image can be predicted from the reference image using the DV. The disparity-compensated difference (DCD) is estimated so that redundant information is not encoded. Disparity compensation uses a block matching algorithm (BMA) and the determination of residual blocks [12,13], equation (1):

$$e_{i,j}(x,y) = b^{R}_{i,j}(x,y) - \tilde{b}^{L}_{i,j}(x + dv_x,\; y + dv_y) \qquad (1)$$

where $b^{R}_{i,j}$ and $\tilde{b}^{L}_{i,j}$ are corresponding blocks of the target and reconstructed reference images, and $dv_x$ and $dv_y$ are the components of the disparity vector for the best match [7], equation (2):

$$(dv_x, dv_y) = \arg\min_{(u,v) \in A} \frac{1}{N^2} \sum_{x=1}^{N} \sum_{y=1}^{N} \left| b^{R}_{i,j}(x,y) - \tilde{b}^{L}_{i,j}(x + u,\; y + v) \right| \qquad (2)$$

where A is the search window, N × N is the block size, and the matching criterion is the Mean Absolute Error (MAE). The described compensation method is a closed loop, because the target image is predicted from a reconstructed reference image. Disparity compensation can also be performed with the original reference image, which is called open-loop disparity compensation. Open-loop algorithms are simpler, because the encoder does not need inverse quantization and the inverse transform branch, but they are less efficient. The disparity compensation process uses the spatial dependence between the images to remove redundant information. Blocks that have no matching block in the reference image are called occluded blocks. Regions of a stereo pair that are not visible to both eyes, as well as areas that arise from overlapping objects, are the occluded areas.
2.2. Dense disparity map algorithm.
The disparity vector field is estimated by matching pixels between the two images and using the pixels for shape comparison. The algorithm is designed to produce a distribution of coefficients in each iteration in order to achieve the best performance. Clusters are formed in the subbands. The clusters of the target image are matched to their counterparts ("cousins") in the reference image by shifting them so as to minimize the absolute error. The disparity vector is determined for the entire cluster, and the disparity-compensated difference field is obtained by subtracting the corresponding coefficients of the two clusters.
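The block matching of equations (1) and (2) can be sketched as an exhaustive search over a window, with MAE as the matching criterion; the cluster matching described above follows the same idea, applied to clusters of wavelet coefficients instead of pixel blocks. This is a minimal illustration, not the implementation used in the cited works: the function name `block_match`, the 8 × 8 block size, and the ±16-pixel search window are assumptions.

```python
import numpy as np

def block_match(target, reference, top, left, block=8, search=16):
    """Hypothetical helper: find the disparity vector (dv_x, dv_y) of one
    target block by exhaustive search over a window of +/- `search` pixels,
    using the mean absolute error (MAE) as the matching criterion."""
    b_r = target[top:top + block, left:left + block].astype(np.float64)
    h, w = reference.shape
    best_mae, best_dv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidate blocks that fall outside the reference image
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue
            b_l = reference[y:y + block, x:x + block].astype(np.float64)
            mae = np.mean(np.abs(b_r - b_l))
            if mae < best_mae:
                best_mae, best_dv = mae, (dx, dy)
    dx, dy = best_dv
    # Residual block: target block minus the best-matching reference block
    residual = b_r - reference[top + dy:top + dy + block,
                               left + dx:left + dx + block]
    return best_dv, best_mae, residual
```

In a closed-loop coder, `reference` would be the reconstructed (decoded) reference image; passing the original reference instead gives the open-loop variant.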
To determine the significant coefficients hierarchically, morphological dilation with a 3x3 structuring element is used. Figure 3 shows the HL1, HL2, and HL3 subbands of the left image, obtained with a three-level decomposition. The spatial dependence of wavelet coefficients across subbands is evident and justifies the morphological tracking of all coefficients.
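To make the subband notation concrete, the sketch below computes a three-level 2D Haar decomposition with NumPy and collects the HL subband at each level. The Haar filter is an assumption chosen for brevity; the cited coders may use other wavelet filters, and the HL/LH labelling convention varies between texts. Because the decomposition is dyadic, each coefficient in HL2 covers the same image area as a 2 × 2 group in HL1, which is the cross-scale dependence the morphological coder exploits.

```python
import numpy as np

def haar_dwt2(img):
    """One level of an orthonormal 2D Haar DWT (image sides must be even).
    Returns (LL, HL, LH, HH); here HL is horizontal high-pass / vertical
    low-pass detail (naming conventions differ between texts)."""
    a = np.asarray(img, dtype=np.float64)
    # Filter along rows (horizontal direction): pairwise sums and differences
    lo = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)
    hi = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)
    # Filter the results along columns (vertical direction)
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)
    return ll, hl, lh, hh

def haar_pyramid(img, levels=3):
    """Recursively decompose the LL band; returns [HL1, ..., HLn]
    (HL1 = finest scale) and the final LL band."""
    ll = np.asarray(img, dtype=np.float64)
    hl_bands = []
    for _ in range(levels):
        ll, hl, _lh, _hh = haar_dwt2(ll)
        hl_bands.append(hl)
    return hl_bands, ll
```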

Fig. 3. Spatial dependence of wavelet coefficients across the subbands of the image decomposition
A single step size quantizes all subbands and, as a result, a binary image with two partitions is obtained, separating the significant and insignificant coefficients. Coefficients within a defined range are called significant. The dependence of wavelet coefficients in forming clusters suggests applying morphological dilation to identify the significant "neighbors." These coefficients are grouped according to the similarity of their characteristics, and the results of this division contain the additional information needed to describe them in the decoding phase. The algorithm starts with the first significant coefficient after quantization by a uniform quantizer. The dilation operation is then applied at that coefficient, and each adjacent significant neighbor is assigned to the cluster in clockwise order. Darker blocks indicate significant coefficients that have been collected into clusters in a predefined way. Isolated significant coefficients that do not form clusters, together with the coefficients that are not assigned as neighbors, form the group of insignificant coefficients. This procedure produces a coder behavior map. Any uncertainty about which significant coefficients are selected at the final scale is checked and resolved by repeating the dilation operation with all the specified structuring elements.
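The clustering step can be illustrated with a simplified sketch: quantize one subband uniformly, then grow a cluster from each unvisited significant seed by repeatedly applying a 3x3 dilation restricted to significant coefficients (equivalently, 8-connected region growing), visiting neighbors clockwise. The function name, the raster-order seed search, and the `min_size` rule for isolated coefficients are assumptions; the cited coder also tracks clusters across scales and entropy-codes the result, which is omitted here.

```python
import numpy as np

def significance_clusters(coeffs, step, min_size=2):
    """Hypothetical sketch of morphological significance clustering.

    coeffs   : 2D array of wavelet coefficients (one subband)
    step     : uniform quantizer step size (assumed equal for all subbands)
    min_size : clusters smaller than this count as isolated (assumption)
    Returns the kept clusters (lists of (row, col)) and a boolean map of
    the coefficients that end up in the insignificant group.
    """
    # Uniform quantizer with truncation toward zero
    q = np.fix(np.asarray(coeffs, dtype=np.float64) / step)
    significant = q != 0
    visited = np.zeros_like(significant)
    h, w = significant.shape
    # 3x3 structuring element: neighbours visited clockwise from top-left
    clockwise = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                 (1, 1), (1, 0), (1, -1), (0, -1)]
    clusters = []
    for r in range(h):
        for c in range(w):
            if not significant[r, c] or visited[r, c]:
                continue
            # Repeated dilation from the seed until no new significant neighbour
            cluster, frontier = [(r, c)], [(r, c)]
            visited[r, c] = True
            while frontier:
                y, x = frontier.pop()
                for dy, dx in clockwise:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and significant[ny, nx] and not visited[ny, nx]):
                        visited[ny, nx] = True
                        cluster.append((ny, nx))
                        frontier.append((ny, nx))
            if len(cluster) >= min_size:
                clusters.append(cluster)
    # Insignificant group: everything outside the kept clusters,
    # including isolated significant coefficients
    in_cluster = np.zeros_like(significant)
    for cl in clusters:
        for y, x in cl:
            in_cluster[y, x] = True
    return clusters, ~in_cluster
```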
The performance of this still-image algorithm is quite good compared to other compression techniques. The division and progressive encoding of subbands in a hierarchical sense provide an advantage for handling disparity estimation within the same coding scheme. This technique degrades quality, but it is computationally 'cheap.'
2.3. Disparity compensated residual algorithm.
The algorithm consists of image coding based on the morphological representation of the wavelet transform coefficients and on quadtree analysis of the disparity between the images of the stereo pair [18]. The coding unit uses a discrete wavelet transform followed by a morphological encoder, which exploits the statistical properties of wavelet coefficients within and between subbands to create entropy-reducing partitions between the significant and insignificant coefficients. The disparity compensation procedure uses a block matching algorithm based on variable-size blocks generated by quadtree decomposition of the target image. The diagram in Figure 4 shows the components of the algorithm:
- Discrete wavelet transform: performs decomposition and quantization of the target and residual images;
- Morphological compression: divides the wavelet coefficients into significant and insignificant ones in order to reduce their entropy;
- Inverse morphological compression: reconstructs the reference image and feeds it as an input to the disparity compensation unit;
- Disparity compensation unit: compares two inputs (the reconstructed reference image and the target image), estimates the best prediction of the target image, and produces a residual target image, i.e., the difference between the target image and its best prediction (the disparity-compensated difference). The best prediction vectors for each block are called disparity vectors;
- Entropy coder: encodes the reference image, the residual target image, and the disparity vectors.
Figure 4 shows the diagram of the proposed algorithm [9]. The right (target) image is segmented into blocks of homogeneous intensity by quadtree decomposition with intensity-difference thresholding. These blocks belong either to the same object or to the background and show homogeneous disparity characteristics. This is followed by a quadtree decomposition with a simplified rate-distortion criterion, which allows an existing block to be divided into four sub-blocks. Figure 5 shows how the reference image is divided into blocks (a) and the residual image (b) [17].
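Splitting a block into four sub-blocks, as described above, is a quadtree decomposition. Its first, intensity-based stage can be sketched as a recursive split that divides a block while the spread of its pixel intensities exceeds a threshold. This is an illustration under stated assumptions: the max-minus-min homogeneity test, the threshold, the minimum block size, and the function name are chosen for the example, and the second, rate-distortion-driven split stage is not shown.

```python
import numpy as np

def quadtree(img, top, left, size, thresh, min_size=4, out=None):
    """Hypothetical sketch: split a square block into 4 sub-blocks while
    its intensity range (max - min) exceeds `thresh` and it is larger than
    `min_size`. Returns the leaf blocks as (top, left, size) tuples."""
    if out is None:
        out = []
    block = img[top:top + size, left:left + size]
    if size > min_size and block.max() - block.min() > thresh:
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                quadtree(img, top + dy, left + dx, half, thresh, min_size, out)
    else:
        # Homogeneous (or minimum-size) block: keep it as one leaf
        out.append((top, left, size))
    return out
```

For a 16 × 16 image that is uniform except for one bright 4 × 4 corner, the result is three 8 × 8 leaves plus four 4 × 4 leaves around the corner; each leaf would then receive a single disparity vector.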

Stereo image coding based on MRF analysis for the assessment of disparities and morphological coding.
MRF-based coding models the disparity field D and the occlusion field O [12,17]. The algorithm considers irregularly distributed points or positions in the image (called nodes), which are the image elements to be matched. The possible correspondences of each node form a discrete set of selected image features that match features of the other image of the stereo pair within the allowed range of disparities. The main difficulty lies in determining these fields. The configurations of the disparity field D and the occlusion field O can be determined by equation (3) [12]:

$$(\hat{D}, \hat{O}) = \arg\min_{D,\,O} \left[\, U(S_R \mid S_L, D, O) + U(D \mid O) + U(O) \,\right] \qquad (3)$$

where $S_L$ and $S_R$ represent the reference and target images, and $U(\cdot)$ denotes an energy term. The first term is the likelihood energy of the target image given the left (reference) image and the fields D and O. It is called the similarity constraint because it measures the similarity between the two images of the stereo pair.
The second term is the energy of the disparity field given the occlusion field O and is called the smoothness constraint; it enforces smooth variation of the disparity vectors. The third term is the energy of the occlusion field and is called the occlusion constraint; it penalizes discontinuities and occluded blocks. Minimizing the sum of these three terms therefore affects both the efficiency of the residual image encoding and the resulting disparity vectors. The initial estimate of the occlusion field is formed using a double-threshold technique, which divides the image into three regions (non-occluded, occluded, and uncertain) depending on the estimated probability of being classified as 'occluded' or 'not occluded.'
3. Overview of devices used for display of stereoscopic images.
When providing a taxonomy of current virtual reality (VR) hardware, many of the presented devices exist only as prototypes; most are not yet commercially available and may never be. The main category in current display technology is visual displays. In terms of consumer VR, these are all head-mounted displays (HMDs), which are either wired or mobile [13], [14]. Other categories provide haptic and multi-sensory feedback, but they are expensive and not easily available.
Mobile HMDs - three subcategories of mobile HMDs can be identified. All share the property of being wireless and usable without an additional PC. In most cases, their application areas lie in entertainment: displaying 360° movies or panoramas rendered from a stationary point of view, or interactive walkthroughs based on gaze-directed navigation.
The first subcategory of mobile displays is called "simple casing"; these displays are basically a frame for smartphones with additional focusing lenses mounted at an appropriate distance. They rely fully on the smartphone to display and process the data. Google developed the first devices of this kind.
The second mobile subcategory consists of ergonomically designed smartphone cases, which contain significantly better optics and possibly limited additional electronics, and are more comfortable to wear. The difference between this and the first subcategory is one of degree, not of kind, and both rest fundamentally on the smartphone's technology.
The third subcategory consists of dedicated mobile HMDs. Different prototypes and examples of this sort of device exist. Gameface and Oculus Go are stand-alone systems that need neither an additional PC nor a smartphone, with all necessary hardware built into the unit. This seems to be a promising approach, as it allows the hardware to be tailored to the task at hand, but the question of how affordable it will be for a wider audience remains open.
Wired HMDs - these are devices that need to be tethered to a PC or another high-power computing device that generates the 3D graphics. Despite their name, these devices need not actually be wired and may in fact be connected wirelessly. Unlike mobile devices, they depend on a fixed computing platform. The feature list of wired HMDs is diverse, and they may be distinguished not only on the basis of traditional quality factors such as resolution, Field of View (FOV), or weight, but also by specialized additional features. Some are equipped with cameras to allow for AR and can be used as video see-through displays, while others include eye tracking.
A representative of mobile HMDs is Google Cardboard. Google Cardboard follows a minimalist design philosophy, acting as a very basic viewer that does not even allow users to secure the device to their head with a strap. To provide basic interaction, the Cardboard unit is equipped with a magnet on the left side of the device, whose motion can be detected by the phone's sensors. A vast number of inexpensive Cardboard clones exist and have spread the technology far and wide. They differ in features such as lenses with a larger FOV or some kind of mounting apparatus. More advanced solutions are on the market as well, providing a simple plastic casing, back straps, or hats to mount the phone [13]. The Xwave glasses (used for testing) are inexpensive, easily available, and work on the same principle. Unlike professional HMDs, they do not have an integrated solution for tracking the user's head position in space, but they allow tracking of orientation using the accelerometer built into the phone [14]. The Xwave glasses are shown in Figure 6.
Generally, devices for displaying stereoscopic images are expensive, and given our interest in the compression of stereo images and our desire for maximum reach, price is very important. Therefore, we restricted the research to a device that is relatively cheap, easy to buy, and available to a large number of people [15]. Table 1 shows the specifications of the Xwave glasses. [19][20][21].
MSE and PSNR are simple methods that are easy to understand and implement. Therefore, they are often used, perhaps most often, in the assessment of image quality [22,23]. However, these methods cannot give an objectively assessed quality that matches the observer's estimation over a wide range of coding and transmission parameters. This is because they compare the tested and reference data without knowing what the data actually represent. They do not take into account the characteristics of the HVS (Human Visual System), which is not equally sensitive to different types of distortion and different distortion properties. In addition, it is very important to know in which part of the image the distortion occurs, and MSE and PSNR do not take this into account. They measure the fidelity of the signal without modeling any properties of the HVS or of image content and semantics.
In accordance with the literature [11,12,18] this research uses PSNR as the main measure of compressed image quality.
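For reference, MSE and PSNR can be computed as below. This is a generic sketch rather than the tooling used in the cited works: the peak value of 255 assumes 8-bit samples, and reporting a separate PSNR for the left and right views (the hypothetical `stereo_psnr` helper) is an assumption, since the paper does not state how the two views are combined.

```python
import numpy as np

def mse(reference, test):
    """Mean squared error between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    return np.mean((ref - tst) ** 2)

def psnr(reference, test, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE); infinite for identical images."""
    err = mse(reference, test)
    if err == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / err)

def stereo_psnr(left_ref, right_ref, left_dec, right_dec):
    """Hypothetical convention: PSNR of each decoded view against its original."""
    return psnr(left_ref, left_dec), psnr(right_ref, right_dec)
```

For example, a uniform error of 5 grey levels gives MSE = 25 and a PSNR of 10 log10(255² / 25) ≈ 34.15 dB, regardless of where in the image the error lies, which illustrates the limitation discussed above.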
Figures 7, 8 and 9 show the stereo pairs of images compressed by the techniques described in the paper. The first technique for the compression of stereoscopic images has low algorithmic complexity, but optimizing its distortion is very complex. Therefore, it is difficult to improve the performance of the algorithm when the distortion optimization criteria are taken into account. Pixel-based disparity estimation has both advantages and disadvantages compared with techniques based on dividing images into blocks: it keeps the algorithm simple and avoids blocking artifacts, but it is less efficient in terms of distortion. The method also has a low degree of complexity because the quantization process is simple and disparity estimation is performed within the compression scheme as a single framework. This technique can therefore be described as a low-complexity technique that offers an alternative approach to disparity estimation. The disparity compensated residual encoder is a robust encoder that inherits all the benefits of the wavelet transform and reduces the entropy of the transmitted images. The experimental evaluation of this encoder [10] has shown that it performs better than other stereoscopic image coders, and given the visual quality it achieves, the coding algorithm can be considered of low complexity. The last technique considered is MRF. This algorithm obtains smooth disparity fields [24] without increasing the residual energy and thus allocates fewer coding bits; therefore, it takes less time to complete than the two previous techniques.
Also, based on a simple visual inspection of the compressed images shown in Figures 7, 8 and 9, it can be concluded that the Dense disparity map algorithm gives the lowest-quality results, while the Disparity compensated residual algorithm yields high-quality results. The visual quality of the compressed images was also checked subjectively, as described in the next section.

Subjective comparison of compression techniques for stereoscopic images.
In addition to objective metrics for assessing picture quality, there are also subjective quality assessments. When overall subjective image quality is measured, it would be best to test the entire population, but this is of course wildly impractical. This problem can be overcome with sampled statistical analysis. The main assumption is that the limited number of respondents is representative of the whole population [28][29][30]. Accordingly, proper sampling (i.e., selection of respondents) should be ensured. Participants in the testing should not have prior knowledge of the quality assessment of compressed stereoscopic images. In addition, subjects should be 18 to 30 years old, because the visual system of people at this age is in optimal condition. In short, the subjects should be young people who evaluate the quality of the displayed compressed images according to a subjective, personal impression.
In order to obtain statistically reliable results, the test session must be carried out precisely. Practice and boredom effects must also be controlled. To do so, not all test images should be shown in a single session, and the images should not be shown in the same order to every respondent, which yields more reliable statistical results. In this study, the test consisted of four test images, one of which is the original image. The image pairs are presented individually to the respondents. A compressed stereoscopic image is considered one test point. Figure 5 shows the pairs of stereoscopic images seen by users. One line (a pair of images) represents one 3D image. The images were shown in random order. Each respondent had breaks of five to fifteen minutes between viewing image pairs.
5.1. Description of the test procedure.
In this paper, the results of image compression achieved by these techniques were shown to users on an Xwave VR display powered by a Samsung J5 2017 mobile phone. The glasses are shown in Figure 6 and the mobile phone in Figure 10. Thirty respondents participated in the test. Before the start of testing, each respondent was asked to complete a survey. First, they entered their gender, age, and field of education. Then, they were asked questions related to their vision:
- Do you have an eyeglass prescription?
- Have you been diagnosed with astigmatism?
- Have you been diagnosed with dichromatism/daltonism?
Only after responding to all the questions in the survey were respondents able to access the testing (Figure 12).
The respondents were divided into three groups of ten in order to optimize the breaks between image pair viewings, so that each respondent waited between five and fifteen minutes. After the first respondent viewed the first image pair (chosen randomly), he/she had a break during which each of the remaining nine respondents viewed their own randomly chosen first image pair.
After all ten respondents had viewed their first pair of images, the first respondent viewed a second image pair, randomly chosen from the pairs he/she had not yet seen. Respondents rated the quality of each image pair immediately after viewing it.
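The interleaved, randomized procedure described above can be expressed as a small scheduling sketch for one group of ten respondents. The function name, the labels, and the use of Python's `random` module are illustrative assumptions; the paper does not describe how the random orders were generated.

```python
import random

def viewing_schedule(respondents, images, seed=None):
    """Hypothetical sketch: each respondent gets an independent random order
    of the image pairs; in each round every respondent in the group views one
    pair in turn, so each respondent rests while the others are viewing.
    Returns a list of rounds, each a list of (respondent, image) tuples."""
    rng = random.Random(seed)
    orders = {r: rng.sample(images, len(images)) for r in respondents}
    return [[(r, orders[r][k]) for r in respondents]
            for k in range(len(images))]

# Usage with the four test images (labels are illustrative)
group = [f"R{i}" for i in range(1, 11)]
rounds = viewing_schedule(group, ["original", "DDM", "DCR", "MRF"], seed=1)
```

Because every respondent appears once per round, the rest period of one respondent equals the viewing time of the other nine, which matches the five-to-fifteen-minute breaks reported above only if each viewing takes roughly half a minute to a minute and a half.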

Results and discussion.
As stated in the preceding section, the sample size was thirty, with a mean age of 28.6 years. All respondents were given the same four pictures to evaluate on a five-point Likert scale. Neither the subjects nor the examiner knew in advance the order in which the pictures would be shown, ensuring a double-blind experimental protocol. Given the use of a Likert scale [31], the assumption of normality is violated, especially given the relatively small n and the heavy-tailed nature of the distribution [32] implied by treating Likert scale values as continuous measurements. Therefore, robust statistical methods [33], which can capture effects in a small sample, have been used to analyze the data. First, Figure 13 shows a bar graph of the data with 95% confidence interval error bars. While this is not a conclusive test, the error bars suggest that the mean grades differ between the original and the algorithms. The plot also suggests that DCR has the best image quality of all the algorithms, but proving this may be beyond the resolution afforded by the sample.
However, this does not take into account the repeated-measures nature of the data, so a more careful analysis is required. The first step is a heteroscedastic one-way robust ANOVA analogue for dependent groups, bootstrapped with 2000 resamples and using 20% trimmed means. Bootstrapped statistics do not produce p-values, but the resulting test statistic is 24.1579 while the critical value is 2.9671, indicating a high degree of significance. A non-bootstrapped robust analogue produces similar values, F(2.62, 44.58) = 24.1579, with a reported p-value of 0, meaning a value too small for the software to calculate. This permits us to reject the null hypothesis H0 that the means are equal between groups, which is the expected result given the graph in Figure 13. Post hoc testing is of greater interest here. A post hoc analogue of the ANOVA variant used is described in [33] and implemented in [34]; applied to the data, it produces the results in Table 2. These values show that most of the individual differences are significant even after accounting for the familywise error rate. The one exception is the difference between the DDM and MRF algorithms, whose mean grades are close enough that the difference is not significant at the level of resolution available from the present data set.
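Two building blocks of this analysis, the 20% trimmed mean and bootstrap resampling with 2000 resamples, can be illustrated for a single condition as below. This is a simplified sketch using a percentile bootstrap on one group of grades; it is not the repeated-measures robust ANOVA or the post hoc procedure from [33, 34], and the function names are hypothetical.

```python
import numpy as np

def trimmed_mean(x, prop=0.2):
    """Mean after removing floor(prop * n) observations from each tail."""
    x = np.sort(np.asarray(x, dtype=np.float64))
    g = int(np.floor(prop * x.size))
    return x[g:x.size - g].mean()

def bootstrap_ci(x, stat=trimmed_mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat`."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=np.float64)
    # Resample with replacement and recompute the statistic n_boot times
    boots = np.array([stat(rng.choice(x, size=x.size, replace=True))
                      for _ in range(n_boot)])
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

Trimming 20% from each tail makes the location estimate insensitive to a few extreme grades, which is why it suits small, heavy-tailed Likert samples better than the ordinary mean.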
This analysis allows us to claim with a certain degree of confidence that:
- the original picture is better than the output of any of the algorithms, with a p-value of 0 and effect sizes of ξ = 0.733, 0.818, and 0.723, respectively, which according to Wilcox and Tian, who defined the measure [35], is an exceptionally large effect size;
- DCR is better than any of the other algorithms, with p-values of 0.01705 and 0.01690 and effect sizes of ξ = 0.479 and 0.380, respectively, which according to [35] is a large effect.
6. Conclusion.
This paper presents an overview of the compression techniques used for image compression in stereoscopic displays and analyzes the results through objective and subjective methods for assessing the quality of compressed images. As a conclusion of this analysis, it can be said that, unlike standard image compression techniques, the wavelet technique for compressing pairs of stereoscopic images does not affect the minute differences between paired stereo images, and thus does not affect the degree of immersion in the virtual environment created in part by the stereoscopic display. Standard compression techniques have the shortcoming that sudden transitions between contrast values are not possible. It is precisely because of the discrete differences that occur between compressed and uncompressed images that the 3D illusion deteriorates. The MRF technique improves the quality of the reconstructed target images compared with the results of a plain technique based on dividing images into blocks.
The values from this analysis show that there are significant individual differences. The only exception is the difference between DDM and MRF, which is not statistically significant at the level of resolution available from the present data set. The analysis shows that:
- the original picture is better than the output of any of the algorithms;
- DCR is better than any of the other algorithms.

Fig. 1. Demonstration of how a pair of stereo images creates an illusion of 3D scene/objects

Fig. 7. Examples of stereo pairs of images (left and right) compressed with dense disparity map algorithm

Fig. 8. Examples of stereo pairs of images (left and right) compressed with disparity compensated residual algorithm

Fig. 9. Examples of stereo pairs of images (left and right) compressed with MRF-based algorithm

Fig. 13. Mean plot of grade values for the pictures in question. 95% error bars shown in red, n = 30

Table 1. Specifications of Xwave glasses

4. Objective comparison of compression techniques for stereoscopic images.
Some of the objective methods are Peak Signal-to-Noise Ratio (PSNR), Mean Square Error (MSE), Structural SIMilarity (SSIM), the Universal Image Quality Index (UIQI), and Reduced Reference Image Quality Assessment (RRIQA).

Table 2. Results of post hoc testing on mean grade values, n = 30