I Introduction
Models based on Markov Random Fields (MRFs), which treat the image as a random field, have a long history in low-level computer vision [6]. It is well known that MRFs are particularly effective for image prior modeling in image processing. In an MRF-based image prior model, the probability of a whole image is defined through the potentials (or energies) of overlapping local cliques.
An elegant MRF-based image prior model, called Fields of Experts (FoE), was recently proposed by Roth and Black [8]. The FoE model is defined by (1) a heavy-tailed potential function, motivated by the observation that the responses of derivative filters applied to natural images follow heavy-tailed distributions, and (2) a set of linear filters trained from image samples.
Because the FoE image prior model is effective for many image restoration problems, many works have been devoted to FoE-based image restoration, such as image denoising, inpainting and deblurring [8, 9, 1]. There are usually two ways to apply a learned FoE prior model to a specific image restoration problem: sampling-based MMSE estimation, e.g., [10, 4, 15, 16], and energy-minimization-based MAP estimation, e.g., [8, 9, 1, 2].
Schmidt et al. [10] were the first to claim that MMSE estimation leads to better performance than MAP estimation for image denoising with their learned FoE image prior model. Since then, many works have followed this suggestion and used MMSE estimation for FoE-related models, e.g., image deblurring [11, 17], image denoising [4], depth estimation [13, 5], image separation [15] and single image super resolution [16].
In a recent paper [16], the FoE prior model was exploited in the context of image super resolution (SR). The authors also proposed to employ the MMSE estimate in the inference procedure instead of the MAP estimate. With the MMSE estimate, the FoE-based SR model achieves state-of-the-art performance. However, sampling-based approaches are well known to be very time consuming, which makes the FoE-based SR model unappealing for practical applications.
It is generally true that the MMSE estimate is a better alternative than MAP, as it can exploit the uncertainty of the model, especially when the distribution is multimodal with multiple peaks. In practice, however, an accurate MMSE estimate is usually hard to obtain, because the expectation has to be taken over entire images. As a consequence, MAP inference, which seeks the highest peak, may work equally well for some problems.
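To make the distinction between the two estimates concrete, the following minimal NumPy sketch evaluates a hypothetical one-dimensional bimodal posterior on a grid and compares its mode (MAP) with its mean (MMSE). The mixture weights, means and widths are arbitrary illustrations, not derived from any FoE model.

```python
import numpy as np

def gauss(z, mu, s):
    """Normal density N(z; mu, s^2)."""
    return np.exp(-0.5 * ((z - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

# Hypothetical bimodal posterior over a scalar unknown, evaluated on a fine grid.
z = np.linspace(-10.0, 10.0, 20001)
dz = z[1] - z[0]
post = 0.6 * gauss(z, -2.0, 0.5) + 0.4 * gauss(z, 2.0, 1.5)
post /= post.sum() * dz                    # normalize (Riemann sum)

map_est = z[np.argmax(post)]               # MAP: location of the highest peak
mmse_est = np.sum(z * post) * dz           # MMSE: posterior mean
print(f"MAP  = {map_est:.3f}")             # close to -2 (dominant mode)
print(f"MMSE = {mmse_est:.3f}")            # close to 0.6*(-2) + 0.4*2 = -0.4
```

Here the mean lies between the two peaks, in a region of low posterior density, while the mode sits on the dominant peak. For a whole image the expectation cannot be computed on a grid like this, which is why MMSE methods rely on sampling.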
In this letter, we evaluate the performance of MAP inference for the FoE-based SR problem. Our experimental results demonstrate that MAP inference for the FoE-based SR model was underestimated in previous work [16]. Numerical results show that, with exactly the same image prior model as used for MMSE estimation, MAP inference achieves equivalent performance in terms of both quantitative measures (PSNR and SSIM) and visual quality. In addition, MAP inference obtains further improvements with a discriminatively trained FoE image prior of the same model capacity. MAP inference also has a clear advantage in efficiency, which becomes even more pronounced with our recently proposed nonconvex optimization algorithm iPiano [7].
To sum up, our experimental findings suggest exploiting MAP inference for FoE prior-based image super resolution, because (1) there is no performance loss with this simpler inference criterion, and (2) MAP inference is much more efficient.
II MAP inference of FoE image prior based SR
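In a typical image super resolution task, the low-resolution (LR) image is generated from a high-resolution (HR) image according to
$$y = DHx + n,$$
where $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^M$ are the HR and LR images, respectively, $H \in \mathbb{R}^{N \times N}$ is the matrix corresponding to the blurring operation, and $D \in \mathbb{R}^{M \times N}$ ($M < N$) denotes the downsampling operation. $n$ is the noise, typically assumed to be white Gaussian noise with standard deviation $\sigma$. The FoE image prior based SR model is formulated as the Bayesian probabilistic model
$$p(x \mid y) \propto p(y \mid x)\, p(x) = \exp\Big(-\frac{\|DHx - y\|_2^2}{2\sigma^2}\Big)\, p(x), \tag{II.1}$$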
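where $p(x)$ is the probability density of an image under the FoE framework, written as
$$p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \prod_{i=1}^{N_f} \phi_i\big((k_i * x)_C; \alpha_i\big),$$
where $Z$ is a normalization constant, $\mathcal{C}$ is the set of maximal cliques, $N_f$ is the number of filters, $(k_i * x)_C$ refers to the $C$-th pixel of the image filtered by $k_i$, and $\phi_i$ is the potential function with associated weights $\alpha_i$. In [16], the potential function is given by a Gaussian scale mixture (GSM):
$$\phi_i(z; \alpha_i) = \sum_{j=1}^{J} \alpha_{ij}\, \mathcal{N}\big(z; 0, \sigma_i^2 / s_j\big), \tag{II.2}$$
where $\alpha_{ij}$ are the normalized weights of the Gaussian components with scales $s_j$ and base variance $\sigma_i^2$.

Based on the posterior (II.1), [16] used sampling-based MMSE estimation to recover the underlying HR image $x$. In this letter, we consider the MAP estimate instead. With MAP estimation, the FoE-based SR task is formulated as the energy minimization problem
$$x^* = \arg\min_x E(x) = \sum_{i=1}^{N_f} \sum_{C \in \mathcal{C}} \rho_i\big((k_i * x)_C\big) + \frac{\lambda}{2}\|DHx - y\|_2^2, \tag{II.3}$$
where $\rho_i(z) = -\log \phi_i(z; \alpha_i)$ is the penalty function induced by the GSM potential (II.2), and $\lambda > 0$ is a trade-off parameter (equal to $1/\sigma^2$ if (II.1) is followed literally).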
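The GSM potential (II.2) enters the MAP energy only through $\rho_i(z) = -\log \phi_i(z; \alpha_i)$ and, for gradient-based solvers, its derivative $\rho_i'(z)$. The following NumPy sketch evaluates both in a numerically stable way with the log-sum-exp trick. The function name `gsm_penalty` and the weights, scales and base variance in the usage lines are hypothetical placeholders, not the trained values of [16].

```python
import numpy as np

def gsm_penalty(z, alpha, scales, base_var):
    """Return rho(z) = -log sum_j alpha_j N(z; 0, base_var / s_j) and rho'(z)."""
    z = np.asarray(z, dtype=float)[..., None]              # trailing axis indexes components
    alpha = np.asarray(alpha, dtype=float)
    var = base_var / np.asarray(scales, dtype=float)       # component variances sigma^2 / s_j
    log_comp = np.log(alpha) - 0.5 * np.log(2.0 * np.pi * var) - 0.5 * z ** 2 / var
    m = log_comp.max(axis=-1, keepdims=True)               # log-sum-exp for numerical stability
    log_phi = m + np.log(np.exp(log_comp - m).sum(axis=-1, keepdims=True))
    resp = np.exp(log_comp - log_phi)                      # posterior weight of each component
    rho = -log_phi[..., 0]
    drho = z[..., 0] * (resp / var).sum(axis=-1)           # equals -phi'(z) / phi(z)
    return rho, drho

# Hypothetical GSM parameters (placeholders, not the trained values of [16]).
alpha = np.full(7, 1.0 / 7.0)
scales = np.exp(np.arange(-3.0, 4.0))
z = np.linspace(-5.0, 5.0, 11)
rho, drho = gsm_penalty(z, alpha, scales, base_var=1.0)
print(np.round(rho, 3))
```

Because the mixture contains both narrow and wide components, the widest component dominates for large $|z|$, so $\rho$ grows far more slowly there than the penalty of the narrowest component alone would. With a single component the sketch reduces to the quadratic penalty $z^2/2$ plus a constant.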
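Gradient-based algorithms can be applied to the minimization problem (II.3). First, we need the gradient $\nabla_x E(x)$, which is given by
$$\nabla_x E(x) = \sum_{i=1}^{N_f} K_i^\top \rho_i'(K_i x) + \lambda H^\top D^\top (DHx - y), \tag{II.4}$$
where $K_i \in \mathbb{R}^{N \times N}$ is a highly sparse matrix implementing the 2D convolution of the image with the filter kernel $k_i$, i.e., $K_i x \Leftrightarrow k_i * x$, and $\rho_i'(K_i x) = \big(\rho_i'((K_i x)_1), \ldots, \rho_i'((K_i x)_N)\big)^\top$ applies $\rho_i'$ elementwise.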
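As a sanity check on (II.3) and (II.4), the following NumPy/SciPy sketch evaluates the energy and its gradient for a small synthetic problem and compares the gradient against a central finite difference along a random direction. Several choices are assumptions for illustration, not the paper's setup: $H$ is a Gaussian blur and $D$ plain decimation by the zooming factor; all operators use circular (wrap-around) boundaries so that $H^\top$ and $K_i^\top$ are exact adjoints; the two difference filters are hypothetical stand-ins for trained filters; and the penalty is the Lorentzian $\log(1+z^2)$ rather than the GSM.

```python
import numpy as np
from scipy.ndimage import convolve, correlate, gaussian_filter

def make_degradation(blur_sigma, factor):
    """Hypothetical A = D H (Gaussian blur, then decimation) and its adjoint A^T."""
    def A(x):
        return gaussian_filter(x, blur_sigma, mode="wrap")[::factor, ::factor]

    def At(r, shape):
        up = np.zeros(shape)
        up[::factor, ::factor] = r                            # D^T: zero-filling upsampling
        return gaussian_filter(up, blur_sigma, mode="wrap")   # H^T = H (symmetric kernel, circular)
    return A, At

def lorentzian(z):
    """rho(z) = log(1 + z^2) and rho'(z) = 2z / (1 + z^2)."""
    return np.log1p(z ** 2), 2.0 * z / (1.0 + z ** 2)

def energy_and_grad(x, y, kernels, penalties, lam, A, At):
    """Energy E(x) as in (II.3) and its gradient as in (II.4)."""
    E, g = 0.0, np.zeros_like(x)
    for k, pen in zip(kernels, penalties):
        z = correlate(x, k, mode="wrap")           # K_i x (filter response)
        rho, drho = pen(z)
        E += rho.sum()
        g += convolve(drho, k, mode="wrap")        # K_i^T rho_i'(K_i x)
    r = A(x) - y                                   # residual D H x - y
    E += 0.5 * lam * np.sum(r ** 2)
    g += lam * At(r, x.shape)                      # lambda H^T D^T (D H x - y)
    return E, g

rng = np.random.default_rng(0)
A, At = make_degradation(blur_sigma=1.0, factor=3)
x_true = rng.random((24, 24))
y = A(x_true) + 0.01 * rng.standard_normal((8, 8))
kernels = [np.array([[0, 0, 0], [0, -1, 1], [0, 0, 0]], dtype=float),   # horizontal difference
           np.array([[0, 0, 0], [0, -1, 0], [0, 1, 0]], dtype=float)]   # vertical difference
penalties = [lorentzian] * len(kernels)

x = rng.random((24, 24))
E, g = energy_and_grad(x, y, kernels, penalties, 50.0, A, At)

# Central finite difference of E along a random direction v vs. <grad E, v>.
v = rng.standard_normal(x.shape)
h = 1e-5
E_plus, _ = energy_and_grad(x + h * v, y, kernels, penalties, 50.0, A, At)
E_minus, _ = energy_and_grad(x - h * v, y, kernels, penalties, 50.0, A, At)
fd = (E_plus - E_minus) / (2 * h)
print(f"finite difference: {fd:.6f}, analytic: {np.vdot(g, v):.6f}")
```

Correlation implements $K_i x$ and convolution with the same kernel implements $K_i^\top$; a mismatch between the two would show up immediately in the finite-difference comparison.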
In our work, we use the recently developed nonconvex optimization algorithm iPiano [7] to solve the above minimization problem, instead of the commonly used conjugate gradient (CG) algorithm. We find that iPiano is significantly faster than CG. We refer interested readers to [7] for details of the iPiano algorithm.
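For minimizing $f(x) + g(x)$ with $f$ smooth (possibly nonconvex) and $g$ convex, iPiano iterates $x^{n+1} = \mathrm{prox}_{\alpha g}\big(x^n - \alpha \nabla f(x^n) + \beta (x^n - x^{n-1})\big)$. For constant parameters, [7] requires $\beta \in [0, 1)$ and $0 < \alpha < 2(1-\beta)/L$, where $L$ is the Lipschitz constant of $\nabla f$. The sketch below implements this constant-step variant in NumPy; it is not the authors' implementation. As an assumption for illustration it runs on a toy nonconvex function rather than on (II.3). For the SR energy one would pass the gradient (II.4) as `grad_f` and take $g = 0$ (identity prox).

```python
import numpy as np

def ipiano(grad_f, x0, alpha, beta, prox=None, n_iter=1000):
    """Constant-step iPiano: x+ = prox_{alpha g}(x - alpha*grad_f(x) + beta*(x - x_prev))."""
    if prox is None:
        prox = lambda v, step: v          # g = 0: the proximal map is the identity
    x_prev = np.array(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(n_iter):
        x_next = prox(x - alpha * grad_f(x) + beta * (x - x_prev), alpha)
        x_prev, x = x, x_next
    return x

# Toy nonconvex objective (hypothetical): f(x) = 0.5*||x - b||^2 + mu * sum(log(1 + x^2)).
rng = np.random.default_rng(1)
b = rng.uniform(-3.0, 3.0, size=50)
mu = 5.0

def grad_f(x):
    return (x - b) + mu * 2.0 * x / (1.0 + x ** 2)

L = 1.0 + 2.0 * mu                      # Lipschitz constant of grad_f (|d^2/dx^2 log(1+x^2)| <= 2)
beta = 0.8
alpha = 0.9 * 2.0 * (1.0 - beta) / L    # strictly inside the step-size condition
x_hat = ipiano(grad_f, np.zeros_like(b), alpha, beta)
print("gradient norm at the result:", np.linalg.norm(grad_f(x_hat)))
```

The inertial term $\beta (x^n - x^{n-1})$ is what distinguishes iPiano from a plain (proximal) gradient step; with $\beta = 0$ the sketch reduces to gradient descent.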
III Experimental results
We conducted two types of experiments. The first is a direct comparison between the MAP and MMSE estimates for FoE-based SR. The second compares the MAP-based SR model with very recent state-of-the-art SR approaches. For the competing methods, we used the publicly available code provided by their authors as is.
III-A Comparison between the MAP and MMSE estimates
For a fair comparison with MMSE estimation, we first considered MAP estimation with exactly the same image prior model as used in [16] (8 filters with GSM potentials). We repeated the experiments presented in Table I of [16], where eight noise-free images were upsampled with a zooming factor of 3. The results of the MMSE and MAP estimates are shown in Table I. The MAP estimate using the same image prior model performs as well as the MMSE estimate in terms of PSNR and SSIM.¹

¹ Note that we were not able to reproduce exactly the results presented in [16] due to the randomness of the sampling-based approach; we obtained slightly different results.
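Table I: Noise-free SR with zooming factor 3. Each entry is PSNR (dB) / SSIM (×100).

| Method | House | Peppers | Cameraman | Barbara | Lena | Boat | Hill | Couple |
|---|---|---|---|---|---|---|---|---|
| MMSE with prior (II.2) | 31.73/88.85 | 25.94/90.94 | 26.26/83.43 | 25.55/74.44 | 32.93/90.34 | 29.32/83.32 | 30.28/81.83 | 28.47/80.34 |
| MAP with prior (II.2) | 32.25/89.03 | 25.86/89.50 | 25.91/82.20 | 25.65/75.41 | 33.16/90.97 | 29.10/83.53 | 30.71/82.59 | 28.41/80.55 |
| MAP with prior (III.1) | 32.72/89.61 | 26.62/91.43 | 26.69/84.66 | 25.71/75.71 | 33.52/91.44 | 29.48/84.47 | 31.14/83.77 | 28.77/81.91 |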
We then used a discriminatively trained FoE prior in the MAP-based SR model to further investigate its performance. This prior has the same model capacity and is optimized directly for the MAP estimate in the context of Gaussian denoising. We employed the Student-t based FoE model trained in our previous work [2], defined as
$$E_{\text{FoE}}(x) = \sum_{i=1}^{N_f} \alpha_i \sum_{C \in \mathcal{C}} \rho\big((k_i * x)_C\big), \tag{III.1}$$
where the penalty function $\rho(z) = \log(1 + z^2)$ is the Lorentzian function shown in Figure 1(b), and $\alpha_i$ is the weight of the corresponding filter $k_i$. The filters are shown in Figure 1(a). In the MAP-based SR model, (III.1) replaces the prior term of (II.3).
The results of the MAP-based SR model with this discriminatively trained FoE prior (III.1) are also shown in Table I. MAP inference with our discriminatively trained FoE model improves both PSNR and SSIM on all eight test images. An illustrative example is presented in Figure 2.
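Table II: SR in the presence of mild Gaussian noise (three noise levels, labeled 1 to 3). Each entry is PSNR (dB) / SSIM (×100).

| Noise | Method | House | Peppers | Cameraman |
|---|---|---|---|---|
| 1 | MMSE with prior (II.2) | 31.26/87.74 | 25.69/88.83 | 26.13/82.40 |
| 1 | MAP with prior (II.2) | 31.66/87.30 | 25.87/87.93 | 25.72/80.91 |
| 1 | MAP with prior (III.1) | 32.03/87.55 | 26.23/89.24 | 26.17/81.96 |
| 2 | MMSE with prior (II.2) | 30.47/85.80 | 25.23/86.00 | 25.67/80.20 |
| 2 | MAP with prior (II.2) | 30.84/85.62 | 25.38/85.75 | 25.24/78.82 |
| 2 | MAP with prior (III.1) | 31.25/85.87 | 25.49/86.97 | 25.69/79.68 |
| 3 | MMSE with prior (II.2) | 29.33/83.21 | 24.54/82.32 | 24.94/77.04 |
| 3 | MAP with prior (II.2) | 30.30/84.55 | 24.88/83.80 | 24.65/76.53 |
| 3 | MAP with prior (III.1) | 30.59/84.63 | 25.10/85.19 | 25.26/77.97 |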
We also evaluated MAP inference in the presence of noise. For mild Gaussian noise, the results of MAP inference with the two FoE image prior models are shown in Table II, together with those of the MMSE-based model; this is a direct comparison to Table III of [16]. Again, the MAP estimate with the same FoE model (II.2) performs on par with the MMSE estimate, and our discriminatively trained FoE prior (III.1) yields higher PSNR in all cases and higher SSIM in most of them.
For the MAP-based SR model (II.3), the parameter $\lambda$ has to be tuned for each case. For noise-free SR we use a relatively large $\lambda$, and for SR with Gaussian noise we found an empirical choice of $\lambda$ for each of the three noise levels that generally works well.
Run time: We ran the inference algorithms on a server with an Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz. For the SR task of upsampling a test image to the high-resolution size, the average computation time per iteration of the MMSE-based algorithm is 87 s. The MMSE estimate typically takes 100 iterations, so this SR task requires about 2.4 h, which makes the approach hardly appealing for practical applications.
In contrast, MAP inference is much faster. The average computation time per iteration of MAP inference is 0.039 s for the Student-t based FoE prior (III.1).² It typically takes 150–200 iterations to solve the resulting nonconvex minimization problem.³ As a consequence, MAP inference with the Student-t based FoE prior accomplishes the same SR task in about 7 s, dramatically faster than MMSE inference (2.4 h). The implementation will be available at our homepage (www.GPU4Vision.org) after acceptance.

² With the same model capacity (8 filters).
³ Note that the number of required iterations is dramatically reduced by using the iPiano algorithm, compared with the CG algorithm used in previous works such as [8, 10], where the iterative algorithm has to run 5000 iterations.
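The reported timings follow from per-iteration time multiplied by the number of iterations; the ratio amounts to a speedup of roughly three orders of magnitude:

```latex
\begin{aligned}
t_{\text{MMSE}} &\approx 100 \times 87\,\text{s} = 8700\,\text{s} \approx 2.4\,\text{h},\\
t_{\text{MAP}}  &\approx (150 \text{ to } 200) \times 0.039\,\text{s} \approx 5.9 \text{ to } 7.8\,\text{s},\\
t_{\text{MMSE}} / t_{\text{MAP}} &\approx 8700/7.8 \text{ to } 8700/5.9 \approx 1100 \text{ to } 1500.
\end{aligned}
```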
III-B Comparison to state-of-the-art SR approaches
For a comprehensive evaluation of the MAP-based SR model, we further compared it with very recent state-of-the-art SR approaches: the K-SVD based method [14], the ANR (Anchored Neighborhood Regression) based method [12] and the deep convolutional network based method SRCNN [3]. For a fair comparison, we strictly followed the test protocol of [12], using the same test sets, Set14 and Set5, with an upscaling factor of 3. For the MAP-based SR model, we incorporated an FoE prior model with larger and more filters (48 filters, shown in Figure 3), trained in [2]. Replacing the FoE prior model shown in Figure 1 with this higher-capacity FoE model improves the performance of the MAP-based SR model.
The SR results on Set14 and Set5 are summarized in Table III. The FoE-based SR model with 48 filters achieves an average PSNR similar to that of SRCNN and outperforms the other competing algorithms. A visual example is shown in Figure 4.⁴ In the highlighted region, our SR method produces much sharper edges than the other approaches. In summary, our model achieves quality strongly competitive with very recent state-of-the-art SR methods.

⁴ Following [3], we only consider the luminance channel (in YCbCr color space) in our experiments. The two chrominance channels are directly upsampled using bicubic interpolation for display purposes.

Table III: PSNR (dB) on Set14 and Set5 for an upscaling factor of 3.

| Set14 images | Bicubic | K-SVD | ANR | SRCNN | MAP-FoE (48 filters) |
|---|---|---|---|---|---|
| baboon | 23.21 | 23.52 | 23.56 | 23.60 | 23.58 |
| barbara | 26.25 | 26.76 | 26.69 | 26.66 | 26.43 |
| bridge | 24.40 | 25.02 | 25.01 | 25.07 | 25.13 |
| coastguard | 26.55 | 27.15 | 27.08 | 27.20 | 27.25 |
| comic | 23.12 | 23.96 | 24.04 | 24.39 | 24.26 |
| face | 32.82 | 33.53 | 33.62 | 33.58 | 33.70 |
| flowers | 27.23 | 28.43 | 28.49 | 28.97 | 28.84 |
| foreman | 31.18 | 33.19 | 33.23 | 33.35 | 33.83 |
| lenna | 31.68 | 33.00 | 33.08 | 33.39 | 33.31 |
| man | 27.01 | 27.90 | 27.92 | 28.18 | 28.15 |
| monarch | 29.43 | 31.10 | 31.09 | 32.39 | 31.88 |
| pepper | 32.39 | 34.07 | 33.82 | 34.35 | 34.30 |
| ppt3 | 23.71 | 25.23 | 25.03 | 26.02 | 26.42 |
| zebra | 26.63 | 28.49 | 28.43 | 28.87 | 26.81 |
| average | 27.54 | 28.67 | 28.65 | 29.00 | 28.99 |

| Set5 images | Bicubic | K-SVD | ANR | SRCNN | MAP-FoE (48 filters) |
|---|---|---|---|---|---|
| baby | 33.91 | 35.08 | 35.13 | 35.01 | 35.10 |
| bird | 32.58 | 34.57 | 34.60 | 34.91 | 35.07 |
| butterfly | 24.04 | 25.94 | 25.90 | 27.58 | 26.79 |
| head | 32.88 | 33.56 | 33.63 | 33.55 | 33.72 |
| woman | 28.56 | 30.37 | 30.33 | 30.92 | 30.79 |
| average | 30.39 | 31.90 | 31.92 | 32.39 | 32.29 |
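Following [3], the comparison is computed on the luminance channel only. A minimal NumPy sketch of luminance extraction and PSNR is given below. It assumes 8-bit RGB inputs and the ITU-R BT.601 conversion used by MATLAB's `rgb2ycbcr` (Y in [16, 235]); the exact evaluation scripts of [12] and [3], including any border cropping or rounding, may differ. The functions `rgb_to_y` and `psnr` and the random test arrays are hypothetical.

```python
import numpy as np

def rgb_to_y(rgb):
    """Luminance (BT.601, MATLAB rgb2ycbcr convention) of an 8-bit RGB image."""
    rgb = np.asarray(rgb, dtype=np.float64) / 255.0
    return 16.0 + 65.481 * rgb[..., 0] + 128.553 * rgb[..., 1] + 24.966 * rgb[..., 2]

def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    mse = np.mean((np.asarray(ref, np.float64) - np.asarray(est, np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Usage with hypothetical arrays standing in for a ground-truth HR image and an SR result.
rng = np.random.default_rng(0)
hr = rng.integers(0, 256, size=(48, 48, 3)).astype(np.float64)
sr = np.clip(hr + rng.normal(0.0, 2.0, hr.shape), 0, 255)
print(f"PSNR on Y: {psnr(rgb_to_y(hr), rgb_to_y(sr)):.2f} dB")
```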
IV Discussion and Conclusion
For higher-order MRF based models, it is generally true that the MAP estimate, which only seeks the posterior mode, cannot fully exploit the potential offered by probabilistic modeling, whereas the MMSE estimate, which draws samples directly from the probability model, should be more powerful. On the other hand, sampling-based MMSE estimation is well known to be very slow, which makes the corresponding methods hardly appealing for practical applications if one has to stick to MMSE inference.
In this letter, we have concentrated on the higher-order MRF based SR problem and evaluated the performance of the MAP estimate for inference. We found that, with the same FoE prior, the MAP estimate works as well as MMSE, despite the nonconvexity of the resulting optimization problem. We believe the reason is twofold: first, iPiano, an effective nonconvex optimization algorithm, helps us reach the MAP mode in a short time; second, in practice one cannot obtain an accurate solution for the MMSE estimate. In addition, we found that the performance of the MAP estimate can be further boosted by discriminatively trained FoE prior models. The resulting model with 48 filters achieves results strongly competitive with very recent state-of-the-art SR methods. Therefore, for higher-order MRF based SR, we suggest using the MAP estimate for inference, because this simpler inference criterion causes no performance loss while offering a clear advantage in efficiency.
Furthermore, the findings about the MAP estimate presented in this letter strengthen the arguments we drew from the Gaussian denoising problem in our previous works [1, 2]. We have shown in [1, 2] that the MAP-based denoising model with our discriminatively trained FoE prior leads to the best results among MRF-based systems, including MMSE-based models. Therefore, we believe that MAP-based denoising models did not perform well in previous works, e.g., [10, 9], simply because those works did not obtain a good FoE prior well suited to MAP inference.
In summary, we believe that for higher-order MRF image prior based modeling in image restoration, the better choice is to use the MAP estimate together with a discriminatively trained FoE prior.
References
[1] Y. Chen, T. Pock, R. Ranftl, and H. Bischof. Revisiting loss-specific training of filter-based MRFs for image restoration. In GCPR, pages 271–281, 2013.
[2] Y. Chen, R. Ranftl, and T. Pock. Insights into analysis operator learning: From patch-based sparse models to higher order MRFs. IEEE Transactions on Image Processing, 23(3):1060–1072, 2014.
[3] C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In Computer Vision–ECCV 2014, pages 184–199. Springer, 2014.
[4] Q. Gao and S. Roth. How well do filter-based MRFs model natural images? In DAGM/OAGM Symposium, pages 62–72, 2012.
[5] C. D. Herrera, J. Kannala, P. Sturm, and J. Heikkila. A learned joint depth and intensity prior using Markov random fields. In 3DTV-Conference, 2013 International Conference on, pages 17–24. IEEE, 2013.
[6] S. Z. Li. Markov random field modeling in computer vision. Springer-Verlag New York, Inc., 1995.
[7] P. Ochs, Y. Chen, T. Brox, and T. Pock. iPiano: Inertial Proximal Algorithm for Nonconvex Optimization. SIAM Journal on Imaging Sciences, 7(2):1388–1419, 2014.
[8] S. Roth and M. J. Black. Fields of experts. International Journal of Computer Vision, 82(2):205–229, 2009.
[9] K. G. G. Samuel and M. Tappen. Learning optimized MAP estimates in continuously-valued MRF models. In CVPR, 2009.
[10] U. Schmidt, Q. Gao, and S. Roth. A generative perspective on MRFs in low-level vision. In CVPR, pages 1751–1758, 2010.
[11] U. Schmidt, K. Schelten, and S. Roth. Bayesian deblurring with integrated noise estimation. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 2625–2632. IEEE, 2011.
[12] R. Timofte, V. De, and L. V. Gool. Anchored neighborhood regression for fast example-based super-resolution. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 1920–1927. IEEE, 2013.
[13] X. Wang, C. Hou, L. Pu, and Y. Hou. A depth estimating method from a single image using FoE CRF. Multimedia Tools and Applications, pages 1–16, 2014.
[14] R. Zeyde, M. Elad, and M. Protter. On single image scale-up using sparse-representations. In Curves and Surfaces, pages 711–730. Springer, 2012.
[15] H. Zhang and Y. Zhang. Bayesian image separation with natural image prior. In Image Processing (ICIP), 2012 19th IEEE International Conference on, pages 2097–2100. IEEE, 2012.
[16] H. Zhang, Y. Zhang, H. Li, and T. S. Huang. Generative Bayesian image super resolution with natural image prior. Image Processing, IEEE Transactions on, 21(9):4054–4067, 2012.
[17] B. Zhao, W. Zhang, H. Ding, and H. Wang. Non-blind image deblurring from a single image. Cognitive Computation, 5(1):3–12, 2013.