BayesBD: An R Package for Bayesian Inference on Image Boundaries

Abstract:

We present the BayesBD package providing Bayesian inference for boundaries of noisy images. The BayesBD package implements flexible Gaussian process priors indexed by the circle to recover the boundary in a binary or Gaussian noised image. The boundary recovered by BayesBD has the practical advantages of guaranteed geometric restrictions and convenient joint inferences under certain assumptions, in addition to its desirable theoretical property of achieving (nearly) minimax optimal rate in a way that is adaptive to the unknown smoothness. The core sampling tasks for our model have linear complexity, and are implemented in C++ for computational efficiency using packages Rcpp and RcppArmadillo. Users can access the full functionality of the package in both the command line and the corresponding shiny application. Additionally, the package includes numerous utility functions to aid users in data preparation and analysis of results. We compare BayesBD with selected existing packages using both simulations and real data applications, demonstrating the excellent performance and flexibility of BayesBD even when the observation contains complicated structural information that may violate its assumptions.

Cite PDF Tweet

Published

Oct. 25, 2017

Received

Dec 23, 2016

Citation

Syring & Li, 2017

Volume

Pages

9/2

149 - 162


1 Introduction

Boundary estimation is an important problem in image analysis with wide-ranging applications from identifying tumors in medical images , classifying the process of machine wear by analyzing the boundary between normal and worn materials , to identifying regions of interest in satellite images, such as the boundary of Scotland’s Lake Menteith . Furthermore, boundaries present in epidemiological or ecological data may reflect the progression of a disease or an invasive species; see , , and .

There is a rich literature on image segmentation for both noise-free and noisy observations; see the surveys in , and particularly the Bayesian approaches in and . Recently, developed a flexible nonparametric Bayesian model to detect image boundaries, which achieved four aims of guaranteed geometric restriction, (nearly) minimax optimal rate adaptive to the smoothness level, convenience for joint inference, and computational efficiency. However, despite the theoretical soundness, the practical implementation of Li and Ghosal’s method is far from trivial, mostly in the approachability of the proposed nonparametric Bayesian framework and further improvement in the speed of posterior sampling algorithms, which becomes critical in attempts to popularize this approach in statistics and the broader scientific community. In this paper, we present the R package BayesBD which aims to fill this gap. The developed BayesBD package provides support for analyzing binary images and Gaussian-noised images, which commonly arise in many applications. We implement various options for posterior calculation including the Metropolis-Hastings sampler  and slice sampler . To further speed up the Markov Chain Monte Carlo (MCMC), we take advantage of the integration via RcppArmadillo  of R and the compiled C++ language. We further integrate the BayesBD package with shiny  to facilitate the usage of implemented boundary detection methods in real applications.

A far as we know, there are no other R packages for image boundary detection problems achieving the four goals mentioned above. An earlier version of the BayesBD package  provided first-of-its-kind tools for analyzing images, but support for Gaussian-noised images, C++ implementations, more choices of posterior samplers, and shiny integrations were not available until the current version. For example, the nested loops required for MCMC sampling were inefficient in R programming. The combination of new programming and faster sampling algorithms means that a typical simulation example consisting of 5000 posterior samples from 10,000 data points can now be completed in about one minute.

The rest of the paper is organized as follows. We first introduce the problem of statistical inference on boundaries of noisy images, the nonparametric Bayesian models in use, and posterior sampling algorithms. We then demonstrate how to use the main functions of the package for data analysis working with both the command line and shiny. We next conduct a comprehensive experiment on the comparison of sampling methods and coding platforms, scalability of BayesBD, and comparisons with existing packages including mritc, bayesImageS, and bayess. We illustrate a pair of real data analyses of medical and satellite images. The paper is concluded by a Summary section.

2 Statistical analysis of image boundaries

Image data

An image observed with noise may be represented by a set of data points or pixels (Yi,Xi)i=1n, where the intensities Yi are observed at locations XiX=[0,1]2. Following , we assume that there is a closed region ΓX such that the intensities Yi are distributed as follows conditionally on whether its location is inside Γ or in the background Γc: (1)Yi{f(;ξ)XiΓ;f(;ρ)XiΓc, where f is a given probability mass function or probability density function of a parametric family up to unknown parameters (ξ,ρ). For example, Figure 1 shows two simulated images where the parametric family is Bernoulli and Gaussian, respectively. These images can be reproduced using the functions par2obs, parnormobs, and plotBD which will be demonstrated in detail later on.

graphic without alt textgraphic without alt text

Figure 1: Left: a binary image generated using an elliptical boundary and parameters π1 = 0.65 and π2 = 0.35. Right: a Gaussian-noised image generated using a triangular boundary and parameters μ1 = 1, μ2 =  − 1, and σ1 = σ2 = 1. Both images have the size 100 × 100.

The parameter of interest is the boundary of the closed region γ:=Γ, which is assumed to be closed and smooth, while (ξ,ρ) are nuisance parameters. We make the following assumptions about the noisy image:

  1. The pixel locations Xi are sampled either completely randomly, i.e., Xii.i.d.Unif(X) or jitteredly randomly, i.e., X is first partitioned into blocks Xi using an equally-spaced grid and then locations are sampled XiUnif(Xi).

  2. The closed region Γ is star-shaped with a known reference point OΓ, i.e., the line segment joining O to any point in Γ is also in Γ.

  3. The boundary γ is an α smooth function, i.e., γCα(S) where Cα(S):={f:SR+,|f(α0)(x)f(α0)(y)|Lfxyαα0,x,yS} , where S is the unit circle, α0 is the largest integer strictly smaller than α, Lf is some positive constant, and is the Euclidean distance.

Assumptions 2 and 3 imply that the region of interest is star-shaped with a smooth boundary. While these assumptions are crucial to guarantee desirable asymptotic properties of the estimator implemented in BayesBD and are reasonable in many applications, it is certainly of great interest to investigate the performance of BayesBD when these assumptions are violated. In what follows, we study numerous examples that are not uncommon in practice but violate these assumptions to some extent, to demonstrate the flexibility of BayesBD and its capacity to handle practical images that may be much more complicated than the two-region setting with assumptions above. These examples include the triangular boundary in Figure 1 which has a piecewise smooth boundary. and thus violates Assumption 3, and three real data examples including the image of Lake Menteith 8 and two neuroimaging examples 6 where multiple regions are present and the region of interest has non-smooth or even discontinuous boundary.

Letting Θ be the parameter space of the parametric family f, conditions to separate the inside and outside parameters are needed. Examples of the parameter space Θ for (ξ,ρ) include but are not limited to:

In practice, the order restriction in 4A or 4B is often naturally obtained depending on the concrete problem. For instance, in brain oncology, a tumor often has higher intensity values than its surroundings in a positron emission tomography scan, while for astronomical applications objects of interest emit light and will be brighter. A more general condition for any parametric family can be referred to Condition (C) in .

It is worth noticing that although model (1) follows a two-region framework, the method in  and our developed BayesBD have the flexibility to handle data with multiple regions by running the two-region method iteratively, which is demonstrated in the neuroimaging application below.

A nonparametric Bayesian model for image boundaries

Let Y={Yi}i=1n and X={Xi}i=1n, then the likelihood of the image data described in (1) is L(Y|X,θ)=iI1f(Yi;ξ)iI2f(Yi;ρ), where I1={i:XiΓ}, I2={i:XiΓc}, and θ denotes the full parameter (ξ,ρ,γ).

We view γ as a curve mapping [0,2π]R+ and model it using a randomly rescaled Gaussian process prior on the circle S: γ(ω)GP(μ(ω),Ga(,)/τ) where the covariance kernel Ga(t1,t2)=exp(a2{(cos2πt1cos2πt2)2+(sin2πt1sin2πt2)2})=exp{4a2sin2(πt1πt2)} is the so-called squared exponential periodic kernel obtained by mapping the squared exponential kernel on unit interval [0,1] to the circle through Q:[0,1]S,ω(cos2πω,sin2πω) as in . The parameters a and τ control the smoothness and scale of the kernel, respectively. As shown in , the covariance kernel has the following closed form decomposition: Ga(t,t)=k=1vk(a)ψk(t)ψk(t) where v1(a)=e2a2I0(2a2);v2j(a)=v2j+1(a)=e2a2Ij(2a2),j1; and In(x) denotes the modified Bessel function of the first kind of order n and ψj(t) is the jth Fourier basis function in {1,cos2πt,sin2πt,...}. The above expansion allows us to write the boundary as a sum of basis functions: (2)γ(ω)=μ(ω)+k=1zkψk(ω), where zkN(0,vk(a)/τ). In practice, we truncate this basis function expansion using the first L functions, i.e., γ(ω)=μ(ω)+k=1Lzkψk(ω). In the BayesBD package, we use L=2J+1 with the default J=10, which seems adequate for accurate approximation of γ(w) as shown in , but users may specify a different value depending on the application.

We use a Gamma prior distribution Gamma(αa,βa) for the rescaling factor a. This random rescaling scheme is critical to obtain rate adaptive estimates without assuming the smoothness level α in Assumption 3 is known; see, for example,   and . The default values of hyperparameters are αa=2 and βa=1.

We use a constant function as the prior mean function μ(), with value determined by user input or by an initial maximum likelihood estimation. The other hyperparameter and parameters follow standard conjugate priors. Specifically, we use a Gamma distribution Gamma(ατ,βτ) prior for τ with default values ατ=500 and βτ=1. Priors for the nuisance parameters ξ and ρ depend on the parametric family f, which are:

The orders in OIB, OIN and OIG are provided by users if such information is available; otherwise, the ordered independent distributions revert to independent distributions. Our default specifications are chosen to make the corresponding prior distributions spread out. For example, in the BayesBD package, the default values are α1=β1=0 for binary images, and μ0=Y¯,σ0=103 and α2=β2=102 for Gaussian noise, where Y¯ is the same mean of all intensities. Under Assumptions 1–4,   proved that the nonparametric Bayes approach is (nearly) rate-optimal in the minimax sense, adaptive to unknown smoothness level α.

Posterior sampling and estimation of the boundary

Let z={zi}i=1L and Σa=diag(v1(a),,vL(a)). We use Metropolis-Hastings (MH) with the Gibbs sampler  to sample the joint distribution of the parameters (z,ξ,ρ,τ,a), where the MH step is for the vector parameter z. We also allow a slice sampling with the Gibbs sampler where slice sampling is used for z as in . We give the detailed sampling algorithms for binary image in Algorithm 1 and Gaussian-noised images in Algorithm 2. Comparisons between MH and slice sampling, along with other numerical performances are referred to in the section on Performance tests.

Initialize the parameters: z=0, τ=500, a=1, and μ() is taken to be constant, i.e. a circle. Then, initialize (ξ,ρ)=(π1,π2) by the maximum likelihood estimates given μ().

  1. At iteration t+1, sample z(t+1)|(π1(t),π2(t),τ(t),a(t),Y,X) one entry at a time, using either MH sampling and slice sampling, using the following logarithm of the conditional posterior density N1logπ1(t)(1π2(t))π2(t)(1π1(t))+n1log1π1(t)1π2(t)τ2(z(t))Σa(t)1z(t), where n1=i=1n1(ri<γi(t)) and N1=i=1n1(ri<γi(t))Yi; here (ri,ωi) are the polar coordinates of pixel location Xi and γi(t)=γ(t)(ωi) is the radius of the image boundary at iteration t and the ith pixel.

  2. Sample τ(t+1)|z(t+1),a(t)Gamma(α,β) where α=ατ+L/2 and β=βτ+(z(t+1))Σa(t)1z(t+1)/2.

  3. Sample (π1,π2)|(z,Y)OIB(α1+N1,β1+n1N1,α1+N2,β1+n2N2), where n2=i=1n1(riγi(t)) and N2=i=1n1(riγi(t))Yi.

  4. Sample a(t+1)|(z(t+1),τ(t+1)) by slice sampling using the logarithm of the conditional posterior density k=1Llogvk(a(t))2k=1Lτ(t+1)zk22vk(a(t))+(αa1)loga(t)βa.

Algorithm 1: – Binary images.

Initialize the parameters: z=0, τ=500, a=1, and μ() is taken to be constant, i.e. a circle. Then, initialize (ξ,ρ)=(π1,π2) by the maximum likelihood estimates given μ().

  1. At iteration t+1, sample z(t+1)|(μ1(t),σ1(t),μ2(t),σ2(t),τ(t),a(t),Y,X) one entry at a time, using either slice sampling or MH sampling, using the following logarithm of the conditional posterior density n1(logσ1(t)logσ2(t))iI1(Yiμ1(t))22(σ12)(t)iI2(Yiμ2(t))22(σ22)(t)τ(z(t))Σa(t)1z(t)2.

  2. Sample τ(t+1)|z(t+1),a(t) as in Algorithm 1.

  3. Sample (μ1,σ1,μ2,σ2)|(z,Y) conjugately.

    • Sample (σ12)(t+1) from a Gamma distribution with parameters α=α2+n12,β=β2+iI1(YiY¯(1))22+σ02n1n1+σ02(Y¯(1)μ0)22, where Y¯(1) is the sample mean of intensities in I1.

    • sample μ1(t+1) from a normal distribution with mean and standard deviation σ02μ0n1+σ02+n1y¯1n1+σ02,(n1+σ02)1/2.

    • Sample (σ22)(t+1) and μ2(t+1) analogously.

    • If ordering information is available, sort (μ1(t+1),μ2(t+1)) and (σ1(t+1),σ2(t+1)) accordingly.

  4. Sample a(t+1)|(z(t+1),τ(t+1)) as in Algorithm 1.

Algorithm 2: – Gaussian-noised images.

Let {γt(ω)}t=1T be the posterior samples after burn-in where T is the number of posterior samples. We use the posterior mean as the estimate and construct a variable-width uniform credible band. Specifically, let (γ^(ω),s^(ω)) be the posterior mean and standard deviation functions derived from {γt(ω)}. For the tth MCMC run, we calculate the distance ut=(γtγ^)/s=supω{|γt(ω)γ^(ω)|/s^(ω)} and obtain the 95th percentile of all the ut’s, denoted as L0. Then a 95% uniform credible band is given by [γ^(ω)L0s^(ω),γ^(ω)+L0s^(ω)].

3 Analysis of image boundaries using BayesBD from the command line

There are three steps to our Bayesian image boundary analysis: load the image data into R in the appropriate format, use the functions provided to sample from the joint posterior of the full parameter θ=(ξ,ρ,γ), and summarize the posterior samples both numerically and graphically.

Generating image data

Two functions are included in BayesBD to facilitate data simulation for numerical experiments: par2obs for binary images and parnormobs for Gaussian-noised images. Table 1 describes the function arguments to par2obs, which returns sampled intensities and pixel locations in both Euclidean and polar coordinates. The function parnormobs is similar, with the replacement of arguments pi.in and pi.out by mu.in, mu.out, sd.in, and sd.out corresponding to parameters μ1, μ2, σ1, and σ2, respectively.

As a demonstration, the following code generates a 100×100 binary image of an ellipse using a jitteredly-random design with a reference point of (0.5, 0.5):

> gamma.fun = ellipse(a = 0.35, b = 0.25)
> bin.obs = par2obs(m = 100, pi.in = 0.6, pi.out = 0.4, design = 'J',
+ gamma.fun, center = c(0.5, 0.5))

Similarly, the following code generates a 100×100 Gaussian-noised image with a triangle boundary using a random uniform design with a reference point of (0.5, 0.5):

> gamma.fun = triangle2(0.5)
> norm.obs = parnormobs(m = 100, mu.in = 1, mu.out = -1, sd.in = 1, 
+ sd.out = 1, design = 'U', gamma.fun, center = c(0.5, 0.5))

The output of either par2obs or parnormobs is a list containing the intensities Y in a vector named intensity, the vectors theta.obs and r.obs containing the polar coordinates of the pixel locations X, and the reference point contained in a vector named center. Image data to be analyzed using BayesBD can be in a list of this form, or may be a .png or .jpg image file.

Table 1: Arguments of the par2obs function.
Argument Description
m Generate m×m observations over the unit square.
pi.in The success probability, P(Yi=1), if XiI1.
pi.out The success probability, P(Yi=1), if XiI2.
design Determines how locations Xi are determined: ‘D’ for deterministic
(equally-spaced grid) design, ‘U’ for completely uniformly random,
or ‘J’ for jitteredly random design.
center Two-dimensional vector of Euclidean coordinates (x,y) of the reference
point in I1.
gamma.fun A function to generate boundaries, e.g. ellipse or triangle2.

Analysis and visualization

There are two functions to draw posterior samples following Algorithm 1 and 2 based on images either simulated or provided by users: fitBinImage for binary images, and fitContImage for Gaussian-noised images. These sampling functions take the same arguments, with the exception of the ordering input which is duplicated in fitContImage to allow ordering of the two parameters, i.e., the mean and standard deviation. The inputs for fitBinImage are summarized in Table 2. We have included a function rectToPolar to facilitate formatting the image data for fitBinImage and fitContImage by converting the rectangular coordinates of the pixels to polar coordinates. The initial boundary is a circle with radius inimean and center center. The radius inimean may be specified by the user or left blank, in which case it will be estimated using maximum likelihood.

Table 2: Arguments of the fitBinImage function.
image The noisy observation, either a list with elements:
intensity, a vector of intensities;
theta.obs a vector of pixel radian measure from center;
r.obs a vector of pixel radius measure from center;
or a string giving the path to a .png or .jpg file.
gamma.fun the true boundary, if known, used for plotting.
center the reference point in Euclidean coordinates.
inimean A constant specifying the initial boundary μ. Defaults to NULL,
in which case μ is estimated automatically using maximum likelihood.
nrun The number of MCMC runs to keep for estimation.
nburn The number of initial MCMC runs to discard.
J The number of eigenfunctions to use in estimation is 2J+1.
ordering Indicates which Bernoulli distribution has higher success probability:
"I", the Bernoulli distribution inside the boundary;
"O", ther Bernoulli distribution outside the boundary;
"N", no ordering information is available.
mask Logical vector (same length as obs$intensity) to indicate region of interest.
Should this data point be included in the analysis? Defaults to NULL
and uses all data points.
slice Should slice sampling be used to sample Fourier basis function coefficients?
outputAll Should all posterior samples be returned?

If argument outputAll is FALSE, the functions fitBinImage and fitContImage produce output vectors theta, estimate, upper, and lower giving a grid of 200 values on [0,2π] on which the boundary γ is estimated, along with 95% uniform credible bands. If argument outputAll is TRUE, the functions also return posterior samples of (ξ,ρ) and Fourier basis function coefficients z.

Following the examples of data simulation for binary and Gaussian-noised images in the previous section, we can obtain posterior samples via

> bin.samples= fitBinImage(bin.obs, c(.5, .5), NULL, 4000, 1000, 10, "I", NULL, TRUE,                    
+ FALSE)

for a binary image and

> norm.samples = fitContImage(norm.obs, c(.5, .5), NULL, 4000, 1000, 10, "I", "N", NULL,
TRUE, FALSE)

for a Gaussian-noised image. For each sampling function, we have set the center of the image as (0.5,0.5), instructed the sampler to use a mean function of μ()=0.4, keep 4000 samples, burn 1000 samples, use L=2×10+1=21 basis functions to model γ, use slice sampling for the basis function coefficients, and return only the plotting results.

Using the function plotBD we can easily construct plots of our model results. There are two arguments to plotBD: fitted.image is a list of results from fitBinImage or fitContImage, and plot.type which indicates whether to plot the image data only, the posterior mean and credible bands, or the posterior mean overlaid on the image data. Using the posterior samples we obtained from the Gaussian-noised image above, we can plot our results with the following code

> par(mfrow = c(1, 2))
> plotBD(norm.samples, 1)
> plotBD(norm.samples, 2)

to produce Figure 2.

graphic without alt text
Figure 2: Output from fitBinImage as plotted using plotBD. The plot on the left is the simulated binary observation, and the plot on the right is the estimated boundary and 95% uniform credible bands.

4 BayesBD shiny app

In order to reach a broad audience of users, including those who may not be familiar with R, we have created a shiny app version of BayesBD to implement the boundary detection method. The app has the full functionality to reproduce the simulations in Section 5 and conduct real data applications using images uploaded by users. The app is accessible both from the package by running the code BayesBDshiny() in an R session or externally by visiting https://syring.shinyapps.io/shiny/. With the app users can analyze real data or produce a variety of simulations.

Once the app is open, users are presented with an array of inputs to set as illustrated in Figure 3. In order to analyze real data, the user should select "user continuous image" or "user binary image" from the second drop-down menu. Next, the user inputs the system path to the .png or .jpg file. A plot of the image will appear and the user will be prompted to identify the image center with a mouse click. The user has some control over the posterior sample size, but we recommend to first limit the number of available samples in order to display results quickly. Finally, the user may enter ordering information for the mean and variance of pixel intensities at the bottom of the display.

For simulations, we first select either an elliptical or triangular boundary, or upload an R script with a custom boundary function. Next, the user instructs the app to either simulate a binary or Gaussian-noised image, or to use binary or Gaussian data the user has uploaded. In addition to the boundary function, the user specifies the reference point. Sample sizes for simulations are kept at 100×100 pixels. Finally, the last several inputs allow the user to customize the intensity parameters (ξ,ρ) for binary and Gaussian simulations. Once the user has selected all settings, clicking the "Update" button at the bottom of the window will run the posterior sampling algorithm, which should take less than a minute on a typical computer for image data of 100×100 pixels. The "Download" button provides the user with a file indicating which pixels were contained inside the estimated boundary as determined by the outer edge of the 95% uniform credible bands.

graphic without alt text
Figure 3: Screenshot of shiny app implementing BayesBD.

5 Performance tests

Comparison of sampling methods

The main aim of this section is to highlight the speed improvements we have made in the latest version of BayesBD. Our flexibility in choosing between slice and Metropolis-Hastings (MH) sampling algorithms gives users the potential to unlock efficiency gains. We highlight these gains below for both binary and Gaussian-noised simulations, and note that the faster MH method suffers little in accuracy.

In our performance tests, we consider the following examples, which correspond to examples B2, B3, and G1 from .

  1. Image is an ellipse centered at (0.1,0.1) and rotated 60 counterclockwise. Intensities are binary with π1=0.5 and π2=0.2.

  2. Image is an equilateral triangle centered at (0,0) with height 0.5. Intensities are binary with π1=0.5 and π2=0.2.

  3. Image is an ellipse centered at (0.1,0.1) and rotated 60 counterclockwise. Intensities are Gaussian with μ1=4, σ1=1.5, μ2=1, and σ2=1.

We simulated each case 100 times using n=100×100 observations per simulation. In each run, we sampled 4000 times from the posterior after a 1000 sample burn in. The results of our performance tests comparing slice with MH sampling are summarized in Table 3.

If Metropolis-Hastings sampling is used instead of slice sampling, we observe a speed up by about a factor of two for binary images, seven for the Gaussian-noised image. Slice sampling is guaranteed to produce unique posterior samples, and may give better results than Metropolis-Hastings samplers, especially when Metropolis-Hastings mixes poorly and produces many repeated samples. However, slice sampling may involve a very large number of proposed samples for each accepted sample, requiring many likelihood evaluations. On the other hand, each Metropolis Hastings sample requires only two evaluations of the likelihood.

To measure the accuracy of BayesBD we use three metrics: the Lebesgue error, which is simply the area of the symmetric difference between the posterior mean boundary and the true boundary; the Dice Similarity Coefficient(DSC), see ; and Hausdorff distance, see and . We have included the utility functions lebesgueError, dsmError, and hausdorffError, which take as input the output of either fitBinImage or fitContImage and output the corresponding error. In the binary image examples considered, the different sampling algorithms did not affect the accuracy of the posterior mean boundary estimates when measured by Lebesgue error, i.e., the area of the symmetric difference between the estimated boundary and the true boundary. For the Gaussian-noised image, the slice sampling method produced Lebesgue errors approximately an order of magnitude smaller than when using Metropolis-Hastings sampling, but the overall size of the errors was still small in both cases and practically indistinguishable when plotting results. With our built-in functions, it is easy to reproduce Example S2 in Table 3 with the following code:

> gamma.fun = triangle2(0.5)
> for(i in 1:100){
+   norm.obs = par2obs(m = 100, pi.in = 0.5, pi.out = 0.2, design = 'J',
+                      center = c(0.5, 0.5), gamma.fun)
+   norm.samp.MH = fitBinImage(norm.obs, gamma.fun, NULL, NULL, 4000, 1000,
+                          10, "I", rep(1, 10000), FALSE, FALSE)
+   norm.samp.slice = fitBinImage(norm.obs, gamma.fun, NULL, NULL, 4000, 
+                      1000, 10, "I", rep(1, 10000), TRUE, FALSE)
+   print(c(dsmError(norm.samp.MH), hausdorffError(norm.samp.MH),   
+           lebesgueError(norm.samp.MH), dsmError(norm.samp.slice), 
+       hausdorffError(norm.samp.slice), lebesgueError(norm.samp.slice)))
+ }
Table 3: Average Runtimes (in seconds) and Lebesgue errors (with standard deviations) of posterior mean boundary estimates using C++.
Example Sampling Method Runtime (s) Lebesgue Error DSC Hausdorff
S1 MH 58 0.01(0.01) 0.02(0.00) 0.02(0.00)
Slice 100 0.01(0.00) 0.02(0.00) 0.02(0.00)
*S2 MH 45 0.02(0.00) 0.09(0.02) 0.09(0.01)
Slice 82 0.02(0.00) 0.09(0.01) 0.09(0.01)
S3 MH 66 0.01(0.01) 0.01(0.01) 0.01(0.01)
Slice 488 0.00(0.00) 0.01(0.01) 0.01(0.01)

Comparison of coding platforms

Our use of C++ for posterior sampling has led to very significant efficiency gains over using R alone. Implementation of C++ with R is streamlined using Rcpp and RcppArmadillo.

The first implementation of BayesBD (version 0.1 in ) was entirely written in R. The results in Table 4 labeled R code reflect this first version of the package, while those labeled C++ code are from the new version. The main takeaway is that the new code developed using C++ is at least three times faster in the binary image examples and over six times faster for the Gaussian-noised image example, both measured while using slice sampling.

Table 4: Average run times (in seconds) using slice sampling.
Example Coding Method Runtime (s)
S1 C++ 100
R 375
S2 C++ 82
R 246
S3 C++ 488
R 3327

Scalability of BayesBD

The Gaussian process is notorious for scaling poorly as n increases because it is usually necessary to invert of a large covariance matrix. By utilizing the analytical decomposition of a GP kernel in (2), we eliminates the step of inverting a n by n covariance matrix and the BayesBD package appears to achieve a linear complexity. We investigate the scalability of BayesBD by plotting the system time against sample size for fitBinImage and fitContImage in Figure 4 using a triangular boundary curve and 5000 MCMC iterations. Both algorithms appear to scale approximately linearly in number of pixels, which makes sense as the costliest computations in Steps 1 and 3 in Algorithms 1 and 2 only involve sums over the n pixels.

graphic without alt textgraphic without alt text

Figure 4: Left: Runtimes of fitBinImage. Right: Runtimes of fitContImage

Comparison with existing packages

Although no packages besides BayesBD provide boundary estimation, there are several existing packages that can provide image segmentation or filtering. Below we make some qualitative comparisons between BayesBD and mritc, bayesImageS, and bayess; see , , and in a later section. Figure 5 compares these packages using two simulated images with Bernoulli and Gaussian noise, respectively. BayesBD gives very reasonable estimates for the true boundaries; mritc package fails to deliver a recognizable smoothed image in either example; and bayesImageS was able to produce a very clear segmentation for the Gaussian-noised ellipse example, but not for the triangle image with Bernoulli noise.

graphic without alt text graphic without alt text
Figure 5: Left to right: the image, the BayesBD boundary estimate, the mritc filtered image, and the bayesImageS filtered image.

6 Real data application

Medical imaging

studied the performance of two different tracers used in conjunction with positron emission tomography (PET) scans in identifying brain tumors. Figure 6, reproduced from , gives an example of the image data used in diagnosing tumors, and demonstrates their conclusion that the F-FDOPA tracer provides a more informative PET scan than the F-FDA tracer. We use the BayesBD package to analyze the F-FDOPA PET scan images in Figure 6. The tumor imaging data along with sample code for reproducing the following analysis can be found in the documentation to the BayesBD package.

graphic without alt text
Figure 6: MRI (left), F-FDG PET (middle), and F-FDOPA PET (right) of glioblastoma (A) and grade II oligodendroglioma (B). Image taken from .

We convert the two F-FDOPA PET images in Figure 6 into 111×111-pixel images and normalize the intensities to the interval [0,10]. The pixel coordinates are a grid on [0,1]×[0,1] and we choose reference points (0.7,0.5) and (0.4,0.55) for each image, roughly corresponding to the center of the darkest part of each image. We use the default mean function, choose J=10 for 21 basis functions, and sample 4000 times after a 1000 burn-in using MH sampling.

graphic without alt text graphic without alt text
A. Glioblastoma B. Grade II oligodendroglioma
Figure 7: F-FDOPA PET images from Chen et al. (2006) (left) fit twice, and (right) fit three times to filter the background and find features at increasing granularity.

Figure 7 displays posterior mean boundary estimates for the F-FDOPA images in Figure 6. From the analysis on glioblastoma (A) in the first two plots, it seems that that we accurately capture the regions of interest in the F-FDOPA PET images. Furthermore, it is expected that the Gaussian assumption on the real data may fail, and this shows that the method implemented in BayesBD is robust to model misspecifications, thus practically useful.

Tumor heterogeneity, which is not unusual in many applications, may make the boundary detection problem more challenging . The BayesBD package allows us to address tumor heterogeneity by a repeated implementation. We first apply fitContImage to the entire image, which includes a white background not of interest, and produce the estimated boundary. This step succeeds in separating the brain scan from the white background. A second run is performed on the subset of the image inside the outer 95% uniform credible band, producing a nested boundary. In general, this technique can be used in a multiple region setting where the data displays more heterogeneity than the simple "image and background" setup in (1).

Satellite imaging

We compared the performance of BayesBD with the R packages mritc and bayess using an image of Lake Menteith available in the bayess package. BayesBD gives a very reasonable estimate for the boundary of the lake even though it is not smooth. The mritc package again does not provide useful output in this example, but bayess produces a nicely-segmented image; see Figure 8.

graphic without alt text
Figure 8: Left to right: the image, the BayesBD boundary estimate, the mritc filtered image, and the bayess segmented image.

7 Summary

BayesBD is a new computational platform for image boundary analysis with many advantages over existing software. The underlying methods in functions fitBinImage and fitContImage are based on theoretical results ensuring their dependability over a range of problems. Our use of Rcpp and RcppArmadillo help make BayesBD much faster than base R code and further speed can be gained by our flexible sampling algorithms. Finally, our integration with shiny provides users with an easy way to utilize our package without having to write R code.

For the latest updates to BayesBD and requests, readers are recommended to check out the package page at CRAN or refer to the Github page at https://github.com/nasyring/GSOC-BayesBD.

8 Acknowledgments

This work was partially supported by the Google Summer of Code program. We thank the Associate Editor and one anonymous referee for comprehensive and constructive comments that helped to improve the paper.

CRAN packages used

BayesBD, RcppArmadillo, shiny

CRAN Task Views implied by cited packages

NumericalMathematics, WebTechnologies

Note

This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

Footnotes

    References

    R. S. Ananda and T. Thomas. Automatic segmentation framework for primary tumors from brain MRIs using morphological filtering techniques. In Biomedical engineering and informatics (BMEI), 2012 5th international conference on, pages. 238–242 2012. IEEE.
    M. Basu. Gaussian-based edge-detection methods—a survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 32(3): 252–260, 2002.
    S. Bhardwaj and A. Mittal. A survey on various edge detector techniques. Procedia Technology, 4: 220–226, 2012.
    W. Chen, D. H. S. Silverman, D. Sibylle, J. Czernin, N. Kamdar, W. Pope, N. Satyamurthy, C. Schiepers and T. Cloughesy. F-FDOPA PET imaging of brain tumors: Comparison study with F-FDG PET evaluation of diagnostic accuracy. Journal of Nuclear Medicine, 47(6): 904–911, 2006.
    L. Cucala and J.-Michel. Marin. “Bayesian inference on a mixture model with spatial dependence. J. Comput. Graph. Stat., 22(3): 584–597, 2014.
    D. Eddelbuettel. Seamless R and C++ integration with Rcpp. New York: Springer-Verlag, 2013. ISBN 978-1-4614-6867-7.
    D. Eddelbuettel and C. Sanderson. RcppArmadillo: Accelerating R with high-performance C++ linear algebra. Computational Statistics & Data Analysis, 71: 1054–1063, 2014. URL http://dx.doi.org/10.1016/j.csda.2013.02.005.
    D. Feng and L. Tierney. Mritc: MRI tissue classification. 2015. URL https://CRAN.R-project.org/package=mritc.
    M. C. Fitzpatrick, E. L. Preisser, A. Porter, J. Elkinton, L. A. Waller, B. P. Carlin and A. M. Ellison. Ecological boundary detection using Bayesian areal wombling. Ecology, 91(12): 3448–3455, 2010. URL http://www.jstor.org/stable/29779525.
    S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(6): 721–741, 1984.
    U. Grenander and M. I. Miller. Pattern theory: From representation to inference. New York: Oxford University Press, 2007.
    W. K. Hastings. Monte carlo sampling methods using Markov chains and their applications. Biometrika, 57(1): 97–109, 1970.
    G. H. Heppner. Tumor heterogeneity. Cancer research, 44: 2259–2265, 1984.
    M. A. Hurn, O. K. Husby and H. Rue. Advances in Bayesian image analysis. In Highly structured stochastic systems, Eds P. J. Green, N. L. Hjort and S. Richardson pages. 301–322 2003. Oxford: Oxford University Press.
    C. Li, C. Xu, C. Gui and M. D. Fox. Distance regularized level set evolution and its application to image segmentation. IEEE Trans. Image Processing, 19(12): 3243–3254, 2010.
    M. Li. BayesBD: Bayesian boundary detection in images. 2015. R package version 0.1.
    M. Li and S. Ghosal. Bayesian detection of image boundaries. 2015. URL https://arxiv.org/abs/1508.05847.
    H. Lu and B. P. Carlin. Bayesian areal wombling for geographical boundary analysis. Geographical Analysis, 37(3): 265–285, 2005. URL http://proxying.lib.ncsu.edu/index.php?url=/docview/289033244?accountid=12725.
    D. J. MacKay. Introduction to Gaussian processes. NATO ASI Series F Computer and Systems Sciences, 168: 133–166, 1998.
    R. Maini and H. Aggarwal. Study and comparison of various image edge detection techniques. International Journal of Image Processing, 3(1): 1–11, 2009.
    J.-M. Marin and C. P. Robert. Bayesian essentials with R. Springer-Verlag, 2014.
    M. Moores, K. Mengersen and D. Feng. Bayesian methods for image segmentation using a Potts model. 2017. URL https://CRAN.R-project.org/package=bayesImageS.
    R. Neal. Slice sampling. The Annals of Statistics, 31(3): 705–767, 2003.
    C. P. Robert and J.-M. Marin. Bayesian essentials with R. 2015. URL https://CRAN.R-project.org/package=bayess.
    RStudio, Inc. Easy web applications in r. 2016. URL: http://www.rstudio.com/shiny/.
    N. Syring and M. Li. BayesBD: Bayesian inference for image boundaries. 2017. R package version 1.2.
    R. F. Thompson. RadOnc: An R package for analysis of dose-volume histogram and three- dimensional structural data. J. Rad. Onc. Info., 6(1): 98–100, 2014.
    R. F. Thompson. RadOnc: Analytical tools for radiation oncology. 2017. URL https://CRAN.R-project.org/package= RadOnc.
    A. W. van der Vaart and J. H. van Zanten. Adaptive Bayesian estimation using a Gaussian random field with inverse gamma bandwidth. Ann. Statist., 37(5B): 2655–2675, 2009. URL https://doi.org/10.1214/08-aos678.
    L. A. Waller and C. A. Gotway. Applied spatial statistics for public health data. John Wiley & Sons, 2004. URL https://doi.org/10.1002/0471662682.
    W. Yuan, K. S. Chin, M. Hua, G. Dong and C. Wang. Shape classification of wear particles by image boundary analysis using machine learning algorithms. Mechanical Systems and Signal Processing, 72-73: 346–358, 2016.
    D. Ziou and S. Tabbone. Edge detection techniques – an overview. Pattern Recognition and Image Analysis C/C of Raspoznavaniye Obrazov I Analiz Izobrazhenii, 8: 537–559, 1998.

    Reuse

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    Citation

    For attribution, please cite this work as

    Syring & Li, "BayesBD: An R Package for Bayesian Inference on Image Boundaries", The R Journal, 2017

    BibTeX citation

    @article{RJ-2017-052,
      author = {Syring, Nicholas and Li, Meng},
      title = {BayesBD: An R Package for Bayesian Inference on Image Boundaries},
      journal = {The R Journal},
      year = {2017},
      note = {https://rjournal.github.io/},
      volume = {9},
      issue = {2},
      issn = {2073-4859},
      pages = {149-162}
    }