We present the BayesBD package providing Bayesian inference for boundaries of noisy images. The BayesBD package implements flexible Gaussian process priors indexed by the circle to recover the boundary in a binary or Gaussian noised image. The boundary recovered by BayesBD has the practical advantages of guaranteed geometric restrictions and convenient joint inferences under certain assumptions, in addition to its desirable theoretical property of achieving (nearly) minimax optimal rate in a way that is adaptive to the unknown smoothness. The core sampling tasks for our model have linear complexity, and are implemented in C++ for computational efficiency using packages Rcpp and RcppArmadillo. Users can access the full functionality of the package in both the command line and the corresponding shiny application. Additionally, the package includes numerous utility functions to aid users in data preparation and analysis of results. We compare BayesBD with selected existing packages using both simulations and real data applications, demonstrating the excellent performance and flexibility of BayesBD even when the observation contains complicated structural information that may violate its assumptions.
Boundary estimation is an important problem in image analysis with wide-ranging applications from identifying tumors in medical images (Li et al. 2010), classifying the process of machine wear by analyzing the boundary between normal and worn materials (Yuan et al. 2016), to identifying regions of interest in satellite images, such as the boundary of Scotland’s Lake Menteith (Cucala and Marin 2014; Marin and Robert 2014). Furthermore, boundaries present in epidemiological or ecological data may reflect the progression of a disease or an invasive species; see Waller and Gotway (2004), Lu and Carlin (2005), and Fitzpatrick et al. (2010).
There is a rich literature on image segmentation for both noise-free and noisy observations; see the surveys in Ziou and Tabbone (1998; Basu 2002; Maini and Aggarwal 2009; Bhardwaj and Mittal 2012), and particularly the Bayesian approaches in Hurn et al. (2003) and Grenander and Miller (2007). Recently, Li and Ghosal (2015) developed a flexible nonparametric Bayesian model to detect image boundaries, which achieved four aims of guaranteed geometric restriction, (nearly) minimax optimal rate adaptive to the smoothness level, convenience for joint inference, and computational efficiency. However, despite the theoretical soundness, the practical implementation of Li and Ghosal’s method is far from trivial, mostly in the approachability of the proposed nonparametric Bayesian framework and further improvement in the speed of posterior sampling algorithms, which becomes critical in attempts to popularize this approach in statistics and the broader scientific community. In this paper, we present the R package BayesBD (Syring and Li 2017) which aims to fill this gap. The developed BayesBD package provides support for analyzing binary images and Gaussian-noised images, which commonly arise in many applications. We implement various options for posterior calculation including the Metropolis-Hastings sampler (Hastings 1970) and slice sampler (Neal 2003). To further speed up the Markov Chain Monte Carlo (MCMC), we take advantage of the integration via RcppArmadillo (Eddelbuettel 2013; Eddelbuettel and Sanderson 2014) of R and the compiled C++ language. We further integrate the BayesBD package with shiny (RStudio, Inc 2016) to facilitate the usage of implemented boundary detection methods in real applications.
A far as we know, there are no other R packages for image boundary
detection problems achieving the four goals mentioned above. An earlier
version of the BayesBD package (Li 2015) provided
first-of-its-kind tools for analyzing images, but support for
Gaussian-noised images, C++ implementations, more choices of posterior
samplers, and shiny integrations were not available until the current
version. For example, the nested loops required for MCMC sampling were
inefficient in R programming. The combination of new programming and
faster sampling algorithms means that a typical simulation example
consisting of
The rest of the paper is organized as follows. We first introduce the problem of statistical inference on boundaries of noisy images, the nonparametric Bayesian models in use, and posterior sampling algorithms. We then demonstrate how to use the main functions of the package for data analysis working with both the command line and shiny. We next conduct a comprehensive experiment on the comparison of sampling methods and coding platforms, scalability of BayesBD, and comparisons with existing packages including mritc, bayesImageS, and bayess. We illustrate a pair of real data analyses of medical and satellite images. The paper is concluded by a Summary section.
An image observed with noise may be represented by a set of data points
or pixels par2obs
,
parnormobs
, and plotBD
which will be demonstrated in detail later
on.
The parameter of interest is the boundary of the closed region
The pixel locations
The closed region
The boundary
Assumptions 2 and 3 imply that the region of interest is star-shaped with a smooth boundary. While these assumptions are crucial to guarantee desirable asymptotic properties of the estimator implemented in BayesBD and are reasonable in many applications, it is certainly of great interest to investigate the performance of BayesBD when these assumptions are violated. In what follows, we study numerous examples that are not uncommon in practice but violate these assumptions to some extent, to demonstrate the flexibility of BayesBD and its capacity to handle practical images that may be much more complicated than the two-region setting with assumptions above. These examples include the triangular boundary in Figure 1 which has a piecewise smooth boundary. and thus violates Assumption 3, and three real data examples including the image of Lake Menteith 8 and two neuroimaging examples 6 where multiple regions are present and the region of interest has non-smooth or even discontinuous boundary.
Letting
One-parameter family such as Bernoulli, Poisson, exponential
distributions, and
Two-parameter family such as Gaussian distributions, and
In practice, the order restriction in 4A or 4B is often naturally obtained depending on the concrete problem. For instance, in brain oncology, a tumor often has higher intensity values than its surroundings in a positron emission tomography scan, while for astronomical applications objects of interest emit light and will be brighter. A more general condition for any parametric family can be referred to Condition (C) in (Li and Ghosal 2015).
It is worth noticing that although model (1) follows a two-region framework, the method in (Li and Ghosal 2015) and our developed BayesBD have the flexibility to handle data with multiple regions by running the two-region method iteratively, which is demonstrated in the neuroimaging application below.
Let
We view
We use a Gamma prior distribution Gamma(
We use a constant function as the prior mean function
For binary images: the parameters are the probabilities
For Gaussian noise: the parameters are the mean and standard
deviation
The orders in OIB, OIN and OIG are provided by users if such information
is available; otherwise, the ordered independent distributions revert to
independent distributions. Our default specifications are chosen to make
the corresponding prior distributions spread out. For example, in the
BayesBD package, the default values are
Let
Initialize the parameters:
At iteration
Sample
Sample
Sample
Initialize the parameters:
At iteration
Sample
Sample
Sample
sample
Sample
If ordering
information is available, sort
Sample
Let
There are three steps to our Bayesian image boundary analysis: load the
image data into R in the appropriate format, use the functions provided
to sample from the joint posterior of the full parameter
Two functions are included in BayesBD to facilitate data simulation
for numerical experiments: par2obs
for binary images and parnormobs
for Gaussian-noised images. Table 1 describes the
function arguments to par2obs
, which returns sampled intensities and
pixel locations in both Euclidean and polar coordinates. The function
parnormobs
is similar, with the replacement of arguments pi.in
and
pi.out
by mu.in
, mu.out
, sd.in
, and sd.out
corresponding to
parameters
As a demonstration, the following code generates a
> gamma.fun = ellipse(a = 0.35, b = 0.25)
> bin.obs = par2obs(m = 100, pi.in = 0.6, pi.out = 0.4, design = 'J',
+ gamma.fun, center = c(0.5, 0.5))
Similarly, the following code generates a
> gamma.fun = triangle2(0.5)
> norm.obs = parnormobs(m = 100, mu.in = 1, mu.out = -1, sd.in = 1,
+ sd.out = 1, design = 'U', gamma.fun, center = c(0.5, 0.5))
The output of either par2obs
or parnormobs
is a list containing the
intensities intensity
, the vectors theta.obs
and r.obs
containing the polar coordinates of the pixel locations center
. Image data
to be analyzed using BayesBD can be in a list of this form, or may be
a .png
or .jpg
image file.
Argument | Description |
---|---|
m |
Generate |
pi.in |
The success probability, |
pi.out |
The success probability, |
design |
Determines how locations |
(equally-spaced grid) design, ‘U’ for completely uniformly random, | |
or ‘J’ for jitteredly random design. | |
center |
Two-dimensional vector of Euclidean coordinates (x,y) of the reference |
point in |
|
gamma.fun |
A function to generate boundaries, e.g. ellipse or triangle2. |
There are two functions to draw posterior samples following Algorithm 1
and 2 based on images either simulated or provided by users:
fitBinImage
for binary images, and fitContImage
for Gaussian-noised
images. These sampling functions take the same arguments, with the
exception of the ordering
input which is duplicated in fitContImage
to allow ordering of the two parameters, i.e., the mean and standard
deviation. The inputs for fitBinImage
are summarized in
Table 2. We have included a function rectToPolar
to
facilitate formatting the image data for fitBinImage
and
fitContImage
by converting the rectangular coordinates of the pixels
to polar coordinates. The initial boundary is a circle with radius
inimean
and center center
. The radius inimean
may be specified by
the user or left blank, in which case it will be estimated using maximum
likelihood.
image |
The noisy observation, either a list with elements: |
intensity , a vector of intensities; |
|
theta.obs a vector of pixel radian measure from center ; |
|
r.obs a vector of pixel radius measure from center ; |
|
or a string giving the path to a .png or .jpg file. |
|
gamma.fun |
the true boundary, if known, used for plotting. |
center |
the reference point in Euclidean coordinates. |
inimean |
A constant specifying the initial boundary |
in which case |
|
nrun |
The number of MCMC runs to keep for estimation. |
nburn |
The number of initial MCMC runs to discard. |
J |
The number of eigenfunctions to use in estimation is |
ordering |
Indicates which Bernoulli distribution has higher success probability: |
"I" , the Bernoulli distribution inside the boundary; |
|
"O" , ther Bernoulli distribution outside the boundary; |
|
"N" , no ordering information is available. |
|
mask |
Logical vector (same length as obs $intensity ) to indicate region of interest. |
Should this data point be included in the analysis? Defaults to NULL | |
and uses all data points. | |
slice |
Should slice sampling be used to sample Fourier basis function coefficients? |
outputAll |
Should all posterior samples be returned? |
If argument outputAll
is FALSE
, the functions fitBinImage
and
fitContImage
produce output vectors theta
, estimate
, upper
, and
lower
giving a grid of 200 values on outputAll
is TRUE
, the functions also return posterior
samples of
Following the examples of data simulation for binary and Gaussian-noised
images in the previous section, we can obtain posterior samples via
> bin.samples= fitBinImage(bin.obs, c(.5, .5), NULL, 4000, 1000, 10, "I", NULL, TRUE,
+ FALSE)
for a binary image and
> norm.samples = fitContImage(norm.obs, c(.5, .5), NULL, 4000, 1000, 10, "I", "N", NULL,
TRUE, FALSE)
for a Gaussian-noised image. For each sampling function, we have set the
center of the image as
Using the function plotBD
we can easily construct plots of our model
results. There are two arguments to plotBD
: fitted.image
is a list
of results from fitBinImage
or fitContImage
, and plot.type
which
indicates whether to plot the image data only, the posterior mean and
credible bands, or the posterior mean overlaid on the image data. Using
the posterior samples we obtained from the Gaussian-noised image above,
we can plot our results with the following code
> par(mfrow = c(1, 2))
> plotBD(norm.samples, 1)
> plotBD(norm.samples, 2)
to produce Figure 2.
fitBinImage
as plotted using plotBD
. The
plot on the left is the simulated binary observation, and the plot on
the right is the estimated boundary and 95% uniform credible
bands.In order to reach a broad audience of users, including those who may not
be familiar with R, we have created a shiny app version of BayesBD
to implement the boundary detection method. The app has the full
functionality to reproduce the simulations in Section 5 and
conduct real data applications using images uploaded by users. The app
is accessible both from the package by running the code BayesBDshiny()
in an R session or externally by visiting
https://syring.shinyapps.io/shiny/. With the app users can analyze
real data or produce a variety of simulations.
Once the app is open, users are presented with an array of inputs to set
as illustrated in Figure 3. In order to analyze real
data, the user should select "user continuous image" or "user binary
image" from the second drop-down menu. Next, the user inputs the system
path to the .png
or .jpg
file. A plot of the image will appear and
the user will be prompted to identify the image center with a mouse
click. The user has some control over the posterior sample size, but we
recommend to first limit the number of available samples in order to
display results quickly. Finally, the user may enter ordering
information for the mean and variance of pixel intensities at the bottom
of the display.
For simulations, we first select either an elliptical or triangular
boundary, or upload an R script with a custom boundary function. Next,
the user instructs the app to either simulate a binary or
Gaussian-noised image, or to use binary or Gaussian data the user has
uploaded. In addition to the boundary function, the user specifies the
reference point. Sample sizes for simulations are kept at
The main aim of this section is to highlight the speed improvements we have made in the latest version of BayesBD. Our flexibility in choosing between slice and Metropolis-Hastings (MH) sampling algorithms gives users the potential to unlock efficiency gains. We highlight these gains below for both binary and Gaussian-noised simulations, and note that the faster MH method suffers little in accuracy.
In our performance tests, we consider the following examples, which correspond to examples B2, B3, and G1 from Li and Ghosal (2015).
Image is an ellipse centered at
Image is an equilateral triangle centered at
Image is an ellipse centered at
We simulated each case
If Metropolis-Hastings sampling is used instead of slice sampling, we observe a speed up by about a factor of two for binary images, seven for the Gaussian-noised image. Slice sampling is guaranteed to produce unique posterior samples, and may give better results than Metropolis-Hastings samplers, especially when Metropolis-Hastings mixes poorly and produces many repeated samples. However, slice sampling may involve a very large number of proposed samples for each accepted sample, requiring many likelihood evaluations. On the other hand, each Metropolis Hastings sample requires only two evaluations of the likelihood.
To measure the accuracy of BayesBD we use three metrics: the Lebesgue
error, which is simply the area of the symmetric difference between the
posterior mean boundary and the true boundary; the Dice Similarity
Coefficient(DSC), see Feng and Tierney (2015); and Hausdorff distance, see Thompson (2014) and
Thompson (2017). We have included the utility functions lebesgueError
,
dsmError
, and hausdorffError
, which take as input the output of
either fitBinImage
or fitContImage
and output the corresponding
error. In the binary image examples considered, the different sampling
algorithms did not affect the accuracy of the posterior mean boundary
estimates when measured by Lebesgue error, i.e., the area of the
symmetric difference between the estimated boundary and the true
boundary. For the Gaussian-noised image, the slice sampling method
produced Lebesgue errors approximately an order of magnitude smaller
than when using Metropolis-Hastings sampling, but the overall size of
the errors was still small in both cases and practically
indistinguishable when plotting results. With our built-in functions, it
is easy to reproduce Example S2 in Table 3 with the
following code:
> gamma.fun = triangle2(0.5)
> for(i in 1:100){
+ norm.obs = par2obs(m = 100, pi.in = 0.5, pi.out = 0.2, design = 'J',
+ center = c(0.5, 0.5), gamma.fun)
+ norm.samp.MH = fitBinImage(norm.obs, gamma.fun, NULL, NULL, 4000, 1000,
+ 10, "I", rep(1, 10000), FALSE, FALSE)
+ norm.samp.slice = fitBinImage(norm.obs, gamma.fun, NULL, NULL, 4000,
+ 1000, 10, "I", rep(1, 10000), TRUE, FALSE)
+ print(c(dsmError(norm.samp.MH), hausdorffError(norm.samp.MH),
+ lebesgueError(norm.samp.MH), dsmError(norm.samp.slice),
+ hausdorffError(norm.samp.slice), lebesgueError(norm.samp.slice)))
+ }
Example | Sampling Method | Runtime (s) | Lebesgue Error | DSC | Hausdorff |
---|---|---|---|---|---|
S1 | MH | 58 | 0.01(0.01) | 0.02(0.00) | 0.02(0.00) |
Slice | 100 | 0.01(0.00) | 0.02(0.00) | 0.02(0.00) | |
*S2 | MH | 45 | 0.02(0.00) | 0.09(0.02) | 0.09(0.01) |
Slice | 82 | 0.02(0.00) | 0.09(0.01) | 0.09(0.01) | |
S3 | MH | 66 | 0.01(0.01) | 0.01(0.01) | 0.01(0.01) |
Slice | 488 | 0.00(0.00) | 0.01(0.01) | 0.01(0.01) |
Our use of C++ for posterior sampling has led to very significant efficiency gains over using R alone. Implementation of C++ with R is streamlined using Rcpp and RcppArmadillo.
The first implementation of BayesBD (version 0.1 in (Li 2015)) was entirely written in R. The results in Table 4 labeled R code reflect this first version of the package, while those labeled C++ code are from the new version. The main takeaway is that the new code developed using C++ is at least three times faster in the binary image examples and over six times faster for the Gaussian-noised image example, both measured while using slice sampling.
Example | Coding Method | Runtime (s) |
---|---|---|
S1 | C++ | 100 |
R | 375 | |
S2 | C++ | 82 |
R | 246 | |
S3 | C++ | 488 |
R | 3327 |
The Gaussian process is notorious for scaling poorly as fitBinImage
and
fitContImage
in Figure 4 using a triangular
boundary curve and
fitBinImage
. Right:
Runtimes of fitContImage
Although no packages besides BayesBD provide boundary estimation, there are several existing packages that can provide image segmentation or filtering. Below we make some qualitative comparisons between BayesBD and mritc, bayesImageS, and bayess; see Feng and Tierney (2015), Moores et al. (2017), and Robert and Marin (2015) in a later section. Figure 5 compares these packages using two simulated images with Bernoulli and Gaussian noise, respectively. BayesBD gives very reasonable estimates for the true boundaries; mritc package fails to deliver a recognizable smoothed image in either example; and bayesImageS was able to produce a very clear segmentation for the Gaussian-noised ellipse example, but not for the triangle image with Bernoulli noise.
Chen et al. (2006) studied the performance of two different tracers used in conjunction with positron emission tomography (PET) scans in identifying brain tumors. Figure 6, reproduced from (Chen et al. 2006), gives an example of the image data used in diagnosing tumors, and demonstrates their conclusion that the F-FDOPA tracer provides a more informative PET scan than the F-FDA tracer. We use the BayesBD package to analyze the F-FDOPA PET scan images in Figure 6. The tumor imaging data along with sample code for reproducing the following analysis can be found in the documentation to the BayesBD package.
We convert the two F-FDOPA PET images in Figure 6 into
|
|
A. Glioblastoma | B. Grade II oligodendroglioma |
Figure 7 displays posterior mean boundary estimates for the F-FDOPA images in Figure 6. From the analysis on glioblastoma (A) in the first two plots, it seems that that we accurately capture the regions of interest in the F-FDOPA PET images. Furthermore, it is expected that the Gaussian assumption on the real data may fail, and this shows that the method implemented in BayesBD is robust to model misspecifications, thus practically useful.
Tumor heterogeneity, which is not unusual in many applications, may make
the boundary detection problem more
challenging (Heppner 1984; Ananda and Thomas 2012). The BayesBD package
allows us to address tumor heterogeneity by a repeated implementation.
We first apply fitContImage
to the entire image, which includes a
white background not of interest, and produce the estimated boundary.
This step succeeds in separating the brain scan from the white
background. A second run is performed on the subset of the image inside
the outer 95% uniform credible band, producing a nested boundary. In
general, this technique can be used in a multiple region setting where
the data displays more heterogeneity than the simple "image and
background" setup in (1).
We compared the performance of BayesBD with the R packages mritc and bayess using an image of Lake Menteith available in the bayess package. BayesBD gives a very reasonable estimate for the boundary of the lake even though it is not smooth. The mritc package again does not provide useful output in this example, but bayess produces a nicely-segmented image; see Figure 8.
BayesBD is a new computational platform for image boundary analysis
with many advantages over existing software. The underlying methods in
functions fitBinImage
and fitContImage
are based on theoretical
results ensuring their dependability over a range of problems. Our use
of Rcpp and RcppArmadillo help make BayesBD much faster than base
R code and further speed can be gained by our flexible sampling
algorithms. Finally, our integration with shiny provides users with an
easy way to utilize our package without having to write R code.
For the latest updates to BayesBD and requests, readers are recommended to check out the package page at CRAN or refer to the Github page at https://github.com/nasyring/GSOC-BayesBD.
This work was partially supported by the Google Summer of Code program. We thank the Associate Editor and one anonymous referee for comprehensive and constructive comments that helped to improve the paper.
NumericalMathematics, WebTechnologies
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Syring & Li, "BayesBD: An R Package for Bayesian Inference on Image Boundaries", The R Journal, 2017
BibTeX citation
@article{RJ-2017-052, author = {Syring, Nicholas and Li, Meng}, title = {BayesBD: An R Package for Bayesian Inference on Image Boundaries}, journal = {The R Journal}, year = {2017}, note = {https://rjournal.github.io/}, volume = {9}, issue = {2}, issn = {2073-4859}, pages = {149-162} }