This paper describes an R package LeArEst that can be used for
estimating object dimensions from a noisy image. The package is based
on a simple parametric model for data that are drawn from uniform
distribution contaminated by an additive error. Our package is able to
estimate the length of the object of interest on a given straight line
that intersects it, as well as to estimate the object area when it is
elliptically shaped. The input data may be a numerical vector or an
image in JPEG format. In this paper, background statistical models and
methods for the package are summarized, and the algorithms and key
functions implemented are described. Also, examples that demonstrate
its usage are provided.
Availability: LeArEst is available on CRAN.
Image noise may arise from the physical processes of imaging, or it can be caused by the presence of unwanted structures (e.g., soft tissue captured in X-ray images of bones). Such problems can occur, for example, when the object is observed with a fluorescent microscope (Ruzin et al. 1999), ground penetrating radar, or medical equipment (X-ray, ultrasound). In the presence of additive noise, detecting the object edge and determining the length or area of the object become non-trivial problems. The well-known edge detection methods (Canny 1986; Qiu 2005) generally do not perform well on such noisy images.
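Our approach does not use the mentioned edge detection methods, but looks at the problem in a different way. We start with a simple univariate model where the data represent independent realizations of a random variable $X = U + \varepsilon$, where $U$ is uniformly distributed on an interval and $\varepsilon$ is an additive error.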
Different aspects of this model are developed in Benšić and Sabo (2007b), Benšić and Sabo (2007a), Benšić and Sabo (2010), Benšić and Sabo (2016), Sabo and Benšić (2009), and Schneeweiss (2004). The basic one-dimensional model is described in Section 2 together with the results that are used for statistical inference incorporated in the package. Although this model is not universal in all applications, we find it useful in some cases.
With the assumption that the observed object has a circular or elliptical shape, a two-dimensional approach has been developed, dealing with an object area estimation problem (Benšić and Sabo 2007a; Sabo and Benšić 2009). This approach computes many border estimates and fits a parametric curve to them.
The package LeArEst (Bensic et al. 2017) uses these methods for length and area estimation of an object captured with noise. It supports numerical inputs, which is useful if a machine that records an object stores numerical data (coordinates of recorded points). However, if an object is captured in a picture file, the package includes a web interface with which one can load a picture, specify a line that intersects the object, adjust the parameters, and perform an edge detection on the drawn line. Another web interface allows the user to draw a rectangle around the object and perform area estimation of the marked object. Description of functions dealing with numerical and graphical estimations and examples of their use are given in Section 3.
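The basic model we deal with in this package is an additive error model $X = U + \varepsilon$, where $U$ is uniformly distributed on $[-a, a]$, $a > 0$, and $\varepsilon$ is an error term independent of $U$.
Our model is a special case of a general additive error model.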
For our purpose (e.g., estimating borders of some object from a noisy
image), we find that the model with
Let
Let
Some flexibility of our model is achieved by changing the error distribution. For now, three types of error distributions are available in the package: the normal distribution, the Laplace distribution, and the scaled Student distribution with 5 degrees of freedom.
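Two procedures for deriving confidence intervals for the half-length of the uniform support are implemented: one based on the asymptotic distribution of the maximum likelihood estimator and one based on the likelihood ratio statistic.
These two approaches can also be used to test hypotheses regarding the half-length parameter.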
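As an illustration of the model and of the three error families, the following base R sketch simulates contaminated uniform data. The helper names (r_laplace, r_student5, simulate_model) and the parameter values are assumptions introduced here and are not part of LeArEst; in particular, rescaling the Student error to variance sigma^2 is only one plausible reading of "scaled".

```r
# Simulate X = U + e, with U uniform on [-a, a] and e an additive error
# with mean 0 and standard deviation sigma (illustrative helpers only).
r_laplace <- function(n, sigma) {
  # A Laplace distribution with scale b has variance 2 * b^2, so b = sigma / sqrt(2).
  # The difference of two independent exponentials is Laplace distributed.
  b <- sigma / sqrt(2)
  rexp(n, rate = 1 / b) - rexp(n, rate = 1 / b)
}

r_student5 <- function(n, sigma) {
  # Student t with 5 degrees of freedom has variance 5 / 3; rescale to sigma^2.
  rt(n, df = 5) * sigma / sqrt(5 / 3)
}

simulate_model <- function(n, a, sigma,
                           error = c("gauss", "laplace", "student")) {
  error <- match.arg(error)
  e <- switch(error,
              gauss   = rnorm(n, mean = 0, sd = sigma),
              laplace = r_laplace(n, sigma),
              student = r_student5(n, sigma))
  runif(n, min = -a, max = a) + e
}

set.seed(1)
x_gauss   <- simulate_model(10000, a = 1, sigma = 0.1, error = "gauss")
x_laplace <- simulate_model(10000, a = 1, sigma = 0.1, error = "laplace")
x_student <- simulate_model(10000, a = 1, sigma = 0.1, error = "student")

# Under independence, Var(X) = a^2 / 3 + sigma^2 for all three error types.
c(gauss = var(x_gauss), laplace = var(x_laplace), student = var(x_student))
```

All three samples share the same first two moments, so the choice of error distribution affects the shape of the smoothed edges rather than the overall spread.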
The package LeArEst depends on the following packages, which should be installed in addition to the LeArEst package: conicfit (Gama and Chernov 2015), jpeg (Urbanek 2014), and opencpu (Ooms 2014). The stable version of the package is available on the Comprehensive R Archive Network repository (CRAN; https://CRAN.R-project.org/) and can be downloaded and installed by issuing the following command at the R console:
> install.packages("LeArEst")
The package is loaded using the following command:
> library(LeArEst)
An overview of the package’s functions is given in Table 1.
Function | Description |
---|---|
lengthest() | Performs length estimation from a numerical data set. |
lengthtest() | Performs one-sided and two-sided tests for uniform distribution half-length. |
areaest() | Performs area estimation of a numerically described object in plane. |
startweb.esttest() | Opens default web browser and loads a web page for length estimation and testing (the object of interest is shown in an image). |
startweb.area() | Opens default web browser and loads a web page for area estimation of the object shown in an image. |
The function lengthest computes the length of an interval which is the domain of a uniform distribution from data contaminated by an additive error according to the model described in the previous section. The function's arguments and results are given in Table 2.
Arguments | Description |
---|---|
x | Vector of input data. |
error | Error distribution. |
var | Error variance. |
var.est | Method of error variance estimation. |
conf.level | Confidence level of the confidence interval. Defaults to 0.95. |
Results | Description |
radius | Estimated half-length of the uniform support. |
var.error | Error variance, estimated or explicitly given by argument var. |
conf.int | Confidence interval for half-length of the uniform support. |
method | Method used for computing a confidence interval (asymptotic distribution of ML or likelihood ratio statistic). |
In order to perform length estimation, the type of the error distribution must be chosen through the argument error, with three possibilities: laplace (Benšić and Sabo 2016), gauss (Benšić and Sabo 2007b, 2010), or student (scaled Student distribution with 5 degrees of freedom).
The variance of the additive error may or may not be known. If the variance is known, it should be passed through the argument var. In the case of unknown variance, the function lengthest implements two methods for its estimation: Method of Moments and Maximum Likelihood. The value MM of the argument var.est instructs the function to use the Method of Moments, while the value ML triggers the Maximum Likelihood method. Depending on the input data, the Method of Moments estimate of the error variance may not exist. When that is the case, the function stops and outputs a message instructing the user to use the Maximum Likelihood estimator or to give an explicit variance. It is important to mention that the arguments var and var.est may not be used simultaneously.
The last argument, conf.level, specifies the confidence level of the confidence interval calculated by the function.
The results of this function are the estimated half-length of uniform distribution (i.e., of an object), estimated or explicitly given error variance, confidence interval for half-length (with regard to the given confidence level) and the statistical method for computing a confidence interval.
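The two ways of handling the error variance can be sketched as follows. The data and the value known_var are assumptions made for illustration; the calls use only the arguments and result components listed in Table 2, and they are skipped when LeArEst is not installed.

```r
# Sketch: known error variance versus ML-estimated error variance.
set.seed(12)
x <- runif(1000, -1, 1) + rnorm(1000, 0, 0.1)
known_var <- 0.1^2  # true error variance of the simulated noise

if (requireNamespace("LeArEst", quietly = TRUE)) {
  # Variance supplied explicitly: var is used and var.est is omitted.
  fit_known <- LeArEst::lengthest(x = x, error = "gauss", var = known_var,
                                  conf.level = 0.95)
  # Variance unknown: estimate it by maximum likelihood.
  fit_ml <- LeArEst::lengthest(x = x, error = "gauss", var.est = "ML",
                               conf.level = 0.95)
  # Result components as listed in Table 2.
  print(fit_known$radius)
  print(fit_known$conf.int)
  print(fit_ml$var.error)
}
```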
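Usage example. Let us generate a sample of size 1000 from the model in which the uniform part is distributed on $[-1, 1]$ and the error is normal with mean 0 and standard deviation 0.1: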
set.seed(12)
sample_1 <- runif(1000, -1, 1)
sample_2 <- rnorm(1000, 0, 0.1)
sample <- sample_1 + sample_2
Figure 2 shows the density estimate of the generated data obtained with the R function density. A half-length estimation of the uniform support for these data can be done with the following command:
lengthest(x = sample, error = "gauss", var.est = "MM", conf.level = 0.90)
The most important part of its output is:
$radius
MLE for radius (a) of uniform distr.: 0.9916513
$var.error
MM estimate for error variance: 0.01279636
$method
[1] "Asymptotic distribution of LR statistic"
$conf.int
[1] 0.9724316 1.0116479
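A rough plausibility check of such an estimate is possible with moments alone. For independent $U$ on $[-a, a]$ and error variance $\sigma^2$, $\mathrm{Var}(X) = a^2/3 + \sigma^2$, so $a$ can be recovered from the sample variance when $\sigma^2$ is known. The sketch below is not the estimator used by lengthest; it assumes that the error variance is known, and the variable names are chosen here for illustration.

```r
# Rough moment-based cross-check (not the estimator used by lengthest).
set.seed(12)
sample_1 <- runif(1000, -1, 1)
sample_2 <- rnorm(1000, 0, 0.1)
x <- sample_1 + sample_2

sigma2 <- 0.1^2                          # error variance, assumed known here
a_naive <- sqrt(3 * var(x))              # ignores the error contribution
a_moment <- sqrt(3 * (var(x) - sigma2))  # removes the error contribution
c(naive = a_naive, moment = a_moment)
```

Ignoring the error always gives the larger value, which is why the error variance has to be supplied or estimated.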
The function lengthtest performs one-sided and two-sided tests against a hypothesized half-length of the uniform support, as described in Section 2. Since the actual calculations inside this function are based on the ML approach, most input arguments are similar to those of the function lengthest (see Table 3). The argument null.a is a positive number representing the hypothesized half-length of the uniform support, while the argument alternative defines the usual forms of alternatives (two.sided, greater, or less).
Arguments | Description |
---|---|
x | Vector of input data. |
error | Error distribution. |
null.a | Specified null value being tested. |
alternative | The form of the alternative hypothesis. |
var | Error variance. |
var.est | Method of error variance estimation. |
conf.level | Confidence level of the confidence interval. Defaults to 0.95. |
Results | Description |
p.value | p-value of the test. |
tstat | The value of the test statistic. |
radius | Estimated half-length of the uniform support. |
var.error | Error variance, estimated or explicitly given by argument var. |
conf.int | Confidence interval for half-length. |
method | Method used for computing a confidence interval (asymptotic distribution of ML or likelihood ratio statistic). |
The function lengthtest also performs length estimation, so all values in its output, except p.value and the calculated value of the test statistic (tstat), are the same as those of the function lengthest.
Usage example. Generate the data in a similar manner as in the lengthest example:
set.seed(12)
sample_1 <- runif(1000, -1, 1)
sample_2 <- rnorm(1000, 0, 0.1)
sample <- sample_1 + sample_2
To test the null hypothesis that the half-length of the uniform support equals 1 against the alternative that it is less than 1, the function lengthtest can be used in the following way:
lengthtest(x = sample, error = "gauss", alternative = "less", var.est = "MM",
null.a = 1, conf.level = 0.95)
Part of the output dealing with a testing procedure is:
$p.value
[1] 0.2418929
$tstat
[1] -0.7002265
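The two reported numbers can be related to each other. They are consistent with a standard normal reference distribution for the test statistic under the null hypothesis, in which case the p-value for the alternative less is the lower tail probability of tstat. The sketch below checks this on the printed values; the upper-tail and two-sided variants are shown under the same assumption and are not taken from the package output.

```r
# Lower-tail standard normal probability of the reported test statistic.
tstat <- -0.7002265
p_less <- pnorm(tstat)
p_less

# Corresponding values for the other alternatives, assuming the same
# standard normal reference distribution.
p_greater <- pnorm(tstat, lower.tail = FALSE)
p_two_sided <- 2 * pnorm(-abs(tstat))
```

The lower-tail value agrees with the reported p-value to the printed precision, so at the 5% level the hypothesis that the half-length equals 1 is not rejected.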
The input for the function areaest should be a data set of points in the plane representing independent realizations of a two-dimensional random vector. The function's arguments and results are listed below.
Arguments | Description |
---|---|
data | Two-column data matrix containing the points that describe the observed object. The first column represents the $x$ coordinates of the points, while the second column represents their $y$ coordinates. |
nrSlices | Number of slices applied for cutting the data in the plane. Defaults to 10. |
error | Error distribution. |
var | Error variance. |
var.est | Method of error variance estimation. |
plot | Logical parameter that determines whether to plot data set, calculated edge points, and the resulting ellipse. Defaults to FALSE. |
Results | Description |
area | Estimated area of the object. |
points | Set of estimated object's edge points. |
semiaxes | Resulting ellipse's semi-axes. |
The algorithm implemented in the function areaest is explained in detail in Benšić and Sabo (2007a). The main task in area estimation is to estimate edge points of the uniform support. In order to achieve this, the original problem is reduced to several corresponding one-dimensional problems, which can in turn be solved by the function lengthest.
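Let us denote the data set with $\{(x_i, y_i) : i = 1, \dots, n\}$. The function areaest transforms this data set in two different ways: through the $y$-axis and through the $x$-axis.

Algorithm 1 (Transformation through the $y$-axis).

Separating through the $y$-axis: Choose an integer $k$ and divide the data into $k$ slices according to the $y$ coordinates of the points.

Centering through the $x$-axis: Within each slice, shift the $x$ coordinates of the points so that they are centered at zero.

Using this algorithm the data are transformed in the way that we have $k$ sets of one-dimensional data (the argument nrSlices corresponds to $k$). Each set is passed to lengthest (the parameters error, var, and var.est are used in a lengthest call in the way described earlier), which yields two estimated edge points per slice. After doing so, the algorithm needs to be repeated through the $x$-axis.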
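The slicing and centering idea can be sketched in base R. The simulated ellipse, the function name slice_and_center, the use of equal-width slices, and centering by the slice mean are assumptions made for illustration; the implementation inside areaest may differ.

```r
# Sketch of the slicing-and-centering step (illustrative, base R only).
set.seed(3)
n <- 2000
# Points uniform in an ellipse with semi-axes 2 and 1.5, plus Gaussian error.
r <- sqrt(runif(n))
phi <- runif(n, 0, 2 * pi)
pts <- cbind(x = 2 * r * cos(phi), y = 1.5 * r * sin(phi)) +
  matrix(rnorm(2 * n, sd = 0.1), ncol = 2)

slice_and_center <- function(pts, nr_slices) {
  # Separate through the y-axis: equal-width slices over the range of y.
  slice_id <- cut(pts[, "y"], breaks = nr_slices, labels = FALSE)
  # Center through the x-axis: shift the x values of each slice so that
  # they are centered at zero (here by subtracting the slice mean).
  lapply(split(pts[, "x"], slice_id), function(x) x - mean(x))
}

slices <- slice_and_center(pts, nr_slices = 5)
length(slices)
sapply(slices, length)
```

Each element of slices is a one-dimensional sample of the kind that lengthest expects; repeating the procedure with the roles of the two coordinates exchanged gives the second set of slices.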
The next task is to choose one of the well-known curve fitting procedures for parameter estimation. Here we are dealing with a nonlinear parameter estimation problem.
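Let us suppose that we have an elliptical domain, i.e., that the boundary of the uniform support is an ellipse.
On the basis of the edge points obtained so far, the vector of unknown ellipse parameters has to be estimated. For this purpose, the EllipseDirectFit function from the conicfit package is used. This function implements the algebraic ellipse fit method by Fitzgibbon-Pilu-Fisher (Fitzgibbon et al. 1999). Having the ellipse parameters, the semi-axes and the area of the ellipse are calculated.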
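A minimal sketch of the fitting step is given below. The edge points are simulated here (an ellipse centered at (1, 1) with semi-axes 2 and 1.5 is a hypothetical choice), and the conversion of the algebraic coefficients to geometric parameters with AtoG is assumed to follow the usage shown in the conicfit documentation. The fit is skipped when conicfit is not installed.

```r
# Sketch: direct ellipse fit to noisy edge points.
set.seed(7)
phi <- seq(0, 2 * pi, length.out = 21)[-21]
# Hypothetical edge points around an ellipse centered at (1, 1)
# with semi-axes 2 and 1.5, perturbed slightly.
edge <- cbind(1 + 2 * cos(phi), 1 + 1.5 * sin(phi)) +
  matrix(rnorm(40, sd = 0.02), ncol = 2)

if (requireNamespace("conicfit", quietly = TRUE)) {
  par_alg <- conicfit::EllipseDirectFit(edge)  # algebraic conic coefficients
  par_geo <- conicfit::AtoG(par_alg)$ParG      # center, semi-axes, angle
  print(par_geo[1:2])                          # fitted center
  print(par_geo[3:4])                          # fitted semi-axes
}
```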
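Usage example. Two internal files are provided with the package: ellipse_3_4_0.1_gauss.txt and ellipse_3_4_0.1_laplace.txt. Both of them represent an ellipsoidal object recorded with additive error; as the file names suggest, the error is Gaussian in the first file and Laplace in the second.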
In order to use one of these files, the data needs to be read into a data frame:
inputfile <- system.file("extdata", "ellipse_3_4_0.1_laplace.txt", package = "LeArEst")
inputdata <- read.table(inputfile)
Area estimation of the uniform support can be done with the command:
areaest(inputdata, error = "laplace", var.est = "ML", nrSlices = 5, plot = TRUE)
In the previous example, the parameter plot is set to TRUE, so the function plots the given input data (black dots), the estimated border points (red dots), and the resulting ellipse (cyan ellipse); see Figure 3.
The most important parts of the numerical output are:
$area
[1] 9.938305
$semiaxes
[1] 2.048028 1.544638
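The two parts of this output are linked by the area formula for an ellipse, $A = \pi a b$, where $a$ and $b$ are the semi-axes. The following check uses only the printed values; the variable names are chosen here for illustration.

```r
# Area of an ellipse from its semi-axes, using the values printed above.
semiaxes <- c(2.048028, 1.544638)
area_from_axes <- pi * prod(semiaxes)
area_from_axes
```

The result agrees with the reported area up to the rounding of the printed semi-axes.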
In order to apply the described methods to a picture of an object, two web interfaces have been built and embedded into the package.
As far as we know, shiny (Chang et al. 2017) provides the simplest way of building web applications using R. However, limitations of its free version discouraged us from using it, so we decided to use the opencpu package. This package provides a reliable and interoperable HTTP API for data analysis based on R. Basically, it provides an interface between the functions of an R package and a custom-made web page bundled with the package, using JavaScript and AJAX. Building web interfaces using opencpu is more complex than using shiny, but it provides more flexibility in application design. Developers of such web applications are assumed to be familiar with the HTTP protocol, HTML, and JavaScript.
The function startweb.esttest is described in this section. It takes no arguments and returns no results; its task is to start a web interface for length estimation and hypothesis testing (Figure 4).
To start the analysis using the web interface, a picture in JPG format should be loaded (Load Picture button). Then, a line that intersects the object of interest should be drawn by clicking on two points in the picture; length estimation will be performed on that line. Finally, the parameters for data set preparation should be set.
The Levels of gray parameter determines how many levels of grey the algorithm should take into account. It is important to mention that, although color images can be loaded, they are internally converted to greyscale prior to any calculations. Since the JPG format stores 8-bit greyscale values, at most 256 levels of grey are available.

Line thickness specifies how many picture pixels around the drawn line are taken into account in length estimation. In this way, a matrix of (length of the line) × Line thickness pixels is obtained, called PixelMatrix.
By clicking on the Prepare data button the data set will be prepared for the inference.
The following step deals with data preparation and is a crucial step of the algorithm. Each pixel of PixelMatrix is mapped to a new matrix of Box size × Box size booleans, called DotMatrix (note that Box size is a parameter). Further, every DotMatrix is filled with uniformly distributed dots (i.e., TRUE values) in such a way that the total number of dots in each DotMatrix corresponds to the brightness of the pixel it represents. Then, the DotMatrices are tiled with respect to the position of the corresponding pixels and, by doing so, a new matrix of ((length of the line) × Box size) × (Line thickness × Box size) booleans is obtained, called FinalDotMatrix. The last step is to sum each row of FinalDotMatrix to obtain a vector of (length of the line) × Box size integers. The vector's histogram is shown on the web interface (Figure 4, at the bottom) and the vector itself serves as an input to the functions lengthest or lengthtest.
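The data preparation step can be sketched in base R as follows. The pixel values, the parameter values, and the rule that the number of dots equals the rounded product of brightness and the number of cells are assumptions made for illustration; the code behind the web interface may differ in these details.

```r
# Sketch of the data preparation step (assumed details, base R only).
set.seed(5)
box_size <- 4         # hypothetical Box size
line_length <- 30     # pixels along the drawn line
line_thickness <- 3   # pixels across the drawn line

# Hypothetical PixelMatrix: brightness in [0, 1], bright object in the middle.
brightness <- ifelse(abs(seq_len(line_length) - 15.5) < 8, 0.9, 0.1)
pixel_matrix <- matrix(brightness, nrow = line_length, ncol = line_thickness)

# One DotMatrix per pixel: box_size x box_size logicals whose number of TRUE
# values is proportional to the pixel brightness, at random positions.
make_dot_matrix <- function(b, box_size) {
  cells <- box_size^2
  dots <- logical(cells)
  dots[sample.int(cells, round(b * cells))] <- TRUE
  matrix(dots, nrow = box_size)
}

# Tile the DotMatrices according to pixel position to get FinalDotMatrix.
final_dot_matrix <- do.call(rbind, lapply(seq_len(line_length), function(i) {
  do.call(cbind, lapply(seq_len(line_thickness), function(j) {
    make_dot_matrix(pixel_matrix[i, j], box_size)
  }))
}))

# Sum each row: one integer per position along the line.
dot_counts <- rowSums(final_dot_matrix)
dim(final_dot_matrix)
length(dot_counts)
```

Rows that cover the bright part of the line collect many dots and rows over the background collect few, which gives the step-like profile that the length estimation works on.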
The parameters in the Estimation section of the web interface are passed to lengthest as well. After the user clicks on the Estimate button, lengthest is executed, and its results are displayed below the picture. Additionally, the estimated uniform support is marked in red on the intersecting line.
The estimated length is expressed in pixel widths and as a percentage of the whole image's width. As stated in the info box, it is important to use a proportional screen resolution on the user's display, so that the pixels on the screen are square-shaped.
The Testing section of this web interface serves for hypothesis testing. Procedures related to image loading, choosing an intersecting line, and data preparation are the same as described above. For the purpose of testing, values for H0, unit, and alternative (greater, less, or two-sided) need to be specified. The part of the web interface dealing with the output of the hypothesis testing procedure is shown in Figure 5.
The function startweb.area starts a web interface for area estimation (Figure 6). Again, the first step is to load an image.
To select an object whose surface needs to be evaluated, a rectangle
should be drawn around it. It is done by clicking on its upper-left and
lower-right corners, after which a green rectangle is drawn on the
picture.
Data parameters are similar to the ones in the length estimation web interface, with the exception of number of slices.
The first step in the area estimation algorithm for this function is to roughly isolate the object in the selected rectangle. In order to do that, pixels from the selected rectangle are divided into two clusters by using the kmeans function from the stats package of base R (the criterion for clustering is pixel brightness). Further, only pixels from the 'object cluster' are observed and divided into horizontal and vertical stripes, as described earlier in Algorithm 1. The number of stripes is dictated by the number of slices parameter. A length estimation procedure is conducted on each stripe, obtaining two estimated edge points of the object for each stripe (red dots in Figure 5). Two parameters in the Estimation section of the web interface are related to the length estimation procedure of the stripes.
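The clustering step can be illustrated on a simulated greyscale rectangle. The image, its size, and the rule that the brighter cluster is the object cluster are assumptions made for this sketch; only kmeans from the stats package is used.

```r
# Sketch: split pixels of a selected rectangle into two brightness clusters.
set.seed(9)
n_row <- 40
n_col <- 50
rows <- row(matrix(0, n_row, n_col))
cols <- col(matrix(0, n_row, n_col))
# Hypothetical rectangle: dark background with a brighter elliptical object.
inside <- ((rows - 20) / 12)^2 + ((cols - 25) / 18)^2 <= 1
img <- matrix(ifelse(inside, 0.8, 0.2) + rnorm(n_row * n_col, sd = 0.05),
              n_row, n_col)

# Two clusters on pixel brightness.
km <- kmeans(as.vector(img), centers = 2, nstart = 5)
# Assumption: the brighter cluster is taken as the object cluster.
object_cluster <- which.max(km$centers)
object_mask <- matrix(km$cluster == object_cluster, n_row, n_col)

# Share of pixels where the clustering agrees with the simulated object.
mean(object_mask == inside)
```

The pixels selected by object_mask are the ones that would then be cut into horizontal and vertical stripes.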
Finally, an ellipse that best fits the edge points is found using the EllipseDirectFit function from the conicfit package, as in the areaest function described earlier. The resulting ellipse is drawn in red in Figure 5. Its area is printed below it, expressed in pixels and as a percentage of the whole image area.
The R package LeArEst provides routines for estimating the support of a uniformly distributed random variable from data contaminated by an additive error.
This work was supported by the Croatian Science Foundation through research grant IP-2016-06-6545. We would like to thank Krunoslav Buljan from Osijek Clinical Hospital Center for providing an image of the carotid artery.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Benšić, et al., "LeArEst: Length and Area Estimation from Data Measured with Additive Error", The R Journal, 2017
BibTeX citation
@article{RJ-2017-043, author = {Benšić, Mirta and Taler, Petar and Hamedović, Safet and Nyarko, Emmanuel Karlo and Sabo, Kristian}, title = {LeArEst: Length and Area Estimation from Data Measured with Additive Error}, journal = {The R Journal}, year = {2017}, note = {https://rjournal.github.io/}, volume = {9}, issue = {2}, issn = {2073-4859}, pages = {461-473} }