LeArEst : Length and Area Estimation from Data Measured with Additive Error by

This paper describes an R package LeArEst that can be used for estimating object dimensions from a noisy image. The package is based on a simple parametric model for data that are drawn from uniform distribution contaminated by an additive error. Our package is able to estimate the length of the object of interest on a given straight line that intersects it, as well as to estimate the object area when it is elliptically shaped. The input data may be a numerical vector or an image in JPEG format. In this paper, background statistical models and methods for the package are summarized, and the algorithms and key functions implemented are described. Also, examples that demonstrate its usage are provided. Availability: LeArEst is available on CRAN.


Introduction
Image noise may arise by the physical processes of imaging, or it can be caused by the presence of some unwanted structures (e.g., soft tissue captured in X-ray images of bones).Such problems can occur, for example, when the object is observed with a fluorescent microscope (Ruzin and others, 1999), ground penetrating radar, medical equipment (X-ray, ultrasound), etc.With the presence of additive noise, the detection of the object edge as well as determining length or area of the object becomes a non-trivial problem.The well known edge detection methods (Qiu, 2005;Canny, 1986) generally do not perform well.
Our approach does not use the mentioned edge detection methods, but looks at the problem in a different way.We start with a simple univariate model where the data represent independent realizations of a random variable X, X = U + ε.In the aforementioned equation U is supposed to be uniformly distributed over the object image and describes the object image without an error, while ε represents measurement error.It is shown that such a simple model can also be very useful in applications itself, not only related to the image analysis.For instance, in Tolić et al. (2017) the sum of uniform and normal distributions is confirmed to be the most representative distribution for modelling transmission loss data.
Different aspects of this model are developed in Benšić and Sabo (2007b), Benšić and Sabo (2007a), Benšić and Sabo (2010), Benšić and Sabo (2016), Sabo and Benšić (2009), and Schneeweiss (2004).The basic one-dimensional model is described in Section 2.2 together with the results that are used for statistical inference incorporated in the package.Although this model is not universal in all applications, we find it useful in some cases.
With the assumption that the observed object has a circular or elliptical shape, a two-dimensional approach has been developed, dealing with an object area estimation problem (Benšić and Sabo, 2007a;Sabo and Benšić, 2009).This approach utilizes many border estimations and performs parametric curve fitting on its results.
The package LeArEst (Bensic et al., 2017) uses these methods for length and area estimation of an object captured with noise.It supports numerical inputs, which is useful if a machine that records an object stores numerical data (coordinates of recorded points).However, if an object is captured in a picture file, the package includes a web interface with which one can load a picture, specify a line that intersects the object, adjust the parameters, and perform an edge detection on the drawn line.Another web interface allows the user to draw a rectangle around the object and perform area estimation of the marked object.Description of functions dealing with numerical and graphical estimations and examples of their use are given in Section 2.3.

Basic statistical model
The basic model we deal with in this package is an additive error model The R Journal Vol.9/2, December 2017 ISSN 2073-4859 Here we suppose that the random variable U is uniformly distributed on the interval [−a, a], a > 0, i.e., has a density and ε is an absolutely continuous random variable describing measurement error.Further, we assume that ε is independent of U with zero mean.Instead of the sample U 1 , U 2 , . . ., U n from U, one can only observe the contaminated i.i.d.sample X 1 , X 2 , . . ., X n from X.We are to estimate the unknown parameter a.
Our model is the special case of a general additive error model X = Y + ε, where Y and ε are assumed to be independent continuous random variables, but only X is observable.Estimating the unknown density f Y from an i.i.d.sample X 1 , X 2 , . . ., X n , X 1 ∼ X, is known as the deconvolution problem.Usually, the error part ε is assumed to have a known density f ε .Several nonparametric methods have been developed to estimate f Y (Meister, 2009); the most popular and studied is the deconvolution kernel density estimator (Carroll and Hall, 1988;Stefanski and Carroll, 1990).The packages decon (Wang and Wang, 2013) and deamer (Stirnemann et al., 2012) provide functions for estimating density f Y in a nonparametric way.Two different approaches in estimating the support of a density from a contaminated sample can be seen in (Meister, 2006) and (Delaigle and Gijbels, 2006).One is based on the deconvolving kernel density estimator (Delaigle and Gijbels, 2006) and the other on the moment estimation (Meister, 2006).To our knowledge, none of them is implemented in some R package submitted to the CRAN repository.
For our purpose (e.g., estimating borders of some object from a noisy image), we find that the model with Y ∼ U [−a, a] is useful in some instances.Namely, in many cases we have a relatively high contrast between an object and its background as it is the case in Figure 1.It seems reasonable to assume that the data extracted from the green line in Figure 1 come from a uniform distribution, but contaminated by an additive error.
Let f ε and F ε be the density and distribution function of the error part ε.Then the density of If we suppose that the distribution of ε belongs to a scale family, with scale parameter σ, then (2) may be rewritten as with F(x) being the standard (σ = 1 and zero mean) distribution function.Let x = (x 1 , . . . ,x n ) denote the realization of the i.i.d.sample X = (X 1 , . . . ,X n ).The likelihood function has the form The R Journal Vol.9/2, December 2017 ISSN 2073-4859 and the log-likelihood function is given by If the distribution of ε is symmetric around zero, then the Fisher information is Supposing that parameter σ is known or consistently estimated then, under regularity, we have where a 0 is the true value of a.
Some flexibility of our model is achieved by changing the error distribution.For now, three types of error distributions are available in the package.The normal distribution ε ∼ N 0, σ 2 is sometimes a natural choice.Properties of maximum likelihood (ML) and method of moments (MM) estimators with known σ 2 are given in Benšić and Sabo (2007b).The model with Y ∼ U [0, a] is treated in Schneeweiss (2004).Benšić and Sabo (2010) considered the unknown σ 2 case.The possibility of this (one-dimensional) model in two-dimensional problems is given in Benšić and Sabo (2007a) and Sabo and Benšić (2009).For the sake of robustness Laplace and scaled Student (with 5 degrees of freedom) distributions are also incorporated in the package as a choice of the error distribution.Estimating issues with Laplacian error can be seen in Benšić and Sabo (2016), as well as a discussion connected to robustness.
Two procedures for deriving confidence intervals for a are described in (Hamedović et al., 2017).The first one is based on the asymptotic distribution of ML estimator in (3).For a specified 0 The second method is based on the likelihood ratio statistic From the asymptotic distribution of the log-likelihood ratio statistic −2 log λ(X) where χ 2 1 (1 − α) is the 1 − α quantile of χ 2 1 distribution.These two approaches can be used to test hypotheses regarding the parameter a.For example, in the case of a two-sided hypotheses H 0 : a = a 0 versus H 1 : a = a 0 , the critical regions of asymptotic size α are , respectively.Note that both methods are asymptotically equivalent.

Overview of the package
The package LeArEst depends on the following packages that should installed in addition to the LeArEst package: conicfit (Gama and Chernov, 2015), jpeg (Urbanek, 2014), and opencpu (Ooms, Function Description

lengthest()
Performs length estimation from a numerical data set.

lengthtest()
Performs one-sided and two-sided tests for uniform distribution half-length.

areaest()
Performs area estimation of a numerically described object in plane.startweb.esttest()Opens default web browser and loads a web page for length estimation and testing (the object of interest is shown in an image).

startweb.area()
Opens default web browser and loads a web page for area estimation of the object shown in an image.

var.error
Error variance, estimated or explicitly given by argument var.

conf.int
Confidence interval for half-length of the uniform support.method Method used for computing a confidence interval (asymptotic distribution of ML or likelihood ratio statistic).> install.packages("LeArEst") The package is loaded using the following command: > library(LeArEst) An overview of the package's functions is given in Table 1.

Length estimation -a numerical data set
The function lengthest computes the length of an interval which is the domain of a uniform distribution from data contaminated by an additive error according to the model described in the previous section.The function's arguments and results are given in Table 2.
In order to perform length estimation, a type of the error distribution must be chosen through the argument error with three possibilities: 'laplace' (Benšić and Sabo, 2016), 'gauss' (Benšić and Sabo, 2007bSabo, , 2010)), or 'student' (scaled Student distribution with 5 degrees of freedom).
The variance of the additive error may or may not be known.If the variance is known, argument var should be used and the variance should be assigned to it.In the case of unknown variance, function lengthest implements two methods for its estimation: Method of Moments and Maximum Likelihood.Value 'MM' of the argument var.est instructs the functions to use Method of Moments, while the corresponding value 'ML' triggers Maximum Likelihood Method.There is the possibility, depending on the input data, that the Method of Moments estimate of error variance does not exist.When that is the case, the function stops and outputs the message instructing the user to use Maximum Likelihood estimator or to give an explicit variance.It is important to mention that arguments var and var.est may not be used simultaneously.
The R Journal Vol.9/2, December 2017 ISSN 2073-4859 The results of this function are the estimated half-length of uniform distribution (i.e., of an object), estimated or explicitly given error variance, confidence interval for half-length (with regard to the given confidence level) and the statistical method for computing a confidence interval.

Testing hypothesis -a numerical data set
Function lengthtest performs one-sided and two-sided tests against hypothesized half-length of the uniform support as it is described in Section 2.2.Since the actual calculations inside this function are based on the ML approach most input arguments are similar to those in the function lengthest (see Table 3).Argument null.a is a positive number representing hypothesized half-length of the uniform support, while argument alternative defines the usual forms of alternatives ('two.sided','greater', or 'less').
Function lengthtest also performs length estimation, so all values from its output, except p.value and the calculated value of the test statistic (tstat), are the same as that of the function lengthest.

var.error
Error variance, estimated or explicitly given by argument var.

Area estimation -a numerical data set
The input for the function areaest is supposed to be a data set of points in the plane representing independent realizations of a two-dimensional random vector It is assumed that U has a uniform distribution on an ellipsoid and ε is a two-dimensional error term independent of U. Arguments and results of this function are listed in Table 4.
The algorithm implemented in the function areaest is explained in detail in Benšić and Sabo (2007a).The main task in area estimation is to estimate edge points of the uniform support.In order to achieve this, the original problem is reduced to several corresponding one-dimensional problems, which can in turn be solved by function lengthest.
Let us denote the data set with D = {(x i , y i ) , i = 1, . . ., n}.The function areast transforms this data set in two different ways: through the y-axis and through the x-axis.The algorithm for transformation through the y-axis is presented below, while the transformation through the x-axis is done analogously.ALGORITHM 1 (Transformation through the y-axis (Benšić and Sabo, 2007a)) Step 1. Separating through the y-axis.
Choose an integer m < n and real numbers Step 2. Centering through the y-axis.

Let us denote
Using this algorithm the data are transformed in the way that we have sets C k , k = 1, . . ., m − 1 that represent centered tiny strips.Argument nrSlices corresponds to m − 1 and specifies the number of strips.The lengths of these strips (in x-direction) can be estimated using the function lengthest (the parameters error, var, and var.est are used in a lengthest call in the way described earlier).After doing so, the algorithm needs to be repeated through the x-axis.Finally, at this point of the procedure, the data that is a noisy version of points from the curve is created -it represents estimated points from the border of the object.
The next task is to choose one of the well-known curve fitting procedures for parameter estimation.
Here we are dealing with a nonlinear parameter estimation problem.
Let us suppose that we have an elliptical domain, i.e., On the basis of data obtained so far, the vector of unknown parameters p needs to be estimated, and by doing so, the optimal ellipse that fits into our points is to be defined.For this purpose EllipseDirectFit function from the conicfit package is used.This function implemets the algebraic ellipse fit method by Fitzgibbon-Pilu-Fisher (Fitzgibbon et al., 1999).Having parameters p, it is a trivial task to calculate the area of the ellipse that approximates the observed object.
Usage example.Two internal files are provided with the package: 'ellipse_3_4_0.1_gauss.txt'and 'ellipse_3_4_0.1_laplace.txt'.Both of them represent an ellipsoidal object with center in point (1, 1), half-axes 1.5 and 2, with added measurement error.In the first file, the error distribution is a twodimensional normal with independent margins and variance 0.01, while in the other it is Laplacian (λ = 0.1) in both directions, again with independent margins.
In order to use one of these files, the data needs to be read into a data frame: inputfile <-system.file("extdata","ellipse_3_4_0.1_laplace.txt",package = "LeArEst") inputdata <-read.table(inputfile) Area estimation of the uniform support can be done with the command: areaest(inputdata, error = "laplace", var.est = "ML", nrSlices = 5, plot = TRUE) The R Journal Vol.9/2, December 2017 ISSN 2073-4859 In the previous example, the parameter plot is set to TRUE, so the function plots the given input data (black dots), estimated border points (red dots), and the resulting ellipse (cyan ellipse); see Figure 3.

Length estimation and testing for an object shown in a picture
In order to apply the described methods to a picture of an object, two web interfaces have been built and embedded into the package.
As far as we know, shiny (Chang et al., 2017) provides the simplest way of building web applications using R.However, limitations of its free version discouraged us from using it, so we decided to use the opencpu package.This package provides a reliable and interoperable HTTP API for data analysis based on R. Basically, it provides an interface between functions in R package and a custommade web page bundled with the package, using JavaScript and AJAX.Building web interfaces using opencpu is more complex than using shiny, but at the same time, it provides more flexibility in application design.It is assumed that developers are familiar with HTTP protocol, HTML, and the JavaScript language, in order to develop such web applications.Function startweb.esttestwill be described in this section.This function takes no arguments and returns no results, its task is to start a web interface for length estimation and hypothesis testing (Figure 4).
To start the analysis using the web interface the picture in JPG format should be loaded (Load Picture button).Then, a line should be drawn that intersects the object of interest by clicking on two points in the picture -length estimation will be performed on that line.Finally, parameters for a data set preparation should be set.
The Levels of gray parameter determines how many levels of grey the algorithm should take into account.It is important to mention that, although color images can be loaded, they are internally converted to grey-scales prior to any calculations.Since JPG format supports 2 24 different colors, the number of possible colors should be reduced in order to optimize estimation speed and memory consumption.
Line thickness specifies how many picture pixels around the drawn line are taken into account in length estimation.For instance, if Line thickness is set to 3, the algorithm takes pixels which are The R Journal Vol.9/2, December 2017 ISSN 2073-4859 direct left neighbours of the line, pixels on the line itself, and the ones which are direct right neighbours of the line.By doing so, the matrix of (length of the line) × Line thickness pixels -PixelMatrix is obtained.
By doing so, we have obtained the matrix of (length of the line) × Line thickness pixels -PixelMatrix.
By clicking on the Prepare data button the data set will be prepared for the inference.
The following step deals with data preparation and is a crucial step of the algorithm.Each pixel of PixelMatrix is mapped to a new matrix of Box size × Box size booleans -DotMatrix (note that Box size is a parameter).Further, every DotMatrix is filled with uniformly distributed dots (i.e., TRUE The R Journal Vol.9/2, December 2017 ISSN 2073-4859 Parameters in the Estimation section of the web interface are transferred to lengthest, as well.After the user clicks on Estimate button, lengthest is executed, and its results are displayed below the picture.Additionally, the estimated uniform support is marked red on the intersecting line.
Estimated length is expressed in width of a pixel and in percentage of whole image's width as well.As stated in the info box, it is important to use a proportional screen resolution on user's display, so the pixels on the screen are square-shaped.
The Testing section of this web interface serves for hypothesis testing.Procedures related to image loading, choosing an intersecting line, and data preparation are the same as described above.For the purpose of testing, values for H0, unit, and alternative ('greater', 'less', or 'two-sided') need to be specified.The part of the web interface dealing with output of hypothesis testing procedure is shown in Figure 5.

Area estimation of an object shown in a picture
The function startweb.areastarts a web interface for area estimation (Figure 6).Again, the first step is to load an image.To select an object whose surface needs to be evaluated, a rectangle should be drawn around it.It is done by clicking on its upper-left and lower-right corners, after which a green rectangle is drawn on the picture.Data parameters are similar to ones in the length estimation web interface, with the exception of number of slices.
The first step in the area estimation algorithm for this function is to roughly isolate the object in the selected rectangle.In order to do that, pixels from the selected rectangle are divided into two clusters by using the kmeans function from base-R stats package (the criterion for clustering is pixel brightness).Further, only pixels from the 'object cluster' are observed and divided into horizontal and vertical stripes, as described earlier in Algorithm 2.3.3.The number of stripes is dictated by the number of slices parameter.A length estimation procedure is conducted on each stripe, obtaining two estimated edge points of the object for each stripe (red dots in Figure 5).Two parameters in the Estimation section of the web interface are related to the length estimation procedure of the stripes.
Finally, an optimal ellipse that fits into edge points is found using EllipseDirectFit function from the conicfit package, as well as in the areaest function described earlier.The resulting ellipse is drawn in red in Figure 5.Its area is printed below, this is measured in pixels and the percentage of the whole image area.

Concluding remarks
The R package LeArEst provides routines for estimating the support of the random variable U, U ∼ U [−a, a], based on a sample from X = U + ε.The random variable ε represents additive measurement error and is supposed to have either a normal, Laplace, or scaled Student distribution with 5 degrees of freedom.The package also includes functions for estimating either the borders or the area of some object from a noisy image.The package may be useful for this purpose mainly in the case of images with reasonable contrast between the object of interest and background.For greater robustness, we find it convenient to use some error distributions with heavier tails.Sometimes we have different amounts of noise in the tails, so it would be useful to include some asymmetric error distributions as well.These are some features we are going to add in the package in order to improve flexibility of our The R Journal Vol.9/2, December 2017 ISSN 2073-4859

Figure 1 :
Figure 1: Line intersecting the object and histogram of recorded points for statistical inferences

Figure 2 :
Figure 2: Estimation of the density function

Figure 3 :
Figure 3: Data points, estimated border points, and the resulting ellipse obtained by the function areaest

Figure 4 :
Figure 4: Web interface for length estimation and hypothesis testing.Loaded image shows arterial wall of the carotid artery; we are trying to estimate its intima media thickness (darker layer below artery cavity).

Figure 5 :
Figure 5: Web interface for length estimation and hypothesis testing -hypothesis testing output

Figure 6 :
Figure 6: Web interface for area estimation showing an MRI scan detail (taken from Bankman (2008))

Table 1 :
Overview of LeArEst functions Method of error variance estimation.conf.levelConfidence level of the confidence interval.Defaults to 0.95.
radiusEstimated half-length of the uniform support.

Table 2 :
Summary of arguments and results of lengthest 2014).The stable version of the package is available on the Comprehensive R Archive Network repository (CRAN; https://CRAN.R-project.org/) and can be downloaded and installed by issuing the following command at the R console:

Table 3 :
Summary of arguments and results of lengthtest Usage example.Generate the data in a similar manner as in the lengthest example: Two-column data matrix containing the points that describe the observed object.The first column represents the x coordinate of the point, while the second column represents its y coordinate.nrSlices Number of slices applied for plain data cutting.Defaults to 10.

Table 4 :
Summary of arguments and results of areaest