rcosmo: R Package for Analysis of Spherical, HEALPix and Cosmological Data

The analysis of spatial observations on a sphere is important in areas such as geosciences, physics and embryo research, just to name a few. The purpose of the package rcosmo is to conduct efficient information processing, visualisation, manipulation and spatial statistical analysis of Cosmic Microwave Background (CMB) radiation and other spherical data. The package was developed for spherical data stored in the Hierarchical Equal Area isoLatitude Pixelation (Healpix) representation. rcosmo has more than 100 different functions. Most of them initially were developed for CMB, but also can be used for other spherical data as rcosmo contains tools for transforming spherical data in cartesian and geographic coordinates into the HEALPix representation. We give a general description of the package and illustrate some important functionalities and benchmarks.


Introduction
Directional statistics deals with data observed at a set of spatial directions, which are usually positioned on the surface of the unit sphere or star-shaped random particles. Spherical methods are important research tools in geospatial, biological, palaeomagnetic and astrostatistical analysis, just to name a few. The books (Fisher et al., 1987;Mardia and Jupp, 2009) provide comprehensive overviews of classical practical spherical statistical methods. Various stochastic and statistical inference modelling issues are covered in (Yadrenko, 1983;Marinucci and Peccati, 2011).
The CRAN Task View Spatial shows several packages for Earth-referenced data mapping and analysis. All currently available R packages for spherical data can be classified in three broad groups.
The first group provides various functions for working with geographic and spherical coordinate systems and their visualizations. Probably the most commonly used R package to represent spatial maps and data is sp (Bivand et al., 2013). It includes tools for spatial selection, referencing and plotting spatial data as maps. It has comprehensive hierarchical classes and methods for spatial 2d and 3d data. Another example, sphereplot (Robotham, 2013), uses rgl (Adler et al., 2018) to create 3d visualizations in the spherical coordinate system. The functions car2sph and sph2car implement transformations between Cartesian and spherical coordinates. The package geosphere (Hijmans, 2017) includes functions for computing distances, directions and areas for geographic coordinates.
The second group covers various numerical procedures that can be useful for spherical approximations and computations. For example, SpherWave (Fernández-Durán and Gregorio-Domíinguez, 2016) is developed to implement the spherical wavelets and conduct the multiresolution analysis on the sphere. Functions for numerical integration over high-dimensional spheres and balls are provided in the package SphericalCubature (Nolan and University, 2017).
The third group provides statistical tools for spherical data analysis. In this group, the most commonly used packages are RandomFields (Schlather et al., 2015) and geoR (Ribeiro Jr and Diggle, 2018). These packages provide a number of tools for model estimation, spatial inference, simulation, kriging, spectral and covariance analyses. Most of their underlying models are for 2d or 3d data, but some additional spherical models are listed for future developments. The package Directional (Tsagris et al., 2019) has functions for von Mises-Fisher kernel density estimation, discriminant and regression analysis on the sphere. The package gensphere (Nolan, 2017) implements multivariate spherical distributions. CircNNTSR (Fernández-Durán and Gregorio-Domíinguez, 2016) provides functions for statistical analysis of spherical data by means of non-negative trigonometric sums. The package VecStatGraphs3D (Felicísimo et al., 2014) conducts statistical analysis on 3d vectors. It includes estimation of parameters and some spherical test. Another example is the package sm (Bowman and Azzalini, 2018) for spherical regression analysis and non-parametric density estimation.
There are also several R packages developed for spherical data in astronomy 1 . For example, cosmoFns (Harris, 2012) and CRAC (Liu, 2014) implement functions to compute spherical geometric quantities useful for cosmological research. The package FITSio (Harris, 2016) reads and writes files in one of standard astronomical formats, FITS (Flexible Image Transport System). spider (Brown et al., 2012) includes functions for all-sky grid/scatter plots under various astronomical coordinate systems (equatorial, ecliptic, galactic). The package astro (Kelvin, 2014) provides functions for basic 1 see the list in https://asaip.psu.edu/forums/software-forum/459833927 cosmological statistics and FITS file manipulations.
In recent years the spatial analysis and theory of spherical data have been strongly motivated by the studies on the Cosmic Microwave Background (CMB) radiation data collected by NASA and European Space Agency missions COBE, WMAP and Planck and usually stored in the Hierarchical Equal Area isoLatitude Pixelation (HEALPix) format. Cosmologists have developed comprehensive Python and MATLAB software packages 2 to work with CMB and HEALPix data.
Although the mentioned before R packages can be used for geographic or spherical coordinate referenced data, comprehensive and easy to use tools for CMB and HEALPix data are not available in R, which motivated the authors to design the package rcosmo (Fryer et al., 2020).
The aims of the package rcosmo are • to give convenient access and integrated in one package tools for analysis and visualisation of CMB and HEALPix data to the R statistical community; • to develop R functions for models and methods in spherical statistics that are not available in the existing R packages; • to extend familiar R classes to spherical HEALPix data making them cross-compatible and intuitively interactable with many existing R statistical packages.
The HEALPix format has numerous advantages to classical geographic representations of spherical data, see (Gorski et al., 2005). R implementation of computationally expensive statistical and geometrical methods, such as nearest neighbour searches, empirical covariance function estimation, uniform sampling, spectral density estimation, in a way that takes advantage of the HEALPix data structure, could be useful for geostatistics and other applications. It can reduce algorithmic complexity and computational time. Various data processing, manipulation, visualisation and statistical analysis tasks are achieved efficiently in rcosmo , using optimised C++ code where necessary.

Basics of CMB data
In the Standard Cosmological Model, the Cosmic Microwave Background is redshifted microwave frequency light that is believed to have originated around 14 billion years ago. CMB is the main source of data about the early universe and seeds of future galaxies. Bell Labs physicists Arno Penzias and Robert Wilson received the Nobel prize in physics in 1978 for their famously happenstance discovery of CMB radiation as an inconvenient background "noise" during their experimentation with the Holmdel Horn Antenna radio telescope (Durrer, 2015).
Over a decade later, NASA's Cosmic Microwave Background Explorer (COBE) satellite mission produced the first detailed full CMB sky map (Efstathiou et al., 1992). Referred to as the dawn of precision cosmology, COBE results provided fine constraints on many cosmological parameters. Particular attention has been paid to the existence of CMB anisotropies and associated non-Gaussianity, usually investigated through the CMB angular power spectrum (Durrer, 2015). The Wilkinson Microwave Anisotropy Probe, was launched in 2001 by NASA and returned more precise measurements of CMB (Bennett et al., 2003). Then, the third and most detailed space mission to date was conducted by the European Space Agency, via the Planck Surveyor satellite (Adam et al., 2016). The radiation that astronomers detect today forms an expanding spherical surface of radius approximately 46.5 billion light years. The next generation of CMB experiments, CMB-S4, LiteBIRD, and CORE, will consist of highly sensitive telescopes. It is expected that these experiments will provide enormous amount of CMB measurements and maps to nearly the cosmic variance limit.
The term "CMB data" refers to a broad range of location tagged quantities describing properties of the CMB. For example, the Infrared Science Archive by Caltech's Infrared Processing and Analysis Center (IPAC) hosts curated CMB products from the North American Space Agency (NASA) 3 .
To produce CMB maps (see Figure 5), the products of the Planck mission data (in the range of frequencies from 30 to 857 GHz) are separated from foreground noise using one of the four detailed methods named COMMANDER, NILC, SEVEM and SMICA 4 . These CMB maps are provided at either low resolution (N side = 1024, i.e., 10 arcmin resolution), or high resolution (N side = 2048, i.e., 5 arcmin resolution). The maps include temperature intensity and polarisation data, as well as common masks for identifying regions where the reconstructed CMB is untrusted.
Our focus will mostly be on the CMB temperature intensity data. In Planck CMB products, these data are stored as 4-byte floating point binary numbers in K cmb defined as the unit in which a black body spectrum at 2.725 Kelvin (K) is flat with respect to the frequency (Hurier et al., 2015).
The map of the CMB temperature is usually modelled as a realization of an isotropic Gaussian random field on the unit sphere. The Appendix introduces a statistical model and basic notations of spherical random fields. More details can be found, for example, in the monographs (Marinucci and Peccati, 2011), (Yadrenko, 1983) and the paper (Leonenko et al., 2018).

rcosmo package
The current version of the rcosmo package can be installed from CRAN. A development release is available from GitHub (https://github.com/frycast/rcosmo).
The package offers various tools for • Handling and manipulating of CMB radiation and other spherical data, • Working with Hierarchical Equal Area isoLatitude Pixelation of a sphere (Healpix), • Spherical geometry, • Various statistical analysis of CMB and spherical data, • Visualisation of HEALPix data.
Most of rcosmo features were developed for CMB, but it can also be used for other spherical data.
The package has more than 100 different functions. Figure 1 shows the core functions available in rcosmo and some typical data analysis flow sequences.  Figure 1: The flowchart of rcosmo main structure and core functions for 3 typical inputs: "data.frame" (general input), "downloadCMBMap" (conventional HEALPix format), and "downloadCMBPS" (power spectrum data).
Rather than attempting a systematic description of each functions, the remainder of this paper shows broad classes of methods implemented in rcosmo with particular examples of core functions. A reproducible version of the code in this paper for the current version of rcosmo is available in the folder "Research materials" from the website https://sites.google.com/site/olenkoandriy/.

Visualisation tools
Interactive visualisations of spherical data are a focus point of rcosmo . Standard Python and MATLAB tools for CMB and HEALPix visualization use the Mollweide projection of the unit sphere to the 2d plane. This is an equal-area projection, but it distorts spherical angles and distances. In contrast rcosmo employs the 'OpenGL' powered 3d visualization device system rgl for R to allow 3d interactive plots of data on the unit sphere.
The generic plot function produces interactive 3d vector graphics that may be easily exported to a HTML document. Some examples of using different plot functions are provided later in the paper. The results of plot.CMBDataFrame are given in Figures 5 and 6. For an example of using plot.HPDataFrame see Figure 15 and examples of using plot.CMBWindow are shown in Figures 5 and 7. For better visualization some figures produced by the R code in this paper were rotated and zoomed in before including in the article.
By default, the Planck colour scale is applied to CMB intensity data for plotting 5 . Additional features such as automatic plot legends, alternative colour scales, and greater configurability are planned for future releases. In addition, rcosmo provides a variety of 2d plot functionality to support visualisation of statistical analysis results and some additional 3d plot functionality for demonstrating HEALPix pixel properties.

rcosmo classes
Four R classes have been developed to support HEALPix data representation and analysis in the package rcosmo : "CMBDat", "CMBDataFrame", "HPDataFrame" and "CMBWindow". First three are main parent classes of objects to store spherical data. The class "CMBWindow" is used to choose observation windows.
• The function CMBDat creates objects of class "CMBDat". CMBDat objects are lists containing header information, other metadata and a data slot. Data slots may include standard information about CMB intensity (I), polarisation (Q, U), PMASK and TMASK. It also may contain a mmap object that points to the CMB map data table in a FITS file. As for standard data frames new data slots can be created to store other types of spherical data.
• The function CMBDataFrame creates objects of class "CMBDataFrame". These class is a special modification of "data.frame" that also carries metadata about, e.g., the HEALPix ordering scheme, coordinate system, and nside parameter (i.e., the resolution of the HEALPix grid that is being used). Each row of a CMBDataFrame is associated with a unique HEALPix pixel index.
• The class "HPDataFrames" is a type of "data.frame" designed to carry data located on the unit sphere. Unlike "CMBDataFrame", "HPDataFrames" may have repeated pixel indices. It allows to store multiple data points falling within a given pixel in different rows of HPDataFrame objects.
• The function CMBWindow creates objects of class "CMBWindow". These objects are polygons, spherical discs, or their compliments, unions and intersections.
As the main rcosmo data classes are special modifications of "data.frame" it means that spatial objects produced by rcosmo can be subsequently processed by other R packages/functions that work with standard data frames. The rbind and cbind generics that work with the "data.frame" class have been customised in rcosmo to preserve the validity of CMBDataFrame objects.

Getting data into rcosmo
In this section, we demonstrate how to import CMB data in the typical case of a full sky map stored as a FITS file. Such maps can be sourced from the NASA/IPAC Infrared Science Archive 6 .
The function downloadCMBMap can be used to download Planck CMB maps. One can specify the type of map ('COMMANDER', 'NILC', 'SEVEM' or 'SMICA'), the resolution (N side = 1024 or 2048), and save it in a working directory with a specified file name.
The map 'COM_CMB_IQU-smica_1024_R2.02_full.fits' used in most of the examples in this paper is a FITS file of approximate size 200 megabytes. This map has the resolution N side = 1024, so it contains N pix = 12 × 1024 2 = 12582912 pixels, each having its own intensity I, polarisation Q, U, temperature mask value T mask ∈ {0, 1} and polarisation mask value P mask ∈ {0, 1}.

Use of memory mapping
The standard library for reading data from FITS files is a collection of C and FORTRAN subroutines called 'CFITSIO'. In R, the package FITSio (Harris, 2016) is the only general FITS file reader that the authors are aware of. However, importing a full sky CMB map with approximately 12 million intensity samples from a FITS file using FITSio took too long when development of rcosmo began. We were able to reduce the necessary run time to under 4 seconds with the rcosmo function CMBDat. rcosmo still substantially outperforms the last version of FITSio. In the following example we used the CMB map with 'SMICA' foreground separation and N side = 2048 (having approximately 50 million intensity samples) to test it on a modern laptop 7 . > filename1 <-"CMB_map_smica2048.fits" > downloadCMBMap(foreground = "smica", nside = 2048, filename1) > system.time(sky <-CMBDataFrame(filename1)) user system elapsed 1.36 0.29 1.73 > system.time(fits <-FITSio::readFITS(filename1)) user system elapsed 822.28 90.05 942.14 The approach used in rcosmo is based on a novel application of the mmap package by Jeffrey Ryan (Ryan, 2018). The package mmap is a highly optimised interface to 'POSIX mmap' and Windows 'MapViewOfFile'. Using mmap in rcosmo required an update of mmap to support big-Endian byte order. The current version of mmap allows us to import data from a FITS binary table very efficiently, one row at a time, using a C struct data type. Ideally, for a typical rcosmo user, the details of using mmap are abstracted away while the user constructs and interacts with "CMBDataFrame" objects.
Another use of mmap in rcosmo concerns the elimination of the need to read a large full sky CMB map into R memory. Often it is unmanageable to read the entire contents of a FITS file. For example, it may not be possible to obtain sufficiently large blocks of continguous memory from the operating system when importing more than a few hundred megabytes of data as a numeric vector. In rcosmo , integration of mmap allows one to maintain a C-style pointer at a particular byte-offset to the target binary file (e.g., the FITS file), so that data can be read into R memory on command from this offset. In the following example, only rows 1, 2, 4, 7 and 11 are read into memory from the file.
> v <-c(1,2,4,7,11) > sky <-CMBDataFrame(filename, spix = v, include.p = T, include.m = T, coords = "spherical") > sky A CMBDataFrame # A tibble: 5 x 7 theta phi I Q U TMASK PMASK <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> 1 1.57 0.785 -0.0000920 6.47e-8 -0. The next section discusses the HEALPix data structure. HEALPix ordering schemes can be used to map coordinates on the sphere to row indices in a FITS binary table. Combining this feature with the technique in the above example allows rcosmo to efficiently import random samples of data from a variety of geometric regions of the sphere without ever having to import the entire CMB map. This is particularly useful on larger maps and will become increasingly important in future as advances in cosmology allow for higher resolution CMB maps to be produced.

Introduction to HEALPix
Present generation Cosmic Microwave Background experiments produce data with up to 5 arcminutes resolution on the sphere. For a full-sky map, this amounts to approximately 50 million pixels, each describing distinct location, intensity, polarisation and other attributes. The statistical analysis of such massive datasets, and associated discretisation of functions on the sphere, can involve prohibitive computational complexity and non-adequate sampling in the absence of an appropriate data structure. The Hierarchical Equal Area isoLatitude Pixelation is a geometric structure designed to meet this demand using a self-similar refinable mesh. It is currently the most widely used pixelation for storing and analysing CMB data (Gorski et al., 2005).
The package rcosmo provides various tools to visualize and work with the HEALPix structure.
HEALPix initially divides the sphere into 12 equiareal base pixels. To visualise these with rcosmo , we can first generate a CMBDataFrame at some low resolution (e.g, N side = 64) and then take three separate window subsets in the pixels that we intend to colour, as shown in Figure ??). Note that, while all HEALPix pixels are 4-sided, their edges are not geodesics, i.e., they are not spherical quadrillaterals (Calabretta and Roukema, 2007).
Many tasks gain or lose efficiency with the choice of ordering scheme. For example, nearest neighbour searches are best conducted in the nested scheme (Gorski et al., 2005). As such, every object of class "CMBDataFrame" or "HPDataFrame" has an attribute named ordering to indicate which of the two schemes is being used. This allows rcosmo functions to choose the most efficient scheme for each task, performing any necessary conversions with the internal functions nest2ring and ring2nest.

HEALPix functions
For working directly with HEALPix properties there are a number of rcosmo functions. Some core functions are shown in Figure 1 in the yellow colour, other are internal and some were exported. Broadly, there are the following categories: • Working with ordering schemes, • Navigating the nested ordering hierarchy, • Geometric functions involving pixel indices, • Visualising the HEALPix structure.
The main ordering functions include converting between two ordering schemes and getting information about a type of ordering (ordering generic and internals nest2ring and ring2nest). For example, the ordering generic function is useful for getting and setting the ordering attribute of a CMBDataFrame or HPDataFrame.
For example, the k th ancestor of a pixel index p at resolution j := log 2 (N side ) is the pixel index p a to which p belongs, k steps up the hierarchy (i.e., at resolution j − k). It turns out that p a is a function p a = f (p, k) that is independent of j. The following example produces the ancestors of pixel p = 10 3 , for k = 1, 2, . . . , 5, using the internal function ancestor. > ancestor(1e3, 1:5) [1] 250 63 16 4 1 A function that is not resolution independent is pixelWindow. In the following code, pixelWindow retrieves all pixels at resolution j 2 = 5 that lie within pixel p = 1, specified at resolution j 1 = 1. The result is shown in Figure 4. p <-1; j1 <-1; j2 <-5 P <-pixelWindow(j1 = j1, j2 = j2, pix.j1 = p) displayPixels(spix = P, j = j2, plot.j = j2) The group of rcosmo functions that includes pix2coords, pixelArea, nestSearch, etc., computes spherical geometric properties in relation to by pixel indices. For example, the nestSearch function searches a pixel closest to a point in 3d space. It uses an algorithm that achieves a high level of efficiency using the nested hierarchy. A comparison, via microbenchmark (Mersmann, 2018), with a basic linear search algorithm which we call geoDistSearch reveals the following.

Subsetting and combining spherical regions
rcosmo functions for selecting and visualizing spherical regions can be broadly divided in the following groups: • Creating basic CMBWindow objects (polygons or spherical discs), • Combining different sub-regions by using compliments, unions and intersections to create a new CMBWindow object, • Plotting a region with boundary and inside points, • Extracting data from a given CMBDataFrame restricted to a CMBWindow region.
Class CMBWindow is designed to carry geometrical information describing the interior or exterior of spherical figures (polygons, spherical discs (caps), and their complements, unions and intersections). The polygons can be non-convex, though CMBWindow carries a boolean attribute assumedConvex that should be set to TRUE, if the polygon is known in advance to be convex. In this case special methods that decrease computation times are applied.

Spherical geometry functions
Several basic tools for spherical geometry are implemented in rcosmo : • Converting between different coordinate systems on the sphere, • Computing geodesic distances between points and windows, • Calculating spherical angles, • Computing areas of spherical figures, • Triangulating spherical polygons.
The currently implemented core geometric functions are shown in Figure 1 in the orange colour. Some other spherical geometric tools are specified for the HEALPix representation or CMBWindows and are shown in the green colour.
For example, the functions geoArea computes the area of a figure on the unit sphere that is encompassed by its pixels.

> geoArea(sky.annulus) [1] 2.11917
Another example is the function maxDist that computes the maximum geodesic distance either between all points in a data.frame pairwise, or between all points in a data.frame and a target point.

Statistical functions
In this section we overview core statistical functions implemented in rcosmo . The package provides various tools for statistical analysis of spherical data that can be broadly divided in the following types: • Spherical random sampling, • Univariate spherical statistics and plots, • Multivariate statistics for data from different CMBWindows, • Measures of spatial dependencies.
The main currently implemented statistical functions are shown in Figure 1 in the blue colour. Below we provide few examples of functions for each type.
The first Minkowski functional fmf returns an area of the spherical region, where the intensities are above of the specified threshold level α.
> fmf(cmbdf = sky.annulus, alpha = 0, intensities = "I") [1] 1.054269 fRen computes values of the sample fractal scaling exponent on the grid of N uniformly spaced points in the interval [q.min, q.max]. The scaling exponent describes fractal properties of random fields and can be used for testing departures from Gaussianity, see (Leonenko and Shieh, 2013). For example, Figure 8 shows that the sample scaling exponent is an approximate strait line for for the data in CMBWindow sky.annulus. Thus, there is not enough evidence for substantial multifractality of the random field based on the data in this CMBWindow. More details can be found in the paper (Leonenko et al., 2020).
There are several rcosmo functions for comparison of data from two or more CMBWindows. For example, the function qqplotWin is a modification of the standard qqplot to produces a QQ plot of quantiles of observations in two CMBWindows against each other for a specified CMBDataFrame column.
The example below shows that the distributions of temperatures in the Dragon and Scorpion constellations are similar.
> qqplotWin(df1, polygon1, polygon2) The function qstatq can be used to measure spatial stratified heterogeneity in a list of CMBWindows.
It takes values in [0, 1], where 0 corresponds to no spatial stratified heterogeneity, 1 means a perfect heterogeneity case. For example, the results below shows that there is not enough evidence for spatial stratified heterogeneity, i.e. the value of the temperature intensities are not different in these two CMBWindows.

Investigating spatial dependencies
This section presents some of rcosmo tools for the analysis of spatial dependencies in spherical data.
As the geodesic and Euclidean distances are different, covariance functions on R 3 can not be used directly for S 2 . The package implemented several parametric models of covariance functions (2) on the sphere, see theoretical foundations in (Gneiting, 2013). rcosmo uses the package geoR and extends its list of general spatial models and some functions to the spherical case. Currently available choices of covariance models are matern,exponential,spherical,powered.exponential,cauchy, gencauchy, pure.nugget,askey, c2wendland,c4wendland,sinepower, and multiquadric. The default option is matern.
The function covmodelCMB computes values of theoretical covariance functions given the separation distance of locations. The function returns the value of the covariance Γ(h) at the geodesic distance h. The covariance model uses the general form Γ(h) = σ 2 · ρ(h/ψ), where the variance σ 2 and the range ψ are vertical and horizontal scaling parameters respectively.
For example, the following command computes the value of the Askey covariance function with the parameters σ 2 = 1, ψ = π, and κ = 4 at the geodesic distance h = π/4. > covmodelCMB (pi/4, cov.pars = c(1, pi), kappa = 4, cov.model = "askey" ) [1] 0.3164062 The command plotcovmodelCMBPlot is designed to produce quick plots of theoretical covariance functions. The result for the Askey covariance function is shown in Figure 11. > plotcovmodelCMB("askey", phi = pi/4, to = pi/2, kappa = 4) If a random fields is isotropic its covariance function depends only on a geodesic distance between locations. In this case the function covCMB can be used to compute an empirical covariance function for intensity data in a CMBDataFrame or data.frame. Output is given up to a maximal geodesic distance max.dist, which can be chosen between 0 and π (by default equals π).   The function variofitCMB estimates parameters of variogram models (see equation (3) for the link between covariance and variogram functions) by fitting a parametric model from the list covmodelCMB to a sample variogram estimated by the function variogramCMB. This function is built on and extends variofit from the package geoR to specific rcosmo covariance models on the sphere.
In the example below the Matern variogram is fitted to the empirical variogram on the interval [0, 0.1] using the ordinary least squares method, see Figure 13. > varcmb <-variogramCMB(df1, max.dist = 0.1, num.bins = 30) > ols <-variofitCMB(varcmb, fix.nug=FALSE, wei="equal", cov.model= "matern") > plot(varcmb) > lines(ols, lty=2) The package also has tools to work with angular power spectra of spherical random fields. The CMB power spectrum data are freely available from the section "Cosmology products" of the Planck Legacy Archive 9 . They can be easily downloaded in a ready-to-use rcosmo format by the function downloadCMBPS.
The function covPwSp uses values of an angular power spectra to provide a covariance estimate by equation (2). As the argument of the covariance function in equation (2) is cos Θ the following code uses the inverse transformation acos to plot the covariance estimate as a function of angular distances in Figure 14.

Converting other spherical data to HEALPix format
While the HEALPix is the main representation in cosmological applications there are numerous spherical data, for example, in geosciences, that use different coordinate systems and spherical formats. This example shows how non-HEALPix spherical data can be converted to the HEALPix format for rcosmo analysis.
A HPDataFrame is suitable for storing data that is located on a sphere, but has not been preprocessed to suit HEALPix structured storage. There are various ways to assign HEALPix pixel indices to the rows of a HPDataFrame. This example presents the way when a desired resolution is specified in advance. Then the HPDataFrame constructor automatically assigns pixel indices based on coordinates provided. A row is assigned the pixel index of its closest pixel center. rcosmo also has the option of automatic computing of a required resolution for given data to assign data locations to unique pixels.
As an example we use the database with the latitude and longitude of over 13 thousand world's large cities and towns available from the World Cities Database 10 . First, cities latitudes and longitudes in degrees are converted to spherical coordinates (θ, φ) in radians. Then , we create and visualize HPDataFrame at the resolution nside = 1024, see Figure 15.

Summary and future directions
This article introduces the package rcosmo for analysis of CMB, HEALPix and other spherical data. The package integrates the HEALPix representation and various spherical geometric and statistical methods in a convenient unified framework. It opens efficient handling and analysis of HEALPix and CMB data to the R statistical community. rcosmo also introduces several new spherical statistical models and methods that were not available in R before. The package can also be very useful for researchers working in geosciences.
There are several possible extensions that would be useful to the package. Some of them include: • integrating with available Python and C++ HEALPix software, • including new spherical statistical models and methods, • further development of spherical spectral and multifractal methods, • adding new visualisation tools, • improved use of memory mapping to use on extremely high resolution images.

Appendix: Statistical model
Consider a sphere in the three-dimensional Euclidean space S 2 = x ∈ R 3 : x = 1 ⊂ R 3 .
A real-valued second-order mean-square continuous spherical random field T can be expanded in the series where {Y lm (θ, ϕ)} represents the complex spherical harmonics and a lm are random variables. If a random field is isotropic then Ea lm a * l m = δ l l δ m m C l , −l ≤ m ≤ l, −l ≤ m ≤ l , l ≥ 0.
The covariance function of the isotropic random fields T has the following representation Γ(cos Θ) = ET(θ, ϕ)T(θ , ϕ ) = 1 4π ∞ ∑ l=0 (2l + 1)C l P l (cos Θ), where P l (·) is the l-th Legendre polynomial and ∑ ∞ l=0 (2l + 1)C l < ∞. The variogram (semivariogram) function is defined by The package uses observations of the field T at HEALPix points. There are two main approaches in the statistical analysis of T that are realised in rcosmo . First approach directly uses the observations for parameter estimation and hypothesis tests. For example, the classical estimator of the isotropic variogram γ(h) takes the form of where N h is the set of the spherical location pairs at the geodesic distance h. The second approach is spectrum-based. Initially, the estimatesâ lm are computed by inverting (1). Then, empirical covariance functions and variograms can be obtained by substituting the estimated values of the angular power spectrumĈ l = 1 2l + 1 l ∑ m=−l |â lm | 2 in equations 2 and 3.