Aviation data has become increasingly more accessible to the public thanks to the adoption of technologies such as Automatic Dependent Surveillance-Broadcast (ADS-B) and Mode S, which provide aircraft information over publicly accessible radio channels. Furthermore, the OpenSky Network provides multiple public resources to access such air traffic data from a large network of ADS-B receivers. Here, we present openSkies, the first R package for processing public air traffic data. The package provides an interface to the OpenSky Network resources, standardized data structures to represent the different entities involved in air traffic data, and functionalities to analyze and visualize such data. Furthermore, the portability of the implemented data structures makes openSkies easily reusable by other packages, therefore laying the foundation of aviation data engineering in R.
The ADS-B aircraft surveillance technology enables the tracking of aircraft through the reception by ground stations of a signal broadcast by the aircraft itself (Rekkas and Rees 2008). Such a system does not require any external input and is instead fully dependent on data obtained by the aircraft navigation system. Additionally, unlike other radar-based systems used for aircraft surveillance, no interrogation signals from the ground stations are required. This is in opposition to, for example, the air traffic control radar beacon system operating in Mode S (Selective), where specific aircraft send a response message only after reception of another signal (the interrogation) emitted by a ground antenna. In the ADS-B, messages broadcast by an aircraft can include precise location information, typically determined by a Global Navigation Satellite System, such as GPS (Global Positioning System). Alternatively, it is possible to estimate the position of the aircraft from the timestamps of reception of the broadcast signal at different stations through multilateration (Kaune et al. 2012).
The OpenSky Network initiative provides free access to high-quality air traffic data derived from a network of thousands of ground stations that receive ADS-B messages, and, more recently, FLARM ("Flight Alarm") messages as well (Schäfer et al. 2014). Both ADS-B and FLARM messages are essentially radio signals broadcasted by an aircraft that can be received by other aircraft or ground stations. They carry positional information and other data about the broadcasting aircraft (with the main difference being that the FLARM system is primarily aimed at avoiding collisions. Therefore, it also transmits a prediction of the trajectory of the aircraft in the next 20 seconds). More than 25 trillion ADS-B messages and over 3 billion FLARM messages have been collected by the OpenSky Network since 2013, corresponding to over 400,000 aircraft in 190 countries. Such data provides the basis to develop novel algorithms for the determination of aircraft position and the analysis of air traffic data. Access to the available data is provided through two main tools: a live API (which also supports the retrieval of historical data, although with some limitations) and an Impala shell (Apache Software Foundation 2020) that provides fast access to the full historical dataset, to which free access can be obtained for research and other non-profit purposes. However, these resources by themselves do not provide the data in appropriate data structures that make them amenable to advanced analyses available in R and other R packages.
We present openSkies, an R package that aims to provide a well-established and documented basis for the analysis of air traffic in R and to facilitate the future development of related algorithms. To that extent, access to the OpenSky Network live API and Impala shell is provided, as well as a decoder of raw ADS-B messages. A set of data structures that represent relevant concepts (such as aircraft, flights, or airports) is also implemented and used to store the data retrieved from the OpenSky Network. Relevant methods to process and analyze the corresponding data are associated with each data structure.
The package also includes functions to perform statistical analysis and to visualize the retrieved data, enabling, for example, the clustering of trajectories and the representation of aircraft and flight paths in a given airspace. Additional functionalities can be easily built on the already implemented classes and functions, either in future versions of openSkies or in other packages developed by the R community. In the following sections, we describe the different features of the package and demonstrate how each of them can be accessed and applied to real data.
Firstly, we describe the implemented data structures required to model air traffic (2). We then show the functions that allow instantiating said classes with data obtained from the OpenSky Network (3). Next, the currently available tools to visualize (4) and cluster (5) are presented. Finally, a tool to decode raw ADS-B messages is presented (6). Examples with real data are provided throughout the different sections to illustrate the features of the package.
In order to establish standardized data structures that could serve as the basis for future developments in air traffic analysis in R, a set of R6 classes representing the frequently involved entities was implemented. R6 was chosen as the class system due to the possibility to establish formal definitions of reference classes, as well as the smaller memory footprint and faster speeds of object instantiation, field access, and field setting compared to base R’s Reference Classes system.
Additionally, R6 classes are portable. Therefore, the classes defined in openSkies can be used by other packages.
In its current version (1.1.3), openSkies defines classes for the following entities: aircraft, airports, flight instances, flight routes, single aircraft state vectors, and series of aircraft state vectors (Figure 1).
Class "openSkiesStateVector" represents the state vector of an
aircraft at a given time. A state vector comprises multiple pieces of
information about the aircraft and its status, including positional
information (latitude, longitude, altitude, whether the aircraft is on
the ground) and trajectory information (velocity, track angle, vertical
rate). Sets of "openSkiesStateVector" are represented with class
"openSkiesStateVectorSet", which contains two fields: a list of
objects of class "openSkiesStateVector" and a logical, indicating if
the set of state vectors corresponds to a time series of the same
aircraft (field time_series). Class "openSkiesStateVectorSet"
contains methods to add more objects of class "openSkiesStateVector"
to the contained list of state vectors, get interpolations of any of the
numerical fields of the state vectors (uniformly spaced or at specific
time points), sort the list of state vectors by a specific field, get
the values of a given field for all the state vectors, and split the set
of state vectors into individual flight instances. The latter operation
is performed by firstly grouping the state vectors by aircraft and then
splitting them into flights based on two conditions: either the aircraft
staying on the ground for a given amount of time or the aircraft not
sending any status update for a given period (with default values of 300
seconds and 1800 seconds).
A flight instance is defined as a single flight performed by an
aircraft, including typically (but not necessarily) take-off, air
transit, and landing. Flights are represented class "openSkiesFlight",
which contain fields for the aircraft that performed the flight, the
departure and arrival times, the airports of origin and destination (as
objects of class "openSkiesAirport"), and a set of state vectors
describing the aircraft during the flight, which is an object of class
"openSkiesStateVectorSet" with field time_series=TRUE. The class
contains methods to get the state vector of the aircraft for a specified
time, calculate the duration of the flight, and calculate the distance
(based on features extracted from the trajectories) to another object of
class "openSkiesFlight".
Airports are represented by class "openSkiesAirport", with fields
describing, among others, the airport codes, its position (latitude,
longitude, and altitude), and its location in terms of administrative
divisions of different levels.
Flight routes, which differ from flight instances in that they are
designated paths followed regularly by aircraft to go from one airport
to another one, are represented by class "openSkiesRoute". The class
contains fields for the call sign and flight number associated with the
route, its operator, and its airports of origin and destination (objects
of class "openSkiesAirport").
Finally, aircraft are represented by class "openSkiesAircraft", which
contains fields describing the aircraft model, registration,
manufacturer, owner, and operator. Objects of this class can also
contain a field with an object of class "openSkiesStateVectorSet"
representing the history of known state vectors of the aircraft in a
given period.
While it is possible to manually instantiate any of the classes
previously described by providing values for the required fields with
information obtained from any source, a more convenient way is to
instantiate them from information automatically retrieved from the
OpenSky Network. To that extent, a series of functions that retrieve
data from the different resources offered by the OpenSky Network is
implemented. State vectors can be accessed with two functions. Firstly,
getSingleTimeStateVectors retrieves all the state vectors received at
a specified time as an "openSkiesStateVectorSet" object with field
time_series=FALSE. In the default behavior, state vectors received
from any aircraft at any location are returned. However, the results can
be filtered by specifying one or more aircraft and the boundaries of an
area of interest. On the other hand, getAircraftStateVectorsSeries
retrieves a time series of state vectors for a given aircraft with the
specified time resolution, in the form of an "openSkiesStateVectorSet"
object with field time_series=TRUE. Both functions can retrieve data
from the OpenSky Network through two resources integrated seamlessly. By
default, the live API is employed. The live API does not require
registration but imposes limitations (larger for anonymous users) on the
retrieval of historical data and the time resolution with which data can
be accessed. Additionally, large queries can take considerable amounts
of time, especially when retrieving time series of state vectors.
Therefore, the functions can also access the database through the Impala
shell, which requires registration but does not impose such limits and
performs queries significantly faster. Authentication for these two
functions can be performed by providing login details through the
username and password arguments, and access through the Impala shell
can be enabled by authorized users simply by setting argument
useImpalaShell=TRUE. New users can register through the OpenSky
Network website, and Impala shell access
should be directly requested to the OpenSky Network administrators.
In both cases, the resulting objects of class
"openSkiesStateVectorSet" can be used to instantiate objects of class
"openSkiesFlight" by calling the split_into_flights method.
Alternatively, functions to retrieve flight instances from the OpenSky
Network are available, based on the aircraft that performed them
(getAircraftFlights) or the airport of origin (getAirportDepartures)
or destination (getAirportArrivals). In all cases, it is possible to
include the time series of state vectors corresponding to each flight by
setting includeStateVectors=TRUE and airport metadata through
includeAirportsMetadata=TRUE.
Airport and aircraft metadata can also be retrieved independently with
the getAirportMetadata and getAircraftMetadata functions by
providing the ICAO 4-letter code of the airport of interest or the ICAO
24 bit address of the target aircraft, respectively.
Finally, information about routes can be accessed by providing the call
sign of the route to the getRouteMetadata function, which includes the
possibility to retrieve the metadata of the associated airports of
origin and destination if includeAirportsMetadata=TRUE.
In the following example, we aim to demonstrate some of the most common methods for retrieving air traffic data and analyzing it.
  library(openSkies)
  
  ## In the following example, we retrieve information about all flights
  ## that departed from Frankfurt Airport the 29th of January 2018 after 12 pm.
  ## It should be noted that such large requests can take a long time unless
  ## performed with the Impala shell
  ## It should also be noted that, in this and the following examples, the
  ## values for the username and password arguments should be substituted with
  ## personal credentials registered at the OpenSky Network
  flights <- getAirportArrivals(airport="EDDF", startTime="2018-01-29 12:00:00", 
                                endTime="2018-01-29 24:00:00", timeZone="Europe/Berlin",
                                includeStateVectors = TRUE, timeResolution = 60,
                                useImpalaShell = TRUE, username = "user",
                                password = "password")
  ## We can then easily check the amount of flights
  length(flights) # 316 flights
  
  ## Trajectories of the 316 flights can be obtained by retrieving
  ## the set of state vectors of each flight
  trajectories_frankfurt <- lapply(flights, function(f) f$state_vectors)
  
  ## It is also possible to retrieve all state vectors received from a 
  ## given aircraft in a given period of time. In the following example, we
  ## obtain all the state vectors received for aircraft with ICAO24 code
  ## 403003 between 12 PM of the 8th of October, 2020 and 6 PM of the 
  ## 9th of October, 2020
  stateVectors <- getAircraftStateVectorsSeries("403003", startTime = "2020-10-08 12:00:00", 
                                                endTime = "2020-10-09 18:00:00", 
                                                timeZone="Europe/London",
                                                timeResolution = 60,                         
                                                useImpalaShell = TRUE, 
                                                username = "user",
                                                password = "password")
  
  ## The ensemble of state vectors can then be split into individual
  ## flight instances, revealing that the aircraft performed 6 flights
  ## in the analyzed period
  flights <- stateVectors$split_into_flights() 
  length(flights) # 6 flights
  
  ## Let us get some additional information about the flights performed by
  ## this aircraft. For example, the maximum speed that it reached in km/h
  maxSpeed <- max(stateVectors$get_values("velocity", removeNAs = TRUE))/1000*3600
  
  ## The maximum speed that it reached is just above 210 km/h. This is well below 
  ## the speed typically used by commercial passenger jets, usually around 800 km/h
  ## Investigation of the aircraft model confirms that it is indeed a small CESSNA 152
  aircraftData <- getAircraftMetadata("403003")
  aircraftData$modelTwo main methods for visualization of air traffic are currently available in openSkies: plotting of flights and plotting of aircraft at a given time.
The first method aims to represent flights performed by one or more
aircraft over a given period of time. If only a single flight is to be
plotted, the plotRoute function can be used, which receives as its
main input an object of class "openSkiesStateVectorSet" with
time_series=TRUE. Alternatively, plotRoutes can be used for plotting
multiple flights, receiving in this case as input a list of
"openSkiesStateVectorSet" objects with time_series=TRUE. The color
of the routes can be manually set by providing a vector of color names
to pathColors and setting literalColors=TRUE. It is also possible to
color the routes according to the value of any property by providing
such values as a factor to pathColors and setting
literalColors=FALSE.
The second visualization method (plotPlanes) enables the plotting of
all aircraft flying over a defined area at a particular time. An
"openSkiesStateVectorSet" object with each state vector representing
the position of an aircraft at the desired time should be provided as
input, most frequently obtained through getSingleTimeStateVectors.
All plotting functions allow plotting onto any object of class ggmap
from the ggmap package
(Kahle et al. 2019). If no ggmap object is provided, one will be created
internally, with boundaries large enough to contain all the positions in
the provided "openSkiesStateVectorSet" objects, plus additional space
determined by the paddingFactor argument.
The following examples demonstrate the application of visualization functions. First, the formerly retrieved trajectories for the CESSNA 152 are plotted with different colors:
  ## First, let us obtain the trajectories of the flights performed
  ## by the CESSNA 152
  trajectories_CESNA152 <- lapply(flights, function(f) f$state_vectors)
  
  ## Then, we create a color palette
  library(viridis)
  colors <- magma(length(trajectories))
  
  ## Then, the trajectories are plotted
  plotRoutes(trajectories, pathColors = colors, lineSize = 1.2,
             lineAlpha = 0.8, includeArrows = TRUE,
             paddingFactor = 0.05)The resulting plot is shown in Figure 2. Next, we plot all the aircraft flying over Switzerland in a given instant:
  ## Firstly we retrieve the state vectors of all aircraft flying over Switzerland
  ## the 2nd of March, 2018 at 14.30 (London time)
  vectors_switzerland <- getSingleTimeStateVectors(time="2018-03-02 14:30:00", 
                                                   timeZone="Europe/London", 
                                                   minLatitude=45.8389,
                                                   maxLatitude=47.8229, 
                                                   minLongitude=5.9962, 
                                                   maxLongitude=10.5226, 
                                                   username="user", 
                                                   password="password", 
                                                   useImpalaShell = TRUE)
  ## Then, the aircraft are plotted
  plotPlanes(vectors_switzerland)The result is shown in Figure 3.
As a means to analyze trajectories, openSkies provides the
functionalities to perform clustering of trajectories. The first step is
to obtain a matrix of features of the trajectories to be clustered. This
can be achieved by providing a list of "openSkiesStateVectorSet"
objects to the getVectorSetListFeatures function.
The "openSkiesStateVectorSet" objects in the list should each
correspond to a single flight instance, and therefore it is advisable to
first split the data into individual flights with method
split_into_flights. It is possible to include track angles in the
generated features matrix by setting useAngles=TRUE, which can be
desirable if flights going along the same route but in opposite
directions should be classified separately.
The computed features matrix (or, alternatively, the list of
"openSkiesStateVectorSet" objects, in which case the features matrix
will be calculated internally) can then be passed to the clusterRoutes
function. A number of different clustering algorithms can be selected
through the method argument. Possible methods are dbscan, kmeans,
hclust, fanny, clara, and agnes.
The desired number of clusters can be specified through
numberClusters, although this argument is ignored if the DBSCAN
algorithm (Hahsler and Piekenbrock 2019) is used since it does not require a priori
determination of the number of clusters. Instead, the eps parameter
controls the size of the epsilon neighborhood to be passed to the
DBSCAN, which influences the number of found clusters (any two
trajectories will be considered to belong to the same cluster if their
calculated distance is below the threshold epsilon).
For all other methods (Maechler et al. 2019), if a number of clusters is not
chosen, it will be internally estimated. The results of clustering can
be visualized by providing the factor with the assigned clusters to the
pathColors argument of plotRoutes. In the following example, we
perform clustering of the previously obtained set of 316 flights
departing from Frankfurt airport (Figure 4).
  ## As an example, let´s cluster the previously retrieved 316 flights departing
  ## from Frankfurt airport into 8 clusters with the k-means algorithm
  clusters = clusterRoutes(trajectories_frankfurt, "kmeans", numberClusters = 8)
  
  ## The results can be visualized with plotRoutes
  plotRoutes(trajectories_frankfurt, pathColors = clusters$cluster, 
             literalColors = FALSE, paddingFactor = 0.1)
  
  ## The clusters differ in their broad area of destination. For example,
  ## clusters 4, 5 and 6 comprise mostly flights to North America, Japan and
  ## central Asia respectively. Due to their geographical proximity, it is expected
  ## that flights from cluster 5 are closer to flights from cluster 6 than
  ## to those assigned to cluster 4. Let us confirm this:
  flights[clusters$cluster==5][[1]]$distance_to_flight(flights[clusters$cluster==6][[1]])
  ## Returns 123.3449
  flights[clusters$cluster==5][[1]]$distance_to_flight(flights[clusters$cluster==4][[1]])
  ## Returns 348.4957
  
  ## The shorter distance between flights of clusters 5 and 6 is in accordance with
  ## the expectations.Finally, openSkies also provides a decoder of raw ADS-B messages,
implemented as an instance of R6 class "adsbDecoder" with the methods
required to decode messages. This tool is useful for users willing to
extract air traffic data from their own sources, such as private ADS-B
receivers. Firstly, the messages, typically available in hexadecimal
format, must be converted to binary format, which can be achieved with
the hexToBits function. The binary messages can then be passed to the
decodeMessage method, which decodes a single message. It should be
noticed that, due to the Compact Position Reporting format with which
the position of airborne aircraft is encoded, at least 2 consecutive
positional ADS-B messages are required in order to unequivocally
determine the position of the aircraft (referred to as even and odd
frames).
Therefore, the decoder caches the most recently even and odd decoded message, to decode the next complementary positional message.
It is also possible to provide a list of messages in binary format to
the decodeMessages method, which will directly provide positional
information about the aircraft if the list contains two complementary
even and odd positional messages. In the following example, we
demonstrate both the sequential and batch decoding of ADS-B messages:
  ## Let use the following set of 2 ADS-B messages in hexadecimal format
  ## to illustrate the ADS-B decoder
  message1 <- "8D40621D58C386435CC412692AD6"
  message2 <- "8D40621D58C382D690C8AC2863A7"
  
  ## First, the messages must be converted to binary format:
  message1_bin <- ADSBDecoder$hexToBits(message1)
  message2_bin <- ADSBDecoder$hexToBits(message2)
  
  ## We can then obtain positional information by sequentially decoding
  ## both messages. Note that no positional information is obtained from the
  ## data extracted from message1, since this is the first message of the two
  ## that, as a pair, encode positional information.
  message1_decoded <- ADSBDecoder$decodeMessage(message1_bin)
  message2_decoded <- ADSBDecoder$decodeMessage(message2_bin)
  message2_decoded$data$lat # The latitude is 52.2572 degrees North
  message2_decoded$data$lon # The longitude is 3.919373 degrees East
  
  ## We can also provide both messages as a list to decode them simultaneously:
  all_messages_decoded <- ADSBDecoder$decodeMessages(list(message1_bin, message2_bin))openSkies is in active development. Therefore, novel features are expected to be available through frequent updates.
Some of the features that will become available in the short and medium-term include:
New technologies such as ADS-B surveillance and initiatives like the OpenSky Network have largely increased the amount of air traffic data that is publicly available, as well as its accessibility. However, the data provided by said resources must be parsed and formatted adequately in order to be analyzed in-depth and serve as the starting point to develop new algorithms useful for the field, such as novel multilateration methods.
However, while appropriate toolboxes have been implemented in other languages (Olive and Basora 2019), such a standardized set of tools was not available in R until the development of openSkies, which aims to fill this void. One of the key features of our package is the automation and streamlining of the data engineering required in order to access air traffic data at a large scale and to enable data scientists to make meaningful analyses with it. The implemented portable data structures and functionalities provide a solid base to apply the vast range of functionalities provided by other R packages to air traffic data. The ecosystem created with future packages that interoperate with openSkies, together with reference datasets (Schäfer et al. 2020), should serve to standardize and expand the scope of air traffic data analysis in R.
The development of this work is supported by the Spanish Ministry of Science and Innovation (grant code PID2019-105471RB-I00) and the Regional Government of Andalusia (grant code P18-RT-1060).
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Ayala, et al., "openSkies - Integration of Aviation Data into the R Ecosystem", The R Journal, 2021
BibTeX citation
@article{RJ-2021-095,
  author = {Ayala, Rafael and Ayala, Daniel and Vidal, Lara Sellés and Ruiz, David},
  title = {openSkies - Integration of Aviation Data into the R Ecosystem},
  journal = {The R Journal},
  year = {2021},
  note = {https://doi.org/10.32614/RJ-2021-095},
  doi = {10.32614/RJ-2021-095},
  volume = {13},
  issue = {2},
  issn = {2073-4859},
  pages = {590-599}
}