anomalyDetection: Implementation of Augmented Network Log Anomaly Detection Procedures

As the number of cyber-attacks continues to grow on a daily basis, so does the delay in threat detection. For instance, in 2015, the Office of Personnel Management (OPM) discovered that approximately 21.5 million individual records of Federal employees and contractors had been stolen. On average, the time between an attack and its discovery is more than 200 days. In the case of the OPM breach, the attack had been going on for almost a year. Currently, cyber analysts inspect numerous potential incidents on a daily basis, but have neither the time nor the resources available to perform such a task. anomalyDetection aims to curtail the time frame in which anomalous cyber activities go unnoticed and to aid in the efficient discovery of these anomalous transactions among the millions of daily logged events by i) providing an efficient means for pre-processing and aggregating cyber data for analysis by employing a tabular vector transformation and handling multicollinearity concerns; ii) offering numerous built-in multivariate statistical functions, such as Mahalanobis distance, factor analysis, and principal components analysis, to identify anomalous activity; and iii) incorporating the pipe operator (%>%) to allow it to work well in the tidyverse workflow. Combined, these features offer cyber analysts an efficient and simplified approach to breaking up network events into time-segment blocks and identifying periods associated with suspected anomalies for further evaluation.

Robert J. Gutierrez, Bradley C. Boehmke, Kenneth W. Bauer, Cade M. Saie, Trevor J. Bihl
2017-08-04

Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded at RJ-2017-039.zip

CRAN packages used

anomalyDetection, magrittr, tidyverse

CRAN Task Views implied by cited packages

WebTechnologies

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Gutierrez, et al., "anomalyDetection: Implementation of Augmented Network Log Anomaly Detection Procedures", The R Journal, 2017

BibTeX citation

@article{RJ-2017-039,
  author = {Gutierrez, Robert J. and Boehmke, Bradley C. and Bauer, Kenneth W. and Saie, Cade M. and Bihl, Trevor J.},
  title = {anomalyDetection: Implementation of Augmented Network Log Anomaly Detection Procedures},
  journal = {The R Journal},
  year = {2017},
  note = {https://doi.org/10.32614/RJ-2017-039},
  doi = {10.32614/RJ-2017-039},
  volume = {9},
  issue = {2},
  issn = {2073-4859},
  pages = {354-365}
}