The R Journal: article published in 2016, volume 8:1

R Packages to Aid in Handling Web Access Logs
Oliver Keyes, Bob Rudis and Jay Jacobs, The R Journal (2016) 8:1, pages 360-366.

Abstract Web access logs contain information on HTTP(S) requests and form a key part of both industry and academic explorations of human behaviour on the internet. But the preparation (reading, parsing and manipulation) of that data is just unique enough to make generalized tools unfit for the task, in both programming time and processing time, and these costs are compounded by the large data sets common with web access logs. In this paper we explain and demonstrate a series of packages designed to efficiently read in, parse and munge access log data, allowing researchers to handle URLs and IP addresses easily. These packages are substantially faster than existing R methods, with speedups ranging from 3-500% for file reading to 57,000% for URL parsing.

Received: 2016-01-29; online 2016-06-13
CRAN packages: httr, ApacheLogProcessor, webreadr, readr, microbenchmark, urltools, XML, lubridate, iptools, rgeolocate, Rcpp
CRAN Task Views implied by cited CRAN packages: WebTechnologies, HighPerformanceComputing, NumericalMathematics, ReproducibleResearch, TimeSeries


CC BY 4.0
This article is licensed under a Creative Commons Attribution 4.0 International license.

@article{RJ-2016-026,
  author = {Oliver Keyes and Bob Rudis and Jay Jacobs},
  title = {{R Packages to Aid in Handling Web Access Logs}},
  year = {2016},
  journal = {{The R Journal}},
  doi = {10.32614/RJ-2016-026},
  url = {https://doi.org/10.32614/RJ-2016-026},
  pages = {360--366},
  volume = {8},
  number = {1}
}