Conference Report: Why R? 2019

The ‘Conference Report: Why R? 2019’ article from the 2020-1 issue.

Michał Burdukiewicz (Warsaw University of Technology, Why R? Foundation), Filip Pietluch (University of Wrocław), Jarosław Chilimoniuk (University of Wrocław), Katarzyna Sidorczuk (University of Wrocław), Dominik Rafacz (Warsaw University of Technology), Leon Eyrich Jessen (Technical University of Denmark), Stefan Rödiger (Brandenburg University of Technology Cottbus–Senftenberg), Marcin Kosiński (Gradient Metrics LLC, Why R? Foundation), Piotr Wójcik (University of Warsaw, Data Science Lab)
2020-06-01
Figure 1: Why R? 2019 conference banner used for social media promotion.

1 Why R? 2019 conference

Why R? conferences have been the hallmark of the Why R? Foundation (whyr.pl). Our goal has been to establish a series of international R-related events in Poland. After three years, we are happy to announce that our main event, the Why R? conference, has become one of the largest annual R conferences in Central Europe.

Why R? 2019 was the third edition of the Why R? conference. After the previous edition, held in Wrocław (Burdukiewicz et al. 2018), the conference returned to Warsaw. Approximately 300 people from 20 countries attended the main conference event. The event took place from 26 to 29 September 2019 and was co-organised by the Faculty of Economic Sciences of the University of Warsaw (wne.uw.edu.pl/en/), a leading academic institution in Poland with important achievements in quantitative methods and data science. We received major support from the ML in PL Society (mlinpl.org), a group of young researchers who aim to promote machine learning events in Poland and who shared their resources and experience to make the conference more accessible.

For the first time, the conference featured a language-agnostic data visualization hackathon (whyr.pl/2019/hackathon). Such an event gives the Why R? community a chance to exchange experience and inspiration with users of other languages and tools.

2 Participants

Although Why R? events are aimed at experienced data science practitioners, each conference attracts a high percentage of students (around 30%). Our participants have very diverse scientific backgrounds, of which mathematics (mainly statistics) and computer science are the most common. All of them have jobs related to data science, including professional R developers (programmers), data engineers, machine learning practitioners and business analysts. One of the key advantages of Why R? is that it brings together participants from both academia and industry.

3 Conference program

Figure 2: Why R? 2019 conference programme.

The format of the conference was designed to expose participants to recent developments in the R language as well as a wide range of application examples. The event consisted of workshops, invited keynote talks, field-specific series of talks, lightning talks, special interest groups and a full-day data visualization hackathon. It offered extensive networking opportunities. The welcome party was held at the conference venue on the first day of lectures. In addition, many informal gatherings were organised during each conference day, as the event took place close to the Old Town.

To sum up, Why R? 2019 consisted of one day of hackathon (60 attendees), one day of workshops (150 attendees), one evening of round tables, two days of lectures (250 attendees) and one evening Welcome paRty (100 attendees). In 2019 we hosted a total of 315 unique attendees. The lectures comprised 6 keynote talks, 42 regular talks and 14 lightning talks. The conference agenda is shown below.

Figure 3: Why R? 2019 conference agenda.

Materials from the conference are available on GitHub and YouTube:
- abstracts: github.com/WhyR2019/abstracts
- presentations: github.com/WhyR2019/presentations
- videos: whyr.pl/youtube/

4 Data Visualizations Hackathon

On the day before the conference we organized the free Data Visualizations Hackathon. It was a great opportunity for networking and for exchanging experience between data scientists who use different programming languages. The challenge was based on data from the Google Places API, which allows searching for places in a particular area. Using this API, we gathered data on places in Warsaw, their working hours and occupancy. Based on this data source, the participants, divided into 10 teams, were asked to prepare a useful business application powered by data visualization solutions and techniques.

5 Pre-meetings

Figure 4: Locations and dates of the main Why R? 2019 conference and Why R?-branded pre-meetings.

Why R? 2019 was preceded by fourteen pre-meetings in eight countries. The purpose of those meetings was to provide a space for professional networking and knowledge exchange for practitioners and students in the areas of statistical machine learning, programming, optimization and data science. The Why R? Foundation supported the organisation of pre-meetings financially and/or by sending speakers.

The organisation of pre-meetings would not be possible without the wonderful support of local R communities. Aside from the promotion of Why R? we had a great opportunity to interact with other R enthusiasts.

6 Workshops

Figure 5: Why R? 2019 workshops.

The Why R? 2019 conference offered a wide portfolio of workshops. Materials from the workshops are available in the GitHub repository github.com/WhyR2019/workshops.

7 Invited talks

The invited talks covered topics from statistics, computer science, natural sciences and economics. The speakers were as follows:

Marvin Wright

Random forests used to be everywhere, from Microsoft Kinect to meteorology, but their popularity dropped considerably with the advent of deep learning. During his keynote talk at Why R? 2019, Marvin N. Wright showed that random forests can still be used in machine learning routines, making the whole process time- and cost-efficient.

Implementing a real-life machine learning solution is not only about the best performance. Marvin showed that, considering the trade-off between performance and the cost of the analysis, random forests are still unbeatable. Aside from the methodological background, Marvin gave an overview of random forest implementations in R (Wright and Ziegler 2017).

Marvin is a Postdoc at the Leibniz Institute for Prevention Research and Epidemiology in Bremen, Germany. He is the author of several R packages, including the fastest implementation of random forest in R, ranger. He holds a Ph.D. in Biostatistics from the University of Lübeck, supervised by Andreas Ziegler. In the past, Marvin worked at the University of Lübeck. He was a visiting researcher at the University of Copenhagen. Also, he spent some time in the automotive and health insurance industries. His main research interests are interpretable machine learning, genetic epidemiology and survival analysis.

Jakub Nowosad

Jakub Nowosad’s keynote lecture was a great opportunity to learn about geostatistics. Jakub, a co-author of Geocomputation with R (Lovelace et al. 2019), focused on tools used to solve real-life problems in spatial data analysis.

The growing importance of spatial data stimulates a rapid evolution of geostatistical methods. Jakub, as an active member of the #rspatial community, not only presented cutting-edge tools but also gave his unique insight into the future of spatial data analysis.

Jakub is an assistant professor in the Department of Geoinformation at the Adam Mickiewicz University in Poznan, Poland. His main research is focused on developing and applying spatial methods in order to expand our understanding of processes and patterns in the environment. He has extensive teaching experience in the fields of spatial analysis, geostatistics, statistics, and machine learning.

Sigrid Keydana

We know how accurate our predictions are, but do we really know how certain they are? Sigrid Keydana (RStudio) addressed this question during her keynote lecture.

Sigrid presented tfprobability, an interface to TensorFlow Probability, a tool for obtaining uncertainty estimates from deep neural networks. This exciting tool can be extended beyond a classic deep learning framework into complex hierarchical models.

Sigrid is an Applied Researcher at RStudio. She has experience as a psychologist, software developer and data scientist. She is passionate about exploring the borders of deep learning, especially by helping users to apply the power of deep learning in R.

Steph Locke

Machine learning models find their place in almost every area of our life, influencing things as small as the video recommendations on YouTube or as big as the length and severity of a sentence in a criminal procedure. With the growing importance of machine learning, it becomes more and more important to train models while keeping in mind their ethical consequences.

During her keynote talk at Why R? 2019, Steph Locke discussed ethical concerns in data science. Apart from pointing out existing issues, she also presented solutions leading to fairer and more transparent machine learning models.

Steph is the founder of a consultancy in the UK. Her talks, blog posts, conferences, and business all have one thing in common – they help people get started with data science. Steph holds the Microsoft MVP award for her community contributions. In her spare time, Steph plays board games with her husband and takes copious pictures of her doggos.

Wit Jakuczun

Wit Jakuczun from WLOG Solutions presented a talk about deployment, "How to make R great for machine learning in (not only) Enterprise".

For many years, software engineers have put enormous effort into developing best practices for delivering stable and maintainable software. How can R users benefit from this experience? Wit answered this question by going through several concepts and tools that are natural for software engineers but are often undervalued by R users.

Paula Brito

During her keynote lecture at Why R? 2019, Paula Brito gave a unique insight into the world of symbolic data, where data points are represented not as single values but as more complex structures, like sets or intervals (Noirhomme‐Fraiture and Brito 2011).

A classical paradigm of data science assumes that variables, like gender or educational stage, are represented as a single value per observation. Paula showed how to use her package, MAINT.Data, to model interval data using its symbolic representation, which leads to more accurate and robust models.
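To make the idea of interval-valued observations concrete, the following is a minimal Python sketch, not the MAINT.Data API. The `Interval` class, the `to_features` helper and the temperature values are hypothetical, and the choice to summarise each interval by its midpoint and log-range is an assumption made for illustration.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Interval:
    """One interval-valued observation (hypothetical illustration type)."""
    lower: float
    upper: float

    def __post_init__(self):
        # A proper interval needs a strictly positive width,
        # otherwise the log-range below is undefined.
        if not self.lower < self.upper:
            raise ValueError("lower bound must be below upper bound")

    @property
    def midpoint(self) -> float:
        return (self.lower + self.upper) / 2

    @property
    def log_range(self) -> float:
        return math.log(self.upper - self.lower)


def to_features(intervals):
    """Map each interval to a (midpoint, log-range) pair of real numbers."""
    return [(iv.midpoint, iv.log_range) for iv in intervals]


# Made-up daily temperature ranges: each observation is an interval,
# not a single value.
daily_temperatures = [
    Interval(10.0, 18.0),
    Interval(12.0, 22.0),
    Interval(-1.0, 3.0),
]
features = to_features(daily_temperatures)
```

Under this assumed parameterisation, each observation keeps both its location and its spread, whereas a classical single-value representation (for example the daily mean) would discard the spread.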

Paula is Associate Professor at the Faculty of Economics of the University of Porto, and member of the Artificial Intelligence and Decision Support Research Group (LIAAD) of INESC TEC, Portugal. Her current research focuses on the analysis of multidimensional complex data, known as symbolic data, for which she develops statistical approaches and multivariate analysis methodologies.

8 Round tables

Round tables are networking-oriented social mixers devoted to connecting people with similar interests. The exact points discussed during a round table and its style depend on the moderators, who shape the details based on the general agenda provided by the Why R? organizers. The organizing committee both selects the topics of the round tables and invites appropriate moderators.

Diversity in Data Science

This round table aims to inspire members of affinity groups to pursue careers in data science. We hope that this platform for networking will increase the diversity of the R community. Moderator: Barbara Sobkowiak (Women in Machine Learning & Data Science Poland).

Career-planning in Data Science

Participants of Why R? will have a chance to learn from more experienced R enthusiasts about their career paths. Moderator: Kamil Kosiński (PwC).

Teaching Data Science

Practitioners will share their experiences in introducing their students to basic and advanced concepts of data science. Moderator: Patrick Schratz (Ludwig Maximilian University of Munich).

Data Visualizations

A discussion of good practices in data visualization and approaches to various presentation challenges. Moderator: Michał Burdukiewicz (Warsaw University of Technology).

Ethics in Data Science

With the increased importance of machine learning, we are becoming more and more concerned about the ethics of data science. Moderator: Steph Locke (Locke Data).

9 Conference organizers

The organizing committee consisted of Klaudia Korniluk, Marcin Kosiński, Michał Burdukiewicz, Jarosław Chilimoniuk, Katarzyna Sidorczuk, Filip Pietluch, Weronika Puchała and Dominik Rafacz.

The quality of the scientific program of the conference was the achievement of Stefan Rödiger (Brandenburg University of Technology Cottbus–Senftenberg), Piotr Wójcik (University of Warsaw) and Bernd Bischl (Ludwig Maximilian University of Munich).

10 Acknowledgements

We would like to express our gratitude to all our sponsors, the Faculty of Economic Sciences (University of Warsaw), ML in PL Society, local organizers of the pre-meetings and student helpers.

11 Additional information

Why R? 2019 website: http://whyr.pl/2019

Corporate sponsors: PwC Poland, iDash, R Consortium, Jumping Rivers Ltd., Appsilon Data Science, RStudio, Inc., Analyx GmbH, Pearson IOKI and WLOG Solutions.

References

J. J. Allaire, F. Chollet, RStudio, Google, Y. Tang, D. Falbel, W. V. D. Bijl and M. Studer. keras: R Interface to 'Keras'. 2018. URL https://CRAN.R-project.org/package=keras [online; last accessed July 6, 2018].
P. Biecek. DALEX: Descriptive mAchine Learning EXplanations. 2018. URL https://CRAN.R-project.org/package=DALEX [online; last accessed July 6, 2018].
M. Binder, F. Pfisterer, B. Bischl, M. Lang and S. Dandl. mlr3pipelines: Preprocessing Operators and Pipelines for 'mlr3'. 2019. URL https://CRAN.R-project.org/package=mlr3pipelines. R package version 0.1.1.
M. Burdukiewicz, M. Karas, L. E. Jessen, M. Kosiński, B. Bischl and S. Rödiger. Conference Report: Why R? 2018. The R Journal, 10(2): 572–578, 2018. URL https://journal.r-project.org/archive/2018-2/whyR.pdf.
M. Fasiolo, Y. Goude, R. Nedellec and S. N. Wood. Fast calibrated additive quantile regression. 2017. URL https://arxiv.org/abs/1707.03307.
M. Fasiolo, R. Nedellec, Y. Goude and S. N. Wood. Scalable visualisation methods for modern generalized additive models. 2018. URL https://arxiv.org/abs/1809.10632.
A. Gosiewska and P. Biecek. auditor: An R package for model-agnostic visual validation and diagnostic. ArXiv e-prints, 2018. URL http://adsabs.harvard.edu/abs/2018arXiv180907763G. Provided by the SAO/NASA Astrophysics Data System.
R. J. Hijmans, J. van Etten, J. Cheng, M. Mattiuzzi, M. Sumner, J. A. Greenberg, O. P. Lamigueiro, A. Bevan, E. B. Racine, A. Shortridge, et al. raster: Geographic Data Analysis and Modeling. 2017. URL https://CRAN.R-project.org/package=raster [online; last accessed July 6, 2018].
W. M. Landau. The drake R package: A pipeline toolkit for reproducibility and high-performance computing. Journal of Open Source Software, 3(21), 2018. URL https://doi.org/10.21105/joss.00550.
M. Lang, B. Bischl, J. Richter, P. Schratz and M. Binder. mlr3: Machine Learning in R - Next Generation. 2019. URL https://CRAN.R-project.org/package=mlr3. R package version 0.1.4.
R. Lovelace, J. Nowosad and J. Muenchow. Geocomputation with R. CRC Press, 2019. Google-Books-ID: 8W2PDwAAQBAJ.
M. Noirhomme‐Fraiture and P. Brito. Far beyond the classical data models: Symbolic data analysis. Statistical Analysis and Data Mining: The ASA Data Science Journal, 4(2): 157–170, 2011. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/sam.10112 [online; last accessed December 1, 2019].
E. Pebesma. Simple Features for R: Standardized Support for Spatial Vector Data. The R Journal, 10(1): 439–446, 2018. URL https://doi.org/10.32614/RJ-2018-009.
M. Tennekes. tmap: Thematic Maps in R. Journal of Statistical Software, 84(6): 1–39, 2018. DOI 10.18637/jss.v084.i06.
S. N. Wood. Generalized Additive Models: An Introduction with R, Second Edition. CRC Press, 2017. URL https://books.google.dk/books?id=JTkkDwAAQBAJ.
M. N. Wright and A. Ziegler. ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1), 2017. URL http://arxiv.org/abs/1508.04409 [online; last accessed December 1, 2019]. arXiv: 1508.04409.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Burdukiewicz, et al., "Conference Report: Why R? 2019", The R Journal, 2020

BibTeX citation

@article{RJ-2020-1-whyR,
  author = {Burdukiewicz, Michał and Pietluch, Filip and Chilimoniuk, Jarosław and Sidorczuk, Katarzyna and Rafacz, Dominik and Jessen, Leon Eyrich and Rödiger, Stefan and Kosiński, Marcin and Wójcik, Piotr},
  title = {Conference Report: Why R? 2019},
  journal = {The R Journal},
  year = {2020},
  note = {https://rjournal.github.io/},
  volume = {12},
  issue = {1},
  issn = {2073-4859},
  pages = {484-494}
}