3.1 Tools Used
The problem itself was solved in Python, making use of established packages. From the SciPy ecosystem, these were numpy, a library for multidimensional arrays and vectorized operations on them (Oliphant 2006); scipy, a library for scientific computing (McKinney 2010); pandas, a library for data analysis (McKinney 2011); and matplotlib, for high-quality plots (Hunter 2007). In addition, the machine learning library scikit-learn (Pedregosa et al. 2011) and the high-level data visualization library seaborn (Waskom et al. 2014) were used.
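As a brief illustration of what "vectorized operations" means for numpy, the following minimal sketch standardizes a small array without an explicit loop. The data and the variable names (`values`, `standardized`, `looped`) are hypothetical and are not taken from the analysis in this report.

```python
import numpy as np

# Hypothetical example data, not taken from the report.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Vectorized standardization: the subtraction and division are applied
# to every element at once, without an explicit Python loop.
standardized = (values - values.mean()) / values.std()

# The same computation written as an element-by-element loop, for comparison.
mean, std = values.mean(), values.std()
looped = np.array([(v - mean) / std for v in values])

# Both approaches should agree up to floating-point tolerance.
same = np.allclose(standardized, looped)
```

The vectorized form is usually shorter and faster on large arrays, because the loop runs inside numpy's compiled code rather than in the Python interpreter.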
Calculations were run in the cloud on a Linux virtual machine.⁵
Except for the development of a helper package, most programming was performed in interactive Jupyter notebooks (Kluyver et al. 2016).
The report was written in rmarkdown (Allaire et al. 2018) using knitr (Xie 2015) and bookdown (Xie 2016) to render the document into several output formats.
All work was tracked in version control.
⁵ For details, refer to: https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/overview, accessed on 5.6.2019