PyConZA 2018

Bayesian Analysis in Python: A Starter Kit

Bayesian techniques present a compelling alternative to the frequentist view of statistics, providing a flexible approach to extracting a swathe of meaningful information from your data. The learning curve is somewhat steep, but the benefits of adding Bayesian techniques to your tool suite are enormous!

What are the bare essentials that you need to know to start applying Bayesian techniques? This talk will provide an entry-level discussion covering the following topics:

What can Bayes do for me? (A brief introduction to Bayesian methods)
Understanding Markov Chain Monte Carlo. (MCMC is what happens behind the scenes)
What is Stan? (Writing models in Stan)
Using Stan in Python. (The PyStan package)
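
To give a flavour of what MCMC does behind the scenes, here is a minimal random-walk Metropolis sketch in plain Python. It is an illustration only and is not taken from the talk: the coin-flip data (7 heads in 10 flips), the flat prior and all function names are assumptions. Stan itself defaults to a more sophisticated sampler (NUTS, a variant of Hamiltonian Monte Carlo).

```python
import math
import random


def log_posterior(theta, heads=7, flips=10):
    """Unnormalised log posterior for a coin's bias under a flat prior.

    The data (7 heads in 10 flips) are made up for illustration.
    """
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior mass outside (0, 1)
    return heads * math.log(theta) + (flips - heads) * math.log(1.0 - theta)


def metropolis(n_samples, step=0.2, start=0.5, seed=0):
    """Draw samples with a Gaussian random-walk Metropolis sampler."""
    rng = random.Random(seed)
    theta = start
    current = log_posterior(theta)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


samples = metropolis(20000)
kept = samples[2000:]  # discard early draws as warm-up
posterior_mean = sum(kept) / len(kept)
print(f"Estimated posterior mean: {posterior_mean:.3f}")
```

With a flat prior the exact posterior here is Beta(8, 4), whose mean is 8/12, so the estimate can be checked against a known answer.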

The talk will be peppered with useful tips for dealing with the initial challenges of using Stan with Python.

Introduction to Python for Data Science, Part 1

This is the first half of the 2-session tutorial.

Python is a popular platform for doing Data Science. The two dominant libraries, pandas and sklearn, provide extensive functionality for data preparation, data manipulation and Machine Learning. This workshop will provide an introduction to using these libraries.

Specifically, we’ll cover the following topics:

What is Data Science?
Grabbing data from various sources
Working with Series and DataFrame objects
Dealing with funky data (missing data and outliers)
Overview of Machine Learning
Keeping it simple using Nearest Neighbours
Capturing a trend: LinearRegression
Predicting categories: DecisionTreeClassifier
Binary outcomes: LogisticRegression
Using Pipeline to streamline your workflow
Cross Validation
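
As a rough preview of how the last few topics fit together, the sketch below chains a scaling step and a LogisticRegression model in a Pipeline and scores it with 5-fold cross validation. It is not the workshop's own material: the breast cancer data set bundled with sklearn, the step names and the parameter values are assumptions chosen only to keep the example self-contained.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A binary-outcome data set that ships with sklearn (no download needed).
X, y = load_breast_cancer(return_X_y=True)

# The Pipeline applies each step in order, so scaling is refitted on the
# training folds only and never sees the held-out fold.
model = Pipeline([
    ("scale", StandardScaler()),
    ("classify", LogisticRegression(max_iter=1000)),
])

# One accuracy score per fold.
scores = cross_val_score(model, X, y, cv=5)
print("Mean accuracy:", scores.mean())
```

Wrapping the preprocessing inside the Pipeline, rather than scaling the whole data set up front, is what keeps the cross-validation estimate honest.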

The workshop will be intensely hands-on, so you will definitely need a laptop. Instructions for getting everything set up will be provided prior to the workshop.

No prior knowledge of Data Science or Machine Learning is assumed, although it will be helpful if you have worked with a spreadsheet before and are moderately competent with basic Python.

We will work with a diverse selection of data sets and perform a variety of analyses. Along the way we’ll build and submit an entry to a Kaggle competition. By the end of the day you will be functionally competent to venture forth on your own Data Science projects.

Setup instructions

Please ensure that you have the following installed and tested:

Python 3
Jupyter
Modules: numpy, pandas, scipy, matplotlib and sklearn (installed as the scikit-learn package).
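
One possible way to test the installation is a short script like the one below. It is only a sketch: the helper name missing_modules is hypothetical, and the script checks only that each module can be found, not that it works correctly. Jupyter is not covered and is easiest to check by launching it.

```python
import importlib.util
import sys

# Import names of the modules listed above.
REQUIRED = ["numpy", "pandas", "scipy", "matplotlib", "sklearn"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if sys.version_info < (3,):
    print("Python 3 is required, found", sys.version.split()[0])

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```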

Two easy ways to get all of the above are:

install Anaconda, or
use the datawookie/jupyterhub Docker image.

Introduction to Python for Data Science, Part 2

This is the second half of the 2-session tutorial.