Python Programming for Data Science

Installing Python

It is recommended to install Python, IDLE, Spyder, JupyterLab and VSCode using Mambaforge, which uses the mamba package manager and the conda-forge community channel. The mamba package manager addresses many of the installation issues previously encountered with the conda package manager used in Anaconda and Miniconda.


It is useful to have a basic understanding of the Markdown syntax so you can write detailed notes in your JupyterLab interactive Python notebooks as you learn.

Basic DataTypes (str, int, float and bool)
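
As a minimal sketch of the four datatypes named in the heading, the snippet below creates one value of each and converts between two of them. The variable names and values are illustrative assumptions, not taken from the guide itself.

```python
# One example of each basic datatype (values are illustrative)
name = "Ada"        # str: text
count = 3           # int: whole number
ratio = 0.5         # float: number with a decimal point
is_ready = True     # bool: True or False

# Print the class name and value of each object
for value in (name, count, ratio, is_ready):
    print(type(value).__name__, value)

# Explicit conversion between datatypes
total = int("42") + count   # the str "42" becomes the int 42 before adding
label = str(total)          # the int result becomes a str again
print(label)
```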

Procedural Programming

This beginner guide looks at the inbuilt Python programming language and the concept of basic procedural programming. In procedural programming, code is executed line by line in the order specified, for example within a script file.
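
A short script can illustrate this line-by-line execution. The sketch below is an assumed example (the rectangle dimensions are made up): each statement runs once, from top to bottom, and later lines can only use names assigned on earlier lines.

```python
# Each statement runs once, in order, from top to bottom
width = 4
height = 3

# These lines rely on width and height already being assigned above
area = width * height
perimeter = 2 * (width + height)

print("area:", area)
print("perimeter:", perimeter)
```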

Code Blocks and Debugging

So far we have only looked at procedural programming, where every line is executed in order, line by line. It is now worthwhile understanding the concept of code blocks. The Spyder 5 Debugger is used in this guide to visualize how these code blocks operate. if, elif, else code blocks can be used to execute code dependent on a condition. A for loop code block can be used to repeat a block of code over an iterable object and a while loop can be used to repeat code while a condition is satisfied. Functions can be used to compartmentalize code, particularly code that is going to be used several times. We also discuss how functions have their own local environment (namespace). Finally, we discuss the try, except, else, finally code blocks which are used for error handling.
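
The code blocks described above can be sketched together in one small script. The function names `classify` and `safe_divide`, and the `attempts` list, are hypothetical names chosen for illustration.

```python
def classify(number):
    """Return a label for number using an if, elif, else block."""
    if number < 0:
        return "negative"
    elif number == 0:
        return "zero"
    else:
        return "positive"


# for loop: repeat the indented block once per item in an iterable
labels = []
for value in [-2, 0, 5]:
    labels.append(classify(value))
print(labels)

# while loop: repeat the indented block while the condition is True
countdown = 3
while countdown > 0:
    countdown -= 1

attempts = []  # records every call made to safe_divide


def safe_divide(numerator, denominator):
    """Divide two numbers, returning None instead of raising on zero."""
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        result = None                               # runs only if the error occurred
    else:
        result = round(result, 2)                   # runs only if no error occurred
    finally:
        attempts.append((numerator, denominator))   # always runs
    return result


print(safe_divide(1, 4))
print(safe_divide(1, 0))
```

The name `result` exists only inside `safe_divide`, which reflects the local namespace of a function, whereas `attempts` lives in the script's global namespace.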

Object Orientated Programming (OOP)

Python is an Object Orientated Programming (OOP) language where everything we interact with is an object. Each object has a class, which initially can be conceptualised as an abstract blueprint which defines how to create a new object and outlines the properties and functionality behind an object. The properties can be thought of as data belonging to the object and are known as attributes. The functionality can be thought of as functions belonging to the object, known as methods. We use the dot syntax to access an attribute or method from an instance of a class. Because methods are functions, they have to be called using parentheses (which enclose any mandatory positional or optional keyword input arguments). Object Orientated Programming is used all over Python and this beginner tutorial on OOP using the inbuilt Python programming language will help you understand the workings behind commonly used Python libraries.
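
As a minimal sketch of these ideas, the hypothetical `Rectangle` class below has two attributes and two methods. The class and its names are assumptions made for illustration only.

```python
class Rectangle:
    """Blueprint for rectangle objects (a hypothetical example class)."""

    def __init__(self, width, height=1.0):
        # Attributes: data belonging to each instance
        self.width = width
        self.height = height

    def area(self):
        # Method: a function belonging to the instance
        return self.width * self.height

    def scale(self, factor=2.0):
        # Method with an optional keyword argument; returns a new instance
        return Rectangle(self.width * factor, self.height * factor)


# Create an instance with one positional and one keyword argument
rect = Rectangle(3.0, height=2.0)

print(rect.width)    # attribute access uses the dot syntax, no parentheses
print(rect.area())   # method access uses the dot syntax and is called with parentheses

bigger = rect.scale(factor=3.0)
print(type(bigger).__name__)
```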


The numpy Library

Python's inbuilt data structures such as lists, tuples and dicts are not optimised for numeric operations. For numeric operations we should instead use the numpy library, which is based around the ndarray data structure. numpy is an abbreviation for Numerical Python. numpy should be considered the primary data science library, as most other data science libraries build upon numpy. Having a basic understanding of numpy will help when it comes to looking at the other data science libraries, particularly pandas.
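
The difference between a list and an ndarray can be seen with a single multiplication. This is a small sketch with made-up values, assuming numpy is installed and imported under its conventional alias `np`.

```python
import numpy as np

prices = [1.5, 2.0, 4.25]   # illustrative values

# Multiplying a list by an int repeats the list, it does not scale the values
doubled_list = prices * 2
print(len(doubled_list))

# Multiplying an ndarray by a number scales every element
arr = np.array(prices)
doubled = arr * 2
print(doubled)

# ndarrays also carry numeric attributes and methods
print(arr.dtype, arr.shape, arr.mean())
```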

The pandas Library

The pandas library is built around three data structures. The index is by default a RangeIndex (comparable to a numpy arange array) or sometimes a list of strings. The series is essentially a numpy array where each value is tied to a corresponding index value, and the series has a name. The series can be conceptualised as a column, with each value in the series shown to the right-hand side of its corresponding index value. Finally, there is the dataframe, which is a collection of series which all share a common index. The dataframe is conceptually analogous to an Excel spreadsheet, and most data manipulation that can be done in Excel can be done programmatically in pandas. The dataframe is one of the most commonly used data structures within data science.
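
The three data structures can be sketched as follows, assuming pandas is installed and imported under its conventional alias `pd`. The column names, index labels and values are made up for illustration.

```python
import pandas as pd

# A series: values tied to an index, plus a name
heights = pd.Series([1.62, 1.75, 1.80], name="height_m")
print(type(heights.index).__name__)   # the default index when none is supplied

# A dataframe: several series sharing one index (here a list of strings)
df = pd.DataFrame(
    {"height_m": [1.62, 1.75, 1.80], "mass_kg": [55.0, 70.0, 81.0]},
    index=["ann", "bob", "cat"],
)

# A spreadsheet-style calculated column, computed row by row via the shared index
df["bmi"] = df["mass_kg"] / df["height_m"] ** 2

# Selecting a column returns a series that carries the dataframe's index
print(df["bmi"])
print(df.loc["bob", "mass_kg"])
```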

The matplotlib Library

This guide will look at the use of matplotlib, which is an abbreviation for the matrix plotting library, i.e. a library which plots data from numpy ndarrays. When using matplotlib, the pyplot module, an abbreviation for Python plot, is typically used. This guide will first explore the use of pyplot via procedural programming and then look at using object orientated programming (OOP), which increases flexibility. This guide looks at common 2D plots such as a line plot, scatterplot, bar graph, histogram, pie chart, boxplot and violinplot, and 3D plots such as contour and surface plots.
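
The two styles can be compared on the same line plot. This is a minimal sketch assuming matplotlib and numpy are installed; the sine data is an illustrative choice, and the non-interactive "Agg" backend is selected only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: no display available; omit this line in JupyterLab or Spyder
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)

# Procedural style: pyplot functions act on the current figure and axes
plt.figure()
plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.close()

# OOP style: keep explicit figure and axes objects and call their methods
fig, ax = plt.subplots()
(line,) = ax.plot(x, y)
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
line.set_linestyle("--")   # the returned line object can be modified later
```

Holding on to `fig`, `ax` and `line` is what gives the OOP style its extra flexibility, since each object can be adjusted after the plot has been created.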

The seaborn Library

This guide will look at the use of seaborn, which is a wrapper around the matplotlib plotting library optimised for the pandas dataframe data structure. seaborn includes functions to set a consistent style and palette across plots. The seaborn plotting library includes a number of plot types. These are output either on a matplotlib AxesSubplot (axes level) or on a seaborn FacetGrid (figure level). seaborn splits the data in a dataframe using the categories in a categorical series, for example to give multiple lines in a line plot.
