# matplotlib

This guide looks at matplotlib, a plotting library whose name reflects its origins as a MATLAB-style plotting library for Python. Most work is done through its pyplot module, an abbreviation for Python plot. Before getting started with matplotlib you should be comfortable with the ndarray data structure, as ndarrays are used to store the data that is to be plotted.

This guide will first explore the basics of pyplot via procedural programming and then look at further customisation using object orientated programming (OOP). This guide will use the Spyder 5 IDE as it has a Variable Explorer which fully supports matplotlib.

It is recommended to make sure that you are comfortable with the Python Programming language including functional programming and object-orientated programming. You should also have familiarity with the numpy library and the creation of basic ndarrays. Use of matplotlib builds upon this skillset. If you are a beginner, please see my other Python guides first:

## Automatic vs Inline Plotting

By default, a plot displays as a static image within Spyder's Plots pane.

We can change the plot backend by going to Tools → Preferences:

Then select IPython console and open the Graphics tab. We can change the backend from Inline to Automatic. Then select Apply:

Once we restart the kernel, plots will instead display as interactive figures, each within its own window:

This guide will use Automatic plotting so we can examine what is going on as we plot.

## Importing the Data Science Libraries

We first need to import our data science libraries: numpy and pandas, which we use to create data structures, and the pyplot module of the matplotlib library, which we use to plot.

We will run this code cell in the Kernel, so we can access code-completion options for these libraries:

#%% data science libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Procedural Programming

Once this is loaded, we can view the code completion for the pyplot module when we type in:

plt.

In procedural based programming, functions are called from the pyplot module. If we select the Help Tab in Spyder and search for:

plt.plotting

We are presented with a table giving the name and description of each commonly used plotting function:

We can also view this documentation in the console using:

? plt.plotting

Let's have a look at a selection of these:

Let's create some data:

xdata = np.linspace(start=-2*np.pi, stop=2*np.pi, num=10)
ydata = np.sin(xdata)

This can be viewed in the variable explorer:

We have chosen only a small number of points num=10 so it is easier to visualise:

Let's have a look at using the first 7 of these functions. We will have a look at these by first glancing through their docstring and then inputting them line by line in the console, watching how the plot gets constructed:

plt.figure(num=1, figsize=(4,3), dpi=200)

Note we have assigned num to an integer, which gives Figure 1. If we had not assigned this keyword argument, the next unused number (which would also have been 1) would have been used. If a Figure 1 had already existed, using num=1 would have selected the existing Figure 1 instead of creating a new one.

We have set figsize to the tuple (4, 3), which gives the (width, height) of the figure in inches. At a dpi of 100 this would have given a figure of 400 dots by 300 dots. We have however set the dots per inch, dpi, to 200, which instead gives us a figure that is 800 by 600 dots. The figsize and dpi will take on default values if not specified.
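The arithmetic above can be checked by reading the size back from the Figure. This is a minimal sketch that assumes a fresh session in which Figure 1 does not already exist; the names width_px and height_px are illustrative only, and on a high-resolution display an interactive backend may rescale the reported dpi.

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches; dpi is dots per inch
fig = plt.figure(num=1, figsize=(4, 3), dpi=200)

# size in dots = size in inches * dots per inch
width_in, height_in = fig.get_size_inches()
width_px = width_in * fig.get_dpi()
height_px = height_in * fig.get_dpi()
print(width_px, height_px)
```

Doubling dpi while keeping figsize fixed doubles the number of dots in each direction without changing the nominal size in inches.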

We see in the Console that a Figure object is created.

Let's now use plt.axes, to add axes to the figure:

plt.axes()

We see that with no keyword input arguments, a single axes spanning the dimensions of the Figure is added:

This is another object known as an AxesSubplot.

Now let's add a plot. The brief docstring isn't so descriptive.

*args means that the function takes a variable number of positional input arguments. We will use the simplest form of two, which in this case correspond to the x and y data respectively and must be arrays of equal size. **kwargs means the function takes multiple keyword input arguments; we will leave all of these at their defaults for now:

plt.plot(xdata, ydata)

We see a line display on the AxesSubplot and the dimensions of the AxesSubplot automatically scale for the data:

We see that the line is a Line2D object:

We can use the functions xlabel, ylabel and title to add an xlabel, ylabel and title to the AxesSubplot:

plt.xlabel("x")
plt.ylabel("y")
plt.title("y=sin(x)")

The title displays correctly but the xlabel and the ylabel are truncated off the edge of the figure.

We can see that these are all Text objects:

Sometimes to display these correctly, we need to configure a tight layout:

plt.tight_layout()

Let's have a look at using some of the functions to customise the Axes and to add a grid:

#%% axes customisations
plt.xlim(left=-6, right=6)
plt.ylim(bottom=-1.5, top=1.5)
plt.xticks(ticks=[-5, 0, 5])
plt.yticks(ticks=[-1, 0, 1])
#%% grid
plt.minorticks_on()
plt.grid(which="both")
#%% tight layout
plt.tight_layout()

Let's now create more data.

xdata = np.linspace(start=-2*np.pi, stop=2*np.pi, num=10)
ydata = np.sin(xdata)
ydata2 = np.cos(xdata)

To each plot we will add a label, and when we call the function plt.legend() it will automatically use these labels:

plt.plot(xdata, ydata, label="y=sin(x)")
plt.plot(xdata, ydata2, label="y=cos(x)")
plt.legend()

plt.title("mathematical functions")

By default matplotlib places the legend somewhere within the AxesSubplot where it minimises overlap with data. The position can be changed by assigning the keyword input argument loc to a string: "upper right", "upper left", "lower left", "lower right", "right", "center left", "center right", "lower center", "upper center" or "center". For example:

plt.legend(loc="center")

We can also position the legend using the keyword input argument bbox_to_anchor and assigning it to a tuple. For plt.legend the co-ordinates are by default normalised with respect to the AxesSubplot: (0, 0) is the bottom left and (1, 1) is the top right, so values above 1 place the legend outside the AxesSubplot. For example:

plt.legend(bbox_to_anchor=(1.1, 0.5))
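The keyword input arguments loc and bbox_to_anchor can also be combined. The following is a minimal sketch rather than the only way to do this: loc names the point on the legend box that gets pinned and bbox_to_anchor gives where to pin it. The value (1.02, 0.5) is an arbitrary choice for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

xdata = np.linspace(start=-2*np.pi, stop=2*np.pi, num=10)

plt.figure()
plt.plot(xdata, np.sin(xdata), label="y=sin(x)")
plt.plot(xdata, np.cos(xdata), label="y=cos(x)")

# pin the centre of the legend's left edge just right of the axes, halfway up
legend = plt.legend(loc="center left", bbox_to_anchor=(1.02, 0.5))
plt.tight_layout()

# the legend entries are taken from the label of each line
labels = [text.get_text() for text in legend.get_texts()]
print(labels)
```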

Let's now have a look at the exponential function:

#%% data science libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#%% generate data
xdata = np.linspace(start=-2*np.pi, stop=2*np.pi, num=10)
ydata = np.sin(xdata)
ydata2 = np.cos(xdata)
ydata3 = np.exp(xdata)
#%% plot basics
plt.figure(num=1, figsize=(4,3), dpi=200)
plt.axes()
plt.plot(xdata, ydata3)
plt.xlabel("x")
plt.ylabel("y")
plt.title("mathematical functions")
#%% grid
plt.minorticks_on()
plt.grid(which="both")
#%% tight layout
plt.tight_layout()


By default both axes use a linear scale. Each can be switched between "linear" and "log" using the functions xscale and yscale respectively. Using a log y axis with an exponential function makes it easier to visualise the small changes at the beginning:

plt.yscale("log")

And the log of an exponential is a straight line.
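A quick numerical check of this statement, as a sketch using only numpy: on a log axis the plotted position is proportional to log(y), so a straight line means log(y) changes by the same amount for every equal step in x. The name slopes is illustrative only.

```python
import numpy as np

xdata = np.linspace(start=-2*np.pi, stop=2*np.pi, num=10)
ydata3 = np.exp(xdata)

# slope of log(y) against x between each pair of neighbouring points
log_y = np.log(ydata3)
slopes = np.diff(log_y) / np.diff(xdata)
print(slopes)
```

Every slope is the same (to floating-point precision), which is why the exponential appears as a straight line once the y axis is set to "log".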

We can use this as an example of creating a Figure with two AxesSubplots. Instead of using:

plt.axes()

We use:

plt.subplot(nrows, ncols, num)

For example for one row and two columns:

plt.figure(num=1, figsize=(4,3), dpi=200)
plt.subplot(1, 2, 1)
plt.subplot(1, 2, 2)

We can create a Figure with two Axes. The first AxesSubplot will be configured to show the exponential in linear and the second AxesSubplot will be configured to show the exponential in log:

#%% data science libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#%% generate data
xdata = np.linspace(start=-2*np.pi, stop=2*np.pi, num=10)
ydata3 = np.exp(xdata)
#%% plot basics
plt.figure(num=1, figsize=(4,3), dpi=200)
#%% subplot 1
plt.subplot(1,2,1)
plt.plot(xdata, ydata3)
plt.xlabel("x")
plt.ylabel("y")
plt.yscale("linear")
#%% subplot 2
plt.subplot(1,2,2)
plt.plot(xdata, ydata3)
plt.xlabel("x")
plt.ylabel("y")
plt.yscale("log")
plt.tight_layout()

## Object Orientated Programming

With procedural programming we created a Figure, AxesSubplot and then Line2D object using functions directly from pyplot. When we created these we did not assign them to object names. An analogy is:

5

vs:

num = 5

In the latter case, we can view the object num in the variable explorer and we can later interact with it:

plt.figure(num=1, figsize=(4,3), dpi=200)

We assign this to an object name:

fig = plt.figure(num=1, figsize=(4,3), dpi=200)

fig now displays in the variable explorer and a number of methods can be called from the figure:

The Figure method add_axes is roughly equivalent to plt.axes and the Figure method add_subplot is roughly equivalent to plt.subplot. When these are called using object orientated programming, the AxesSubplot is typically assigned to an object name so it can be seen in the variable explorer. Let's create a single AxesSubplot spanning the figure:

ax = fig.add_axes(rect=(0, 0, 1, 1))

Note the AxesSubplot spans the entire figure:

We do not see the square box enclosing the AxesSubplot that we see when we use plt.axes. The reason for this is the keyword input argument rect of the Figure method add_axes, which specifies the placement of this box with respect to the figure. rect requires four floats (xstart, ystart, xwidth, yheight), normalised with respect to the figure. It should be supplied explicitly, as recent matplotlib releases raise a TypeError when it is omitted. With (0, 0, 1, 1) the contents of the box of the AxesSubplot span the entire Figure and, as a consequence, the box itself and any labels added are outside the Figure and therefore not shown:

fig = plt.figure(num=1, figsize=(4,3), dpi=200)
ax = fig.add_axes(rect=(0, 0, 1, 1))

We can see what is going on if we use rect=(0.01, 0.01, 0.98, 0.98), which gives a 0.01 spacing on all four sides:

fig = plt.figure(num=1, figsize=(4,3), dpi=200)
ax = fig.add_axes(rect=(0.01, 0.01, 0.98, 0.98))

More typically, we would start at the bottom left at a normalised Figure co-ordinate of (0.2, 0.2), which gives room for an xlabel and a ylabel as well as the xticks, yticks, xtick labels and ytick labels. The xwidth and yheight can both be specified as 0.7, which leaves 0.1 at the right and 0.1 at the top:

fig = plt.figure(num=1, figsize=(4,3), dpi=200)
ax = fig.add_axes(rect=(0.2, 0.2, 0.7, 0.7))

The AxesSubplot ax has a number of methods. The most common ones have equivalents to the pyplot functions. Many of the methods have a complementary get and set method to read and write properties respectively, the pyplot function normally has equivalent behaviour to the set method:
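As a minimal sketch of this get and set pairing, the example below uses the xlabel and the x limits, which were set earlier with plt.xlabel and plt.xlim. The variable names xlabel and xlim are illustrative only.

```python
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_axes(rect=(0.2, 0.2, 0.7, 0.7))

# set methods write a property, as plt.xlabel and plt.xlim did earlier
ax.set_xlabel("x")
ax.set_xlim(left=-6, right=6)

# get methods read the same property back
xlabel = ax.get_xlabel()
xlim = ax.get_xlim()
print(xlabel, xlim)
```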

For example, the pyplot functions we used earlier:

fig = plt.figure(num=1, figsize=(4,3), dpi=200)
lines = plt.plot(xdata, ydata)

The Line2D now displays on the AxesSubplot:

In the variable explorer, the variable lines is a list of Line2D objects. The single line can be accessed using index 0:

The AxesSubplot is available as an attribute from the Figure and Lines2D object:

ax == fig.axes[0]
ax == lines[0].axes

Recall that lines is a list so we need to select index 0 to get the line of interest (the only line in this case). As a Figure can contain multiple AxesSubplots, the attribute axes is a list and we need to once again select index 0 (the only AxesSubplot in this list). A Line2D can only belong to one AxesSubplot so this attribute is the axes directly.

Likewise the figure can be accessed from the AxesSubplot and Line2D as the attribute figure. In either case these can only belong to one figure so the attribute is singular:

fig == ax.figure
fig == lines[0].figure

Finally the Line2D can be accessed from the AxesSubplot as an attribute lines followed by the correct index. This is a list as an AxesSubplot can hold multiple Line2D:
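The relationships described above can be sketched as follows. The name line_from_axes is illustrative only, and the comparisons use is to confirm that each attribute refers to the very same object rather than a copy.

```python
import numpy as np
import matplotlib.pyplot as plt

xdata = np.linspace(start=-2*np.pi, stop=2*np.pi, num=10)
ydata = np.sin(xdata)

fig = plt.figure()
ax = fig.add_axes(rect=(0.2, 0.2, 0.7, 0.7))
lines = ax.plot(xdata, ydata)

# the Line2D seen from the AxesSubplot is the same object returned by plot
line_from_axes = ax.lines[0]
print(line_from_axes is lines[0])

# and the Line2D and AxesSubplot both point back up to their parents
print(lines[0].axes is ax)
print(ax.figure is fig)
```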

When a Figure or AxesSubplot is created, it is automatically selected. Any pyplot function acts on the currently selected Figure or AxesSubplot. The functions gcf and gca can be used to get the current Figure or current AxesSubplot so they can be assigned to an object name:

plt.figure(num=1, figsize=(4,3), dpi=200)
plt.axes()
fig = plt.gcf()
ax = plt.gca()

A previous Figure created by procedural programming can be reselected using the figure number:

# create new figure using number 1
plt.figure(num=1)
# create new figure using the next unused number, in this case 2
plt.figure()
# reselect figure 1 and assign to an object name
fig = plt.figure(1)
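A short sketch confirming this reselection behaviour. It assumes no figures are open beforehand so that numbering starts at 1, which is why it begins by closing everything; the names fig1, fig2 and reselected are illustrative only.

```python
import matplotlib.pyplot as plt

# assumption: start with no open figures (this closes any existing ones)
plt.close("all")

fig1 = plt.figure(num=1)
fig2 = plt.figure()          # takes the next unused number
reselected = plt.figure(1)   # selects the existing Figure 1

print(reselected is fig1)    # same object, not a new Figure
print(plt.gcf() is fig1)     # Figure 1 is the current Figure again
print(fig2.number)
```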

With the above in mind, it is worthwhile examining some of the pyplot functions and comparing them to Figure methods and AxesSubplot methods:

Note some of the methods such as legend can be called from both the Figure and the AxesSubplot: the Figure method gives a single legend for the whole Figure, drawing on every AxesSubplot it contains, while the AxesSubplot method gives a legend for a single AxesSubplot.

## Subplots

Instead of creating a Figure and AxesSubplot separately:

fig = plt.figure(num=1, figsize=(4,3), dpi=200)
ax = fig.add_axes(rect=(0.2,0.2,0.7,0.7))

It is more common to use the pyplot function subplots, which returns a tuple: index 0 is the Figure and index 1 is the AxesSubplot, or an ndarray of AxesSubplots when more than one is requested.

fig, ax = plt.subplots()

By default if all keyword input arguments are left at their default, a single AxesSubplot displays and the AxesSubplot is automatically placed on the Figure allowing spacing for the xticks, yticks, xtick labels, ytick labels, xlabels, ylabels and title:

The keyword input arguments nrows and ncols can be assigned to create a Figure with multiple AxesSubplots:

fig, ax = plt.subplots(nrows=2, ncols=2)

To select the AxesSubplot we must index into the ndarray ax. For example:

ax[0, 1].plot(xdata, ydata)
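The shape of the returned ndarray follows nrows and ncols, as this minimal sketch shows. The titles are only there to make the indexing visible, and fig2 and ax2 are illustrative names.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(nrows=2, ncols=2)
print(type(ax).__name__, ax.shape)

# ax.flat visits every AxesSubplot row by row
for idx, axes in enumerate(ax.flat):
    axes.set_title(f"index {idx}")

# a single row (or a single column) gives a one-dimensional array instead
fig2, ax2 = plt.subplots(nrows=1, ncols=2)
print(ax2.shape)
```

With a one-dimensional result, a single index such as ax2[1] is enough to select an AxesSubplot.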

This function also takes in the keyword input arguments from the figure function, to set the figure num, figsize and dpi:

fig, ax = plt.subplots(nrows=2, ncols=2, num=10, figsize=(4,3), dpi=200)

We can also use subplot_mosaic, which takes a rectangular list of equally sized lists of string values as its layout:

fig = plt.figure()
ax = fig.subplot_mosaic([["plot", "semilogy"],
["step", "semilogx"]])

ax is a Python dictionary and each AxesSubplot can be accessed by indexing using the name of the key:

ax["plot"].plot(xdata, ydata3)

The semilogy plot is the same as a standard plot with the yscale automatically set to "log":

ax["semilogy"].semilogy(xdata, ydata3)

The step plot steps between neighbouring data points instead of drawing a straight line between them:

ax["step"].step(xdata, ydata3)

## Color

Color (US spelling) is an attribute found widely across matplotlib. In matplotlib we can use the following color designations:

String names and 1 letter abbreviations exist for common colors. Other colors can be designated using the normalised (r, g, b) tuple or, more commonly, the hexadecimal value. Under the hood, the color of a pixel on a computer screen is controlled by three Light Emitting Diodes (LEDs): a red LED, a green LED and a blue LED. The intensities of these three LEDs are detected by three types of sensors in our eyes, and our brain maps the ratio seen by these sensors to a color.
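The different designations describe the same underlying (r, g, b) values. As a small sketch, the matplotlib.colors module provides to_rgb and to_hex for converting between them; the variable names below are illustrative only.

```python
import matplotlib.colors as mcolors

# a string name, a 1 letter abbreviation and a hexadecimal value
rgb_from_name = mcolors.to_rgb("red")
rgb_from_letter = mcolors.to_rgb("r")
rgb_from_hex = mcolors.to_rgb("#FFC000")

# and back from a normalised (r, g, b) tuple to hexadecimal
hex_from_rgb = mcolors.to_hex((1.0, 0.0, 0.0))

print(rgb_from_name, rgb_from_letter)
print(rgb_from_hex)
print(hex_from_rgb)
```

Each pair of hexadecimal digits is one channel on a 0 to 255 scale, so the C0 in "#FFC000" corresponds to a normalised green value of 192/255.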

fig = plt.figure()
ax = fig.subplot_mosaic([["plot", "semilogy"],
["step", "semilogx"]])
ax["plot"].plot(xdata, ydata3, color="red")
ax["semilogy"].semilogy(xdata, ydata3, color="blue")
ax["step"].step(xdata, ydata3, color="green")

An example of using the hexadecimal values:

fig = plt.figure()
ax = fig.subplot_mosaic([["plot", "semilogy"],
["step", "semilogx"]])
ax["plot"].plot(xdata, ydata3, color="#FFC000")
ax["semilogy"].semilogy(xdata, ydata3, color="#00B050")
ax["step"].step(xdata, ydata3, color="#7030A0")

## Line2D

Matplotlib has a number of different plot types. The following all create a Line2D plot.

As they all generate a Line2D, they all contain keyword input arguments to manipulate the properties of the Line2D object such as the linewidth which is a float and the linestyle.

fig = plt.figure()
ax = fig.subplot_mosaic([["plot", "semilogy"],
["step", "semilogx"]])
lines = ax["plot"].plot(xdata, ydata3, color="#FFC000", linestyle="solid", linewidth=5)
lines2 = ax["semilogy"].semilogy(xdata, ydata3, color="#00B050", linestyle="dashed", linewidth=1)
lines3 = ax["step"].step(xdata, ydata3, color="#7030A0", linestyle="dashdot", linewidth=3)

Lines by default do not have markers to indicate each separate datapoint, but these can optionally be turned on using the keyword input argument marker and assigning it to a marker string such as "o" (circle), "v" (triangle down) or "s" (square):

fig = plt.figure()
ax = fig.subplot_mosaic([["plot", "semilogy"],
["step", "semilogx"]])
lines = ax["plot"].plot(xdata, ydata3, color="#FFC000", linestyle="dashed", linewidth=1, marker="o")
lines2 = ax["semilogy"].semilogy(xdata, ydata3, color="#00B050", linestyle="dashed", linewidth=1, marker="v")
lines3 = ax["step"].step(xdata, ydata3, color="#7030A0", linestyle="dashed", linewidth=1, marker="s")

If enabled, the marker can have a fill style, which is assigned using the keyword input argument fillstyle:

markeredgecolor, markerfacecolor and markerfacecoloralt, all use the same color encoding as previously discussed with color (which refers to the line in a Line2D).

The markersize and markeredgewidth use float values, similar to the linewidth.

fig = plt.figure()
ax = fig.subplot_mosaic([["plot", "semilogy"],
["step", "semilogx"]])
lines = ax["plot"].plot(xdata, ydata3, color="#FFC000", linestyle="dashed", linewidth=1,
marker="o", markersize=10, fillstyle="top", markeredgewidth=2,
markeredgecolor="#00B050", markerfacecolor="#7030A0", markerfacecoloralt="#ff0000")
lines2 = ax["semilogy"].semilogy(xdata, ydata3, color="#00B050", linestyle="dashed", linewidth=1,
marker="v", markersize=10, fillstyle="top", markeredgewidth=2,
markeredgecolor="#00B050", markerfacecolor="#7030A0", markerfacecoloralt="#ff0000")
lines3 = ax["step"].step(xdata, ydata3, color="#7030A0", linestyle="dashed", linewidth=1,
marker="s", markersize=10, fillstyle="top", markeredgewidth=2,
markeredgecolor="#00B050", markerfacecolor="#7030A0", markerfacecoloralt="#ff0000")

If we examine the variable explorer, we can access the line in the plot using:

lines[0]
ax["plot"].lines[0]

We can use the pyplot function getp, to get the properties of the Line2D object:

plt.getp(lines[0])

These are additional keyword input arguments of the plot function or other associated functions that generate a Line2D and can be amended using the associated pyplot function setp:

plt.setp(lines[0], linewidth=5)
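Both functions can also work on a single named property. In the following sketch, getp is given the property name as a second argument so that it returns the value instead of printing the full list; the names before and after are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

xdata = np.linspace(start=-2*np.pi, stop=2*np.pi, num=10)
ydata = np.sin(xdata)

fig, ax = plt.subplots()
lines = ax.plot(xdata, ydata)

# read one property, change two, then read the first again
before = plt.getp(lines[0], "linewidth")
plt.setp(lines[0], linewidth=5, linestyle="dashed")
after = plt.getp(lines[0], "linewidth")
print(before, after)

# the equivalent Line2D get method reports the short form of the linestyle
print(lines[0].get_linestyle())
```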

Let's recreate xdata, ydata and ydata2 and have a look at a single AxesSubplot:

xdata = np.linspace(start=-2*np.pi, stop=2*np.pi, num=10)
ydata = np.sin(xdata)
ydata2 = np.cos(xdata)

Let's have a look at using variable numbers of input arguments with the plot function. Instead of supplying *args as just x and y, we can supply x1, y1, x2, y2. Note that when we do so the default styling is used for each line added to the AxesSubplot. We cannot provide labels for each of the lines in the plot function so must use the keyword argument labels in the AxesSubplot method legend:

fig, ax = plt.subplots()
lines = ax.plot(xdata, ydata, xdata, ydata2)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend(labels=["y=sin(x)", "y=cos(x)"], bbox_to_anchor=(1.05, 0.5))
fig.tight_layout()

In the variable explorer we see that lines is now a list with two Line2D objects:

Recall we can use the pyplot functions getp and setp to get and set properties of an object such as a Line2D. If we use:

plt.getp(lines[0])

We can see from the ydata that this matches our variable ydata. The default color in hexadecimal is "#1f77b4", the linestyle is solid "-" and the linewidth is 1.5 by default.

plt.getp(lines[1])

We would see that ydata matches our variable ydata2. The default color in hexadecimal is "#ff7f0e" and once again that the linestyle is solid "-" and linewidth is 1.5 by default.

The labels are "_child0" and "_child1". We can use the pyplot function setp to change these properties. Note that we have provided the labels using the pyplot function setp keyword input argument label and therefore do not need to use the AxesSubplot legend keyword input argument labels:

fig, ax = plt.subplots()
lines = ax.plot(xdata, ydata, xdata, ydata2)
plt.setp(lines[0], label="y=sin(x)", linewidth=3, linestyle=":")
plt.setp(lines[1], label="y=cos(x)", linewidth=3, linestyle=":")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend(bbox_to_anchor=(1.05, 0.5))
fig.tight_layout()

Going back to the plot function, with each x, y pair we can provide an optional fmt string, for example x, y, fmt or x1, y1, fmt1, x2, y2, fmt2. The fmt string combines the 1 letter abbreviation for the color, the 1-2 character abbreviation for the linestyle and the 1 character abbreviation for the marker. For example:

fig, ax = plt.subplots()
lines = ax.plot(xdata, ydata, "r:o", xdata, ydata2, "b--x")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend(labels=["y=sin(x)", "y=cos(x)"], bbox_to_anchor=(1.05, 0.5))
fig.tight_layout()
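To see how each fmt string is split up, the following sketch reads the resulting properties back from the two Line2D objects. The list name colors is illustrative only, and to_hex is used so the color is reported in hexadecimal however it is stored.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

xdata = np.linspace(start=-2*np.pi, stop=2*np.pi, num=10)
ydata = np.sin(xdata)
ydata2 = np.cos(xdata)

fig, ax = plt.subplots()
lines = ax.plot(xdata, ydata, "r:o", xdata, ydata2, "b--x")

# "r:o"  -> red, dotted linestyle, circle marker
# "b--x" -> blue, dashed linestyle, x marker
colors = [mcolors.to_hex(line.get_color()) for line in lines]
for line, color in zip(lines, colors):
    print(color, line.get_linestyle(), line.get_marker())
```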

The AxesSubplot methods axvline, axhline and axline create a vertical line, a horizontal line and an infinitely long arbitrary line across an AxesSubplot. These methods all have slightly different positional input arguments but share the same keyword input arguments as the plot function, as they all generate a Line2D object:

fig, ax = plt.subplots()
lines = ax.plot(xdata, ydata, xdata, ydata2)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend(labels=["y=sin(x)", "y=cos(x)"], bbox_to_anchor=(1.05, 0.5))
vline = ax.axvline(x=0, ymin=0.25, ymax=0.75, linestyle=":", color="#606060")
hline = ax.axhline(y=0, xmin=0, xmax=1, linestyle=":", color="#606060")
axline = ax.axline(xy1=(-6,-1), xy2=(6,1), linestyle=":", color="#606060")
fig.tight_layout()

Note that in the case of axvline, x is in the same dimension as the data, while ymin and ymax are normalised floats where 0 is the bottom end of the AxesSubplot and 1 is the top end of the AxesSubplot. Conversely for axhline, y is in the same dimension as the data, while xmin and xmax are normalised floats where 0 is the left end of the AxesSubplot and 1 is the right end of the AxesSubplot. In axline the two datapoints xy1 and xy2 are in the same dimension as the data.

If we have a look at the AxesSubplot attribute lines:

ax.lines

We see that it is a list of length 5. The index corresponds to the order of the line added i.e. the blue line from plot is index 0, the orange line from plot is index 1, the vline is index 2, the hline is index 3 and the axline is index 4.

The grid can be thought of as a collection of lines. It has the keyword input argument visible, which can be assigned to a bool True or False (True is assumed if the AxesSubplot method grid is called with other keyword input arguments). It also has the keyword input argument which, which can be used to select the "major" gridlines, the "minor" gridlines or "both". It also has the keyword input argument axis, which can be set to select the "x" axis, the "y" axis, or "both" axes. The remaining keyword input arguments match those of a Line2D:

grid = ax.grid(visible=True, which="major", axis="both", color="#606060", linestyle=":")

Note that the AxesSubplot method grid has no return value and therefore the object name grid is NoneType on the variable explorer:

We can configure a grid with major and minor gridlines:

ax.grid(which="major", axis="both", color="#606060", linestyle="-")
ax.minorticks_on()
ax.grid(which="minor", axis="both", color="#606060", linestyle=":", linewidth=0.5)

There are other objects known as LineCollections, somewhat analogous to the grid:

The keyword input arguments in hlines and vlines use dimensions analogous to the data. These LineCollections have similar keyword input arguments to a Line2D but only have lines and do not have any marker options associated with the lines:

vlines = ax.vlines(x=(-6, 0, 6), ymin=-0.5, ymax=0.5, linestyle=":", color="#606060")
hlines = ax.hlines(y=(-0.75, 0.75), xmin=-2, xmax=2, linestyle=":", color="#606060") 

If we have a look in the variable explorer, we can see that hlines and vlines are LineCollection objects and not a singular Line2D:

These are found in the collections attribute of the AxesSubplot. In this example

ax.lines

corresponds to the two lines created using the plot function. And:

ax.collections

corresponds to the two LineCollections created by vlines and hlines respectively.
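This split between the two attributes can be checked with a short sketch. It assumes a new AxesSubplot holding only the two plotted lines plus the vlines and hlines from above.

```python
import numpy as np
import matplotlib.pyplot as plt

xdata = np.linspace(start=-2*np.pi, stop=2*np.pi, num=10)
ydata = np.sin(xdata)
ydata2 = np.cos(xdata)

fig, ax = plt.subplots()
lines = ax.plot(xdata, ydata, xdata, ydata2)
vlines = ax.vlines(x=(-6, 0, 6), ymin=-0.5, ymax=0.5, linestyle=":", color="#606060")
hlines = ax.hlines(y=(-0.75, 0.75), xmin=-2, xmax=2, linestyle=":", color="#606060")

# Line2D objects and LineCollections are stored separately
print(len(ax.lines))
print(len(ax.collections))

# each LineCollection holds one segment per line it draws
print(len(vlines.get_segments()), len(hlines.get_segments()))
```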

If we have a look at the properties of hlines using the pyplot function getp:

plt.getp(hlines)

We see that there are singular and plural properties, for example color and colors. A single value sets the color of all lines in the LineCollection, while a sequence of values sets each line to an individual color.

## PathCollection

Instead of using a line with markers, we can focus on markers directly using a scatter plot. The scatter plot function creates a PathCollection object:

Without using the optional keyword input arguments in the scatter function, we are using more or less identical syntax to the plot function. We can use identical lines of code to create the Figure, AxesSubplot and to customise the AxesSubplot:

fig, ax = plt.subplots()
scatter = ax.scatter(xdata, ydata)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.grid(which="major", axis="both", color="#606060", linestyle="-")
ax.minorticks_on()
ax.grid(which="minor", axis="both", color="#606060", linestyle=":", linewidth=0.5)
fig.tight_layout()

Notice on the variable explorer, scatter is a PathCollection:

It is found under the collections attribute of the AxesSubplot:

ax.collections[0] == scatter

We can use the pyplot function getp to get the properties of the PathCollection object:

plt.getp(scatter)

Similar to the LineCollection, we see that there is the potential for singular and plural values, most notably in facecolor/facecolors, edgecolor/edgecolors and sizes. In the case of a PathCollection the linewidth/linewidths and linestyle/linestyles refer to the lines around each marker. If we have a look at the brief docstring that pops up when inputting the pyplot function scatter, we see there are the keyword input arguments s and c. These set the size and color: a single value applies across all the datapoints, while an array gives each datapoint its own value.

For example:

scatter = ax.scatter(xdata, ydata, s=50, c="#ff0000")

It is less usual to use individual sizes and colors for each datapoint however we can demonstrate this using the numpy random module:

#%% data science libraries
import numpy as np
import numpy.random as random
import pandas as pd
import matplotlib.pyplot as plt
random.seed(0)
#%% generate data
xdata = np.linspace(start=-2*np.pi, stop=2*np.pi, num=10)
ydata = np.sin(xdata)
rsizes = np.random.randint(25, 150, 10)
rlinewidths = 0.5*np.random.randint(1, 10, 10)
rfacecolors = np.array([np.random.rand(3) for i in range(10)])
redgecolors = np.array([np.random.rand(3) for i in range(10)])
#%% plot basics
fig, ax = plt.subplots()
scatter = ax.scatter(xdata, ydata, sizes=rsizes, facecolors=rfacecolors, edgecolors=redgecolors, linewidths=rlinewidths)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.grid(which="major", axis="both", color="#606060", linestyle="-")
ax.minorticks_on()
ax.grid(which="minor", axis="both", color="#606060", linestyle=":", linewidth=0.5)
fig.tight_layout()

When there are a lot of overlapping datapoints, the alpha parameter is normally used to give the datapoints a degree of transparency, which makes densely populated areas stand out. For example, we can switch the y data to exp(x), increase the number of datapoints to 500 and set an alpha of 0.2:

#%% data science libraries
import numpy as np
import numpy.random as random
import pandas as pd
import matplotlib.pyplot as plt
random.seed(0)
#%% generate data
xdata = np.linspace(start=-2*np.pi, stop=2*np.pi, num=500)
ydata3 = np.exp(xdata)
rsizes = np.random.randint(25, 150, 500)
rlinewidths = 0.5*np.random.randint(1, 10, 500)
rfacecolors = np.array([np.random.rand(3) for i in range(500)])
redgecolors = np.array([np.random.rand(3) for i in range(500)])
#%% plot basics
fig, ax = plt.subplots()
scatter = ax.scatter(xdata, ydata3, sizes=rsizes, facecolors=rfacecolors, edgecolors=redgecolors, linewidths=rlinewidths, alpha=0.2)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.grid(which="major", axis="both", color="#606060", linestyle="-")
ax.minorticks_on()
ax.grid(which="minor", axis="both", color="#606060", linestyle=":", linewidth=0.5)
fig.tight_layout()


We can see that the lower values of y are more densely populated than the higher values of y.

This is clearer if we use a constant color and size:

fig, ax = plt.subplots()
scatter = ax.scatter(xdata, ydata3, s=50, c="#ff0000", alpha=0.2)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.grid(which="major", axis="both", color="#606060", linestyle="-")
ax.minorticks_on()
ax.grid(which="minor", axis="both", color="#606060", linestyle=":", linewidth=0.5)
fig.tight_layout()

## Patches

There are a number of plot types which use patches.

Let's demonstrate with a bar graph for example. In a bar graph, every x value has an associated height:

#%% data science libraries
import numpy as np
import numpy.random as random
import pandas as pd
import matplotlib.pyplot as plt
random.seed(0)
#%% generate data
xdata = np.arange(start=0, stop=5, step=1)
ydata = random.randint(1, 10, 5)

#%% plot basics
fig, ax = plt.subplots()
bar = ax.bar(x=xdata, height=ydata)
ax.set_xlabel("x")
ax.set_ylabel("random value")
fig.tight_layout()

In the variable explorer we see that bar is a BarContainer:

If we have a look at one of these bars:

bar[0]

We see that it is a patch of the type Rectangle:

And if we have a look at its properties using the pyplot function getp:

plt.getp(bar[0])

We see the properties x, y, xy, width and height which define the rectangle. However we also see alpha, edgecolor, facecolor, linewidth, linestyle and hatch which define the visual aspects of each Rectangle patch. These visual aspects are common to all patches of differing shapes.

edgecolor, linewidth and linestyle apply to the edge of the patch. hatch, alpha and facecolor apply to the patch face. We have seen how to use colors, alpha, linewidth and linestyle before. Let's have a look at hatch in more detail. The hatch is essentially the fill pattern of the patch. Like linestyle, it uses a series of strings to specify different patterns. We have 10 single patterns, 10 double patterns and 10 mixed patterns:

hatch_styles = ["/", "\\", "|", "-", "+", "x", "o", "O", ".", "*"]
hatch_styles2 = ["//", "\\\\", "||", "--", "++", "xx", "oo", "OO", "..", "**"]
hatch_styles3 = ["/o", "\\|", "|*", "-\\", "+o", "x*", "o-", "O|", "O.", "*-"]

Recall that in Python \ indicates we are going to insert an escape character in a string. In "\\" the first \ indicates we are going to insert an escape character and the second "\" indicates that the escape character to be inserted is \.
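The effect of the escape character can be confirmed by checking string lengths. This sketch also shows a raw string, prefixed with r, as an alternative in which backslashes are not treated as escape characters; the variable names are illustrative only.

```python
single = "\\"      # one backslash character
double = "\\\\"    # two backslash characters
raw = r"\\"        # raw string: no escaping, so also two backslash characters

print(len(single), len(double), len(raw))  # prints 1 2 2
print(single)
```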

We can have a look at these hatch styles on a stacked bar plot. The stacked bar plot is created using multiple bar plots. The bars to be stacked use the tops of the previous bars as bottom values. This code will use the pyplot function setp to change the properties of all Rectangles in a BarContainer and also use setp within a for loop to change the individual properties of each individual Rectangle:

#%% data science libraries
import numpy as np
import numpy.random as random
import pandas as pd
import matplotlib.pyplot as plt
random.seed(0)
#%% generate data
xdata = np.arange(start=0, stop=10, step=1)
ydata = random.randint(1, 10, 10)
ydata2 = random.randint(1, 10, 10)
ydata3 = random.randint(1, 10, 10)

#%% plot basics
fig, ax = plt.subplots()
bar = ax.bar(x=xdata, height=ydata)
bar2 = ax.bar(x=xdata, height=ydata2, bottom=ydata)
bar3 = ax.bar(x=xdata, height=ydata3, bottom=ydata+ydata2)
hatch_styles = ["/", "\\", "|", "-", "+", "x", "o", "O", ".", "*"]
hatch_styles2 = ["//", "\\\\", "||", "--", "++", "xx", "oo", "OO", "..", "**"]
hatch_styles3 = ["/o", "\\|", "|*", "-\\", "+o", "x*", "o-", "O|", "O.", "*-"]
plt.setp(bar, linestyle="-", linewidth=2, edgecolor="#000000", facecolor="#00b050")
for idx in range(10):
    plt.setp(bar[idx], hatch=hatch_styles[idx])
plt.setp(bar2, linestyle="-", linewidth=2, edgecolor="#000000", facecolor="#7030a0")
for idx in range(10):
    plt.setp(bar2[idx], hatch=hatch_styles2[idx])
plt.setp(bar3, linestyle="-", linewidth=2, edgecolor="#000000", facecolor="#ffc000")
for idx in range(10):
    plt.setp(bar3[idx], hatch=hatch_styles3[idx])
ax.set_xlabel("x")
ax.set_ylabel("random value")
fig.tight_layout()

The green bars show the single hatch styles, the purple bars show the double hatch styles and the orange bars show the mixed hatch styles:
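When more layers are stacked, adding up the bottom values by hand becomes error prone. As a minimal sketch, assuming the heights are held in a 2D ndarray with one row per layer (the names layers, totals, bottoms and containers are illustrative and are not used elsewhere in this guide), the bottom of each layer can be calculated as a cumulative sum:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
xdata = np.arange(start=0, stop=5, step=1)
# one row of heights per layer (3 layers, 5 bars in each layer)
layers = rng.integers(low=1, high=10, size=(3, 5))
# the bottom of each layer is the running total of the layers beneath it
totals = np.cumsum(layers, axis=0)
bottoms = np.vstack([np.zeros(5, dtype=int), totals[:-1]])

fig, ax = plt.subplots()
containers = []
for height, bottom in zip(layers, bottoms):
    containers.append(ax.bar(x=xdata, height=height, bottom=bottom))
ax.set_xlabel("x")
ax.set_ylabel("random value")
fig.tight_layout()
```

Each item in containers is a BarContainer, so setp can be used on it in the same way as before.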

A barh plot is similar to a bar plot, however it uses y, width and left instead of x, height and bottom:

#%% data science libraries
import numpy as np
import numpy.random as random
import pandas as pd
import matplotlib.pyplot as plt
random.seed(0)
#%% generate data
xdata = np.arange(start=0, stop=10, step=1)
ydata = random.randint(1, 10, 10)
ydata2 = random.randint(1, 10, 10)
ydata3 = random.randint(1, 10, 10)

#%% plot basics
fig, ax = plt.subplots()
bar = ax.barh(y=xdata, width=ydata)
bar2 = ax.barh(y=xdata, width=ydata2, left=ydata)
bar3 = ax.barh(y=xdata, width=ydata3, left=ydata+ydata2)
hatch_styles = ["/", "\\", "|", "-", "+", "x", "o", "O", ".", "*"]
hatch_styles2 = ["//", "\\\\", "||", "--", "++", "xx", "oo", "OO", "..", "**"]
hatch_styles3 = ["/o", "\\|", "|*", "-\\", "+o", "x*", "o-", "O|", "O.", "*-"]
plt.setp(bar, linestyle="-", linewidth=2, edgecolor="#000000", facecolor="#00b050")
for idx in range(10):
    plt.setp(bar[idx], hatch=hatch_styles[idx])
plt.setp(bar2, linestyle="-", linewidth=2, edgecolor="#000000", facecolor="#7030a0")
for idx in range(10):
    plt.setp(bar2[idx], hatch=hatch_styles2[idx])
plt.setp(bar3, linestyle="-", linewidth=2, edgecolor="#000000", facecolor="#ffc000")
for idx in range(10):
    plt.setp(bar3[idx], hatch=hatch_styles3[idx])
ax.set_xlabel("random value")
ax.set_ylabel("x")

Let's return to our basic bar plot and use the associated AxesSubplot bar_label method to label each Rectangle patch using the height as the label:

#%% data science libraries
import numpy as np
import numpy.random as random
import pandas as pd
import matplotlib.pyplot as plt
random.seed(0)
#%% generate data
xdata = np.arange(start=0, stop=5, step=1)
ydata = random.randint(1, 10, 5)

#%% plot basics
fig, ax = plt.subplots()
bar = ax.bar(x=xdata, height=ydata)
labels = ax.bar_label(container=bar, labels=ydata)
ax.set_xlabel("x")
ax.set_ylabel("random value")
fig.tight_layout()
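If the labels argument is left out, bar_label formats the bar heights itself. The sketch below uses made-up heights and assumes a version of matplotlib that includes bar_label (3.4 or later). It uses the fmt and label_type keyword input arguments to place each height at the centre of its bar:

```python
import numpy as np
import matplotlib.pyplot as plt

xdata = np.arange(start=0, stop=5, step=1)
ydata = np.array([6, 1, 4, 4, 8])  # illustrative heights

fig, ax = plt.subplots()
bar = ax.bar(x=xdata, height=ydata)
# with no labels argument, the bar heights are formatted using fmt
labels = ax.bar_label(container=bar, fmt="%d", label_type="center")
# bar_label returns a list of Text objects, one per Rectangle
label_text = [label.get_text() for label in labels]
print(label_text)
ax.set_xlabel("x")
ax.set_ylabel("random value")
fig.tight_layout()
```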

Another plot type with patches and labels is a pie plot. A pie Wedge is a non-rectangular patch. The AxesSubplot pie method has a positional input argument x. It has the keyword input arguments explode, which can be a list of floats (each a fraction of the radius) used to offset each wedge from the centre, and labels, which can be assigned to a list of strings or numeric values to label each wedge.

The startangle, radius and center can be changed for the pie plot. For now these will be left at their default values.

Further customisation is available using colors, wedgeprops and textprops which will be left at the default None to show the default settings.

#%% data science libraries
import numpy as np
import numpy.random as random
import pandas as pd
import matplotlib.pyplot as plt
random.seed(0)
#%% generate data
xdata = np.arange(start=0, stop=5, step=1)
ydata = random.randint(1, 10, 5)

#%% plot basics
fig, ax = plt.subplots()
pie = ax.pie(x=ydata, explode=None, labels=ydata)
fig.tight_layout()

If we have a look at the variable explorer we see that pie is actually a tuple of two lists. The first list is a list of Wedge objects and the second list is a list of Text objects:

We can use tuple unpacking to assign these to two different variables:

pie_wedges, pie_labels = ax.pie(x=ydata, explode=None, labels=ydata)
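Note that the number of lists in the tuple depends on the keyword input arguments. As a small sketch with made-up values (sizes, pie_pcts and pct_text are illustrative names), supplying autopct adds a third list of Text objects which hold the percentage of each wedge:

```python
import matplotlib.pyplot as plt

sizes = [1, 2, 3, 4]  # illustrative values which sum to 10

fig, ax = plt.subplots()
# autopct formats the percentage of each wedge and adds a third list to the tuple
pie_wedges, pie_labels, pie_pcts = ax.pie(x=sizes, labels=["a", "b", "c", "d"],
                                          autopct="%1.1f%%")
pct_text = [pct.get_text() for pct in pie_pcts]
print(pct_text)
fig.tight_layout()
```

Unpacking into two variables would therefore fail when autopct is used, three variables are needed instead.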

If we use the pyplot function getp to have a look at the properties of one of the Wedge patches, we see the same options available as when we looked at a Rectangle patch:

plt.getp(pie_wedges[0])

Note the AxesSubplot pie method has the keyword input argument wedgeprops. wedgeprops is a dictionary and the keys that can be added are the keyword input arguments above in the format of a string.

fig, ax = plt.subplots()
wedgedict = {"linestyle": "-", "linewidth": 0.8, "hatch": "*", "alpha": 0.5, "edgecolor": "#000000"}
pie_wedges, pie_labels = ax.pie(x=ydata, explode=None, labels=ydata, wedgeprops=wedgedict)
fig.tight_layout()

The settings defined in wedgeprops will apply to all wedges:

This is typically used with the pyplot function setp and a for loop to tailor each Wedge:

fig, ax = plt.subplots()
explodes = [0.1, 0, 0, 0, 0]
colors = ["#ff0000", "#00b050", "#7030a0", "#ffc000", "#0070C0"]
hatch_styles = ["*", "o", "|", "x", "//"]
wedgedict = {"linestyle": "-", "linewidth": 0.8, "hatch": "*", "alpha": 0.5, "edgecolor": "#000000"}
pie_wedges, pie_labels = ax.pie(x=ydata, explode=explodes, labels=ydata, wedgeprops=wedgedict)
for idx in range(5):
    plt.setp(pie_wedges[idx], facecolor=colors[idx], hatch=hatch_styles[idx])
fig.tight_layout()

We can also see the result of emphasising one of the wedges using an explode value of 0.1.

By default Wedges are drawn from a radius of 1 to the centre which is designated as (0, 0) and text labels are placed at 1.1 i.e. outside the wedge. If a radius value below 1 is set in wedgeprops, the wedge will begin drawing from this radius value towards the centre (0,0). If a width value is set, that is lower than 1, the wedge won't draw completely towards the centre. This can be used to create a donut:

fig, ax = plt.subplots()
explodes = [0.1, 0, 0, 0, 0]
colors = ["#ff0000", "#00b050", "#7030a0", "#ffc000", "#0070C0"]
hatch_styles = ["*", "o", "|", "x", "//"]
wedgedict = {"linestyle": "-", "linewidth": 0.8, "hatch": "*", "alpha": 0.5, "edgecolor": "#000000", "width": 0.5, "radius": 0.8}
pie_wedges, pie_labels = ax.pie(x=ydata, explode=explodes, labels=ydata, wedgeprops=wedgedict, labeldistance=0.5)
for idx in range(5):
    plt.setp(pie_wedges[idx], facecolor=colors[idx], hatch=hatch_styles[idx])
fig.tight_layout()

If we use:

plt.getp(pie_labels[0])

We see the properties that we have for the Text labels:

In the AxesSubplot pie method there is the keyword input argument textprops. We can create a dictionary using the above as dictionary keys to adjust the Text properties of the labels:

fig, ax = plt.subplots()
explodes = [0.1, 0, 0, 0, 0]
colors = ["#ff0000", "#00b050", "#7030a0", "#ffc000", "#0070C0"]
hatch_styles = ["*", "o", "|", "x", "//"]
wedgedict = {"linestyle": "-", "linewidth": 0.8, "hatch": "*", "alpha": 0.5, "edgecolor": "#000000", "width": 0.5, "radius": 0.8}
textdict = {"fontsize": 20, "color": "#ffffff", "backgroundcolor": "#000000"}
pie_wedges, pie_labels = ax.pie(x=ydata, explode=explodes, labels=ydata, wedgeprops=wedgedict, textprops=textdict, labeldistance=0.5)
for idx in range(5):
    plt.setp(pie_wedges[idx], facecolor=colors[idx], hatch=hatch_styles[idx])
fig.tight_layout()

A histogram splits a distribution of 1 dimensional data into bins, displaying it as a bar graph:

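Let's generate random data using the rand function, which samples a uniform distribution between 0 and 1, and the randn function, which samples a standard normal distribution: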

#%% data science libraries
import numpy as np
import numpy.random as random
import pandas as pd
import matplotlib.pyplot as plt
random.seed(0)
#%% generate data
ydata = random.rand(100)
ydata2 = random.randn(100)

Now let's have a look at a hist of the uniform and normal distributions:

#%% plot basics
fig, ax = plt.subplots(nrows=2, ncols=1)
histrand = ax[0].hist(ydata, label="rand")
histrandn = ax[1].hist(ydata2, label="randn")
ax[0].set_ylabel("counts")
ax[1].set_ylabel("counts")
ax[1].set_xlabel("bins")
fig.legend(bbox_to_anchor=(1.0, 1.0))
fig.tight_layout()

If we have a look in the variable explorer we see that histrand is a tuple with 3 elements:

If we have a look at each index of the tuple, we see that index 0 is the height of each bar and index 1 gives the lower and upper boundaries of each bar. Note that there are 11 values corresponding to the lower and upper boundaries of 10 bars. Index 2 is our BarContainer.
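The three elements can be unpacked into separate variables. The sketch below (counts, edges and bars are illustrative names) also compares the first two elements with the output of the numpy function histogram, which should give the same counts and bin edges when called with the same data and number of bins:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
ydata = rng.random(100)

fig, ax = plt.subplots()
# heights, bin boundaries and the BarContainer
counts, edges, bars = ax.hist(ydata, bins=10)
# the same binning carried out without plotting
np_counts, np_edges = np.histogram(ydata, bins=10)
print(len(counts), len(edges), len(bars))
```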

We have seen how to edit the BarContainer before using setp:

#%% plot basics
fig, ax = plt.subplots(nrows=2, ncols=1)
histrand = ax[0].hist(ydata, label="rand")
plt.setp(histrand[2], facecolor="#0070C0", edgecolor="#000000", linewidth=1, hatch="*")
histrandn = ax[1].hist(ydata2, label="randn")
plt.setp(histrandn[2], facecolor="#00b050", edgecolor="#000000", linewidth=1, hatch="o")
ax[0].set_ylabel("counts")
ax[1].set_ylabel("counts")
ax[1].set_xlabel("bins")
fig.legend(bbox_to_anchor=(1.0, 1.0))
fig.tight_layout()

We can't really see the shapes of the distributions as we have a low number of datapoints and hence a low number of automatically generated bins. Let's sample more data going from 100 to 100000 datapoints:

#%% data science libraries
import numpy as np
import numpy.random as random
import pandas as pd
import matplotlib.pyplot as plt
random.seed(0)
#%% generate data
ydata = random.rand(100000)
ydata2 = random.randn(100000)

And now as we have more data let's increase the number of bins to 100:

#%% plot basics
fig, ax = plt.subplots(nrows=2, ncols=1)
histrand = ax[0].hist(ydata, bins=100, label="rand")
plt.setp(histrand[2], facecolor="#0070C0", edgecolor="#000000", linewidth=1, hatch="*")
histrandn = ax[1].hist(ydata2, bins=100, label="randn")
plt.setp(histrandn[2], facecolor="#00b050", edgecolor="#000000", linewidth=1, hatch="o")
ax[0].set_ylabel("counts")
ax[1].set_ylabel("counts")
ax[1].set_xlabel("bins")
fig.legend(bbox_to_anchor=(1.0, 1.0))
fig.tight_layout()

We can now see the shape of the distribution.

## Text and Annotate

We have seen Text objects on a pie plot and annotations on a bar plot. We can also add our own text, annotations or arrows to an AxesSubplot:

To use text, we need an x and y value which uses the same co-ordinate system as the data:

text = ax.text(-2, 2000, 'randn', horizontalalignment='center', verticalalignment='center')

This displays as a Text object in the variable explorer:

And is listed under the texts attribute of the AxesSubplot.

text == ax.texts[0]
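Text can also be positioned relative to the AxesSubplot instead of the data. As a minimal sketch with made-up data, the keyword input argument transform can be set to ax.transAxes, so that (0, 0) is the bottom left corner and (1, 1) is the top right corner of the AxesSubplot:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 10], [0, 100])
# positioned using the co-ordinate system of the data (the default)
data_text = ax.text(5, 50, "data coords", horizontalalignment="center")
# positioned as a fraction of the AxesSubplot width and height
axes_text = ax.text(0.05, 0.95, "axes coords", transform=ax.transAxes,
                    verticalalignment="top")
print(len(ax.texts))
```

The second Text object stays in the top left corner even if the axis limits are changed.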

We can see what properties we can change for a Text object by using the pyplot function getp:

plt.getp(text)

We have already explored changing some of these using the pie chart. Let's now have a look at AxesSubplot method annotate and we will return to our scatter plot with a small number of points to demonstrate this.

We essentially need text in the form of a string, xy in the form of a tuple denoting the datapoint to be annotated and xytext in the form of a tuple denoting the position of the text. By default both tuples use the co-ordinate system of the data. If we want an arrow, we need to specify an arrowprops dictionary with the key "arrowstyle":

annotation = ax.annotate(text="point1", xy=(-6.28, -0), xytext=(-6, 0.5), arrowprops={"arrowstyle": "simple"}, horizontalalignment='center', verticalalignment='center')

Although the Annotation object is text with an arrow it is listed under text:

It can be accessed under the AxesSubplot texts attribute:

annotation == ax.texts[0]

If we use the pyplot function getp to look at its properties we can see that these are very similar to the Text object.

plt.getp(annotation)

There are no methods (at the time of writing) to change arrowprops once an Annotation object has been created.
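Although there is no setter for arrowprops, the arrow itself appears to be stored on the Annotation as the attribute arrow_patch, which is a FancyArrowPatch. As a minimal sketch, assuming this attribute is available in your version of matplotlib, its properties can be changed with setp like any other patch:

```python
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch

fig, ax = plt.subplots()
ax.plot([-6.28, 6.28], [0, 0])
annotation = ax.annotate(text="point1", xy=(-6.28, 0), xytext=(-6, 0.5),
                         arrowprops={"arrowstyle": "simple"})
# the arrow drawn by annotate
arrow = annotation.arrow_patch
print(isinstance(arrow, FancyArrowPatch))
# change the appearance of the arrow after the Annotation has been created
plt.setp(arrow, facecolor="#ff0000", edgecolor="#000000")
```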

The AxesSubplot method arrow is generally more difficult to use than the method annotate. An arrow can be drawn without text by setting the text to an empty string:

annotation = ax.annotate(text="", xy=(-6.28, -0), xytext=(-6, 0.5), arrowprops={"arrowstyle": "simple"}, horizontalalignment='center', verticalalignment='center')

If we have a look at the plot above, we can see the majorticklabels of the xaxis with -6, -4, -2, 0, 2, 4 and 6 displayed. If we select the AxesSubplot xaxis attribute and then the method get_majorticklabels we can view these and we can see that these are displayed as Text objects:

ax.xaxis.get_majorticklabels()

If we have a look at the method get_majorticklocs we see the underlying array of tick locations (the method get_major_ticks instead returns a list of Tick objects):

ax.xaxis.get_majorticklocs()

We can set these to new values using the associated set method set_ticks and passing it a numpy array:

ax.xaxis.set_ticks(np.arange(start=-6, stop=9, step=3))

The set_ticklabels method can be used to set the strings of the tick labels to a list of strings of equal size to the number of ticks:

ax.xaxis.set_ticklabels(["-six", "-three", "zero", "three", "six"])

Similar changes can be made for the y axis using the equivalent yaxis methods.
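Putting these steps together as a self-contained sketch (ticks, ticklabels, tick_locations and label_text are illustrative names), we can set the ticks and their labels and then read both back to check that they match:

```python
import numpy as np
import matplotlib.pyplot as plt

xdata = np.linspace(start=-2*np.pi, stop=2*np.pi, num=20)
fig, ax = plt.subplots()
ax.scatter(xdata, np.sin(xdata))

ticks = np.arange(start=-6, stop=9, step=3)
ticklabels = ["-six", "-three", "zero", "three", "six"]
# the number of labels should match the number of ticks
ax.xaxis.set_ticks(ticks)
ax.xaxis.set_ticklabels(ticklabels)

tick_locations = ax.xaxis.get_majorticklocs()
label_text = [label.get_text() for label in ax.xaxis.get_majorticklabels()]
print(tick_locations)
print(label_text)
```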

## Latex

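Latex strings can be used in matplotlib. The expression is placed between dollar signs and enclosed in a raw string, which is prefixed with r so that each \ is kept as a literal character: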

r""
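To see why the r prefix matters, the short sketch below (with illustrative variable names) compares a plain string with a raw string. In the plain string, \b is read as a single backspace character, so the Latex command is lost:

```python
plain = "$\beta$"      # \b is read as a backspace character
raw = r"$\beta$"       # the raw string keeps the backslash
escaped = "$\\beta$"   # the same text written with an escaped backslash

print(len(plain), len(raw))
print(raw == escaped)
```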

The equation editor in Microsoft Word seems to have the best what you see is what you get converter:

In this case giving the following:

r"$\left(x+a\right)^n=\sum_{k=0}^{n}{\binom{n}{k}x^ka^{n-k}}$"

However the conversion isn't perfect. In the following output, the \funcapply command had to be removed before the string could be used:

r"$\beta=\sin\funcapply(\alpha)$"

However Latex is seen to work with the xlabel, ylabel, title, legend, annotation and tick labels:

annotation = ax.annotate(text=r"$\left(x+a\right)^n=\sum_{k=0}^{n}{\binom{n}{k}x^ka^{n-k}}$", xy=(-6.28, -0), xytext=(-4.8, -0.5), arrowprops={"arrowstyle": "simple"}, horizontalalignment='center', verticalalignment='center')
ax.xaxis.set_ticks(np.arange(start=-6, stop=9, step=3))
ax.xaxis.set_ticklabels([r"$-\alpha_2$", r"$-\alpha_1$", r"$\alpha_0$", r"$\alpha_1$", r"$\alpha_2$"])
ax.set_xlabel(r"$\alpha$")
ax.set_ylabel(r"$\beta$")
ax.set_title(r"$\beta=\sin(\alpha)$")
plt.legend(labels=[r"$\beta=\sin(\alpha)$"])
plt.tight_layout()

## Tick Parameters

The AxesSubplot method tick_params can be used to modify the appearance of the AxesSubplot ticks.

? ax.tick_params

For example if we really wanted to customise the tick parameters of the x axis:

ax.xaxis.set_ticks(np.arange(start=-6, stop=9, step=3))
ax.xaxis.set_ticklabels([r"$-\alpha_2$", r"$-\alpha_1$", r"$\alpha_0$", r"$\alpha_1$", r"$\alpha_2$"])
ax.tick_params(axis="x", which="major", direction="in", length=15, width=5, color="#00b050", labelcolor="#0070c0")
ax.set_xlabel(r"$\alpha$", color="#ffc000")

It is rare to make such changes but sometimes it is done to indicate which AxesSubplot a plot belongs to. This can be important when a twin AxesSubplot is used.

The sin function and exp function for example have the same x data points but scale very differently in y.

#%% data science libraries
import numpy as np
import numpy.random as random
import pandas as pd
import matplotlib.pyplot as plt
#%% generate data
xdata = np.linspace(start=-2*np.pi, stop=2*np.pi, num=100)
ydata = np.sin(xdata)
ydata2 = np.cos(xdata)
ydata3 = np.exp(xdata)

We can plot these on the same figure using a twinx AxesSubplot and color in each y Axes so it is obvious which AxesSubplot each plot belongs to:

#%% plot basics
fig, ax = plt.subplots()
scatter = ax.scatter(xdata, ydata, c="#ff0000", s=30)
ax.set_ylabel(r"$\beta=\sin(\alpha)$", color="#ff0000")
ax.tick_params(axis="y", which="major", direction="in", length=5, width=1, color="#ff0000", labelcolor="#ff0000")

ax2 = ax.twinx()
scatter = ax2.scatter(xdata, ydata3, c="#00b050", s=30)
ax2.set_ylabel(r"$\gamma=\exp(\alpha)$", color="#00b050")
ax2.tick_params(axis="y", which="major", direction="in", length=5, width=1, color="#00b050", labelcolor="#00b050")

ax.set_xlabel(r"$\alpha$")

fig.tight_layout()

Normally it is better to use subplots for this kind of comparison.
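As a sketch of the subplot alternative, using the same functions as above, each function can be given its own AxesSubplot. The keyword input argument sharex of the pyplot function subplots keeps the x axes aligned:

```python
import numpy as np
import matplotlib.pyplot as plt

xdata = np.linspace(start=-2*np.pi, stop=2*np.pi, num=100)

# two rows which share the same x axis
fig, ax = plt.subplots(nrows=2, ncols=1, sharex=True)
ax[0].scatter(xdata, np.sin(xdata), c="#ff0000", s=30)
ax[0].set_ylabel(r"$\beta=\sin(\alpha)$")
ax[1].scatter(xdata, np.exp(xdata), c="#00b050", s=30)
ax[1].set_ylabel(r"$\gamma=\exp(\alpha)$")
ax[1].set_xlabel(r"$\alpha$")
fig.tight_layout()
```

Each AxesSubplot keeps its own y scale, so no coloring of the ticks is needed to tell the plots apart.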

## BoxPlot and ViolinPlot

#%% data science libraries
import numpy as np
import numpy.random as random
import pandas as pd
import matplotlib.pyplot as plt
random.seed(0)
#%% generate data
ydata = random.rand(100000)
ydata2 = random.randn(100000)

Earlier we examined the rand and randn data visually as histograms. These distributions can be summarised using other plot types:

Let's have a look at a boxplot:

#%% plot basics
fig, ax = plt.subplots(nrows=1, ncols=2)
boxrand = ax[0].boxplot(ydata)
ax[0].set_xlabel("rand")
boxrand2 = ax[1].boxplot(ydata2)
ax[1].set_xlabel("randn")

In the variable explorer we can see that boxrand is a dictionary.

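So the data is represented as a median value in a box. The box encloses the middle 50 % of the data, from the lower quartile to the upper quartile. By default the whiskers (with caps) extend to the furthest datapoints that lie within 1.5 times the interquartile range of the box, and any points beyond this are drawn as fliers (outliers). We can see that rand is centred around 0.5 and randn is centred around 0 as expected.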
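The values drawn by the boxplot can be read back from the dictionary and compared with the numpy function percentile. This is a minimal sketch with a smaller sample. The variable names are illustrative, and the whisker check assumes the default whis value of 1.5:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
ydata = rng.standard_normal(1000)

fig, ax = plt.subplots()
box = ax.boxplot(ydata)

# lower quartile, median and upper quartile
q1, median, q3 = np.percentile(ydata, [25, 50, 75])
# the median line holds the median value at both of its end points
box_median = box["medians"][0].get_ydata()[0]
# each whisker runs from an edge of the box to its cap
upper_whisker = box["whiskers"][1].get_ydata()[1]
print(sorted(box.keys()))
print(box_median, upper_whisker)
```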

An associated plot is the violinplot. Instead of displaying a box it displays a kernel density estimation function which for a high number of datapoints is analogous to the shape seen in the histogram:

#%% plot basics
fig, ax = plt.subplots(nrows=1, ncols=2)
violinrand = ax[0].violinplot(ydata)
ax[0].set_xlabel("rand")
violinrandn = ax[1].violinplot(ydata2)
ax[1].set_xlabel("randn")

This gives a dictionary of collections in the variable explorer:

## 3D Data Meshgrid

Some plotting functions require a row of x values, a column of y values and a matrix of z values.

Others require a matrix of x values, a matrix of y values and a matrix of z values:

And other plotting functions require the data as equally sized vectors for example series within a dataframe:

The meshgrid function can convert a row of x values and a column of y values to matrices with the same dimensions as z2:

#%% import data science libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#%% data in matrix format
x = np.array([5, 6, 7, 8, 9])[np.newaxis, :]
y = np.array([1, 2, 3, 4])[:, np.newaxis]
z2 = np.array([[0, 1, 1, 1, 0],
               [1, 2, 3, 2, 1],
               [1, 2, 3, 2, 1],
               [0, 1, 1, 1, 0]])
#%% meshgrid
(x2, y2) = np.meshgrid(x, y)
#%% data in dataframe format
data = pd.DataFrame({"x": x2.flatten(),
                     "y": y2.flatten(),
                     "z": z2.flatten()})
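As a quick check of what meshgrid returns, the sketch below uses the same x and y values as plain vectors and prints the shapes of the resulting matrices. The number of rows comes from y and the number of columns comes from x:

```python
import numpy as np

x = np.array([5, 6, 7, 8, 9])   # 5 x values
y = np.array([1, 2, 3, 4])      # 4 y values
x2, y2 = np.meshgrid(x, y)

# both matrices have 4 rows (from y) and 5 columns (from x)
print(x2.shape, y2.shape)
# every row of x2 repeats x and every column of y2 repeats y
print(x2[0], y2[:, 0])
# flattening gives equally sized vectors, one element per cell
print(x2.flatten().shape)
```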

## 3D Visualisation Colormap

Let's create linearly spaced x and y values as vectors, use the meshgrid function on these to create matrices and finally use a mathematical expression involving the exponential function and the x and y matrices to create z data:

#%% import data science libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#%% x, y as vectors
xvec = np.linspace(-2, 2, 10)[np.newaxis, :]
yvec = np.linspace(-2, 2, 10)[:, np.newaxis]
#%% x, y data as matrices
xdata, ydata = np.meshgrid(xvec, yvec)
#%% lambda expression for z data
zfun = lambda x, y: x * np.exp(-x**2 - y**2)
zdata = zfun(xdata, ydata)
del(xvec)
del(yvec)

Let's examine xdata, ydata and zdata in the variable explorer:

If we expand these matrices we see a colormap and visually without reading the values we can use the colormap to view the intensity of the data.

For xdata we can see every row is the same but we are increasing in numeric value as we go across the columns:

For ydata conversely we can see every column is the same but we are increasing in numeric value as we go across the rows:

For the zdata we see the form of the mathematical expression:

This colormap is the basis of representing 3D data using a 2D figure and is used in the following plots:

The matshow plot uses a series of squares to display each cell in the matrix. The x values are the column index and the y values are the row index:

fig, ax = plt.subplots()
matshow = ax.matshow(zdata)
ax.set_xlabel("column index")
ax.set_ylabel("row index")

In the variable explorer, we see that the object matshow is an AxesImage. This is mappable.

We can use the Figure method colorbar to add a colorbar to a mappable plot within an AxesSubplot. This allows us to view the intensity levels in z:

fig, ax = plt.subplots()
matshow = ax.matshow(zdata)
ax.set_xlabel("column index")
ax.set_ylabel("row index")
colorbar = fig.colorbar(mappable=matshow, ax=ax)

The colorbar now displays as a Colorbar object on the variable explorer:

If we want we can change the orientation:

fig.colorbar(mappable=matshow, ax=ax, orientation="horizontal")

The colormap used is quite subjective and there are a huge number to choose from:

list(plt.colormaps)

The default is "viridis". We can change the colormap to one of the more common ones using the mappable object's method set_cmap.

bone is a common black and white colormap:

matshow.set_cmap("bone")

magma is also a common colormap:

matshow.set_cmap("magma")

jet is also common:

matshow.set_cmap("jet")
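A colormap can also be examined on its own. As a small sketch (names, cmap, low, mid and high are illustrative names), the pyplot function get_cmap returns the colormap object, which can be called with a normalised float between 0 and 1 to get the corresponding color as a tuple of red, green, blue and alpha values:

```python
import matplotlib.pyplot as plt

# names of the registered colormaps
names = plt.colormaps()
cmap = plt.get_cmap("magma")
# colors at the bottom, middle and top of the colormap
low, mid, high = cmap(0.0), cmap(0.5), cmap(1.0)
print(len(names))
print(low, mid, high)
```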

The AxesSubplot plotting functions pcolor and pcolormesh are similar to matshow, however they allow x, y and z values to be specified. pcolor and pcolormesh are very similar, however pcolormesh has been optimised for larger matrices.

fig, ax = plt.subplots()
pcolor = ax.pcolor(xdata, ydata, zdata)
ax.set_xlabel("x")
ax.set_ylabel("y")
colorbar = fig.colorbar(mappable=pcolor, ax=ax)
pcolor.set_cmap("magma")

pcolor is a PolyCollection within the variable explorer:

These methods also have the keyword input arguments vmin and vmax which can be used to set the limits. Say for example we only want to view the positive values:

fig, ax = plt.subplots()
pcolor = ax.pcolor(xdata, ydata, zdata, vmin=0)
ax.set_xlabel("x")
ax.set_ylabel("y")
colorbar = fig.colorbar(mappable=pcolor, ax=ax)
pcolor.set_cmap("magma")
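The color limits in use can be read back from the mappable with the method get_clim. The sketch below rebuilds the data so that it runs on its own. It uses pcolormesh with shading="auto", a choice made here so that x, y and z matrices of equal size are accepted. The lower limit is the vmin we supplied, and the upper limit is taken from the data:

```python
import numpy as np
import matplotlib.pyplot as plt

xvec = np.linspace(-2, 2, 10)
yvec = np.linspace(-2, 2, 10)
xdata, ydata = np.meshgrid(xvec, yvec)
zdata = xdata * np.exp(-xdata**2 - ydata**2)

fig, ax = plt.subplots()
# vmin is fixed at 0, vmax is left to be scaled from the data
mesh = ax.pcolormesh(xdata, ydata, zdata, vmin=0, cmap="magma", shading="auto")
colorbar = fig.colorbar(mappable=mesh, ax=ax)
clim = mesh.get_clim()
print(clim)
```

All negative z values are drawn using the color at the bottom of the colormap.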

Let's now have a look at the AxesSubplot methods contour and contourf which give a contour and contour filled representation respectively:

fig, ax = plt.subplots()
contour = ax.contour(xdata, ydata, zdata)
ax.set_xlabel("x")
ax.set_ylabel("y")
colorbar = fig.colorbar(mappable=contour, ax=ax)
contour.set_cmap("magma")

By default only a small number of levels are given, we can up this to 20:

fig, ax = plt.subplots()
contour = ax.contour(xdata, ydata, zdata, levels=20)
ax.set_xlabel("x")
ax.set_ylabel("y")
colorbar = fig.colorbar(mappable=contour, ax=ax)
contour.set_cmap("magma")

fig, ax = plt.subplots()
contourf = ax.contourf(xdata, ydata, zdata, levels=20)
ax.set_xlabel("x")
ax.set_ylabel("y")
colorbar = fig.colorbar(mappable=contourf, ax=ax)
contourf.set_cmap("magma")

contour and contourf are both QuadContourSet objects in the variable explorer:

We can see that the contour representation is poor because we only have a small number of datapoints (10×10=100). Let's increase the number of points in each linspace function by 10 fold and have a look at the matshow, pcolormesh (instead of pcolor), contour and contourf using very similar code to the above. We can see these plots becoming more similar:
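A sketch of this comparison is given below. For brevity the four plots are placed in a 2 by 2 grid of subplots, which is a layout chosen here rather than one used earlier in this guide, and shading="auto" is passed to pcolormesh so that matrices of equal size are accepted:

```python
import numpy as np
import matplotlib.pyplot as plt

# 100 points in each direction instead of 10
xvec = np.linspace(-2, 2, 100)
yvec = np.linspace(-2, 2, 100)
xdata, ydata = np.meshgrid(xvec, yvec)
zdata = xdata * np.exp(-xdata**2 - ydata**2)

fig, ax = plt.subplots(nrows=2, ncols=2)
matshow = ax[0, 0].matshow(zdata, cmap="magma")
pcolormesh = ax[0, 1].pcolormesh(xdata, ydata, zdata, cmap="magma", shading="auto")
contour = ax[1, 0].contour(xdata, ydata, zdata, levels=20, cmap="magma")
contourf = ax[1, 1].contourf(xdata, ydata, zdata, levels=20, cmap="magma")
fig.tight_layout()
```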

## Axes3DSubplot

To create a 3D plot we require a Figure with an Axes3DSubplot as opposed to a Figure with an AxesSubplot, which only works in 2D. The Figure add_subplot method has a keyword input argument projection which has a default value of None, giving an ordinary 2D AxesSubplot, and can be changed to "3d":

#%% import data science libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#%% fig with 3D Axes
fig = plt.figure()
ax = fig.add_subplot(projection="3d")

Now ax is an Axes3DSubplot as seen in the Variable Explorer:

This gives us access to additional plot types such as plot_wireframe, plot_surface, contour3D and contourf3D. We also have additional methods such as set_zlabel which apply to the new axis. Let's explore these with a low number of datapoints first and then a higher number of datapoints:

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
wireframe = ax.plot_wireframe(xdata, ydata, zdata)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
surface = ax.plot_surface(xdata, ydata, zdata)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
contour = ax.contour3D(xdata, ydata, zdata)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
contour.set_cmap("magma")

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
contourf = ax.contourf3D(xdata, ydata, zdata, levels=20)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
contourf.set_cmap("magma")

Now increasing the data points by 10 fold in x and 10 fold in y:

By default the surface and wireframe plots use a single color:

surface = ax.plot_surface(xdata, ydata, zdata, color="#00b050")

However the keyword input argument cmap can be used:

surface = ax.plot_surface(xdata, ydata, zdata, cmap="jet")

These 3D plots are interactive and can be rotated to other views using the mouse:
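The view can also be set in code, which is useful when Inline plotting is used and the figure cannot be rotated with the mouse. The sketch below rebuilds the data so that it runs on its own and uses the Axes3DSubplot method view_init with an elevation and azimuth angle in degrees (the angles are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

xvec = np.linspace(-2, 2, 30)
yvec = np.linspace(-2, 2, 30)
xdata, ydata = np.meshgrid(xvec, yvec)
zdata = xdata * np.exp(-xdata**2 - ydata**2)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
surface = ax.plot_surface(xdata, ydata, zdata, cmap="jet")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
# elevation above the xy plane and rotation about the z axis, in degrees
ax.view_init(elev=30, azim=45)
print(ax.elev, ax.azim)
```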

## Images

You'll have noticed that some of the 2D plots of matrices are similar to images. We can also examine images in matplotlib using the pyplot function imread, which reads an image and stores it as an ndarray, and the AxesSubplot method imshow, which displays the image within an AxesSubplot. The image should be stored in the same location as the Python script file.

When using imread, we need to specify the file name and extension.

#%% import data science libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#%% import data
#%% basic plot
fig, ax = plt.subplots()
ax.imshow(img)
ax.set_xlabel("x")
ax.set_ylabel("y")

The image displays the color of each pixel.

In the variable explorer we can see that the image is a 3D array.

It has 1440 pages (y positions) by 1920 rows (x positions) by 3 columns (color r, g, b):

Changing the Axis view to 2; index 0 shows the red values, index 1 shows the green values and index 2 shows the blue values. In this case we are viewing the top left corner pixels which are sky blue and thus have a higher intensity of blue, followed by green and a lower intensity of red:

Recall that for the plotting functions, color can be a vector of normalised floats, therefore we need to normalise these values by dividing through by 255:

[61/255, 115/255, 205/255]
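The same normalisation can be carried out directly on the array. As no image file is assumed here, the sketch below uses a small made-up array as a stand-in for the array returned by imread, filled with the color of the top left pixel above:

```python
import numpy as np
import matplotlib.pyplot as plt

# hypothetical stand-in for an image: 4 rows by 6 columns of pixels,
# each holding the r, g, b values as 8-bit integers
img = np.zeros((4, 6, 3), dtype=np.uint8)
img[:, :] = [61, 115, 205]

pixel = img[0, 0]      # top left pixel
color = pixel / 255    # normalised floats between 0 and 1
fig, ax = plt.subplots()
ax.imshow(img)
print(img.shape, color)
```

The variable color can then be passed to the color keyword input argument of a plotting function.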

We can therefore create a line plot with the same color as the top left hand corner.

#%% import data science libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#%% fig with subplots
fig, ax = plt.subplots()
#%% data
x = np.arange(start=0, stop=12, step=2)
y = np.arange(start=0, stop=24, step=4)
#%% basic plot
myline = ax.plot(x, y, color=[61/255, 115/255, 205/255], linewidth=5)
ax.set_xlabel("x")
ax.set_ylabel("y")

We see that this is close to sky blue as expected:

We can view the data on the three channels as subplots using the Figure method subplot_mosaic and the pcolormesh plotting function with a black and white colormap such as "bone" and a vmin of 0 and a vmax of 255. Finally in the last subplot we can use imshow:

#%% import data science libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#%% data
#%% figure
fig = plt.figure()
#%% basic plots
ax = fig.subplot_mosaic([["red", "green"],
                         ["blue", "all"]])
pcolormeshred = ax["red"].pcolormesh(img[:, :, 0], cmap="bone", vmin=0, vmax=255)
pcolormeshgreen = ax["green"].pcolormesh(img[:, :, 1], cmap="bone", vmin=0, vmax=255)
pcolormeshblue = ax["blue"].pcolormesh(img[:, :, 2], cmap="bone", vmin=0, vmax=255)
ax["all"].imshow(img)

The images are upside down so we must invert the rows. Let's do this and use different colormaps for each channel to reflect their red, green and blue nature respectively:

#%% import data science libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#%% style
sns.set_style(style="white")
#%% data
ax["all"].imshow(img)