Python and Matplotlib: Histogram Plotting

Prerequisite Libraries

In this guide we will look at creating a histogram plot of randomly generated data. We first need to load three libraries: numpy, pandas and matplotlib.pyplot. These are conventionally imported as np, pd and plt respectively.
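A minimal sketch of the three imports using the conventional aliases. This guide does not otherwise use pandas, so that line is only there for completeness:

```python
import numpy as np               # numerical arrays and random number generators
import pandas as pd              # data frames (not used elsewhere in this guide)
import matplotlib.pyplot as plt  # the plotting interface
```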

Configuring the Layout of Figures

Before creating any figures, you should choose how you want them to be displayed. In Spyder the default option is Inline, which means all figures are printed in the IPython Console.

If instead you want each figure to be shown in a separate window, change the setting to Automatic as follows.

Go to Tools → Preferences.

In the left-hand menu select IPython console.

Select Graphics.

Change the setting from Inline to Automatic.

Select Apply.

Go to Consoles and select Restart kernel.

When you rerun your code, each figure will appear in a separate window, as opposed to being inline within the Console.

Note: Spyder version 3.3 may produce a stream of errors instead of a plot. If you have this version (installed by default with the March 2019 Anaconda installer), close Spyder and update both Anaconda and Spyder. To do this, open the Anaconda PowerShell Prompt and type in conda update anaconda followed by conda update spyder.

It is also possible to toggle between the two settings without restarting the kernel by typing the IPython magic commands %matplotlib inline or %matplotlib qt into the Console.

In these guides the Automatic setting is applied, so all figures are shown in separate windows.

Function figure

To create a new figure we can use the function figure. Leaving the input argument empty, i.e. plt.figure(), creates a new figure.

To view the figure we need to show it with plt.show().

If no figures are open this will be “Figure 1”. We can also specify the figure number as the input argument, for example plt.figure(1000).

Now that we have Figure 1000, if we once again type in plt.figure() with no input argument, we get Figure 1000 + 1, i.e. Figure 1001.
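The numbering rule can be checked with a short sketch. The variable names fig_a, fig_b and fig_c are only for illustration; each figure's number attribute reads back the number Matplotlib assigned:

```python
import matplotlib.pyplot as plt

fig_a = plt.figure()      # no argument: Figure 1 if no figures are open
fig_b = plt.figure(1000)  # explicit figure number
fig_c = plt.figure()      # no argument: largest open figure number + 1

print(fig_a.number, fig_b.number, fig_c.number)  # 1 1000 1001 in a fresh session
plt.show()
```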

The figures can be closed using the x in the top right corner of each window, or with the command close, whose input argument is the figure number; in our case plt.close(1), plt.close(1000) and plt.close(1001).

The command plt.close('all') will close all open figures.
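A sketch of both ways of closing figures. plt.get_fignums() lists the numbers of the figures that are still open:

```python
import matplotlib.pyplot as plt

# Open (or reactivate) the three figures used above
for number in (1, 1000, 1001):
    plt.figure(number)

plt.close(1000)           # close a single figure by its number
print(plt.get_fignums())  # [1, 1001] if no other figures are open

plt.close('all')          # close every open figure
print(plt.get_fignums())  # []
```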

Random Number Generators

To make the results reproducible, we will first reset the seed of the random number generator to 0 with np.random.seed(0).

Let us now have a look at the random normal function randn. It draws numbers from the standard normal distribution (mean 0, standard deviation 1), so they are centred on the origin. We can create an array of 10 such numbers with np.random.randn(10).
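A minimal sketch of the seed and randn calls together. Because the seed is fixed, rerunning it always prints the same 10 numbers:

```python
import numpy as np

np.random.seed(0)        # reset the seed so the same numbers are generated every run
n = 10
a = np.random.randn(n)   # n samples from the standard normal distribution
print(a)                 # the first value is 1.76405235
```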

For a small set of numbers it is fine to view them as a vector, but to understand their distribution it is more useful to plot them as a histogram.

Histogram Plot

We can show this data set as a histogram using the function hist.
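A sketch using the same 10 seeded numbers. hist returns the bin counts, the bin edges and the drawn bars, which is handy for checking what was plotted:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
a = np.random.randn(10)

plt.figure()
# Default binning: 10 bins spanning the smallest to the largest value in a
counts, edges, patches = plt.hist(a)
plt.show()
```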

Bins and Range

This creates a histogram, but there is insufficient data to see the shape of the distribution; to rectify this we can increase n. In the example above, Matplotlib set the lower and upper bounds of the bins to the smallest and largest values in the data and used 10 bins by default. We can set these ourselves with the additional arguments bins and range.
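A sketch with n increased to 100. The choice of 20 bins over the range -4 to 4 is illustrative, not taken from the original code. Values outside range are left out of the histogram:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 100
a = np.random.randn(n)

plt.figure()
# bins: number of bars; range: (lower bound, upper bound) of the bins
counts, edges, patches = plt.hist(a, bins=20, range=(-4, 4))
plt.show()
```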

Here we can see the shape of a normal distribution beginning to emerge, although the data still isn't good enough to be completely sure.

Colour and Transparency

We can generate three data sets and overlay their histograms on a single graph. To distinguish the three plots we can use the additional input argument color (US spelling). The primary and secondary colours, as well as black and white, can be encoded either as a single-letter string or as a string of the colour's full name:

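Single Letter String   Full String   RGB        RGB (0 to 255, normalised)   Hex
r                      red           [1,0,0]    [255/255,0/255,0/255]        #ff0000
g                      green         [0,1,0]    [0/255,255/255,0/255]        #00ff00
b                      blue          [0,0,1]    [0/255,0/255,255/255]        #0000ff
y                      yellow        [1,1,0]    [255/255,255/255,0/255]      #ffff00
c                      cyan          [0,1,1]    [0/255,255/255,255/255]      #00ffff
m                      magenta       [1,0,1]    [255/255,0/255,255/255]      #ff00ff
k                      black         [0,0,0]    [0/255,0/255,0/255]          #000000
w                      white         [1,1,1]    [255/255,255/255,255/255]    #ffffff

Note that Matplotlib renders some of these strings as darker shades than the nominal values above: 'g' is (0, 0.5, 0), 'c' is (0, 0.75, 0.75), 'm' is (0.75, 0, 0.75) and 'y' is (0.75, 0.75, 0), while the name 'green' gives #008000 (pure #00ff00 is called 'lime').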
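To check how Matplotlib actually interprets a colour string or vector, the functions to_rgb and to_hex in matplotlib.colors convert between the notations. A short sketch; note that Matplotlib's 'g' and 'green' are darker than pure [0, 1, 0]:

```python
from matplotlib.colors import to_hex, to_rgb

# Single letters and full names are looked up in Matplotlib's colour tables
print(to_rgb('r'))      # (1.0, 0.0, 0.0)
print(to_rgb('g'))      # (0.0, 0.5, 0.0)
print(to_hex('green'))  # #008000
print(to_hex('lime'))   # #00ff00

# Normalised [r, g, b] vectors convert to hex directly
print(to_hex([255/255, 255/255, 0/255]))  # #ffff00
```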

For finer control, colours can be specified as a vector of [r,g,b] values. Many programs list these values between 0 and 255, but Matplotlib expects normalised values between 0 and 1, so each value is divided by 255. For instance, the standard colours in Microsoft Word are listed below. Using these colours may be useful if you want to keep your plots consistent with a Word document.

Microsoft Word RGB           Hex
[192/255,0/255,0/255]        #c00000
[255/255,0/255,0/255]        #ff0000
[255/255,192/255,0/255]      #ffc000
[255/255,255/255,0/255]      #ffff00
[146/255,208/255,80/255]     #92d050
[0/255,176/255,80/255]       #00b050
[0/255,176/255,240/255]      #00b0f0
[0/255,112/255,192/255]      #0070c0
[0/255,32/255,96/255]        #002060
[112/255,48/255,160/255]     #7030a0
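A sketch of the overlay, drawing three data sets named a, b and c (hypothetical names) with the colour given in three different ways. The bins and range are the illustrative values used earlier:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 100
a = np.random.randn(n)
b = np.random.randn(n)
c = np.random.randn(n)

plt.figure()
# The colour can be a single letter, a full name or a normalised [r, g, b] vector
_, _, patches_a = plt.hist(a, bins=20, range=(-4, 4), color='r')
_, _, patches_b = plt.hist(b, bins=20, range=(-4, 4), color='green')
_, _, patches_c = plt.hist(c, bins=20, range=(-4, 4), color=[0/255, 0/255, 255/255])
plt.show()
```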

Now the histograms overlay, but they are hard to see because the last one plotted (blue) covers the one plotted before it (green), which in turn covers the first one. This can be remedied with some transparency, set using the argument alpha, which ranges from 0 (fully transparent) to 1 (opaque).
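A sketch of the same overlay with transparency. The value alpha=0.5 is an illustrative choice:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 100
a = np.random.randn(n)
b = np.random.randn(n)
c = np.random.randn(n)

plt.figure()
# alpha=0.5 makes each set of bars half transparent, so lower histograms show through
_, _, patches_a = plt.hist(a, bins=20, range=(-4, 4), color='r', alpha=0.5)
_, _, patches_b = plt.hist(b, bins=20, range=(-4, 4), color='g', alpha=0.5)
_, _, patches_c = plt.hist(c, bins=20, range=(-4, 4), color='b', alpha=0.5)
plt.show()
```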

These data sets overlap almost completely, so I will shift them, making a 1 lower than b and c 1 higher than b, so you can see the effect of transparency in more detail.
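A sketch of the shifted data sets, obtained by subtracting or adding 1 to the randn samples (alpha=0.5 is again illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 100
a = np.random.randn(n) - 1   # centred on -1
b = np.random.randn(n)       # centred on 0
c = np.random.randn(n) + 1   # centred on +1

plt.figure()
plt.hist(a, bins=20, range=(-4, 4), color='r', alpha=0.5)
plt.hist(b, bins=20, range=(-4, 4), color='g', alpha=0.5)
_, _, patches_c = plt.hist(c, bins=20, range=(-4, 4), color='b', alpha=0.5)
plt.show()
```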

Sometimes [r,g,b] is combined with the alpha parameter to give a set of 4 values [r,g,b,a], where the 4th value is the alpha value, i.e. the transparency. We can use these numeric values to reproduce the red, green and blue histograms and apply the alpha values we used earlier.
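A sketch using [r, g, b, a] tuples instead of a separate alpha argument, assuming the same shifted data and an alpha of 0.5:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 100
a = np.random.randn(n) - 1
b = np.random.randn(n)
c = np.random.randn(n) + 1

plt.figure()
# The fourth value of each tuple is the alpha value
_, _, patches_a = plt.hist(a, bins=20, range=(-4, 4), color=(1, 0, 0, 0.5))
_, _, patches_b = plt.hist(b, bins=20, range=(-4, 4), color=(0, 1, 0, 0.5))
_, _, patches_c = plt.hist(c, bins=20, range=(-4, 4), color=(0, 0, 1, 0.5))
plt.show()
```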

Another colouring system is the hexadecimal (hex) system. In this system each channel is written as 2 characters, each ranging from 0 to F (0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F), and 16×16 = 256 gives the values 0 to 255 per channel. Once again there are 3 rgb channels, or 4 if we include alpha, and the code is prefixed with #. This reproduces the chart from earlier.
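A sketch with hex codes carrying an alpha channel. The last two characters 80 are hex for 128, i.e. an alpha of 128/255 (about 0.5):

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 100
a = np.random.randn(n) - 1
b = np.random.randn(n)
c = np.random.randn(n) + 1

plt.figure()
# '#rrggbbaa': red, green, blue and alpha channels, two hex characters each
_, _, patches_a = plt.hist(a, bins=20, range=(-4, 4), color='#ff000080')
_, _, patches_b = plt.hist(b, bins=20, range=(-4, 4), color='#00ff0080')
_, _, patches_c = plt.hist(c, bins=20, range=(-4, 4), color='#0000ff80')
plt.show()
```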

In general people just look up the colour values and apply the one they want but for a more detailed explanation behind the fundamentals see

Line Width and Edge Colour

Moving back to a single chart and noting that the statistics aren't good enough, we'll increase n 100-fold to 10000. We'll also add outlines to each bar by specifying the additional arguments edgecolor and linewidth.
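A sketch using the Microsoft Word green [0/255,176/255,80/255] from the table with black outlines. The values bins=50 and linewidth=1.5 are illustrative choices:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 10000
a = np.random.randn(n)

word_green = (0/255, 176/255, 80/255)  # Microsoft Word green, normalised

plt.figure()
# edgecolor outlines each bar; linewidth sets the outline thickness
counts, edges, patches = plt.hist(a, bins=50, range=(-4, 4), color=word_green,
                                  edgecolor='k', linewidth=1.5)
plt.show()
```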

Now the histogram clearly resembles a normal (Gaussian) distribution, each bar has a thick black outline, and the bars are the same green as the one taken from Microsoft Word.

Pattern

We can change the hatching (fill pattern) of the bars using the additional input argument hatch.
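A sketch with a forward-slash hatch. White bars with black edges are an illustrative choice that keeps the pattern visible:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 10000
a = np.random.randn(n)

plt.figure()
# hatch draws the pattern in the edge colour on top of the bar fill
counts, edges, patches = plt.hist(a, bins=50, range=(-4, 4), color='w',
                                  edgecolor='k', hatch='/')
plt.show()
```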

Matplotlib supports the hatch styles /, \, |, -, +, x, o, O, . and *, which can be compared by toggling through them in turn.
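A sketch that draws one small histogram per hatch style in a 2 by 5 grid of axes, so the styles can be compared side by side. Repeating a character, e.g. '//', makes the hatching denser:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
a = np.random.randn(1000)

hatches = ['/', '\\', '|', '-', '+', 'x', 'o', 'O', '.', '*']

fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for ax, hatch in zip(axes.flat, hatches):
    ax.hist(a, bins=10, color='w', edgecolor='k', hatch=hatch)
    ax.set_title(repr(hatch))  # show the hatch string above each panel
plt.tight_layout()
plt.show()
```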

Plot Labels, Title and Legend

The axes can be labelled using xlabel and ylabel, and the title set using title. The histogram itself can be assigned a label string using the label argument of hist; this label shows up in a legend once legend is called for the plot.
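A sketch combining the labels, title and legend. The label text 'randn' and the axis names are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 10000
a = np.random.randn(n)

plt.figure()
plt.hist(a, bins=50, range=(-4, 4), color='g', edgecolor='k', label='randn')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram of 10000 normally distributed random numbers')
legend = plt.legend()  # displays the label assigned in hist
ax = plt.gca()
plt.show()
```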

The location of the legend can be set using the input argument loc, assigned either a string or a number. Note once again that US English is used, i.e. center as opposed to the UK English centre. Unfortunately, the numerical input is implemented in the following way:

Location String    Location Integer
best               0
upper right        1
upper left         2
lower left         3
lower right        4
right              5
center left        6
center right       7
lower center       8
upper center       9
center             10

This is as opposed to following the shape of the number square on a keypad, which would have made much more sense.

Here we explicitly set the location of the legend using loc.
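A sketch placing the legend in the upper left corner; according to the table above, loc=2 would give the same result:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 10000
a = np.random.randn(n)

plt.figure()
plt.hist(a, bins=50, range=(-4, 4), color='g', edgecolor='k', label='randn')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram of 10000 normally distributed random numbers')
legend = plt.legend(loc='upper left')  # string form of the location code 2
plt.show()
```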

You'll notice the title above is quite long. To split it onto two lines we can use the special character \n. Note also that the value of n, i.e. 10000, has been typed in twice: once where n is assigned and again in the title. Instead, we can reference the variable n by typing %i (i for integer) in the title string where we want the integer n to appear. After closing the quotes we then put a % followed by the variable we want to reference. In this case we only specify one variable.
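A sketch of the two-line title with n substituted through %i, so the number is only typed once where n is assigned:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 10000
a = np.random.randn(n)

plt.figure()
plt.hist(a, bins=50, range=(-4, 4), color='g', edgecolor='k', label='randn')
# \n starts a new line; %i is replaced by the integer n
plt.title('Histogram of %i normally distributed\nrandom numbers' % n)
plt.legend(loc='upper left')
ax = plt.gca()
plt.show()
```

Changing n = 10000 to n = 1000000 now updates both the data and the title.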

With this done, we can change n to 1000000 where it is assigned, and the title updates automatically.

If we wanted to reference more variables we would need a marker for each of them, %i (integer), %f (float) or %s (string), and then after the closing quote put all of the inputs, in order, inside %( ).
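A sketch with three markers filled from one tuple. The name variable is hypothetical; the f-string line (Python 3.6 or later) is shown only as an equivalent alternative:

```python
import numpy as np

np.random.seed(0)
n = 10000
a = np.random.randn(n)
name = 'randn'

# %s string, %i integer, %f float (6 decimal places by default); inputs go in order
title = '%s: n = %i, mean = %f' % (name, n, a.mean())
print(title)

# The same text written as an f-string
title_f = f'{name}: n = {n}, mean = {a.mean():f}'
print(title == title_f)  # True
```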

Grid Lines

Gridlines can be enabled using the command grid. The first input argument (named b in older versions of Matplotlib and visible from version 3.5 onwards) is a Boolean that specifies whether the gridlines are enabled or disabled. The second input argument, which, specifies whether the minor, major or both sets of gridlines are amended. The color argument specifies the colour of the gridlines, and the linestyle can also be changed, for instance ':' for a dotted line (this will be covered in more detail when line plots are examined).
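A sketch passing the Boolean positionally, which works whichever name the parameter has. Minor gridlines only appear where there are minor ticks, so minorticks_on is called first; the grey '0.8' is an illustrative choice:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 10000
a = np.random.randn(n)

plt.figure()
plt.hist(a, bins=50, range=(-4, 4), color='w', edgecolor='k')
plt.minorticks_on()                                      # needed for minor gridlines
plt.grid(True, which='major', color='k', linestyle=':')  # dotted black major grid
plt.grid(True, which='minor', color='0.8', linestyle=':')  # light grey minor grid
ax = plt.gca()
plt.show()
```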

Subplot

Supposing we wanted two histograms in the same figure, for instance to compare the uniform random distribution (rand) with the random normal distribution (randn), we could use subplot. The first two input arguments of subplot are the number of rows and the number of columns, and the third is the position: position 1 is the first subplot, position 2 the second, and so on. Positions start at the top left and go row-wise until the last subplot is reached.
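A sketch of a 2 by 1 layout, with randn on top and rand below. Colours, bins and titles are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 10000

fig = plt.figure()
plt.subplot(2, 1, 1)  # 2 rows, 1 column, position 1 (top)
plt.hist(np.random.randn(n), bins=50, color='b', label='randn')
plt.title('Random normal numbers (randn)')
plt.legend()

plt.subplot(2, 1, 2)  # 2 rows, 1 column, position 2 (bottom)
plt.hist(np.random.rand(n), bins=50, color='r', label='rand')
plt.title('Uniform random numbers (rand)')
plt.legend()

plt.tight_layout()
plt.show()
```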

For comparison, this is a 2 by 2 plot.
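A sketch of a 2 by 2 layout that titles each subplot with its position number, making the row-wise ordering visible:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 10000

fig = plt.figure()
for position in range(1, 5):
    plt.subplot(2, 2, position)  # 1 and 2 on the top row, 3 and 4 on the bottom row
    plt.hist(np.random.randn(n), bins=50)
    plt.title('Position %i' % position)
plt.tight_layout()
plt.show()
```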

Obviously you may be limited for space when you try to fit axes labels, legends, titles etc. into a smaller subplot. So we will remove the individual titles and legends and instead plot a single super title using the function suptitle. We will also remove the x-axis labels of the top histogram, as its x-axis is shared with the bottom histogram. The second data set lies between 0 and 1 and it is clear there are not enough bins, so we will increase the number of bins too.
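A sketch of the shared layout. The bins are an illustrative choice: 81 bins over -4.05 to 4.05 are 0.1 wide and centred on multiples of 0.1, so the bins at 0 and 1 straddle the ends of the rand data. tick_params hides the top subplot's x tick labels:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 10000

fig = plt.figure()
ax_top = plt.subplot(2, 1, 1)
plt.hist(np.random.randn(n), bins=81, range=(-4.05, 4.05), color='b')
ax_top.tick_params(labelbottom=False)  # the x-axis is shared, so hide these labels

plt.subplot(2, 1, 2, sharex=ax_top)    # bottom subplot shares the top's x-axis
plt.hist(np.random.rand(n), bins=81, range=(-4.05, 4.05), color='r')
plt.xlabel('Value')

plt.suptitle('randn (top) and rand (bottom), n = %i' % n)
plt.show()
```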

We can see that the function rand distributes the numbers uniformly between 0 and 1, giving approximately equal counts per bin, with the exception of the end bins, which lie half below 0 and half above 1. The function randn, by contrast, distributes the numbers around the centre in the form of a Gaussian.
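The same comparison can be made numerically with np.histogram, which returns the counts without plotting. In this sketch the 11 bins of width 0.1 centred on 0, 0.1, ..., 1.0 are an illustrative choice that makes the half-filled end bins visible:

```python
import numpy as np

np.random.seed(0)
n = 100000

uniform_counts, _ = np.histogram(np.random.rand(n), bins=11, range=(-0.05, 1.05))
print(uniform_counts)  # inner bins near n / 10, the two end bins near half of that

normal_counts, _ = np.histogram(np.random.randn(n), bins=10, range=(-4, 4))
print(normal_counts)   # largest in the middle, tapering towards both ends
```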
