Python and NumPy: Indexing of NumPy Vectors

Indexing an Element from a Vector

This post looks at some of the methods for indexing NumPy arrays. Let us begin with a vector, which we can create using the arange function. Recall that arange starts counting from 0 by default and excludes the upper bound: we count up towards the upper bound but never reach it. Because we are using integer steps, the maximum value is 1 below the upper bound. For example, in the case below the upper bound is 5 but the maximum value we have is 4. Make sure you understand this, as it is crucial when it comes to indexing.
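A minimal sketch of the call that produces this vector, assuming NumPy is imported under its conventional alias np:

```python
import numpy as np

# arange(5) starts at 0 and stops before the upper bound of 5
a = np.arange(5)
print(a)  # [0 1 2 3 4]
```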

[0 1 2 3 4]

Here the result is:

\displaystyle a=\left[ {\begin{array}{*{20}{c}} 0 & 1 & 2 & 3 & 4 \end{array}} \right]


This vector was deliberately chosen so that each element's value matches its location: the element at index 0 is 0, the element at index 1 is 1, the element at index 2 is 2, and so on.

To index a NumPy array, we type the array name followed by the index in square brackets [ ]. Say, for example, we wanted the element at index 1 of a (the second element, since counting starts at 0). To get this we would use:
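A sketch of that index expression, recreating a so the snippet runs on its own:

```python
import numpy as np

a = np.arange(5)
# square brackets select the element at index 1 (the second element)
element = a[1]
print(element)  # 1
```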

1

If instead we wanted the element at index 3 we would use:
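Again as a self-contained sketch:

```python
import numpy as np

a = np.arange(5)
element = a[3]  # index 3 holds the value 3 in this vector
print(element)  # 3
```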

3

This can readily be repeated for the other indices. For practice, try indices 0, 2 and 4.

In the example above, the location of each element matched its value. We will now instead create a random array of 5 integers from 1 up to, but not including, 11 (so the largest possible value is 10) and repeat the process above. We will set the random seed to 0 so this can be reproduced.
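A sketch using NumPy's legacy random functions np.random.seed and np.random.randint. The exact call used originally is not shown, so this is an assumption. With these arguments it should reproduce the array below:

```python
import numpy as np

np.random.seed(0)                # fix the seed so the result is reproducible
b = np.random.randint(1, 11, 5)  # 5 integers from 1 up to (but not including) 11
print(b)  # [6 1 4 4 8]
```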

[6 1 4 4 8]

Below the array b, we will place the array a which shows the element locations:

\displaystyle b=\left[ {\begin{array}{*{20}{c}} 6 & 1 & 4 & 4 & 8 \end{array}} \right]

\displaystyle a=\left[ {\begin{array}{*{20}{c}} 0 & 1 & 2 & 3 & 4 \end{array}} \right]

We can see that the element at index 0 is 6, at index 1 is 1, at index 2 is 4, at index 3 is also 4 and at index 4 is 8. Let's check the elements at indices 1 and 3 as before:
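A sketch of both lookups, with b typed out from the seeded output above so the snippet is self-contained:

```python
import numpy as np

b = np.array([6, 1, 4, 4, 8])  # the seeded random array, typed out
print(b[1])  # 1
print(b[3])  # 4
```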

1
4

This is as expected.

Calculating the Size of a Vector

Return to a, which lists the element locations of b:

\displaystyle a=\left[ {\begin{array}{*{20}{c}} 0 & 1 & 2 & 3 & 4 \end{array}} \right]
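The output below comes from NumPy's shape function applied to b, as the next paragraphs describe. A sketch:

```python
import numpy as np

b = np.array([6, 1, 4, 4, 8])
# np.shape returns a tuple with one entry per dimension
print(np.shape(b))  # (5,)
```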

(5,)

This tells us that b has shape (5,): it is a one-dimensional array with 5 elements. Because Python indexes from 0 rather than 1, the element locations run from 0 to 5-1=4 rather than from 1 to 5.

Above, we call the shape function from the np library with b as its input argument. However, b is already a NumPy array, so we can access its shape directly from the variable b using dot notation. Here shape is an attribute of the array, so no parentheses are needed. For example, the above can be rewritten as:
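A sketch comparing the attribute with the function call:

```python
import numpy as np

b = np.array([6, 1, 4, 4, 8])
# shape is an attribute of the array object, so it is accessed without ()
print(b.shape)                 # (5,)
print(b.shape == np.shape(b))  # True
```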

(5,)

This returns the same value. Recall that after importing the NumPy library, we can type np followed by a . and then press [tab] to list the functions it provides.


Similarly, after creating a NumPy array, we can type the variable name followed by a . and then press [tab] to list the attributes and methods available for that array.
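Tab completion is a feature of the editor (for example IPython or Spyder). Outside such an editor, a rough equivalent is Python's built-in dir function, sketched here:

```python
import numpy as np

b = np.array([6, 1, 4, 4, 8])
# dir lists the attributes and methods of an object; hide the private ones
public_names = [name for name in dir(b) if not name.startswith("_")]
print(public_names[:10])
print("shape" in public_names)  # True
```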


Out of Bounds Error

\displaystyle b=\left[ {\begin{array}{*{20}{c}} 6 & 1 & 4 & 4 & 8 \end{array}} \right]

\displaystyle a=\left[ {\begin{array}{*{20}{c}} 0 & 1 & 2 & 3 & 4 \end{array}} \right]

As we can see, the shape is 5 (as we specified when creating b). This means the largest index is 4 (5 minus 1, because we start at index 0). If we try to get the element at index 5 of b, we get an error because it is out of bounds:

Traceback (most recent call last):

  File "<ipython-input-5-c133823c3717>", line 1, in <module>
    b[5]

IndexError: index 5 is out of bounds for axis 0 with size 5
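If the error should not stop the script, it can be caught with try/except and the message inspected. A sketch:

```python
import numpy as np

b = np.array([6, 1, 4, 4, 8])
try:
    b[5]  # valid indices are 0 to 4
except IndexError as err:
    message = str(err)
print(message)  # index 5 is out of bounds for axis 0 with size 5
```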

Negative Indexing an Element from a Vector

It is also possible to use negative indexing. Let's create another vector c using arange to show the element locations with respect to their negative position. This time we will start at minus the length of b and stop at 0 (which is excluded), going up in steps of 1.
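A minimal sketch of that arange call, reading the length from b.shape:

```python
import numpy as np

b = np.array([6, 1, 4, 4, 8])
# start at -5 (minus the length of b), stop before 0, step +1
c = np.arange(-b.shape[0], 0, 1)
print(c)  # [-5 -4 -3 -2 -1]
```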


[-5 -4 -3 -2 -1]

\displaystyle c=\left[ {\begin{array}{*{20}{c}} {-5} & {-4} & {-3} & {-2} & {-1} \end{array}} \right]

Once again we can put this alongside b:

\displaystyle b=\left[ {\begin{array}{*{20}{c}} 6 & 1 & 4 & 4 & 8 \end{array}} \right]

\displaystyle c=\left[ {\begin{array}{*{20}{c}} {-5} & {-4} & {-3} & {-2} & {-1} \end{array}} \right]

Reading from right to left, the last element is at index -1. A negative index counts back from the end of the array. Index -1 refers to the same element as index 5-1=4, index -2 to the same element as index 5-2=3, and so on. Here 5 is the length of b, which is also the upper bound that arange never reached when creating c. Thus the element at index -1 is 8, at -2 is 4, at -3 is 4, at -4 is 1 and at -5 is 6. Any negative index below -5 is out of bounds for this example.
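The two outputs below are consistent with, for example, b[-1] and b[-3]. The exact indices used originally are not shown, so these are an assumption:

```python
import numpy as np

b = np.array([6, 1, 4, 4, 8])
print(b[-1])  # 8, the last element
print(b[-3])  # 4
# a negative index -k picks the same element as the positive index len(b) - k
print(b[-1] == b[len(b) - 1])  # True
```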

8
4

Indexing using a Colon

To select an entire vector we can index using a colon. So for instance:
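A sketch of the colon index on a:

```python
import numpy as np

a = np.arange(5)
# a lone colon selects every element along the axis
print(a[:])  # [0 1 2 3 4]
```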


This returns:

[0 1 2 3 4]


\displaystyle a=\left[ {\begin{array}{*{20}{c}} 0 & 1 & 2 & 3 & 4 \end{array}} \right]


While this may seem rather pointless for vectors, it is important for objects with multiple dimensions, such as matrices, which we'll look at later.

If we instead want to select only the elements from index 0 up to (but not including) index 2, we can use:
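A sketch of the slice, written as start:stop:

```python
import numpy as np

a = np.arange(5)
# start at index 0, stop before index 2
print(a[0:2])  # [0 1]
```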

[0 1]

\displaystyle \left[ {\begin{array}{*{20}{c}} 0 & 1 \end{array}} \right]

Recall that once again the upper bound is excluded, so we go up to index 2 but do not include it. Because we are using integer steps, the last index selected is 1, one less than 2. We can also make a selection using negative indices:
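A sketch of a negative-index slice that gives the output below:

```python
import numpy as np

a = np.arange(5)
# start at index -4, stop before index -2
print(a[-4:-2])  # [1 2]
```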

[1 2]

\displaystyle \left[ {\begin{array}{*{20}{c}} 1 & 2 \end{array}} \right]

If this is confusing, have a look below:

\displaystyle a=\left[ {\begin{array}{*{20}{c}} {+0} & {+1} & {+2} & {+3} & {+4} \end{array}} \right]

\displaystyle b=\left[ {\begin{array}{*{20}{c}} {+6} & {+1} & {+4} & {+4} & {+8} \end{array}} \right]

\displaystyle c=\left[ {\begin{array}{*{20}{c}} {-5} & {-4} & {-3} & {-2} & {-1} \end{array}} \right]

Indexing from a lower bound of -4 to an upper bound of -2 means we start at -4 and go up in steps of 1 towards the upper bound of -2. Because the upper bound is excluded, the last index in the selection is -2 minus 1, which is -3. In the data above, this is equivalent to indexing from a lower bound of 1 to an upper bound of 3. Once again the upper bound of 3 is excluded, so the last index is 3 minus 1, which is 2. Understanding that the upper bound is excluded is crucial.
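The equivalence can be checked directly. A sketch:

```python
import numpy as np

a = np.arange(5)
negative = a[-4:-2]  # indices -4 and -3
positive = a[1:3]    # indices 1 and 2, the same positions counted from the front
print(negative, positive)                  # [1 2] [1 2]
print(np.array_equal(negative, positive))  # True
```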

Indexing Using a Comma

It is also possible to index elements that aren't beside each other by listing their locations, separated by commas. Let's have a look at the NumPy arrays a, b and c:

\displaystyle a=\left[ {\begin{array}{*{20}{c}} {+0} & {+1} & {+2} & {+3} & {+4} \end{array}} \right]

\displaystyle b=\left[ {\begin{array}{*{20}{c}} {+6} & {+1} & {+4} & {+4} & {+8} \end{array}} \right]

\displaystyle c=\left[ {\begin{array}{*{20}{c}} {-5} & {-4} & {-3} & {-2} & {-1} \end{array}} \right]

Suppose we want a NumPy array of the elements of b at locations 0, 2 and 4. We can pass these locations as a list inside the square brackets:
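A sketch of indexing with a list of locations. Note the inner brackets, which make the locations a single list:

```python
import numpy as np

b = np.array([6, 1, 4, 4, 8])
# the inner list holds the locations to pick out
print(b[[0, 2, 4]])  # [6 4 8]
```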

[6 4 8]

This gives:

\displaystyle \left[ {\begin{array}{*{20}{c}} 6 & 4 & 8 \end{array}} \right]

Once again this can be created using negative element locations. In this case:
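The same selection sketched with negative locations:

```python
import numpy as np

b = np.array([6, 1, 4, 4, 8])
# -5, -3 and -1 are the same positions as 0, 2 and 4
print(b[[-5, -3, -1]])  # [6 4 8]
```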

[6 4 8]

This gives:

\displaystyle \left[ {\begin{array}{*{20}{c}} 6 & 4 & 8 \end{array}} \right]

Note that the locations do not need to be unique or in order, and we can mix positive and negative element locations:

\displaystyle a=\left[ {\begin{array}{*{20}{c}} {+0} & {+1} & {+2} & {+3} & {+4} \end{array}} \right]

\displaystyle b=\left[ {\begin{array}{*{20}{c}} {+6} & {+1} & {+4} & {+4} & {+8} \end{array}} \right]

\displaystyle c=\left[ {\begin{array}{*{20}{c}} {-5} & {-4} & {-3} & {-2} & {-1} \end{array}} \right]

For example:
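One list of locations that produces the output below is [1, -4, 4, -2]. The exact list used originally is not shown, so this choice is an assumption:

```python
import numpy as np

b = np.array([6, 1, 4, 4, 8])
# 1 and -4 are the same position, so the value 1 appears twice
print(b[[1, -4, 4, -2]])  # [1 1 8 4]
```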

[1 1 8 4]

\displaystyle \left[ {\begin{array}{*{20}{c}} 1 & 1 & 8 & 4 \end{array}} \right]

Indexing Using a Double Colon and a Step Size

Previously we indexed b at the locations 0, 2 and 4:


\displaystyle a=\left[ {\begin{array}{*{20}{c}} {+0} & {+1} & {+2} & {+3} & {+4} \end{array}} \right]

\displaystyle b=\left[ {\begin{array}{*{20}{c}} {+6} & {+1} & {+4} & {+4} & {+8} \end{array}} \right]

\displaystyle c=\left[ {\begin{array}{*{20}{c}} {-5} & {-4} & {-3} & {-2} & {-1} \end{array}} \right]

This gave us every second value, that is, the values at a step size of 2:

[6 4 8]

\displaystyle \left[ {\begin{array}{*{20}{c}} 6 & 4 & 8 \end{array}} \right]

It is also possible to select values across the full array at a given step size directly. To do this, we index using two colons followed by the step size:
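A sketch of the step-size slice, alongside the same slice with its start and stop written out:

```python
import numpy as np

b = np.array([6, 1, 4, 4, 8])
# start:stop:step with start and stop omitted covers the whole array
print(b[::2])    # [6 4 8]
# the same selection with the start and stop written out explicitly
print(b[0:5:2])  # [6 4 8]
```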

[6 4 8]

\displaystyle \left[ {\begin{array}{*{20}{c}} 6 & 4 & 8 \end{array}} \right]

Conditional Selection

It is also possible to use conditional selection, that is, selecting elements of an array according to a logical True/False criterion. For this, the arrays being compared need to be the same size, or an array is compared to a scalar. The logical operators comparing a to b can be written either as functions or using operators:
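The original listing is not shown. The sketch below uses NumPy's comparison functions and checks that each gives the same result as its operator:

```python
import numpy as np

a = np.arange(5)
b = np.array([6, 1, 4, 4, 8])
# each comparison works element by element and returns a boolean array
pairs = {
    "==": (np.equal(a, b), a == b),
    "!=": (np.not_equal(a, b), a != b),
    "<": (np.less(a, b), a < b),
    "<=": (np.less_equal(a, b), a <= b),
    ">": (np.greater(a, b), a > b),
    ">=": (np.greater_equal(a, b), a >= b),
}
for symbol, (function_result, operator_result) in pairs.items():
    print(symbol, function_result, np.array_equal(function_result, operator_result))
```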


Re-examining our array b, let's select the data that is equal to the scalar 4:

\displaystyle a=\left[ {\begin{array}{*{20}{c}} {+0} & {+1} & {+2} & {+3} & {+4} \end{array}} \right]

\displaystyle b=\left[ {\begin{array}{*{20}{c}} {+6} & {+1} & {+4} & {+4} & {+8} \end{array}} \right]

\displaystyle c=\left[ {\begin{array}{*{20}{c}} {-5} & {-4} & {-3} & {-2} & {-1} \end{array}} \right]
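A sketch of the comparison against a scalar:

```python
import numpy as np

b = np.array([6, 1, 4, 4, 8])
mask = b == 4  # the scalar 4 is compared with every element of b
print(mask)  # [False False  True  True False]
```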

[False False  True  True False]


\displaystyle \left[ {\begin{array}{*{20}{c}} {\text{False}} & {\text{False}} & {\text{True}} & {\text{True}} & {\text{False}} \end{array}} \right]


This logical output can then be used to index into an array of the same size. For instance:
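A sketch of boolean-mask indexing on b itself:

```python
import numpy as np

b = np.array([6, 1, 4, 4, 8])
# only the elements where the mask is True are kept
print(b[b == 4])  # [4 4]
```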

[4 4]


\displaystyle \left[ {\begin{array}{*{20}{c}} 4 & 4 \end{array}} \right]


\displaystyle a=\left[ {\begin{array}{*{20}{c}} {+0} & {+1} & {+2} & {+3} & {+4} \end{array}} \right]

\displaystyle b=\left[ {\begin{array}{*{20}{c}} {+6} & {+1} & {+4} & {+4} & {+8} \end{array}} \right]

\displaystyle c=\left[ {\begin{array}{*{20}{c}} {-5} & {-4} & {-3} & {-2} & {-1} \end{array}} \right]

Applied to the same vector it was built from, this may not appear immediately useful. Instead, let's look at the values of a where b is equal to 4:
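A sketch in which the mask built from b selects from a:

```python
import numpy as np

a = np.arange(5)
b = np.array([6, 1, 4, 4, 8])
# the mask is built from b but applied to a, which has the same size
print(a[b == 4])  # [2 3]
```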

[2 3]

We can think of this as where b equals 4:

\displaystyle \left[ {\begin{array}{*{20}{c}} 6 & 1 & 4 & 4 & 8 \end{array}} \right]

This gives a series of True or False values:

\displaystyle \left[ {\begin{array}{*{20}{c}} {\text{False}} & {\text{False}} & {\text{True}} & {\text{True}} & {\text{False}} \end{array}} \right]

These then get applied to the array:

\displaystyle \left[ {\begin{array}{*{20}{c}} 0 & 1 & 2 & 3 & 4 \end{array}} \right]

\displaystyle \left[ {\begin{array}{*{20}{c}} {\text{0}\left[ {\text{False}} \right]} & {\text{1}\left[ {\text{False}} \right]} & {\text{2}\left[ {\text{True}} \right]} & {\text{3}\left[ {\text{True}} \right]} & {\text{4}\left[ {\text{False}} \right]} \end{array}} \right]

Only the values that are True are shown in the final array:

\displaystyle \left[ {\begin{array}{*{20}{c}} 2 & 3 \end{array}} \right]

Let's look at a practical example where this is useful. First, let's generate some data and have a look at it in the variable editor:
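The original data-generation code is not shown. Below is a hypothetical stand-in with the same feature, a single spike in y at index 5; the values themselves are invented for illustration:

```python
import numpy as np

# hypothetical data: x is 0..9 and y follows x squared
x = np.arange(10)
y = x ** 2
y[5] = 1000  # an artificial spike standing in for the anomaly at index 5
print(x)
print(y)
```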


The x data looks normal.

From the y data we can see that the value at index 5 is clearly an anomaly compared to the rest of the data. If the data were plotted, for instance, this value would drown out all the other data and the plot would show little more than a single spike.

We may therefore want to select a new set of data, x1 and y1, which doesn't have this unwanted spike. To do this we could use the less-than criterion:
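A sketch of the criterion, again on hypothetical data with a spike at index 5 and an assumed threshold of 100:

```python
import numpy as np

x = np.arange(10)
y = x ** 2
y[5] = 1000  # hypothetical spike at index 5
# keep points whose y value is below a hypothetical threshold of 100
condition = y < 100
print(condition)
```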


We can now see that all the data is good except for index 5.

We can then create x1 and y1 using this criterion:
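A sketch on the same hypothetical data. The one mask is applied to both arrays so that each x value stays paired with its y value:

```python
import numpy as np

x = np.arange(10)
y = x ** 2
y[5] = 1000          # hypothetical spike at index 5
condition = y < 100  # hypothetical threshold
# apply the same mask to both arrays so the x and y points stay paired
x1 = x[condition]
y1 = y[condition]
print(x1)  # [0 1 2 3 4 6 7 8 9]
print(y1)
```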


As we can see, the new x1 and y1 data do not contain the data point where the spike was. It has been removed by conditional indexing.

Note that the lines above can be combined:
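A sketch of the combined form on the same hypothetical data, with the mask written inline:

```python
import numpy as np

x = np.arange(10)
y = x ** 2
y[5] = 1000  # hypothetical spike at index 5
# build the mask inline instead of storing it in a separate variable
x1 = x[y < 100]
y1 = y[y < 100]
```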


On the other hand, suppose this was a set of scientific measurements. For example, x could be a range of concentrations in a chemistry lab and y the result of their chemical reaction measured on a scientific instrument. One could then use the greater-than operator to select the bad data point instead. In this case, conditional logic identifies the bad data point so that it can be remeasured.
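A sketch of the reversed criterion. The concentrations, readings and threshold are all hypothetical:

```python
import numpy as np

x = np.arange(10)  # hypothetical concentrations
y = x ** 2         # hypothetical instrument readings
y[5] = 1000        # hypothetical faulty reading
# flip the criterion to find the suspicious point instead of discarding it
bad_x = x[y > 100]
print(bad_x)  # [5]
```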
