Python Programming with Spyder

Part 3: Object Orientated Programming and use of the Python dot Syntax to reference an object from a module or class. An introduction to the inbuilt and standard Python modules and libraries.

Python is an Object Orientated Programming (OOP) language where everything we interact with is an object. This guide uses the Spyder 5 IDE to explore some of the main concepts behind object orientated programming, focusing on the dot syntax, which references an object contained within a second object. The second object can be thought of as a container or box: the box is itself an object that can store other objects.

This guide continues from Part 1 Python Procedural Programming and Part 2 Python Code Blocks. If you are not familiar with these concepts please have a look at these guides first.

Python modules

A module is a Python script where a number of objects are assigned.

The assignment operator = is used for assignment:

object_name = value

It is perhaps more useful to approach the line above from the centre, then the right hand side and then the left hand side. Doing so translates the syntax to English language, producing the sentence:

assign the value to the object_name

Notice that in the English language syntax we have additional words to make the sentence sound grammatically correct. In Python often we want to simplify things to increase typing efficiency and therefore do not include these filler words.

In Spyder in the script editor, let's assign the float value 3.14 to the object name pi using the assignment operator =.

We can then save this Python script by using File→Save as…

And in this case save it as script0.py:

Now we will select File→New File…

We will then save it as script1.py

Now that both Python script files are saved, for convenience, we can right click the tabs and select Split Vertically:

Now we will look at multiple ways of accessing the objects defined in script0 in the new script1. script0 can be considered a module i.e. a script file where objects are assigned.

from module import object

We can import an object from a module using the from and import keywords. This has the general form:

from module import object

Note that when we reference the Python module (script0.py) we do not include the .py extension.

It may be helpful to conceptualize the module as a box and the object as an item in the box.

The line above therefore would become:

from box import item

In English language syntax we would probably approach this statement from the right hand side first and change this to:

import the item from the box

Notice the analogy of the approach discussed when using the assignment operator i.e. changing the order and removing the filler words.

Let's add the following code to our script files and save each script.

script0.py

pi = 3.14

script1.py

from script0 import pi
print(pi)

In the files tab we can see both scripts are in the same folder. Highlighting script1.py we can run it:

We can see that the float object pi is imported and displays on the variable explorer. script1 will also print the output to the console.

objects: variables and functions

In Python we can use the assignment operator to assign another object i.e. use the form:

object_name_0 = value_0
object_name_1 = value_1

We can also define a function using the def keyword, followed by a space and then the function's name. On the same line parentheses are used to enclose any input arguments (none are supplied for this particular function) and this line ends with a colon : which is an indication to begin a code block.

Any code belonging to the code block is indented by 4 spaces. The first three lines of the code block are the document string """ """.

Functions can optionally include a return statement which can be used to return an output.

def function_name(*args, **kwargs):
    """
    docstring
    """
    code
    return output

In English we can think of the top line as being:

define the object function_name to be a function (which takes the following positional input arguments *args and keyword input arguments **kwargs):

The code belonging to this function is …

Use the code above to create an output from a calculation involving *args and **kwargs

return the output

In Python, object_name_0, object_name_1 and function_name are all objects. object_name_0 and object_name_1 are both variables which have merely been assigned a value (using the assignment operator). function_name on the other hand is a function and a function is designed to operate on provided input arguments (if applicable) to perform some operation and return an output variable or print an output variable to the console.
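As a minimal sketch of the general form above, the function below takes one positional input argument and one keyword input argument and returns an output. The names scale, value and factor are hypothetical and are not part of the scripts used in this guide.

```python
def scale(value, factor=2):
    """
    returns value multiplied by factor.
    """
    output = value * factor
    return output


# positional input argument only, factor takes its default value
doubled = scale(3.14)
# keyword input argument supplied explicitly
tripled = scale(3.14, factor=3)
print(doubled, tripled)
```

Here scale, doubled and tripled are all objects: scale is a function while doubled and tripled are variables assigned to the output returned by the function.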

Let's restart the Kernel (select Consoles and then Restart Kernel) and update script0 to be of this form.

script0.py

pi = 3.14
e = 2.72

def greeting():
    """
    prints hello to the console.
    """
    print("hello")

When we run script0 directly, notice that e and pi display on the variable explorer but the greeting function doesn't.

Recall that in Python we have the concept of scope. We can have a look at the local scope of the console otherwise known as local directory of the console by typing in:

dir()

At the moment we can ignore the Python datamodel methods which begin and end with a double underscore __ (also colloquially known as double underscore or __dunder__ methods), the private object names which begin with an underscore _ and exit and quit which are always present in the console's namespace.

Of specific interest to us, we can see the objects e, greeting and pi are all in the console's local directory.

In the console we can look at the three objects by typing in their object name:

pi
e
greeting

In the case of the pi and e (variables), the values assigned to these object names are printed to the console while in the case of greeting (a function) we are only informed that it is a function.

In other words all we have done is reference the objects above. In the case of a function we can either reference it or call it. To call a function we need to provide parentheses alongside any required input arguments (this particular function has none as it will always print the same static word hello).

Notice that as we type the function name followed by an open parenthesis, the tool tip appears and displays the function's docstring.

In this case the function has no input arguments but must still be called by use of parentheses.

greeting()
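The difference between referencing and calling can be sketched as below. The object names reference and returned are assumptions made for this illustration only.

```python
def greeting():
    """
    prints hello to the console.
    """
    print("hello")


# referencing: the function object itself is assigned, nothing is printed
reference = greeting
# calling: the code block runs and hello is printed
returned = greeting()

# a function is a callable object
print(callable(reference))
# greeting has no return statement so nothing useful is returned
print(returned)
```

Because reference is just another object name for the same function, reference() would also print hello.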

from module import object1, object2, …

We can use the same notation as previously used to import a single object, to import multiple objects. All we need to do is include a comma as a delimiter for all the objects we wish to import. i.e. we import multiple objects using the form:

from module import object_0, object_1, object_2, ...

Once again it can be useful to think of the module as a box:

from box import item_0, item_1, item_2, ...

And to construct the syntax in English language:

import the items item_0, item_1, item_2, and … from the box

Let's restart the Kernel (select Consoles and then Restart Kernel) and add the following code to our scripts.

script0.py

pi = 3.14
e = 2.72

def greeting():
    """
    prints hello to the console.
    """
    print("hello")

script1.py

# %% Imports
from script0 import pi, e, greeting
# %% printing variables (instances)
print(pi)
print(e)
# %% calling functions
greeting()

We can run the zeroth cell of script1:

Once again we will see e and pi appear on the variable explorer. The function greeting will not display but it will be in the console's local directory which we can check once again by using:

dir()

Because these are assigned, running the remaining cells will print the values of the variables and call the function as expected.

from module import *

Let's restart the Kernel (select Consoles and then Restart Kernel). Now instead of importing each object individually, * (which denotes all) can be used to import all objects available from a module.

from module import *

Once again conceptualizing the module as a box:

from box import *

And taking * to be all we can construct the English language syntax:

import all items from the box

Let's update our scripts to be of that form:

script0.py

pi = 3.14
e = 2.72

def greeting():
    """
    prints hello to the console.
    """
    print("hello")

script1.py

# %% Imports
from script0 import *
# %% printing variables (instances)
print(pi)
print(e)
# %% calling functions
greeting()

While the code executes with e and pi displaying on the variable explorer and e, pi and greeting being imported into the console's namespace, you will see several warnings in script1.py.

The * import is not normally recommended as it is unclear where each object is imported from. Also if you are dealing with large scripts with hundreds to thousands of lines of code each, it is possible that two of these scripts may use a common object name. Take the analogy of having two large boxes full of Lego and emptying them (importing all Lego bricks) on the floor. You will be unsure of what box each Lego brick came from.

We can illustrate this by creating three scripts. script1 will import everything from script0 and script2, and both of these scripts will have an object name e defined. When the 0th code block is run, e becomes the float 2.72 as imported from script0.

However when the next code block is run, e gets reassigned to the str "e" as imported from script2. In the scenario where the user was not interested in the object e from script2 but only in other possible objects from script2 (not included for clarity), it is likely that they would run into other issues such as a TypeError due to e being accidentally redefined from the float 2.72 to the str "e".
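The name clash can be reproduced in a single script with the rough sketch below. It assumes two stand-in modules with the hypothetical names clash_box0 and clash_box2, written to a temporary folder, and uses a dict called namespace in place of the console's namespace so the two * imports can be run one after the other.

```python
import os
import sys
import tempfile

# write two small stand-in modules to a temporary folder
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "clash_box0.py"), "w") as file:
    file.write("e = 2.72\n")
with open(os.path.join(folder, "clash_box2.py"), "w") as file:
    file.write('e = "e"\n')

# let Python find the temporary folder when importing
sys.path.append(folder)

# this dict stands in for the console's namespace
namespace = {}

exec("from clash_box0 import *", namespace)
first_e = namespace["e"]     # the float from clash_box0

exec("from clash_box2 import *", namespace)
second_e = namespace["e"]    # replaced by the str from clash_box2

print(type(first_e), type(second_e))
```

No warning is given when the second import replaces e, which is why the clash tends to surface only later as an error elsewhere in the code.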

import module and module.object

We can also directly import the module using the form:

import module

Then we can access objects defined from the module using the dot syntax.

module.object_1

It may be helpful to think once again of the module as a box and the . as an arrow → i.e.

box→object_1

The definition of this syntax is essentially taking object_1 out of this box (in this case module).

Let's restart our kernel and update our scripts to the following lines of code.

script0.py

pi = 3.14
e = 2.72

def greeting():
    """
    prints hello to the console.
    """
    print("hello")

script1.py

# %% Imports
import script0
# %% calling functions
script0.greeting()

If we run the code notice that the objects pi and e do not display on the variable explorer and the function greeting was executed as can be seen from the console output.

If we look at the local directory of the console:

dir()

We can see that only script0 is listed:

We can look at the local scope of script0 using:

dir(script0)
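The output of dir on a module also lists the datamodel names. A short sketch for filtering these out is shown below. The standard math module is used here only as a stand-in for script0, and the object name public_names is an assumption for this illustration.

```python
import math

# keep only the names that do not begin with an underscore
public_names = [name for name in dir(math) if not name.startswith("_")]
print(public_names)

# each name in the list can be accessed from the module with the dot syntax
print(math.pi)
```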

We can see the autocompletion in the console if we type in the name of the module followed by a dot (the console uses Jedi completion and doesn't have full integration with Kite so a tab is required to show the list of objects which can be accessed from the module).

Moving over to the script editor which uses Kite, we can see that the function greeting, the text (variables) pi and e are shown.

import module as alias

We can also import the module as a custom object name or alias.

import module as alias

We then need to use dot indexing from the alias object name:

alias.object

script0.py

pi = 3.14
e = 2.72

def greeting():
    """
    prints hello to the console.
    """
    print("hello")

script1.py

# %% Imports
import script0 as alias
# %% calling functions
alias.greeting()

Notice that when we look in the local directory of the console we get alias and not script0. We can access the objects pi, e and greeting from alias using the same dot syntax as before.

class attributes and methods

Let's go back to script0 and have a look at the variables pi and e in more detail. If we type in either variable or instance object name followed by a dot, we see that both of these objects have a number of other objects which can be referenced from them. An object that can be referenced with respect to another object is termed an attribute. We can see the clear analogy with the module and object with the instance and attribute:

module.object
instance.attribute

Note that the list of objects that can be referenced from both variables pi and e have identical object names.

They are identical because both e and pi are instances of the same class. i.e. the float class which can be thought of as a blueprint for this datatype.

Notice that if we type the name of the class followed by a dot, that we once again get the same list:

Within the list we see that we have objects that are listed as text and we have objects which are functions. Let's have a look at the text objects which are known as attributes. These can be thought of as a property of the float (another object which is simply read off the float). We access these in an identical manner to how we access objects defined in an imported module, i.e. the float object can be conceptualized as a module with the object names real and imag, for example, defined. For example:

pi.real

Where we are taking the object real from the float object pi. Note when we attempt to take the object real directly from the class float we are informed that it is an attribute.

Now let's have a look at the functions. Functions don't generally just read off a property from an object, they perform some action and this action usually requires code to act upon either input arguments or attributes of the object the function is called from. If we reference the function is_integer from both the float class and the instance pi, we are informed that they are a method and a function respectively. This is analogous to the behaviour we get when we reference the print function:

float.is_integer
pi.is_integer
print

To call these, we need to use parentheses and here we can see there is a difference between the method and the function. The method requires the input argument self, a placeholder input argument that must be an instance of the float class:

Because the function is instead being called from the instance pi, self is implied:

The two lines of code are therefore equivalent:

float.is_integer(pi)
pi.is_integer()

The difference between these two terms is subtle.

The function is_integer called from the float instance pi is a method of the float class.

A method is called directly from a class and its first positional input argument must be self.

A method for all other intents and purposes is identical to a function.
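The equivalence of the two forms can be checked with a short sketch. The object names whole, from_instance and from_class are assumptions made for this illustration.

```python
pi = 3.14
whole = 2.0

# called from the instance, self is implied
from_instance = pi.is_integer()
# called from the class, the instance is supplied as self
from_class = float.is_integer(pi)

print(from_instance, from_class)
print(whole.is_integer(), float.is_integer(whole))
```

Both forms give the same result for a given instance, and the result changes when a different instance is used as self.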

creating a custom class

We have used several inbuilt Python classes. However so far we have only considered a class as an abstract sort of blueprint. We have seen above how a class has methods (which can be conceptualised as actions analogous to functions) and attributes (objects which are just properties that can be read off the instance).

Under the hood, to a first approximation, we can consider the class as a grouping of functions.

However we have seen that a class can have multiple instances and that the functions give different results when different instances are used. In our blueprint we must therefore use methods as opposed to functions, and the first input argument of a method is always self.

Let's look at the top line where we define the class. To define a class we use the keyword class followed by the name of the class. The name of the class does not follow the normal snake_case that we have become familiar with. Instead CamelCase is the convention for naming custom and third-party classes, where the first letter of every word is capitalized and we do not use any spaces or underscores. After the class name we use parentheses, not to enclose input arguments but instead to specify the parent class or parent classes. In Python's Object Orientated Programming (OOP), everything is an object and if no parent class is specified, object is taken as the default. For now we will stick to object being the parent class:

class CamelCase(object):
    pass
class CamelCase():
    pass

If we type in object followed by a dot, we see a number of inbuilt methods, particularly ones beginning and ending with a double underscore. These are known as datamodel methods but are colloquially referred to as dunder methods.

When we create our own custom class, if we don't define any of these datamodel methods, then our custom class will follow the behaviour of the object class.
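A minimal sketch of this default behaviour is shown below, assuming a hypothetical class called Empty that defines nothing of its own.

```python
class Empty:
    pass


instance = Empty()

# object is the parent class even though it was not specified
print(issubclass(Empty, object))
# __repr__ was not defined in Empty so the one from object is used
print(Empty.__repr__ is object.__repr__)
print(repr(instance))
```

The last line prints the generic representation supplied by the object class, which only names the class and the location of the instance in memory.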

__init__ datamodel method

The name of the __init__ datamodel method is an abbreviation for initialization. This method is run when we instantiate an instance of a class.

For example:

pi = 3.14

Or more explicitly:

pi = float(3.14)

Here the number 3.14 is defined by the user and the __init__ datamodel method for this inbuilt class uses this value to assign it to the attribute real. Note that we can't view Python code that defines inbuilt classes because they are written in the C programming language.

Let's conceptualize our own Coordinate class, that has x_in and y_in positional input arguments which are supplied by the user during instantiation.

For clarity we will create attributes x_att and y_att that offset this value by 1.

class Coordinate(object):
    def __init__(self, x_in, y_in):
        self.x_att = x_in + 1
        print(f"x_att assigned to {x_in + 1}")
        self.y_att = y_in + 1
        print(f"y_att assigned to {y_in + 1}")


c1 = Coordinate(1, 2)

At the bottom we will instantiate the class with the values 1 and 2. Let's run the above through the Spyder debugger.

In the first line we define our class:

We then add any datamodel methods defined in the parent class object to our class:

Now we define our own __init__ method:

We then return out of our function.

We now have a line of code to instantiate our class to c1. We can select step into, to step into the method which instantiates this instance:

Line 9 assigns self to c1, x_in to 1 and y_in to 2. These are now accessible within the __init__ method's code block. We see these in the __init__ method's local namespace on the variable explorer:

The attribute x_att is assigned. We will print it in the next step:

The y_att is now assigned:

We will once again print it on the next step:

We return out of the __init__ method:

Now we are at the end of the script:

We now see in the variable explorer, the object c1, an instance of the Coordinate class with the attributes x_att and y_att:

In the example I used different names for the input arguments and attributes and included a calculation to create modified values for the sake of clarity.

More generally we would remove the print statements within the __init__ method and it is a typical convention to give the attributes the same object name as the input arguments.

class Coordinate(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


c1 = Coordinate(1, 2)

We can now access the attributes x and y from the instance c1:

The __init__ method can be used to call another method during instantiation. To call a method from another method we must call it from the instance self. As we are calling it from the instance self, self is implied and not included as a positional input argument (line 3).

class Coordinate(object):
    def __init__(self, x, y):
        self.create_attributes(x, y)
        
    def create_attributes(self, x1, y1):
        self.x2 = x1
        self.y2 = y1


c1 = Coordinate(1, 2)

Let's have a look at this using the debugger:

Let's step into the method which instantiates the class:

We are in the __init__ method's namespace and we see the variables x and y:

Selecting step, takes us to the line which calls the create_attributes method. We can also step into this method:

Now we see we are in the namespace of this method and have x1 and y1 (instead of x and y):

Now we will return three times, one return for each of the methods and one to indicate the end of the script:

And now if we examine the instance c1, we see we have the attributes called x2 and y2 this time and also the create_attributes method:

This method can be called again and new x1 and y1 values can be input which will give updated attributes x2 and y2:

c1.create_attributes(2, 4)

Again the code above used different input argument names to the names of the attributes for clarity. The same names can be used for the input arguments for both methods and the attributes, recalling that each method uses its own local scope:

class Coordinate(object):
    def __init__(self, x, y):
        self.create_attributes(x, y)
        
    def create_attributes(self, x, y):
        self.x = x
        self.y = y


c1 = Coordinate(1, 2)

get and set methods

Currently there are no restrictions on what the user can set x and y to. If they set them to a str for instance, any later code which involves a numeric operation and expects an int will likely lead to a TypeError. It can be common to assert the type of input arguments, in this case to int.

Sometimes we want to make a method or attribute private, to do so we begin the object name with a single _.

In this example there is an associated get and set method for each attribute _x and _y. The user can see that _x and _y are private attributes and that it is safer to access these using the get method and to reassign these using the set method, as opposed to accessing the attribute directly and possibly breaking something. In this case the breakage would be due to use of a wrong data type, but in other cases other attributes could be dependent on the value of _x and would also need to be updated when _x is updated:

class Coordinate(object):
    def __init__(self, x, y):
        assert type(x) == int
        assert type(y) == int
        self.set_x(x)
        self.set_y(y)
        
    def set_x(self, x):
        assert type(x) == int
        self._x = x

    def set_y(self, y):
        assert type(y) == int
        self._y = y
        
    def get_x(self):
        return self._x 

    def get_y(self):
        return self._y
        

c1 = Coordinate(1, 2)

The get and set methods work as intended:

c1.get_x()
c1.set_x(2)
c1.get_x()
c1.set_x("2")
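The last line above fails the assertion in set_x. The sketch below shows this with a cut-down version of the class that keeps only the x attribute (an assumption made to keep the example short), catching the AssertionError so the rest of the script can continue.

```python
class Coordinate(object):
    def __init__(self, x):
        self.set_x(x)

    def set_x(self, x):
        assert type(x) == int
        self._x = x

    def get_x(self):
        return self._x


c1 = Coordinate(1)
c1.set_x(2)

try:
    # a str is supplied instead of an int
    c1.set_x("2")
except AssertionError:
    print("set_x rejected the str, _x is still", c1.get_x())
```

Note that assert statements are skipped when Python is run with the -O option, so for code that others rely on it may be preferable to check the type with an if statement and raise a TypeError instead.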

__repr__ and __str__ datamodel methods

Let's have a look at our instance c1. We can type it into the console using:

c1

This will print the representation of the object.

Alternatively we can use the function repr which will return the representation of the object:

a = repr(c1)

We can also use the str class to convert the object to a str:

b = str(c1)

And the print function to print the object as a str:

c = print(c1)

The print function of course has no return statement, so c is assigned None.

The behaviour of the four functions is controlled by two datamodel methods __repr__ and __str__ and both of these expect a str to be returned. __repr__ is a formal representation and generally mimics how one would instantiate the class. Its behaviour controls what the function repr outputs and what is displayed in the console when the object name is typed. __str__ is more of an informal representation. Its behaviour controls what the class str outputs and what displays in the console when print is used. Let's define both these methods continuing the example above. In __repr__ we can make a str return to match the form Coordinate(x, y) using a formatted str with the attributes _x and _y. In __str__ we can informally represent the str in the form (x; y) once again using a formatted str. For conciseness we will only add these two methods below. We can also collapse the other methods in Spyder:

    def __repr__(self):
        string = f"Coordinate({self._x}, {self._y})"
        return string
    
    def __str__(self):
        string = f"({self._x}; {self._y})"
        return string

Now when we repeat the commands above, we see the defined behaviour when outputting to a cell, using repr, str and print respectively. It is clear to see that outputting to a cell and the use of repr match to what is returned in the method __repr__ meanwhile casting to a str or using print match to what is returned in the method __str__:
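The behaviour can be reproduced with the self-contained sketch below. The class is cut down to the __init__, __repr__ and __str__ methods only, an assumption made to keep the example short.

```python
class Coordinate(object):
    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __repr__(self):
        return f"Coordinate({self._x}, {self._y})"

    def __str__(self):
        return f"({self._x}; {self._y})"


c1 = Coordinate(1, 2)
a = repr(c1)     # uses __repr__
b = str(c1)      # uses __str__
c = print(c1)    # prints using __str__ and returns nothing
print(a, b, c)
```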

mathematical operators

We have seen before that the use of the + operator behaves differently for a str and an int or float. This is because the __add__ datamodel method is defined to work differently for each class.

The str, int and float are inbuilt classes. Normally when we define these methods, an attribute from the instance self will interact with its corresponding attribute from an instance other. In our case these are going to be int so we will rely on the built in behaviour of the + operator for the int class when defining our __add__ method. The return statement is then normally set up to return an instance of our own class, so must match the expected form of input arguments when we instantiate the class.

    def __add__(self, other):
        new_x = self._x + other._x
        new_y = self._y + other._y
        return Coordinate(new_x, new_y)

c1 = Coordinate(1, 2)
c2 = Coordinate(4, 6)
c3 = c1 + c2

We see that the new instance c3 has the _x attribute 5 and _y attribute 8 as expected:

datamodel method    defines behaviour of function or operator
__init__            instantiation
__str__             str, print
__repr__            repr, cell output
__len__             len
__add__             +
__sub__             -
__mul__             *
__pow__             **
__truediv__         /
__matmul__          @
__floordiv__        //
__mod__             %
__eq__              ==
__ne__              !=
__lt__              <
__le__              <=
__gt__              >
__ge__              >=
__and__             &
__or__              |
__xor__             ^
__lshift__          <<
__rshift__          >>
__iadd__            +=
__isub__            -=
__imul__            *=
__ipow__            **=
__itruediv__        /=
__ifloordiv__       //=
__imod__            %=
__iand__            &=
__ior__             |=
__ixor__            ^=
__ilshift__         <<=
__irshift__         >>=

In the case of the Coordinate class we can define most of the operators that the int class has defined using very similar code to the above for each respective datamodel method and operator. However what might be particularly useful for this class is to assign an operator to calculate the distance between two co-ordinates. We will use the __matmul__ datamodel method which defines the behaviour of the @ operator, which is not otherwise used by the int class. This method will return a float distance:

    def __matmul__(self, other):
        dx = self._x - other._x
        dy = self._y - other._y
        distance = (dx ** 2 + dy ** 2) ** 0.5
        return distance

c1 = Coordinate(1, 2)
c2 = Coordinate(4, 6)
dist = c1 @ c2
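As an illustration of how other operators can be defined with very similar code, the sketch below adds the __sub__ and __eq__ datamodel methods. This is one possible way to define them rather than the definitive one, and the class is cut down to these methods to keep the example short.

```python
class Coordinate(object):
    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __sub__(self, other):
        # relies on the - operator of the int class
        new_x = self._x - other._x
        new_y = self._y - other._y
        return Coordinate(new_x, new_y)

    def __eq__(self, other):
        # two co-ordinates are taken to be equal when both attributes match
        return self._x == other._x and self._y == other._y


c1 = Coordinate(1, 2)
c2 = Coordinate(4, 6)
c3 = c2 - c1
print(c3 == Coordinate(3, 4))
print(c1 == c2)
```

Note that __sub__ returns a new instance of the class while __eq__ returns a bool, matching what the - and == operators return for the int class.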

For convenience, all the code for the above, renamed as the class Coordinate2D, is:

class Coordinate2D(object):
    def __init__(self, x, y):
        assert type(x) == int
        assert type(y) == int
        self.set_x(x)
        self.set_y(y)
        
    def set_x(self, x):
        assert type(x) == int
        self._x = x

    def set_y(self, y):
        assert type(y) == int
        self._y = y
        
    def get_x(self):
        return self._x 

    def get_y(self):
        return self._y
    
    def __repr__(self):
        # use the name of the instance's class so the str matches the renamed class
        string = f"{type(self).__name__}({self._x}, {self._y})"
        return string
    
    def __str__(self):
        string = f"({self._x}; {self._y})"
        return string

    def __add__(self, other):
        new_x = self._x + other._x
        new_y = self._y + other._y
        return Coordinate2D(new_x, new_y)
    
    def __matmul__(self, other):
        dx = self._x - other._x
        dy = self._y - other._y
        distance = (dx ** 2 + dy ** 2) ** 0.5
        return distance

c1 = Coordinate2D(1, 2)
c2 = Coordinate2D(4, 6)
dist = c1 @ c2

inheritance

Let's take the Coordinate2D class above and create another class Coordinate3D which uses Coordinate2D as a parent class.

class Coordinate3D(Coordinate2D):
    pass

c1 = Coordinate3D(1, 2)
c2 = Coordinate3D(4, 6)
dist = c1 @ c2

If we just type in pass, we can see that the code works as expected and we have essentially inherited everything from the parent class Coordinate2D. You can copy the Coordinate2D class and Coordinate3D class in the code cells above and rerun it through the debugger if not following this:

We can redefine a method in the child class. Doing so will override the method inherited from the parent class. We can for example redefine __init__, add additional methods such as set_z and get_z, and call set_z in addition to the set_x and set_y defined in the parent class:

class Coordinate3D(Coordinate2D):
    def __init__(self, x, y, z):
        assert type(x) == int
        assert type(y) == int
        self.set_x(x)
        self.set_y(y)
        assert type(z) == int
        self.set_z(z)

    def set_z(self, z):
        assert type(z) == int
        self._z = z

    def get_z(self):
        return self._z

c1 = Coordinate3D(1, 2, 3)
c2 = Coordinate3D(4, 6, 9)

Note the __repr__ and __str__ method are inherited from the parent class and thus c1 input into the console does not mention the _z attribute. The method get_z works as expected however:

With the __init__ method above we have copied 4 lines of code from the original __init__ method. Sometimes we want to inherit the __init__ method from the parent class but add additional functionality instead of copying all the code from the parent class.

To do this we can use the inbuilt super to access the parent class. The input arguments for this are the ChildClass and the instance self (in Python 3 both can be omitted, i.e. super() used with no input arguments within a method is equivalent). We can then use a dot to access all the methods defined in the parent class. In this case we can call the method __init__ from the parent class and provide the required input arguments.

class Coordinate3D(Coordinate2D):
    def __init__(self, x, y, z):
        super(Coordinate3D, self).__init__(x, y)
        assert type(z) == int
        self.set_z(z)

    def set_z(self, z):
        assert type(z) == int
        self._z = z

    def get_z(self):
        return self._z

c1 = Coordinate3D(1, 2, 3)
c2 = Coordinate3D(4, 6, 9)

When supplying the input arguments we do not include self. The expression:

super(Coordinate3D, self).__init__(x, y)

mimics the following line, with __init__ taken from the parent class, i.e. self is implied as we are calling the method from the instance self:

self.__init__(x, y)

Note that it is possible to use the above syntax to use any method from the parent class:

super(ChildClass, self).parent_method(*args, **kwargs)

We can do so for example from the methods __add__ and __repr__:

    def __add__(self, other):
        added_2d = super(Coordinate3D, self).__add__(other)
        new_x = added_2d.get_x()
        new_y = added_2d.get_y()
        new_z = self._z + other._z
        return Coordinate3D(new_x, new_y, new_z)

    def __repr__(self):
        repr_2d = super(Coordinate3D, self).__repr__()
        string = repr_2d[:-1] + f", {self._z})"
        return string


c1 = Coordinate3D(1, 2, 3)
c2 = Coordinate3D(4, 6, 9)
c3 = c1 + c2
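In Python 3 the input arguments of super can be omitted when it is used inside a method. The sketch below compares both forms using the hypothetical classes Parent, ChildExplicit and ChildShort, which are not part of the Coordinate example.

```python
class Parent(object):
    def __init__(self, x):
        self.x = x


class ChildExplicit(Parent):
    def __init__(self, x, z):
        # child class and instance supplied explicitly
        super(ChildExplicit, self).__init__(x)
        self.z = z


class ChildShort(Parent):
    def __init__(self, x, z):
        # child class and instance are worked out automatically
        super().__init__(x)
        self.z = z


a = ChildExplicit(1, 3)
b = ChildShort(1, 3)
print(a.x, a.z, b.x, b.z)
```

Both child classes end up with the same attributes, so the choice between the two forms is largely one of readability.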

importing a class from another module

The code in script0 is 26 lines (left hand side). We could go on and on defining all the other datamodel methods as well as additional attributes and methods, and we could create detailed docstrings for each method, easily making the code a couple of hundred lines long.

The code on the right hand side is only 5 lines and perhaps at any given point in time we are only interested in using a subset of the class methods. In this scenario it does not make sense to edit script0; instead we import it and use it.

Now let's reference a class from another file.

script0.py

# Defining a class

class Coordinate(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


    def distance(self, other):
        dx = self.x - other.x
        dy = self.y - other.y
        return (dx**2 + dy**2)**0.5
    
    
    def __repr__(self):
        return (f"Coordinate({self.x} , {self.y})")
    
    
    def __str__(self):
        return (f"({self.x} , {self.y})")
    
    
    def __add__(self, other):
        new_x = self.x + other.x
        new_y = self.y + other.y
        return(Coordinate(new_x, new_y))

script1.py

# %% Imports
import script0
# %% Instantiate Coordinates
c = script0.Coordinate(1,2)
d = script0.Coordinate(3,4)
# %% Access Attributes
xcoord_c = c.x
# %% Call Method
pythagoras_len = c.distance(d)

builtin modules

Up until now we have been writing all our own code so we could understand how importing objects from another module works. The Python programming language has a number of builtin modules. These are built into Python itself and are typically written in another programming language such as C for performance purposes. As a consequence we cannot open their source code as .py files, but we can import them and use the objects defined within them.

sys module

Let's start by skimming over the sys (system) module. We can import it as if it is a file.

import sys

Because it is a builtin module, Kite (the autocompletion) will list the most commonly used objects from the module at the top followed by the list of objects available alphabetically.

We can use the dot notation to access the list object path and can assign it to an object name so it displays on the variable explorer:

import sys
path_list = sys.path

Now we can expand the list using the variable explorer. What we see is a list of the folders where Python will look for objects to import if it cannot find an object in the console's directory or within the currently opened folder.

If we move script2 to a subfolder and attempt to import it we will get a ModuleNotFoundError. This is because it is in a different folder from the script being executed (script0) and that folder is not present in the path:

As sys.path is a list of str, we can append a str object to the end of the path using the list append method.

We can copy and paste the path from Windows Explorer into a raw str (a str with the prefix r). In a raw str each \ is treated as a literal backslash, which is equivalent to writing \\ in a normal str. This matters because \ is used to insert escape characters within a Python str, e.g. \t for tab and \n for new line. In a normal str, each \ in a file path must be written as \\: the first \ denotes that an escape sequence follows and the second \ denotes that the character to insert is \ itself.

subfolder = r"C:\Users\Phili\Documents\Python_Scripts\subfolder"
sys.path.append(subfolder)
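
The difference between a raw str, an escaped str and an unescaped str can be checked directly. This is a small sketch; the folder names are hypothetical and chosen so that they contain \t and \n:

```python
raw_path = r"C:\temp\new_folder"          # raw str: every \ is kept literally
escaped_path = "C:\\temp\\new_folder"     # normal str: each \ written as \\
unescaped_path = "C:\temp\new_folder"     # normal str: \t becomes a tab, \n a new line

print(raw_path == escaped_path)           # the raw and escaped forms are the same str
print(raw_path == unescaped_path)         # the unescaped form is a different str
print(repr(unescaped_path))               # repr shows the tab and new line characters
```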

Now when we attempt to import script2 it will also look within the subfolder and successfully import it. The dir() command will find the object script2 in the console's directory.

We can also see that path_list on the variable explorer shows our subfolder added to the end of the path.

Restarting the Kernel will restore sys.path to its original state.
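
The whole sequence can be sketched without touching any real folders. The example below is only an illustration: it writes a hypothetical module called script2_demo into a temporary folder, appends that folder to sys.path, imports the module and then removes the folder from sys.path again (an alternative to restarting the kernel):

```python
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as subfolder:
    # Write a tiny module into the temporary subfolder (hypothetical contents).
    module_file = Path(subfolder, "script2_demo.py")
    module_file.write_text("greeting = 'hello from script2_demo'\n")

    sys.path.append(subfolder)        # Python will now also search this folder
    import script2_demo               # found via the newly appended folder

    imported_greeting = script2_demo.greeting
    print(imported_greeting)

    sys.path.remove(subfolder)        # undo the change to sys.path

path_restored = subfolder not in sys.path
print(path_restored)
```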

The modules in index 0 (builtin modules) on the sys.path list are not accessible as .py script files. Under the hood these are written in other programming languages such as C and are known as builtin modules. We can use dot notation to view the object builtin_module_names from the sys module and assign it to the variable builtin_modules.

import sys
builtin_modules = sys.builtin_module_names

In the variable explorer we see that this is a tuple of str. We can scroll through it, ignoring the private modules at the top that begin with a single underscore _. These private modules are not designed for the average Python user but exist for the purpose of Python maintainers. Within the builtin module names we can see some of the most commonly used modules such as sys, builtins, math and time.
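
As a quick check, we can test whether a given module name appears in builtin_module_names and separate the public names from the private ones. This is a small sketch; the exact contents vary between platforms and Python versions, so only sys and builtins are checked here:

```python
import sys

builtin_modules = sys.builtin_module_names
print(type(builtin_modules))

# Public names are those that do not begin with an underscore.
public_modules = [name for name in builtin_modules if not name.startswith("_")]
private_modules = [name for name in builtin_modules if name.startswith("_")]

print("sys" in builtin_modules)
print("builtins" in builtin_modules)
print(len(public_modules), len(private_modules))
```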

builtins module

We see one of the modules is called builtins. If we use dot indexing from the module builtins we see a list of all the inbuilt objects in standard Python:

import builtins

This can be useful to reference as a beginner as Kite lists the most commonly used builtin objects and then everything else alphabetically. In this list you will quickly see what is a class, a function or an instance of a class (shown as text). Among these you will see many error classes which are used to flag up common errors.

There is no point importing anything from this module as by default these objects are already builtin (by definition). For example the class builtins.str is the class str.
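
We can confirm this with the is operator, and also collect the names of the error classes by filtering dir(builtins). This is a minimal sketch; filtering on names that end in "Error" is just one simple way to pick them out:

```python
import builtins

# builtins.str and str are the same object.
same_class = builtins.str is str
print(same_class)

# Names in the builtins module that end with "Error".
error_classes = [name for name in dir(builtins) if name.endswith("Error")]
print(error_classes[:5])
```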

math module

Let's have a look at the builtin module math. Once again we can import from the module as if it was a script file.

import math

Then we can use dot indexing from the module math to view mathematical functions and mathematical text objects.

We can access the two constants pi and e and the function sqrt from this module using dot indexing from the module.

import math
x = math.pi
y = math.e
z = math.sqrt(4)

Alternatively, we can import each of these objects individually. The instances pi and e will display on the variable explorer. The function sqrt will not, but it will be seen in the console's directory, i.e. following the same behaviour as when we imported our own custom functions from our custom module.

from math import pi, e, sqrt
pi
e
sqrt_4 = sqrt(4)
print(dir())

time module

The builtin time module is used for time access operations for example when running a Python script. Let's import it:

import time

Now we can use dot notation from time. We can use the sleep function to pause the script for a specified time in seconds. This can be useful when printing output to the console, ensuring the user has time to read the output. Once again we can expand the tooltip:

For example if we want to print "hello" to the console and display it for 10 seconds, then execute code which prints a lot of rapid information to the console (in this case a for loop of spaces) and finally ends with a print of "goodbye" we can use:

import time
print("hello")
time.sleep(10)
for i in range(100):
    print(" ")
print("goodbye")
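
To see how long a pause actually lasted, we can record the time before and after the call to sleep. This sketch uses the function perf_counter from the same time module and a short pause of 0.2 seconds so it runs quickly:

```python
import time

start = time.perf_counter()              # reading of a high resolution clock
time.sleep(0.2)                          # pause for roughly 0.2 seconds
elapsed = time.perf_counter() - start    # seconds that passed during the pause

print(f"paused for about {elapsed:.2f} seconds")
```

The elapsed time is usually slightly longer than the requested pause because the operating system decides when the script resumes.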

standard modules

Additional modules are found in the Lib folder of the Python installation. These are the standard modules which are included in every single Python installation. The Python Module Index gives more details about these Standard Modules.

We can go to this folder and access the source code of these modules by opening the .py file in Spyder or Notepad++.

The file location for an Anaconda installation on Windows 10, written using the %UserProfile% environment variable (which already expands to C:\Users\ followed by your username), is generally:

%UserProfile%\Anaconda3\Lib

In Linux the location has the form (change your username and select your current Python version):

/home/philip/anaconda3/lib/python3.8
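
Rather than typing the location by hand, we can ask Python where its standard modules are stored. This is a small sketch using the standard module sysconfig; it assumes a regular installation where the standard modules are kept as .py files in a folder:

```python
import os
import sysconfig

# Folder that holds the standard modules for the Python that is running.
stdlib_folder = sysconfig.get_paths()["stdlib"]
print(stdlib_folder)

# Check that the source file of the random module is in that folder.
has_random_source = os.path.isfile(os.path.join(stdlib_folder, "random.py"))
print(has_random_source)
```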

random module

Now let's have a look at the standard module random which can be used to generate random numbers. Once again we can import from the module as if it was a script file.

import random

Then we can use dot indexing from the module random to view the functions available.

We can use the function seed to reset the random number generator to a known starting position, so that the same sequence of random numbers is produced every time the code is run. We can use the function random to generate a random number between 0 and 1 (note the function random is called random and the module is called random, making the reference random.random), the function randint to generate a random integer between a specified lower and upper integer bound (both bounds included) and the function choice to pick an item at random from a list of choices. In this example, all the objects when used are called from the random module directly.

import random
random.seed(0)
a = random.random()
random.seed(0)
b = random.randint(0, 10)
random.seed(0)
c = random.choice(["Heads", "Tails"])
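
The effect of seed can be checked by generating the same values twice. This is a short sketch; the seed value 0 and the number of values generated are arbitrary choices:

```python
import random

random.seed(0)
first_run = [random.random() for _ in range(3)]

random.seed(0)                      # reset to the same starting position
second_run = [random.random() for _ in range(3)]

print(first_run == second_run)      # the two runs match

# randint includes both of its bounds.
rolls = [random.randint(0, 10) for _ in range(1000)]
print(min(rolls), max(rolls))
```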

We can also open the random.py file. We see that it is quite involved with a length of 28,802. However at the top of the file we can see that it generally uses imports from other modules.

A class Random is defined:

A method seed is defined:

And you can scroll through the file to see how the methods random, randint and choice are defined. In general a deep understanding of the code in the module file is not required, although it can be educational to look through the source code.

Since we only used four objects from this module, we can also import them directly.

from random import seed, random, randint, choice
seed(0)
a = random()
seed(0)
b = randint(0, 10)
seed(0)
c = choice(["Heads", "Tails"])

keyword module

Let's have a look at something much simpler, the keyword.py module. In this case we can see that the module essentially includes a list object kwlist on lines 17-52, which is a list of type str of all the keywords. A function iskeyword is defined on line 55.

We can import this module:

import keyword

Then we can use dot indexing from the module keyword to view both the function iskeyword and the list object kwlist.

We can check if the str "def" is a keyword and assign the output to def_is_keyword, and assign kw_list to be the list object kwlist.

import keyword
def_is_keyword = keyword.iskeyword("def")
kw_list = keyword.kwlist

We can see def_is_keyword in the variable explorer with the value True and kw_list in the variable explorer. We can open kw_list and view it in more detail, in this case seeing "def" listed as a keyword:
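
The same function can be used to check several candidate names at once, which is handy before choosing an object name. A small sketch follows; the names in the list are arbitrary examples:

```python
import keyword

candidates = ["def", "class", "print", "pi"]

# Map each candidate name to True if it is a reserved keyword.
keyword_check = {name: keyword.iskeyword(name) for name in candidates}
print(keyword_check)

# iskeyword agrees with membership in kwlist.
print("def" in keyword.kwlist)
```

Note that print is a builtin function rather than a keyword, so iskeyword returns False for it.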

fractions module

Let's have a look at the fractions.py file. At the top we can see that this module is dependent on other modules. We can see that this module imports the class Decimal from another module decimal. The math, numbers, operator, re and sys modules are all imported.

We can also glance through the definition of the class Fraction.

In Spyder if we intend to use fraction objects regularly within our script, it may be more convenient to import the fractions module as a 2 letter abbreviation fr.

import fractions as fr

Now we can use dot notation from fr. Let's create an instance of the Fraction class. We see the autocompletion gives us the docstring instructing us how to initialize a new instance (which under the hood uses the class's __init__ method).

Now we can create two instances a and b and perform a mathematical operation between them to get a third instance c.

import fractions as fr
a = fr.Fraction(5,10)
b = fr.Fraction(2,10)
c = a + b

We can see the instances of the Fraction class a, b and c display in the variable explorer. Let's expand one of these. At the top we see a number of private objects which begin with an underscore. We then see the objects that can be called from the object c (which is a Fraction instance).

We see these in the Kite autocompletion when we type in the instance c followed by a dot.

There are a number of datamodel methods which can also be viewed.

The datamodel method __add__ which we examined earlier defines the behaviour of the + operator between the instance self and the instance other. The datamodel method __str__ defines how an instance of the Fraction class behaves when the functions str or print are used and the datamodel method __repr__ defines how an instance of the Fraction class behaves when the function repr is used (as we have also seen before).
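
We can see these three datamodel methods at work on the instances created above. This short sketch also shows that a Fraction is stored in its lowest terms:

```python
import fractions as fr

a = fr.Fraction(5, 10)      # stored in lowest terms as 1/2
b = fr.Fraction(2, 10)      # stored in lowest terms as 1/5
c = a + b                   # uses Fraction.__add__

print(a.numerator, a.denominator)
print(str(c))               # uses Fraction.__str__
print(repr(c))              # uses Fraction.__repr__
```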

statistics module

Let's have a look at the statistics.py file. We can see that this module imports the math, numbers and random modules and also imports specific objects from other modules.

Let's import the module statistics:

import statistics

Now we can use dot notation from statistics. Let's have a look at using the mean function, we see the autocompletion gives us the docstring instructing us how to use this function.

Let's create a list of data and carry out some basic statistics on it:

data = [1, 1, 1, 1, 2, 2, 3, 3, 4]
import statistics

data_mean = statistics.mean(data)

data_variance = statistics.variance(data)
data_stdev = statistics.stdev(data)

data_median = statistics.median(data)
data_mode = statistics.mode(data)
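
To see what these functions calculate, we can reproduce the mean, variance and standard deviation by hand and compare. This is a sketch for checking purposes only; note that variance and stdev are the sample versions, which divide by one less than the number of data points:

```python
import statistics

data = [1, 1, 1, 1, 2, 2, 3, 3, 4]
n = len(data)

# Mean: sum of the values divided by the number of values.
manual_mean = sum(data) / n

# Sample variance: sum of squared deviations divided by (n - 1).
squared_deviations = [(value - manual_mean) ** 2 for value in data]
manual_variance = sum(squared_deviations) / (n - 1)

# Sample standard deviation: square root of the sample variance.
manual_stdev = manual_variance ** 0.5

print(manual_mean, statistics.mean(data))
print(manual_variance, statistics.variance(data))
print(manual_stdev, statistics.stdev(data))
```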

datetime module

The datetime module is a module for working with date and time variables. Let's import the module datetime:

import datetime

Now we can use dot notation from datetime. Supposing we are interested in the datetime class (called from the datetime module), we can type it with open parenthesis which will show a popup balloon tooltip with part of the docstring.

If we follow the link at the bottom of the tooltip it will open the full docstring in the Spyder help pane.

We can also highlight the object of interest and type [Ctrl] + [ i ] to inspect the object in the help pane.

We can create a datetime instance and assign it to the variable a. We can then assign a timedelta instance to the variable b and create a new datetime instance c by adding b to a.

import datetime
a = datetime.datetime(year=2021, month=1, day=1)
b = datetime.timedelta(days=1)
c = a + b
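
We can inspect the result, and also go the other way: subtracting two datetime instances gives a timedelta instance. This is a short sketch; the format string passed to strftime is just one example of a day/month/year layout:

```python
import datetime

a = datetime.datetime(year=2021, month=1, day=1)
b = datetime.timedelta(days=1)
c = a + b

print(c)                            # one day after a
difference = c - a                  # datetime minus datetime gives a timedelta
print(difference == b)
formatted = c.strftime("%d/%m/%Y")  # day/month/year text
print(formatted)
```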

data science packages

The site-packages folder in an Anaconda3 installation contains a number of data science libraries which are usually in the form of Python Packages. In windows explorer these site-packages are essentially subfolders.

The site-package contains a number of Python modules. Each site-package must however include an __init__.py file which is run when the site-package is referenced. This follows the analogy of the __init__ datamodel method being invoked when a class is instantiated.

Taking the seaborn package as an example, the first 14 lines of code in its __init__.py module import all objects (indicated by a *) from most of the other modules of the package. Note that no name is specified before the dot in any of these imports, which indicates that the modules are imported from the same folder as the __init__.py file.

Line 17 also imports matplotlib, another package and the data science library that seaborn is built upon. seaborn can be considered as a wrapper around matplotlib and includes a number of functions which can be used to rapidly create plots that are commonly used for data visualization. All of these plots could in theory be created in matplotlib directly but doing so would normally require many more lines of code to get to the same result.

In the seaborn folder (package) we see the subfolders colors, external and test. These folders are all subpackages and these subpackages all also contain their own __init__.py file.
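
The role of the __init__.py file and the leading dot can be illustrated with a tiny package of our own. The sketch below is not seaborn's code: it builds a hypothetical package called demo_pkg in a temporary folder, containing one module shapes.py and an __init__.py file that imports everything from it:

```python
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as parent_folder:
    # The package is a folder that contains an __init__.py file.
    package_folder = Path(parent_folder, "demo_pkg")
    package_folder.mkdir()

    # A module inside the package (hypothetical contents).
    Path(package_folder, "shapes.py").write_text(
        "def square_area(side):\n"
        "    return side ** 2\n"
    )

    # The leading dot means "from the same folder as this __init__.py file".
    Path(package_folder, "__init__.py").write_text("from .shapes import *\n")

    sys.path.append(parent_folder)     # make the parent folder searchable
    import demo_pkg                    # runs demo_pkg/__init__.py

    area = demo_pkg.square_area(3)     # available directly from the package
    print(area)

    sys.path.remove(parent_folder)
```

Because of the import in __init__.py, square_area can be called from the package itself without referring to the shapes module.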

We import the data science library in an identical manner to importing a module. For commonly used data science libraries we generally use a 2-3 letter alias. In the case of seaborn this is sns:

import seaborn as sns

And once seaborn is imported as the alias we can use dot notation to call items from the package. Let's use the lineplot function to create a lineplot.

The only two keyword input arguments we need to provide are the x and y data to be plotted which we will define as numeric lists.

# %% Imports
import seaborn as sns
# %% Generate Data
x_data = [1, 2, 3, 4, 5]
y_data = [0.95, 2.05, 2.95, 4.05, 4.95]
# %% Create a Line Plot
sns.lineplot(x=x_data, y=y_data)

The plot is shown on the plots pane in Spyder but looks rather bland (using matplotlib's normal plot defaults).

seaborn has a number of inbuilt styles. The function set_style can be used to apply one of these styles to all figures made via seaborn or directly via matplotlib. We can have a look at the docstring for this function and use the 0th style listed, "darkgrid".

Rerunning this code with the darkgrid style will create a better looking plot:

# %% Imports
import seaborn as sns
sns.set_style("darkgrid")
# %% Generate Data
x_data = [1, 2, 3, 4, 5]
y_data = [0.95, 2.05, 2.95, 4.05, 4.95]
# %% Create a Line Plot
sns.lineplot(x=x_data, y=y_data)

Covering deeper usage of seaborn requires prerequisite knowledge of matplotlib which is not the focus of this guide.

There are a multitude of data science libraries available in the Anaconda installation. The most notable however are the ones that are considered the primary data science libraries, which are conventionally imported using a 2-3 letter alias:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

additional packages

Although the Anaconda Python distribution contains the most commonly used data science libraries, there are literally thousands of Python packages available to download and install. These can be installed using conda install commands in the Anaconda Powershell Prompt (Windows 10) or Terminal (Linux). conda install commands should be used as opposed to pip install commands where possible (a google search of conda install "package to be installed" will usually take you to the Anaconda website with install instructions).

conda install commands check for inter-module/inter-package dependencies and will aim to address any conflicts. This does not happen with pip installs which can result in problems. Recall for example that seaborn was a wrapper for matplotlib and used matplotlib in the background. A newer version of matplotlib may change the behaviour of a seaborn plotting function therefore breaking seaborn functionality. seaborn is therefore designed to work with a specific version of matplotlib.

Let's use the python-docx package as an example:

conda search --channel conda-forge python-docx
conda install --channel conda-forge python-docx

Now that this module is installed, the docx package displays.

We can then copy and paste their example code in script0 and comment out the line of code for adding a picture. Running script0 generates the demo.docx file which can be opened in Microsoft Word:

careful consideration of object names

Care should be taken when creating object names and Python script file names. If script1.py was renamed docx.py then the line of code:

from docx import Document

Would examine the current folder and find the docx.py module. It would then stop its search for docx. Within our custom docx module it would then look for the class Document and import it. The docx package in the site-packages folder would not be examined.

Beginners in particular sometimes run into this problem if they name their first scripts, for example, numpy.py or pandas.py when taking notes on these libraries.
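
If an import behaves unexpectedly, one way to check for this problem is to ask Python which file it would load for a given module name. The sketch below uses the standard module random as the example so that it runs on any installation; the same check can be applied to docx, numpy or pandas:

```python
import importlib.util

# Find out where Python would load the module from, without importing it.
spec = importlib.util.find_spec("random")
module_origin = spec.origin
print(module_origin)
```

If the location printed is in your own working folder rather than the Lib or site-packages folder, a local script file is taking the place of the library and should be renamed.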