Python Language Server¶
The Python Language Server is used with the Python Interpreter to conveniently show identifiers and docstrings while typing code.
JupyterLab with Jedi¶
Input the prefix for example p
followed by a ⭾
to view a list of identifiers which begin with the prefix:
The output should look as follows. The ↑
and ↓
keys or the mouse can be used to scroll through the list of identifiers:
When the identifier is a function, its docstring can be viewed by inputting the function name with open parenthesis for example print()
and pressing ⇧
and ⭾
:
The docstring should display as a popup balloon, like the following:
If the object
class is input followed by a .
and ⭾
the identifiers that belong to it are seen:
The list of identifiers displays:
For the object
class most of the identifiers are datamodel identifiers which begin and end with a double underscore. Therefore an object.__
prefix followed by a ⭾
will display the datamodel identifiers:
The datamodel identifiers should look like:
If the str
class is input followed by a .
and ⭾
the identifiers that belong to it are seen:
The identifiers should look like:
Everything in Python is based on the object
class and therefore also has datamodel identifiers which begin and end with a double underscore. Therefore a str.__
prefix followed by a ⭾
will display the datamodel identifiers:
A class is callable and the docstring of its initialisation signature is seen when the class name is input followed by open parenthesis, for example str()
followed by ⇧
and ⭾
:
The initialisation signature should look like:
VSCode with Pylance¶
In VSCode Settings, the default Python Language Server is Pylance:
In VSCode, when Pylance is enabled alongside a Python interpreter, identifiers are displayed in response to inputting a prefix:
p
The output should look as follows. The ↑
and ↓
keys or the mouse can be used to scroll through the list of identifiers.
When the identifier is a function, its docstring can be viewed by inputting the function name with open parenthesis for example print()
:
If the object
class is input followed by a .
the identifiers that belong to it are seen. For the object
class most of these are datamodel identifiers which begin and end with a double underscore:
If the str
class is input followed by a .
the identifiers that belong to it are seen:
Everything in Python is based on the design pattern of the object
class and therefore has object
based datamodel identifiers which can be seen by scrolling down:
A class is callable and the docstring of the class's initialisation signature is seen by inputting the class name followed by open parenthesis for example str()
:
The initialisation signature should look like:
Directory of Identifiers (dir)¶
In Python, the directory function dir
treats identifiers as a directory. The function can be used to view a list of identifier names in the current scope:
dir?
Docstring: Show attributes of an object. If called without an argument, return the names in the current scope. Else, return an alphabetized list of names comprising (some of) the attributes of the given object, and of attributes reachable from it. If the object supplies a method named __dir__, it will be used; otherwise the default dir() logic is used and returns: for a module object: the module's attributes. for a class object: its attributes, and recursively the attributes of its bases. for any other object: its attributes, its class's attributes, and recursively the attributes of its class's base classes. Type: builtin_function_or_method
dir()
['In', 'Out', '_', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__session__', '__spec__', '_dh', '_i', '_i1', '_i2', '_ih', '_ii', '_iii', '_oh', 'exit', 'get_ipython', 'open', 'quit']
The cell above has a large output. In JupyterLab the cell can be right clicked and scrolling can be enabled for the output. This cell setting will persist after the kernel has been restarted:
In VSCode on the other hand, the output will be truncated by default and can be viewed as a scrolling output or with a text editor. Unfortunately the scrolling setting does not persist after the kernel has been restarted:
The scrolling output is useful for displaying a long Collection
or a function's docstring in full but without it taking the main focus of the notebook file:
dir()
['In', 'Out', '_', '_2', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__session__', '__spec__', '_dh', '_i', '_i1', '_i2', '_i3', '_ih', '_ii', '_iii', '_oh', 'exit', 'get_ipython', 'open', 'quit']
When dir
is used, the identifiers above unfortunately aren't grouped by category. To do so a custom function dir2
will be imported from a custom module called categorize_identifiers
, that is in the same folder as the interactive Python notebook file. The function variables
will also be imported from the same module, which can be used to view variables:
from categorize_identifiers import dir2, variables
The dir2
function will pretty print the list of identifiers above in a dict
format grouped by category:
dir2()
{'constant': ['In', 'Out'], 'method': ['get_ipython', 'exit', 'quit', 'open', 'dir2', 'variables'], 'datamodel_attribute': ['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__builtin__', '__builtins__', '__session__'], 'internal_attribute': ['_ih', '_oh', '_dh', '_', '__', '___', '_i', '_ii', '_iii', '_i1', '_i2', '_2', '_i3', '_3', '_i4', '_i5']}
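The source of the categorize_identifiers module is not reproduced in this section. As a rough illustration only, a similar grouping can be approximated with dir, getattr and callable. The helper name group_identifiers and its category rules are assumptions made for this sketch and are not the author's dir2 implementation (for example, it has no separate constant category):

```python
import builtins


def group_identifiers(obj):
    """Hypothetical stand-in for dir2: group dir(obj) names into rough categories."""
    groups = {}
    for name in dir(obj):
        value = getattr(obj, name, None)
        if len(name) > 4 and name.startswith('__') and name.endswith('__'):
            # Datamodel (dunder) identifiers
            category = 'datamodel_method' if callable(value) else 'datamodel_attribute'
        elif name.startswith('_'):
            # Identifiers with a leading underscore are treated as internal
            category = 'internal_method' if callable(value) else 'internal_attribute'
        elif isinstance(value, type):
            # Classes, split by the case of the first letter
            category = 'upper_class' if name[0].isupper() else 'lower_class'
        elif callable(value):
            category = 'method'
        else:
            category = 'attribute'
        groups.setdefault(category, []).append(name)
    return groups


grouped = group_identifiers(builtins)
print(grouped['lower_class'][:5])
```

Because dir returns names alphabetically, each category list is also alphabetical.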
IPython Identifiers¶
Many of the identifiers listed here are additions from IPython and relate to previously input and output values. In particular:
|internal attribute|meaning|description|alias|
|---|---|---|---|
|_ih|input history|list of input history|In|
|_oh|output history|dict of output history, key is ipython cell, value is cell output. items are only added to the dict when the cell has an output.|Out|
|_1|output for cell 1|output for cell 1, only exists when cell 1 has an output||
|_2|output for cell 2|output for cell 2, only exists when cell 2 has an output||
|_3|output for cell 3|output for cell 3, only exists when cell 3 has an output||
|_i|last input|||
|_|last output|||
|_ii|2nd last input|||
|__|2nd last output|||
|_iii|3rd last input|||
|___|3rd last output|||
Note that there is a difference between a function that has a return
value and a function that has a print
statement:
def fun_r():
    return 'hello world!'

def fun_p():
    print('hello world!')
When these functions are called without assignment, the behaviour of the first function is to display the return value in the cell:
fun_r()
'hello world!'
When assignment of the function call is made to a variable, there is no cell output:
return_val_r = fun_r()
The behaviour of the second function is to always print
the value. Printing looks similar to the cell output; however, notice that the formatting characters ''
which enclose the str
instance are processed and not shown when printed:
fun_p()
hello world!
Notice when the second function call is assigned to a variable that the second function continues to print
:
return_val_p = fun_p()
hello world!
Notice that the value of return_val_p
is None
because the function fun_p
has no return
value:
return_val_p == None
True
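The difference between returning and printing can also be checked outside a notebook by capturing what is written to standard output. The following is a minimal sketch that uses contextlib.redirect_stdout from the standard library; the names buffer, value_r, value_p and printed_text are chosen only for this illustration:

```python
import contextlib
import io


def fun_r():
    return 'hello world!'


def fun_p():
    print('hello world!')


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    value_r = fun_r()  # nothing is printed, a str is returned
    value_p = fun_p()  # text is printed, None is returned

printed_text = buffer.getvalue()
print(repr(value_r))       # 'hello world!'
print(repr(value_p))       # None
print(repr(printed_text))  # 'hello world!\n'
```

Only fun_p writes anything to the buffer, and only fun_r gives a value back to the caller.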
The JupyterLab Variable Inspector Extension gives a Variable Explorer, which can be accessed by right clicking blank space in the notebook and selecting Open Variable Inspector. Note that this is not available on the conda
channel and is therefore not preinstalled in the Anaconda (base)
Python environment:
This displays in a new tab which can be repositioned and used to view Variables:
In VSCode there is a Variable tab:
For convenience the custom function variables
will be used to output variables to the cell output:
variables()
Instance Name | Type | Size/Shape | Value |
---|---|---|---|
return_val_r | str | 12 | hello world! |
return_val_p | NoneType | | None |
The purpose of parentheses during a function call is to provide a function with input data to work on. The following function can be defined which has a parameter with a default value 'world':
def fun_r(parameter='world'):
    return f'hello {parameter}!'
This function can be used as before:
fun_r()
'hello world!'
However parameter
can be assigned to a new value:
fun_r(parameter='earth!')
'hello earth!!'
Functions
can also be referenced, which does not perform any action but merely reads off details about the function:
fun_r
<function __main__.fun_r(parameter='world')>
Identifiers can be functions or instances; functions are typically called using parentheses but can be referenced, whereas instances are normally just referenced.
A function that is bound to another instance (and accessed via that instance) is known as a method:
'hello'.upper()
'HELLO'
An instance that is bound to another instance (and accessed via that instance) is known as an attribute:
'hello'.__class__
str
There are subtle differences between the five terms above. Functions/methods and instances/attributes are often used interchangeably, with identifiers being the umbrella term.
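The distinction between referencing and calling, and between a method and an attribute, can be checked with the builtins function callable. This is a small sketch; the names greeting, method_reference and result are chosen only for illustration:

```python
greeting = 'hello'

method_reference = greeting.upper  # referenced: no parentheses, nothing is run
result = method_reference()        # called: the method runs and returns a value

print(callable(method_reference))              # True, a method is a function bound to an instance
print(method_reference.__self__ is greeting)   # True, the instance the method is bound to
print(callable(greeting.__doc__))              # False, this attribute is a plain str instance
print(result)                                  # HELLO
```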
The IPython internal instances can be examined. Notice that _oh
only has the keys 2, 3, 7, 11, 12, 14, 15, 16, 17 and 18 because these are the only cells that return
a value (to the cell output):
print('_2', _2)
print()
print('_ih', _ih)
print()
print('_oh', _oh)
print()
print('_dh', _dh)
_2 ['In', 'Out', '_', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__session__', '__spec__', '_dh', '_i', '_i1', '_i2', '_ih', '_ii', '_iii', '_oh', 'exit', 'get_ipython', 'open', 'quit']
_ih ['', "get_ipython().run_line_magic('pinfo', 'dir')", 'dir()', 'dir()', 'from categorize_identifiers import dir2, variables', 'dir2()', "def fun_r():\n return 'hello world!' \n\ndef fun_p():\n print('hello world!')", 'fun_r()', 'return_val_r = fun_r()', 'fun_p()', 'return_val_p = fun_p()', 'return_val_p == None', 'variables()', "def fun_r(parameter='world'):\n return f'hello {parameter}!' ", 'fun_r()', "fun_r(parameter='earth!')", 'fun_r', "'hello'.upper()", "'hello'.__class__", "print('_2', _2)\nprint()\nprint('_ih', _ih)\nprint()\nprint('_oh', _oh)\nprint()\nprint('_dh', _dh)"]
_oh {2: ['In', 'Out', '_', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__session__', '__spec__', '_dh', '_i', '_i1', '_i2', '_ih', '_ii', '_iii', '_oh', 'exit', 'get_ipython', 'open', 'quit'], 3: ['In', 'Out', '_', '_2', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__session__', '__spec__', '_dh', '_i', '_i1', '_i2', '_i3', '_ih', '_ii', '_iii', '_oh', 'exit', 'get_ipython', 'open', 'quit'], 7: 'hello world!', 11: True, 12: Type Size/Shape Value Instance Name return_val_r str 12 hello world! return_val_p NoneType None, 14: 'hello world!', 15: 'hello earth!!', 16: <function fun_r at 0x0000027D9D5974C0>, 17: 'HELLO', 18: <class 'str'>}
_dh [WindowsPath('C:/Users/phili/OneDrive/Documents/GitHub/python-notebooks/builtins_module_object')]
exit
and quit
are IPython additions to exit the IPython shell. The IPython shell can be exited using:
exit
exit()
quit
quit()
whereas the standard Python shell has traditionally (prior to Python 3.13) required a function call with parentheses, such as:
exit()
open
is also available in the namespace directly to make it easier to open
files.
Datamodel Identifiers¶
The identifiers starting and ending with a double underscore are known colloquially as dunder identifiers. The official term is datamodel identifiers as they follow a consistent design pattern. Python uses object orientated programming and everything in Python is based on an object
:
dir2()
{'attribute': ['return_val_r', 'return_val_p'], 'constant': ['In', 'Out'], 'method': ['get_ipython', 'exit', 'quit', 'open', 'dir2', 'variables', 'fun_r', 'fun_p'], 'datamodel_attribute': ['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__builtin__', '__builtins__', '__session__'], 'internal_attribute': ['_ih', '_oh', '_dh', '__', '_i', '_ii', '_iii', '_i1', '_i2', '_2', '_i3', '_3', '_i4', '_i5', '_i6', '_i7', '_7', '_i8', '_i9', '_i10', '_i11', '_11', '_i12', '_12', '_i13', '_i14', '_14', '_i15', '_15', '_i16', '_i17', '_17', '_i18', '_i19', '_i20'], 'internal_method': ['_', '___', '_16', '_18']}
The __name__
(dunder name) gives the name of the notebook or script file being executed:
__name__
'__main__'
When code is executed directly from the notebook or a script file, __name__ is '__main__'
. When the file is imported, __name__
will match the file name without any file extension.
if __name__ == '__main__':
    print('code is executed directly')
else:
    print('code was imported')
code is executed directly
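The imported case can be checked with any standard library module. The fractions module is used here purely as an example of a module that is imported rather than executed directly:

```python
import fractions

# The value of __name__ here depends on how this code is being run
print(__name__)

# For an imported module, __name__ is the module name without the .py extension
print(fractions.__name__)  # fractions
```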
The __doc__
(dunder doc) is the docstring of the module or notebook file which is a str
instance:
__doc__
'Automatically created module for IPython interactive environment'
The docstring for the notebook file is automatically generated.
__package__ == None
True
__loader__ == None
True
__spec__ == None
True
The Builtins Module (__builtins__)¶
Every Python notebook and script file has access to the identifiers in Python's builtins
module. These can be accessed directly or using the __builtins__
attribute:
dir2(__builtins__)
{'constant': ['Ellipsis', 'False', 'None', 'NotImplemented', 'True'], 'method': ['abs', 'aiter', 'all', 'anext', 'any', 'ascii', 'bin', 'breakpoint', 'callable', 'chr', 'compile', 'copyright', 'credits', 'delattr', 'dir', 'display', 'divmod', 'eval', 'exec', 'execfile', 'format', 'get_ipython', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'locals', 'max', 'min', 'next', 'oct', 'open', 'ord', 'pow', 'print', 'repr', 'round', 'runfile', 'setattr', 'sorted', 'sum', 'vars'], 'lower_class': ['bool', 'bytearray', 'bytes', 'classmethod', 'complex', 'dict', 'enumerate', 'filter', 'float', 'frozenset', 'int', 'list', 'map', 'memoryview', 'object', 'property', 'range', 'reversed', 'set', 'slice', 'staticmethod', 'str', 'super', 'tuple', 'type', 'zip'], 'upper_class': ['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BaseExceptionGroup', 'BlockingIOError', 'BrokenPipeError', 'BufferError', 'BytesWarning', 'ChildProcessError', 'ConnectionAbortedError', 'ConnectionError', 'ConnectionRefusedError', 'ConnectionResetError', 'DeprecationWarning', 'EOFError', 'EncodingWarning', 'EnvironmentError', 'Exception', 'ExceptionGroup', 'FileExistsError', 'FileNotFoundError', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'InterruptedError', 'IsADirectoryError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'ModuleNotFoundError', 'NameError', 'NotADirectoryError', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'PermissionError', 'ProcessLookupError', 'RecursionError', 'ReferenceError', 'ResourceWarning', 'RuntimeError', 'RuntimeWarning', 'StopAsyncIteration', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'TimeoutError', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 
'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'WindowsError', 'ZeroDivisionError'], 'datamodel_attribute': ['__IPYTHON__', '__debug__', '__doc__', '__name__', '__package__', '__spec__'], 'datamodel_method': ['__build_class__', '__import__', '__loader__']}
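That the builtins identifiers are the same objects whether accessed directly or through the module can be confirmed with a short sketch. It imports the builtins module explicitly, because the __builtins__ name is a CPython implementation detail and may refer to either the module or its dictionary depending on where the code runs:

```python
import builtins

print(builtins.len is len)    # True, the same function object
print(builtins.str is str)    # True, the same class
print(builtins.len('hello'))  # 5
```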
Notice the categories below:
- attribute (instances)
  - constant
- method (function)
  - datamodel method
- class
  - lower class
  - upper class
The builtins identifiers are normally called instances and functions; however, the terms attributes and methods are just as valid as they are identifiers defined in the builtins
module.
For the builtins
module all the attributes are constants and begin with an uppercase letter:
dir2(__builtins__, print_output=False)['constant']
['Ellipsis', 'False', 'None', 'NotImplemented', 'True']
For example the constants True
and False
are the only two instances of the bool
class:
True, type(True)
(True, bool)
False, type(False)
(False, bool)
And None
is the sole instance of the NoneType
class:
None, type(None)
(None, NoneType)
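Because these constants are the only instances of their classes, calling the class never creates a new instance but gives back the existing one. A brief check using the identity operator is:

```python
print(bool(1) is True)       # True, bool returns the existing True instance
print(bool('') is False)     # True, bool returns the existing False instance
print(type(None)() is None)  # True, calling NoneType returns the existing None instance
```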
In Python PascalCase
is typically used for third-party classes. In builtins
however the most commonly used classes are in lower case and the classes typically have a shorthand way of instantiating an instance with data:
dir2(__builtins__, print_output=False)['lower_class']
['bool', 'bytearray', 'bytes', 'classmethod', 'complex', 'dict', 'enumerate', 'filter', 'float', 'frozenset', 'int', 'list', 'map', 'memoryview', 'object', 'property', 'range', 'reversed', 'set', 'slice', 'staticmethod', 'str', 'super', 'tuple', 'type', 'zip']
The classes themselves act as functions which cast data from one builtins
class to another:
'hello', type('hello')
('hello', str)
str(2), type(2), type(str(2))
('2', int, str)
The classes in builtins
that are in PascalCase
are the error classes which will be raised when a problem is encountered. These are not normally instantiated directly by the user but will be encountered a lot when getting started with Python:
dir2(__builtins__, print_output=False)['upper_class']
['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BaseExceptionGroup', 'BlockingIOError', 'BrokenPipeError', 'BufferError', 'BytesWarning', 'ChildProcessError', 'ConnectionAbortedError', 'ConnectionError', 'ConnectionRefusedError', 'ConnectionResetError', 'DeprecationWarning', 'EOFError', 'EncodingWarning', 'EnvironmentError', 'Exception', 'ExceptionGroup', 'FileExistsError', 'FileNotFoundError', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'InterruptedError', 'IsADirectoryError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'ModuleNotFoundError', 'NameError', 'NotADirectoryError', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'PermissionError', 'ProcessLookupError', 'RecursionError', 'ReferenceError', 'ResourceWarning', 'RuntimeError', 'RuntimeWarning', 'StopAsyncIteration', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'TimeoutError', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'WindowsError', 'ZeroDivisionError']
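As an illustration of how one of these error classes is encountered, the sketch below deliberately divides by zero and catches the resulting instance. It also shows that the error classes form a hierarchy, which can be inspected with issubclass. The name caught is chosen only for this example:

```python
try:
    1 / 0
except ZeroDivisionError as error:
    caught = error  # keep a reference, error is cleared when the except block ends

print(type(caught).__name__)                           # ZeroDivisionError
print(issubclass(ZeroDivisionError, ArithmeticError))  # True
print(issubclass(ArithmeticError, Exception))          # True
print(isinstance(caught, Exception))                   # True
```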
The Object Base Class (object) and Object Orientated Programming¶
Everything in Python is based on the object
base class. If its identifiers are examined, notice that most of these are datamodel identifiers:
dir2(object)
{'datamodel_attribute': ['__doc__'], 'datamodel_method': ['__class__', '__delattr__', '__dir__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']}
Although most of these datamodel identifiers are defined in the object
class, they are not typically used directly. Instead an equivalent function is used from builtins
or operator:
dir2(__builtins__, show=['method', 'lower_class'])
{'method': ['abs', 'aiter', 'all', 'anext', 'any', 'ascii', 'bin', 'breakpoint', 'callable', 'chr', 'compile', 'copyright', 'credits', 'delattr', 'dir', 'display', 'divmod', 'eval', 'exec', 'execfile', 'format', 'get_ipython', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'locals', 'max', 'min', 'next', 'oct', 'open', 'ord', 'pow', 'print', 'repr', 'round', 'runfile', 'setattr', 'sorted', 'sum', 'vars'], 'lower_class': ['bool', 'bytearray', 'bytes', 'classmethod', 'complex', 'dict', 'enumerate', 'filter', 'float', 'frozenset', 'int', 'list', 'map', 'memoryview', 'object', 'property', 'range', 'reversed', 'set', 'slice', 'staticmethod', 'str', 'super', 'tuple', 'type', 'zip']}
import operator
dir2(operator)
{'method': ['abs', 'add', 'and_', 'call', 'concat', 'contains', 'countOf', 'delitem', 'eq', 'floordiv', 'ge', 'getitem', 'gt', 'iadd', 'iand', 'iconcat', 'ifloordiv', 'ilshift', 'imatmul', 'imod', 'imul', 'index', 'indexOf', 'inv', 'invert', 'ior', 'ipow', 'irshift', 'is_', 'is_not', 'isub', 'itruediv', 'ixor', 'le', 'length_hint', 'lshift', 'lt', 'matmul', 'mod', 'mul', 'ne', 'neg', 'not_', 'or_', 'pos', 'pow', 'rshift', 'setitem', 'sub', 'truediv', 'truth', 'xor'], 'lower_class': ['attrgetter', 'itemgetter', 'methodcaller'], 'datamodel_attribute': ['__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__'], 'datamodel_method': ['__abs__', '__add__', '__and__', '__call__', '__concat__', '__contains__', '__delitem__', '__eq__', '__floordiv__', '__ge__', '__getitem__', '__gt__', '__iadd__', '__iand__', '__iconcat__', '__ifloordiv__', '__ilshift__', '__imatmul__', '__imod__', '__imul__', '__index__', '__inv__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__itruediv__', '__ixor__', '__le__', '__lshift__', '__lt__', '__matmul__', '__mod__', '__mul__', '__ne__', '__neg__', '__not__', '__or__', '__pos__', '__pow__', '__rshift__', '__setitem__', '__sub__', '__truediv__', '__xor__'], 'internal_method': ['_abs']}
In other words, the datamodel method defined in the class defines the behaviour of the builtins
function when used on an instance of the object
class:
help(object)
Help on class object in module builtins: class object | The base class of the class hierarchy. | | When called, it accepts no arguments and returns a new featureless | instance that has no instance attributes and cannot be given any. | | Built-in subclasses: | anext_awaitable | async_generator | async_generator_asend | async_generator_athrow | ... and 90 other subclasses | | Methods defined here: | | __delattr__(self, name, /) | Implement delattr(self, name). | | __dir__(self, /) | Default dir() implementation. | | __eq__(self, value, /) | Return self==value. | | __format__(self, format_spec, /) | Default object formatter. | | Return str(self) if format_spec is empty. Raise TypeError otherwise. | | __ge__(self, value, /) | Return self>=value. | | __getattribute__(self, name, /) | Return getattr(self, name). | | __getstate__(self, /) | Helper for pickle. | | __gt__(self, value, /) | Return self>value. | | __hash__(self, /) | Return hash(self). | | __init__(self, /, *args, **kwargs) | Initialize self. See help(type(self)) for accurate signature. | | __le__(self, value, /) | Return self<=value. | | __lt__(self, value, /) | Return self<value. | | __ne__(self, value, /) | Return self!=value. | | __reduce__(self, /) | Helper for pickle. | | __reduce_ex__(self, protocol, /) | Helper for pickle. | | __repr__(self, /) | Return repr(self). | | __setattr__(self, name, value, /) | Implement setattr(self, name, value). | | __sizeof__(self, /) | Size of object in memory, in bytes. | | __str__(self, /) | Return str(self). | | ---------------------------------------------------------------------- | Class methods defined here: | | __init_subclass__(...) from builtins.type | This method is called when a class is subclassed. | | The default implementation does nothing. It may be | overridden to extend subclasses. | | __subclasshook__(...) from builtins.type | Abstract classes can override this to customize issubclass(). | | This is invoked early on by abc.ABCMeta.__subclasscheck__(). 
| It should return True, False or NotImplemented. If it returns | NotImplemented, the normal algorithm is used. Otherwise, it | overrides the normal algorithm (and the outcome is cached). | | ---------------------------------------------------------------------- | Static methods defined here: | | __new__(*args, **kwargs) from builtins.type | Create and return a new object. See help(type) for accurate signature. | | ---------------------------------------------------------------------- | Data and other attributes defined here: | | __class__ = <class 'type'> | type(object) -> the object's type | type(name, bases, dict, **kwds) -> a new type
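The link between a datamodel method and its builtins or operator counterpart can be sketched with a small custom class. The class Playlist and its attribute songs are hypothetical and exist only for this illustration:

```python
import operator


class Playlist:
    """Hypothetical class used only for this illustration."""

    def __init__(self, *songs):
        self.songs = list(songs)

    def __len__(self):
        # Defines the behaviour of the builtins function len
        return len(self.songs)

    def __eq__(self, other):
        # Defines the behaviour of == and operator.eq
        if not isinstance(other, Playlist):
            return NotImplemented
        return self.songs == other.songs

    def __add__(self, other):
        # Defines the behaviour of + and operator.add
        if not isinstance(other, Playlist):
            return NotImplemented
        return Playlist(*self.songs, *other.songs)


first = Playlist('a', 'b')
second = Playlist('c')
combined = first + second

print(len(combined))                            # 3
print(operator.add(first, second) == combined)  # True
print(operator.eq(first, Playlist('a', 'b')))   # True
```

The datamodel methods are never called by name here; len, + and == look them up on the class.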
The Docstring (__doc__)¶
The datamodel attribute __doc__
is the docstring:
object.__doc__
'The base class of the class hierarchy.\n\nWhen called, it accepts no arguments and returns a new featureless\ninstance that has no instance attributes and cannot be given any.\n'
This is normally accessed in IPython using the ?
operator:
object?
Init signature: object() Docstring: The base class of the class hierarchy. When called, it accepts no arguments and returns a new featureless instance that has no instance attributes and cannot be given any. Type: type Subclasses: type, async_generator, bytearray_iterator, bytearray, bytes_iterator, bytes, builtin_function_or_method, callable_iterator, PyCapsule, cell, ...
More details can be seen using the help
function.
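The same attribute is populated for user defined functions. In the sketch below the function greet and its parameter name are hypothetical; inspect.getdoc from the standard library is used as a plain Python alternative to the IPython ? syntax:

```python
import inspect


def greet(name='world'):
    """Return a greeting for name.

    The parameter name is hypothetical and only used in this sketch.
    """
    return f'hello {name}!'


first_line = greet.__doc__.splitlines()[0]
print(first_line)             # Return a greeting for name.
print(inspect.getdoc(greet))  # the docstring with indentation cleaned up
```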
Instantiation and Construction (__init__ and __new__)¶
Each class has an initialisation method __init__
which runs after the constructor __new__
has created a new instance self
and supplies it with instance data. When the ?
is used on the class name, the docstring associated with the initialisation signature will show. The docstring for the object
class states that no instance data is accepted when initialising an instance, whereas initialisation data is optional for the str
and int
classes:
object?
Init signature: object() Docstring: The base class of the class hierarchy. When called, it accepts no arguments and returns a new featureless instance that has no instance attributes and cannot be given any. Type: type Subclasses: type, async_generator, bytearray_iterator, bytearray, bytes_iterator, bytes, builtin_function_or_method, callable_iterator, PyCapsule, cell, ...
str?
Init signature: str(self, /, *args, **kwargs) Docstring: str(object='') -> str str(bytes_or_buffer[, encoding[, errors]]) -> str Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'. Type: type Subclasses: StrEnum, DeferredConfigString, FoldedCase, _rstr, _ScriptTarget, _ModuleTarget, LSString, include, Keys, InputMode, ...
int?
Init signature: int(self, /, *args, **kwargs) Docstring: int([x]) -> integer int(x, base=10) -> integer Convert a number or string to an integer, or return 0 if no arguments are given. If x is a number, return x.__int__(). For floating point numbers, this truncates towards zero. If x is not a number or if base is given, then x must be a string, bytes, or bytearray instance representing an integer literal in the given base. The literal can be preceded by '+' or '-' and be surrounded by whitespace. The base defaults to 10. Valid bases are 0 and 2-36. Base 0 means to interpret the base from the string as an integer literal. >>> int('0b100', base=0) 4 Type: type Subclasses: bool, IntEnum, IntFlag, _NamedIntConstant, Handle
The following object
instances can be assigned:
object_instance1 = object()
object_instance2 = object()
Instances for builtins
classes can also be assigned:
str_instance1 = str('hello')
bytes_instance1 = bytes(b'hello')
bytearray_instance1 = bytearray(b'hello')
int_instance1 = int(1)
bool_instance1 = bool(True)
float_instance1 = float(3.14)
variables()
Instance Name | Type | Size/Shape | Value |
---|---|---|---|
return_val_r | str | 12 | hello world! |
return_val_p | NoneType | | None |
str_instance1 | str | 5 | hello |
bytes_instance1 | bytes | 5 | b'hello' |
bytearray_instance1 | bytearray | 5 | bytearray(b'hello') |
int_instance1 | int | | 1 |
bool_instance1 | bool | | True |
float_instance1 | float | | 3.14 |
Although the initialisation signature __init__
is what is examined, it is __new__
that is invoked first to create a new instance during instantiation. The class call then invokes __init__
to initialise this instance with instance data.
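The order of these two steps can be observed with a custom class that records each call. The class Point and the list calls are hypothetical names used only for this sketch:

```python
calls = []


class Point:
    """Hypothetical class that records the order of the construction steps."""

    def __new__(cls, x, y):
        calls.append('__new__')
        return super().__new__(cls)  # object.__new__ creates the bare instance

    def __init__(self, x, y):
        calls.append('__init__')
        self.x = x                   # instance data is attached here
        self.y = y


point = Point(1, 2)
print(calls)             # ['__new__', '__init__']
print(point.x, point.y)  # 1 2
```

Calling the class runs __new__ first and then passes the new instance to __init__ with the same arguments.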
As many of these builtins
classes are frequently used, they can be instantiated shorthand:
str_instance2 = 'hello'
bytes_instance2 = b'hello'
bytearray_instance2 = bytearray(b'hello')
int_instance2 = 1
bool_instance2 = True
float_instance2 = 3.14
variables()
Instance Name | Type | Size/Shape | Value |
---|---|---|---|
return_val_r | str | 12 | hello world! |
return_val_p | NoneType | | None |
str_instance1 | str | 5 | hello |
bytes_instance1 | bytes | 5 | b'hello' |
bytearray_instance1 | bytearray | 5 | bytearray(b'hello') |
int_instance1 | int | | 1 |
bool_instance1 | bool | | True |
float_instance1 | float | | 3.14 |
str_instance2 | str | 5 | hello |
bytes_instance2 | bytes | 5 | b'hello' |
bytearray_instance2 | bytearray | 5 | bytearray(b'hello') |
int_instance2 | int | | 1 |
bool_instance2 | bool | | True |
float_instance2 | float | | 3.14 |
Informal and Formal String Representation (__str__ and __repr__)¶
A Python object
has two types of str
representation, formal and informal. The formal representation is shown in the cell output:
object_instance1
<object at 0x27d9c2606a0>
object_instance2
<object at 0x27d9c260640>
The informal representation is shown when the instance is printed:
print(object_instance1)
<object object at 0x0000027D9C2606A0>
print(object_instance2)
<object object at 0x0000027D9C260640>
The datamodel methods __repr__
and __str__
define the behaviour of the repr
function and str
class:
repr?
Signature: repr(obj, /) Docstring: Return the canonical string representation of the object. For many object types, including most builtins, eval(repr(obj)) == obj. Type: builtin_function_or_method
str?
Init signature: str(self, /, *args, **kwargs) Docstring: str(object='') -> str str(bytes_or_buffer[, encoding[, errors]]) -> str Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'. Type: type Subclasses: StrEnum, DeferredConfigString, FoldedCase, _rstr, _ScriptTarget, _ModuleTarget, LSString, include, Keys, InputMode, ...
repr(object_instance1)
'<object object at 0x0000027D9C2606A0>'
str(object_instance1)
'<object object at 0x0000027D9C2606A0>'
For the object
class both str
representations are identical. The difference can be seen more clearly in the str
class itself because a str
can contain escape characters that are used for formatting. The escape characters are processed when printing, which applies the formatting. The following str
instance str_instance3
includes a tab escape character \t
:
str_instance3 = 'hello\tworld!'
Notice the difference between the cell output, which shows the escape character, and the printed output, which instead processes the escape character and applies the formatting:
str_instance3
'hello\tworld!'
print(str_instance3)
hello world!
The formal and informal str
instances can now be examined:
repr(str_instance3)
"'hello\\tworld!'"
str(str_instance3)
'hello\tworld!'
Notice that casting the str
instance to a str
leaves it unchanged. However, when the formal representation is used there are additions. The '
used to enclose the str
and the \
used to indicate an escape character are now themselves incorporated as part of the str
. Since the str
now contains a str
literal, double quotations are used to enclose it. Notice also that printing the formal representation matches the cell output of the informal representation:
print(repr(str_instance3))
'hello\tworld!'
Another example where the formal and informal representation differ is that of the Fraction
class. It can be imported from the fractions
module:
from fractions import Fraction
fraction_instance1 = Fraction(3, 4)
The difference can be seen in the cell output and the print out of the Fraction
instance:
fraction_instance1
Fraction(3, 4)
print(fraction_instance1)
3/4
Notice the formal representation shown in the cell output matches how the class is input whereas the informal representation shows a simplified representation which is easier to read for printing.
repr(fraction_instance1)
'Fraction(3, 4)'
str(fraction_instance1)
'3/4'
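The relationship between the two representations can be sketched as follows. This is a minimal illustration and the names formal, informal and recreated are assumptions made only for this example. Because the formal representation mirrors how the instance is input, evaluating it recreates an equal instance:

```python
from fractions import Fraction

fraction_instance = Fraction(3, 4)

# repr gives the formal representation, which mirrors the code used to create the instance
formal = repr(fraction_instance)
# str gives the informal representation, which is easier to read when printed
informal = str(fraction_instance)

# Evaluating the formal representation recreates an equal instance
# (eval should only ever be used on strings that are trusted)
recreated = eval(formal)

print(formal)
print(informal)
print(recreated == fraction_instance)
```

Not every class has a formal representation that can be evaluated in this way. The object class seen earlier is an example where it cannot.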
The Directory of Identifiers (__dir__)¶
The __dir__
datamodel method defines the behaviour of the dir
function:
dir?
Docstring: Show attributes of an object. If called without an argument, return the names in the current scope. Else, return an alphabetized list of names comprising (some of) the attributes of the given object, and of attributes reachable from it. If the object supplies a method named __dir__, it will be used; otherwise the default dir() logic is used and returns: for a module object: the module's attributes. for a class object: its attributes, and recursively the attributes of its bases. for any other object: its attributes, its class's attributes, and recursively the attributes of its class's base classes. Type: builtin_function_or_method
Previously dir was used on the current scope, however it can also be used to examine an instance:
dir(object_instance1)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
dir
shows all the identifiers alphabetically but does not group them, unlike the custom function dir2
.
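The grouping idea can be sketched with a small helper. The function group_identifiers below is hypothetical and written only for this illustration; it is not the dir2 function used in this tutorial, although it borrows the same group names. It assumes every identifier can be retrieved with getattr without raising an error:

```python
def group_identifiers(obj):
    """Group the names returned by dir into four categories."""
    groups = {'attribute': [],
              'method': [],
              'datamodel_attribute': [],
              'datamodel_method': []}
    for name in dir(obj):
        value = getattr(obj, name)
        # Datamodel identifiers begin and end with a double underscore
        is_datamodel = name.startswith('__') and name.endswith('__')
        # Anything callable is treated as a method, everything else as an attribute
        kind = 'method' if callable(value) else 'attribute'
        key = f'datamodel_{kind}' if is_datamodel else kind
        groups[key].append(name)
    return groups

grouped = group_identifiers(object())
print(grouped)
```

Since dir returns the names as str instances in alphabetical order, each group also stays in alphabetical order.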
The Class Type (__class__)¶
The datamodel identifier __class__ can be used to return the class of an instance. If it is not called, the class itself will display:
object_instance1.__class__
object
If it is called, it will initialise another instance of this class:
object_instance1.__class__()
<object at 0x27d9c260620>
The datamodel method __class__
defines the behaviour of the builtins
class type
(type
is a class and not a function) which returns the class type:
type?
Init signature: type(self, /, *args, **kwargs) Docstring: type(object) -> the object's type type(name, bases, dict, **kwds) -> a new type Type: type Subclasses: ABCMeta, EnumType, _AnyMeta, NamedTupleMeta, _TypedDictMeta, _DeprecatedType, _ABC, MetaHasDescriptors, PyCStructType, UnionType, ...
type(object_instance1)
object
type(str_instance1)
str
type(int_instance1)
int
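The equivalence between type and __class__ can be checked directly. The following is a minimal sketch and the names values and class_of_str are assumptions used only here:

```python
values = ['hello', 1, 3.14, object()]

for value in values:
    # type(value) and value.__class__ refer to the same class
    print(type(value), type(value) is value.__class__)

# A class is itself an instance, and its class is type
class_of_str = type(str)
print(class_of_str)
```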
Note that the __class__
defines the behaviour of the builtins
identifier type
and not the keyword class
which is reserved for creating a class. A class looks something like:
class Coordinate(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return f'Coordinate(x={self.x}, y={self.y})'
    def __str__(self):
        return f'(x={self.x}, y={self.y})'
    def distance_to(self, other):
        """
        Calculate the Euclidean distance between two coordinates.
        """
        dx = self.x - other.x
        dy = self.y - other.y
        return (dx**2 + dy**2)**0.5
    __hash__ = None
    dimension = 2
The first line is the class declaration. The parentheses contain the base classes, which in this case is only object
. Everything in Python is based on the object
class and if left unspecified, the base class will default to object
:
class Coordinate(object):
Notice that under the class declaration there is essentially a grouping of functions. The first one is the initialisation method __init__, which defines the initialisation signature that was previously discussed. In this case two attributes x
and y
are required. Notice using ?
on the class displays details about the initialisation signature:
Coordinate?
Init signature: Coordinate(x, y) Docstring: <no docstring> Type: type Subclasses:
Two co-ordinate instances can be instantiated:
coordinate_instance1 = Coordinate(1, 2)
coordinate_instance2 = Coordinate(3, 4)
Notice as the class has __repr__
and __str__
defined, the builtins
function repr
and class str
can be used. This difference can be seen by examining an instance in the cell output and printing it:
coordinate_instance1
Coordinate(x=1, y=2)
print(coordinate_instance1)
(x=1, y=2)
The co-ordinate instance has 2 instance specific attributes:
coordinate_instance1.x
1
coordinate_instance1.y
2
There is also a class attribute, which is defined in the class body and is therefore the same for all instances:
coordinate_instance1.dimension
2
In the above:
coordinate_instance1.x
The instance name is coordinate_instance1
. Notice that self
is seen as the first input argument for all the instance methods above:
def __repr__(self):
    return f'Coordinate(x={self.x}, y={self.y})'
self
is a term which essentially means this instance. When the above methods were called from the instance coordinate_instance1
, the instance name was implied and the x
and y
values were obtained from the instance data supplied to coordinate_instance1
when it was initialised.
The distance formula has this instance self
and also requires another instance other
:
def distance_to(self, other):
    """
    Calculate the Euclidean distance between two coordinates.
    """
    dx = self.x - other.x
    dy = self.y - other.y
    return (dx**2 + dy**2)**0.5
It is essentially an implementation of Pythagoras' theorem and requires the second co-ordinate other
to calculate the distance from. Because this method has no leading or trailing underscores, it is a regular instance method and not a datamodel method. This means the method should be called directly as there is no corresponding function or class in builtins
for regular instance methods:
coordinate_instance1.distance_to(other=coordinate_instance2)
2.8284271247461903
Note because the distance_to
instance method was called from the instance coordinate_instance1
, this instance self
was implied.
If the instance method is called from the class it will require the instance self
to work on:
Coordinate.distance_to(self=coordinate_instance1, other=coordinate_instance2)
2.8284271247461903
Normally instance methods have a /
in the signature after self
, which means self
(and in this case other
, which is also placed before the /
) has to be provided positionally:
def distance_to(self, other, /):
    """
    Calculate the Euclidean distance between two coordinates.
    """
    dx = self.x - other.x
    dy = self.y - other.y
    return (dx**2 + dy**2)**0.5
Like the following:
coordinate_instance1.distance_to(coordinate_instance2)
2.8284271247461903
Coordinate.distance_to(coordinate_instance1, coordinate_instance2)
2.8284271247461903
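The effect of the / can be sketched with a small stand-alone class. The class Point and the names positional_result and keyword_error are hypothetical and exist only for this illustration. Supplying other as a named argument raises a TypeError because it sits before the /:

```python
class Point:
    """Hypothetical class used only for this illustration."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance_to(self, other, /):
        # Everything before the / must be provided positionally
        dx = self.x - other.x
        dy = self.y - other.y
        return (dx**2 + dy**2)**0.5

point1 = Point(0, 0)
point2 = Point(3, 4)

# Providing other positionally works
positional_result = point1.distance_to(point2)
print(positional_result)

# Providing other as a named argument is rejected
try:
    point1.distance_to(other=point2)
    keyword_error = None
except TypeError as error:
    keyword_error = error
print(keyword_error)
```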
Because this Coordinate
class uses object
as a base
class, it inherits all the identifiers from the object
class. This can be seen using dir2
(which is a modified version of dir
which in turn invokes __dir__
which is inherited from the object
class):
dir2(coordinate_instance1)
{'attribute': ['dimension', 'x', 'y'], 'method': ['distance_to'], 'datamodel_attribute': ['__dict__', '__doc__', '__hash__', '__module__', '__weakref__'], 'datamodel_method': ['__class__', '__delattr__', '__dir__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']}
Notice all the identifiers from object
are consistent:
dir2(coordinate_instance1, object, consistent_only=True)
{'datamodel_attribute': ['__doc__', '__hash__'], 'datamodel_method': ['__class__', '__delattr__', '__dir__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']}
Sometimes it can be useful to see only the unique identifiers in the class:
dir2(coordinate_instance1, object, unique_only=True)
{'attribute': ['dimension', 'x', 'y'], 'method': ['distance_to'], 'datamodel_attribute': ['__dict__', '__module__', '__weakref__']}
The class's method resolution order can be examined:
Coordinate.mro()
[__main__.Coordinate, object]
The list above is an instruction to first look for a method definition in the Coordinate
class (which is defined in the notebook file which recall has the name '__main__'
) and then fall back on the object
class.
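The lookup order can be sketched with two small classes. Base and Child are hypothetical names used only for this example. A method defined in Child is found first, and a method that Child does not define is found in the next class in the method resolution order:

```python
class Base:
    def describe(self):
        return 'Base.describe'

    def only_in_base(self):
        return 'Base.only_in_base'

class Child(Base):
    # Redefining describe means this version is found first
    def describe(self):
        return 'Child.describe'

child_instance = Child()
order = Child.mro()

print(order)
print(child_instance.describe())      # found in Child
print(child_instance.only_in_base())  # not in Child, so found in Base
```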
Note that help
lists the methods defined in the Coordinate class itself under the heading "Methods defined here":
help(Coordinate)
Help on class Coordinate in module __main__: class Coordinate(builtins.object) | Coordinate(x, y) | | Methods defined here: | | __init__(self, x, y) | Initialize self. See help(type(self)) for accurate signature. | | __repr__(self) | Return repr(self). | | __str__(self) | Return str(self). | | distance_to(self, other) | Calculate the Euclidean distance between two coordinates. | | ---------------------------------------------------------------------- | Data descriptors defined here: | | __dict__ | dictionary for instance variables | | __weakref__ | list of weak references to the object | | ---------------------------------------------------------------------- | Data and other attributes defined here: | | __hash__ = None | | dimension = 2
For details about the other methods inherited from object
, help
can be used on the object
as previously seen:
help(object)
Help on class object in module builtins: class object | The base class of the class hierarchy. | | When called, it accepts no arguments and returns a new featureless | instance that has no instance attributes and cannot be given any. | | Built-in subclasses: | anext_awaitable | async_generator | async_generator_asend | async_generator_athrow | ... and 90 other subclasses | | Methods defined here: | | __delattr__(self, name, /) | Implement delattr(self, name). | | __dir__(self, /) | Default dir() implementation. | | __eq__(self, value, /) | Return self==value. | | __format__(self, format_spec, /) | Default object formatter. | | Return str(self) if format_spec is empty. Raise TypeError otherwise. | | __ge__(self, value, /) | Return self>=value. | | __getattribute__(self, name, /) | Return getattr(self, name). | | __getstate__(self, /) | Helper for pickle. | | __gt__(self, value, /) | Return self>value. | | __hash__(self, /) | Return hash(self). | | __init__(self, /, *args, **kwargs) | Initialize self. See help(type(self)) for accurate signature. | | __le__(self, value, /) | Return self<=value. | | __lt__(self, value, /) | Return self<value. | | __ne__(self, value, /) | Return self!=value. | | __reduce__(self, /) | Helper for pickle. | | __reduce_ex__(self, protocol, /) | Helper for pickle. | | __repr__(self, /) | Return repr(self). | | __setattr__(self, name, value, /) | Implement setattr(self, name, value). | | __sizeof__(self, /) | Size of object in memory, in bytes. | | __str__(self, /) | Return str(self). | | ---------------------------------------------------------------------- | Class methods defined here: | | __init_subclass__(...) from builtins.type | This method is called when a class is subclassed. | | The default implementation does nothing. It may be | overridden to extend subclasses. | | __subclasshook__(...) from builtins.type | Abstract classes can override this to customize issubclass(). | | This is invoked early on by abc.ABCMeta.__subclasscheck__(). 
| It should return True, False or NotImplemented. If it returns | NotImplemented, the normal algorithm is used. Otherwise, it | overrides the normal algorithm (and the outcome is cached). | | ---------------------------------------------------------------------- | Static methods defined here: | | __new__(*args, **kwargs) from builtins.type | Create and return a new object. See help(type) for accurate signature. | | ---------------------------------------------------------------------- | Data and other attributes defined here: | | __class__ = <class 'type'> | type(object) -> the object's type | type(name, bases, dict, **kwds) -> a new type
Similar behaviour is seen when other builtins
classes are examined:
dir2(str, object, consistent_only=True)
{'datamodel_attribute': ['__doc__'], 'datamodel_method': ['__class__', '__delattr__', '__dir__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']}
dir2(str, object, unique_only=True)
{'method': ['capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill'], 'datamodel_method': ['__add__', '__contains__', '__getitem__', '__getnewargs__', '__iter__', '__len__', '__mod__', '__mul__', '__rmod__', '__rmul__']}
dir2(int, object, consistent_only=True)
{'datamodel_attribute': ['__doc__'], 'datamodel_method': ['__class__', '__delattr__', '__dir__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']}
dir2(int, object, unique_only=True)
{'attribute': ['denominator', 'imag', 'numerator', 'real'], 'method': ['as_integer_ratio', 'bit_count', 'bit_length', 'conjugate', 'from_bytes', 'is_integer', 'to_bytes'], 'datamodel_method': ['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__divmod__', '__float__', '__floor__', '__floordiv__', '__getnewargs__', '__index__', '__int__', '__invert__', '__lshift__', '__mod__', '__mul__', '__neg__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__sub__', '__truediv__', '__trunc__', '__xor__']}
The identifiers in these classes will be explored in subsequent notebooks. What is important to note is that they both follow the design pattern of an object
and therefore have consistent identifiers to the object
class.
Comparison Datamodel Methods (__eq__, __ne__, __lt__, __le__, __gt__ and __ge__)¶
The following datamodel methods are for comparison operators. The docstring of each datamodel identifier highlights which operator it corresponds to:
object.__eq__?
Signature: object.__eq__(self, value, /) Call signature: object.__eq__(*args, **kwargs) Type: wrapper_descriptor String form: <slot wrapper '__eq__' of 'object' objects> Namespace: Python builtin Docstring: Return self==value.
object.__ne__?
Signature: object.__ne__(self, value, /) Call signature: object.__ne__(*args, **kwargs) Type: wrapper_descriptor String form: <slot wrapper '__ne__' of 'object' objects> Namespace: Python builtin Docstring: Return self!=value.
object.__lt__?
Signature: object.__lt__(self, value, /) Call signature: object.__lt__(*args, **kwargs) Type: wrapper_descriptor String form: <slot wrapper '__lt__' of 'object' objects> Namespace: Python builtin Docstring: Return self<value.
object.__le__?
Signature: object.__le__(self, value, /) Call signature: object.__le__(*args, **kwargs) Type: wrapper_descriptor String form: <slot wrapper '__le__' of 'object' objects> Namespace: Python builtin Docstring: Return self<=value.
object.__gt__?
Signature: object.__gt__(self, value, /) Call signature: object.__gt__(*args, **kwargs) Type: wrapper_descriptor String form: <slot wrapper '__gt__' of 'object' objects> Namespace: Python builtin Docstring: Return self>value.
object.__ge__?
Signature: object.__ge__(self, value, /) Call signature: object.__ge__(*args, **kwargs) Type: wrapper_descriptor String form: <slot wrapper '__ge__' of 'object' objects> Namespace: Python builtin Docstring: Return self>=value.
Notice that ==
checks for equality and should not be confused with =
for assignment. For object
instances, equality is checked by comparing the location in memory where each object
instance is stored:
object_instance1
<object at 0x27d9c2606a0>
object_instance2
<object at 0x27d9c260640>
These two object
instances are stored in different locations in memory and therefore:
object_instance1 == object_instance2
False
If assignment is made to an existing instance:
object_instance3 = object_instance1
The assignment operator conceptually assigns the value on the right to the new instance name on the left.
In the case above the instance name on the right acts like a label and retrieves the object
instance affixed to the label. This object
instance now effectively has two labels:
object_instance1
<object at 0x27d9c2606a0>
object_instance3
<object at 0x27d9c2606a0>
Since both labels refer to the same object
instance in memory:
object_instance1 == object_instance3
True
The not equal to !=
operator reverses the results above:
object_instance1 != object_instance2
True
object_instance1 != object_instance3
False
Although the slot wrappers for the other 4 comparison operators are defined, they are not implemented in the object
class which is not ordinal. They can be examined in an ordinal class such as an int
. The greater than and less than correspond to the operators >
and <
:
1 > 2
False
1 < 2
True
A check for greater than or equal to, or for less than or equal to, is commonly made. This can be done longhand or using the >=
and <=
operators:
(1 > 2) or (1 == 2)
False
1 >= 2
False
(1 < 2) or (1 == 2)
True
1 <= 2
True
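How a custom class might take part in these comparisons can be sketched as follows. The class Version is hypothetical, and the use of functools.total_ordering to derive the remaining operators from __eq__ and __lt__ is a choice made for this example rather than something used elsewhere in this tutorial. The sketch also shows that ordering two plain object instances raises a TypeError:

```python
from functools import total_ordering

@total_ordering
class Version:
    """Hypothetical ordinal class used only for this illustration."""
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return (self.major, self.minor) < (other.major, other.minor)

# total_ordering supplies __le__, __gt__ and __ge__ from the two methods above
print(Version(1, 2) == Version(1, 2))
print(Version(1, 2) < Version(1, 10))
print(Version(2, 0) >= Version(1, 10))

# The object class is not ordinal, so ordering comparisons are rejected
try:
    object() < object()
    ordering_error = None
except TypeError as error:
    ordering_error = error
print(ordering_error)
```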
Memory Size (__sizeof__)¶
The datamodel method __sizeof__
retrieves the size of the instance in memory in bytes
:
object_instance1.__sizeof__?
Signature: object_instance1.__sizeof__() Docstring: Size of object in memory, in bytes. Type: builtin_function_or_method
The docstring doesn't mention a corresponding function in the builtins
module
. Typically the datamodel identifier isn't used directly but defines the behaviour of sys.getsizeof
:
import sys
sys.getsizeof(object_instance1)
16
object_instance1.__sizeof__()
16
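The two values are not always identical. As a minimal sketch, assuming the CPython interpreter, sys.getsizeof calls __sizeof__ and adds the garbage collector overhead for instances that are managed by the garbage collector, such as a list. The name examples is an assumption used only here, and the sizes printed depend on the Python version and platform:

```python
import sys

examples = [object(), 1, 'hello', [1, 2, 3]]

for example in examples:
    # sys.getsizeof is never smaller than the value returned by __sizeof__
    print(type(example).__name__, example.__sizeof__(), sys.getsizeof(example))
```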
Immutable Hash Value (__hash__) and Identification¶
In Python a class can be immutable or mutable. An instance of an immutable class is essentially read only and cannot be modified after instantiation. Because it cannot be modified it has a constant hash checksum value and a __hash__
datamodel method:
object_instance1.__hash__
<method-wrapper '__hash__' of object object at 0x0000027D9C2606A0>
This datamodel method defines the behaviour of the builtins
function hash
:
object_instance1.__hash__?
Signature: object_instance1.__hash__() Call signature: object_instance1.__hash__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__hash__' of object object at 0x0000027D9C2606A0> Docstring: Return hash(self).
hash(object_instance1)
171157119082
hash(object_instance2)
171157119076
A mapping is essentially a collection of items where each item has a key, which is an immutable instance, and an associated value, which is a mutable or immutable Python instance.
Conceptualise the mapping as a collection of mailboxes. Each mailbox has a lock that needs a key to open. Because the key needs to fit the lock, the key shape cannot be changed (mutated). The key is used to open the mailbox and opening the mailbox retrieves a reference to the Python instance.
Immutable instances can be used as keys in mappings such as dict
:
mapping = {object_instance1: object(),
           str_instance1: object(),
           bytes_instance1: object(),
           int_instance1: object(),
           float_instance1: object()}
mapping
{<object at 0x27d9c2606a0>: <object at 0x27d9c260720>, 'hello': <object at 0x27d9c260730>, b'hello': <object at 0x27d9c260790>, 1: <object at 0x27d9c2607a0>, 3.14: <object at 0x27d9c260610>}
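Each of the keys has a hash value. Notice that the hash values of str_instance1 and bytes_instance1 are identical here. They remain separate keys in the mapping because the two instances do not compare as equal: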
hash(object_instance1)
171157119082
hash(str_instance1)
5544572916755683100
hash(bytes_instance1)
5544572916755683100
hash(int_instance1)
1
hash(float_instance1)
322818021289917443
Notice that the hash value returned is an int
instance. For the bytearray
instance and Coordinate
instance notice the __hash__
datamodel identifier is an attribute with a value that is None
which means the hash
function cannot be used:
bytearray_instance1.__hash__ == None
True
coordinate_instance1.__hash__ == None
True
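The link between hashing and mapping keys can be sketched as follows. The class HashablePoint is hypothetical and is written on the assumption that its values are never changed after initialisation. It defines __eq__ and __hash__ together so that two equal instances find the same key, while the mutable bytearray is rejected by hash:

```python
class HashablePoint:
    """Hypothetical class whose values are treated as read only."""
    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __eq__(self, other):
        if not isinstance(other, HashablePoint):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # Equal instances must return equal hash values
        return hash((self._x, self._y))

lookup = {HashablePoint(1, 2): 'first'}
# A separate but equal instance retrieves the same value
found = lookup[HashablePoint(1, 2)]
print(found)

# A bytearray has __hash__ set to None so hash raises a TypeError
try:
    hash(bytearray(b'hello'))
    hash_error = None
except TypeError as error:
    hash_error = error
print(hash_error)
```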
Each Python instance also has an identification:
id(object_instance1)
2738513905312
id(str_instance1)
2738473804416
id(bytes_instance1)
2738534548752
id(bytearray_instance1)
2738534700272
id(coordinate_instance1)
2738535081616
An immutable instance has both an identification and a hash
:
id('Hello'), hash('Hello')
(2738535514736, -789358765565447528)
id('Hello World!'), hash('Hello World!')
(2738535497904, -1376674368131453453)
If the str
instance is assigned to greeting
:
greeting = 'Hello'
Then greeting
can be conceptualised as a label which can be used to retrieve the str
instance 'Hello'
. Notice the id
and hash
value match that seen earlier because it is the same instance:
id(greeting), hash(greeting)
(2738535514736, -789358765565447528)
If reassignment is used, the instance name greeting
which still being conceptualised as a label is removed from the old str
instance 'Hello'
and placed on the new str
instance 'Hello World!'
:
greeting = 'Hello World!'
id(greeting), hash(greeting)
(2738535502896, -1376674368131453453)
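Notice the hash
value matches that seen for 'Hello World!'
because the label is now on an instance with that value. The id
differs from the one seen earlier because this assignment created a new str
instance with the same value. Neither matches the id
and hash
value for 'Hello'
as the label is no longer affixed to that instance.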
The label is also known as a reference, and is conceptually similar to a pointer as it points to an instance.
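The label concept can be sketched using a mutable list, where the effect of two labels on one instance is easy to observe. The names first_label, second_label and separate_copy are assumptions used only for this illustration:

```python
first_label = [1, 2, 3]
second_label = first_label          # a second label on the same instance
separate_copy = list(first_label)   # a new instance with an equal value

# is compares identity, == compares value
same_instance = first_label is second_label
equal_but_separate = (first_label == separate_copy) and (first_label is not separate_copy)

# Mutating through one label is visible through the other label
second_label.append(4)

print(same_instance, equal_but_separate)
print(first_label, separate_copy)
```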
Get, Set and Delete Attribute (__getattribute__, __setattr__ and __delattr__)¶
The __getattribute__
datamodel method defines the behaviour of the builtins function getattr, which can be used to retrieve an attribute using the name of the attribute as a str
. The object
class only has a small number of datamodel attributes:
object.__getattribute__?
Signature: object.__getattribute__(self, name, /) Call signature: object.__getattribute__(*args, **kwargs) Type: wrapper_descriptor String form: <slot wrapper '__getattribute__' of 'object' objects> Namespace: Python builtin Docstring: Return getattr(self, name).
For example the attribute __doc__
can be retrieved using getattr
and the str
of the attribute '__doc__'
:
getattr(object_instance1, '__doc__')
'The base class of the class hierarchy.\n\nWhen called, it accepts no arguments and returns a new featureless\ninstance that has no instance attributes and cannot be given any.\n'
Normally an attribute is accessed using a dot:
object_instance1.__doc__
'The base class of the class hierarchy.\n\nWhen called, it accepts no arguments and returns a new featureless\ninstance that has no instance attributes and cannot be given any.\n'
However having the ability to select an attribute using a str
is useful, particularly because dir
outputs identifiers as a list of str
instances:
dir(object_instance1)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
If the mutable instance coordinate_instance1
of the Coordinate
class is examined, the attribute x
can be retrieved:
getattr(coordinate_instance1, 'x')
1
coordinate_instance1.x
1
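Selecting identifiers by name is convenient when the names are held in a list. The following is a minimal sketch using a str instance. The names method_names, results and missing are assumptions used only here, and the third argument of getattr supplies a default when the identifier does not exist:

```python
text = 'hello'
method_names = ['upper', 'title', 'swapcase']

results = {}
for name in method_names:
    # Retrieve the method using its name as a str and then call it
    method = getattr(text, name)
    results[name] = method()

# A default value avoids an AttributeError for a name that does not exist
missing = getattr(text, 'not_an_identifier', None)

print(results)
print(missing)
```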
The object
instance as previously discussed is immutable. The slot wrappers __setattr__
and __delattr__
, which are used to set and delete an attribute, are defined but an object instance has no instance attributes and cannot be given any:
object.__setattr__?
Signature: object.__setattr__(self, name, value, /) Call signature: object.__setattr__(*args, **kwargs) Type: wrapper_descriptor String form: <slot wrapper '__setattr__' of 'object' objects> Namespace: Python builtin Docstring: Implement setattr(self, name, value).
object.__delattr__?
Signature: object.__delattr__(self, name, /) Call signature: object.__delattr__(*args, **kwargs) Type: wrapper_descriptor String form: <slot wrapper '__delattr__' of 'object' objects> Namespace: Python builtin Docstring: Implement delattr(self, name).
The Coordinate
instance on the other hand is mutable:
dir2(coordinate_instance1)
{'attribute': ['dimension', 'x', 'y'], 'method': ['distance_to'], 'datamodel_attribute': ['__dict__', '__doc__', '__hash__', '__module__', '__weakref__'], 'datamodel_method': ['__class__', '__delattr__', '__dir__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']}
id(coordinate_instance1)
2738535081616
And the attribute x
can be set to a new value:
setattr(coordinate_instance1, 'x', 200)
Notice in the above there is no return value because coordinate_instance1
is mutated (updated inplace). The id does not change but the value of x
can be seen to be updated:
id(coordinate_instance1)
2738535081616
coordinate_instance1.x
200
This can also be done using assignment of the attribute:
coordinate_instance1.x = 250
Note again that there is no return value because coordinate_instance1
is mutated (updated inplace). The change can be seen by viewing the attribute:
coordinate_instance1.x
250
If the identifiers of coordinate_instance1
are examined:
dir2(coordinate_instance1)
{'attribute': ['dimension', 'x', 'y'], 'method': ['distance_to'], 'datamodel_attribute': ['__dict__', '__doc__', '__hash__', '__module__', '__weakref__'], 'datamodel_method': ['__class__', '__delattr__', '__dir__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']}
Notice the attributes can be deleted using the builtins
function delattr
:
delattr(coordinate_instance1, 'x')
Or more commonly using the del
keyword:
del coordinate_instance1.y
The change can be seen when coordinate_instance1
is examined:
dir2(coordinate_instance1)
{'attribute': ['dimension'], 'method': ['distance_to'], 'datamodel_attribute': ['__dict__', '__doc__', '__hash__', '__module__', '__weakref__'], 'datamodel_method': ['__class__', '__delattr__', '__dir__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']}
Note that the id has not changed:
id(coordinate_instance1)
2738535081616
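The changes to the instance attributes can also be followed using the builtins function vars, which returns the __dict__ of an instance. The following is a minimal sketch in which the class Marker and the variable names are hypothetical. It also shows that a plain object instance rejects a new attribute:

```python
class Marker:
    """Hypothetical mutable class used only for this illustration."""
    colour = 'red'   # class attribute shared by all instances

    def __init__(self, x):
        self.x = x   # instance attribute

marker = Marker(1)
before = dict(vars(marker))

setattr(marker, 'x', 200)        # update an existing attribute
setattr(marker, 'label', 'a')    # add a new attribute
after_set = dict(vars(marker))

delattr(marker, 'x')             # equivalent to del marker.x
after_delete = dict(vars(marker))

has_x = hasattr(marker, 'x')
has_colour = hasattr(marker, 'colour')   # class attributes are unaffected

# An object instance cannot be given any instance attributes
try:
    setattr(object(), 'x', 1)
    set_error = None
except AttributeError as error:
    set_error = error

print(before, after_set, after_delete)
print(has_x, has_colour, set_error)
```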
Pickle Helper Methods (__reduce__, __reduce_ex__ and __getstate__)¶
The three datamodel methods __reduce__
, __reduce_ex__
and __getstate__
are used by the pickle
module:
object.__reduce__?
Signature: object.__reduce__(self, /) Docstring: Helper for pickle. Type: method_descriptor
object.__reduce_ex__?
Signature: object.__reduce_ex__(self, protocol, /) Docstring: Helper for pickle. Type: method_descriptor
object.__getstate__?
Signature: object.__getstate__(self, /) Docstring: Helper for pickle. Type: method_descriptor
The pickle
module can be imported:
import pickle
And object_instance1
:
object_instance1
<object at 0x27d9c2606a0>
Can be serialised to a bytes
string instance using:
pickle.dumps(object_instance1)
b'\x80\x04\x95\x1a\x00\x00\x00\x00\x00\x00\x00\x8c\x08builtins\x94\x8c\x06object\x94\x93\x94)\x81\x94.'
And a new instance equivalent to object_instance1
can be loaded from this bytes
string instance using:
pickle.loads(b'\x80\x04\x95\x1a\x00\x00\x00\x00\x00\x00\x00\x8c\x08builtins\x94\x8c\x06object\x94\x93\x94)\x81\x94.')
<object at 0x27d9c260820>
If dump
and load
are instead used, the instance can be stored to a file and loaded back from it:
with open('object_instance1.pk1', mode='wb') as file:
    pickle.dump(object_instance1, file)
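The round trip can be sketched without writing to disk. This minimal example uses a dict of builtins instances and an in-memory io.BytesIO buffer in place of a file, both of which are choices made only for this illustration. The loaded instance has an equal value but is a separate instance:

```python
import io
import pickle

data = {'name': 'hello', 'values': [1, 2, 3]}

# dumps and loads work with a bytes instance
serialised = pickle.dumps(data)
restored = pickle.loads(serialised)

# dump and load work with a binary file, here an in-memory buffer
buffer = io.BytesIO()
pickle.dump(data, buffer)
buffer.seek(0)   # return to the start of the buffer before reading
restored_from_buffer = pickle.load(buffer)

print(restored == data, restored is data)
print(restored_from_buffer == data)
```

Data should only be loaded with pickle when it comes from a trusted source, because loading can execute arbitrary code.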
The file object_instance1.pk1 is binary but can be viewed in VSCode if the Hex Editor extension by Microsoft is installed. The file can be loaded using:
with open('object_instance1.pk1', mode='rb') as file:
    loaded_data = pickle.load(file)
loaded_data
<object at 0x27d9c260850>
Subclass Methods (__init_subclass__ and __subclasshook__)¶
The __init_subclass__
and __subclasshook__
object based datamodel methods are used when creating subclasses and when following a design pattern that uses abstract base classes:
__init_subclass__
gets called when a subclass is created but does nothing by default:
object.__init_subclass__?
Docstring: This method is called when a class is subclassed. The default implementation does nothing. It may be overridden to extend subclasses. Type: builtin_function_or_method
Supposing the base class Coordinate
is created. The __init_subclass__
method can be defined:
class Coordinate(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __init_subclass__(cls, **kwargs):
        print(f"__init_subclass__ called for {cls} with kwargs {kwargs}")
        super().__init_subclass__(**kwargs)
    def __repr__(self):
        return f'Coordinate(x={self.x}, y={self.y})'
    def __str__(self):
        return f'(x={self.x}, y={self.y})'
    def distance_to(self, other):
        """
        Calculate the Euclidean distance between two coordinates.
        """
        dx = self.x - other.x
        dy = self.y - other.y
        return (dx**2 + dy**2)**0.5
    __hash__ = None
    dimension = 2
When this is subclassed to 3D the additional print statement in the __init_subclass__
is displayed:
class Coordinate3D(Coordinate):
    def __init__(self, x, y, z):
        super().__init__(x, y)
        self.z = z
    def __repr__(self):
        return f'Coordinate3D(x={self.x}, y={self.y}, z={self.z})'
    def __str__(self):
        return f'{super().__repr__()[:-1]}, z={self.z})'
    def distance_to(self, other):
        """
        Calculate the Euclidean distance between two coordinates.
        """
        dx = self.x - other.x
        dy = self.y - other.y
        dz = self.z - other.z
        return (dx**2 + dy**2 + dz**2)**0.5
    dimension = 3
__init_subclass__ called for <class '__main__.Coordinate3D'> with kwargs {}
coordinate_instance1_3d = Coordinate3D(1, 2, 3)
coordinate_instance1_3d
Coordinate3D(x=1, y=2, z=3)
print(coordinate_instance1_3d)
Coordinate(x=1, y=2, z=3)
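A common use of __init_subclass__ is to record each subclass as it is created. The following is a minimal sketch in which the classes Shape, Circle and Square, the registry attribute and the label keyword are all hypothetical. A keyword supplied in the class declaration is passed through to __init_subclass__:

```python
class Shape:
    """Hypothetical base class that records its subclasses."""
    registry = {}

    def __init_subclass__(cls, label=None, **kwargs):
        super().__init_subclass__(**kwargs)
        # Use the supplied label or fall back to the lowercase class name
        cls.label = label or cls.__name__.lower()
        Shape.registry[cls.label] = cls

# The keyword in the class declaration is received by __init_subclass__
class Circle(Shape, label='round'):
    pass

class Square(Shape):
    pass

print(Shape.registry)
```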
The __subclasshook__
is used to construct an abstract class design pattern:
object.__subclasshook__?
Docstring: Abstract classes can override this to customize issubclass(). This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached). Type: builtin_function_or_method
For example the AbstractCoordinate
class can be constructed:
import abc
class AbstractCoordinate(abc.ABC):
    @classmethod
    def __subclasshook__(cls, subclass):
        return(all([hasattr(subclass, '__init__'),
                    callable(subclass.__init__),
                    hasattr(subclass, '__repr__'),
                    callable(subclass.__repr__),
                    hasattr(subclass, '__str__'),
                    callable(subclass.__str__),
                    hasattr(subclass, 'distance_to'),
                    callable(subclass.distance_to),
                    hasattr(subclass, '__hash__'),
                    hasattr(subclass, 'dimension')]))
    def __init__(self):
        pass
    def __repr__(self):
        pass
    def __str__(self):
        pass
    def distance_to(self):
        pass
    __hash__ = None
    dimension = None
The base class for an abstract class is the Abstract Base Class ABC
which is found in the module abc
:
import abc
class AbstractCoordinate(abc.ABC):
The __subclasshook__
has a return statement that is either True
or False
:
@classmethod
def __subclasshook__(cls, subclass):
    return(all([]))
In this case a list of conditions is supplied to the builtins
function all
which is True
only if all the conditions are True
:
all([True, True, True])
True
all([True, False, True])
False
The Coordinate
class follows this design pattern so:
AbstractCoordinate.__subclasshook__(Coordinate)
True
issubclass(Coordinate, AbstractCoordinate)
True
issubclass(Coordinate, object)
True
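A variation on this pattern returns NotImplemented instead of False when the conditions are not met, so that the normal issubclass algorithm is used as a fallback, as described in the docstring above. The following is a minimal sketch in which the classes SupportsDistance, Point and Label are hypothetical:

```python
import abc

class SupportsDistance(abc.ABC):
    """Hypothetical abstract class that only requires a distance_to method."""
    @classmethod
    def __subclasshook__(cls, subclass):
        if cls is SupportsDistance:
            if callable(getattr(subclass, 'distance_to', None)):
                return True
        # Defer to the normal issubclass algorithm
        return NotImplemented

class Point:
    def distance_to(self, other):
        return 0.0

class Label:
    pass

# Neither class names SupportsDistance as a base class
print(issubclass(Point, SupportsDistance))
print(issubclass(Label, SupportsDistance))
print(isinstance(Point(), SupportsDistance))
```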
This covers all of the identifiers seen for:
dir2(object)
{'datamodel_attribute': ['__doc__'], 'datamodel_method': ['__class__', '__delattr__', '__dir__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']}
All the other commonly used builtins
classes are based on the design principle of the object
. Therefore the datamodel methods examined above are applicable for these other classes. The next tutorials will examine identifiers available in these classes:
dir2(__builtins__, print_output=False)['lower_class']
['bool', 'bytearray', 'bytes', 'classmethod', 'complex', 'dict', 'enumerate', 'filter', 'float', 'frozenset', 'int', 'list', 'map', 'memoryview', 'object', 'property', 'range', 'reversed', 'set', 'slice', 'staticmethod', 'str', 'super', 'tuple', 'type', 'zip']