The name str is an abbreviation of string, and the str class represents a string of Unicode characters. The str class is an immutable ordered collection of Unicode characters. Immutable means that once an instance has been instantiated it cannot later be modified.
Table of contents
- Initialisation Signature
- Identifiers
- Data Model Identifiers
- Calling a Method
- Case Identifiers
- Valid Identifier Names
- ASCII
- Escape Characters
- Single vs Double quotations
- Multiline string
- ASCII and Unicode Escape Characters
- Raw String
- Formatted Strings
- Object Design Pattern
- Immutable Ordered Collection ABC Design Pattern
- Fill, Center and Justify
- Binary Operators
- Binary Comparison Operators
- Splitting and Joining Strings
Initialisation Signature
Inputting str() displays the docstring of the initialisation signature of the str class as a popup balloon. Some IDEs, such as JupyterLab, may require the keypress shift ⇧ and tab ↹ to invoke the docstring:
Init signature: str(self, /, *args, **kwargs)
Docstring:
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or
errors is specified, then the object must expose a data buffer
that will be decoded using the given encoding and error handler.
Otherwise, returns the result of object.__str__() (if defined)
or repr(object).
encoding defaults to sys.getdefaultencoding().
errors defaults to 'strict'.
Inputting:
? str
will output the docstring in the cell output of an interactive Python notebook or an IPython cell.
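The purpose of the initialisation signature is to provide the data required to initialise a new instance. Under the hood, calling the class first invokes the __new__ data model constructor, which creates a new instance, and then invokes the __init__ data model initialiser on this instance. Because a str is immutable, its instance data is supplied to __new__ when the instance is created; str inherits __init__ unchanged from object.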
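The order of these two calls can be observed with a small subclass of str. TracedStr is a hypothetical name used only for illustration, and the sketch assumes CPython 3, where str does its work in __new__ and inherits __init__ from object:

```python
class TracedStr(str):
    """Hypothetical str subclass that records which data model methods run."""

    calls = []  # shared log of method names, in call order

    def __new__(cls, *args, **kwargs):
        TracedStr.calls.append('__new__')
        # str.__new__ creates the instance and stores the characters
        return super().__new__(cls, *args, **kwargs)

    def __init__(self, *args, **kwargs):
        TracedStr.calls.append('__init__')
        # str inherits __init__ from object; calling it without the extra
        # arguments avoids a TypeError, and the data is already set anyway
        super().__init__()


greeting = TracedStr('hello')
print(TracedStr.calls)      # ['__new__', '__init__']
print(greeting == 'hello')  # True
```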
The initialisation signature shows alternative ways of supplying instance data for a string.
The first way takes an already existing string instance. The comma , is used as a delimiter to separate input arguments. In Python, input arguments can be provided positionally or as named input arguments. Note that input arguments placed before a / are only accepted positionally.
str(self, /, *args, **kwargs)
For example:
str('hello')
'hello'
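The effect of a / can be checked with a str method whose signature also contains one. The sketch below uses center, listed in help(str) as center(self, width, fillchar=' ', /), so width and fillchar are positional-only:

```python
greeting = 'hello'

# Supplying width and fillchar positionally is accepted
padded = greeting.center(11, '*')
print(padded)  # ***hello***

# Supplying width by name is rejected because it comes before the /
keyword_rejected = False
try:
    greeting.center(width=11)
except TypeError as error:
    keyword_rejected = True
    print('TypeError:', error)
```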
More typically, a string is instantiated directly using a string literal:
'hello'
'hello'
Notice the difference in syntax highlighting between:
hello
And:
'hello'
The former is an object name; if no object with this name exists, Python raises a NameError:
NameError: name 'hello' is not defined
The latter is a string; its contents are enclosed, here, in single quotation marks ' '.
The second way uses a named keyword input argument, object, which defaults to an empty string:
str(object='') -> str
This named input argument can be explicitly assigned to a new value:
str(object='hello')
'hello'
If the named input argument is not supplied, object takes on its default value '' and an empty string is returned:
str()
''
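The docstring also lists a second form, str(bytes_or_buffer[, encoding[, errors]]), which is not demonstrated above. A short sketch, assuming UTF-8 encoded bytes, contrasts it with passing a non-string object, for which str falls back on the object's own string conversion:

```python
# object form: the result of object.__str__()
number_text = str(42)                                   # '42'

# bytes form: decode the buffer with the given encoding
decoded = str(b'caf\xc3\xa9', 'utf-8')                  # 'café'

# errors='replace' substitutes U+FFFD for undecodable bytes instead of raising
lenient = str(b'caf\xe9', encoding='utf-8', errors='replace')

# without an encoding, bytes are converted to their representation, not decoded
unencoded = str(b'hello')                               # "b'hello'"
```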
An instance name can be conceptualised as a label that is used to reference an instance. In Python, an object that is no longer referenced by any instance name (or by any other object) can no longer be used and is removed; in CPython this happens as soon as its reference count drops to zero.
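The label analogy can be checked by attaching a second name to the same instance. In the sketch below, label is a hypothetical second name, and the identity operator is confirms that both names reference a single object:

```python
greeting = str(object='hello')
label = greeting                  # a second name for the same instance, not a copy
same_object = label is greeting   # True: both names reference one object
print(same_object)

del greeting                      # removes the name greeting, not the instance
print(label)                      # hello: the instance is still referenced by label
```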
If the instance is instead assigned to an instance name (label), greeting, the IPython cell displays no output:
greeting = str(object='hello')
Notice the subtlety in spacing, which follows Python's PEP 8 style convention. Spaces around the assignment operator = emphasise the assignment to the instance name. The keyword argument within the function call has no spaces around its =, as spacing within a function call is instead typically used after the , separator to visually separate input arguments from one another.
Inputting the object name in another IPython cell displays the formal representation of the string:
greeting
'hello'
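The formal representation displayed by the cell is the one returned by repr, while print uses the informal representation returned by str. A minimal comparison:

```python
greeting = 'hello'

formal = repr(greeting)    # "'hello'": includes the quotation marks
informal = str(greeting)   # 'hello': the characters themselves

print(formal)    # 'hello'
print(informal)  # hello
```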
Some IDEs, such as Spyder, have a Variable Explorer, where the instance name and its associated value display alongside the instance's class type and other properties, such as the length of the string:

Identifiers
Two instances can be created:
greeting = 'hello'
farewell = 'bye'
Notice that there is no output in the IPython console and both instances display in the Variable Explorer:

If the instance greeting is input followed by a dot ., a list of identifiers will display. Some IDEs, such as JupyterLab, may require the keypress tab ↹ to invoke the list of identifiers:
- capitalize – function
- casefold – function
- center – function
- count – function
- encode – function
- endswith – function
- expandtabs – function
- find – function
- format – function
- format_map – function
- index – function
- isalnum – function
- isalpha – function
- isascii – function
- isdecimal – function
- isdigit – function
- isidentifier – function
- islower – function
- isnumeric – function
- isprintable – function
- isspace – function
- istitle – function
- isupper – function
- join – function
- ljust – function
- lower – function
- lstrip – function
- maketrans – function
- partition – function
- removeprefix – function
- removesuffix – function
- replace – function
- rfind – function
- rindex – function
- rjust – function
- rpartition – function
- rsplit – function
- rstrip – function
- split – function
- splitlines – function
- startswith – function
- strip – function
- swapcase – function
- title – function
- translate – function
- upper – function
- zfill – function
If the instance farewell is input followed by a dot ., a list of identifiers will display. Notice that this is the same list of identifiers:
- capitalize – function
- casefold – function
- center – function
- count – function
- encode – function
- endswith – function
- expandtabs – function
- find – function
- format – function
- format_map – function
- index – function
- isalnum – function
- isalpha – function
- isascii – function
- isdecimal – function
- isdigit – function
- isidentifier – function
- islower – function
- isnumeric – function
- isprintable – function
- isspace – function
- istitle – function
- isupper – function
- join – function
- ljust – function
- lower – function
- lstrip – function
- maketrans – function
- partition – function
- removeprefix – function
- removesuffix – function
- replace – function
- rfind – function
- rindex – function
- rjust – function
- rpartition – function
- rsplit – function
- rstrip – function
- split – function
- splitlines – function
- startswith – function
- strip – function
- swapcase – function
- title – function
- translate – function
- upper – function
- zfill – function
These identifiers come from the str class. If str is input followed by a dot ., an almost identical list of identifiers displays:
- capitalize – function
- casefold – function
- center – function
- count – function
- encode – function
- endswith – function
- expandtabs – function
- find – function
- format – function
- format_map – function
- index – function
- isalnum – function
- isalpha – function
- isascii – function
- isdecimal – function
- isdigit – function
- isidentifier – function
- islower – function
- isnumeric – function
- isprintable – function
- isspace – function
- istitle – function
- isupper – function
- join – function
- ljust – function
- lower – function
- lstrip – function
- maketrans – function
- mro – function
- partition – function
- removeprefix – function
- removesuffix – function
- replace – function
- rfind – function
- rindex – function
- rjust – function
- rpartition – function
- rsplit – function
- rstrip – function
- split – function
- splitlines – function
- startswith – function
- strip – function
- swapcase – function
- title – function
- translate – function
- upper – function
- zfill – function
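The same names can be listed programmatically with dir. A minimal sketch follows; public_names is a hypothetical helper. Note that dir reports names from a class and its bases only, so mro, which str obtains from its own class type, is absent from dir(str) even though str.mro works:

```python
greeting = 'hello'

def public_names(obj):
    """Return the identifiers of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith('_')]

instance_names = public_names(greeting)
class_names = public_names(str)

print(instance_names == class_names)   # True: an instance exposes its class's methods
print('isnumeric' in class_names)      # True
print('mro' in class_names)            # False: mro is defined on type, str's class
print(type(str), 'mro' in vars(type))  # <class 'type'> True
```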
Data Model Identifiers
There is one additional identifier, mro, which stands for method resolution order. If str.mro() is input, the docstring should display:
Signature: str.mro()
Docstring: Return a type's method resolution order.
Type: builtin_function_or_method
Notice that there are no input arguments, as mro returns the method resolution order of the str class itself and therefore no additional information is required to call it:
str.mro()
[str, object]
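The method resolution order becomes more informative for a class with a longer inheritance chain. A short comparison with bool, which subclasses int, also shows the __mro__ tuple that holds the same classes as the list returned by mro (a plain Python console prints the classes in the <class '...'> form, rather than the shortened form IPython displays):

```python
print(str.mro())     # [<class 'str'>, <class 'object'>]
print(bool.mro())    # [<class 'bool'>, <class 'int'>, <class 'object'>]

# __mro__ holds the same classes as a tuple
print(str.__mro__ == tuple(str.mro()))  # True
```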
The method resolution order is a list with two items: the str class itself and the object class. Python is an Object Orientated Programming (OOP) language, and everything in Python is an object built on the design pattern of the object class. The object class is the base class of the class hierarchy and isn't normally instantiated directly; its docstring can be examined by inputting object():
Init signature: object()
Docstring:
The base class of the class hierarchy.

When called, it accepts no arguments and returns a new featureless
instance that has no instance attributes and cannot be given any.
The identifiers from object can be seen by inputting object followed by a dot .:
- mro – function
If object followed by a dot . and two underscores __ is input, additional hidden data model identifiers can be seen:
- __annotations__ – statement
- __base__ – statement
- __bases__ – statement
- __basicsize__ – statement
- __call__ – function
- __class__ – function
- __delattr__ – function
- __dict__ – statement
- __dictoffset__ – statement
- __dir__ – function
- __doc__ – statement
- __eq__ – function
- __flags__ – statement
- __format__ – function
- __getattribute__ – function
- __hash__ – function
- __init__ – function
- __init_subclass__ – function
- __instancecheck__ – function
- __itemsize__ – statement
- __module__ – statement
- __mro__ – statement
- __name__ – statement
- __ne__ – function
- __new__ – function
- __prepare__ – function
- __qualname__ – statement
- __reduce__ – function
- __reduce_ex__ – function
- __repr__ – function
- __setattr__ – function
- __sizeof__ – function
- __slots__ – statement
- __str__ – function
- __subclasscheck__ – function
- __subclasses__ – function
- __text_signature__ – statement
- __weakrefoffset__ – statement
If str followed by a dot . and two underscores __ is input, all the data model identifiers from the object class can be seen alongside the additional data model identifiers:
- __add__ – function
- __ge__ – function
- __getitem__ – function
- __getnewargs__ – function
- __gt__ – function
- __iter__ – function
- __le__ – function
- __len__ – function
- __lt__ – function
- __mod__ – function
- __reversed__ – function
- __rmul__ – function
If the help function is used on the str and object classes, details about each identifier are given. The output splits the identifiers into four groups:
- methods – functions bound to an instance and designed to work on instance data.
- class methods – functions bound to a class. These are normally used as alternative constructors.
- static methods – functions in the class's namespace but not bound to an instance or class.
- data and other attributes
It is worthwhile browsing through the output given by help, as it gives an overview of the identifiers used in the str and object classes. Recall that the method resolution order of the str class is [str, object], which means that the str class has all of the identifiers in the object class. Many of these identifiers are redefined in the str class; however, some are not shown, for example __reduce__. The identifiers not shown are inherited unchanged from the parent object class, so details about them should be read from the help of the object class.
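Python finds an inherited identifier by searching each class in the method resolution order in turn. A minimal sketch of that lookup follows, using a hypothetical helper, defined_in, that checks each class's own namespace via vars:

```python
def defined_in(cls, name):
    """Return the first class in cls.__mro__ whose own namespace defines name."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass
    return None  # name is not found anywhere in the method resolution order

print(defined_in(str, '__len__'))     # <class 'str'>: defined by str
print(defined_in(str, '__reduce__'))  # <class 'object'>: inherited unchanged
print(defined_in(str, 'capitalize'))  # <class 'str'>
```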
The output of help for the str class, followed by the output for the object class, is shown below:
help(str)
Help on class str in module builtins:
class str(object)
| str(object='') -> str
| str(bytes_or_buffer[, encoding[, errors]]) -> str
|
| Create a new string object from the given object. If encoding or
| errors is specified, then the object must expose a data buffer
| that will be decoded using the given encoding and error handler.
| Otherwise, returns the result of object.__str__() (if defined)
| or repr(object).
| encoding defaults to sys.getdefaultencoding().
| errors defaults to 'strict'.
|
| Methods defined here:
|
| __add__(self, value, /)
| Return self+value.
|
| __contains__(self, key, /)
| Return key in self.
|
| __eq__(self, value, /)
| Return self==value.
|
| __format__(self, format_spec, /)
| Return a formatted version of the string as described by format_spec.
|
| __ge__(self, value, /)
| Return self>=value.
|
| __getattribute__(self, name, /)
| Return getattr(self, name).
|
| __getitem__(self, key, /)
| Return self[key].
|
| __getnewargs__(...)
|
| __gt__(self, value, /)
| Return self>value.
|
| __hash__(self, /)
| Return hash(self).
|
| __iter__(self, /)
| Implement iter(self).
|
| __le__(self, value, /)
| Return self<=value.
|
| __len__(self, /)
| Return len(self).
|
| __lt__(self, value, /)
| Return self<value.
|
| __mod__(self, value, /)
| Return self%value.
|
| __mul__(self, value, /)
| Return self*value.
|
| __ne__(self, value, /)
| Return self!=value.
|
| __repr__(self, /)
| Return repr(self).
|
| __rmod__(self, value, /)
| Return value%self.
|
| __rmul__(self, value, /)
| Return value*self.
|
| __sizeof__(self, /)
| Return the size of the string in memory, in bytes.
|
| __str__(self, /)
| Return str(self).
|
| capitalize(self, /)
| Return a capitalized version of the string.
|
| More specifically, make the first character have upper case and the rest lower
| case.
|
| casefold(self, /)
| Return a version of the string suitable for caseless comparisons.
|
| center(self, width, fillchar=' ', /)
| Return a centered string of length width.
|
| Padding is done using the specified fill character (default is a space).
|
| count(...)
| S.count(sub[, start[, end]]) -> int
|
| Return the number of non-overlapping occurrences of substring sub in
| string S[start:end]. Optional arguments start and end are
| interpreted as in slice notation.
|
| encode(self, /, encoding='utf-8', errors='strict')
| Encode the string using the codec registered for encoding.
|
| encoding
| The encoding in which to encode the string.
| errors
| The error handling scheme to use for encoding errors.
| The default is 'strict' meaning that encoding errors raise a
| UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
| 'xmlcharrefreplace' as well as any other name registered with
| codecs.register_error that can handle UnicodeEncodeErrors.
|
| endswith(...)
| S.endswith(suffix[, start[, end]]) -> bool
|
| Return True if S ends with the specified suffix, False otherwise.
| With optional start, test S beginning at that position.
| With optional end, stop comparing S at that position.
| suffix can also be a tuple of strings to try.
|
| expandtabs(self, /, tabsize=8)
| Return a copy where all tab characters are expanded using spaces.
|
| If tabsize is not given, a tab size of 8 characters is assumed.
|
| find(...)
| S.find(sub[, start[, end]]) -> int
|
| Return the lowest index in S where substring sub is found,
| such that sub is contained within S[start:end]. Optional
| arguments start and end are interpreted as in slice notation.
|
| Return -1 on failure.
|
| format(...)
| S.format(*args, **kwargs) -> str
|
| Return a formatted version of S, using substitutions from args and kwargs.
| The substitutions are identified by braces ('{' and '}').
|
| format_map(...)
| S.format_map(mapping) -> str
|
| Return a formatted version of S, using substitutions from mapping.
| The substitutions are identified by braces ('{' and '}').
|
| index(...)
| S.index(sub[, start[, end]]) -> int
|
| Return the lowest index in S where substring sub is found,
| such that sub is contained within S[start:end]. Optional
| arguments start and end are interpreted as in slice notation.
|
| Raises ValueError when the substring is not found.
|
| isalnum(self, /)
| Return True if the string is an alpha-numeric string, False otherwise.
|
| A string is alpha-numeric if all characters in the string are alpha-numeric and
| there is at least one character in the string.
|
| isalpha(self, /)
| Return True if the string is an alphabetic string, False otherwise.
|
| A string is alphabetic if all characters in the string are alphabetic and there
| is at least one character in the string.
|
| isascii(self, /)
| Return True if all characters in the string are ASCII, False otherwise.
|
| ASCII characters have code points in the range U+0000-U+007F.
| Empty string is ASCII too.
|
| isdecimal(self, /)
| Return True if the string is a decimal string, False otherwise.
|
| A string is a decimal string if all characters in the string are decimal and
| there is at least one character in the string.
|
| isdigit(self, /)
| Return True if the string is a digit string, False otherwise.
|
| A string is a digit string if all characters in the string are digits and there
| is at least one character in the string.
|
| isidentifier(self, /)
| Return True if the string is a valid Python identifier, False otherwise.
|
| Call keyword.iskeyword(s) to test whether string s is a reserved identifier,
| such as "def" or "class".
|
| islower(self, /)
| Return True if the string is a lowercase string, False otherwise.
|
| A string is lowercase if all cased characters in the string are lowercase and
| there is at least one cased character in the string.
|
| isnumeric(self, /)
| Return True if the string is a numeric string, False otherwise.
|
| A string is numeric if all characters in the string are numeric and there is at
| least one character in the string.
|
| isprintable(self, /)
| Return True if the string is printable, False otherwise.
|
| A string is printable if all of its characters are considered printable in
| repr() or if it is empty.
|
| isspace(self, /)
| Return True if the string is a whitespace string, False otherwise.
|
| A string is whitespace if all characters in the string are whitespace and there
| is at least one character in the string.
|
| istitle(self, /)
| Return True if the string is a title-cased string, False otherwise.
|
| In a title-cased string, upper- and title-case characters may only
| follow uncased characters and lowercase characters only cased ones.
|
| isupper(self, /)
| Return True if the string is an uppercase string, False otherwise.
|
| A string is uppercase if all cased characters in the string are uppercase and
| there is at least one cased character in the string.
|
| join(self, iterable, /)
| Concatenate any number of strings.
|
| The string whose method is called is inserted in between each given string.
| The result is returned as a new string.
|
| Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'
|
| ljust(self, width, fillchar=' ', /)
| Return a left-justified string of length width.
|
| Padding is done using the specified fill character (default is a space).
|
| lower(self, /)
| Return a copy of the string converted to lowercase.
|
| lstrip(self, chars=None, /)
| Return a copy of the string with leading whitespace removed.
|
| If chars is given and not None, remove characters in chars instead.
|
| partition(self, sep, /)
| Partition the string into three parts using the given separator.
|
| This will search for the separator in the string. If the separator is found,
| returns a 3-tuple containing the part before the separator, the separator
| itself, and the part after it.
|
| If the separator is not found, returns a 3-tuple containing the original string
| and two empty strings.
|
| removeprefix(self, prefix, /)
| Return a str with the given prefix string removed if present.
|
| If the string starts with the prefix string, return string[len(prefix):].
| Otherwise, return a copy of the original string.
|
| removesuffix(self, suffix, /)
| Return a str with the given suffix string removed if present.
|
| If the string ends with the suffix string and that suffix is not empty,
| return string[:-len(suffix)]. Otherwise, return a copy of the original
| string.
|
| replace(self, old, new, count=-1, /)
| Return a copy with all occurrences of substring old replaced by new.
|
| count
| Maximum number of occurrences to replace.
| -1 (the default value) means replace all occurrences.
|
| If the optional argument count is given, only the first count occurrences are
| replaced.
|
| rfind(...)
| S.rfind(sub[, start[, end]]) -> int
|
| Return the highest index in S where substring sub is found,
| such that sub is contained within S[start:end]. Optional
| arguments start and end are interpreted as in slice notation.
|
| Return -1 on failure.
|
| rindex(...)
| S.rindex(sub[, start[, end]]) -> int
|
| Return the highest index in S where substring sub is found,
| such that sub is contained within S[start:end]. Optional
| arguments start and end are interpreted as in slice notation.
|
| Raises ValueError when the substring is not found.
|
| rjust(self, width, fillchar=' ', /)
| Return a right-justified string of length width.
|
| Padding is done using the specified fill character (default is a space).
|
| rpartition(self, sep, /)
| Partition the string into three parts using the given separator.
|
| This will search for the separator in the string, starting at the end. If
| the separator is found, returns a 3-tuple containing the part before the
| separator, the separator itself, and the part after it.
|
| If the separator is not found, returns a 3-tuple containing two empty strings
| and the original string.
|
| rsplit(self, /, sep=None, maxsplit=-1)
| Return a list of the substrings in the string, using sep as the separator string.
|
| sep
| The separator used to split the string.
|
| When set to None (the default value), will split on any whitespace
| character (including \\n \\r \\t \\f and spaces) and will discard
| empty strings from the result.
| maxsplit
| Maximum number of splits (starting from the left).
| -1 (the default value) means no limit.
|
| Splitting starts at the end of the string and works to the front.
|
| rstrip(self, chars=None, /)
| Return a copy of the string with trailing whitespace removed.
|
| If chars is given and not None, remove characters in chars instead.
|
| split(self, /, sep=None, maxsplit=-1)
| Return a list of the substrings in the string, using sep as the separator string.
|
| sep
| The separator used to split the string.
|
| When set to None (the default value), will split on any whitespace
| character (including \\n \\r \\t \\f and spaces) and will discard
| empty strings from the result.
| maxsplit
| Maximum number of splits (starting from the left).
| -1 (the default value) means no limit.
|
| Note, str.split() is mainly useful for data that has been intentionally
| delimited. With natural text that includes punctuation, consider using
| the regular expression module.
|
| splitlines(self, /, keepends=False)
| Return a list of the lines in the string, breaking at line boundaries.
|
| Line breaks are not included in the resulting list unless keepends is given and
| true.
|
| startswith(...)
| S.startswith(prefix[, start[, end]]) -> bool
|
| Return True if S starts with the specified prefix, False otherwise.
| With optional start, test S beginning at that position.
| With optional end, stop comparing S at that position.
| prefix can also be a tuple of strings to try.
|
| strip(self, chars=None, /)
| Return a copy of the string with leading and trailing whitespace removed.
|
| If chars is given and not None, remove characters in chars instead.
|
| swapcase(self, /)
| Convert uppercase characters to lowercase and lowercase characters to uppercase.
|
| title(self, /)
| Return a version of the string where each word is titlecased.
|
| More specifically, words start with uppercased characters and all remaining
| cased characters have lower case.
|
| translate(self, table, /)
| Replace each character in the string using the given translation table.
|
| table
| Translation table, which must be a mapping of Unicode ordinals to
| Unicode ordinals, strings, or None.
|
| The table must implement lookup/indexing via __getitem__, for instance a
| dictionary or list. If this operation raises LookupError, the character is
| left untouched. Characters mapped to None are deleted.
|
| upper(self, /)
| Return a copy of the string converted to uppercase.
|
| zfill(self, width, /)
| Pad a numeric string with zeros on the left, to fill a field of the given width.
|
| The string is never truncated.
|
| ----------------------------------------------------------------------
| Static methods defined here:
|
| __new__(*args, **kwargs) from builtins.type
| Create and return a new object. See help(type) for accurate signature.
|
| maketrans(...)
| Return a translation table usable for str.translate().
|
| If there is only one argument, it must be a dictionary mapping Unicode
| ordinals (integers) or characters to Unicode ordinals, strings or None.
| Character keys will be then converted to ordinals.
| If there are two arguments, they must be strings of equal length, and
| in the resulting dictionary, each character in x will be mapped to the
| character at the same position in y. If there is a third argument, it
| must be a string, whose characters will be mapped to None in the result.
help(object)
Help on class object in module builtins:
class object
| The base class of the class hierarchy.
|
| When called, it accepts no arguments and returns a new featureless
| instance that has no instance attributes and cannot be given any.
|
| Built-in subclasses:
| anext_awaitable
| ArgNotFound
| async_generator
| async_generator_asend
| ... and 116 other subclasses
|
| Methods defined here:
|
| __delattr__(self, name, /)
| Implement delattr(self, name).
|
| __dir__(self, /)
| Default dir() implementation.
|
| __eq__(self, value, /)
| Return self==value.
|
| __format__(self, format_spec, /)
| Default object formatter.
|
| __ge__(self, value, /)
| Return self>=value.
|
| __getattribute__(self, name, /)
| Return getattr(self, name).
|
| __getstate__(self, /)
| Helper for pickle.
|
| __gt__(self, value, /)
| Return self>value.
|
| __hash__(self, /)
| Return hash(self).
|
| __init__(self, /, *args, **kwargs)
| Initialize self. See help(type(self)) for accurate signature.
|
| __le__(self, value, /)
| Return self<=value.
|
| __lt__(self, value, /)
| Return self<value.
|
| __ne__(self, value, /)
| Return self!=value.
|
| __reduce__(self, /)
| Helper for pickle.
|
| __reduce_ex__(self, protocol, /)
| Helper for pickle.
|
| __repr__(self, /)
| Return repr(self).
|
| __setattr__(self, name, value, /)
| Implement setattr(self, name, value).
|
| __sizeof__(self, /)
| Size of object in memory, in bytes.
|
| __str__(self, /)
| Return str(self).
|
| ----------------------------------------------------------------------
| Class methods defined here:
|
| __init_subclass__(...) from builtins.type
| This method is called when a class is subclassed.
|
| The default implementation does nothing. It may be
| overridden to extend subclasses.
|
| __subclasshook__(...) from builtins.type
| Abstract classes can override this to customize issubclass().
|
| This is invoked early on by abc.ABCMeta.__subclasscheck__().
| It should return True, False or NotImplemented. If it returns
| NotImplemented, the normal algorithm is used. Otherwise, it
| overrides the normal algorithm (and the outcome is cached).
|
| ----------------------------------------------------------------------
| Static methods defined here:
|
| __new__(*args, **kwargs) from builtins.type
| Create and return a new object. See help(type) for accurate signature.
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __class__ = <class 'type'>
| type(object) -> the object's type
| type(name, bases, dict, **kwds) -> a new type
Data model identifiers are hidden by default as they are not typically used directly. Instead, a builtin function or an operator is typically used, which in turn invokes the corresponding data model method. The builtins module is automatically available in every Python script or notebook, so its identifiers can be used without an import. However, it is sometimes useful to import it explicitly:
import builtins
Once imported, the identifiers in the builtins module can be listed by inputting builtins followed by a dot .; there are a large number of them.
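The names in builtins are the same objects that are available without the import. A quick check, assuming a standard Python session, with dir used as an alternative to the IDE's list:

```python
import builtins

print(builtins.str is str)     # True
print(builtins.len is len)     # True

# The identifiers can also be listed with dir
names = dir(builtins)
print('ValueError' in names, 'print' in names)  # True True
```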
The classes named using CamelCase are the exception and warning classes, which a user encounters when an error occurs or a warning is issued:
- ArithmeticError – class
- AssertionError – class
- BaseException – class
- BlockingIOError – class
- BrokenPipeError – class
- BytesWarning – class
- ChildProcessError – class
- ConnectionAbortedError – class
- ConnectionError – class
- ConnectionRefusedError – class
- ConnectionResetError – class
- DeprecationWarning – class
- EncodingWarning – class
- EnvironmentError – class
- EOFError – class
- Exception – class
- ExceptionGroup – class
- FileExistsError – class
- FileNotFoundError – class
- FloatingPointError – class
- FutureWarning – class
- GeneratorExit – class
- ImportError – class
- ImportWarning – class
- IndentationError – class
- IndexError – class
- InterruptedError – class
- IOError – class
- IsADirectoryError – class
- KeyboardInterrupt – class
- KeyError – class
- LookupError – class
- MemoryError – class
- ModuleNotFoundError – class
- NameError – class
- NotADirectoryError – class
- NotImplementedError – class
- OSError – class
- OverflowError – class
- PendingDeprecationWarning – class
- PermissionError – class
- RecursionError – class
- ReferenceError – class
- ResourceWarning – class
- RuntimeError – class
- RuntimeWarning – class
- StopAsyncIteration – class
- StopIteration – class
- SyntaxError – class
- SystemError – class
- SystemExit – class
- TabError – class
- TimeoutError – class
- TypeError – class
- UnboundLocalError – class
- UnicodeDecodeError – class
- UnicodeEncodeError – class
- UnicodeError – class
- UnicodeTranslateError – class
- UnicodeWarning – class
- UserWarning – class
- ValueError – class
- Warning – class
- WindowsError – class
- ZeroDivisionError – class
The lower case classes are typically the classes a user instantiates on a regular basis:
- bool – class
- bytearray – class
- bytes – class
- classmethod – class
- complex – class
- dict – class
- enumerate – class
- filter – class
- float – class
- frozenset – class
- int – class
- list – class
- map – class
- object – class
- property – class
- range – class
- reversed – class
- set – class
- slice – class
- staticmethod – class
- str – class
- super – class
- tuple – class
- type – class
- zip – class
Many of these functions are set up to invoke a data model method of their input argument, which, recall, is defined in that instance's class:
- abs – function
- aiter – function
- all – function
- anext – function
- any – function
- ascii – function
- bin – function
- breakpoint – function
- callable – function
- chr – function
- compile – function
- delattr – function
- dir – function
- display – function
- divmod – function
- eval – function
- exec – function
- execfile – function
- format – function
- getattr – function
- globals – function
- hasattr – function
- hash – function
- hex – function
- id – function
- input – function
- isinstance – function
- issubclass – function
- iter – function
- len – function
- locals – function
- max – function
- min – function
- next – function
- oct – function
- open – function
- ord – function
- pow – function
- print – function
- repr – function
- round – function
- runfile – function
- setattr – function
- sorted – function
- sum – function
- vars – function
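A sketch of this dispatch for a few of the functions above, calling each data model method directly on a str instance for comparison; operators dispatch in the same way:

```python
greeting = 'hello'

# Each builtin function delegates to a data model method defined in str
pairs = [
    (len(greeting), greeting.__len__()),                  # 5
    (repr(greeting), greeting.__repr__()),                # "'hello'"
    (hash(greeting), greeting.__hash__()),
    (format(greeting, '>7'), greeting.__format__('>7')),  # '  hello'
]
print(all(builtin == dunder for builtin, dunder in pairs))  # True

# Operators dispatch the same way
print(greeting + '!' == greeting.__add__('!'))               # True
print(('ell' in greeting) == greeting.__contains__('ell'))  # True
```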
The capitalised instances are frequently used constants:
- Ellipsis – instance
- False – instance
- None – instance
- NotImplemented – instance
- True – instance
The lower case instances are typically used in an interactive Python interpreter session:
- copyright – instance
- credits – instance
- exit – instance
- help – instance
- license – instance
- quit – instance
Calling a Method
A method is a function that is defined in an object's class. The docstring of the capitalize string method can be examined from the str class using str.capitalize():
Signature: str.capitalize(self, /)
Docstring:
Return a capitalized version of the string.
More specifically, make the first character have upper case and the rest lower case.
Type: method_descriptor
Alternatively, the docstring can be examined from the instance greeting using greeting.capitalize():
Signature: greeting.capitalize() Docstring: Return a capitalized version of the string. More specifically, make the first character have upper case and the rest lower case. Type: builtin_function_or_method
Notice the subtle difference between the two signatures above. When the method is called from the class, an instance self must be supplied to provide the instance data. When it is called from an instance, the instance is already supplied: self is a placeholder for an instance in the class, and means this instance when the method is called from an instance.
Signature: str.capitalize(self, /) Signature: greeting.capitalize()
Notice also the slight difference in the type at the end of the docstring:
Type: method_descriptor Type: builtin_function_or_method
Methods which access instance data are known as instance methods, as they are bound to the instance and its instance data. Because these are the most commonly used methods, they are usually just referred to as methods. There are also class methods, which are bound to the class and are normally used as alternative constructors. There are also static methods, which are rarer and are bound to neither the class nor the instance but merely exist in the class's namespace for convenience. Finally there are data attributes, which are usually accessed from an instance. Because data attributes are not functions, they are accessed without parentheses.
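A minimal sketch of the kinds of attribute described above, assuming greeting = 'hello' as earlier in the tutorial. The choices of dict.fromkeys (a class method), str.maketrans (a static method) and the complex attribute real (a data attribute) are illustrative examples, not taken from the text:

```python
greeting = 'hello'

# Called from the class: the instance must be supplied explicitly as self
from_class = str.capitalize(greeting)
# Called from the instance: greeting is supplied implicitly as self
from_instance = greeting.capitalize()
print(from_class, from_instance)  # Hello Hello

# The class attribute is a method_descriptor; the instance attribute is bound to greeting
print(type(str.capitalize).__name__)       # method_descriptor
print(type(greeting.capitalize).__name__)  # builtin_function_or_method

# A class method used as an alternative constructor: build a dict from keys
counts = dict.fromkeys('ab', 0)
print(counts)  # {'a': 0, 'b': 0}

# A static method lives in the class namespace and needs no instance
table = str.maketrans('lo', '01')
print(greeting.translate(table))  # he001

# A data attribute is accessed without parentheses
number = 3 + 4j
print(number.real, number.imag)  # 3.0 4.0
```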
Case Identifiers
A str is immutable, which means that once a string is instantiated it cannot be modified. All the str methods therefore have a return value, returning a new str or an instance of another builtins class.
A unitary identifier, that is a method with no input arguments other than self, acts directly on the instance data. Compare the docstring of the method capitalize accessed from the str class with the docstring of the method capitalize accessed from the instance greeting:
Signature: str.capitalize(self, /) Docstring: Return a capitalized version of the string. More specifically, make the first character have upper case and the rest lower case. Type: method_descriptor
Signature: greeting.capitalize() Docstring: Return a capitalized version of the string. More specifically, make the first character have upper case and the rest lower case. Type: builtin_function_or_method
Notice the difference in the first line of each docstring. When calling the method from a class, an instance is required. When calling the method from an instance, the instance is already implied; in the second case self means this instance, which has the instance name greeting:
Signature: str.capitalize(self, /) Signature: greeting.capitalize()
Using this method returns a new string:
greeting.capitalize()
'Hello'
Once again this instance is not assigned to an instance name and will be collected by Python's garbage collection. When the function call is assigned to an instance name, the return value gets assigned to that instance name. For example:
greeting2 = greeting.capitalize()
Because the instance is assigned to an instance name, it isn't displayed in the IPython console. The assignment operator should be read from right to left: the operation on the right is carried out first and returns the str 'Hello', then this 'Hello' is assigned to the instance name greeting2.
Instead of assigning to a new instance name, the existing instance name can be reassigned.
greeting = greeting.capitalize()
Reassignment is often confused with mutation by beginners. Recall that a str is immutable, meaning that once it has been instantiated it cannot be modified. As above, the assignment operator should be read from right to left: the operation on the right is carried out first on greeting, which originally points to the str 'hello', and returns the new str instance 'Hello'. Then this new instance 'Hello' is assigned to the previous instance name greeting. greeting now points to the new str instance 'Hello' and no longer points to the old instance 'hello'. Conceptualise the instance name as a label placed on an instance: reassignment removes this label from the old instance and places it on the new instance. If the old instance 'hello' has no other instance names, nothing references it; it is orphaned and collected by Python's garbage collection.
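A minimal sketch contrasting mutation (which a str refuses) with reassignment. The second label alias is a hypothetical name added here to show an instance that still has another reference, so it is not garbage collected:

```python
greeting = 'hello'
old_id = id(greeting)

# Strings are immutable: item assignment raises TypeError
try:
    greeting[0] = 'H'
except TypeError as error:
    print(error)  # 'str' object does not support item assignment

alias = greeting                  # a second label on the same 'hello' instance
greeting = greeting.capitalize()  # the label greeting moves to a new instance

print(greeting, alias)         # Hello hello
print(id(greeting) == old_id)  # False: greeting points at a different instance
print(id(alias) == old_id)     # True: alias still points at the original instance
```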
There are a number of other identifiers such as:
- lower
- upper
- title
- casefold
- swapcase
which all operate on the instance data:
Signature: greeting.lower() Docstring: Return a copy of the string converted to lowercase. Type: builtin_function_or_method
Signature: greeting.upper() Docstring: Return a copy of the string converted to uppercase. Type: builtin_function_or_method
Signature: greeting.title() Docstring: Return a version of the string where each word is titlecased. More specifically, words start with uppercased characters and all remaining cased characters have lower case. Type: builtin_function_or_method
Signature: greeting.casefold() Docstring: Return a version of the string suitable for caseless comparisons. Type: builtin_function_or_method
Signature: greeting.swapcase() Docstring: Convert uppercase characters to lowercase and lowercase characters to uppercase. Type: builtin_function_or_method
These can all be used on the str instance greeting:
greeting
greeting.upper()
greeting.lower()
greeting.title()
greeting.casefold()
greeting.swapcase()
'Hello' # capitalized
'HELLO' # uppercase
'hello' # lowercase
'Hello' # titlecase
'hello' # lowercase (including non-English characters)
'hELLO' # swapcase
These can also be used with the str instance farewell:
farewell
farewell.capitalize()
farewell.upper()
farewell.lower()
farewell.title()
farewell.casefold()
farewell.swapcase()
'bye' # original
'Bye' # capitalized
'BYE' # uppercase
'bye' # lowercase
'Bye' # titlecase
'bye' # lowercase (including non-English characters)
'BYE' # swapcase
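For English words lower and casefold give the same result; the difference only shows up with certain non-English characters. A small sketch using the German word 'Straße' (an illustrative choice, not from the text above):

```python
street = 'Straße'
print(street.lower())     # straße: lower leaves the German sharp s unchanged
print(street.casefold())  # strasse: casefold expands ß to ss for caseless matching

# Caseless comparison against an upper case spelling of the same word
print(street.lower() == 'STRASSE'.lower())        # False
print(street.casefold() == 'STRASSE'.casefold())  # True
```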
The following methods check the properties of a str instance returning a bool:
Signature: greeting.isupper() Docstring: Return True if the string is an uppercase string, False otherwise. A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string. Type: builtin_function_or_method
Signature: greeting.islower() Docstring: Return True if the string is a lowercase string, False otherwise. A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string. Type: builtin_function_or_method
Signature: greeting.istitle() Docstring: Return True if the string is a title-cased string, False otherwise. In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones. Type: builtin_function_or_method
These can be used on the instance greeting:
greeting.isupper()
greeting.islower()
greeting.istitle()
False # is upper
False # is lower
True # is title
Valid Identifier Names
The isidentifier method can be used to check whether a str instance has the properties to be used as an identifier. For example:
id1 = 'islower'
id2 = 'is lower'
id3 = 'is_lower'
id4 = 'is_lower2'
id5 = '2is_lower'
id6 = 'is-lower'
id7 = 'IS_UPPER'
id1.isidentifier()
id2.isidentifier()
id3.isidentifier()
id4.isidentifier()
id5.isidentifier()
id6.isidentifier()
id7.isidentifier()
True # all lower case characters are valid
False # a space is an invalid character
True # an underscore is a valid character
True # a number is a valid character
False # an identifier cannot start with a number
False # the - operator is an invalid character
True # all upper case characters are valid
Recall an identifier is the name of a class, function or instance. In Python it is typical to use lower case instance names. Upper case instance names are reserved for a constant instance, that is an immutable instance that should never be reassigned. Python doesn't prevent reassignment of a constant; the upper case instance name merely indicates to another programmer that an instance is designed to be a constant.
An instance name shouldn't match any of the identifiers in builtins, otherwise it will shadow the builtin (until the instance name is deleted using del or the kernel is restarted), which will lead to confusion when the builtin is later used.
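A minimal sketch of shadowing the builtin str, assuming the code runs at the top level of a notebook cell or script (inside a function the local-scope rules differ). Deleting the shadowing name lets the lookup fall back to builtins again:

```python
import builtins

str = 'oops'  # shadows the builtin str class
try:
    str(5)
except TypeError as error:
    print(error)  # 'str' object is not callable

del str  # removes the shadowing name; lookup falls back to builtins
print(str(5))               # 5
print(str is builtins.str)  # True
```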
In addition to builtins, Python has a number of keywords. The keyword module gives details about these keywords:
import keyword
The keyword module has four identifiers:
- iskeyword – function
- issoftkeyword – function
- kwlist – instance
- softkwlist – instance
The identifiers kwlist and softkwlist are a list of strings:
keyword.kwlist
keyword.softkwlist
['False',
'None',
'True',
'and',
'as',
'assert',
'async',
'await',
'break',
'class',
'continue',
'def',
'del',
'elif',
'else',
'except',
'finally',
'for',
'from',
'global',
'if',
'import',
'in',
'is',
'lambda',
'nonlocal',
'not',
'or',
'pass',
'raise',
'return',
'try',
'while',
'with',
'yield']
['_', 'case', 'match']
None of these should be used as identifier names.
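Note that isidentifier only checks the characters, so a keyword such as 'class' still passes it. A minimal sketch combining the checks; is_safe_name is a hypothetical helper written for illustration, not part of the keyword module:

```python
import builtins
import keyword


def is_safe_name(name):
    """Hypothetical helper: True if name is a valid identifier that is neither a keyword nor a builtin."""
    return (name.isidentifier()
            and not keyword.iskeyword(name)
            and not hasattr(builtins, name))


print('class'.isidentifier())    # True: the characters alone are valid
print(is_safe_name('class'))     # False: class is a keyword
print(is_safe_name('str'))       # False: would shadow the builtin str
print(is_safe_name('is_lower'))  # True
```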
Python has a string module which contains a number of useful strings.
import string
The string module has the following identifiers:
- ascii_letters – instance
- ascii_lowercase – instance
- ascii_uppercase – instance
- capwords – function
- digits – instance
- Formatter – class
- hexdigits – instance
- octdigits – instance
- printable – instance
- punctuation – instance
- Template – class
- whitespace – instance
ASCII
The keyboard and character set of early computers originated from the typewriter.
The American Standard Code for Information Interchange (ASCII) is, as the name suggests, based on the American typewriter, which uses the English language. The typewriter has letters and numbers but also a number of whitespace commands such as the tab and the space. Physically, to create a new line on a typewriter, the carriage has to return to the start of the line (carriage return) and the paper has to be moved up by the height of one line (line feed); a form feed moves the paper on to the start of the next page. ASCII has 128 codes: the first 32 (0-31) and the last one (127, delete) are non-printable control codes, and the rest print the space and English characters:
- null
- start of heading
- start of text
- end of text
- end of transmission
- enquiry
- acknowledge
- bell
- backspace
- horizontal tab
- new line
- vertical tab
- form feed
- carriage return
- shift out
- shift in
- data link escape
- device control 1
- device control 2
- device control 3
- device control 4
- negative acknowledge
- synchronous idle
- end of transmission block
- cancel
- end of medium
- substitute
- escape
- file separator
- group separator
- record separator
- unit separator
- space
- !
- "
- #
- $
- %
- &
- '
- (
- )
- *
- +
- ,
- -
- .
- /
- 0
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- :
- ;
- <
- =
- >
- ?
- @
- A
- B
- C
- D
- E
- F
- G
- H
- I
- J
- K
- L
- M
- N
- O
- P
- Q
- R
- S
- T
- U
- V
- W
- X
- Y
- Z
- [
- \
- ]
- ^
- _
- `
- a
- b
- c
- d
- e
- f
- g
- h
- i
- j
- k
- l
- m
- n
- o
- p
- q
- r
- s
- t
- u
- v
- w
- x
- y
- z
- {
- |
- }
- ~
- delete
The string module groups the ASCII letters, digits, punctuation and whitespace characters. ascii_letters gives the letters in the English alphabet:
string.ascii_letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
These are split into ascii_lowercase and ascii_uppercase:
string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
digits gives the 10 digits used in the decimal numbering system:
string.digits
'0123456789'
punctuation gives the characters used for punctuation:
string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
Notice that the backslash is displayed as \\. The \ is a special character in Python which begins an escape sequence. If \ is followed by another \, the character inserted is the \ itself. Similarly \' inserts a single quotation without terminating the string.
whitespace gives the representation used for whitespace. As whitespace characters cannot be seen, these normally use escape sequences:
string.whitespace
' \t\n\r\x0b\x0c'
This is the space ' ', the tab '\t', the new line '\n', the carriage return '\r', the vertical tab '\x0b' and the form feed '\x0c' respectively. The most commonly used whitespace characters, '\t', '\n' and to a slightly lesser extent '\r', all have a 1 letter escape sequence. The less commonly used whitespace characters are displayed using a 2 digit hexadecimal number which corresponds to their byte value, which will be discussed in another tutorial (Python also accepts '\v' and '\f' for these two, but the representation uses the hexadecimal form).
hexdigits gives the 16 characters used for hexadecimal numbers. After the 10 decimal digits 0-9, the first 6 letters of the alphabet, a-f, are used. These can be either lower or upper case, but lower case is typically the default:
string.hexdigits
'0123456789abcdefABCDEF'
'\x0b' has the value 11 and '\x0c' has the value 12, counting from 0, which matches the vertical tab and the form feed in the list of ASCII characters above.
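A quick check of these values, a small sketch using only the builtins ord and int and the string module:

```python
import string

print(ord('\x0b'), ord('\x0c'))          # 11 12
print(int('b', 16), int('c', 16))        # 11 12: the hexadecimal digits b and c
print('\x0b' == '\v', '\x0c' == '\f')    # True True: the alternative 1 letter escapes
print(all(char in string.whitespace for char in '\x0b\x0c'))  # True
```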
There is also the printable characters which group all of the above and are the typical characters used in a string:
string.printable
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
Now that there is a basic understanding of ASCII and its subgroupings, the other identifiers that check whether every character in a string belongs to such a grouping can be examined:
Signature: greeting.isalnum() Docstring: Return True if the string is an alpha-numeric string, False otherwise. A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string. Type: builtin_function_or_method
Signature: greeting.isalpha() Docstring: Return True if the string is an alphabetic string, False otherwise. A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string. Type: builtin_function_or_method
Signature: greeting.isascii() Docstring: Return True if all characters in the string are ASCII, False otherwise. ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too. Type: builtin_function_or_method
Signature: greeting.isdecimal() Docstring: Return True if the string is a decimal string, False otherwise. A string is a decimal string if all characters in the string are decimal and there is at least one character in the string. Type: builtin_function_or_method
Signature: greeting.isdigit() Docstring: Return True if the string is a digit string, False otherwise. A string is a digit string if all characters in the string are digits and there is at least one character in the string. Type: builtin_function_or_method
Signature: greeting.isspace() Docstring: Return True if the string is a whitespace string, False otherwise. A string is whitespace if all characters in the string are whitespace and there is at least one character in the string. Type: builtin_function_or_method
greeting = 'hello world!'
greeting
greeting.isalnum()
greeting.isalpha()
greeting.isascii()
greeting.isdecimal()
greeting.isdigit()
greeting.isnumeric()
greeting.isprintable()
greeting.isspace()
'hello world!'
False # '!' and ' ' are not alphanumerical
False # '!' and ' ' are not alphabetical
True # All characters are ASCII
False # None of the characters are decimal
False # None of the characters are digits
False # None of the characters are numeric
True # All of the characters are printable
False # Not all characters are whitespace, only the ' '
A string is a sequence of Unicode characters and can include ASCII and non-ASCII characters:
greeting = 'Hello World!'
greeting.isascii()
greek_greeting = 'Γειά σου Κόσμε!'
greek_greeting.isascii()
True # All of these characters are ASCII (some English characters such as '£' are not)
False # Greek characters are not ASCII
The methods isdecimal, isdigit and isnumeric closely resemble one another when it comes to ASCII characters. They handle non-ASCII numeric characters slightly differently.
isdecimal is the most restrictive and only includes characters that represent the digits 0-9. Besides the ASCII '0123456789', these include other Unicode characters, for example '𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿', '𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵' and '𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡', which are the same digits in a different font.
isdigit and isnumeric also include different Unicode characters that represent subscript '₀₁₂₃₄₅₆₇₈₉' and superscript '⁰¹²³⁴⁵⁶⁷⁸⁹', as well as circled digits '➀➁➂➃➄➅➆➇➈'.
isnumeric additionally includes Vulgar Fractions '½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉' and numeric Unicode characters beyond the circled digits '➀➁➂➃➄➅➆➇➈', such as the circled number '➉'.
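A small sketch comparing the three methods on one sample from each group described above (the samples are picked for illustration):

```python
samples = ['0123456789', '𝟶𝟷𝟸', '²', '➀', '½', '➉']
for sample in samples:
    # !r shows the repr of the sample; :14 pads it to a fixed width for alignment
    print(f'{sample!r:14} decimal={sample.isdecimal()} '
          f'digit={sample.isdigit()} numeric={sample.isnumeric()}')
```

The expected pattern is that the first two samples pass all three checks, the superscript and circled digit pass isdigit and isnumeric only, and the fraction and circled ten pass isnumeric only.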
Escape Characters
The \ is a special symbol which begins an escape sequence. The most commonly used escape sequences are:
- \\ – insert a \
- \t – insert a tab
- \n – insert a new line
- \xff – insert a character from its 2 digit hexadecimal code (00 to ff; 00 to 7f covers ASCII)
- \uffff – insert a Unicode character from its 4 digit hexadecimal code
- \' – insert a single quotation
- \" – insert a double quotation
A file path in Windows uses the \ as a directory delimiter. For example:
file_path = 'C:\\Users\\Philip'
file_path
'C:\\Users\\Philip'
Notice that the ipython cell output displays the representation of the string, i.e. the sequence of characters needed to instantiate the string.
The print function prints the string itself, so each escape sequence is displayed as the character it represents. The docstring of the print function can be examined:
Signature: print(*args, sep=' ', end='\n', file=None, flush=False)
Docstring:
Prints the values to a stream, or to sys.stdout by default.
sep
string inserted between values, default a space.
end
string appended after the last value, default a newline.
file
a file-like object (stream); defaults to the current sys.stdout.
flush
whether to forcibly flush the stream.
Type: builtin_function_or_method
Notice that the first input argument is *args; this means a variable number of positional input arguments can be supplied to be printed. The two keyword arguments sep and end have default values of a space and a new line respectively. Leaving these at their defaults and printing the file_path gives:
print(file_path)
C:\Users\Philip
The influence of the default values of the keyword arguments sep and end can be seen with the following:
print(file_path, file_path, file_path)
print(file_path, file_path)
print(file_path)
C:\Users\Philip C:\Users\Philip C:\Users\Philip
C:\Users\Philip C:\Users\Philip
C:\Users\Philip
The behaviour of overriding these defaults can be seen with:
print(file_path, file_path, file_path)
print(file_path, file_path, file_path, sep='')
print(file_path, file_path, file_path, sep='\t')
print(file_path, file_path, end='')
print(file_path)
C:\Users\Philip C:\Users\Philip C:\Users\Philip
C:\Users\PhilipC:\Users\PhilipC:\Users\Philip
C:\Users\Philip C:\Users\Philip C:\Users\Philip
C:\Users\Philip C:\Users\PhilipC:\Users\Philip
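Tabs and missing new lines are hard to see in printed output. A minimal sketch that uses the file argument to send print output to an in-memory io.StringIO buffer (an illustrative choice of stream), so the exact characters written can be inspected:

```python
import io

file_path = 'C:\\Users\\Philip'
buffer = io.StringIO()

# file= accepts any object with a write method; here an in-memory text stream
print(file_path, file_path, sep='\t', end='|', file=buffer)
print(file_path, file=buffer)

captured = buffer.getvalue()
print(repr(captured))
# 'C:\\Users\\Philip\tC:\\Users\\Philip|C:\\Users\\Philip\n'
```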
In Python everything is an object, and every object has the data model methods __repr__ and __str__, which give the formal and informal string representation respectively:
Signature: object.__repr__(self, /) Call signature: object.__repr__(*args, **kwargs) Type: wrapper_descriptor String form: <slot wrapper '__repr__' of 'object' objects> Namespace: Python builtin Docstring: Return repr(self).
Signature: object.__str__(self, /) Call signature: object.__str__(*args, **kwargs) Type: wrapper_descriptor String form: <slot wrapper '__str__' of 'object' objects> Namespace: Python builtin Docstring: Return str(self).
These data model identifiers are not typically used directly, and instead map to the repr function and str class in builtins:
Signature: repr(obj, /) Docstring: Return the canonical string representation of the object. For many object types, including most builtins, eval(repr(obj)) == obj. Type: builtin_function_or_method
Init signature: str(self, /, *args, **kwargs) Docstring: str(object='') -> str str(bytes_or_buffer[, encoding[, errors]]) -> str Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'. Type: type Subclasses: StrEnum, DeferredConfigString, _rstr, _ScriptTarget, _ModuleTarget, LSString, include, Keys, InputMode, ColorDepth, ...
For many objects the formal (repr) and informal (str) string representations are identical; however, for the str class a subtle difference can be seen. The informal representation casts the string to a string, leaving it unchanged as expected:
file_path = 'C:\\Users\\Philip'
str(file_path)
'C:\\Users\\Philip'
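The formal representation instead escapes each \ by prepending another \ and encloses the result in single quotations. Because this new string contains single quotations, the cell output in turn displays it enclosed in double quotations (escaping the backslashes once more):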
file_path = 'C:\\Users\\Philip'
repr(file_path)
"'C:\\\\Users\\\\Philip'"
The purpose of this is to create a string that, when printed, shows the string literal needed to recreate the original string:
file_path = 'C:\\Users\\Philip'
print(repr(file_path))
'C:\\Users\\Philip'
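A small sketch confirming that the formal representation is a valid string literal. It uses ast.literal_eval from the standard library, a safer alternative to eval that only accepts literals:

```python
import ast

file_path = 'C:\\Users\\Philip'
formal = repr(file_path)

print(formal)                       # 'C:\\Users\\Philip'
print(str(file_path) == file_path)  # True: the informal form is the string itself
print(len(file_path), len(formal))  # 15 19: two quotes and two extra backslashes
print(ast.literal_eval(formal) == file_path)  # True: evaluating the repr recreates the string
```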
Single vs Double quotations
Notice the syntax highlighting below.
'The string greeting = 'hello world!''
The quotations within the string literal terminate the string. This gives the string:
'The string greeting = '
The instance name:
hello
which is not defined. The instance name:
world!
which is also not defined and is an invalid identifier name and finally an empty string:
''
To insert the quotations into the string, they can be escaped with \. Notice the syntax highlighting now shows that this is a single string:
'The string greeting = \'hello world!\''
"The string greeting = 'hello world!'"
Notice the cell output displays this string enclosed in double quotations, so the inner single quotations need no escape characters, which is easier to read. Compare this with the printed string:
print('The string greeting = \'hello world!\'')
The string greeting = 'hello world!'
In Python both single quotes and double quotes can be used to enclose a string. Python's representation uses single quotes by default but switches to double quotes when the string contains a single quote, which conveniently avoids escape characters.
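A small sketch of how repr chooses the enclosing quotation based on the string's contents:

```python
print(repr('hello'))       # 'hello'
print(repr("it's"))        # "it's": switches to double quotes
print(repr('say "hi"'))    # 'say "hi"': keeps single quotes
print(repr('it\'s "hi"'))  # 'it\'s "hi"': both present, so the single quote is escaped
print("it's" == 'it\'s')   # True: both quoting styles create the same string
```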
Each ASCII character corresponds to an ordinal value. The function ord can be used to retrieve the ordinal value (the Unicode code point) of a character:
Signature: ord(c, /) Docstring: Return the Unicode code point for a one-character string. Type: builtin_function_or_method
ord('a')
97
The function chr does the inverse, retrieving the character from the ordinal value:
Signature: chr(i, /) Docstring: Return a Unicode string of one character with ordinal i; 0 <= i <= 0x10ffff. Type: builtin_function_or_method
chr(97)
'a'
Notice that the character returned is enclosed in single quotations, the default for the Python language.
Unfortunately there is some inconsistency within the Python community in the use of single and double quotations in strings. Much of the data science community, for example, favours double quotations over single quotations because the author of the popular pandas library uses double quotations by default. The Python language itself and the numpy and matplotlib libraries prefer single quotations.
Multiline string
Three single quotes will begin a multiline string:
paragraph = '''The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog'''
paragraph
'The quick brown fox jumps over the lazy dog\nThe quick brown fox jumps over the lazy dog\nThe quick brown fox jumps over the lazy dog\nThe quick brown fox jumps over the lazy dog'
Notice the difference between the above and:
paragraph = '''
    The quick brown fox jumps over the lazy dog
    The quick brown fox jumps over the lazy dog
    The quick brown fox jumps over the lazy dog
    The quick brown fox jumps over the lazy dog
    '''
paragraph
'\n    The quick brown fox jumps over the lazy dog\n    The quick brown fox jumps over the lazy dog\n    The quick brown fox jumps over the lazy dog\n    The quick brown fox jumps over the lazy dog\n    '
The above style can be used to enclose Python collections to make them more readable. For such collections the additional spacing and new line characters are ignored; however, for a multiline string these characters are incorporated into the string. Returning to:
paragraph = '''The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog'''
Printing the above with print(paragraph) gives:
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
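When an indented multiline string is written for readability, as in the second paragraph example above, the extra spacing can be removed afterwards. A minimal sketch using textwrap.dedent from the standard library (one option among several) followed by strip:

```python
import textwrap

paragraph = '''
    The quick brown fox jumps over the lazy dog
    The quick brown fox jumps over the lazy dog
    '''

# dedent removes the whitespace prefix common to every line;
# strip then removes the leading and trailing new line characters
cleaned = textwrap.dedent(paragraph).strip()
print(repr(cleaned))
```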
The most widespread use of a multiline string is a Python docstring. For example, let's examine the print function's docstring again:
Signature: print(*args, sep=' ', end='\n', file=None, flush=False) Docstring: Prints the values to a stream, or to sys.stdout by default. sep string inserted between values, default a space. end string appended after the last value, default a newline. file a file-like object (stream); defaults to the current sys.stdout. flush whether to forcibly flush the stream. Type: builtin_function_or_method
The docstring component is:
printdocstring = '''Prints the values to a stream, or to sys.stdout by default.
sep
string inserted between values, default a space.
end
string appended after the last value, default a newline.
file
a file-like object (stream); defaults to the current sys.stdout.
flush
whether to forcibly flush the stream.'''
Notice that the sep and end default arguments are assigned to string literals which use single quotations. If more details about the possible valid string literals are to be included in the docstring, it is more convenient to use double quotations to enclose the docstring:
printdocstring = """Prints the values to a stream, or to sys.stdout by default.
sep
string inserted between values, default a ' '.
end
string appended after the last value, default a '\\n'.
file
a file-like object (stream); defaults to the current sys.stdout.
flush
whether to forcibly flush the stream."""
When functions are written, the docstring typically starts off as a single line, and sometimes this is enough, for example:
Signature: object.mro() Docstring: Return a type's method resolution order. Type: builtin_function_or_method
Triple double quotes are generally used even for a single line docstring (as recommended by PEP 257), so that the docstring is easier to expand over multiple lines later and string literals in single quotations can easily be added to it:
fillmeinlater = """This is a placeholder docstring"""
fillmeinlater = """This is a placeholder docstring
Add note about string literal 'hello' and 'bye'"""
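A minimal sketch of a docstring on a user-defined function; greet is a hypothetical function written for illustration. The docstring is stored as a str in the function's __doc__ attribute, and inspect.getdoc returns it with the indentation cleaned up:

```python
import inspect


def greet(name):
    """Return a greeting for name.

    For example greet('world') returns 'hello world'.
    """
    return f'hello {name}'


print(greet('world'))         # hello world
print(inspect.getdoc(greet))  # the docstring with its indentation removed
```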
ASCII and Unicode Escape Characters
If the ordinal values of the English letter 'a' (ASCII) and Greek letter 'α' (Unicode) are compared:
ord('a')
ord('α')
97
945
Notice that the ASCII letter lies in the range 0-128 while the Greek letter lies in the range 256-65536. Let's examine what these numbers mean in a bit more detail.
Under the hood a computer's memory consists of a series of binary switches. A single switch can be conceptualised as an LED that is either off or on, which corresponds to the values 0 and 1 respectively:


For 1 bit there are only 2 ** 1 configurations:
2 ** 1
2
This gives the range 0-2 using zero-based indexing. In zero-based indexing the lower bound 0 is inclusive and the upper bound is exclusive, so 0-2 means starting at 0 and going up in steps of 1 until 2 is reached, without including 2 itself, giving the possible values 0 and 1. To count to larger numbers, the bits are commonly arranged into groupings of 8 known as a byte:

A byte has 8 bits and each bit has 2 combinations. The total number of combinations in a byte is:
2 ** 8
256
This gives the values 0-256 (inclusive of 0 and exclusive of 256). The function bin can be used to convert an integer into its binary string representation:
Signature: bin(number, /) Docstring: Return the binary representation of an integer. >>> bin(2796202) '0b1010101010101010101010' Type: builtin_function_or_method
Recall that the character 'a' had an ordinal value of 97:
bin(97)
'0b1100001'
0b is the prefix meaning binary, and the 7 digits following the b are the binary sequence. The leading zero is not shown; for a full byte this is 01100001. Notice this is the 97th configuration of the byte (counting from 0) and is the same as the LED sequence indicated above.
Binary is machine readable but not very human readable; it is easy for a human to mis-transcribe a long binary sequence. As a consequence, four binary digits, each with 2 possible values, which have:
2 ** 4
16
configurations are instead grouped into a single hexadecimal character. There are 16 hexadecimal characters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f. The docstring of the hex function can be examined:
Signature: hex(number, /) Docstring: Return the hexadecimal representation of an integer. >>> hex(12648430) '0xc0ffee' Type: builtin_function_or_method
hex(97)
'0x61'
Notice the prefix 0x meaning hexadecimal; this prefix distinguishes the hexadecimal 0x61 from the decimal 61. The hexadecimal 61 equals the decimal 6 * 16 + 1 = 97.
Each ASCII character corresponds to 1 byte, which is eight bits or two hexadecimal digits. A Unicode character such as 'α' normally corresponds to 2 bytes, which is sixteen bits or four hexadecimal digits.
bin(945)
'0b1110110001'
There are 10 bits here. Padded with leading zeros to 2 bytes (16 bits), this would be 0000001110110001. In hex this is:
hex(945)
'0x3b1'
This gives 3 hexadecimal characters. Padded with a leading zero to 2 bytes (4 hexadecimal characters), this would be 03b1.
The \x and \u escape sequences can be used to insert an ASCII or Unicode escape character into a string. These escape sequences expect 2 hexadecimal digits or 4 hexadecimal digits respectively:
'\x61'
'a'
'\u03b1'
'α'
ASCII is a subset of Unicode; however, leading zeros need to be added to give 4 hexadecimal characters when an ASCII character is inserted using the Unicode escape sequence:
'\u0061'
'a'
ASCII and Unicode characters are normally typed directly into a string, as opposed to using escape sequences like the above.
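The padding and conversions above can be reproduced in a few lines. A small sketch using the format builtin with the specifications '016b' and '04x' for zero-padded binary and hexadecimal, and int with an explicit base to convert back:

```python
for char in ['a', 'α']:
    code_point = ord(char)
    # '016b' pads the binary digits with leading zeros to 16 places; '04x' pads the hex digits to 4
    print(char, code_point, format(code_point, '016b'), format(code_point, '04x'))
# a 97 0000000001100001 0061
# α 945 0000001110110001 03b1

print(int('01100001', 2), int('61', 16))  # 97 97: parse binary and hex text back to integers
print('\x61' == '\u0061' == 'a', '\u03b1' == 'α')  # True True
```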
Raw String
A raw string is a string in which escape sequences are not processed, so each \ is treated as an ordinary character. A raw string is prefixed by r. Its most common uses are file paths and regular expressions:
r'C:\Users\Philip'
'C:\\Users\\Philip'
print(r'C:\Users\Philip')
C:\Users\Philip
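A raw string matters because a normal string such as 'C:\Users' would fail to compile, since \U starts an 8 digit Unicode escape. A small sketch of the regular expression use case, using the standard re module with an illustrative pattern and text:

```python
import re

# In a raw string \d stays as two characters, a backslash and a d,
# which the re module interprets as "any decimal digit"
pattern = r'\d+'

print(len(r'\n'), len('\n'))                       # 2 1: raw \n is two characters, normal \n is one
print(re.findall(pattern, 'Users 42 and 7'))       # ['42', '7']
print(r'C:\Users\Philip' == 'C:\\Users\\Philip')   # True
```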
Formatted Strings
Suppose the following string is instantiated:
stringbody = 'The string to 0 is 1 2!'
The format method can be used to insert other variables into a string body using an optional format specification:
Docstring: S.format(*args, **kwargs) -> str Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces ('{' and '}'). Type: builtin_function_or_method
In a formatted string braces { } are used as placeholders for the variables, this means the stringbody should be updated to:
stringbody = 'The string to {0} is {1} {2}!'
The format method can take in a variable number of positional input arguments *args, these *args should correspond to the numeric placeholders above. Since 0, 1 and 2 are provided, three positional input arguments should be provided in the format method. Let's create three strings:
str0 = 'print'
str1 = 'hello'
str2 = 'world'
Now the format method can be used supplying these three strings as positional input arguments:
stringbody.format(str0, str1, str2)
'The string to print is hello world!'
Alternatively if the placeholders are changed to instance names:
stringbody = 'The string to {str0} is {str1} {str2}!'
Then a variable number of named keyword arguments **kwargs can be specified, these should match the instance names in the stringbody:
stringbody.format(str0='print', str1='hello', str2='world')
'The string to print is hello world!'
It is more common for the values to be assigned to previously created instances:
str0 = 'print'
str1 = 'hello'
str2 = 'world'
stringbody = 'The string to {str0} is {str1} {str2}!'
formattedstring = stringbody.format(str0=str0, str1=str1, str2=str2)
Notice in the last two lines the duplication of each instance name. These two lines are typically combined and abbreviated using an f-string, which is prefixed with f. An f-string evaluates the instance names inside the braces directly and, like the format method, formats each value using its __format__ data model method:
formattedstring = f'The string to {str0} is {str1} {str2}!'
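A minimal sketch checking that the three approaches shown above (positional placeholders, named placeholders and an f-string) produce the same string; the variable names follow the text above:

```python
str0 = 'print'
str1 = 'hello'
str2 = 'world'

# Numeric placeholders are filled from positional arguments (*args)
positional = 'The string to {0} is {1} {2}!'.format(str0, str1, str2)
# Named placeholders are filled from keyword arguments (**kwargs)
keyword = 'The string to {str0} is {str1} {str2}!'.format(str0=str0, str1=str1, str2=str2)
# An f-string looks the names up directly in the current scope
fstring = f'The string to {str0} is {str1} {str2}!'

print(positional == keyword == fstring)  # True
print(fstring)  # The string to print is hello world!
```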
The object class, and hence every other class, has the data model method __format__:
Signature: object.__format__(self, format_spec, /) Docstring: Default object formatter. Type: method_descriptor
By default, an object inserted into a placeholder using the syntax {obj} uses its informal string representation. Under the hood, the data model method __format__ specifies how the object is represented as a string when a format specifier is supplied. When a format specifier is used, the syntax is {obj:formatspec}, which loosely resembles a mapping. Since these are strings, the string format specifier s will be used:
f'The string to {str0:s} is {str1:s} {str2:s}!'
'The string to print is hello world!'
Note there is no space around the colon. If a space is inserted after the colon, which is the typical PEP8 syntax used for a mapping, an error is raised:
f'The string to {str0: s} is {str1: s} {str2: s}!'
ValueError: Space not allowed in string format specifier
If an integer width is placed before the string format specifier, any string shorter than the width is padded with trailing spaces to occupy the specified number of places. If the width is prefixed with 0, the trailing spaces are replaced by zeros (Python 3.10 and later). For example:
f'The string to {str0:8s} is {str1} {str2:08s}!'
'The string to print    is hello world000!'
Formatted strings are frequently used to insert numbers into a string:
num1 = 1
num2 = 0.0000123456789
num3 = 12.3456789
f'The numbers are {num1}, {num2} and {num3}.'
'The numbers are 1, 1.23456789e-05 and 12.3456789.'
num1 is a decimal integer, which is a whole number.
num2 is a floating point number much smaller than a unit, so it is displayed in scientific notation. By default, a float is displayed in scientific notation when its magnitude is below 1e-4 or at least 1e16, so a float much larger than a unit is also displayed in scientific notation.
num3 is a floating point number comparable to a unit and shown using standard notation.
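A minimal sketch showing where the default float representation (repr) switches between standard and scientific notation:

```python
for value in [0.0001, 0.00001, 1e15, 1e16]:
    print(repr(value))
# 0.0001
# 1e-05
# 1000000000000000.0
# 1e+16

# num2 from above falls below 1e-4, so it uses scientific notation
print(f'{0.0000123456789}')  # 1.23456789e-05
```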
There are additional format specifiers for other datatypes:
| presentation type | specifier |
|---|---|
| string | :s |
| general format | :g |
| decimal integer | :d |
| fixed point format (standard format) | :f |
| exponent format (scientific notation) | :e |
For numeric variables, the general format :g is similar to the default representation but rounds to 6 significant figures:
f'The numbers are {num1:g}, {num2:g} and {num3:g}.'
'The numbers are 1, 1.23457e-05 and 12.3457.'
The decimal integer format can be used for the whole number. The d can be preceded by a width, and a leading 0 on the width pads with zeros instead of spaces:
f'The numbers are {num1:d}, {num2:g} and {num3:g}.'
'The numbers are 1, 1.23457e-05 and 12.3457.'
f'The numbers are {num1:5d}, {num2:g} and {num3:g}.'
'The numbers are     1, 1.23457e-05 and 12.3457.'
f'The numbers are {num1:05d}, {num2:g} and {num3:g}.'
'The numbers are 00001, 1.23457e-05 and 12.3457.'
Notice the change in num2 and num3 when the fixed point format and exponent format are used:
f'The numbers are {num1:d}, {num2:g} and {num3:g}.'
'The numbers are 1, 1.23457e-05 and 12.3457.'
f'The numbers are {num1:d}, {num2:f} and {num3:f}.'
'The numbers are 1, 0.000012 and 12.345679.'
f'The numbers are {num1:d}, {num2:e} and {num3:e}.'
'The numbers are 1, 1.234568e-05 and 1.234568e+01.'
In either case, the format specification can be changed to .3f or .3e, where .3 indicates a precision of 3 digits after the decimal point:
f'The numbers are {num1:d}, {num2:.3f} and {num3:.3f}.'
'The numbers are 1, 0.000 and 12.346.'
f'The numbers are {num1:d}, {num2:.3e} and {num3:.3e}.'
'The numbers are 1, 1.235e-05 and 1.235e+01.'
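A minimal sketch combining a width with a precision, and showing that for the general format g the precision counts significant figures rather than digits after the decimal point:

```python
num1 = 1
num2 = 0.0000123456789
num3 = 12.3456789

# Width, optionally with zero padding, for the decimal integer format
print(repr(f'{num1:5d}'))   # '    1'
print(repr(f'{num1:05d}'))  # '00001'

# Width 10 and precision 3 together: numbers are right-aligned by default
print(repr(f'{num3:10.3f}'))  # '    12.346'

# For g, .3 means 3 significant figures
print(f'{num2:.3g}', f'{num3:.3g}')  # 1.23e-05 12.3
```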
If the values are stored in a mapping, the keys of the mapping can be used as the placeholder names, alongside an optional format specifier:
numbers = {'num1': 1, 'num2': 0.0000123456789, 'num3': 12.3456789}
stringbody = 'The numbers are {num1:d}, {num2:.3e} and {num3:.3e}.'
There is another string method called format_map which creates a formatted string from a mapping:
Docstring: S.format_map(mapping) -> str Return a formatted version of S, using substitutions from mapping. The substitutions are identified by braces ('{' and '}'). Type: builtin_function_or_method
stringbody.format_map(numbers)
'The numbers are 1, 1.235e-05 and 1.235e+01.'
Note in the above that there is a space after each colon in the mapping, which follows PEP8 and emphasises the separation of each key and value. This is not done in the string body, as the space would otherwise be incorporated into the key to give 'num1 ' instead of 'num1'. This can be seen with:
numbers = {'num1': 1, 'num2': 0.0000123456789, 'num3': 12.3456789}
stringbody = 'The numbers are {num1 :d}, {num2 :.3e} and {num3 :.3e}.'
stringbody.format_map(numbers)
KeyError: 'num1 '
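A minimal sketch comparing format_map with unpacking the mapping into keyword arguments using **, and capturing the key that triggers the KeyError:

```python
numbers = {'num1': 1, 'num2': 0.0000123456789, 'num3': 12.3456789}
stringbody = 'The numbers are {num1:d}, {num2:.3e} and {num3:.3e}.'

# format_map(mapping) and format(**mapping) give the same result here
from_map = stringbody.format_map(numbers)
from_kwargs = stringbody.format(**numbers)
print(from_map == from_kwargs)  # True

# A space before the colon becomes part of the key that is looked up
try:
    'The numbers are {num1 :d}.'.format_map(numbers)
except KeyError as error:
    missing_key = error.args[0]
print(repr(missing_key))  # 'num1 '
```

One design difference: format_map uses the mapping directly instead of copying it into a new dict, so a dict subclass with custom lookup behaviour keeps that behaviour.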
There is also an older style of formatted strings which uses the % as a placeholder followed by a format specifier:
stringbody = 'The numbers are %d, %.3e and %.3e.'
The behaviour of the mod operator % is defined by the __mod__ data model identifier:
Signature: stringbody.__mod__(value, /) Call signature: stringbody.__mod__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__mod__' of str object at 0x000002D0F7DE4570> Docstring: Return self%value.
For an old style formatted string, the mod operator % can be used with a tuple which has the same number of elements as the number of % placeholders in the string:
'The numbers are %d, %.3f and %.3f.' % (num1, num2, num3)
'The numbers are 1, 0.000 and 12.346.'
'The numbers are %d, %.3e and %.3e.' % (num1, num2, num3)
'The numbers are 1, 1.235e-05 and 1.235e+01.'
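A minimal sketch showing that a single value can be supplied without a tuple, and that a tuple with the wrong number of elements raises a TypeError:

```python
num1 = 1
num2 = 0.0000123456789
num3 = 12.3456789

# One placeholder, one value: no tuple is needed
print('The number is %d.' % num1)  # The number is 1.

# Too few or too many values for the three placeholders raise TypeError
errors = []
for values in [(num1, num2), (num1, num2, num3, 4)]:
    try:
        'The numbers are %d, %.3e and %.3e.' % values
    except TypeError as error:
        errors.append(str(error))
print(errors)
```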
Object Design Pattern
The data model methods __init__, __new__, __repr__, __str__ and __format__, which are defined in the parent class object, have been examined. There are a number of additional identifiers in the parent class which are also used by the str class. Let's compare the object instance instance and the string instance greeting:
instance = object()
greeting = 'hello world!'
The __doc__ attribute returns the docstring as a string:
instance.__doc__
greeting.__doc__
'The base class of the class hierarchy.\n\nWhen called, it accepts no arguments and returns a new featureless\ninstance that has no instance attributes and cannot be given any.\n'
"str(object='') -> str\nstr(bytes_or_buffer[, encoding[, errors]]) -> str\n\nCreate a new string object from the given object. If encoding or\nerrors is specified, then the object must expose a data buffer\nthat will be decoded using the given encoding and error handler.\nOtherwise, returns the result of object.__str__() (if defined)\nor repr(object).\nencoding defaults to sys.getdefaultencoding().\nerrors defaults to 'strict'."
Notice that the docstring of the string class contains single quotation marks (for example object='' and 'strict'), so its representation is enclosed in double quotation marks.
The docstring is more commonly viewed using the IPython ? operator, which prints the docstring alongside some additional information:
? instance
? greeting
Type: object
String form: <object object at 0x0000019E7710A120>
Docstring:
The base class of the class hierarchy.
When called, it accepts no arguments and returns a new featureless
instance that has no instance attributes and cannot be given any.
Type: str
String form: hello world!
Length: 12
Docstring:
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str
Create a new string object from the given object. If encoding or
errors is specified, then the object must expose a data buffer
that will be decoded using the given encoding and error handler.
Otherwise, returns the result of object.__str__() (if defined)
or repr(object).
encoding defaults to sys.getdefaultencoding().
errors defaults to 'strict'.
The types of the two instances are object and str respectively. The type can be read off using the attribute __class__:
instance.__class__
greeting.__class__
object
str
This data model identifier is typically not used directly; instead, the builtins class type is used:
Init signature: type(self, /, *args, **kwargs) Docstring: type(object) -> the object's type type(name, bases, dict, **kwds) -> a new type Type: type Subclasses: ABCMeta, EnumType, _AnyMeta, NamedTupleMeta, _TypedDictMeta, _DeprecatedType, _ABC, MetaHasDescriptors, PyCStructType, UnionType, ...
type(instance)
type(greeting)
object
str
Recall that the data model identifier __str__ defines the behaviour of the str class on an object, giving its informal string representation. For the object instance, this shows only the class name and its location in memory.
In CPython, the location in memory is the integer returned by the builtins function id:
Signature: id(obj, /) Docstring: Return the identity of an object. This is guaranteed to be unique among simultaneously existing objects. (CPython uses the object's memory address.) Type: builtin_function_or_method
id(instance)
id(greeting)
1780114039072
1780109053360
Each id is unique as expected.
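A CPython-specific sketch, assuming the default repr has the form '<object object at 0x...>': the hexadecimal address in the repr is the same integer that id returns (zero padding and letter case of the hex digits vary by platform):

```python
instance = object()

text = repr(instance)  # e.g. '<object object at 0x0000019E7710A120>'
# Take the part after ' at ' and drop the closing '>'
address = text.split(' at ')[1].rstrip('>')
print(int(address, 16) == id(instance))  # True on CPython
print(hex(id(instance)))
```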
The length is specific to Collections and is not available for an object which is not a Collection. It should not be confused with the identifier __sizeof__, which returns the size of an object in bytes and underlies sys.getsizeof:
Signature: greeting.__sizeof__() Docstring: Return the size of the string in memory, in bytes. Type: builtin_function_or_method
import sys
Docstring: getsizeof(object [, default]) -> int Return the size of object in bytes. Type: builtin_function_or_method
sys.getsizeof(instance)
sys.getsizeof(greeting)
16
61
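A CPython-specific sketch, assuming a compact ASCII string stores one byte per character, so each extra ASCII character adds one byte. sys.getsizeof reports __sizeof__ plus any garbage-collector overhead, so it is never smaller:

```python
import sys

greeting = 'hello world!'

# 11 ASCII characters versus 1: a difference of 10 bytes on CPython
print(sys.getsizeof('a' * 11) - sys.getsizeof('a'))  # 10
print(greeting.__sizeof__() <= sys.getsizeof(greeting))  # True
```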
When a directory is explored in a file explorer, it contains files and subdirectories. In Python, an object can be conceptualised as a directory, and the identifiers in it can be output as a list. The data model identifier __dir__ defines the behaviour of the dir function:
Signature: greeting.__dir__() Docstring: Default dir() implementation. Type: builtin_function_or_method
Docstring: dir([object]) -> list of strings If called without an argument, return the names in the current scope. Else, return an alphabetized list of names comprising (some of) the attributes of the given object, and of attributes reachable from it. If the object supplies a method named __dir__, it will be used; otherwise the default dir() logic is used and returns: for a module object: the module's attributes. for a class object: its attributes, and recursively the attributes of its bases. for any other object: its attributes, its class's attributes, and recursively the attributes of its class's base classes. Type: builtin_function_or_method
dir can be used on instance and greeting:
dir(instance)
dir(greeting)
['__class__',
'__delattr__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__getstate__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__lt__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__']
['__add__',
'__class__',
'__contains__',
'__delattr__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__getitem__',
'__getnewargs__',
'__getstate__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__iter__',
'__le__',
'__len__',
'__lt__',
'__mod__',
'__mul__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__rmod__',
'__rmul__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'capitalize',
'casefold',
'center',
'count',
'encode',
'endswith',
'expandtabs',
'find',
'format',
'format_map',
'index',
'isalnum',
'isalpha',
'isascii',
'isdecimal',
'isdigit',
'isidentifier',
'islower',
'isnumeric',
'isprintable',
'isspace',
'istitle',
'isupper',
'join',
'ljust',
'lower',
'lstrip',
'maketrans',
'partition',
'removeprefix',
'removesuffix',
'replace',
'rfind',
'rindex',
'rjust',
'rpartition',
'rsplit',
'rstrip',
'split',
'splitlines',
'startswith',
'strip',
'swapcase',
'title',
'translate',
'upper',
'zfill']
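The two lists above can be compared using sets. A minimal sketch confirming that every identifier of object is also available on the str instance, and extracting the identifiers the str class adds:

```python
instance = object()
greeting = 'hello world!'

object_names = set(dir(instance))
str_names = set(dir(greeting))

# Everything object provides is inherited by str
print(object_names <= str_names)  # True

# The remaining identifiers come from the str class itself
extra = sorted(str_names - object_names)
print(len(extra), extra[:3])
```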
Immutable Ordered Collection ABC Design Pattern
Let's have a look at the str instance:
greeting = 'hello world!'
Earlier it was seen that str is a subclass of object and therefore follows the design pattern of an object. The str class also follows the design pattern of a number of abstract base classes. An abstract base class is a conceptual class that isn't instantiated directly, but is used as a design pattern for numerous Python classes so that their behaviour is consistent.
The str follows the Container abstract base class, which means it has the data model method __contains__:
Signature: greeting.__contains__(key, /) Call signature: greeting.__contains__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__contains__' of str object at 0x0000019E76C48DB0> Docstring: Return key in self.
This data model method maps to the keyword in, which checks whether a substring is in a string. This can be used with the following letter and word:
letter = 'h'
word = 'hello'
letter in greeting
word in greeting
True
True
Note this is case-sensitive so if the following letter and substring are searched for the results will be False:
letter = 'H'
word = 'Hello'
letter in greeting
word in greeting
False
False
For a case-insensitive check, the casefold method can be applied to both strings:
letter.casefold() in greeting.casefold()
word.casefold() in greeting.casefold()
True
True
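casefold is used here rather than lower because it folds more characters. A minimal sketch using the German sharp s, which casefold maps to 'ss' while lower leaves it unchanged:

```python
# lower and casefold agree for plain ASCII text
print('Hello'.lower() == 'Hello'.casefold())  # True

# casefold is more aggressive than lower
print('ß'.lower(), 'ß'.casefold())  # ß ss
print('STRASSE'.casefold() == 'straße'.casefold())  # True
print('STRASSE'.lower() == 'straße'.lower())  # False
```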
The string is also Hashable which means it has the data model method __hash__ which maps to builtins hash:
Signature: greeting.__hash__() Call signature: greeting.__hash__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__hash__' of str object at 0x0000019E76C48DB0> Docstring: Return hash(self).
Signature: hash(obj, /) Docstring: Return the hash value for the given object. Two objects that compare equal must also have the same hash value, but the reverse is not necessarily true. Type: builtin_function_or_method
The hash value is an integer. For strings, hash randomisation means the value differs between interpreter sessions:
hash(greeting)
6476820586813634871
Having a hash value means the object can be used as a key in a mapping, and the key is used to index into the mapping to retrieve the associated value:
colors = {'red': '#ff0000', 'green': '#00b050', 'blue': '#0070c0'}
hash('red')
colors['red']
9014713669131661868
'#ff0000'
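A minimal sketch of why the hash matters: equal strings always have equal hashes, so an equal key built separately finds the same entry, whereas a mutable list is not Hashable and cannot be used as a key:

```python
colors = {'red': '#ff0000', 'green': '#00b050', 'blue': '#0070c0'}

# An equal string built by concatenation has the same hash and finds the entry
key = 're' + 'd'
print(hash(key) == hash('red'))  # True
print(colors[key])               # #ff0000

# A list is mutable and therefore unhashable
try:
    hash(['red'])
except TypeError as error:
    print('TypeError:', error)
```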
The str is an Iterable, which has the data model method __iter__ that controls the behaviour of the builtins iter:
Signature: greeting.__iter__() Call signature: greeting.__iter__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__iter__' of str object at 0x0000019E76C48DB0> Docstring: Implement iter(self).
Docstring: iter(iterable) -> iterator iter(callable, sentinel) -> iterator Get an iterator from an object. In the first form, the argument must supply its own iterator, or be a sequence. In the second form, the callable is called until it returns the sentinel. Type: builtin_function_or_method
This constructs an iterator from the str.
forward = iter(greeting)
forward
<str_ascii_iterator at 0x19e75d05cc0>
An iterator has no regular identifiers but has a number of data model identifiers:
- __class__ – class
- __delattr__ – instance
- __dir__ – function
- __doc__ – instance
- __eq__ – instance
- __format__ – function
- __ge__ – instance
- __getattribute__ – instance
- __getstate__ – function
- __gt__ – instance
- __hash__ – instance
- __init__ – instance
- __init_subclass__ – function
- __iter__ – instance
- __le__ – instance
- __length_hint__ – instance
- __lt__ – instance
- __ne__ – instance
- __new__ – function
- __next__ – instance
- __reduce__ – function
- __reduce_ex__ – function
- __repr__ – instance
- __setattr__ – instance
- __setstate__ – function
- __sizeof__ – function
- __str__ – instance
- __subclasshook__ – function
Most of these are inherited from the object parent class. An iterator has the data model identifier __next__ which defines how the builtins next behaves:
Signature: forward.__next__() Call signature: forward.__next__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__next__' of str_ascii_iterator object at 0x0000019E75D05CC0> Docstring: Implement next(self).
Docstring: next(iterator[, default]) Return the next item from the iterator. If default is given and the iterator is exhausted, it is returned instead of raising StopIteration. Type: builtin_function_or_method
An iterator takes on a single value at a time and advances using the builtins next. When an iterator advances, the value it previously took is consumed. For the ASCII iterator obtained from the str, the iterator takes on the next ASCII character every time next is invoked:
next(forward)
next(forward)
next(forward)
'h'
'e'
'l'
A slightly different str_iterator is obtained when the str being converted to an iterator contains non-ASCII Unicode characters:
greek_greeting = 'Γειά σου Κόσμε!'
forward = iter(greek_greeting)
forward
<str_iterator at 0x19e75d2b700>
next(forward)
next(forward)
next(forward)
'Γ'
'ε'
'ι'
When a for loop is used with a str, under the hood an iterator is created and consumed by the for loop. A simple for loop can be made which prints each letter twice:
for letter in greek_greeting:
    print(letter, end='')
    print(letter, end='')
ΓΓεειιάά  σσοουυ  ΚΚόόσσμμεε!!
for loops will be covered in a subsequent tutorial on programming constructs.
What is important to note here is that the str is an Iterable but not an Iterator. Being an Iterable means an Iterator can be created from it using the builtins iter function; the str itself does not possess the data model method __next__.
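A minimal sketch of the distinction, including a rough equivalent of what a for loop does under the hood: call next repeatedly until StopIteration is raised:

```python
greeting = 'hello world!'

print(hasattr(greeting, '__next__'))  # False: a str is not an Iterator
forward = iter(greeting)
print(iter(forward) is forward)       # True: an iterator returns itself

# Roughly what a for loop does with the str
letters = []
forward = iter(greeting)
while True:
    try:
        letters.append(next(forward))
    except StopIteration:
        break
print(''.join(letters) == greeting)  # True

# next accepts a default that is returned once the iterator is exhausted
print(next(forward, 'done'))  # done
```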
The string is also Sized which means it has the data model method __len__ which defines the behaviour of the builtins function len:
Signature: greeting.__len__() Call signature: greeting.__len__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__len__' of str object at 0x0000019E76C48DB0> Docstring: Return len(self).
Signature: len(obj, /) Docstring: Return the number of items in a container. Type: builtin_function_or_method
The len function will return the integer number of Unicode characters in a string:
len(greeting)
12
The letters can be counted using a for loop with the builtins enumerate, which pairs each letter with its index:
for index, letter in enumerate(greeting):
    print(index, letter)
0 h
1 e
2 l
3 l
4 o
5
6 w
7 o
8 r
9 l
10 d
11 !
A string is a Collection, which means it has all the properties from Sized, Iterable and Container seen above.
A string is also a Sequence. The Sequence abstract base class requires the data model identifiers __getitem__ and __len__, and provides __contains__, __iter__ and __reversed__ alongside the identifiers index and count; str defines all of these itself except __reversed__.
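The abstract base classes discussed so far live in the standard library module collections.abc and can be checked with isinstance. A minimal sketch; note that str does not inherit from them (its method resolution order is only str and object), it is registered with or recognised by them:

```python
from collections import abc

greeting = 'hello world!'

bases = (abc.Container, abc.Hashable, abc.Iterable, abc.Sized,
         abc.Collection, abc.Sequence)
checks = {base.__name__: isinstance(greeting, base) for base in bases}
print(checks)  # every value is True

# A str is neither an Iterator nor a mutable sequence
print(isinstance(greeting, abc.Iterator))         # False
print(isinstance(greeting, abc.MutableSequence))  # False
print(str.__mro__)  # (<class 'str'>, <class 'object'>)
```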
The data model identifier __getitem__ defines the behaviour when indexing into the Collection using square brackets [ ]:
Signature: greeting.__getitem__(key, /) Call signature: greeting.__getitem__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__getitem__' of str object at 0x0000019E76C48DB0> Docstring: Return self[key].
For example:
greeting[1]
'e'
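Recall that zero-based indexing is used, so the 2nd Unicode character is at an index of 1 and the 1st Unicode character is at an index of 0; compare with the output of the for loop above. Negative indexes count back from the end of the string, so subtracting the length from each positive index gives the corresponding negative index: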
for index, letter in enumerate(greeting):
    print(index - len(greeting), letter)
-12 h
-11 e
-10 l
-9 l
-8 o
-7
-6 w
-5 o
-4 r
-3 l
-2 d
-1 !
greeting[-5]
'o'
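A minimal sketch confirming the relationship shown in the loop above: a negative index -k selects the same character as the positive index len(greeting) - k:

```python
greeting = 'hello world!'
n = len(greeting)

print(all(greeting[-k] == greeting[n - k] for k in range(1, n + 1)))  # True
print(greeting[-5], greeting[n - 5])  # o o
```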
The builtins slice class creates a slice object, which can be used within square brackets to create a substring:
Init signature: slice(self, /, *args, **kwargs) Docstring: slice(stop) slice(start, stop[, step]) Create a slice object. This is used for extended slicing (e.g. a[0:10:2]). Type: type Subclasses:
The slice object also uses zero-based indexing, with an inclusive start bound and an exclusive stop bound. It has three possible forms:
slice(stop) # 1
slice(start, stop) # 2
slice(start, stop, step) # 3
When the first form is used, only one input argument is supplied; the start is assumed to be 0 and the step is assumed to be 1.
When the second form is used, two input arguments are supplied and the step is assumed to be 1.
The three forms above are normally represented using colon slicing notation which is more flexible:
:stop # 1
start:stop # 2
start:stop:step # 3
: # all
:: # all
start: # start only
::step # step only
start:: # start and step only
For a positive step:
- if the start value is not specified, it defaults to 0.
- if the stop value is not specified, it defaults to len(string).
For a negative step:
- if the start value is not specified, it defaults to the last index, -1.
- if the stop value is not specified, it defaults to just before the first character, -len(string) - 1.
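A minimal sketch relating slice objects to colon notation. The slice.indices method fills in the defaults for a given length; the stop it returns is suitable for range, where -1 means just before index 0 (the same position written as -len(string) - 1 in slice notation):

```python
greeting = 'hello world!'

# A slice object and colon notation select the same characters
second_word = slice(6, 11)
print(greeting[second_word] == greeting[6:11])  # True

# Defaults filled in for a sequence of length 12
print(slice(None, None, 1).indices(len(greeting)))   # (0, 12, 1)
print(slice(None, None, -1).indices(len(greeting)))  # (11, -1, -1)

# Walking those indices reproduces the reversed string
backwards = ''.join(greeting[i] for i in range(*slice(None, None, -1).indices(len(greeting))))
print(backwards)  # !dlrow olleh
```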
To get the first and second word of greeting using slicing, the following slices are required:
greeting = 'hello world!'
greeting[0:5]
greeting[6:11]
'hello'
'world'
A slice from the start, a slice to the end, and a copy of the whole string can be made using:
greeting[:5]
greeting[6:]
greeting[:]
'hello'
'world!'
'hello world!'
Notice the inclusion of the last character, the '!' at index 11, in 'world!'. To include this character with an explicit stop value, the length of the string, 12, must be used because the stop bound is exclusive:
greeting[6:12]
'world!'
Indexing directly at this position gives an IndexError, because zero-based indexing is used and the last index is therefore the length minus 1, which is 11:
greeting[12]
IndexError: string index out of range
A step can be included to select every 2nd Unicode character. The starting position will dictate whether characters at even or odd indexes will be selected:
greeting[:5:2]
greeting[1:5:2]
'hlo'
'el'
When the step is negative and the start and stop are at their defaults, the string will be reversed:
greeting[::-1]
'!dlrow olleh'
Care needs to be taken when using a negative step, as the start bound is still inclusive and the stop bound is still exclusive. The default start value becomes the last index, -1, and the default stop value becomes the negative length of the string minus 1, which is -13 here:
greeting[-1:-13:-1]
'!dlrow olleh'
The string class doesn't define the data model identifier __reversed__. However, because it is an ordered Sequence with __len__ and __getitem__, the builtins reversed can still be used on a string instance to create a reverse iterator:
Init signature: reversed(sequence, /) Docstring: Return a reverse iterator over the values of the given sequence. Type: type Subclasses:
backward = reversed(greeting)
backward
<reversed at 0x1f166c2bdc0>
The builtins next works in a similar manner as seen before. Since the iterator is reversed, the last Unicode character in the string is displayed first and Unicode characters in the string are consumed backwards:
next(backward)
next(backward)
next(backward)
'!'
'd'
'l'
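When __reversed__ is absent, reversed falls back on __len__ and __getitem__. A minimal sketch, using a hypothetical class Letters (not part of the standard library) that defines only those two methods:

```python
greeting = 'hello world!'

# reversed(str) and a -1 step slice produce the same characters
print(''.join(reversed(greeting)) == greeting[::-1])  # True
print(hasattr(greeting, '__reversed__'))  # False

class Letters:
    """Hypothetical minimal sequence with only __len__ and __getitem__."""

    def __init__(self, text):
        self.text = text

    def __len__(self):
        return len(self.text)

    def __getitem__(self, index):
        return self.text[index]

print(list(reversed(Letters('abc'))))  # ['c', 'b', 'a']
```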
The numeric index has previously been used to retrieve the Unicode character at that index:
for num, letter in enumerate(greeting):
    print(num, letter)
0 h
1 e
2 l
3 l
4 o
5
6 w
7 o
8 r
9 l
10 d
11 !
The index identifier instead retrieves the index corresponding to the first occurrence of a character or substring:
Docstring: S.index(sub[, start[, end]]) -> int Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation. Raises ValueError when the substring is not found. Type: builtin_function_or_method
The letter 'l' occurs 3 times at the indexes 2, 3 and 9. To get the first value:
greeting.index('l')
2
To get the other index values, a start index from which to begin the search needs to be supplied:
greeting.index('l', 2+1)
3
greeting.index('l', 3+1)
9
If a value is not found, a ValueError displays:
greeting.index('l', 9+1)
ValueError: substring not found
A Unicode substring can also be searched for, as opposed to a single Unicode character:
greeting.index('world')
6
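The count identifier, the other method of the Sequence abstract base class, returns the number of non-overlapping occurrences of a substring and accepts the same optional start and end arguments. A minimal sketch:

```python
greeting = 'hello world!'

print(greeting.count('l'))     # 3
print(greeting.count('l', 4))  # 1, only the 'l' at index 9 is after index 4
print(greeting.count('o'))     # 2
# Occurrences are non-overlapping
print('aaaa'.count('aa'))      # 2
```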
The string also has a number of associated identifiers that are not part of the Sequence abstract base class. For example, the closely associated method find has analogous input arguments but returns the integer -1 when a value is not found instead of raising a ValueError:
Docstring: S.find(sub[, start[, end]]) -> int Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 on failure. Type: builtin_function_or_method
greeting.find('l')
2
greeting.find('l', 2+1)
3
greeting.find('l', 3+1)
9
greeting.find('l', 9+1)
-1
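Because find returns -1 instead of raising an error, the repeated searches above can be automated with a loop. A minimal sketch using a hypothetical helper find_all (not a str method) that restarts each search one place after the previous match, so overlapping matches are included:

```python
def find_all(text, sub):
    """Hypothetical helper: return every index where sub starts in text."""
    indexes = []
    position = text.find(sub)
    while position != -1:
        indexes.append(position)
        # Resume the search one place after the previous match
        position = text.find(sub, position + 1)
    return indexes

greeting = 'hello world!'
print(find_all(greeting, 'l'))      # [2, 3, 9]
print(find_all(greeting, 'world'))  # [6]
print(find_all(greeting, 'z'))      # []
```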
The associated method rfind (right find) begins its search from the right of the string instead of the left:
Docstring: S.rfind(sub[, start[, end]]) -> int Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 on failure. Type: builtin_function_or_method
greeting.rfind('l')
9
The string identifier startswith searches for a string prefix and returns a boolean value of True if present, otherwise returns a boolean value of False:
Docstring: S.startswith(prefix[, start[, end]]) -> bool Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try. Type: builtin_function_or_method
greeting.startswith('hello')
True
The associated identifier endswith looks for a suffix:
Docstring: S.endswith(suffix[, start[, end]]) -> bool Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try. Type: builtin_function_or_method
greeting.endswith('!')
True
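As the docstrings note, startswith and endswith also accept a tuple of strings, returning True if any of them matches, plus an optional start position. A minimal sketch:

```python
greeting = 'hello world!'

print(greeting.startswith(('hi', 'hello')))  # True: 'hello' matches
print(greeting.endswith(('?', '.')))         # False: neither suffix matches
# With start=6 the comparison begins at index 6
print(greeting.startswith('world', 6))       # True
```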
The replace method returns a copy of the string with occurrences of the substring old replaced by the substring new. It has an optional argument count with a default value of -1, which means all occurrences are replaced. The / trailing the input arguments once again indicates that they are to be supplied positionally:
Signature: greeting.replace(old, new, count=-1, /) Docstring: Return a copy with all occurrences of substring old replaced by new. count Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences. If the optional argument count is given, only the first count occurrences are replaced. Type: builtin_function_or_method
greeting.replace('hello', 'bye')
'bye world!'
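A minimal sketch of the optional count argument, which limits the number of replacements counted from the left; the original string is unchanged because str is immutable:

```python
greeting = 'hello world!'

print(greeting.replace('l', 'L'))     # heLLo worLd!
print(greeting.replace('l', 'L', 2))  # heLLo world!
print(greeting)                       # hello world!
```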
Fill, Center and Justify
Supposing the following string is instantiated and its len is examined:
greeting = 'hello world!'
len(greeting)
12
The center method centers the string over a specified width using whitespace (the default) or a specified fill character. Alternatively, the ljust and rjust methods left and right justify the string and take the same input arguments. Note the / in the input arguments, meaning they have to be specified positionally:
Signature: greeting.center(width, fillchar=' ', /) Docstring: Return a centered string of length width. Padding is done using the specified fill character (default is a space). Type: builtin_function_or_method
Signature: greeting.ljust(width, fillchar=' ', /) Docstring: Return a left-justified string of length width. Padding is done using the specified fill character (default is a space). Type: builtin_function_or_method
Signature: greeting.rjust(width, fillchar=' ', /) Docstring: Return a right-justified string of length width. Padding is done using the specified fill character (default is a space). Type: builtin_function_or_method
greeting.center(20)
greeting.center(20, '◯')
greeting.ljust(20, '◯')
greeting.rjust(20, '◯')
'    hello world!    '
'◯◯◯◯hello world!◯◯◯◯'
'hello world!◯◯◯◯◯◯◯◯'
'◯◯◯◯◯◯◯◯hello world!'
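The same padding can be written as a format specifier: a fill character, then an alignment of ^ (center), < (left) or > (right), then the width. A minimal sketch; str.center and the ^ alignment may place an odd leftover fill character on different sides, so this example uses an even amount of padding (8):

```python
greeting = 'hello world!'

print(f'{greeting:◯^20}' == greeting.center(20, '◯'))  # True
print(f'{greeting:◯<20}' == greeting.ljust(20, '◯'))   # True
print(f'{greeting:◯>20}' == greeting.rjust(20, '◯'))   # True

# A width smaller than the string leaves it unchanged
print(greeting.center(5))  # hello world!
```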
If the following string is created:
greeting2 = greeting.center(20, '◯')
greeting2
'◯◯◯◯hello world!◯◯◯◯'
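The opposite operation can be carried out using the identifiers lstrip and rstrip, which strip whitespace from the left and right by default. If a string of characters is supplied, any leading or trailing characters found in that string are removed; the argument is treated as a set of characters, not as a prefix or suffix: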
Signature: greeting2.lstrip(chars=None, /) Docstring: Return a copy of the string with leading whitespace removed. If chars is given and not None, remove characters in chars instead. Type: builtin_function_or_method
Signature: greeting2.rstrip(chars=None, /) Docstring: Return a copy of the string with trailing whitespace removed. If chars is given and not None, remove characters in chars instead. Type: builtin_function_or_method
greeting2.lstrip('◯')
greeting2.lstrip('◯').rstrip('◯')
'hello world!◯◯◯◯'
'hello world!'
There are the associated identifiers removeprefix and removesuffix, which are more precise and remove a specified prefix or suffix exactly once, if present:
Signature: greeting2.removeprefix(prefix, /) Docstring: Return a str with the given prefix string removed if present. If the string starts with the prefix string, return string[len(prefix):]. Otherwise, return a copy of the original string. Type: builtin_function_or_method
Signature: greeting2.removesuffix(suffix, /) Docstring: Return a str with the given suffix string removed if present. If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string. Type: builtin_function_or_method
greeting2.removeprefix('◯◯◯')
greeting2.removesuffix('◯')
'◯hello world!◯◯◯◯'
'◯◯◯◯hello world!◯◯◯'
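A minimal sketch of the difference between stripping a set of characters and removing an exact prefix, using a binary string with leading zeros; strip combines lstrip and rstrip:

```python
binary = '0b0011'

# lstrip treats '0b' as the set {'0', 'b'} and removes any leading run of them
print(binary.lstrip('0b'))        # 11
# removeprefix removes the exact prefix '0b' once
print(binary.removeprefix('0b'))  # 0011

greeting2 = '◯◯◯◯hello world!◯◯◯◯'
print(greeting2.strip('◯'))       # hello world!
```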
Earlier the binary number corresponding to the character 'a' was computed:
bin(ord('a'))
'0b1100001'
The string identifier removeprefix can be used to remove the '0b':
bin(ord('a')).removeprefix('0b')
'1100001'
There is also the string method zfill which can be used to zero fill a string and is mainly intended for strings of numeric values:
lettera = bin(ord('a')).removeprefix('0b')
Signature: lettera.zfill(width, /) Docstring: Pad a numeric string with zeros on the left, to fill a field of the given width. The string is never truncated. Type: builtin_function_or_method
Since a byte has 8 bits, the width can be specified as 8. Once again this is specified positionally, as the input argument is followed by a /:
lettera = lettera.zfill(8)
lettera
'01100001'
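A minimal sketch comparing zfill with a format specifier that produces the same 8-bit string directly (0 padding, width 8, binary presentation type b), plus two zfill details: a leading sign stays in front of the zeros and the string is never truncated:

```python
lettera = bin(ord('a')).removeprefix('0b').zfill(8)
print(lettera)                  # 01100001
print(format(ord('a'), '08b'))  # 01100001

print('-42'.zfill(5))     # -0042
print('123456'.zfill(3))  # 123456
```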
In the above, reassignment is used. Recall that the assignment operator should be approached from right to left, so the operation on the right is evaluated first using the original string. The zfill of the original string '1100001' creates a new string '01100001'. The instance name should be conceptualised as a label: before the reassignment it points to the original string '1100001', and after the reassignment it points to the new string '01100001'.
Binary Operators
The string is an ordered immutable Sequence, as previously discussed. An immutable Sequence often has the data model identifiers __add__ (addition) and __mul__ (multiplication), which map to the operators + and * respectively. For a sequence, these perform concatenation and replication with an integer respectively. The reverse multiplication __rmul__ is also typically defined; it operates when the integer is multiplied by the string instead of the string by the integer, giving the same result. Note that these data model identifiers, alongside the previously examined __mod__ which maps to the % operator, behave differently in numeric data types, where they perform numeric operations.
The __add__ data model identifier implements a binary operator, as it requires two operands: the string instance self and the string instance value:
Signature: lettera.__add__(value, /) Call signature: lettera.__add__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__add__' of str object at 0x000001F166FEB6B0> Docstring: Return self+value.
The binary prefix '0b' can be added to the string lettera:
'0b' + lettera
'0b01100001'
Note in the above that '0b' is the instance self and lettera is the instance value because '0b' is on the left hand side of the + operator.
The return value can be reassigned to lettera. Recall that the operation on the right hand side of the assignment operator is carried out first and creates a new instance; then this new object is assigned to the instance name. The instance name can be conceptualised as a label which initially pointed at the string '01100001' and now points to the new string '0b01100001':
lettera = '0b' + lettera
lettera
'0b01100001'
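As a small sketch, the + operator can be compared with an explicit call to __add__ on the left-hand operand. The example also shows that the string's + only concatenates another str: an int operand raises a TypeError (the exact message may vary between Python versions), so the integer has to be converted with str first.

```python
lettera = '01100001'

# The + operator calls __add__ on the left-hand operand, '0b'
explicit = '0b'.__add__(lettera)
implicit = '0b' + lettera
print(explicit == implicit)  # True

# Concatenation only works with another str, so an int operand raises TypeError;
# converting with str() first gives the intended concatenation
try:
    '0b' + 1100001
except TypeError as error:
    print('TypeError:', error)

converted = '0b' + str(1100001)
print(converted)  # 0b1100001
```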
The __mul__ data model identifier defines how the * operator behaves:
Signature: greeting.__mul__(value, /) Call signature: greeting.__mul__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__mul__' of str object at 0x00000201D22435B0> Docstring: Return self*value.
For a string, __mul__ works with the string instance self and an integer value; self must be on the left-hand side of the multiplication operator:
greeting * 3
'hello world!hello world!hello world!'
The data model identifier __rmul__ is also defined:
Signature: greeting.__rmul__(value, /) Call signature: greeting.__rmul__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__rmul__' of str object at 0x00000201D22435B0> Docstring: Return value*self.
This defines the behaviour of the multiplication operator when the string is on the right-hand side of the multiplication operator and the integer instance is on the left-hand side:
5 * greeting
'hello world!hello world!hello world!hello world!hello world!'
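The following sketch checks that __mul__ (string on the left) and __rmul__ (string on the right) give the same replication. It also shows an edge case: replicating by zero or by a negative integer gives an empty string.

```python
greeting = 'hello world!'

# __mul__ handles str * int, __rmul__ handles int * str; both replicate
left = greeting * 2
right = 2 * greeting
print(left == right)  # True

# Replicating by zero or a negative integer gives an empty string
print(repr(greeting * 0))   # ''
print(repr(-3 * greeting))  # ''
```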
The use of __mod__ was seen before and required a string instance with % placeholders:
stringbody = 'The numbers are %d, %.3e and %.3e.'
The __mod__ data model identifier controls the behaviour of the % operator and is a binary operation between a string and a tuple of values corresponding to its placeholders:
Signature: stringbody.__mod__(value, /) Call signature: stringbody.__mod__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__mod__' of str object at 0x00000201D22FE430> Docstring: Return self%value.
num1 = 1
num2 = 0.0000123456789
num3 = 12.3456789
stringbody = 'The numbers are %d, %.3e and %.3e.'
stringbody % (num1, num2, num3)
'The numbers are 1, 1.235e-05 and 1.235e+01.'
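The %.3e placeholder uses scientific notation, whereas the fixed-point placeholder %.3f would give 0.000 and 12.346 for the same values. The sketch below contrasts the two and also shows that a single value can be supplied to % without wrapping it in a tuple.

```python
num2 = 0.0000123456789
num3 = 12.3456789

# %e uses scientific notation; %f uses fixed-point notation
scientific = '%.3e and %.3e' % (num2, num3)
fixed = '%.3f and %.3f' % (num2, num3)

# A single value can be supplied without wrapping it in a tuple
single = 'The number is %d.' % 1

print(scientific)  # 1.235e-05 and 1.235e+01
print(fixed)       # 0.000 and 12.346
print(single)      # The number is 1.
```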
It is common to use a binary operator and reassign the output to the instance name. Recall that the operation on the right-hand side is carried out using the original string, and then the instance name, which can be conceptualised as a label, points to the new object:
greeting = 'hello world!'
greeting = greeting * 5
This second line can be written using the equivalent in-place (augmented assignment) operator *=:
greeting = 'hello world!'
greeting *= 5
The second form is more concise. Because a string instance is immutable and cannot be modified once instantiated, *= still carries out the multiplication first to create a new string instance and then reassigns the instance name to that new instance.
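A quick check shows why *= cannot modify a string in place. The str class defines no in-place __imul__, so Python falls back to __mul__ and rebinds the name. The label original below is hypothetical and only shows that the first string survives unchanged.

```python
greeting = 'hello world!'
original = greeting  # hypothetical second label for the same string

# str defines no in-place __imul__, so *= falls back to __mul__:
# a new string is created and greeting is rebound to it
print(hasattr(str, '__imul__'))  # False
greeting *= 2

print(original)              # hello world!
print(greeting)              # hello world!hello world!
print(greeting is original)  # False
```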
Binary Comparison Operators
When the object class is used to create instances:
instance1 = object()
instance2 = object()
Each instance is a unique object stored in a different memory location:
instance1
instance2
<object object at 0x00000201D20293D0>
<object object at 0x00000201D2029410>
The is operator checks whether two names refer to the same object, i.e. an object stored in the same memory location:
instance1 is instance2
instance1 is instance1
False
True
There are also two data model identifiers, equals __eq__ and not equals __ne__, which check whether two objects are equal or not equal to one another. These map to the equals operator == and the not equals operator != respectively. The equals operator == should not be confused with the assignment operator = seen previously:
Signature: instance1.__eq__(value, /) Call signature: instance1.__eq__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__eq__' of object object at 0x00000201D20293D0> Docstring: Return self==value.
Signature: instance1.__ne__(value, /) Call signature: instance1.__ne__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__ne__' of object object at 0x00000201D20293D0> Docstring: Return self!=value.
instance1 == instance2
instance1 != instance2
False
True
instance1 == instance1
instance1 != instance1
True
False
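There is a subtle difference between is and ==: is checks identity, whereas == checks equality as defined by the class's __eq__. The difference is not observed with the basic object class, whose __eq__ falls back to identity, but it matters for classes such as str, where two distinct instances can be equal.
Recall that each character in a string has an ordinal, its Unicode code point, which is returned by the ord function: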
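The difference between is and == can be sketched with two strings holding the same characters. Building one at runtime with join usually produces a separate object in CPython, but whether two equal strings share memory is an implementation detail, so only the == result should be relied upon.

```python
# For plain object instances, == falls back to identity, so it agrees with is
instance1 = object()
instance2 = object()
object_results = (instance1 == instance2, instance1 is instance2)
print(object_results)  # (False, False)

# Two str instances can hold the same characters without being the same object
word1 = 'hello'
word2 = ''.join(['hel', 'lo'])
print(word1 == word2)  # True: same characters in the same order
print(word1 is word2)  # identity check; do not rely on its result for strings
```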
import string
for character in string.printable:
    print(ord(character), repr(character), character)
48 '0' 0
49 '1' 1
50 '2' 2
51 '3' 3
52 '4' 4
53 '5' 5
54 '6' 6
55 '7' 7
56 '8' 8
57 '9' 9
97 'a' a
98 'b' b
99 'c' c
100 'd' d
101 'e' e
102 'f' f
103 'g' g
104 'h' h
105 'i' i
106 'j' j
107 'k' k
108 'l' l
109 'm' m
110 'n' n
111 'o' o
112 'p' p
113 'q' q
114 'r' r
115 's' s
116 't' t
117 'u' u
118 'v' v
119 'w' w
120 'x' x
121 'y' y
122 'z' z
65 'A' A
66 'B' B
67 'C' C
68 'D' D
69 'E' E
70 'F' F
71 'G' G
72 'H' H
73 'I' I
74 'J' J
75 'K' K
76 'L' L
77 'M' M
78 'N' N
79 'O' O
80 'P' P
81 'Q' Q
82 'R' R
83 'S' S
84 'T' T
85 'U' U
86 'V' V
87 'W' W
88 'X' X
89 'Y' Y
90 'Z' Z
33 '!' !
34 '"' "
35 '#' #
36 '$' $
37 '%' %
38 '&' &
39 "'" '
40 '(' (
41 ')' )
42 '*' *
43 '+' +
44 ',' ,
45 '-' -
46 '.' .
47 '/' /
58 ':' :
59 ';' ;
60 '<' <
61 '=' =
62 '>' >
63 '?' ?
64 '@' @
91 '[' [
92 '\\' \
93 ']' ]
94 '^' ^
95 '_' _
96 '`' `
123 '{' {
124 '|' |
125 '}' }
126 '~' ~
32 ' '
9 '\t'
10 '\n'
13 '\r'
11 '\x0b'
12 '\x0c'
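The built-in chr is the inverse of ord, mapping an ordinal back to its character. The sketch below checks this round trip for every character in string.printable and picks out the ordinals of 'H' and 'h', which matter for the comparisons that follow.

```python
import string

# ord maps a character to its Unicode code point; chr is the inverse
round_trip_ok = all(chr(ord(character)) == character
                    for character in string.printable)
print(round_trip_ok)  # True

# Uppercase letters have smaller ordinals than lowercase letters
print(ord('H'), ord('h'))  # 72 104
print(chr(72), chr(104))   # H h
```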
Because each character has an ordinal, the string class redefines the data model identifiers __eq__ and __ne__, which recall map to == and !=; two strings are equal when they contain the same characters in the same order:
Signature: greeting.__eq__(value, /) Call signature: greeting.__eq__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__eq__' of str object at 0x00000201D22435B0> Docstring: Return self==value.
Signature: greeting.__ne__(value, /) Call signature: greeting.__ne__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__ne__' of str object at 0x00000201D22435B0> Docstring: Return self!=value.
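The string class also has four other ordinal-based comparison data model identifiers, less than __lt__, greater than __gt__, less than or equal to __le__ and greater than or equal to __ge__, which map to <, >, <= and >= respectively. Strings are compared lexicographically: characters are compared one at a time by ordinal, and the first differing character decides the result: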
Signature: greeting.__lt__(value, /) Call signature: greeting.__lt__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__lt__' of str object at 0x00000201D22435B0> Docstring: Return self<value.
Signature: greeting.__gt__(value, /) Call signature: greeting.__gt__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__gt__' of str object at 0x00000201D22435B0> Docstring: Return self>value.
Signature: greeting.__le__(value, /) Call signature: greeting.__le__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__le__' of str object at 0x00000201D22435B0> Docstring: Return self<=value.
Signature: greeting.__ge__(value, /) Call signature: greeting.__ge__(*args, **kwargs) Type: method-wrapper String form: <method-wrapper '__ge__' of str object at 0x00000201D22435B0> Docstring: Return self>=value.
'hello' == 'Hello'
'hello' != 'Hello'
'hello' > 'Hello'
False
True
True
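The lexicographic rule can be sketched with a hypothetical helper, ordinal_less_than, that mimics left < right for strings. It compares characters by ordinal until the first difference, and treats a string that is a prefix of the other as the smaller one. This is an illustration of the rule, not the actual CPython implementation. Since 'H' (72) is smaller than 'h' (104), 'hello' > 'Hello' is True, and sorting places capitalised words before lowercase ones.

```python
def ordinal_less_than(left: str, right: str) -> bool:
    """Hypothetical helper mimicking left < right for str, comparing
    characters one at a time by their ordinals (code points)."""
    for a, b in zip(left, right):
        if ord(a) != ord(b):
            # The first differing character decides the result
            return ord(a) < ord(b)
    # All compared characters match: the shorter string is the smaller one
    return len(left) < len(right)


pairs = [('Hello', 'hello'), ('hello', 'help'), ('hell', 'hello'), ('abc', 'abc')]
for left, right in pairs:
    print(left, right, ordinal_less_than(left, right), left < right)

# Because 'C' (67) < 'a' (97), the capitalised word sorts first
ordered = sorted(['banana', 'Cherry', 'apple'])
print(ordered)  # ['Cherry', 'apple', 'banana']
```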
Splitting and Joining Strings
The string has a number of identifiers which are used for splitting and joining a string. The splitting identifiers generally return a Python collection, such as a tuple of strings or a list of strings, and the joining identifier combines such a collection back into a single string.
For example, the identifiers partition and right partition rpartition split a string into a three-element tuple containing the substring before the separator, the separator itself and the substring after the separator. partition uses the first occurrence of the separator and rpartition uses the last. To make this more obvious, the following string will be instantiated:
greeting = 'hello|world|!'
Signature: greeting.partition(sep, /) Docstring: Partition the string into three parts using the given separator. This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it. If the separator is not found, returns a 3-tuple containing the original string and two empty strings. Type: builtin_function_or_method
Signature: greeting.rpartition(sep, /) Docstring: Partition the string into three parts using the given separator. This will search for the separator in the string, starting at the end. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it. If the separator is not found, returns a 3-tuple containing two empty strings and the original string. Type: builtin_function_or_method
greeting.partition('|')
greeting.rpartition('|')
('hello', '|', 'world|!')
('hello|world', '|', '!')
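As the docstrings above describe, the two methods differ when the separator is absent: partition places the original string first and rpartition places it last. The sketch below checks both cases using '#', which does not occur in the string.

```python
greeting = 'hello|world|!'

# partition splits at the first separator, rpartition at the last
first = greeting.partition('|')
last = greeting.rpartition('|')

# When the separator is absent, the original string lands at opposite ends
missing_left = greeting.partition('#')
missing_right = greeting.rpartition('#')

print(first)          # ('hello', '|', 'world|!')
print(last)           # ('hello|world', '|', '!')
print(missing_left)   # ('hello|world|!', '', '')
print(missing_right)  # ('', '', 'hello|world|!')
```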
More generally the split and join identifiers can be used to split a string into a list of strings or join a list of strings up into a single string. For example if the following sentence is created:
sentence = 'the fat black cat sat on the mat!'
The identifier split can be examined:
Signature: sentence.split(sep=None, maxsplit=-1) Docstring: Return a list of the substrings in the string, using sep as the separator string. sep The separator used to split the string. When set to None (the default value), will split on any whitespace character (including \\n \\r \\t \\f and spaces) and will discard empty strings from the result. maxsplit Maximum number of splits (starting from the left). -1 (the default value) means no limit. Note, str.split() is mainly useful for data that has been intentionally delimited. With natural text that includes punctuation, consider using the regular expression module. Type: builtin_function_or_method
Since the words are separated by whitespace, the input arguments can be left unspecified and take their default values. This gives a list of strings:
words = sentence.split()
words
['the', 'fat', 'black', 'cat', 'sat', 'on', 'the', 'mat!']
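The sep and maxsplit parameters from the docstring change this behaviour. The sketch below shows that the default splits on runs of whitespace and drops empty strings, that an explicit separator keeps empty strings, and that maxsplit limits the number of splits counted from the left.

```python
# sep=None (default) splits on runs of whitespace and drops empty strings
default_split = '  the   fat cat '.split()

# An explicit sep splits on every occurrence, keeping empty strings
comma_split = 'a,b,,c'.split(',')

# maxsplit limits the number of splits, counted from the left
limited_split = 'the fat black cat'.split(maxsplit=1)

print(default_split)  # ['the', 'fat', 'cat']
print(comma_split)    # ['a', 'b', '', 'c']
print(limited_split)  # ['the', 'fat black cat']
```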
To join the words, a delimiter string needs to be created, for example a space:
delimiterstring = ' '
Signature: delimiterstring.join(iterable, /) Docstring: Concatenate any number of strings. The string whose method is called is inserted in between each given string. The result is returned as a new string. Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs' Type: builtin_function_or_method
The join method can be used on this delimiter string and the list of strings can be supplied as the iterable:
delimiterstring.join(words)
'the fat black cat sat on the mat!'
More commonly, join is called directly on a space string literal:
' '.join(words)
'the fat black cat sat on the mat!'
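The iterable passed to join must contain only strings, so other items such as integers need converting first. The sketch below also checks that split followed by join with a single space rebuilds the sentence. This holds here only because the words in the sentence are separated by single spaces.

```python
numbers = [1, 2, 3]

# join accepts only strings; non-str items raise a TypeError
try:
    ', '.join(numbers)
except TypeError as error:
    print('TypeError:', error)

# Converting each item with str first gives the intended result
joined = ', '.join(map(str, numbers))
print(joined)  # 1, 2, 3

# split followed by join with a single space rebuilds a sentence
# whose words are separated by single spaces
sentence = 'the fat black cat sat on the mat!'
rebuilt = ' '.join(sentence.split())
print(rebuilt == sentence)  # True
```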
If a multiline string is created:
paragraph = '''The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
'''
There is an associated identifier splitlines, which splits the string into a list at line boundaries. It has an input argument keepends, which defaults to False and therefore excludes the newline characters:
Signature: paragraph.splitlines(keepends=False) Docstring: Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true. Type: builtin_function_or_method
This can be used to get a list of lines:
paragraph.splitlines()
['The quick brown fox jumps over the lazy dog',
'The quick brown fox jumps over the lazy dog',
'The quick brown fox jumps over the lazy dog',
'The quick brown fox jumps over the lazy dog']
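Using splitlines is not quite the same as split('\n'). With a trailing newline, split('\n') leaves an empty string at the end while splitlines does not. Passing keepends=True keeps the line breaks, as the docstring describes.

```python
paragraph = 'first line\nsecond line\n'

# split('\n') leaves an empty string after the trailing newline
by_split = paragraph.split('\n')

# splitlines does not, and keepends=True retains the line breaks
by_lines = paragraph.splitlines()
with_ends = paragraph.splitlines(keepends=True)

print(by_split)   # ['first line', 'second line', '']
print(by_lines)   # ['first line', 'second line']
print(with_ends)  # ['first line\n', 'second line\n']
```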
A multiline string can be created with tabs:
paragraph = '''\tThe quick brown fox jumps over the lazy dog
\tThe quick brown fox jumps over the lazy dog
\tThe quick brown fox jumps over the lazy dog
\tThe quick brown fox jumps over the lazy dog
'''
The tabs can be replaced by a specified number of spaces using the expandtabs identifier:
Signature: paragraph.expandtabs(tabsize=8) Docstring: Return a copy where all tab characters are expanded using spaces. If tabsize is not given, a tab size of 8 characters is assumed. Type: builtin_function_or_method
The tabs can be expanded using a tab size of 4. Since every tab here is at the start of a line, each one becomes 4 spaces:
paragraph.expandtabs(4)
'    The quick brown fox jumps over the lazy dog\n    The quick brown fox jumps over the lazy dog\n    The quick brown fox jumps over the lazy dog\n    The quick brown fox jumps over the lazy dog\n'
This concludes the overview of the string class; further related concepts will be explored with the bytes class in the next tutorial.
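The tabsize argument sets the spacing of tab stops rather than a fixed number of spaces per tab. Each tab is padded up to the next multiple of tabsize, so the number of spaces depends on the column where the tab occurs, as the sketch below shows.

```python
# expandtabs pads each tab up to the next multiple of tabsize (a tab stop),
# so the number of spaces depends on the column where the tab occurs
col0 = '\tx'.expandtabs(4)     # tab at column 0 -> 4 spaces
col1 = 'a\tx'.expandtabs(4)    # tab at column 1 -> 3 spaces
col3 = 'abc\tx'.expandtabs(4)  # tab at column 3 -> 1 space
col4 = 'abcd\tx'.expandtabs(4) # tab at column 4 -> 4 spaces (next stop is 8)

for result in (col0, col1, col3, col4):
    print(repr(result))
```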