In the previous tutorial the string class was examined and was seen to be a Sequence of Unicode characters. The bytes class is an immutable Sequence of byte values and was essentially the foundation for strings in Python 2. The main difference between a bytes instance and a Unicode string is the fundamental unit: a bytes instance uses a byte and a string uses a Unicode character. For an ASCII character the byte and the Unicode character have the same value, making the data types interchangeable for this limited set of characters. Unicode strings have been designed to make it easier to work with non-English characters.
Let's examine each printable ASCII character and its integer value in decimal, binary and hexadecimal:
import string
# str.removeprefix used below requires Python 3.9 or later
for character in string.printable:
    formalstr = repr(character)
    hexint = hex(ord(character))
    hexint = '0x' + hexint.removeprefix('0x').zfill(2)
    decint = ord(character)
    binint = bin(ord(character))
    binint = '0b' + binint.removeprefix('0b').zfill(8)
    print(formalstr, decint, binint, hexint)
'0' 48 0b00110000 0x30
'1' 49 0b00110001 0x31
'2' 50 0b00110010 0x32
'3' 51 0b00110011 0x33
'4' 52 0b00110100 0x34
'5' 53 0b00110101 0x35
'6' 54 0b00110110 0x36
'7' 55 0b00110111 0x37
'8' 56 0b00111000 0x38
'9' 57 0b00111001 0x39
'a' 97 0b01100001 0x61
'b' 98 0b01100010 0x62
'c' 99 0b01100011 0x63
'd' 100 0b01100100 0x64
'e' 101 0b01100101 0x65
'f' 102 0b01100110 0x66
'g' 103 0b01100111 0x67
'h' 104 0b01101000 0x68
'i' 105 0b01101001 0x69
'j' 106 0b01101010 0x6a
'k' 107 0b01101011 0x6b
'l' 108 0b01101100 0x6c
'm' 109 0b01101101 0x6d
'n' 110 0b01101110 0x6e
'o' 111 0b01101111 0x6f
'p' 112 0b01110000 0x70
'q' 113 0b01110001 0x71
'r' 114 0b01110010 0x72
's' 115 0b01110011 0x73
't' 116 0b01110100 0x74
'u' 117 0b01110101 0x75
'v' 118 0b01110110 0x76
'w' 119 0b01110111 0x77
'x' 120 0b01111000 0x78
'y' 121 0b01111001 0x79
'z' 122 0b01111010 0x7a
'A' 65 0b01000001 0x41
'B' 66 0b01000010 0x42
'C' 67 0b01000011 0x43
'D' 68 0b01000100 0x44
'E' 69 0b01000101 0x45
'F' 70 0b01000110 0x46
'G' 71 0b01000111 0x47
'H' 72 0b01001000 0x48
'I' 73 0b01001001 0x49
'J' 74 0b01001010 0x4a
'K' 75 0b01001011 0x4b
'L' 76 0b01001100 0x4c
'M' 77 0b01001101 0x4d
'N' 78 0b01001110 0x4e
'O' 79 0b01001111 0x4f
'P' 80 0b01010000 0x50
'Q' 81 0b01010001 0x51
'R' 82 0b01010010 0x52
'S' 83 0b01010011 0x53
'T' 84 0b01010100 0x54
'U' 85 0b01010101 0x55
'V' 86 0b01010110 0x56
'W' 87 0b01010111 0x57
'X' 88 0b01011000 0x58
'Y' 89 0b01011001 0x59
'Z' 90 0b01011010 0x5a
'!' 33 0b00100001 0x21
'"' 34 0b00100010 0x22
'#' 35 0b00100011 0x23
'$' 36 0b00100100 0x24
'%' 37 0b00100101 0x25
'&' 38 0b00100110 0x26
"'" 39 0b00100111 0x27
'(' 40 0b00101000 0x28
')' 41 0b00101001 0x29
'*' 42 0b00101010 0x2a
'+' 43 0b00101011 0x2b
',' 44 0b00101100 0x2c
'-' 45 0b00101101 0x2d
'.' 46 0b00101110 0x2e
'/' 47 0b00101111 0x2f
':' 58 0b00111010 0x3a
';' 59 0b00111011 0x3b
'<' 60 0b00111100 0x3c
'=' 61 0b00111101 0x3d
'>' 62 0b00111110 0x3e
'?' 63 0b00111111 0x3f
'@' 64 0b01000000 0x40
'[' 91 0b01011011 0x5b
'\\' 92 0b01011100 0x5c
']' 93 0b01011101 0x5d
'^' 94 0b01011110 0x5e
'_' 95 0b01011111 0x5f
'`' 96 0b01100000 0x60
'{' 123 0b01111011 0x7b
'|' 124 0b01111100 0x7c
'}' 125 0b01111101 0x7d
'~' 126 0b01111110 0x7e
' ' 32 0b00100000 0x20
'\t' 9 0b00001001 0x09
'\n' 10 0b00001010 0x0a
'\r' 13 0b00001101 0x0d
'\x0b' 11 0b00001011 0x0b
'\x0c' 12 0b00001100 0x0c
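The loop above pads the values using removeprefix and zfill. Format specifications can do the same padding. The sketch below is an alternative, not the tutorial's own approach. It assumes Python 3.6+ for f-strings, and the helper name describe is hypothetical:

```python
import string


def describe(character: str) -> str:
    """Return the repr, decimal, 8-bit binary and 2-digit hex form of a character."""
    code = ord(character)
    # '#010b': '0b' prefix, zero-padded to a total width of 10 (8 binary digits)
    # '#04x': '0x' prefix, zero-padded to a total width of 4 (2 hex digits)
    return f'{character!r} {code} {code:#010b} {code:#04x}'


for character in string.printable:
    print(describe(character))
```

Each printed line should match the corresponding line of the output above.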
Initialisation Signature
Inputting bytes() will display the docstring of the initialisation signature of the bytes class as a popup balloon. Recall that some IDEs such as JupyterLab may require the keypress shift ⇧ and tab ↹ to invoke the docstring:
Init signature: bytes(self, /, *args, **kwargs)
Docstring:
bytes(iterable_of_ints) -> bytes
bytes(string, encoding[, errors]) -> bytes
bytes(bytes_or_buffer) -> immutable copy of bytes_or_buffer
bytes(int) -> bytes object of size given by the parameter initialized with null bytes
bytes() -> empty bytes object
Construct an immutable array of bytes from:
  - an iterable yielding integers in range(256)
  - a text string encoded using the specified encoding
  - any object implementing the buffer API.
  - an integer
Type:           type
Subclasses:
Notice in the output of the for loop of printable ASCII characters that each ASCII character corresponds to a decimal integer in the range 0-127. If a tuple of decimal integers is supplied to bytes, the corresponding bytes instance is returned:
integers = (104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 33)
bytes(integers)
b'hello world!'
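Iterating over a bytes instance gives the integers back, so the construction can be reversed. The sketch below also shows what happens outside range(256), which is the restriction named in the docstring above. The exact wording of the ValueError message is not relied on:

```python
integers = (104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 33)
greeting = bytes(integers)

# Iterating (here by casting to a tuple) yields the original integers again
round_trip = tuple(greeting)
print(round_trip == integers)  # True

# Every integer must lie in range(256), otherwise a ValueError is raised
try:
    bytes([256])
except ValueError as error:
    print('ValueError:', error)
```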
The decimal numbering system is the one we are used to, but it doesn't make it easy to visualise what is physically stored in the byte sequence. The binary values can be supplied instead. These relate directly to the physical bits stored in each byte, but they aren't very human readable:
integers = (0b01101000,
0b01100101,
0b01101100,
0b01101100,
0b01101111,
0b00100000,
0b01110111,
0b01101111,
0b01110010,
0b01101100,
0b01100100,
0b00100001)
bytes(integers)
b'hello world!'
If the 0b prefix is removed and the binary values are split into groupings of 4:
integers = (0110 1000,
0110 0101,
0110 1100,
0110 1100,
0110 1111,
0010 0000,
0111 0111,
0110 1111,
0111 0010,
0110 1100,
0110 0100,
0010 0001)
Each of the groupings of 4 can be replaced by their corresponding hexadecimal values:
integers = (6 8,
6 5,
6 c,
6 c,
6 f,
2 0,
7 7,
6 f,
7 2,
6 c,
6 4,
2 1)
Removing the space and adding the 0x prefix gives the hexadecimal values. These are more human readable and still map directly onto the physical bytes, with each byte written as exactly 2 hexadecimal digits:
integers = (0x68,
0x65,
0x6c,
0x6c,
0x6f,
0x20,
0x77,
0x6f,
0x72,
0x6c,
0x64,
0x21)
bytes(integers)
b'hello world!'
A bytes instance can also be created from a string written with the equivalent hexadecimal escape sequences. Notice that the encoding also needs to be specified, in this case 'ASCII'. Encoding is essentially a translation that maps each Unicode character to one or more bytes:
greeting = bytes('\x68\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21', encoding='ASCII')
greeting
b'hello world!'
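The same hexadecimal escape sequences can also be written directly in a bytes literal, which is marked with a b prefix. In a bytes literal each \xhh escape is already a single byte, so no encoding argument is needed. A minimal comparison:

```python
# str literal with escapes: a Unicode string that must be encoded into bytes
from_str = bytes('\x68\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21', encoding='ASCII')
# bytes literal with escapes: each \xhh escape is one byte, no encoding needed
from_literal = b'\x68\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21'

print(from_literal)              # b'hello world!'
print(from_str == from_literal)  # True
```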
A bytes sequence can be created by casting a Unicode string to bytes; once again the encoding is specified as 'ASCII':
greeting = bytes('hello world!', encoding='ASCII')
greeting
b'hello world!'
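Casting with bytes is equivalent to calling the str method encode, which is the more common way of writing it. The sketch below also shows that the 'ASCII' codec refuses characters it cannot represent. The '£' sign is used only as an example:

```python
greeting_cast = bytes('hello world!', encoding='ASCII')
# str.encode performs the same conversion
greeting_encoded = 'hello world!'.encode('ASCII')
print(greeting_cast == greeting_encoded)  # True

# A character outside ASCII cannot be encoded with the 'ASCII' codec
try:
    '£123.45'.encode('ASCII')
except UnicodeEncodeError as error:
    print('UnicodeEncodeError:', error.reason)
```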
Encoding and Decoding
If a string with non-ASCII characters is cast into a bytes Sequence, an encoding that supports those characters needs to be specified. The most common encoding is UTF-8, which handles both ASCII and non-ASCII characters:
bytes('Γειά σου Κόσμε!', encoding='UTF-8')
b'\xce\x93\xce\xb5\xce\xb9\xce\xac \xcf\x83\xce\xbf\xcf\x85 \xce\x9a\xcf\x8c\xcf\x83\xce\xbc\xce\xb5!'
Notice that the output displays each Greek character as two hexadecimal escape sequences, one per byte. Only the ASCII characters, the spaces ' ' and the '!', are shown as characters.
greek_greeting = bytes('Γειά σου Κόσμε!', encoding='UTF-8')
To decode a byte sequence to a Unicode string, the decode identifier can be used:
Signature: greek_greeting.decode(encoding='utf-8', errors='strict')
Docstring:
Decode the bytes using the codec registered for encoding.
  encoding
    The encoding with which to decode the bytes.
  errors
    The error handling scheme to use for the handling of decoding errors.
    The default is 'strict' meaning that decoding errors raise a
    UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
    as well as any other name registered with codecs.register_error that
    can handle UnicodeDecodeErrors.
Type:      builtin_function_or_method
This requires the correct encoding scheme, which maps the bytes back to their corresponding Unicode characters:
greek_greeting.decode('UTF-8')
'Γειά σου Κόσμε!'
If the wrong encoding scheme is specified, a UnicodeDecodeError is usually raised:
greek_greeting.decode('ASCII')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)
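The errors parameter from the decode docstring controls what happens instead of raising. The sketch below uses two of the handlers the docstring names, 'replace' and 'ignore', on the same Greek bytes:

```python
greek_greeting = bytes('Γειά σου Κόσμε!', encoding='UTF-8')

# 'replace' inserts U+FFFD (the replacement character) for each undecodable byte
replaced = greek_greeting.decode('ASCII', errors='replace')
# 'ignore' drops every undecodable byte, keeping only the ASCII ones
ignored = greek_greeting.decode('ASCII', errors='ignore')

print(replaced.count('\ufffd'))  # 24
print(repr(ignored))             # '  !'
```

Neither handler recovers the Greek text. They only make the failure explicit ('replace') or silent ('ignore'), so the correct encoding is still required.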
The identifier hex returns a hexadecimal Unicode string in which each byte is shown as 2 hexadecimal digits:
Docstring:
Create a string of hexadecimal numbers from a bytes object.
  sep
    An optional single character or byte to separate hex bytes.
  bytes_per_sep
    How many bytes between separators.  Positive values count from the
    right, negative values count from the left.
Example:
>>> value = b'\xb9\x01\xef'
>>> value.hex()
'b901ef'
>>> value.hex(':')
'b9:01:ef'
>>> value.hex(':', 2)
'b9:01ef'
>>> value.hex(':', -2)
'b901:ef'
Type:      builtin_function_or_method
greek_greeting.hex()
'ce93ceb5ceb9ceac20cf83cebfcf8520ce9acf8ccf83cebcceb521'
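The optional sep and bytes_per_sep parameters shown in the docstring make the hex string easier to read. They require Python 3.8 or later. The sketch below works on the first 4 bytes, which encode 'Γε':

```python
greek_greeting = bytes('Γειά σου Κόσμε!', encoding='UTF-8')
first_two_letters = greek_greeting[:4]

print(first_two_letters.hex())        # ce93ceb5
print(first_two_letters.hex(' '))     # ce 93 ce b5
# Group 2 bytes per separator, so each Greek letter's bytes stay together
print(first_two_letters.hex(' ', 2))  # ce93 ceb5
```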
The __len__ data model identifier defines the behaviour of the builtins function len. For a bytes instance this returns the number of bytes. This differs from the number of hexadecimal characters in the hex string and from the number of characters in the decoded Unicode string:
len(greek_greeting) # number of bytes
len(greek_greeting.hex()) # hexadecimal Unicode string 2 hexadecimal characters per byte
len(greek_greeting.decode('UTF-8')) # decoded Unicode string
27
54
15
Note that in the encoding scheme 'UTF-8', each of the 12 Greek letters occupies 2 bytes and each of the 3 ASCII characters (the two spaces and '!') occupies 1 byte:
12*2 + 3*1
27
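Rather than counting by hand, the number of bytes each character uses can be computed by encoding the characters one at a time. The sketch below uses collections.Counter:

```python
from collections import Counter

text = 'Γειά σου Κόσμε!'
# Number of UTF-8 bytes used by each individual character
byte_counts = [len(character.encode('UTF-8')) for character in text]

print(Counter(byte_counts))  # Counter({2: 12, 1: 3})
print(sum(byte_counts))      # 27
```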
The 8 in UTF-8 refers to its 8-bit code unit: a single byte, which is the memory used for an ASCII character. UTF-8 is however adaptive. Many common characters outside ASCII, such as Greek and Cyrillic letters, are encoded using 2 bytes (16 bits), and the remaining characters up to U+FFFF use 3 bytes. Since the character sets of other languages alongside emojis have been added to Unicode, the number of combinations for 16 bits has been exceeded:
2 ** 16
65536
and therefore some newer characters occupy 4 bytes (32 bits):
2 ** 32
4294967296
The number of combinations available from 4 bytes is ample and is nowhere near being exceeded.
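The variable length of UTF-8 can be checked directly. The sample characters below are assumptions, chosen so that each one lands in a different length class: an ASCII letter, a Greek letter, the euro sign and an emoji (written as an escape sequence). The hex separator requires Python 3.8 or later:

```python
samples = {
    'A': 'ASCII letter',
    'Γ': 'Greek letter',
    '€': 'euro sign',
    '\U0001F600': 'grinning face emoji',
}

for character, description in samples.items():
    encoded = character.encode('UTF-8')
    # Show the code point, the number of UTF-8 bytes and the bytes themselves
    print(f'U+{ord(character):04X} {description}: {len(encoded)} bytes {encoded.hex(" ")}')

# The highest Unicode code point, U+10FFFF, needs more than the 65536 values of 16 bits
print(0x10FFFF + 1 > 2 ** 16)  # True
```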
Before the adaptive encoding scheme 'UTF-8' became the standard, other encoding schemes such as 'UTF-16' were common. The 16 in UTF-16 means that it uses 16-bit code units. Each ASCII character and each character in the Basic Multilingual Plane, which includes Greek, is encoded in 2 bytes (16 bits). The newer characters outside it are encoded as a pair of code units (4 bytes).
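Characters outside the Basic Multilingual Plane do not fit in a single 16-bit code unit, so UTF-16 stores them as a surrogate pair of two code units. The quick check below uses the grinning face emoji '\U0001F600' as an assumed example. It uses the BOM-free 'UTF-16-BE' variant so that only the character's own bytes are counted:

```python
greek_letter = 'Γ'    # U+0393, inside the Basic Multilingual Plane
emoji = '\U0001F600'  # U+1F600, outside the Basic Multilingual Plane

# 'UTF-16-BE' writes no BOM, so the lengths are just the character's bytes
print(len(greek_letter.encode('UTF-16-BE')))  # 2
print(len(emoji.encode('UTF-16-BE')))         # 4
# The two 16-bit code units of the surrogate pair
print(emoji.encode('UTF-16-BE').hex(' ', 2))  # d83d de00
```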
A comparison can be made between 'UTF-8' and 'UTF-16':
greek_greeting = bytes('Γειά σου Κόσμε!', encoding='UTF-8')
greek_greeting.hex()
len(greek_greeting)
'ce93ceb5ceb9ceac20cf83cebfcf8520ce9acf8ccf83cebcceb521'
27
greek_greeting16 = bytes('Γειά σου Κόσμε!', encoding='UTF-16')
greek_greeting16.hex()
len(greek_greeting16)
'fffe9303b503b903ac032000c303bf03c50320009a03cc03c303bc03b5032100'
32
The UTF-8 encoding assigns 1 byte to each of the 3 ASCII characters and 2 bytes to each of the 12 Greek characters:
3*1 + 12*2
27
The UTF-16 encoding assigns 2 bytes to each of the 15 characters:
15 * 2
30
Notice that the length is 32. This is because a byte order mark (BOM) occupying 2 bytes has been prefixed:
bom = bytes('', encoding='UTF-16')
bom.hex()
'fffe'
The byte values for the Greek characters differ between the two encodings, whereas each ASCII character keeps its ASCII value, padded out to 2 bytes in UTF-16. The last character '!' has the value '2100' in 'UTF-16' and '21' in 'UTF-8'. It is '2100' and not '0021' because the 'UTF-16' codec has used little endian byte order here, as the BOM 'fffe' indicates. Each character corresponds to a byte pair, and the least significant byte ('21') is stored first, followed by the most significant byte ('00').
The byte order, known as the endianness, can be specified explicitly:
greek_greeting16le = bytes('Γειά σου Κόσμε!', encoding='UTF-16-LE')
greek_greeting16le.hex()
'9303b503b903ac032000c303bf03c50320009a03cc03c303bc03b5032100'
greek_greeting16be = bytes('Γειά σου Κόσμε!', encoding='UTF-16-BE')
greek_greeting16be.hex()
'039303b503b903ac002003c303bf03c50020039a03cc03c303bc03b50021'
When the byte order is given in the encoding name, no BOM is prefixed. Comparing each group of 4 hexadecimal characters shows the difference in byte ordering: within each 2-byte pair, the two bytes (2 hexadecimal characters each) are swapped.
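When decoding, the generic 'UTF-16' codec reads the BOM to work out the byte order. The explicit variants rely on the encoding name instead. The sketch below also rebuilds the big endian bytes by swapping every byte pair of the little endian bytes:

```python
text = 'Γειά σου Κόσμε!'

# The generic codec writes a BOM when encoding, then reads and strips it when decoding
with_bom = text.encode('UTF-16')
print(with_bom.decode('UTF-16') == text)  # True

little = text.encode('UTF-16-LE')
big = text.encode('UTF-16-BE')

# Swap the two bytes inside every 2-byte pair of the little endian encoding
swapped = bytearray()
for index in range(0, len(little), 2):
    swapped += bytes((little[index + 1], little[index]))

print(bytes(swapped) == big)  # True
```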
Although 'UTF-8' has no byte order to indicate and does not need a BOM, Microsoft software has a tendency to add one to 'UTF-8' text. In Python this variant is the 'UTF-8-SIG' encoding:
greek_greeting = bytes('Γειά σου Κόσμε!', encoding='UTF-8')
greek_greeting.hex()
'ce93ceb5ceb9ceac20cf83cebfcf8520ce9acf8ccf83cebcceb521'
greek_greeting_bom = bytes('Γειά σου Κόσμε!', encoding='UTF-8-SIG')
greek_greeting_bom.hex()
'efbbbfce93ceb5ceb9ceac20cf83cebfcf8520ce9acf8ccf83cebcceb521'
And just to clarify:
bom = bytes('', encoding='UTF-8-SIG')
bom.hex()
'efbbbf'
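When decoding, 'UTF-8-SIG' removes the BOM if one is present. Plain 'UTF-8' keeps it as the invisible character U+FEFF at the start of the string. A short sketch:

```python
text = 'Γειά σου Κόσμε!'
with_bom = text.encode('UTF-8-SIG')

decoded_plain = with_bom.decode('UTF-8')    # BOM survives as '\ufeff'
decoded_sig = with_bom.decode('UTF-8-SIG')  # BOM is stripped

print(repr(decoded_plain[0]))  # '\ufeff'
print(decoded_sig == text)     # True
# 'UTF-8-SIG' also decodes data that has no BOM
print(text.encode('UTF-8').decode('UTF-8-SIG') == text)  # True
```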
ASCII values range between 0 (inclusive) and 128 (exclusive), so the last value is 127:
'0b' + bin(127).removeprefix('0b').zfill(8)
'0b01111111'
Notice that this spans only half the values in the byte. The other half of the byte values was originally used by extended encodings such as 'Latin-1' for English and other Western European languages. Latin-1 expanded the character set to 191 characters, including, for example, the '£' sign:
price = bytes('£123.45', encoding='latin1')
price
b'\xa3123.45'
Notice that the '£' sign is not displayed as a character; the hexadecimal escape sequence '\xa3' is shown instead. The remaining characters in the bytes Sequence are ASCII and are displayed as characters. The '£' sign still occupies 1 byte, but its value is larger than 127, as the leading 1 in its binary form shows:
'0b' + bin(0xa3).removeprefix('0b').zfill(8)
'0b10100011'
Decoding with 'latin-1' recovers the original string:
price.decode('latin-1')
'£123.45'
Although 'Latin-1' was common in the English-speaking world, there were a number of regional variations. 'latin-2' and 'latin-3' were used in eastern and southern Europe respectively, 'greek' was used in Greece, and 'cyrillic' was used in countries that use the Cyrillic alphabet. These encodings all map the same byte values above 127 to different characters. Text decoded with the wrong encoding therefore displayed the wrong characters, a phenomenon known as mojibake that was common in the early days of the internet:
price.decode('latin2')
price.decode('latin3')
price.decode('greek')
price.decode('cyrillic')
'Ł123.45'
'£123.45'
'£123.45'
'Ѓ123.45'
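Mojibake also appears when text encoded as 'UTF-8' is decoded as 'latin-1', because each byte of a multi-byte character is read as a separate character. The sketch below uses the '£' sign and includes the usual repair: encode back to the misread bytes, then decode them correctly:

```python
price_utf8 = '£123.45'.encode('UTF-8')
print(price_utf8)  # b'\xc2\xa3123.45'

# Decoding with the wrong encoding turns the 2-byte '£' into 2 characters
garbled = price_utf8.decode('latin-1')
print(garbled)  # Â£123.45

# Reversing the mistake recovers the original text
repaired = garbled.encode('latin-1').decode('UTF-8')
print(repaired)  # £123.45
```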
The fundamental unit of a bytes instance is a byte. If a bytes instance is created:
greek_greeting = bytes('Γειά σου Κόσμε!', encoding='UTF-8')
It can be indexed using square brackets:
greek_greeting[0]
206
Notice that an integer less than 256 is returned, which corresponds to a single byte. This can be seen more clearly if the hex values of the byte sequence are examined:
greek_greeting.hex()
'ce93ceb5ceb9ceac20cf83cebfcf8520ce9acf8ccf83cebcceb521'
If the hex function is used on the returned integer and the '0x' prefix is removed, the first byte matches the 1st and 2nd characters of the hexadecimal string:
hex(greek_greeting[0]).removeprefix('0x')
'ce'
The second byte is the 3rd and 4th characters of the hexadecimal string:
hex(greek_greeting[1]).removeprefix('0x')
'93'
Slicing on the other hand returns a bytes string even if the slice only returns a single element:
greek_greeting[:1]
b'\xce'
Note that the first Unicode character 'Γ' spans 2 bytes, but the slice above only returns the first byte. Attempting to decode this will raise a UnicodeDecodeError:
greek_greeting[:1].decode(encoding='UTF-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 0: unexpected end of data
Slicing to get the 1st and 2nd bytes will decode properly:
greek_greeting[:2]
b'\xce\x93'
greek_greeting[:2].decode(encoding='UTF-8')
'Γ'
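Bytes sometimes arrive a few at a time, for example from a file read in chunks. In that case an incremental decoder buffers an incomplete multi-byte character instead of raising. The sketch below uses codecs.getincrementaldecoder from the standard library and feeds one byte at a time:

```python
import codecs

greek_greeting = bytes('Γειά σου Κόσμε!', encoding='UTF-8')
decoder = codecs.getincrementaldecoder('UTF-8')()

characters = []
for index in range(len(greek_greeting)):
    # Feed a 1-byte slice; an incomplete character is held back by the decoder
    piece = decoder.decode(greek_greeting[index:index + 1])
    if piece:
        characters.append(piece)

# final=True raises UnicodeDecodeError if an incomplete character is left over
remainder = decoder.decode(b'', final=True)
text = ''.join(characters) + remainder
print(text)  # Γειά σου Κόσμε!
```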
Identifiers and Data Model Identifiers
If the method resolution order of bytes is checked, it has object as its parent class:
bytes.mro()
[bytes, object]
If help is used on bytes, details about the defined identifiers are displayed, split into three groupings:
- methods
- class methods
- static methods
help(bytes)
Help on class bytes in module builtins:
class bytes(object)
| bytes(iterable_of_ints) -> bytes
| bytes(string, encoding[, errors]) -> bytes
| bytes(bytes_or_buffer) -> immutable copy of bytes_or_buffer
| bytes(int) -> bytes object of size given by the parameter initialized with null bytes
| bytes() -> empty bytes object
|
| Construct an immutable array of bytes from:
| - an iterable yielding integers in range(256)
| - a text string encoded using the specified encoding
| - any object implementing the buffer API.
| - an integer
|
| Methods defined here:
|
| __add__(self, value, /)
| Return self+value.
|
| __bytes__(self, /)
| Convert this value to exact type bytes.
|
| __contains__(self, key, /)
| Return key in self.
|
| __eq__(self, value, /)
| Return self==value.
|
| __ge__(self, value, /)
| Return self>=value.
|
| __getattribute__(self, name, /)
| Return getattr(self, name).
|
| __getitem__(self, key, /)
| Return self[key].
|
| __getnewargs__(...)
|
| __gt__(self, value, /)
| Return self>value.
|
| __hash__(self, /)
| Return hash(self).
|
| __iter__(self, /)
| Implement iter(self).
|
| __le__(self, value, /)
| Return self<=value.
|
| __len__(self, /)
| Return len(self).
|
| __lt__(self, value, /)
| Return self<value.
|
| __mod__(self, value, /)
| Return self%value.
|
| __mul__(self, value, /)
| Return self*value.
|
| __ne__(self, value, /)
| Return self!=value.
|
| __repr__(self, /)
| Return repr(self).
|
| __rmod__(self, value, /)
| Return value%self.
|
| __rmul__(self, value, /)
| Return value*self.
|
| __str__(self, /)
| Return str(self).
|
| capitalize(...)
| B.capitalize() -> copy of B
|
| Return a copy of B with only its first character capitalized (ASCII)
| and the rest lower-cased.
|
| center(self, width, fillchar=b' ', /)
| Return a centered string of length width.
|
| Padding is done using the specified fill character.
|
| count(...)
| B.count(sub[, start[, end]]) -> int
|
| Return the number of non-overlapping occurrences of subsection sub in
| bytes B[start:end]. Optional arguments start and end are interpreted
| as in slice notation.
|
| decode(self, /, encoding='utf-8', errors='strict')
| Decode the bytes using the codec registered for encoding.
|
| encoding
| The encoding with which to decode the bytes.
| errors
| The error handling scheme to use for the handling of decoding errors.
| The default is 'strict' meaning that decoding errors raise a
| UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
| as well as any other name registered with codecs.register_error that
| can handle UnicodeDecodeErrors.
|
| endswith(...)
| B.endswith(suffix[, start[, end]]) -> bool
|
| Return True if B ends with the specified suffix, False otherwise.
| With optional start, test B beginning at that position.
| With optional end, stop comparing B at that position.
| suffix can also be a tuple of bytes to try.
|
| expandtabs(self, /, tabsize=8)
| Return a copy where all tab characters are expanded using spaces.
|
| If tabsize is not given, a tab size of 8 characters is assumed.
|
| find(...)
| B.find(sub[, start[, end]]) -> int
|
| Return the lowest index in B where subsection sub is found,
| such that sub is contained within B[start,end]. Optional
| arguments start and end are interpreted as in slice notation.
|
| Return -1 on failure.
|
| hex(...)
| Create a string of hexadecimal numbers from a bytes object.
|
| sep
| An optional single character or byte to separate hex bytes.
| bytes_per_sep
| How many bytes between separators. Positive values count from the
| right, negative values count from the left.
|
| Example:
| >>> value = b'\xb9\x01\xef'
| >>> value.hex()
| 'b901ef'
| >>> value.hex(':')
| 'b9:01:ef'
| >>> value.hex(':', 2)
| 'b9:01ef'
| >>> value.hex(':', -2)
| 'b901:ef'
|
| index(...)
| B.index(sub[, start[, end]]) -> int
|
| Return the lowest index in B where subsection sub is found,
| such that sub is contained within B[start,end]. Optional
| arguments start and end are interpreted as in slice notation.
|
| Raises ValueError when the subsection is not found.
|
| isalnum(...)
| B.isalnum() -> bool
|
| Return True if all characters in B are alphanumeric
| and there is at least one character in B, False otherwise.
|
| isalpha(...)
| B.isalpha() -> bool
|
| Return True if all characters in B are alphabetic
| and there is at least one character in B, False otherwise.
|
| isascii(...)
| B.isascii() -> bool
|
| Return True if B is empty or all characters in B are ASCII,
| False otherwise.
|
| isdigit(...)
| B.isdigit() -> bool
|
| Return True if all characters in B are digits
| and there is at least one character in B, False otherwise.
|
| islower(...)
| B.islower() -> bool
|
| Return True if all cased characters in B are lowercase and there is
| at least one cased character in B, False otherwise.
|
| isspace(...)
| B.isspace() -> bool
|
| Return True if all characters in B are whitespace
| and there is at least one character in B, False otherwise.
|
| istitle(...)
| B.istitle() -> bool
|
| Return True if B is a titlecased string and there is at least one
| character in B, i.e. uppercase characters may only follow uncased
| characters and lowercase characters only cased ones. Return False
| otherwise.
|
| isupper(...)
| B.isupper() -> bool
|
| Return True if all cased characters in B are uppercase and there is
| at least one cased character in B, False otherwise.
|
| join(self, iterable_of_bytes, /)
| Concatenate any number of bytes objects.
|
| The bytes whose method is called is inserted in between each pair.
|
| The result is returned as a new bytes object.
|
| Example: b'.'.join([b'ab', b'pq', b'rs']) -> b'ab.pq.rs'.
|
| ljust(self, width, fillchar=b' ', /)
| Return a left-justified string of length width.
|
| Padding is done using the specified fill character.
|
| lower(...)
| B.lower() -> copy of B
|
| Return a copy of B with all ASCII characters converted to lowercase.
|
| lstrip(self, bytes=None, /)
| Strip leading bytes contained in the argument.
|
| If the argument is omitted or None, strip leading ASCII whitespace.
|
| partition(self, sep, /)
| Partition the bytes into three parts using the given separator.
|
| This will search for the separator sep in the bytes. If the separator is found,
| returns a 3-tuple containing the part before the separator, the separator
| itself, and the part after it.
|
| If the separator is not found, returns a 3-tuple containing the original bytes
| object and two empty bytes objects.
|
| removeprefix(self, prefix, /)
| Return a bytes object with the given prefix string removed if present.
|
| If the bytes starts with the prefix string, return bytes[len(prefix):].
| Otherwise, return a copy of the original bytes.
|
| removesuffix(self, suffix, /)
| Return a bytes object with the given suffix string removed if present.
|
| If the bytes ends with the suffix string and that suffix is not empty,
| return bytes[:-len(prefix)]. Otherwise, return a copy of the original
| bytes.
|
| replace(self, old, new, count=-1, /)
| Return a copy with all occurrences of substring old replaced by new.
|
| count
| Maximum number of occurrences to replace.
| -1 (the default value) means replace all occurrences.
|
| If the optional argument count is given, only the first count occurrences are
| replaced.
|
| rfind(...)
| B.rfind(sub[, start[, end]]) -> int
|
| Return the highest index in B where subsection sub is found,
| such that sub is contained within B[start,end]. Optional
| arguments start and end are interpreted as in slice notation.
|
| Return -1 on failure.
|
| rindex(...)
| B.rindex(sub[, start[, end]]) -> int
|
| Return the highest index in B where subsection sub is found,
| such that sub is contained within B[start,end]. Optional
| arguments start and end are interpreted as in slice notation.
|
| Raise ValueError when the subsection is not found.
|
| rjust(self, width, fillchar=b' ', /)
| Return a right-justified string of length width.
|
| Padding is done using the specified fill character.
|
| rpartition(self, sep, /)
| Partition the bytes into three parts using the given separator.
|
| This will search for the separator sep in the bytes, starting at the end. If
| the separator is found, returns a 3-tuple containing the part before the
| separator, the separator itself, and the part after it.
|
| If the separator is not found, returns a 3-tuple containing two empty bytes
| objects and the original bytes object.
|
| rsplit(self, /, sep=None, maxsplit=-1)
| Return a list of the sections in the bytes, using sep as the delimiter.
|
| sep
| The delimiter according which to split the bytes.
| None (the default value) means split on ASCII whitespace characters
| (space, tab, return, newline, formfeed, vertical tab).
| maxsplit
| Maximum number of splits to do.
| -1 (the default value) means no limit.
|
| Splitting is done starting at the end of the bytes and working to the front.
|
| rstrip(self, bytes=None, /)
| Strip trailing bytes contained in the argument.
|
| If the argument is omitted or None, strip trailing ASCII whitespace.
|
| split(self, /, sep=None, maxsplit=-1)
| Return a list of the sections in the bytes, using sep as the delimiter.
|
| sep
| The delimiter according which to split the bytes.
| None (the default value) means split on ASCII whitespace characters
| (space, tab, return, newline, formfeed, vertical tab).
| maxsplit
| Maximum number of splits to do.
| -1 (the default value) means no limit.
|
| splitlines(self, /, keepends=False)
| Return a list of the lines in the bytes, breaking at line boundaries.
|
| Line breaks are not included in the resulting list unless keepends is given and
| true.
|
| startswith(...)
| B.startswith(prefix[, start[, end]]) -> bool
|
| Return True if B starts with the specified prefix, False otherwise.
| With optional start, test B beginning at that position.
| With optional end, stop comparing B at that position.
| prefix can also be a tuple of bytes to try.
|
| strip(self, bytes=None, /)
| Strip leading and trailing bytes contained in the argument.
|
| If the argument is omitted or None, strip leading and trailing ASCII whitespace.
|
| swapcase(...)
| B.swapcase() -> copy of B
|
| Return a copy of B with uppercase ASCII characters converted
| to lowercase ASCII and vice versa.
|
| title(...)
| B.title() -> copy of B
|
| Return a titlecased version of B, i.e. ASCII words start with uppercase
| characters, all remaining cased characters have lowercase.
|
| translate(self, table, /, delete=b'')
| Return a copy with each character mapped by the given translation table.
|
| table
| Translation table, which must be a bytes object of length 256.
|
| All characters occurring in the optional argument delete are removed.
| The remaining characters are mapped through the given translation table.
|
| upper(...)
| B.upper() -> copy of B
|
| Return a copy of B with all ASCII characters converted to uppercase.
|
| zfill(self, width, /)
| Pad a numeric string with zeros on the left, to fill a field of the given width.
|
| The original string is never truncated.
|
| ----------------------------------------------------------------------
| Class methods defined here:
|
| fromhex(string, /) from builtins.type
| Create a bytes object from a string of hexadecimal numbers.
|
| Spaces between two numbers are accepted.
| Example: bytes.fromhex('B9 01EF') -> b'\\xb9\\x01\\xef'.
|
| ----------------------------------------------------------------------
| Static methods defined here:
|
| __new__(*args, **kwargs) from builtins.type
| Create and return a new object. See help(type) for accurate signature.
|
| maketrans(frm, to, /)
| Return a translation table useable for the bytes or bytearray translate method.
|
| The returned table will be one where each byte in frm is mapped to the byte at
| the same position in to.
|
| The bytes objects frm and to must be of the same length.
Because the string and bytes classes are similar, most of the identifiers are consistent with those found in the string class and behave analogously. The slight difference in behaviour, due to the fundamental unit being a byte as opposed to a Unicode character, was seen when using the __getitem__ and __len__ data model identifiers. The decode and hex methods are additional methods associated with this difference.
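The analogous methods take and return bytes rather than str. In the sketch below, passing str arguments to a bytes method raises a TypeError:

```python
greeting = b'hello world!'

print(greeting.split())                      # [b'hello', b'world!']
print(greeting.replace(b'world', b'there'))  # b'hello there!'
print(greeting.startswith(b'hello'))         # True

# Arguments must be bytes-like too; mixing in a str raises TypeError
try:
    greeting.replace('world', 'there')
except TypeError as error:
    print('TypeError:', error)
```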
There is also a class method fromhex which is an alternative constructor and can be used to construct a bytes instance from a hex string:
Signature: bytes.fromhex(string, /)
Docstring:
Create a bytes object from a string of hexadecimal numbers.
Spaces between two numbers are accepted.
Example: bytes.fromhex('B9 01EF') -> b'\\xb9\\x01\\xef'.
Type:      builtin_function_or_method
The alternative constructor essentially does the reverse of the hex method:
greek_greeting = bytes('Γειά σου Κόσμε!', encoding='UTF-8')
greek_greeting_hex = greek_greeting.hex()
greek_greeting_hex
'ce93ceb5ceb9ceac20cf83cebfcf8520ce9acf8ccf83cebcceb521'
The alternative constructor is a class method and is typically called on the class itself:
greek_greeting2 = bytes.fromhex(greek_greeting_hex)
greek_greeting2
b'\xce\x93\xce\xb5\xce\xb9\xce\xac \xcf\x83\xce\xbf\xcf\x85 \xce\x9a\xcf\x8c\xcf\x83\xce\xbc\xce\xb5!'
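Since fromhex reverses hex, the round trip can be checked. The docstring also notes that spaces between the hexadecimal numbers are accepted. A short sketch:

```python
greek_greeting = bytes('Γειά σου Κόσμε!', encoding='UTF-8')

# hex followed by fromhex gives back an equal bytes instance
round_trip = bytes.fromhex(greek_greeting.hex())
print(round_trip == greek_greeting)  # True

# Spaces between the hexadecimal pairs are ignored
print(bytes.fromhex('ce 93 ce b5').decode('UTF-8'))  # Γε
```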
There is also the static method maketrans. It is not bound to the class or to an instance, but it is associated with the bytes class and is therefore found in the namespace of the bytes class:
Signature: bytes.maketrans(frm, to, /)
Docstring:
Return a translation table useable for the bytes or bytearray translate method.
The returned table will be one where each byte in frm is mapped to the byte at
the same position in to.
The bytes objects frm and to must be of the same length.
Type:      builtin_function_or_method
It can be used to create a translation table from lower case to upper case letters:
translation_table = bytes.maketrans(bytes('abcdefghijklmnopqrstuvwxyz',
                                          encoding='ASCII'),
                                    bytes('ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                                          encoding='ASCII'))
The translation table can be used with the method translate:
greeting = bytes('hello world!', encoding='ASCII')
Signature: greeting.translate(table, /, delete=b'')
Docstring:
Return a copy with each character mapped by the given translation table.
  table
    Translation table, which must be a bytes object of length 256.
All characters occurring in the optional argument delete are removed.
The remaining characters are mapped through the given translation table.
Type:      builtin_function_or_method
greeting.translate(translation_table)
b'HELLO WORLD!'
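The optional delete argument of translate removes bytes before the table is applied. None can be passed as the table when only deletion is wanted. The sketch below rebuilds the same lower to upper case table using bytes literals and compares it with the upper method:

```python
translation_table = bytes.maketrans(b'abcdefghijklmnopqrstuvwxyz',
                                    b'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
greeting = bytes('hello world!', encoding='ASCII')

# For this ASCII text the table gives the same result as the upper method
print(greeting.translate(translation_table) == greeting.upper())  # True
# Delete every b'l' and b'o', then map the remaining bytes through the table
print(greeting.translate(translation_table, delete=b'lo'))  # b'HE WRD!'
# With table=None, only the deletion is performed
print(greeting.translate(None, delete=b'lo'))  # b'he wrd!'
```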