Python Pickle Serialization

pickle allows you to serialize and deserialize Python objects, for instance to save them into a file for future use. You can later read this file back and extract the stored objects so they can be integrated back into your code’s logic.

You just need two basic functions: pickle.dump(object, file) and pickle.load(file).

Below is a round-trip example:

import pickle

FILENAME = "tmp.pickle"
original_list = ["banana", "apple", "pear"]

# Binary mode ("wb") is required: pickle writes bytes.
with open(FILENAME, "wb") as file:
    pickle.dump(original_list, file)

# The with block closes the file for us, no need to call file.close().
with open(FILENAME, "rb") as file:
    retrieved_list = pickle.load(file)

print(retrieved_list) # ['banana', 'apple', 'pear']
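You can also do the same round trip in memory with pickle.dumps() and pickle.loads(). These work on a bytes object instead of a file. The minimal sketch below pickles a structure that JSON could not store as-is (a set, a tuple and a date). The inventory data is made up for illustration. Keep in mind that the official documentation warns that unpickling data from an untrusted source can execute arbitrary code, so only load pickles you produced yourself.

```python
import pickle
from datetime import date

# Hypothetical data mixing types JSON cannot represent directly.
inventory = {
    "fruits": {"banana", "apple", "pear"},
    "origin": ("Spain", "France"),
    "harvested": date(2022, 10, 1),
}

payload = pickle.dumps(inventory)  # serialize to bytes, no file involved
restored = pickle.loads(payload)   # rebuild a new, equal object

print(type(payload))          # <class 'bytes'>
print(restored == inventory)  # True
print(restored is inventory)  # False: it is a brand-new object
```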

Python Itertools Cycle

The itertools.cycle() function is a nice way to iterate over an iterable (e.g. a list) indefinitely: once all the elements are exhausted, they are read again from the beginning.

from itertools import cycle

duty_persons = ["Olivier", "David", "Goran",]
duty_persons_cycle = cycle(duty_persons)

for _ in range(6):
    print(next(duty_persons_cycle))

The above snippet prints the following output:

Olivier
David
Goran
Olivier
David
Goran

As you can see in the above example, cycle() is helpful in a couple of situations:

  • establishing a rolling list of duty persons, rotating on a regular basis;
  • cycling between hosts when scraping the web, to get around anti-bot policies;
  • any use-case you might think of…
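Here is a minimal sketch of the duty rotation. zip() pairs each week with the next person from the cycle and stops as soon as the finite list of weeks runs out, so the infinite iterator stays under control. The week labels are hypothetical.

```python
from itertools import cycle

duty_persons = ["Olivier", "David", "Goran"]
weeks = ["W01", "W02", "W03", "W04", "W05"]  # hypothetical week labels

# zip() stops at the shortest iterable, so the infinite cycle is safe here.
schedule = dict(zip(weeks, cycle(duty_persons)))

for week, person in schedule.items():
    print(week, person)
```

Never loop over a cycle object with a bare for statement: the loop would not end on its own.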

Python uuid

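uuid is a Python module providing immutable UUID (Universally Unique IDentifier, as specified in RFC 4122) objects and functions to generate them, e.g. uuid4().

The generated IDs are unique for all practical purposes: uuid1() builds them from the current time and the computer’s network address, while uuid4() builds them from random numbers.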

python> from uuid import uuid4
python> uuid_object = uuid4()
python> uuid_object
UUID('ce8d1fee-bc31-406b-8690-94c01caabcb6')

python> str(uuid_object)
'ce8d1fee-bc31-406b-8690-94c01caabcb6'

These can be used to generate random strings that serve as unique identifiers across a given namespace. For example, you can generate temporary filenames on the fly under a specific folder:

from uuid import uuid4

for _ in range(10):
    file = open(f"/tmp/{uuid4()}.txt", "a")
    file.write("hello world!")
    file.close()

Note: although the above snippet works like a charm, it is better to use open() with the with statement (a context manager). This makes sure the file is always closed, even if an error occurs during the write operation.
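Here is the same loop rewritten with with. To keep the sketch portable and self-cleaning, it writes into a temporary folder created by tempfile instead of the hard-coded /tmp. The helper name write_unique_files is only illustrative.

```python
import tempfile
from pathlib import Path
from uuid import uuid4


def write_unique_files(folder: Path, count: int) -> list[Path]:
    """Create `count` files with random, collision-free names in `folder`."""
    paths = []
    for _ in range(count):
        path = folder / f"{uuid4()}.txt"
        # The with block closes the file even if write() raises.
        with open(path, "a") as file:
            file.write("hello world!")
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp_dir:
    created = write_unique_files(Path(tmp_dir), 10)
    contents = [path.read_text() for path in created]

print(len(set(created)))  # 10 distinct file names
```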

Hazardous: uuid1() compromises privacy because it embeds the computer’s network address in the unique ID, and that address can therefore be read back from the ID. To prevent this, prefer uuid4(), which is random. Use uuid5(), which is derived from a namespace and a name, when you need a reproducible ID.
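The sketch below illustrates the difference. uuid4() draws a new random value on every call. uuid5() hashes a namespace and a name (with SHA-1), so the same inputs always give the same UUID. The name "olivierbenard.fr" is just a sample input.

```python
from uuid import NAMESPACE_DNS, uuid4, uuid5

# Random: two calls practically never collide.
first, second = uuid4(), uuid4()
print(first == second)  # False
print(first.version)    # 4

# Name-based: deterministic for a given namespace and name.
a = uuid5(NAMESPACE_DNS, "olivierbenard.fr")
b = uuid5(NAMESPACE_DNS, "olivierbenard.fr")
print(a == b)           # True
print(a.version)        # 5
```

uuid5() is handy when you need a stable ID for the same input, e.g. to deduplicate records. It is not secret, though: anyone who knows the namespace and the name can recompute it.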

Note: to learn more about URIs, URLs and URNs, see https://olivierbenard.fr/change-the-url-of-a-git-repository/.

Python extend vs. append

  • append() adds the given object at the end of the list. The length increases by exactly 1.
  • extend() takes an iterable and appends its elements one by one to the list. The length increases by the number of elements.

In more detail

The append() method is straightforward:

python> a = [1,2,3]
python> a.append(4)
python> a
[1, 2, 3, 4]

However, what happens if you need to add multiple elements at the same time? You could call the append() method several times, but could you instead pass the list of new elements directly as a parameter? Let’s give it a try:

python> a = [1,2,3]
python> a.append([4,5])
python> a
[1, 2, 3, [4, 5]] # vs. expected [1, 2, 3, 4, 5]

The extend() method intends to solve this problem:

python> b = [1,2,3]
python> b.extend([4,5])
python> b
[1, 2, 3, 4, 5]

Note: the += operator is equivalent to an extend call.

python> a = [1,2,3]
python> a += [4,5]
python> a
[1, 2, 3, 4, 5]
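One subtlety is worth knowing. += modifies the list in place, exactly like extend(), while b = b + [...] builds a brand-new list. The difference shows up when another name points to the same list. The sketch also shows that extend() accepts any iterable, here a tuple and a string.

```python
a = [1, 2, 3]
alias = a
a += [4, 5]           # in place: alias sees the change
print(alias)          # [1, 2, 3, 4, 5]

b = [1, 2, 3]
alias_b = b
b = b + [4, 5]        # new list bound to b; alias_b is untouched
print(alias_b)        # [1, 2, 3]
print(b)              # [1, 2, 3, 4, 5]

c = [1, 2, 3]
c.extend((4, 5))      # any iterable works, e.g. a tuple...
c.extend("ab")        # ...or a string, added character by character
print(c)              # [1, 2, 3, 4, 5, 'a', 'b']
```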

Python Walrus Operator

The walrus operator := is a nice way to avoid repeating function calls and statements, which simplifies your code. Compare the following two snippets:

grades = [1, 12, 14, 17, 5, 8, 20]

stats = {
    'nb_grades': len(grades),
    'sum': sum(grades),
    'mean': sum(grades)/len(grades)
}

With the walrus operator:

grades = [1, 12, 14, 17, 5, 8, 20]

stats = {
    'nb_grades': (nb_grades := len(grades)),
    'sum': (sum_grades := sum(grades)),
    'mean': sum_grades/nb_grades
}

Note: here, the parentheses are mandatory; without them, Python raises a SyntaxError.

The same trick avoids calling a function twice in a condition:

foo = "hello world!"
if (n := len(foo)) > 4:
    print(f"foo is of length {n}")

foo is of length 12

In the above snippet, the len() function is called only once instead of twice. More generally, you can assign values to variables on the fly without having to call the same functions more than once.

Important: the Python walrus operator := (officially known as the assignment expression operator) was introduced in Python 3.8 (PEP 572). This means that once you use it, your code will no longer run on earlier Python versions.

Note: the nickname “walrus operator” comes from its resemblance to the eyes and tusks of a walrus.

More examples

python> [name for _name in ["OLIVIER", "OLIVIA", "OLIVER"] if "vi" in (name := _name.lower())]
['olivier', 'olivia']

In this example, we iterate through the list, storing each item in the temporary _name variable. Then, we apply the lower() string method to _name, turning the uppercase string into lowercase. Next, we store the lowercase value in the name variable using the walrus operator. Finally, the predicate filters the values to keep only the names containing “vi”.

The alternative without the walrus operator would have been:

python> [name.lower() for name in ["OLIVIER", "OLIVIA", "OLIVER"] if "vi" in name.lower()]
['olivier', 'olivia']

As you can see, the above snippet is less “optimized” since it calls the lower() method twice on the same object.

You can also use the walrus operator without performing any filtering. The following code works like a charm:

python> [name for _name in ["OLIVIER", "OLIVIA", "OLIVER"] if (name := _name.lower())]
['olivier', 'olivia', 'oliver']

However, this is highly counter-intuitive and invites errors. The presence of if suggests that a conditional check and filtering are in place, which is not the case. The codebase gains nothing from such a design. The following snippet is clearer and produces the same outcome here:

python> [name.lower() for name in ["OLIVIER", "OLIVIA", "OLIVER"]]
['olivier', 'olivia', 'oliver']

Note: developers tend to be very smart people, and we sometimes like to show off our smarts by demonstrating our mental juggling abilities. Resist the temptation of writing complex code. Programming is a social activity (think of all the collaborative open source projects). Be professional and keep the nerdy, brilliant workarounds for your personal projects. Follow the KISS principle: Keep It Simple, Stupid. Clarity is all that matters: you want to maximize the discoverability of your codebase.

Last but not least, you can also use the walrus operator inside while loop conditions:

while (answer := int(input())) != 42:
    print("This is not the Answer to the Ultimate Question of Life, the Universe, and Everything.")

> python script.py
7
This is not the Answer to the Ultimate Question of Life, the Universe, and Everything.
3
This is not the Answer to the Ultimate Question of Life, the Universe, and Everything.
42
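The same pattern is useful when reading a stream chunk by chunk. The loop stops as soon as read() returns an empty bytes object, which is falsy. In this sketch, an in-memory io.BytesIO stands in for a real file, and the 4-byte chunk size is arbitrary.

```python
import io

stream = io.BytesIO(b"hello world!")
chunks = []

# read() returns b"" at the end of the stream, which ends the loop.
while (chunk := stream.read(4)):
    chunks.append(chunk)

print(chunks)  # [b'hell', b'o wo', b'rld!']
```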

Note: in the above examples we have used list comprehensions. This article (in progress) explains this construct in more detail.

One more thing

You cannot do a plain assignment with the walrus operator, at least not that easily:

python> a := 42
  File "<stdin>", line 1
    a := 42
      ^^
SyntaxError: invalid syntax

For the above code snippet to work, you need to enclose the assignment expression in parentheses:

python> a = 42
python> (a := 18)
18
python> print(a)
18

Who told you that Software Developers were not sentimental? ❤️

Do not return null values

It is always a bad idea to write a method returning a null value because it requires the client to remember to check for null:

  1. it foists problems upon the calling code, postponing conditional checks and creating work for later that one might forget to do.
  2. it invites errors: all it takes is one missing None check to send your application spinning out of control.
  3. if you are still tempted to return None from a method, consider raising an exception or returning a special-case object instead.

Note: this works well for methods returning collection types (e.g. list, set, str…), as you can simply return an empty one. Returning an “empty” custom object, such as an instance of your own class, is hairier. Only in such edge cases can you return null.

from config.constants import REGISTERED_ITEMS

def retrieve_registered_item_information(item):
    if item is not None:
        item_id = item.get_id()
        if item_id is not None:
            return REGISTERED_ITEMS.get_item(item_id).get_info()

As a demonstration of our second point above, did you notice that there is no None check in the last line? What if the item cannot be found among the REGISTERED_ITEMS and you still try to call get_info() on a None value? You will get an error (an AttributeError) for sure.
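Applying point 3 above, here is a minimal sketch of the two alternatives: returning a special-case object, or raising an exception. It assumes a plain dict registry and an ItemInfo dataclass. Both, like the function and exception names, are hypothetical stand-ins for the original config.constants.REGISTERED_ITEMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemInfo:
    name: str
    description: str


# Special-case object returned instead of None when an item is unknown.
UNKNOWN_ITEM_INFO = ItemInfo(name="unknown", description="")

# Hypothetical registry standing in for config.constants.REGISTERED_ITEMS.
REGISTERED_ITEMS = {
    "42": ItemInfo(name="towel", description="Always know where it is."),
}


class ItemNotRegisteredError(LookupError):
    """Raised when an item id is not in the registry."""


def get_item_info(item_id: str) -> ItemInfo:
    """Option 1: always return an ItemInfo, falling back to the special case."""
    return REGISTERED_ITEMS.get(item_id, UNKNOWN_ITEM_INFO)


def get_item_info_or_raise(item_id: str) -> ItemInfo:
    """Option 2: fail loudly instead of handing None to the caller."""
    try:
        return REGISTERED_ITEMS[item_id]
    except KeyError:
        raise ItemNotRegisteredError(f"item {item_id!r} is not registered") from None


print(get_item_info("42").name)  # towel
print(get_item_info("7").name)   # unknown
```

With either option, the caller never has to remember a None check. It either gets a usable object or an explicit, named error.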

Example

Suppose you have the following structured JSON object and want to extract its id:

{
    "id": "42",
    "name": "some_name",
    "data": [...]
}

import logging

logger = logging.getLogger(__name__)


def get_object_id(obj: dict) -> str | None:
    candidate_id = None
    try:
        candidate_id = obj["id"]
    except KeyError as message:
        logger.error(f"Error retrieving the object id: {message}")
    return candidate_id

The above method is not ideal:

  1. You have a mixed type between str and None. You do not want your method to be inconsistent; you want it to be type-consistent instead.
  2. Some Python versions do not accept type annotations with the | operator: str | None is only valid at runtime from Python 3.10 onwards.

Instead, always favour the following accessor method as a nice remedy:

import logging

logger = logging.getLogger(__name__)


def get_object_id(obj: dict) -> str:
    candidate_id = ""
    try:
        candidate_id = obj["id"]
    except KeyError as message:
        logger.error(f"Error retrieving the object id: {message}")
    return candidate_id

There are multiple reasons and benefits for that:

  1. You remove the ambiguity: the returned type is consistent. Whatever happens, you always return a string value.
  2. It avoids the type annotation error you might otherwise get on older Python versions.
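For this particular case, dict.get() with a default value gives the same type-consistent behaviour in a single call. Below is a sketch that assumes you still want the error logged. logging.getLogger(__name__) is one common way to obtain the logger object mentioned in the notes below.

```python
import logging

logger = logging.getLogger(__name__)


def get_object_id(obj: dict) -> str:
    """Return the object's id, or an empty string if it is missing."""
    if "id" not in obj:
        logger.error("Error retrieving the object id: 'id' key is missing")
    # get() returns the default instead of raising KeyError.
    return obj.get("id", "")


print(get_object_id({"id": "42", "name": "some_name"}))  # 42
print(repr(get_object_id({"name": "some_name"})))        # ''
```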

Caution: last but not least, note that a Python function implicitly returns None if it reaches its end without a return statement, or if it uses a bare return. The three following code snippets are equivalent:

def foo():
    pass

def faa():
    return

def fii():
    return None

You can try it yourself:

python> result = foo()
python> print(result)
None

The advantage of explicitly using return is that it exits the function immediately, a bit like break exits a loop: any code after it is never executed.

def fuu():
    return 42
    a = 5
    return a

python> fuu()
42

Notes:

  • We have used a logger object to handle logs for us. More on python logging in this article (in progress).
  • We have prefixed the names of our accessor methods using get. More on how to find meaningful names for your variables in this article (in progress).

As a conclusion, you do not want to rock the boat. Be careful when returning a null value and always favour other options. Your code will be way cleaner and you will minimize the chance of getting an error 🛶

What is the difference between the random choices and random sample Python functions?

The main difference is:

  • The random choices function draws random elements from the sequence, possibly including duplicates, since the same element in the sequence can be drawn multiple times (with replacement).

  • The random sample function draws random elements from the sequence without duplicates. Once an element is picked, it is no longer a candidate for the next draws (without replacement). The original sequence itself is not modified.

Imagine a lottery game. In the first case, we put the ball back into the drawing lot, while in the second case, the ball is definitively removed from the set.

Note: without duplicates does not mean the same value cannot appear several times in the resulting sample. If several balls hold the same value and these balls are drawn, the repetition will also show up in the result. But the same ball, once drawn, can never be drawn again.

Examples: choices vs. sample

import random

pool = "abcd"
print("".join(random.choices(pool, k=5)))

In the above example, we extract 5 random elements from the pool. Once drawn, a value is put back into the pool so it can possibly be picked again, which gives for instance:

addaa

Note: since you have a replacement, you can extract more elements than the population originally contains. Hence k=5 while the sequence only contains 4 elements.

import random

population = "abcd"
print("".join(random.sample(population, k=4)))

In the example above, we ask the random module to draw 4 elements from the population without replacement. This means that once an element is picked, it cannot be picked again:

abdc

Note: since there is no replacement, k cannot be greater than the length of your sequence. Should you try, a ValueError: Sample larger than population or is negative will be raised.
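The behaviours described above can be checked in a few lines. The sketch uses a seeded random.Random instance so runs are reproducible (the seed 42 is arbitrary). It shows that sample() keeps repeated values that exist in the population, as in the note about identical balls. It also catches the ValueError raised when k is too large.

```python
import random

rng = random.Random(42)  # seeded generator: same draws on every run

# With replacement: k may exceed the population size.
drawn = rng.choices("abcd", k=10)
print(len(drawn))  # 10

# Without replacement: each position ("ball") is used at most once,
# but a value present twice in the population can appear twice.
permutation = rng.sample("aab", k=3)
print(sorted(permutation))  # ['a', 'a', 'b']

# Asking for more elements than available raises ValueError.
try:
    rng.sample("abcd", k=5)
except ValueError as error:
    print(error)  # Sample larger than population or is negative
```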

Use-case Example: Alphanumeric Generation

To generate a string of 32 random alphanumeric characters:

import random
import string

population = string.ascii_letters + string.digits
result = "".join(random.choices(population, k=32))
print(result)

coqHR7HrsCsKcvGvmlClJI1OnWZjvwH9

Notes:

  • It is always a very bad idea to use the random module to generate passwords or other sensitive secrets. It relies on a pseudo-random number generator that is not cryptographically secure, and the official documentation points to the secrets module for such use cases.
  • Worse than that, never write your own random function, as it is prone to vulnerabilities. Rather, use a public and scientifically proven method. This is the beauty of maths: you can generate indecipherable secrets even though everyone knows the generating method.
  • Likewise, never base the robustness of your encryption protocol on the secrecy of the generation method.
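For anything security-related, the standard library’s secrets module is the documented alternative to random. Here is a minimal sketch of the same 32-character alphanumeric string built with secrets.choice(). The function name generate_token is illustrative.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def generate_token(length: int = 32) -> str:
    """Return a random alphanumeric string drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


token = generate_token()
print(len(token))  # 32
```

If a URL-safe token is acceptable rather than a strictly alphanumeric one, secrets.token_urlsafe() does the job in a single call.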

At least, those are the (rare) few takeaways I still remember from my Master of Science in Computer Science, specialized in Cybersecurity.

And you, what is your score on Root Me? 🏴‍☠️

Why use snake case

Snake case is a writing style in which each space is replaced by an underscore and letters are written in lowercase:

this_is_what_the_snake_case_style_looks_like

The snake_case format is mandatory for some names (e.g. importable module names, which must be valid Python identifiers). It is therefore easier to stick to it and generalise its usage throughout.

It is important that you use snake case because your Python code might simply not work otherwise:

from helpers.math-module import incr

def test_incr() -> None:
    result = incr(42)
    print(result == 43)

if __name__ == "__main__":
    test_incr()

> python main.py
File "path/to/snake_case_project/main.py", line 1
from helpers.math-module import incr
                 ^
SyntaxError: invalid syntax
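Why does the hyphen break the import? The name in an import statement must be a valid Python identifier, and the hyphen is read as a minus sign. str.isidentifier() combined with keyword.iskeyword() is a quick way to check a candidate module name. The helper is_valid_module_name is hypothetical.

```python
import keyword


def is_valid_module_name(name: str) -> bool:
    """True if `name` can be used as a module name in an import statement."""
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["math-module", "math_module", "class", "2fast"]:
    print(candidate, is_valid_module_name(candidate))
```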

Instead, rename the module and use the following structure:

snake_case_project/
├── helpers/
│   ├── __init__.py
│   └── math_module.py
└── main.py

from helpers.math_module import incr

def test_incr() -> None:
    result = incr(42)
    print(result == 43)

if __name__ == "__main__":
    test_incr()

> python main.py
True

You have to admit that for a language like Python, snake_case is rather fitting! 🐍

What is python __init__.py file for?

The Python __init__.py file serves two main functions:

  1. It is used to label a directory as a Python package, making it visible so that other Python files can reuse the nested resources (e.g. the incr function defined inside helpers/file1.py):

    from helpers.file1 import incr
    
    result = incr(42)
    assert result == 43
    

    A side effect is that, with some not-recommended workarounds, developers do not have to care about a function’s location in your package hierarchy:

        helpers/
        ├── __init__.py
        ├── file1.py
        ├── file2.py
        ├── ...
        └── fileN.py
    

    For that, simply fill the __init__.py file with the following content:

    from .file1 import *
    from .file2 import *
    ...
    from .fileN import *
    

    Therefore, even though it is always good practice to mention the source explicitly, they can simply use:

    from helpers import incr
    
    result = incr(42)
    assert result == 43
    
  2. It is used to define variables or to initialise objects such as loggers at the package level, at import time (to make them accessible across the whole package):

    from helpers import MY_VAR
    
    print(MY_VAR)
    

Still blurry? Here is an easy example to make it clear:

First, let’s set some context

You have the following project structure:

playground_packages/
├── helpers/
│   └── utils.py
└── main.py

The utils.py file contains:

def incr(n: list[float]) -> list[float]:
    return [x+1 for x in n]

if __name__ == "__main__":
    pass

Note: you could also have used map and a lambda function instead. However, this is a nice opportunity to show a list comprehension. The alternative version would have looked like:

list(map(lambda x: x+1, n))

The main.py file is looking like the following:

from helpers.utils import incr

def main() -> None:
    result = incr([1,2,3,4,5])
    print(result)

if __name__ == "__main__":
    main()

Notes:

  • Why we haven’t used import helpers.utils or import * is explained here (to do).
  • The if __name__ == "__main__" conditional statement is explained here (to do).

__init__.py to label a folder as Python package

Jumping back to our example, if you try to run the code with the current configuration, you will get the following error:

> python main.py
Traceback (most recent call last):
File "path/to/playground_package/main.py", line 1, in <module>
    from helpers.utils import incr
ModuleNotFoundError: No module named 'helpers'

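This is because Python does not yet recognise the helpers directory as a regular package, i.e. a folder that contains an __init__.py file. Caveat: since Python 3.3, PEP 420 namespace packages allow importing a directory without __init__.py. Depending on your Python version and how you run the code, you may therefore not get this error. Adding __init__.py remains the explicit way to declare a regular package.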

Simply change the current structure to the following:

playground_packages/
├── helpers/
│   ├── __init__.py
│   └── utils.py
└── main.py

Now, if you try again, it will succeed:

> python main.py
[2, 3, 4, 5, 6]

The main take-away is:

If you want to split up your code into different folders and files (to make it more readable and easier to debug), you should create an __init__.py file in each folder. Each folder then becomes a package that you can refer to in your code using import.

__init__.py to define global variables

In our previous example, the __init__.py file is empty. We can edit it, adding the following line:

MY_LIST = [2,4,6,8,10]

This variable is then accessible from the main.py file:

from helpers import MY_LIST
from helpers.utils import incr

def main() -> None:
    result = incr(MY_LIST)
    print(result)

if __name__ == "__main__":
    main()

> python main.py
[3, 5, 7, 9, 11]
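To reproduce this section in a single self-contained script (handy in a test), the sketch below writes the same helpers package into a temporary directory. It then puts that directory on sys.path, the way running python main.py from the project folder would, and imports from it. Generating the files on the fly is only an illustration device; in a real project they simply live on disk.

```python
import importlib
import sys
import tempfile
from pathlib import Path

INIT_CODE = "MY_LIST = [2, 4, 6, 8, 10]\n"
UTILS_CODE = (
    "def incr(n: list[float]) -> list[float]:\n"
    "    return [x + 1 for x in n]\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    package = root / "helpers"
    package.mkdir()
    (package / "__init__.py").write_text(INIT_CODE)  # executed on first import
    (package / "utils.py").write_text(UTILS_CODE)

    # Make the temporary project importable, like running main.py from its folder.
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        from helpers import MY_LIST
        from helpers.utils import incr

        result = incr(MY_LIST)
    finally:
        sys.path.remove(str(root))

print(result)  # [3, 5, 7, 9, 11]
```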

Note: it is better to define variables in a config.py or constants.py file rather than in an __init__.py file. However, __init__.py comes in handy for instantiating objects such as a logger or a dynaconf configuration. More on that in another article.

You are now ready to fit your code together like Russian dolls 🪆

Use Poetry as Python Package Manager

Installing poetry is super easy. On macOS, simply run:

brew install poetry

Now, let’s have a look at how to use it.

Poetry Cheat Sheet

This section gathers the poetry commands you will always need. Save the link so you can refer back to it later.

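poetry init                        # create a pyproject.toml interactively
poetry install                     # install the dependencies (exact versions from poetry.lock if present)
poetry update                      # update dependencies within their constraints and rewrite poetry.lock
poetry add <your-python-package>   # add a dependency to pyproject.toml and install it
poetry run <your-command>          # run a command inside the project's virtual environment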

Getting started with an example

  1. Create your Python project and move into its directory:

    mkdir playground-poetry && cd playground-poetry
    
  2. Initialise poetry. You will be prompted to fill in the following configuration fields:

    > poetry init
    Package name [playground-poetry]:
    Version [0.1.0]:
    Description []:
    Author [None, n to skip]:
    License []:
    Compatible Python versions [^3.10]:
    Would you like to define your main dependencies interactively? (yes/no) [yes]
    Would you like to define your development dependencies interactively? (yes/no) [yes]
    Do you confirm generation? (yes/no) [yes]
    
  3. This will generate the pyproject.toml configuration file:

    [tool.poetry]
    name = "playground-poetry"
    version = "0.1.0"
    description = "A primer on poetry."
    authors = ["John Doe <john.doe@gmail.com>"]
    
    [tool.poetry.dependencies]
    python = "^3.10"
    
    [tool.poetry.dev-dependencies]
    
    [build-system]
    requires = ["poetry-core>=1.0.0"]
    build-backend = "poetry.core.masonry.api"
    

After generation, your project’s architecture should look like the following:

playground-poetry
└── pyproject.toml

Add a package

Simply use the poetry add command, e.g.:

poetry add black

In our case, we want to install the black Python code formatter.

Note: for more general codebase formatting, I recommend super-linter.

You will see that our pyproject.toml configuration file has been updated: it now contains a reference to the black package:

[tool.poetry]
name = "playground-poetry"
version = "0.1.0"
description = "A primer on poetry."
authors = ["John Doe <john.doe@gmail.com>"]

[tool.poetry.dependencies]
python = "^3.10"
black = "^22.10.0"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

Note: wondering what the weird ^ sign stands for? In short, the caret constraint ^22.10.0 accepts any version from 22.10.0 up to, but excluding, 23.0.0. You will soon find an article about it.

You can check that black is indeed installed and accessible within the poetry virtual environment via:

> poetry run black --version
black, 22.10.0 (compiled: yes)
Python (CPython) 3.10.4

The pyproject.toml is not the only thing that has changed. If you have a look at the project’s architecture, you will see that it now contains an additional poetry.lock file:

playground-poetry
├── poetry.lock
└── pyproject.toml

Note: poetry.lock records the exact version of every resolved dependency, a bit like Terraform records its state. If you are new to poetry, this might be too much detail right now; an article about poetry.lock will follow soon!

Run Python code in a poetry environment

Imagine that you now have Python code that requires a couple of dependencies to run (e.g. black, pandas, etc.). E.g.:

"""
A simple module containing maths methods.
"""


def add(number_1: float, number_2: float) -> float:
    """
    Add the numbers.
    Args:
        number_1 (float): the first number.
        number_2 (float): the second number.
    Returns:
        float: the sum of both numbers.

    >>> add(-2, 1)
    -1
    >>> add(42, 0)
    42
    """
    return number_1 + number_2


def main() -> None:
    """
    Main function.
    """

    res = add(4, 7)
    print(res)


if __name__ == "__main__":
    main()

with the following project structure:

playground-poetry
├── playground_poetry
│   ├── __init__.py
│   └── main.py
├── poetry.lock
└── pyproject.toml

Note: fair enough, our code does not really need those dependencies at this point. Let’s say that this is just an extract and that other parts of the code do use pandas.

You have those dependencies installed on your poetry environment (you can see them on the pyproject.toml dependencies section).

You then need to execute your Python code under the umbrella of this poetry virtual environment.

This is done using poetry run python <your-python-file>.

In our example:

> poetry run python playground_poetry/main.py
11

Note: we recommend a similar layout for your projects, with the package folder named in snake_case, as it makes developing Python packages easier:

your-project-name
└── your_project_name
    ├── __init__.py
    └── main.py

Get started on a cloned poetry project

Now let’s say you inherit an existing poetry project that already has pyproject.toml and poetry.lock files.

The first time, you need to create the virtual environment and install the dependencies declared in the pyproject.toml file:

poetry install

If a poetry.lock file exists, this installs the exact versions it pins. Otherwise, poetry resolves the dependencies from pyproject.toml and creates the poetry.lock file.

You can also update the dependencies to the latest versions allowed by the pyproject.toml constraints, which rewrites the poetry.lock file:

poetry update

Note: more info https://python-poetry.org/docs/cli/.

Go the extra mile using poetry run and a Makefile

Let’s improve our example project.

Have you noticed that to run our main.py file, you need to explicitly state the whole path:

poetry run python playground_poetry/main.py

You can do better by editing the pyproject.toml file as follows:

[tool.poetry]
name = "playground-poetry"
version = "0.1.0"
description = "A primer on poetry."
authors = ["John Doe <john.doe@gmail.com>"]
packages = [{include="playground_poetry"}]

[tool.poetry.dependencies]
python = "^3.10"
black = "^22.10.0"
pandas = "^1.5.1"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

[tool.poetry.scripts]
main = "playground_poetry.main:main"

Then run poetry install once more so that the new script gets installed in the virtual environment. From now on, you can do the same thing as before, only shorter:

> poetry run main
11

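Note: this works thanks to three things. The packages line declares the package. The __init__.py file nested in playground_poetry makes main.py importable. The [tool.poetry.scripts] section maps the main command to the main() function of playground_poetry/main.py.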
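The value "playground_poetry.main:main" follows the usual entry-point convention: a module path, a colon, then the object to call. The generated main command roughly imports that module and calls that function. The sketch below only illustrates this convention with importlib. It is not Poetry’s own code, resolve_entry_point is a hypothetical name, and it is exercised on standard-library targets so it runs anywhere.

```python
import importlib
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Turn a 'package.module:attribute' string into the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj: Any = importlib.import_module(module_name)
    if not attr_path:
        return obj  # no ':' part: return the module itself
    # The part after ':' may be dotted, e.g. 'module:Class.method'.
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj


sqrt = resolve_entry_point("math:sqrt")
print(sqrt(16))  # 4.0
```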

But that’s not all: we can do even better. Let’s make this command even shorter, saving it under a Makefile command.

In our example project, let’s add a Makefile with the following lines (make requires the recipe line to be indented with a tab, not with spaces):

.PHONY: main
main:
	poetry run main

The final structure should look like the following:

playground-poetry
├── playground_poetry
│   ├── __init__.py
│   └── main.py
├── Makefile
├── poetry.lock
└── pyproject.toml

Finally, you can run the main function using:

> make main
poetry run main
11

And you thought our main job was to “write” code? Less is more! 😇

What you have learned

  • You can create a poetry environment from scratch to manage your python dependencies.
  • You can reuse an existing one.
  • You can scale and automate using Makefile commands.
  • You got a short primer on Software Development Standardization and Best Practices.