Python positional-only parameters

Positional-only parameters were introduced in Python 3.8. They let you specify which parameters must be passed by position, i.e. which parameters cannot be given as keyword arguments. The syntax uses a / marker in the parameter list: every parameter on the left side of the / becomes positional-only.

def f(a, b, /, c): ...

The above function only accepts calls of the following form:

python> f(1, 2, 3)
python> f(1, 2, c=3)

Note: c can be given either as a positional or as a keyword argument.

On the contrary, the following calls raise a TypeError: f() got some positional-only arguments passed as keyword arguments, because a and b, being on the left of /, cannot be passed by keyword:

python> f(1, b=2, c=3)
python> f(a=1, b=2, c=3)

Note: calls such as f(1, b=2, 3) or f(a=1, b=2, 3) fail even earlier with a SyntaxError, since Python never allows a positional argument after a keyword argument.
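The accepted and rejected call shapes can be checked in a short script. This is only a minimal sketch: the helper try_call is a hypothetical name introduced here for illustration.

```python
def f(a, b, /, c):
    """a and b are positional-only; c may be positional or keyword."""
    return a, b, c


def try_call(**kwargs):
    """Hypothetical helper: call f(1, ...) with keywords and report the outcome."""
    try:
        return f(1, **kwargs)
    except TypeError as error:
        return f"TypeError: {error}"


ok_positional = f(1, 2, 3)     # everything by position
ok_keyword = f(1, 2, c=3)      # c by keyword is allowed
rejected = try_call(b=2, c=3)  # b by keyword is refused

print(ok_positional)  # (1, 2, 3)
print(ok_keyword)     # (1, 2, 3)
print(rejected)
```

The last line prints the TypeError message instead of crashing, which makes it easy to see which parameter was refused.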

Use cases

The / positional-only parameter syntax is helpful in the following use-cases:

  • It precludes keyword arguments from being used when the parameter’s name is unnecessary, confusing or misleading, e.g. len(obj=my_list). As stated in the documentation, the obj keyword argument would here impair readability.

  • It allows the name of the parameter to be changed in the future without breaking client code, since callers can never refer to it by name. As a bonus, the same name remains free to be passed as a keyword and captured by **kwargs:

def transform(name, /, **kwargs):
    print(name.upper())

In the above snippet, name is always filled by position, while a name=... keyword passed by the caller lands in kwargs and can be read with kwargs.get("name"). The two never clash.
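To make the distinction concrete, here is a small sketch based on the transform function above. It returns its values instead of printing them, which is an adaptation made for this illustration only.

```python
def transform(name, /, **kwargs):
    """name is positional-only, so a 'name' keyword ends up in kwargs."""
    return name.upper(), kwargs


# The positional value and the keyword of the same name coexist peacefully
result = transform("olivier", name="shadow")
print(result)  # ('OLIVIER', {'name': 'shadow'})
```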

Last but not least, it can also be used to protect pre-set arguments in functions that still need to accept arbitrary keyword arguments (via kwargs):

def initialize(name="unknown", /, **kwargs):
    print(name, kwargs)

The name parameter is protected and cannot be overwritten by keyword: a name=... keyword passed by the caller is simply captured by kwargs instead:

python> initialize(name="olivier")
unknown {'name': 'olivier'}

Notes:

  • Parameters on the left side of / are positional-only; parameters on the right side of a bare * (or of *args) are keyword-only.
  • In a call, Python never allows positional arguments after keyword arguments; this is a general syntax rule, independent of / and *.
  • *args collects any extra positional arguments into a tuple.
  • **kwargs collects any extra keyword arguments into a dict.
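The notes above can be combined in a single signature. The function describe and its parameter names are hypothetical and only serve to show where each argument ends up.

```python
def describe(pos_only, /, either, *args, kw_only, **kwargs):
    """Return a mapping showing how each argument was captured."""
    return {
        "pos_only": pos_only,  # left of /: position only
        "either": either,      # between / and *args: position or keyword
        "args": args,          # extra positional arguments, as a tuple
        "kw_only": kw_only,    # right of *args: keyword only
        "kwargs": kwargs,      # extra keyword arguments, as a dict
    }


result = describe(1, 2, 3, 4, kw_only=5, extra=6)
print(result)
```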

Python Pickle Serialization

pickle allows you to serialize and de-serialize Python objects to save them into a file for future use. You can then read this file and extract the stored Python objects, de-serializing them so they can be integrated back into the code’s logic.

You just need the two basic commands: pickle.dump(object, file) and pickle.load(file).

Below is a round-trip example:

import pickle

FILENAME = "tmp.pickle"
original_list = ["banana", "apple", "pear"]

with open(FILENAME, "wb") as file:
    pickle.dump(original_list, file)  # the with block closes the file for us

with open(FILENAME, "rb") as file:
    retrieved_list = pickle.load(file)

print(retrieved_list) # ['banana', 'apple', 'pear']
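When no file is needed, the same round trip can be done in memory with pickle.dumps() and pickle.loads(), which work on bytes. A minimal sketch follows; the sample dictionary is made up for the illustration. Keep in mind that unpickling data from an untrusted source is unsafe, as the pickle documentation itself warns.

```python
import pickle

original = {"fruits": ["banana", "apple", "pear"], "count": 3}

payload = pickle.dumps(original)  # serialize to a bytes object
restored = pickle.loads(payload)  # de-serialize back into a Python object

print(type(payload).__name__)  # bytes
print(restored == original)    # True: same content
print(restored is original)    # False: it is a fresh copy
```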

Python Itertools Cycle

The itertools.cycle() function is a nice way to iterate through an iterable, e.g. a list, indefinitely. When all the elements are exhausted, they are read again from the beginning.

from itertools import cycle

duty_persons = ["Olivier", "David", "Goran",]
duty_persons_cycle = cycle(duty_persons)

for _ in range(6):
    print(next(duty_persons_cycle))

The above snippet prints the following output:

Olivier
David
Goran
Olivier
David
Goran

As the above example suggests, cycle() is quite helpful in a couple of situations:

  • establishing a rolling list of duty-persons, rotating on a regular basis;
  • cycling between hosts when scraping the web, to get around anti-bot policies;
  • any use-case you might think of…
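Because cycle() never stops on its own, it is usually paired with something that bounds it. The sketch below shows two possible ways, zip() and itertools.islice(); the weeks list is an assumption made up for the example.

```python
from itertools import cycle, islice

duty_persons = ["Olivier", "David", "Goran"]
weeks = [f"week {n}" for n in range(1, 6)]  # hypothetical planning horizon

# zip() stops at the shortest iterable, so the infinite cycle is safe here
roster = list(zip(weeks, cycle(duty_persons)))

# islice() takes a bounded number of items from the infinite iterator
first_four = list(islice(cycle(duty_persons), 4))

print(roster)
print(first_four)  # ['Olivier', 'David', 'Goran', 'Olivier']
```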

Python uuid

uuid is a Python module providing immutable UUID (Universally Unique IDentifier, as specified in RFC 4122) objects and functions to generate them, e.g. uuid4().

The generated UUIDs are, for all practical purposes, unique: uuid1() builds them from the current time and the host's hardware address, while uuid4() relies on random numbers.

python> from uuid import uuid4
python> uuid_object = uuid4()
python> uuid_object
UUID('ce8d1fee-bc31-406b-8690-94c01caabcb6')

python> str(uuid_object)
'ce8d1fee-bc31-406b-8690-94c01caabcb6'

Those can be used to generate random strings that serve as unique identifiers within a given namespace, e.g. if you want to generate temporary filenames on the fly under a specific folder:

for _ in range(10):
    file = open(f"/tmp/{uuid4()}.txt", "a")
    file.write("hello world!")
    file.close()

Note: although the above snippet works like a charm, it is better to use open() as a context manager (with) to make sure close() is always called, should an error occur during the write operation.
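A possible context-manager version of the loop is sketched below. To stay self-contained it writes into a throwaway directory created with tempfile.TemporaryDirectory() rather than /tmp; that choice, like the created list, is an assumption of this example.

```python
import tempfile
from pathlib import Path
from uuid import uuid4

with tempfile.TemporaryDirectory() as folder:
    created = []
    for _ in range(10):
        path = Path(folder) / f"{uuid4()}.txt"
        # The with block closes the file even if write() raises
        with open(path, "a") as file:
            file.write("hello world!")
        created.append(path.name)
    file_count = len(list(Path(folder).iterdir()))

print(file_count)  # 10
```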

Hazardous: uuid1() compromises privacy since it uses the network address of the computer to generate the unique ID, which can therefore be recovered from the identifier. To prevent this, prefer uuid4(), which is generated from random numbers. Note that uuid5() is not random either: it is derived deterministically from a namespace and a name.
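The difference between random and name-based identifiers can be observed directly. In this sketch, "example.org" is just an arbitrary name chosen for the illustration.

```python
from uuid import NAMESPACE_DNS, uuid4, uuid5

random_a = uuid4()
random_b = uuid4()

# uuid5() hashes a namespace and a name: same inputs, same UUID
named_a = uuid5(NAMESPACE_DNS, "example.org")
named_b = uuid5(NAMESPACE_DNS, "example.org")

print(random_a == random_b)               # False, barring an astronomically unlikely collision
print(named_a == named_b)                 # True
print(random_a.version, named_a.version)  # 4 5
```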

Note: to know more about URIs, URLs and URNs, see https://olivierbenard.fr/change-the-url-of-a-git-repository/.

Python extend vs. append

  • append() adds the given object at the end of the list. The length is incremented by exactly 1.
  • extend() takes an iterable and appends its elements one by one to the list. The length is incremented by the number of elements.

In more detail

The append() method is straightforward:

python> a = [1,2,3]
python> a.append(4)
python> a
[1, 2, 3, 4]

However, what happens if you need to add multiple elements at once? You could call append() several times, but would it work to pass the list of new elements directly as the argument? Let’s give it a try:

python> a = [1,2,3]
python> a.append([4,5])
python> a
[1, 2, 3, [4, 5]] # vs. expected [1, 2, 3, 4, 5]

The extend() method is designed to solve this problem:

python> b = [1,2,3]
python> b.extend([4,5])
python> b
[1, 2, 3, 4, 5]

Note: the += operator is equivalent to an extend() call.

python> a = [1,2,3]
python> a += [4,5]
python> a
[1, 2, 3, 4, 5]
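One subtlety worth checking: += extends the existing list in place, whereas a = a + [...] builds a brand new list. The sketch below makes this visible through a second reference to the same list; the variable names are arbitrary.

```python
a = [1, 2, 3]
alias = a          # second name for the same list object
a += [4, 5]        # in place, like a.extend([4, 5])
print(alias)       # [1, 2, 3, 4, 5]

b = [1, 2, 3]
other = b
b = b + [4, 5]     # creates a new list and rebinds b
print(other)       # [1, 2, 3]
print(b)           # [1, 2, 3, 4, 5]

c = [1, 2, 3]
c.extend("ab")     # extend() accepts any iterable, here a string
print(c)           # [1, 2, 3, 'a', 'b']
```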

Python Walrus Operator

The walrus operator := is a nice way to avoid repeating function calls and statements. It can simplify your code. Compare the following two code snippets:

grades = [1, 12, 14, 17, 5, 8, 20]

stats = {
    'nb_grades': len(grades),
    'sum': sum(grades),
    'mean': sum(grades)/len(grades)
}

grades = [1, 12, 14, 17, 5, 8, 20]

stats = {
    'nb_grades': (nb_grades := len(grades)),
    'sum': (sum_grades := sum(grades)),
    'mean': sum_grades/nb_grades
}

Note: The parentheses around each assignment expression are mandatory here.

Same goes for function calls:

foo = "hello world!"
if (n := len(foo)) > 4:
    print(f"foo is of length {n}")

foo is of length 12

In the above snippet, len() is only called once instead of twice. More generally, you can assign values to variables on the fly without having to call the same function more than once.

Important: The Python walrus operator := (officially known as the assignment expression operator) was introduced in Python 3.8. This means that, once you use it, your code no longer runs on earlier Python versions.

Note: The affectionate “walrus operator” nickname is due to its resemblance to the eyes and tusks of a walrus.

More examples

python> [name for _name in ["OLIVIER", "OLIVIA", "OLIVER"] if "vi" in (name := _name.lower())]
['olivier', 'olivia']

In this example, we are iterating through the list, storing each item of the list into the temporary _name variable. Then, we apply the lower() string method on the _name object, turning the upper case string into lower case. Next, we store the lower case value into the name variable using the walrus operator. Finally, we filter the values using the predicate to only keep the names containing “vi”.

The alternative without the walrus operator would have been:

python> [name.lower() for name in ["OLIVIER", "OLIVIA", "OLIVER"] if "vi" in name.lower()]
['olivier', 'olivia']

As you can see, the above code snippet is less “optimized” as you call the lower() method twice on the same object.

You can also use the walrus operator without performing any filtering. The following code works like a charm:

python> [name for _name in ["OLIVIER", "OLIVIA", "OLIVER"] if (name := _name.lower())]
['olivier', 'olivia', 'oliver']

However, this is highly counter-intuitive and invites errors. The presence of if leads the reader to assume that there is a conditional check and filtering in place, which is not the case. The codebase does not benefit from such a design since the following snippet, on top of being clearer, is strictly equivalent in terms of outcome:

python> [name.lower() for name in ["OLIVIER", "OLIVIA", "OLIVER"]]
['olivier', 'olivia', 'oliver']

Note: Developers tend to be very smart people. We sometimes like to show off our smarts by demonstrating our mental juggling abilities. Resist the temptation of writing complex code. Programming is a social activity (e.g. all the collaborative open source projects). Be professional and keep the nerdy brilliant workarounds for your personal projects. Follow the KISS principle: Keep It Simple, Stupid. Clarity is all that matters. You want to maximize the codebase discoverability.

Last but not least, you can also use the walrus operator inside while conditional statements:

while (answer := int(input())) != 42:
    print("This is not the Answer to the Ultimate Question of Life, the Universe, and Everything.")

> python script.py
7
This is not the Answer to the Ultimate Question of Life, the Universe, and Everything.
3
This is not the Answer to the Ultimate Question of Life, the Universe, and Everything.
42
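The same while pattern is handy for reading data chunk by chunk until nothing is left. The sketch below uses an in-memory io.StringIO stream instead of input() so that it runs without user interaction; the sample text and chunk size are arbitrary.

```python
import io

stream = io.StringIO("hello walrus operator")
chunks = []

# read() returns an empty string at the end of the stream, which ends the loop
while (chunk := stream.read(8)):
    chunks.append(chunk)

print(chunks)  # ['hello wa', 'lrus ope', 'rator']
```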

Note: In the above examples we have used list comprehension. This article (in progress) explains this design in more detail.

One more thing

You cannot do a plain assignment with the walrus operator. At least, it is not that easy:

python> a := 42
  File "<stdin>", line 1
    a := 42
      ^^
SyntaxError: invalid syntax

For the above code snippet to work, you need to enclose the assignment expression in parentheses:

python> a = 42
python> (a := 18)
18
python> print(a)
18

Who told you that Software Developers were not sentimental? ❤️

Do not return null values

It is generally a bad idea to write a method returning a null value because it requires the client to remember to check for null:

  1. it foists problems upon the caller, postponing conditional checks and creating work for later that one might forget to implement.
  2. it invites errors; all it takes is one missing None check to send your application spinning out of control.
  3. if you are still tempted to return None from a method, consider raising an exception or returning a special-case object instead.

Note: this works well for methods returning iterable types, e.g. list, set, str…, as you can just return an empty one. Returning an empty custom object, such as an instance of a class, is hairier. In such edge cases only, you can return null.
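For custom objects, the special-case object mentioned in the third point can look like the sketch below. All names here (User, ANONYMOUS, USERS, find_user) are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str

    def greeting(self) -> str:
        return f"Hello, {self.name}!"


# Special-case object returned instead of None when the lookup fails
ANONYMOUS = User(name="guest")

USERS = {1: User("Olivier")}


def find_user(user_id: int) -> User:
    """Always return a User, so callers never need a None check."""
    return USERS.get(user_id, ANONYMOUS)


known = find_user(1).greeting()
unknown = find_user(99).greeting()
print(known)    # Hello, Olivier!
print(unknown)  # Hello, guest!
```

The caller can chain .greeting() safely in both cases, at the price of having to decide what a sensible default object looks like.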

from config.constants import REGISTERED_ITEMS

def retrieve_registered_item_information(item):
    if item is not None:
        item_id = item.get_id()
        if item_id is not None:
            return REGISTERED_ITEMS.get_item(item_id).get_info()

As a demonstration of our second point, did you notice that there is no null check in the last line? What if the item cannot be found among the REGISTERED_ITEMS and get_item() returns None, while you still try to call get_info() on it? You will get an AttributeError for sure.
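One way to avoid such an unchecked None is to raise an exception as soon as the lookup fails. The sketch below is self-contained and therefore replaces REGISTERED_ITEMS with a plain dictionary; REGISTRY and ItemNotRegisteredError are hypothetical names, not part of the original module.

```python
class ItemNotRegisteredError(LookupError):
    """Hypothetical exception raised when an item id is unknown."""


# Hypothetical stand-in for REGISTERED_ITEMS: item id -> information
REGISTRY = {"42": {"name": "some_name"}}


def retrieve_registered_item_information(item_id: str) -> dict:
    """Return the item information or raise; never return None."""
    try:
        return REGISTRY[item_id]
    except KeyError:
        raise ItemNotRegisteredError(
            f"No item registered under id {item_id!r}"
        ) from None


info = retrieve_registered_item_information("42")

try:
    retrieve_registered_item_information("7")
except ItemNotRegisteredError as error:
    failure = str(error)

print(info)     # {'name': 'some_name'}
print(failure)  # No item registered under id '7'
```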

Example

You have the following structured JSON object you want to extract the id from:

{
    "id": "42",
    "name": "some_name",
    "data": [...]
}

def get_object_id(object: dict) -> str | None:
    candidate_id = None
    try:
        candidate_id = object["id"]
    except KeyError as message:
        logger.error(f"Error retrieving the object id: {message}")
    return candidate_id

The above method is not ideal:

  1. You have a mixed return type, str or None. You want your method to be type-consistent instead.
  2. Older Python versions do not accept type annotations using the | operator; this syntax requires Python 3.10+.

Instead, always favour the following accessor method as a nice remedy:

def get_object_id(object: dict) -> str:
    candidate_id = ""
    try:
        candidate_id = object["id"]
    except KeyError as message:
        logger.error(f"Error retrieving the object id: {message}")
    return candidate_id

There are multiple benefits to that:

  1. You remove the return type ambiguity: whatever happens, you always return a string value.
  2. It removes the type annotation error you might otherwise get on older Python versions.
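For reference, here is a runnable variant of the accessor above. It assumes the logger comes from the standard logging module, and it renames the parameter to obj to avoid shadowing the built-in object; both are choices made for this sketch.

```python
import logging

logger = logging.getLogger(__name__)  # assumption: a standard library logger


def get_object_id(obj: dict) -> str:
    """Return the object's id, or an empty string when the key is missing."""
    try:
        return obj["id"]
    except KeyError as message:
        logger.error("Error retrieving the object id: %s", message)
        return ""


found = get_object_id({"id": "42", "name": "some_name"})
missing = get_object_id({"name": "some_name"})

print(repr(found))    # '42'
print(repr(missing))  # ''
```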

Caution: Last but not least, note that a Python function implicitly returns None whenever it ends without returning a value, whether you write a bare return statement or none at all. The three following code snippets are equivalent:

def foo():
    pass

def faa():
    return

def fii():
    return None

You can try it yourself:

python> result = foo()
python> print(result)
None

The advantage of explicitly using a return is that it acts like a break statement, ending the function immediately:

def fuu():
    return 42
    a = 5
    return a

python> fuu()
42

Notes:

  • We have used a logger object to handle logs for us. More on python logging in this article (in progress).
  • We have prefixed the names of our accessor methods using get. More on how to find meaningful names for your variables in this article (in progress).

As a conclusion, you do not want to rock the boat. Be careful when returning a null value and always favour other options. Your code will be way cleaner and you will minimize the chance of getting an error 🛶

What is the difference between the random.choices and random.sample Python functions?

The main difference is:

  • The random.choices function draws random elements from the sequence, possibly including duplicates since the same element in the sequence can be drawn multiple times (with replacement).

  • The random.sample function draws random elements from the sequence but without duplicates, since once elements are picked they are removed from the sequence to sample (without replacement).

Imagine a lottery game. In the first case, we put the ball back into the draw, while in the second case the ball is definitively removed from the set.

Note: without duplicates does not mean the same value cannot be seen several times in the resulting sampled sequence. If several balls hold the same value in the lot, and these balls are drawn, the occurrence will also be reflected in the result. But the same ball, once drawn, cannot be drawn ever again.

Examples: choices vs. sample

import random

pool = "abcd"
print("".join(random.choices(pool, k=5)))

In the above example we are extracting 5 random elements from the pool of elements to pick from. Once drawn, the value is put back in the pool so it can possibly be picked again:

addaa

Note: since you draw with replacement, you can extract more elements than the population originally contains. Hence k=5 while the sequence only contains 4 elements.

population = "abcd"
print("".join((random.sample(population, k=4))))

In the above example, we ask the sample function to draw 4 elements from the population without replacement. This means that once an element is picked, it is removed from the population:

abdc

Note: since you draw without replacement, k cannot be greater than the length of your sequence. Should you try, you will get a ValueError: Sample larger than population or is negative.
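Both behaviours, including the ValueError, can be verified with a short script. Since the draws are random, the sketch only prints properties that hold on every run.

```python
import random

pool = "abcd"

with_replacement = random.choices(pool, k=10)   # k may exceed len(pool)
without_replacement = random.sample(pool, k=4)  # k is capped at len(pool)

print(len(with_replacement))        # 10
print(sorted(without_replacement))  # ['a', 'b', 'c', 'd']

try:
    random.sample(pool, k=5)  # one more than the population size
except ValueError as error:
    message = str(error)

print(message)
```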

Use-case Example: Alphanumeric Generation

To generate a sequence of 32 random alphanumeric characters:

import random
import string

population = string.ascii_letters + string.digits
result = "".join(random.choices(population, k=32))
print(result)

coqHR7HrsCsKcvGvmlClJI1OnWZjvwH9

Notes:

  • It is always a very bad idea to use Python’s random module to generate passwords or other sensitive secrets, since it is a pseudo-random generator that is not designed for security purposes.
  • Worse than that, never write your own random function as it is prone to vulnerabilities. Rather use a public and scientifically proven method (this is the beauty of maths: being capable of generating indecipherable secrets, with the generating method known by all).
  • Even worse: never base the robustness of your encryption protocol on the secrecy of the generation method.

At least those are the (rare) few takeaways I still remember from my Master of Science in Computer Science specialized in Cybersecurity.
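For security-sensitive values, the standard library offers the secrets module, which draws from the operating system's cryptographically strong source of randomness. A minimal sketch of the alphanumeric generation with it, where the 32-character length is simply carried over from the example above:

```python
import secrets
import string

alphabet = string.ascii_letters + string.digits

# One secure draw per character
token = "".join(secrets.choice(alphabet) for _ in range(32))

# Ready-made helper producing a URL-safe text token from 32 random bytes
url_safe = secrets.token_urlsafe(32)

print(len(token))  # 32
print(token)
print(url_safe)
```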

And you, what is your score on Root Me? 🏴‍☠️

Why use snake case

Snake case is a writing style in which each space is replaced by an underscore and all letters are written in lowercase:

this_is_what_the_snake_case_style_looks_like

Since the snake_case format is mandatory for some objects, it is easier to stick to it and generalise its usage throughout.

It is important that you use snake case because your Python code might simply not work otherwise:

from helpers.math-module import incr

def test_incr() -> None:
    result = incr(42)
    print(result == 43)

if __name__ == "__main__":
    test_incr()

> python main.py
File "path/to/snake_case_project/main.py", line 1
from helpers.math-module import incr
                 ^
SyntaxError: invalid syntax

Instead, rename the module using snake case:

snake_case_project/
├── helpers/
│   ├── __init__.py
│   └── math_module.py
└── main.py

from helpers.math_module import incr

def test_incr() -> None:
    result = incr(42)
    print(result == 43)

if __name__ == "__main__":
    test_incr()

> python main.py
True
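The reason the hyphenated name fails is that every part of a dotted name in an import statement must be a valid Python identifier, and a hyphen is read as a minus sign. The str.isidentifier() method gives a quick way to check a candidate name; the sample names below are chosen for illustration.

```python
candidates = ["math_module", "math-module", "2fast", "snake_case_project"]

# Map each candidate module name to whether Python accepts it as an identifier
validity = {name: name.isidentifier() for name in candidates}

for name, is_valid in validity.items():
    print(f"{name}: {is_valid}")
```

Only math_module and snake_case_project pass the check: math-module contains a hyphen and 2fast starts with a digit.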

Admit that for a language like Python, the snake_case is rather well adapted! 🐍

What is the Python __init__.py file for?

The Python __init__.py file serves two main functions:

  1. It is used to mark a directory as a Python package to make it visible so other Python files can re-use the nested resources (e.g. the incr function defined inside helpers/file1.py):

    from helpers.file1 import incr
    
    result = incr(42)
    assert result == 43
    

    A side effect is that – with some not-recommended workarounds – developers do not have to care about the method’s location in your package hierarchy:

        helpers/
        ├── __init__.py
        ├── file1.py
        ├── file2.py
        ├── ...
        └── fileN.py
    

    For that, simply fill the __init__.py file with the following content:

    from .file1 import *
    from .file2 import *
    ...
    from .fileN import *
    

    Therefore, even though it is always good practice to explicitly mention the source, clients can simply use:

    from helpers import incr
    
    result = incr(42)
    assert result == 43
    
  2. It is used to define variables or to initialise objects like logging at the package level and at import time (to make them accessible at a global package level):

    from helpers import MY_VAR
    
    print(MY_VAR)
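Both roles can be tried out without touching your own project. The sketch below builds a throwaway package in a temporary directory, re-exports a function from __init__.py through an explicit relative import, and defines a package-level variable. The package name demo_helpers and its content are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    package = Path(root) / "demo_helpers"  # hypothetical package name
    package.mkdir()
    (package / "file1.py").write_text("def incr(n):\n    return n + 1\n")
    (package / "__init__.py").write_text(
        "from .file1 import incr  # explicit relative re-export\n"
        "MY_VAR = 'defined at package level'\n"
    )

    sys.path.insert(0, root)  # make the temporary folder importable
    importlib.invalidate_caches()
    try:
        demo_helpers = importlib.import_module("demo_helpers")
        result = demo_helpers.incr(42)  # re-exported by __init__.py
        my_var = demo_helpers.MY_VAR    # defined in __init__.py
    finally:
        sys.path.remove(root)

print(result)  # 43
print(my_var)  # defined at package level
```

Importing a single explicit name rather than using import * keeps the origin of incr easy to trace.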
    

Still blurry? Here is an easy example to help you understand:

First, let’s set some context

You have the following project structure:

playground_packages
├── helpers/
│   └── utils.py
└── main.py

The utils.py file contains:

def incr(n:list[float]) -> list[float]:
    return [x+1 for x in n]

if __name__ == "__main__":
    pass

Note: you could have also used map and lambda instead. However, this is a nice opportunity to showcase list comprehension. The alternative version would have looked like:

list(map(lambda x: x+1, n))

The main.py file looks like the following:

from helpers.utils import incr

def main() -> None:
    result = incr([1,2,3,4,5])
    print(result)

if __name__ == "__main__":
    main()

Notes:

  • Why we haven’t used import helpers.utils or import * is explained here (to do).
  • The if __name__ == "__main__" conditional statement is explained here (to do).

__init__.py to label a folder as Python package

Jumping back to our example, if you try to run the code with the current configuration, you will get the following error:

> python main.py
Traceback (most recent call last):
File "path/to/playground_package/main.py", line 1, in <module>
    from helpers.utils import incr
ModuleNotFoundError: No module named 'helpers'

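This is because the helpers directory is not recognised as a regular Python package. A regular package is a folder that contains an __init__.py file.

Note: since Python 3.3, a folder without __init__.py can still be imported as an implicit namespace package, so depending on your Python version and setup you may not see this error. Adding an __init__.py file remains the explicit way to declare a package.

Simply change our current structure to the following:

playground_packages
├── helpers/
│   ├── __init__.py
│   └── utils.py
└── main.py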

Now, if you try again, it will succeed:

> python main.py
[2, 3, 4, 5, 6]

The main take-away is:

If you want to split up your code into different folders and files (to make your code more readable and debuggable), you should create an __init__.py file under each folder so they are recognised as regular packages by Python and can therefore be used and referred to in your code using import.

__init__.py to define global variables

In our previous example, the __init__.py file is empty. We can edit it, adding the following line:

MY_LIST = [2,4,6,8,10]

This variable is accessible even from the main function:

from helpers import MY_LIST
from helpers.utils import incr

def main() -> None:
    result = incr(MY_LIST)
    print(result)

if __name__ == "__main__":
    main()

> python main.py
[3, 5, 7, 9, 11]

Note: it is better to define variables in a config.py or constants.py file rather than in an __init__.py file. However, __init__.py comes in handy when it comes to instantiating objects such as logging or dynaconf. More on that will follow in another article.

You are now ready to fit your code together like Russian dolls 🪆