Factorize your pytest functions using the parameterized decorator.

The parameterized decorator is a convenient way to factorize your Python test functions, avoid duplicates in your test code and help you stick to the DRY (Don’t Repeat Yourself) principle.

Note: you can use it after having installed the plugin via pip install parameterized.

Let’s demonstrate this with a quick and easy example. Let’s assume you have a function that returns the sum of the elements within a list:

def sum_list_elements(l):
    return sum(l)

You want to test the behavior of your function using pytest. In your test strategy, you want to cover different kinds of inputs. A test suite could look like:

def test_sum_list_no_elements():
    result = sum_list_elements([])
    assert result == 0

def test_sum_list_one_element():
    result = sum_list_elements([-2])
    assert result == -2

def test_sum_list_cancelling_elements():
    result = sum_list_elements([-3, 1, 2])
    assert result == 0

def test_sum_list_elements():
    result = sum_list_elements([1, 2, 3])
    assert result == 6

However, this means having a lot of redundant code. You can refactor the suite thanks to the parameterized decorator:

from parameterized import parameterized

@parameterized.expand([
    ([], 0),
    ([-2], -2),
    ([-3, 1, 2], 0),
    ([1, 2, 3], 6)
])
def test_sum_list_elements_suite(inputs, expected):
    result = sum_list_elements(inputs)
    assert result == expected

Here is the result of the tests:

zsh> poetry run pytest tests/test_parametrized.py
collected 4 items

tests/test_parametrized.py::test_sum_list_elements_suite_0 PASSED
tests/test_parametrized.py::test_sum_list_elements_suite_1 PASSED
tests/test_parametrized.py::test_sum_list_elements_suite_2 PASSED
tests/test_parametrized.py::test_sum_list_elements_suite_3 PASSED

======================= 4 passed in 0.01s =======================

To learn more about parametrization: Pytest Against a Wide Range of Data with Python hypothesis and Automatically Discover Edge Cases.
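For reference, pytest also ships a built-in equivalent, @pytest.mark.parametrize, which needs no extra plugin. A minimal sketch against the same sum_list_elements function:

```python
import pytest

def sum_list_elements(elements):
    return sum(elements)

# Each tuple becomes one generated test case.
@pytest.mark.parametrize(
    "inputs, expected",
    [
        ([], 0),
        ([-2], -2),
        ([-3, 1, 2], 0),
        ([1, 2, 3], 6),
    ],
)
def test_sum_list_elements_suite(inputs, expected):
    assert sum_list_elements(inputs) == expected
```

The generated test ids embed the parameters (e.g. test_sum_list_elements_suite[inputs3-6]), which makes failures easier to trace than the numeric suffixes of parameterized.expand.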

Disable pylint error checks

TL;DR: use # pylint: disable=error-type inline comments to disable error types or edit the .pylintrc file generated via pylint --generate-rcfile > .pylintrc.

If you are using pylint to run checks on the quality of your Python code, you might want to ignore some of the checks the tool is running on your codebase for you.

You can silence errors with inline comments (e.g. if you still want this check to be performed on your overall codebase but not for this particular snippet):

def f():
    pass

class NotAuthorized(Exception):
    def __init__(self, message=""):
        self.message = message
        super().__init__(self.message)

Running pylint on the above code gives you the following output:

1:0: C0116: Missing function or method docstring (missing-function-docstring)
1:0: C0103: Function name "f" doesn't conform to snake_case naming style (invalid-name)
4:0: C0115: Missing class docstring (missing-class-docstring)

By contrast, the following snippet is rated 10/10 by pylint:

def f(): # pylint: disable=invalid-name, missing-function-docstring
    pass

class NotAuthorized(Exception): # pylint: disable=missing-class-docstring
    def __init__(self, message=""):
        self.message = message
        super().__init__(self.message)

Your code has been rated at 10.00/10 (previous run: 5.00/10, +5.00)

Note: you can disable multiple pylint errors with a single inline comment, using a comma as separator.

If you want to disable a specific error check for the whole codebase, you can create a .pylintrc at the root of your code:

zsh> poetry run pylint --generate-rcfile > .pylintrc

Then, navigate to the [MESSAGES CONTROL] section and edit the following lines, appending the error types you want to disable:

disable=raw-checker-failed,
        bad-inline-option,
        locally-disabled,
        file-ignored,

Notes:

  • It is never a good practice to deactivate the error messages pylint raises. Always try to address them first. For instance, too-many-arguments on a method probably means that you are missing one intermediary method and should refactor it.

  • pylint checks usually come paired with black, mypy and unit tests. You can group them into one target via a Makefile:

black:
    poetry run black --exclude=<excluded-folder> .

pylint:
    poetry run pylint .

mypy:
    poetry run mypy

test:
    poetry run pytest -vvs tests/

checks: black pylint mypy test

  • To keep my code DRY, I avoid repeating information both in the code and in function/module docstrings. As explained in The Pragmatic Programmer by David Thomas & Andrew Hunt and Clean Code by Robert Martin, doing so makes the code less maintainable and increases the risk of having the docstrings no longer aligned with the code (because they are not correctly updated). The code should be self-explanatory, well structured and stick to good naming conventions. Docstrings are only there to explain the why, not the how. That is why I often decide to silence missing-function-docstring and missing-module-docstring, as I would otherwise be forced to add dummy docstrings.

Test Airflow DAG locally

Installing the python libraries

Airflow DAGs can be tested and integrated within your unittest workflow.

For that, apache-airflow and pytest are all the Python pip libraries you need.

First, import the libraries and retrieve the current working directory:

from pathlib import Path
from airflow.models import DagBag
from unittest.mock import patch
import pytest

SCRIPT_DIRECTORY = Path(__file__).parent

Collecting the DAGs in the DagBag

Second, you want to collect all the local DAGs you have under your dags/ folder. You can use airflow.models.DagBag and create a dedicated dag_bag fixture for that task:

@pytest.fixture()
def dag_bag() -> DagBag:
    dag_folder = SCRIPT_DIRECTORY / ".." / "dags"
    dag_bag = DagBag(
        dag_folder=dag_folder,
        read_dags_from_db=False,
    )
    return dag_bag

This function will return a collection of dags, parsed out from the local dag folder tree you have specified.

Note: the above function is tailored for a project with a similar structure:

airflow-dag-repo
├── dags # all your dags go there
│   └── dag.py
├── airflow_dag_repo
│   ├── __init__.py
│   └── commons.py
├── tests
│   └── test_dag.py
├── poetry.lock
└── pyproject.toml

Optional: I use poetry as Python package manager; you can learn more about it here.

Note: the fixture decorator is used as a setup tool to initialize reusable objects at one place and pass them to all your test functions as arguments. Here, the dag_bag object can now be accessed by all the test functions in that module.

Running the test suite on the collected DAGs

Finally, you can implement your tests:

def test_dag_tasks_count(dag_bag):
    dag = dag_bag.get_dag(dag_id="your-dag-id")
    assert len(dag.tasks) == 4

def test_dags_import_errors(dag_bag):
    assert dag_bag.import_errors == {}

You can check the full example on Github: airflow-dag-unittests

Note: you can wrap-up your test functions within a Class using unittest.TestCase as I did on the codebase on github.com/olivierbenard/airflow-dag-unittests. However, doing so prevents you from using fixtures. A work-around exists, I will let you check what I did.

Mocking Airflow Variables

If you are using Airflow Variables in your DAGs e.g.:

from airflow.models import Variable
MY_VARIABLE = Variable.get("my-variable")

You need to add the following lines:

@pytest.fixture()
@patch.dict(
    "os.environ",
    AIRFLOW_VAR_YOUR_VARIABLE="", # mock your variable, prefixed with AIRFLOW_VAR_.
)
def dag_bag() -> DagBag:
    ...

Otherwise, you will stumble across the following error during your local tests:

raise KeyError(f"Variable {key} does not exist")
KeyError: 'Variable <your-variable> does not exist'
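The trick works because Variable.get falls back to environment variables prefixed with AIRFLOW_VAR_. A minimal sketch of the patch.dict mechanism (no Airflow needed; the function name is mine, for illustration only):

```python
import os
from unittest.mock import patch

# patch.dict temporarily injects the entry into os.environ
# for the duration of the decorated function call.
@patch.dict("os.environ", AIRFLOW_VAR_MY_VARIABLE="dummy-value")
def read_mocked_variable() -> str:
    return os.environ["AIRFLOW_VAR_MY_VARIABLE"]

value = read_mocked_variable()

# Once the function returns, the patched entry is removed again.
assert "AIRFLOW_VAR_MY_VARIABLE" not in os.environ
```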

To conclude, Airflow DAGs are always a headache to test and integrate within your unittest workflow. I hope this makes it easier.

List Enabled Services per GCP projects

The below snippet returns the list of all enabled APIs (a.k.a. Services) for the selected Google Cloud Platform project.

import os
import subprocess

project = "your-gcp-project"
export_dir = "export/"

command = (
    f"gcloud services list --enabled --project {project}"
    f" > {os.path.join(export_dir, project)}.txt"
)
status = subprocess.call(command, shell=True)
print("success:", status == 0, "\ncommand:", command)

The script relies on you having installed gcloud. The authentication is a one-time operation done during gcloud initialization after the installation, via gcloud init.

Note: in the aforementioned snippet, you can notice the f-string is written as a multi-line statement. I prefer this convention; I explain why in more detail here.

One idea for the visualization step would be then to display the list of enabled apis per project on a heat-map.
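Before any visualization, the exported text files need parsing. A hypothetical sketch (the function name is mine), assuming the default tabular output of gcloud services list with the service name in the first column:

```python
def parse_enabled_services(raw: str) -> list[str]:
    """Extract service names from `gcloud services list` tabular output."""
    lines = raw.strip().splitlines()
    # Skip the header line; the first whitespace-separated
    # column of each row holds the service name.
    return [line.split()[0] for line in lines[1:] if line.strip()]

# Example output, as exported by the snippet above:
sample = """NAME                              TITLE
bigquery.googleapis.com           BigQuery API
storage.googleapis.com            Cloud Storage API"""
```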

Why Monitoring Enabled APIs

Monitoring the Services/APIs you have enabled on your Google Cloud Platform projects comes in handy when you want to limit exposure (security) and cost-related fees (FinOps). E.g. at the time of this writing, enabling a service like BigQuery on a project is just one click away in the UI. Some services can potentially bill you with a prime-time subscription fee of €300.

Python lists with trailing comma

In Python, you might have stumbled across lists ending with a trailing comma. Surprisingly, Python allows it, considering it as a valid syntax:

python> ["banana", "apple", "pear",]
['banana', 'apple', 'pear']

There are multiple advantages to adopting this convention. Ending your Python lists with a trailing comma makes them easier to edit – reducing the clutter in the git diff output – and makes future changes (e.g. adding an item to the list) less error-prone.

Reducing git diff clutter

Especially when your list spans multiple lines, having a trailing comma makes it easier to edit, reducing the clutter in the git diff output your version control tool presents to you.

Changing the following list:

names = [
    "Charles de Gaulle",
    "Antoine de Saint-Exupéry",
]

to:

names = [
    "Charles de Gaulle",
    "Antoine de Saint-Exupéry",
    "Bernard Clavel",
]

only involves a one-line change:

names = [
    "Charles de Gaulle",
    "Antoine de Saint-Exupéry",
+   "Bernard Clavel",
]

versus a confusing three-line git diff otherwise:

names = [
    "Charles de Gaulle",
-   "Antoine de Saint-Exupéry"
+   "Antoine de Saint-Exupéry",
+   "Bernard Clavel"
]

No more breaking changes

Another advantage of having trailing commas in your Python lists is that they make changes less error-prone (without one, you risk missing a comma when adding a new item to the list):

names = [
    "Charles de Gaulle",
    "Antoine de Saint-Exupéry"
    "Bernard Clavel"
]

Note: the above list is syntactically valid but will not return the expected outcome. Instead, it triggers an implicit string literal concatenation:

['Charles de Gaulle', 'Antoine de Saint-ExupéryBernard Clavel']

Multiline Python fstring statement

In Python, you can write a string on multiple lines to increase codebase readability:

python> message = (
    "This is one line, "
    "this one continues.\n"
    "This one is new."
)
python> message
'This is one line, this one continues.\nThis one is new.'
python> print(message)
This is one line, this one continues.
This one is new.

This is purely visual and relies on implicit string literal concatenation: adjacent string literals wrapped in parentheses are merged into a single string.
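The same adjacent-literal concatenation works on a single line, which makes the mechanism easy to verify:

```python
# Two adjacent string literals are merged into one string at compile time.
sentence = "Hello " "world"
assert sentence == "Hello world"

# Parentheses simply allow the adjacent literals to span multiple lines.
message = (
    "This is one line, "
    "this one continues."
)
assert message == "This is one line, this one continues."
```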

This becomes particularly handy if you are using a Python formatter or linter (e.g. black, mypy and pylint usually come together). If so, you might have stumbled upon the line-too-long error message.

One more example

def greet(name: str) -> None:
    message = (
        f"Hello {name}, this line "
        f"and this one "
        f"will be displayed on the same line.\n"
        f"but not this one"
    )
    print(message)

python> greet("Olivier")
Hello Olivier, this line and this one will be displayed on the same line.
but not this one

Python keyword-only parameters

Similar to Python positional-only parameters but the other way around: parameters placed on the right side of the * marker are coerced into keyword-only parameters.

def f(a, *, b, c):
    print(a, b, c)

In the above excerpt, a can be given either as a positional or a keyword argument. However, b and c have no other option besides being passed as keyword arguments:

python> f(1, b=2, c=3)
1 2 3
python> f(a=1, b=2, c=3)
1 2 3

Should you try anything else, it will fail:

python> f(1, 2, 3)
TypeError: f() takes 1 positional argument but 3 were given

Notes:

  • Python does not allow positional arguments after keyword arguments, regardless of which side of the * marker a parameter sits on.
  • *args collects the extra positional arguments.
  • **kwargs collects the extra keyword arguments.
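A minimal sketch illustrating that collecting behavior (the function name g is mine, for illustration only):

```python
def g(*args, **kwargs):
    # args is a tuple of the extra positional arguments,
    # kwargs a dict of the extra keyword arguments.
    return args, kwargs

collected = g(1, 2, b=3)
```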

Python positional-only parameters

The Python positional-only parameter was introduced with the Python 3.8 release. It is a way to specify which parameters must be positional-only arguments – i.e. which parameters cannot be given as keyword arguments. It uses the / marker in the parameter list: parameters positioned on its left side become positional-only.

def f(a, b, /, c): ...

The above method only accepts calls of the following form:

python> f(1, 2, 3)
python> f(1, 2, c=3)

Note: c can be given either as a positional or a keyword argument.

On the contrary, the following calls will raise a TypeError: f() got some positional-only arguments passed as keyword arguments error, as a and b – being on the left of / – cannot be passed as keyword arguments:

python> f(1, b=2, c=3)
python> f(a=1, b=2, c=3)

Note: a call such as f(1, b=2, 3) never gets that far; it is rejected upfront with a SyntaxError, since Python forbids positional arguments after keyword arguments.

Use cases

The / positional-only parameter syntax is helpful in the following use-cases:

  • It precludes keyword arguments from being used when the parameter’s name is not necessary, confusing or misleading, e.g. len(obj=my_list). As stated in the documentation, the obj keyword argument would here impair readability.

  • It allows the name of the parameter to be changed in the future without introducing breaking changes in the client, since a keyword argument with the same name is still accepted and routed into **kwargs:

def transform(name, /, **kwargs):
    print(name.upper())

In the above snippet, name holds the positional value, while a keyword argument spelled name would land in kwargs and be accessible via kwargs.get("name").
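A sketch showing that routing, reusing the transform signature from above (return values added for illustration):

```python
def transform(name, /, **kwargs):
    # name is positional-only; a keyword argument spelled "name"
    # does not clash with it and lands in kwargs instead.
    return name.upper(), kwargs.get("name")

result = transform("olivier", name="bernard")
```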

Last but not least, it can also be used to prevent pre-set keyword arguments from being overwritten in methods that still need to accept arbitrary keyword arguments (via **kwargs):

def initialize(name="unknown", /, **kwargs):
    print(name, kwargs)

The name parameter is protected and cannot be overwritten, even if a key with the same name is captured by the kwargs argument:

python> initialize(name="olivier")
unknown {'name': 'olivier'}

Note: Python does not allow positional arguments after keyword arguments; the same restriction noted for the * marker above applies around /.

base64 encoding via CLI

Simply use base64 and base64 -d:

zsh> echo -n "username:password"|base64
dXNlcm5hbWU6cGFzc3dvcmQ=

Note: the -n option prevents echo from outputting a trailing newline (you do not want to encode \n).

zsh> echo dXNlcm5hbWU6cGFzc3dvcmQ=|base64 -d
username:password

base64 encoding is widely used on the web. It is mainly used to carry binary data across channels that only support text content (e.g. HTTP requests or e-mail attachments). In the context of data exchange through APIs, it is broadly used in authentication processes to include your credentials in the headers:

{
    "Authorization": "Basic <Base64Encoded(username:password)>"
}

Using a curl request:

curl https://your-url \
    -H "Authorization: Basic {token}"

The token being most often your base64-encoded and colon-separated credentials (or any other specified credentials, e.g. a Bearer token for token-based authentication).
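As an illustration, here is a small hypothetical helper building such a header in Python (the function name is mine, not a library API):

```python
import base64

def basic_auth_header(username: str, password: str) -> dict:
    """Build a Basic Authorization header from colon-separated credentials."""
    credentials = f"{username}:{password}".encode()
    # b64encode works on bytes; decode back to str for the header value.
    token = base64.b64encode(credentials).decode()
    return {"Authorization": f"Basic {token}"}

header = basic_auth_header("username", "password")
```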

Using Python

Base64 encoding/decoding with Python:

python> import base64
python> encoded = base64.b64encode(b"username:password")
python> print(encoded)
b'dXNlcm5hbWU6cGFzc3dvcmQ='

python> decoded = base64.b64decode(b'dXNlcm5hbWU6cGFzc3dvcmQ=')
python> print(decoded)
b'username:password'

Notes:

  • With the aforementioned Python code, you will then have to decode the b"string" into a str. This can be done using the .decode() method:
b'your-string'.decode()
  • If you are on VSC (Visual Studio Code) you can then encode/decode on the fly using an extension e.g. vscode-base64.

Python Pickle Serialization

pickle allows you to serialize and de-serialize Python objects to save them into a file for future use. You can then read this file and extract the stored Python objects, de-serializing them so they can be integrated back into the code’s logic.

You just need the two basic commands: pickle.dump(object, file) and pickle.load(file).

Below, a round trip example:

import pickle

FILENAME = "tmp.pickle"
original_list = ["banana", "apple", "pear"]

# the with statement closes the file automatically
with open(FILENAME, "wb") as file:
    pickle.dump(original_list, file)

with open(FILENAME, "rb") as file:
    retrieved_list = pickle.load(file)

print(retrieved_list) # ["banana", "apple", "pear"]
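When no file is involved, the in-memory counterparts pickle.dumps and pickle.loads serialize to and from a bytes object. A quick sketch:

```python
import pickle

data = {"fruits": ["banana", "apple", "pear"], "count": 3}

# Serialize to a bytes object instead of a file...
blob = pickle.dumps(data)

# ...and deserialize back into an equal Python object.
restored = pickle.loads(blob)
```

This is handy for caching in memory or shipping objects over a socket or message queue, where a file round trip would be unnecessary.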