Olivier Bénard, Author at Olivier Bénard

poetry local config file

Summary

When working with poetry, you can configure poetry with some specifics using the command line interface e.g.:

poetry config virtualenvs.create true

The recommendation here is to always hard-encode those configurations in a local file. This can be done adding a --local flag in the poetry config command:

poetry config --local virtualenvs.create true

The above command will create the a poetry.toml file at the root of your project, with the following information:

[virtualenvs]
create = true

This ensures that your configuration is always hard-encoded and replicable.

List the current poetry configuration

This can be done via a simple command:

poetry config --list

Use-case: system-git-client true

I rarely have to define poetry configurations, however it helps me to fix the following problem I encounter from time to time:

(1) I have a CI/CD pipeline running on Gitlab;

(2) This pipeline has 3 different stages: quality-checks, build and deploy;

(3) Each of these stages might require python dependencies. These dependencies are managed by poetry. You need the Gitlab CI/CD runner to install them within the virtual environment where the stages are gonna execute their scripts. This means, you need each job to run poetry install. E.g.:

variables:
  GIT_SUBMODULE_STRATEGY: recursive

stages:
  - test
  - build
  - deploy

quality-checks:
  image: "<your-custom-docker-path>/ci-cd-python-test-harness:latest"
  stage: test
  script:
    - <setup-gitlab-ssh-access>
    - <add-safe-git-directories>
    - poetry install
    - make checks

...

Note: here, the last line make checks triggers a Makefile action, running black, mypy, pylint and pytest on the codebase. You can have a look on what this make command looks like in the snippet immediately below:

black:
    poetry run black .

mypy:
    poetry run mypy <your-src-folder>

pylint:
    poetry run pylint <your-src-folder>

test:
    PYTHONWARNINGS=ignore poetry run pytest -vvvs <your-test-folder>

checks: black mypy pylint test

Thus, I have my CI/CD running poetry install before the CI/CD runner to be able to test my code in its virtual environment.

(4) Part of the poetry install will install all the dependencies my project contains, including git submodules. This is exactly where our issue lies!

$ poetry install
Creating virtualenv <your-repository>_-py3.10 in /home/gitlab-runner/.cache/pypoetry/virtualenvs
Installing dependencies from lock file
No git repository was found at ../../<your-submodule-repository>.git
Cleaning up project directory and file based variables
00:01
ERROR: Job failed: exit code 1

In order for your project to be able to use git submodules and the CI/CD to run successfully, you need to run the following command:

poetry config --local experimental.system-git-client true

This creates a poetry.toml file with the following lines:

[experimental]
system-git-client = true

This trick should fix the No git repository was found error occurring in your CI/CD pipeline.

Use Python Fixtures in Classes

TL;DR: use scope, @pytest.mark.usefixtures and request.cls to define your fixture as attribute of the class.

With pytest you can use fixtures to have a nice delimitation of responsibilities within your test modules, sticking to the Arrange-Act-Assert pattern:

import pytest

@pytest.fixture()
def get_some_data():
    yield "get some data"

def test_reading_data(get_some_data):
    assert get_some_data == "get some data"

If the following code works, what about if you want to organize your tests functions within classes? Naively you would assume the following to be a fair implement:

import pytest

@pytest.fixture()
def get_some_data():
    yield "get some data"

class TestDummy(unittest.TestCase):

    def test_dummy(self, get_some_data):
        assert get_some_data() == "get some data"

Running poetry run pytest -vvvs tests/path/to/test_module.py will return the following error in the traceback:

E       TypeError: TestDummy.test_dummy() missing 1 required positional argument: 'get_some_data'

In order to use python fixture within a class, you need to edit the above snippet for the following as you cannot call fixtures directly:

import pytest

@pytest.fixture()
def get_some_data():
    yield "get some data"

class TestDummy(unittest.TestCase):

    @pytest.fixture(autouse=True)
    def _get_some_data(self, get_some_data):
        self.get_some_data = get_some_data

    def test_dummy(self):
        assert self.get_some_data == "get some data"

Note that _get_some_data will be called once per test by default which is inconvenient if you have to perform request through the network e.g. requests.get("https://www.google.com"). You can change this behaviour by adapting the scope:

@pytest.fixture(scope="module")
def get_some_data():
    yield "get some data"

@pytest.fixture(scope="class")
def define_get_data_attribute(request, get_some_data):
    request.cls._get_some_data = get_some_data

@pytest.mark.usefixtures("define_get_data_attribute")
class TestDummy(unittest.TestCase):

    def test_dummy(self):
        assert self._get_some_data == "get some data"

Note that the request object gives access to the requesting test context such as the cls attribute. More here.

mypy disable error

You can disable mypy error codes using the special comment # type: ignore[code, ...]. E.g.:

def f(): # type: ignore[call-arg]
    pass

To disable multiple codes in one line, simply separate the codes with a comma, e.g.:

def f(): # type: ignore[call-arg, arg-type]
    pass

You can also configure the mypy.ini configuration file directly to ignore specific error codes for the whole codebase. E.g.:

zsh> cat >> mypy.ini <<EOL
heredoc> [mypy]
heredoc> ignore_missing_imports = True
heredoc> EOL

More in this page: mypy.readthedocs.io/en/stable/config_file.html.

Note: it is never a good practice to deactivate the error messages mypy is raising. Always try to work on it. E.g. a too-many-arguments code on a method probably means that you are missing one intermediary method and should refactor your code.

Finally, mypy checks usually come in pair with black, pylint and unitests checks. You can combine error codes by-passing issued by multiple checkers on the same special inline comment:

@decorator() # type: ignore[call-arg, arg-type] # pylint: disable="no-value-for-parameter"
def f(): ...

More on how to disable pylint error checks: olivierbenard.fr/disable-pylint-error-checks.

Note: to know more about mypy error codes, you can visit mypy.readthedocs.io/en/stable/error_codes.html.

What is FinOps

FinOps helps you mitigate Cloud Spends by having (1) one central team of FinOps practitioners and (2) the rest of the engineering teams decentralized.

The one central team of FinOps practitioners is in charge of

providing a central Cloud Monitoring solution to real-time monitor your Cloud Spends with a very thin granularity (team-level) via heavy use of tags and automation;
identifying potential for savings (e.g. idle instances);
negotiating commitments and rates with Cloud Vendors (can help you saving up to 50% of Cloud Spends).

The rest of the engineering teams decentralized are aiming at

optimizing their Cloud usage against the efficiency metrics displayed by the monitoring solution;
challenging and implementing the FinOps practitioners’ recommendations;
re-allocating the spared budget on what truly matter (e.g. hiring new talents).

Cloud Spends = Usage x Rate

The formula for the Cloud Spends is the following: Cloud Spends = Usage x Rate.

With this in mind, the goal is to push cost-related accountability to the edges, having the expert teams managing their Cloud usage and having the central FinOps team gathering the organizational needs and negotiating global commitments out of it.

The main point of FinOps is not about saving costs but rather spending more on what truly matter for the business.

Outrun the common Cloud horror-story

FinOps will help you to retrieve visibility, predictability and budgetisation over Cloud spends and thus, opposes the common Cloud horror story:

Teams start their journey onto the Cloud. They love the scalability, high availability, and increased speed to innovation the tool brings.
Quickly, Cloud bills start skyrocketing out of control due to the lightly governed nature of Cloud spends.
Afraid, the Cloud adventure is brutally dampened after the last of too many wake-up call, leading to (a) a waterfall approach, (b) mixed and sometime conflicting cloud cost reductions strategies and (c) no trust and the Cloud solutions anymore.

Pytest against a wide range of data with Python hypothesis

The Hypothesis Python pytest library allows you to run your python tests against a wild range of data matching a set of hypothesis. In other words, your test function is provided with data matching the setup specifications and runs your Code Under Test (CUT) against it.

It is a nice way to automatically discover edge cases in your code without you even having to think about it.

Let’s go through an example. Let’s say you want to test the following function:

def divide_list_elements(my_list, denominator):
    return [item/denominator for item in my_list]

python> divide_list_elements([2, 4, 6], 2)
[1.0, 2.0, 3.0]

If you are like me, you would have then implemented your test strategy manually, grouped under a class because it is neat:

import unittest

class TestDivideListElements(unittest.TestCase):

    def test_divide_list_elements_one_element(self):
        result = divide_list_elements([42], 2)
        assert result == [21.0]

    def test_divide_list_elements_no_element(self):
        result = divide_list_elements([], 4)
        assert result == []

zsh> poetry run pytest tests/test_hypothesis.py::TestDivideListElements
collected 2 items

tests/test_hypothesis.py::TestDivideListElements::test_divide_list_elements_no_element PASSED
tests/test_hypothesis.py::TestDivideListElements::test_divide_list_elements_one_element PASSED

======================= 2 passed in 0.13s =======================

Well, all good right? We could have stopped there.

Now, let’s say, instead of manually defining your inputs, you let the hypothesis library managing this for you:

from hypothesis import given
from hypothesis import strategies as st

@given(st.lists(st.integers()), st.integers())
def test_divide_list_elements(input_list, input_denominator):
    result = divide_list_elements(input_list, input_denominator)
    expected = list(map(lambda x: x/input_denominator, input_list))
    assert result == expected

Running the test leaves you with an unexpected outcome:

zsh> poetry run pytest tests/test_hypothesis.py
>   return [item/denominator for item in my_list]
E   ZeroDivisionError: division by zero
E   Falsifying example: test_divide_list_elements(
E       input_list=[0],
E       input_denominator=0,
E   )

tests/test_hypothesis.py:17: ZeroDivisionError

You have obviously forgot to check about the division by 0…

Here is what is so beautiful about hypothesis: it can discovers for you edge cases you have forgotten about.

Let’s (1) redact our function:

def divide_list_elements(my_list: list, denominator: int) -> list:
    assert denominator != 0
    return [item/denominator for item in my_list]

(2) change the tests and (3) add the faulty test-case into our testing suit:

import pytest
import unittest
from hypothesis import given, example
from hypothesis import strategies as st


@given(st.lists(st.integers()), st.integers())
@example(input_list=[42], input_denominator=0)
def test_divide_list_elements(input_list, input_denominator):
    if input_denominator == 0:
        with pytest.raises(AssertionError) as exc_info:
            divide_list_elements(input_list, input_denominator)
            expected = "assert 0 != 0"
            assert expected == str(exc_info.value)
    else:
        result = divide_list_elements(input_list, input_denominator)
        expected = list(map(lambda x: x/input_denominator, input_list))
        assert result == expected

(4) run the tests again:

zsh> poetry run pytest -s tests/test_hypothesis.py::test_divide_list_elements
collected 1 item

tests/test_hypothesis.py::test_divide_list_elements PASSED

========================= 1 passed in 0.28s =====================

Notes:

The assert denominator != 0 statement ensures our function is given correct preconditions (referring to The Pragmatic Programmer, design by contracts and crash early! “Dead Programs Tell No Lies: A dead program does a lot less damage than a crippled one.“)
The @example(input_list=[42], input_denominator=0) statement is using the example decorator, which ensures a specific example is always tested. Here we want to make sure this edge case we missed is always checked.
The with pytest.raises(AssertionError) ensures that whatever is in the next block of code should raise an AssertionError exception. If not exception is raised, the test fails.

To learn more about parametrization: Factorize your pytest functions using the parameterized fixture.

Factorize your pytest functions using the parameterized fixture.

The parametrized fixture is a convenient way to factorize your python test functions, avoid duplicates in your test code and help you stick to the DRY (Don’t Repeat Yourself) principle.

Note: you can use it after having installed the plugin via pip install parametrized.

Let’s demonstrate this with a quick and easy example. Let’s assume you have a function that returns the sum of the elements within a list:

def sum_list_elements(l):
    return sum(l)

You want to test the behavior of your function using pytest. In your Test Strategy, you want to test this function for different kind of inputs. A testing suit could look like:

def test_sum_list_no_elements():
    result = sum_list_elements([])
    assert result == 0

def test_sum_list_one_element():
    result = sum_list_elements([-2])
    assert result == -2

def test_sum_list_cancelling_elements():
    result = sum_list_elements([-3, 1, 2])
    assert result == 0

def test_sum_list_elements():
    result = sum_list_elements([1, 2, 3])
    assert result == 6

However, this means having a lot of redundant code. You can refactor the suit thanks to the parametrized fixture:

from parameterized import parameterized

@parameterized.expand([
    ([], 0),
    ([-2], -2),
    ([-3, 1, 2], 0),
    ([1, 2, 3], 6)
])
def test_sum_list_elements_suit(inputs, expected):
    result = sum_list_elements(inputs)
    assert result == expected

Here is the result of the tests:

zsh> poetry run pytest tests/test_parametrized.py
collected 4 items

tests/test_parametrized.py::test_sum_list_elements_suit_0 PASSED
tests/test_parametrized.py::test_sum_list_elements_suit_1 PASSED
tests/test_parametrized.py::test_sum_list_elements_suit_2 PASSED
tests/test_parametrized.py::test_sum_list_elements_suit_3 PASSED

======================= 4 passed in 0.01s =======================

To learn more about parametrization: Pytest Against a Wide Range of Data with Python hypothesis and Automatically Discover Edge Cases.

Disable pylint error checks

TL;DR: use # pylint: disable=error-type inline comments to disable error types or edit the .pylintrc file generated via pylint --generate-rcfile > .pylintrc.

If you are using pylint to run checks on the quality of your Python code, you might want to ignore some of the checks the tool is running on your codebase for you.

You can silence errors with inline comments (e.g. if you still want this check to be performed on your overall codebase but not for this particular snippet):

def f():
    pass

class NotAuthorized(Exception):
    def __init__(self, message=""):
        self.message = message
        super().__init__(self.message)

Running pylint on the above code gives you the following output:

1:0: C0116: Missing function or method docstring (missing-function-docstring)
1:0: C0103: Function name "f" doesn't conform to snake_case naming style (invalid-name)
4:0: C0115: Missing class docstring (missing-class-docstring)

On the opposite, the following snippets is rated 10/10 by pylint:

def f(): # pylint: disable=invalid-name, missing-function-docstring
    pass

class NotAuthorized(Exception): # pylint: disable=missing-class-docstring
    def __init__(self, message=""):
        self.message = message
        super().__init__(self.message)

Your code has been rated at 10.00/10 (previous run: 5.00/10, +5.00)

Note: you can disable multiple pylint errors with one single inline comment, using a comma as separator.

If you want to disable a specific error check for the whole codebase, you can create a .pylintrc at the root of your code:

zsh> poetry run pylint --generate-rcfile > .pylintrc

Then, navigate to the [MESSAGES CONTROL] section, editing the following lines with the error types you want to append:

disable=raw-checker-failed,
        bad-inline-option,
        locally-disabled,
        file-ignored,

Notes:

It is never a good practice to deactivating the error messages pylint is raising. Always try to work on it. For instance, too-man-arguments on a method probably means that you are missing one intermediary method and should refactor it.
pylint checks usually come in pair with black, mypy and unitests. You can group them to one target via a Makefile:

black:
    poetry run black --exclude=<excluded-folder> .

pylint:
    poetry run pylint .

mypy:
    poetry run mypy

test:
    poetry run pytest -vvs tests/

checks: black pylint mypy test

To keep my code DRY I avoid repeating information both in the code and in function/module docstrings. As explained in The Pragmatic Programmer by David Thomas & Andrew Hunt and Clean Code by Robert Martin, it makes the code less maintainable and enhance the risk of having the docstrings no longer aligned with the code (because not correctly updated). The code should be self-explanatory, well structured and sticking to good naming convention. The docstrings are only there to explain the why and not the how. Thus, why I often decide to silence the missing-function-docstring and missing-module-docstring since I will be force to add dummy docstrings otherwise.

Test Airflow DAG locally

Installing the python libraries

Airflow DAGs can be tested and integrated within your unittest workflow.

For that, apache-airflow and pytest are all the Python pip libraries you need.

First, import the libraries and retrieve the current working directory:

from pathlib import Path
from airflow.models import DagBag
from unittest.mock import patch
import pytest

SCRIPT_DIRECTORY = Path(__file__).parent

Collecting the DAGs in the DagBag

Second, you want to collect all the local dags you have under your dags/ folder and want to test. You can use airflow.models.DagBag. You can create a dedicated dag_bag function for that task:

@pytest.fixture()
def dag_bag() -> DagBag:
    dag_folder = SCRIPT_DIRECTORY / ".." / "dags"
    dag_bag = DagBag(
        dag_folder=dag_folder,
        read_dags_from_db=False,
    )
    return dag_bag

This function will return a collection of dags, parsed out from the local dag folder tree you have specified.

Note: this above function is tailored for a project with a similar structure:

airflow-dag-repo
├── dags # all your dags go there
    └── dag.py
├── airflow_dag_repo
    ├── __init__.py
    └── commons.py
├── tests
    └── test_dag.py
├── poetry.lock
└── pyproject.toml

Optional: I use poetry as Python package manager, you can learn more about it too here.

Note: the fixture decorator is used as a setup tool to initialize reusable objects at one place and pass them to all your test functions as arguments. Here, the dag_bag object can now be accessed by all the test functions in that module.

Running the test suit on the collected DAGs

Finally, you can implement your tests:

def test_dag_tasks_count(dag_bag):
    dag = dag_bag.get_dag(dag_id="your-dag-id")
    assert dag.task_count == 4

def test_dags_import_errors(dag_bag):
    assert dag_bag.import_errors == {}

You can check the full example on Github: airflow-dag-unittests

Note: you can wrap-up your test functions within a Class using unittest.TestCase as I did on the codebase on github.com/olivierbenard/airflow-dag-unittests. However, doing so prevents you from using fixtures. A work-around exists, I will let you check what I did.

Mocking Airflow Variables

If you are using Airflow Variables in your DAGs e.g.:

from airflow.models import Variable
MY_VARIABLE = Variable.get("my-variable")

You need to add the following lines:

@patch.dict(
    "os.environ",
    AIRFLOW_VAR_YOUR_VARIABLE="", # mock your variable, prefixed with AIRFLOW_VAR.
)
@pytest.fixture()
def dag_bag() -> DagBag:
    ...

Otherwise, you will stumble across the following error during your local tests:

raise KeyError(f"Variable {key} does not exist")
KeyError: 'Variable <your-variable> does not exist'

To conclude, Airflow DAGs are always a headache to test and integrate within your unittest workflow. I hope this makes it easier.

List Enabled Services per GCP projects

The bellow snippet will return the list of all enabled APIs (aka. Services) for the select Google Cloud Platform project.

import subprocess, os

project = "your-gcp-project"
export_dir = "export/"

command = (
    f"gcloud services list --enabled --project {project}"
    f" > {os.path.join(export_dir, project)}.txt"
)
status = subprocess.call(command, shell=True)
print("success:", status == 0, "\ncommand:", command)

The script relies on you having installed gcloud. The authentication is a one-time operation done during gcloud initialization after the installation, via gcloud init.

Note: In the aforementioned snippet, you can notice the fstring is written as a form of a multi-line statement. I prefer this convention, I explain why in more details here.

One idea for the visualization step would be then to display the list of enabled apis per project on a heat-map.

Why Monitoring Enabled APIs

Monitoring the Services/APIs you have enabled on your Google Cloud Platform projects becomes handy when you want to limit exposure (security) and cost-related fees (FinOps). E.g. at the time of this writing, enabling services like BigQuery on a project is just one-click away that can easily be done via the UI. Some of them can potentially bill you with a prime-time subscription fee of €300.

Python lists with trailing comma

In Python, you might have stumbled across lists ending with a trailing comma. Surprisingly, Python allows it, considering it as a valid syntax:

python> ["banana", "apple", "pear",]
["banana", "apple", "pear"]

There are multiple advantages adopting this convention. Ending your Python list with a trailing comma makes the list easier to edit – reducing the clutter in the git diff outcome – and makes future changes (e.g. adding an item to the list) less error-prone.

Reducing git diff clutter

Especially when your list is multi-lines, having a trailing comma makes the list easier to edit, reducing the clutter in the git diff outcome your version control framework presents to you.

Changing the following list:

names = [
    "Charles de Gaulle",
    "Antoine de Saint-Exupéry",
]

to:

names = [
    "Charles de Gaulle",
    "Antoine de Saint-Exupéry",
    "Bernard Clavel",
]

only involves a one-line change:

names = [
    "Charles de Gaulle",
    "Antoine de Saint-Exupéry",
+   "Bernard Clavel",
]

versus a confusing 3 multi-lines difference git output otherwise:

names = [
    "Charles de Gaulle",
-   "Antoine de Saint-Exupéry"
+   "Antoine de Saint-Exupéry",
+   "Bernard Clavel"
]

No more breaking changes

Another advantage of having trailing commas in your Python lists is that it makes changes less error-prone (with the risk of missing a comma when adding a new item into the list):

names = [
    "Charles de Gaulle",
    "Antoine de Saint-Exupéry"
    "Bernard Clavel"
]

Note: the above list is syntactically valid but will not return the expected outcome. Instead, it will trigger an implicit string literal concatenation.

['Charles de Gaulle', 'Antoine de Saint-ExupéryBernard Clavel']