Python files

This post is part of the Python Crash Course series. The recommended reading order of the articles can be found on the agenda.

In the previous post (see python-via-command-line), we interacted with Python and wrote our very first Python instructions.

However, this way of using Python is not very handy: once the session is over, the lines of code you typed cannot be accessed anymore.

The solution is to write our code in files. A Python file is just an ordinary text file that ends with the .py extension.

Generic python file

The general template for your Python files is as follows:

"""
A few lines describing what your file
is useful for (optional but good practice).
"""

if __name__ == "__main__":
    # your instructions go here
    # these are just inline comments
    # that are ignored by Python.
    # ...
    # PS: indentation is important.
    pass  # a block cannot contain only comments, so `pass` keeps it valid

For instance, here is the content of a file I named first_python_script.py:

"""
Short script containing my very first
Python instructions.
"""

if __name__ == "__main__":
    a = 41
    b = 1
    print(a+b)
    print("foo")
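A common convention (not required by Python, and not what the snippet above does) is to move the instructions into a main() function and keep only a call under the guard. The guard matters because __name__ equals "__main__" only when the file is executed directly; when the file is imported as a module, __name__ holds the module name and the guarded code does not run. A minimal sketch reusing the same instructions; the add() helper is hypothetical, added only for illustration:

```python
"""
Short script containing my very first
Python instructions, wrapped in functions.
"""


def add(a: int, b: int) -> int:
    """Return the sum of two integers (hypothetical helper for illustration)."""
    return a + b


def main() -> None:
    """Run the script's instructions."""
    print(add(41, 1))
    print("foo")


if __name__ == "__main__":
    # True only when the file is executed directly, e.g. `python first_python_script.py`.
    # When imported (`import first_python_script`), __name__ is "first_python_script"
    # and main() is not called automatically.
    main()
```

Running this file with python prints the same two lines as before, 42 and foo.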

Executing the Python file

You can then tell Python to execute your script by running a command in the terminal:

zsh> python path/to/your/file/filename.py

Here is what I get after running the following command:

zsh> python ./first_python_script.py
42
foo

Note: you can write Python files in any text editor, even Microsoft Word as long as you save the file as plain text. However, there are dedicated code editors that help you write code: they provide autocompletion, syntax highlighting and many other useful features. You can check out https://code.visualstudio.com/.

What comes next?

You have python installed in your system.

You can interact with Python and execute Python code, either by typing instructions at the interactive prompt in your terminal or by running Python scripts that contain your instructions.

It’s now time to dive into the Python syntax and explore the possibilities offered by this programming language.

Python via command line

This post is part of the Python Crash Course series. The recommended reading order of the articles can be found on the agenda.

In the previous post (see python-as-a-program), we installed Python. It’s now time to play around with it.

First, make sure python is installed:

zsh> python --version
Python 3.11.4 # your version may differ

Launch the program:

zsh> python
>>>

Note: see how the prompt has changed to >>>. This shows you are now inside the Python interpreter.

It’s all fun and games

We are now free to play around the way we like:

>>> 2+2
4
>>> my_string = "hello world!"
>>> print(my_string)
hello world!
>>> a = 4
>>> b = 5
>>> a*b
20
>>> my_number = 42
>>> my_number += 1
>>> print(my_number)
43

When you have had enough, simply type quit() and press Enter to exit:

>>> quit()

Note: you have to press Enter for each command to be executed. Outputs are displayed on the following lines, before the interpreter hands control back to you.

Until you get stuck

Your program might sometimes get stuck, looping endlessly or performing never-ending computations in the background.

This happens, for instance, with a while loop that has no exit condition:

>>> while True:
...    print("foo")
...
foo
foo
foo
foo
^C
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
KeyboardInterrupt

You can interrupt such never-ending states by pressing Control + C. This sends an interrupt signal (SIGINT) to Python, which raises the KeyboardInterrupt exception visible in the traceback above.
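For contrast, here is a minimal sketch of the same kind of loop with an exit condition, so it stops on its own instead of waiting for Control + C. The function name and the times parameter are illustrative:

```python
def print_foo(times: int) -> int:
    """Print "foo" `times` times and return how many lines were printed."""
    printed = 0
    while printed < times:  # exit condition: the loop stops once this becomes False
        print("foo")
        printed += 1
    return printed


print_foo(3)  # prints "foo" three times, then returns 3
```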

Where to go now

By now, you have Python installed and you can run it in your terminal.

It’s a good start but not really convenient so far: after you have exited the program, all your instructions are gone.

It might be fine if you just want to play around. However, we want to be able to save our instructions so we can build on them the next time we get back to work.

What you want is to write your code in files, so it can be saved and retrieved after the session has ended.

This is what we are going to learn in the next chapter: python-files.

Python as a program

This post is part of the Python Crash Course series. The recommended reading order of the articles can be found on the agenda.

Like any other software

Like any other software, Python is a program that can be launched via the Terminal.

For instance, you can start a web browser session from the terminal (the open command below is macOS-specific):

zsh> open -a "Google Chrome" https://olivierbenard.fr

Same goes for Python:

zsh> python
>>> 2+2
4
>>> quit()

Software updates

Python is constantly updated; you can check the active releases and deprecated versions from the official page: https://www.python.org/downloads/.

An overview of the status for the different versions is accessible here: https://devguide.python.org/versions/.

Each version brings new features and fixes existing bugs: improvement is a continuous process.

For instance, version 3.8 brought several interesting features, such as the walrus operator (:=) and positional-only parameters.

A more extended view is given here: https://docs.python.org/3/whatsnew/3.8.html.
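Both 3.8 features fit in a few lines. The function below is a made-up example (its name and parameters are illustrative) and needs Python 3.8 or later:

```python
def summarize(values, /, *, label="count"):
    """Describe a sequence; `values` is positional-only because it sits before the `/`."""
    # Walrus operator: assign len(values) to n and compare it in the same expression.
    if (n := len(values)) > 3:
        return f"{label}={n} (long)"
    return f"{label}={n}"


print(summarize([1, 2, 3, 4]))       # count=4 (long)
print(summarize([1], label="size"))  # size=1 (label is keyword-only, because of the `*`)

try:
    summarize(values=[1])  # positional-only parameters cannot be passed by keyword
except TypeError as error:
    print("TypeError:", error)
```

Note that the `*` marker for keyword-only parameters is older than 3.8; only the `/` marker is new.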

By convention, each version is identified by an incrementing sequence of numbers. More on that here: https://en.wikipedia.org/wiki/Software_versioning.

You can check which version is currently installed on your system:

zsh> python --version
Python 3.11.4

As you can see, the version number has 3 components: in 3.11.4, 3 is the major, 11 the minor and 4 the patch version (Python itself calls the last one "micro").
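From inside Python, the same three components are available through sys.version_info, a named tuple from the standard library in which the patch component is called micro. A minimal sketch; the 3.8 threshold is only an example:

```python
import sys

# sys.version_info is a named tuple: (major, minor, micro, releaselevel, serial).
major, minor, patch = sys.version_info.major, sys.version_info.minor, sys.version_info.micro
print(f"Running Python {major}.{minor}.{patch}")

# Tuples compare element by element, which makes version checks straightforward.
if sys.version_info >= (3, 8):
    print("The walrus operator is available.")
else:
    print("This interpreter is older than 3.8.")
```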

You can have multiple python versions installed on your system. More on that here: pyenv-python-version-manager.

But that’s too much information, let’s leave it for now.

Python installation

To install python, simply pick the version you want from the official page and follow the instructions: https://www.python.org/downloads/.

I recommend going for the latest stable release (3.12.1 at the time of writing).

Now that you have python installed, let’s start playing with it: python-via-command-line.

jq linux command for JSON processing

Recently I came across a very nice tool to process JSON data from your terminal: jq.

jq lets you filter and process JSON streams directly from the command line, and it plugs straight into your CLI pipes.

Installation

zsh> brew install jq # MacOS
zsh> jq --version
jq-1.6

Examples

jq can be used to prettify JSON inputs:

zsh> echo '{"fruits": ["apple", "pear", "banana"]}' | jq "."
{
  "fruits": [
    "apple",
    "pear",
    "banana"
  ]
}
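Since this series is about Python: the standard library's json module can produce the same pretty-printed output. This is a swapped-in Python equivalent, not part of jq; indent=2 mirrors the two-space indentation jq uses by default:

```python
import json

raw = '{"fruits": ["apple", "pear", "banana"]}'

# Parse the string, then serialize it again with two-space indentation.
pretty = json.dumps(json.loads(raw), indent=2)
print(pretty)
```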

It can also read directly from files:

zsh> cat >frenchmen.json<<EOF
heredoc> {"frenchmen": [{"name": "Napoleon"}, {"name": "Degaulle"}, {"name": "Exupery"}]}
heredoc> EOF

zsh> jq "." frenchmen.json
{
  "frenchmen": [
    {
      "name": "Napoleon"
    },
    {
      "name": "Degaulle"
    },
    {
      "name": "Exupery"
    }
  ]
}

You can access specific properties e.g. “name”:

zsh> jq '.frenchmen | .[].name' frenchmen.json
"Napoleon"
"Degaulle"
"Exupery"

You can slice through arrays:

zsh> echo '[1,2,3,4,5,6]' | jq ".[2:-1]"
[
  3,
  4,
  5
]

You can also use functions:

zsh> echo '["blue", "white", "red"]' | jq '.[] | length'
4
5
3

zsh> echo '["blue", "white", "red"]' | jq '[.[] | length] | max'
5

zsh> echo '[1,2,3,4,5,6]' | jq '.[] | select(.%2==0)'
2
4
6
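For comparison, the same filters can be written in plain Python with the standard json module. This is a swapped-in Python equivalent for readers of this series, not something jq provides; the data is the same as in the examples above:

```python
import json

frenchmen = json.loads(
    '{"frenchmen": [{"name": "Napoleon"}, {"name": "Degaulle"}, {"name": "Exupery"}]}'
)
numbers = [1, 2, 3, 4, 5, 6]
colors = ["blue", "white", "red"]

# jq '.frenchmen | .[].name': walk the array and pick the "name" key of each object.
names = [person["name"] for person in frenchmen["frenchmen"]]
print(names)  # ['Napoleon', 'Degaulle', 'Exupery']

# jq '.[2:-1]': start included, end excluded, negative indices count from the end.
sliced = numbers[2:-1]
print(sliced)  # [3, 4, 5]

# jq '.[] | length' and '[.[] | length] | max'
lengths = [len(color) for color in colors]
print(lengths, max(lengths))  # [4, 5, 3] 5

# jq '.[] | select(.%2==0)': keep only the even numbers.
evens = [n for n in numbers if n % 2 == 0]
print(evens)  # [2, 4, 6]
```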

A more extensive list can be found in this article: https://www.baeldung.com/linux/jq-command-json

Defining and using terraform modules

What is a terraform module?

A terraform module is like a black box. When called, it performs the logic you have encapsulated in it.

Variables can be used as inputs to parametrize the Terraform configuration executed within the black box.

When to use a terraform module?

A Terraform module comes in handy when you have to set up the exact same resources multiple times across multiple configuration files. You can see it as a factorization method that saves you from writing the exact same configuration block every time. Instead, you have one block you can refer to, passing in input variables, and it performs the action you want with the provided parameters.

Writing local files without using a module

Let’s say that provisioning your infrastructure always involves writing local files. Writing a local file in Terraform is dead simple: it only takes a filename and some content as parameters. There is even a ready-made local_file resource in the Terraform Registry (from the hashicorp/local provider).

If you need to write two local files, you could add the following lines to a main.tf file:

resource "local_file" "my_file_1" {
    content = "This is the content of file 1."
    filename = "files/my_file_1.txt"
}

resource "local_file" "my_file_2" {
    content = "This is the content of file 2."
    filename = "files/my_file_2.txt"
}

Then run:

zsh> terraform init
zsh> terraform apply
zsh> tree .
.
├── files
│   ├── my_file_1.txt
│   └── my_file_2.txt
├── main.tf
└── terraform.tfstate

As you can see, the two files have been created. You can even check the content:

zsh> cat files/my_file_1.txt
This is the content of file 1.

However, because of the redundancy, this is not a good pattern. To do better, we need to make good use of Terraform's modularity. Let’s move on to the next section!

Writing local files using a module

To reduce code redundancy, you can write a Terraform module that writes your local files. The logic remains the same as before; the only change is that, instead of taking hard-coded filename and content parameters, our local_file resource block reads those values from input variables.

Following up on our aforementioned example, we decide to encapsulate the logic for the local_file creation under the tf-module/ folder:

.
├── tf-module
│   ├── main.tf
│   └── variables.tf
└── main.tf

tf-module/variables.tf:

variable "filename" {
    type = string
    nullable = false
}

variable "content" {
    type = string
    nullable = false
}

tf-module/main.tf:

resource "local_file" "file" {
    content = var.content
    filename = var.filename
}

./main.tf:

locals {
    filenames_postfix = toset(["1", "2", "3", "4"])
}

module "local_files" {
    for_each = local.filenames_postfix
    source = "./tf-module"
    filename = "files/my_file_${each.value}.txt"
    content = "This is the content of file ${each.value}."
}

Note: for_each only accepts a map or a set of strings. Thus, we need to convert our list of strings into a set with toset(). An alternative is given in the following snippet.

variable "filenames_postfix" {
    type = set(string)
    default = ["1", "2", "3", "4"]
}

module "local_files" {
    for_each = var.filenames_postfix
    source = "./tf-module"
    filename = "files/my_file_${each.value}.txt"
    content = "This is the content of file ${each.value}."
}
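To picture what for_each does, here is a rough Python analogy; it is not how Terraform is implemented, and the function name is hypothetical. Each element of the set becomes one module instance keyed by its value, which Terraform addresses as module.local_files["1"], module.local_files["2"], and so on. Converting to a set also drops duplicates, so every key is unique:

```python
def expand_module_instances(postfixes):
    """Map each unique postfix to the arguments one module instance would receive."""
    unique_postfixes = set(postfixes)  # like toset(): duplicates collapse into one key
    return {
        postfix: {
            "filename": f"files/my_file_{postfix}.txt",
            "content": f"This is the content of file {postfix}.",
        }
        for postfix in sorted(unique_postfixes)
    }


instances = expand_module_instances(["1", "2", "3", "4", "4"])
print(sorted(instances))           # ['1', '2', '3', '4']
print(instances["2"]["filename"])  # files/my_file_2.txt
```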

Note: a local value is only accessible within the module where it is declared. An input variable is scoped to its module too, but its value can be set from outside: by the calling module block for a child module or, for the root module, via a .tfvars file, the -var command-line option or a TF_VAR_ environment variable. Pick one or the other according to what suits your design best.

You can now apply the terraform code:

zsh> terraform init
zsh> terraform apply
zsh> tree .
.
├── tf-module
│   ├── main.tf
│   └── variables.tf
├── files
│   ├── my_file_1.txt
│   ├── my_file_2.txt
│   ├── my_file_3.txt
│   └── my_file_4.txt
├── main.tf
└── terraform.tfstate

Using a remote terraform module

The source attribute of the local_files module block lets you point to the module you want to use. So far, we have only used a local reference (./tf-module in our case), but it is also possible to point to a remote Terraform module, for instance one stored in a GitLab or GitHub repository.

In our case, we stored our reusable Terraform modules in a GitHub repository, publicly accessible at github.com/olivierbenard/terraform-modules. Then, wherever we are, we can refer to it and use those modules in our local Terraform projects:

module "local_files" {
    source = "git::ssh://git@github.com/olivierbenard/terraform-modules.git//tf-local-file?ref=master"
    for_each = toset(["1", "2", "3"])
    filename = "files/my_file_${each.value}.txt"
    content = "This is the content of file ${each.value}."
}

Real-Case Example: Provisioning Secret Variables on GCP

So far, we have played around creating local files. However, in real life, we would rarely need them. A more realistic use of Terraform modules is, for instance, provisioning and storing sensitive data (e.g. passwords, variables…) in Google Cloud Platform Secret Manager, using Terraform as Infrastructure as Code (IaC).

Those credentials and variables can then be accessed by other processes, e.g. Airflow DAGs running on Google Cloud Composer, using airflow.models.Variable:

from airflow.models import Variable  # only needed for Variable.get (see note 1)
RETRIEVED_VARIABLE = "{{var.value.my_variable}}"  # Jinja template, rendered at run time

Notes:

  1. In the above snippet, we used a Jinja template to retrieve the variable stored in Google Secret Manager. The value is only substituted at run time, inside templated operator fields. If you need the value when the DAG file is parsed, use Variable.get("my_variable") instead.

  2. Your Airflow DAGs can only access the variables stored in Google Secret Manager if you have configured Cloud Composer to do so (see the official documentation). Importantly, to be visible to your DAGs, the variables and connections stored in Google Cloud Secret Manager must follow these naming templates: airflow-variables-<your_variable> and airflow-connections-<your_connection>.
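The naming rule from note 2 can be written down as a small Python sketch. The function names are hypothetical; the airflow-variables and airflow-connections prefixes and the - separator come from the note above:

```python
VARIABLES_PREFIX = "airflow-variables"
CONNECTIONS_PREFIX = "airflow-connections"
SEPARATOR = "-"


def variable_secret_id(variable_name: str) -> str:
    """Secret Manager ID under which the DAGs look up an Airflow variable."""
    return f"{VARIABLES_PREFIX}{SEPARATOR}{variable_name}"


def connection_secret_id(connection_id: str) -> str:
    """Secret Manager ID under which the DAGs look up an Airflow connection."""
    return f"{CONNECTIONS_PREFIX}{SEPARATOR}{connection_id}"


print(variable_secret_id("my_variable"))      # airflow-variables-my_variable
print(connection_secret_id("my_connection"))  # airflow-connections-my_connection
```

With this convention, {{var.value.my_variable}} or Variable.get("my_variable") resolves to the secret named airflow-variables-my_variable.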

The terraform module wrapping up the logic to provision secrets on Google Cloud Platform (GCP) is available at: github.com/olivierbenard/terraform-modules/tf-airflow-variable.

You can reuse it with a block of code similar to this one:

module "airflow_variable" {
    source = "git::ssh://git@github.com/olivierbenard/terraform-modules.git//tf-airflow-variable?ref=master"
    for_each = fileset("${var.root_directory}/path/to/af/vars/files/${var.env}", "*")
    airflow_variable_name = each.value
    airflow_variable_value = file("${var.root_directory}/path/to/af/vars/files/${var.env}/${each.value}")
    airflow_variable_location = "europe-west3"
}

Notes:

  1. var.root_directory can be defined by terragrunt to be equal to abspath(get_terragrunt_dir())

  2. var.env can be defined by terragrunt to be equal to read_terragrunt_config("env.hcl").locals.environment

pyenv python version manager

Why use pyenv?

System Python

On many operating systems, Python comes pre-installed.

If you are a Mac or Linux user, you can see the “System Python” that comes installed on your operating system:

zsh> which python
/usr/bin/python

Note: this version of python is available to all users (as reflected by its location).

However, this might not be the version you need:

zsh> /usr/local/bin/python3 --version
Python 3.6.8

Another problem is that by running sudo pip install <your-package>, you install the Python package globally. What if another project needs a different version of the package, e.g. a slightly older one, or if two projects require two different versions because of breaking changes introduced in the newer one?

Last but not least, some operating systems rely heavily on Python to perform operations. Replacing or modifying the system Python could seriously impair your ability to use your OS.

Pyenv

The logical place to look for a solution to all the problems inherent to the system Python is pyenv.

Pyenv is a great tool for managing multiple Python versions that can coexist on your OS. You can easily switch between the installed versions and use virtual environments to manage the Python packages associated with each Python version.

Installation

You need to install the following dependencies:

brew install openssl readline sqlite3 xz zlib

Then expose them to the compiler and add them to your PATH (macOS, zsh). LDFLAGS, CPPFLAGS and PKG_CONFIG_PATH are appended to rather than overwritten, so every library stays visible:

# Homebrew prefix on Intel Macs; on Apple Silicon, replace /usr/local/opt with /opt/homebrew/opt
echo 'export PATH="/usr/local/opt/openssl@3/bin:$PATH"' >> ~/.zshrc
echo 'export LDFLAGS="-L/usr/local/opt/openssl@3/lib $LDFLAGS"' >> ~/.zshrc
echo 'export CPPFLAGS="-I/usr/local/opt/openssl@3/include $CPPFLAGS"' >> ~/.zshrc
echo 'export PKG_CONFIG_PATH="/usr/local/opt/openssl@3/lib/pkgconfig:$PKG_CONFIG_PATH"' >> ~/.zshrc
echo 'export PATH="/usr/local/opt/sqlite/bin:$PATH"' >> ~/.zshrc
echo 'export LDFLAGS="-L/usr/local/opt/sqlite/lib $LDFLAGS"' >> ~/.zshrc
echo 'export CPPFLAGS="-I/usr/local/opt/sqlite/include $CPPFLAGS"' >> ~/.zshrc
echo 'export PKG_CONFIG_PATH="/usr/local/opt/sqlite/lib/pkgconfig:$PKG_CONFIG_PATH"' >> ~/.zshrc
echo 'export LDFLAGS="-L/usr/local/opt/zlib/lib $LDFLAGS"' >> ~/.zshrc
echo 'export CPPFLAGS="-I/usr/local/opt/zlib/include $CPPFLAGS"' >> ~/.zshrc
echo 'export PKG_CONFIG_PATH="/usr/local/opt/zlib/lib/pkgconfig:$PKG_CONFIG_PATH"' >> ~/.zshrc

Note: the pyenv-installer used below installs pyenv together with a set of useful plugins:

  1. pyenv: The actual pyenv application
  2. pyenv-virtualenv: Plugin for pyenv and virtual environments
  3. pyenv-update: Plugin for updating pyenv
  4. pyenv-doctor: Plugin to verify that pyenv and build dependencies are installed
  5. pyenv-which-ext: Plugin to automatically lookup system commands

Then, install pyenv using the pyenv-installer:

curl https://pyenv.run | bash

Restart the terminal for the PATH changes to be reflected:

exec $SHELL

Finally, check that everything worked:

zsh> pyenv -v
pyenv 2.3.19

Uninstall pyenv

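If you installed pyenv with Homebrew (macOS):

brew remove pyenv

If you used the pyenv-installer as above, delete the pyenv root directory instead (rm -rf $(pyenv root)) and remove the pyenv lines from your ~/.zshrc.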

Using pyenv

Install python versions

zsh> pyenv install --list
    3.6.2
    3.6.7
    3.7.2
    3.8.2
    3.9.12
    3.10.4
    3.11-dev
    3.11.4

To install one of them, run pyenv install <version>, e.g. pyenv install 3.11.4. All installed versions are located in your pyenv root directory:

zsh> ls ~/.pyenv/versions/
3.10.6  3.11.4  3.6.15  3.6.8   3.6.9   3.8.16  3.8.17  3.9.9

Note: make sure to run pyenv update regularly to have access to the latest Python versions.

Uninstall python versions

You can simply remove the versions from the pyenv root folder:

rm -rf ~/.pyenv/versions/3.10.6

or use the provided command:

pyenv uninstall 3.10.6

Switching between Python versions

You can list the Python versions you have installed:

zsh> pyenv versions
* system (set by /Users/johndoe/.pyenv/version)
3.6.8
3.6.9
3.6.15
3.8.16
3.8.17
3.9.9
3.10.6
3.11.4

Note: the * indicates which version of Python is currently active. By default, it is the system Python. You can confirm it using the which command:

zsh> which python3
/Users/johndoe/.pyenv/shims/python3

pyenv inserts its shims directory into your PATH. From the OS’s perspective, the pyenv shim is the executable that gets called when you run python3, hence the output of which python3. If you want to see the actual executable, run the following:

zsh> pyenv which python3
/usr/local/bin/python3

zsh> /usr/local/bin/python3 -V
Python 3.6.8

To switch between different versions, you can simply run:

zsh> pyenv global 3.11.4

zsh> python -V
Python 3.11.4

zsh> which python
python: aliased to python3

zsh> pyenv which python
/Users/johndoe/.pyenv/versions/3.11.4/bin/python

zsh> pyenv versions
system
3.6.8
3.6.9
3.6.15
3.8.16
3.8.17
3.9.9
3.10.6
* 3.11.4 (set by /Users/johndoe/.pyenv/version)

shell vs. local vs. global vs. system

Use-cases

Let’s explore the different commands and their use-cases.

To make this Python version the default:

zsh> pyenv global 3.11.4

To set an application-specific python version:

zsh> pyenv local 3.11.4

The above command creates a .python-version file in the current directory. Whenever pyenv resolves the version from this directory (or any of its subdirectories), it finds the file and automatically activates this version.

To set a shell-specific python version:

zsh> pyenv shell 3.11.4

The above command activates the specified version by setting the PYENV_VERSION environment variable. It overrides any local (application) or global setting you have made. To deactivate it, use the --unset flag:

zsh> echo $PYENV_VERSION
3.11.4
zsh> pyenv shell --unset

Resolution

The system Python is overridden by pyenv global (~/.pyenv/version).

The pyenv global setting is overridden by pyenv local (.python-version file).

The pyenv local setting is overridden by pyenv shell ($PYENV_VERSION).

Thus, to determine which version of Python to use, pyenv first looks at $PYENV_VERSION, then at the .python-version file, then at ~/.pyenv/version, and finally falls back to the system Python if none of the above is set.
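The lookup order can be captured in a simplified Python model. This is a hypothetical illustration of the rules above, not pyenv's actual code (pyenv itself is written in shell), and the function names are made up. It also reproduces one detail worth knowing: pyenv looks for .python-version in the current directory and then in each parent directory.

```python
from pathlib import Path
from typing import Mapping, Optional


def read_version_file(path: Path) -> Optional[str]:
    """Return the first non-empty line of a pyenv version file, or None."""
    if not path.is_file():
        return None
    for line in path.read_text().splitlines():
        if line.strip():
            return line.strip()
    return None


def resolve_python_version(env: Mapping[str, str], cwd: Path, pyenv_root: Path) -> str:
    """Simplified model of pyenv's lookup order: shell > local > global > system."""
    # 1. pyenv shell: the PYENV_VERSION environment variable wins.
    if env.get("PYENV_VERSION"):
        return env["PYENV_VERSION"]
    # 2. pyenv local: the nearest .python-version, in cwd or any parent directory.
    for directory in (cwd, *cwd.parents):
        version = read_version_file(directory / ".python-version")
        if version:
            return version
    # 3. pyenv global: the "version" file in the pyenv root (~/.pyenv/version).
    version = read_version_file(pyenv_root / "version")
    if version:
        return version
    # 4. Nothing set anywhere: fall back to the system Python.
    return "system"
```

For instance, resolve_python_version(os.environ, Path.cwd(), Path.home() / ".pyenv") mimics what pyenv would pick in your current shell, under the simplifications above.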

Example

zsh> mkdir /tmp/test && cd /tmp/test

zsh> pyenv versions
* system (set by /Users/johndoe/.pyenv/version)
3.6.8
3.6.9
3.6.15
3.8.16
3.8.17
3.9.9
3.10.6
3.11.4
zsh> python -V
Python 3.6.8

zsh> pyenv local 3.8.16
zsh> ls -a
.       ..      .python-version
zsh> cat .python-version
3.8.16
zsh> python -V
Python 3.8.16

zsh> pyenv shell 3.9.9
zsh> echo $PYENV_VERSION
3.9.9
zsh> python -V
Python 3.9.9

Conversely, you can peel the layers off one by one:

zsh> pyenv shell --unset
zsh> echo $PYENV_VERSION

zsh> python -V
Python 3.8.16

zsh> rm .python-version
zsh> python -V
Python 3.6.8

zsh> pyenv versions
* system (set by /Users/johndoe/.pyenv/version)
3.6.8
3.6.9
3.6.15
3.8.16
3.8.17
3.9.9
3.10.6
3.11.4

Virtual environments and pyenv

To quote this realpython.com article, virtual environments and pyenv are a match made in heaven. Whether you use virtualenv or venv, pyenv plays nicely with either.

You can create a virtual environment using the following template:

pyenv virtualenv <python_version> <environment_name>

You can activate your environment for the current directory by running the following:

pyenv local <environment_name>

You can also do it manually:

zsh> pyenv activate <environment_name>
zsh> pyenv deactivate
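To confirm from inside Python that an environment is active, a common check compares sys.prefix with sys.base_prefix. This is a sketch assuming the environment was created with venv; recent virtualenv releases generally behave the same way, but check if you rely on it. The function name is illustrative:

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # venv points sys.prefix at the environment directory, while sys.base_prefix
    # keeps pointing at the Python installation the environment was created from.
    return sys.prefix != sys.base_prefix


print("virtual environment" if in_virtual_environment() else "no virtual environment")
print("interpreter:", sys.executable)
```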

Environment variables in Docker

To pass environment variables (or, with the third method, whole configuration files) to a container, you have 3 main methods:

(1) Using the -e flag:

docker run -e MY_VAR1="foo" -e MY_VAR2="fii" my_docker_image

(2) Using a .env file:

docker run --env-file=.env my_docker_image

(3) Mounting a volume:

docker run -v /path/on/host:/path/in/container my_docker_image

Let’s explore these 3 methods one by one.

To make this concrete, I prepared a short example, described in the prerequisites section below, that we will reuse in the following sections.

Pre-requisite

You have the following structure:

.
├── Dockerfile
└── scripts
    └── create_file.sh

The create_file.sh bash script contains the following lines:

#!/bin/bash

set -e

cat> dummy.txt <<EOL
here is my first var: $MY_VAR1
and here my second one: $MY_VAR2.
EOL

and the Dockerfile is as follows:

FROM python

WORKDIR /app
COPY . /app

COPY --chmod=755 ./scripts/create_file.sh /app/scripts/create_file.sh

CMD /app/scripts/create_file.sh && cat dummy.txt

Using the -e flag

zsh> docker build -t my_docker_image .

zsh> docker run -e MY_VAR1="foo" -e MY_VAR2="fii" my_docker_image
here is my first var: foo
and here my second one: fii.

Note: the variables are only available at run time, i.e. in the processes launched by the CMD instruction. Should you run the aforementioned script in a RUN instruction (i.e. at build time), the variables would not be set yet and would expand to empty strings. See the example below:

FROM python

WORKDIR /app
COPY . /app

COPY --chmod=755 ./scripts/create_file.sh /app/scripts/create_file.sh

RUN /app/scripts/create_file.sh

CMD cat dummy.txt

Once run, the above image would produce the following output:

zsh> docker build -t my_docker_image .

zsh> docker run -e MY_VAR1="foo" -e MY_VAR2="fii" my_docker_image
here is my first var:
and here my second one: .
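One way to avoid silently writing empty values is to check the variables explicitly. Below is a hypothetical Python counterpart of create_file.sh (the names render_dummy and main are illustrative): it raises an error instead of rendering blanks. Called from a RUN instruction it would make the build fail loudly; called at run time after docker run -e ..., it writes the same dummy.txt as the bash script.

```python
import os
from pathlib import Path
from typing import Mapping

REQUIRED_VARIABLES = ("MY_VAR1", "MY_VAR2")


def render_dummy(env: Mapping[str, str]) -> str:
    """Build the text of dummy.txt, refusing to render empty or missing variables."""
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return (
        f"here is my first var: {env['MY_VAR1']}\n"
        f"and here my second one: {env['MY_VAR2']}.\n"
    )


def main() -> None:
    """Write dummy.txt from the container's environment (call it at run time, from CMD)."""
    Path("dummy.txt").write_text(render_dummy(os.environ))
```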

However, it can be cumbersome to pass all your variables on the command line, especially when there are many of them. In that case, an environment file comes in handy.

Using an .env file

You can achieve almost the same result. Simply add a .env file at the root of your project containing the following lines:

MY_VAR1="foo"
MY_VAR2="fii"

You should now have the following structure:

.
├── Dockerfile
├── scripts
│   └── create_file.sh
└── .env

Then, simply run:

zsh> docker run --env-file=.env  my_docker_image
here is my first var: "foo"
and here my second one: "fii".

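Note: unlike a shell, docker run --env-file keeps the double quotes as part of the values, which is why they show up in the output above. Also, you almost always want to add the .env file to your .gitignore.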
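The quotes in the output above come from the way --env-file reads the file: each KEY=VALUE line is taken literally. The sketch below is a simplified Python model of that behaviour; the function name is hypothetical, and it skips details such as lines without =, whose value Docker takes from the host environment instead.

```python
def parse_env_file(text: str) -> dict:
    """Simplified model of how docker run --env-file reads KEY=VALUE lines."""
    variables = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are skipped
        key, separator, value = line.partition("=")
        if separator:
            variables[key] = value  # the value is kept verbatim, quotes included
    return variables


print(parse_env_file('MY_VAR1="foo"\nMY_VAR2="fii"\n'))
# {'MY_VAR1': '"foo"', 'MY_VAR2': '"fii"'}
```

If you do not want the quotes in your values, write the .env file without them (MY_VAR1=foo).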

Mounting volumes

Sometimes you want to share files stored on your host system directly with the container. This can be useful, for instance, to share configuration files with a server that you intend to run in a Docker container.

With this method, the container can access directories from the host.

So, let’s say you have the following architecture:

.
├── Dockerfile
├── conf
│   └── dummy.txt
└── scripts
    └── create_file.sh

The content of the text file is as follows:

here is my first var: "foo"
and here my second one: "fii".

And the Dockerfile contains the following lines:

FROM python

WORKDIR /app
COPY . /app

CMD cat /conf/dummy.txt

Here is the outcome:

zsh> docker build -t my_docker_image .

zsh> docker run -v /absolute/path/to/project/conf:/conf my_docker_image
here is my first var: "foo"
and here my second one: "fii".

If you change the content of dummy.txt on the host, the output changes accordingly the next time you run the container, without rebuilding the image:

zsh> docker run -v /absolute/path/to/project/conf:/conf my_docker_image
here is my first var: "fuu"
and here my second one: "faa".

Note: A container is a running instance of an image. Multiple containers can derive from an image. An image is a blueprint, a template for containers, containing all the code, libraries and dependencies.

You should be now ready to go!

MyPy missing imports

When running mypy on your codebase, you might sometimes encounter an error like this one:

error: Library stubs not installed for "requests"

You can have a look at the official documentation on how to solve missing imports, but the quickest fix is to run the following:

mypy --install-types

You might also stumble across the similar untyped import issue:

module is installed, but missing library stubs or py.typed marker [import-untyped]

In that case, you can create a mypy.ini file containing the following lines:

printf "[mypy]\nignore_missing_imports = True\n" > mypy.ini
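The file above silences missing imports for every package. A narrower alternative, assuming you only want to silence specific packages, is a per-module section in mypy.ini ([mypy-<package>.*]), so imports from other packages are still checked. The sketch generates such a file with Python's configparser; the function name is hypothetical and requests is only an example:

```python
import configparser
import io


def build_mypy_ini(untyped_packages):
    """Return mypy.ini content that ignores missing stubs only for the given packages."""
    config = configparser.ConfigParser()
    config["mypy"] = {}  # global section: stays strict for everything else
    for package in untyped_packages:
        # Per-module section; the ".*" pattern covers the package and its submodules.
        config[f"mypy-{package}.*"] = {"ignore_missing_imports": "True"}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(build_mypy_ini(["requests"]))
```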

poetry local config file

Summary

When working with Poetry, you can set configuration options from the command line interface, e.g.:

poetry config virtualenvs.create true

My recommendation is to always persist those settings in a project-local file. This is done by adding the --local flag to the poetry config command:

poetry config --local virtualenvs.create true

The above command creates a poetry.toml file at the root of your project, with the following content:

[virtualenvs]
create = true

This ensures that your configuration is written down in the project and reproducible.

List the current poetry configuration

This can be done via a simple command:

poetry config --list

Use-case: system-git-client true

I rarely need to set Poetry options; however, one of them helps me fix the following problem, which I run into from time to time:

(1) I have a CI/CD pipeline running on Gitlab;

(2) This pipeline has 3 different stages: test (where the quality-checks job runs), build and deploy;

(3) Each of these stages might require Python dependencies, managed by Poetry. The GitLab CI/CD runner needs to install them in the virtual environment where the jobs execute their scripts. This means each job needs to run poetry install, e.g.:

variables:
  GIT_SUBMODULE_STRATEGY: recursive

stages:
  - test
  - build
  - deploy

quality-checks:
  image: "<your-custom-docker-path>/ci-cd-python-test-harness:latest"
  stage: test
  script:
    - <setup-gitlab-ssh-access>
    - <add-safe-git-directories>
    - poetry install
    - make checks

...

Note: here, the last line, make checks, triggers a Makefile target running black, mypy, pylint and pytest on the codebase. Here is what the Makefile looks like:

black:
    poetry run black .

mypy:
    poetry run mypy <your-src-folder>

pylint:
    poetry run pylint <your-src-folder>

test:
    PYTHONWARNINGS=ignore poetry run pytest -vvvs <your-test-folder>

checks: black mypy pylint test

Thus, each CI/CD job runs poetry install first, so the runner can test the code in its virtual environment.

(4) poetry install installs all of the project's dependencies, including git submodules. This is exactly where our issue lies!

$ poetry install
Creating virtualenv <your-repository>_-py3.10 in /home/gitlab-runner/.cache/pypoetry/virtualenvs
Installing dependencies from lock file
No git repository was found at ../../<your-submodule-repository>.git
Cleaning up project directory and file based variables
00:01
ERROR: Job failed: exit code 1

In order for your project to be able to use git submodules and the CI/CD to run successfully, you need to run the following command:

poetry config --local experimental.system-git-client true

This creates a poetry.toml file with the following lines:

[experimental]
system-git-client = true

This trick should fix the No git repository was found error occurring in your CI/CD pipeline.

Use Python Fixtures in Classes

TL;DR: use scope, @pytest.mark.usefixtures and request.cls to define your fixture as an attribute of the class.

With pytest, you can use fixtures to clearly separate responsibilities within your test modules, sticking to the Arrange-Act-Assert pattern:

import pytest

@pytest.fixture()
def get_some_data():
    yield "get some data"

def test_reading_data(get_some_data):
    assert get_some_data == "get some data"

The code above works, but what if you want to organize your test functions within classes? Naively, you might expect the following to work:

import unittest

import pytest

@pytest.fixture()
def get_some_data():
    yield "get some data"

class TestDummy(unittest.TestCase):

    def test_dummy(self, get_some_data):
        assert get_some_data == "get some data"

Running poetry run pytest -vvvs tests/path/to/test_module.py will return the following error in the traceback:

E       TypeError: TestDummy.test_dummy() missing 1 required positional argument: 'get_some_data'

In order to use a pytest fixture within a unittest.TestCase class, you need to adapt the above snippet as follows, because test methods of unittest.TestCase subclasses cannot receive fixtures as arguments:

import unittest

import pytest

@pytest.fixture()
def get_some_data():
    yield "get some data"

class TestDummy(unittest.TestCase):

    @pytest.fixture(autouse=True)
    def _get_some_data(self, get_some_data):
        # autouse: runs before each test and stores the fixture value on the instance
        self.get_some_data = get_some_data

    def test_dummy(self):
        assert self.get_some_data == "get some data"

Note that _get_some_data is called once per test by default, which is inconvenient if you have to perform requests over the network, e.g. requests.get("https://www.google.com"). You can change this behaviour by adapting the scope:

import unittest

import pytest

@pytest.fixture(scope="module")
def get_some_data():
    # Module scope: computed once for the whole test module.
    yield "get some data"

@pytest.fixture(scope="class")
def define_get_data_attribute(request, get_some_data):
    # request.cls is the test class requesting the fixture: store the data on it.
    request.cls._get_some_data = get_some_data

@pytest.mark.usefixtures("define_get_data_attribute")
class TestDummy(unittest.TestCase):

    def test_dummy(self):
        assert self._get_some_data == "get some data"

Note that the request object gives access to the requesting test context, such as the cls attribute (see the pytest documentation on the request fixture).
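If you do not need unittest.TestCase features, plain classes whose names start with Test can request fixtures directly as method arguments, so neither autouse nor request.cls is needed. A minimal sketch; the fixture and class names are illustrative:

```python
import pytest


@pytest.fixture(scope="class")
def some_data():
    # Class scope: computed once and shared by every test of the class.
    yield "get some data"


class TestPlainClass:
    # No unittest.TestCase base class: pytest injects fixtures as method arguments.

    def test_reading_data(self, some_data):
        assert some_data == "get some data"

    def test_data_is_a_string(self, some_data):
        assert isinstance(some_data, str)
```

The trade-off is that you lose the unittest.TestCase helpers (self.assertEqual, setUp, and so on), which pytest's plain assert statements and fixtures usually replace.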