jq linux command for JSON processing

Recently I came across a very nice tool to process JSON data from your terminal: jq.

jq is a handy command-line tool for filtering and processing JSON streams, and you can plug it directly into your CLI pipes.

Installation

zsh> brew install jq # MacOS
zsh> jq --version
jq-1.6

Examples

jq can be used to prettify JSON inputs:

zsh> echo '{"fruits": ["apple", "peer", "banana"]}' | jq "."
{
"fruits": [
    "apple",
    "peer",
    "banana"
]
}

It can also read directly from files:

zsh> cat >frenchmen.json<<EOF
heredoc> {"frenchmen": [{"name": "Napoleon"}, {"name": "Degaulle"}, {"name": "Exupery"}]}
heredoc> EOF

zsh> jq "." frenchmen.json
{
"frenchmen": [
    {
    "name": "Napoleon"
    },
    {
    "name": "Degaulle"
    },
    {
    "name": "Exupery"
    }
]
}

You can access specific properties e.g. “name”:

zsh> jq '.frenchmen | .[].name' frenchmen.json
"Napoleon"
"Degaulle"
"Exupery"

You can slice through arrays:

zsh> echo '[1,2,3,4,5,6]' | jq ".[2:-1]"
[
    3,
    4,
    5
]

You can also use functions:

zsh> echo '["blue", "white", "red"]' | jq '.[] | length'
4
5
3

zsh> echo '["blue", "white", "red"]' | jq '[.[] | length] | max'
5

zsh> echo '[1,2,3,4,5,6]' | jq '.[] | select(.%2==0)'
2
4
6
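
You can also build new JSON objects out of the input by combining filters and functions, for instance:

zsh> echo '["blue", "white", "red"]' | jq '{colors: ., count: length}'
{
  "colors": [
    "blue",
    "white",
    "red"
  ],
  "count": 3
}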

A more extensive list can be found in this article: https://www.baeldung.com/linux/jq-command-json

Defining and using terraform modules

What is a terraform module?

A terraform module is like a black box. When called, it performs the logic you have encapsulated in it.

Variables can be used as inputs to parametrize the terraform configuration executed within the black box.

When to use a terraform module?

A terraform module comes in handy when you have to set up the exact same resources multiple times across multiple configuration files. You can see it as a factorization method that saves you from writing the exact same configuration block every time. Instead, you have one block you can refer to, passing in some input variables, and it performs the action you want with the provided parameters.

Writing local files without using a module

Let’s say you have to provision your infrastructure and always end up writing local files in it. Writing a local file in terraform is dead simple: it only takes a filename and a content as parameters. HashiCorp even maintains a local_file resource (part of the local provider) in the Terraform Registry.

If you need to write two local files, you could add the following lines to a main.tf file:

resource "local_file" "my_file_1" {
    content = "This is the content of file 1."
    filename = "files/my_file_1.txt"
}

resource "local_file" "my_file_2" {
    content = "This is the content of file 2."
    filename = "files/my_file_2.txt"
}

Then run:

zsh> terraform init
zsh> terraform apply
zsh> tree .
.
├── files
│   ├── my_file_1.txt
│   └── my_file_2.txt
├── main.tf
└── terraform.tfstate

As you can see, the two files have been created. You can even check the content:

zsh> cat files/my_file_1.txt
This is the content of file 1.

However, because of the redundancy, this is not a good pattern. To do better, we will have to make good use of terraform's modularity. Hence, let’s move on to the next section!

Writing local files using a module

To reduce the code redundancy, you could write a terraform module to create your local files. The logic remains the same as before. The only thing that changes is that, instead of taking hard-coded filename and content parameters, our local_file resource block will read those values from input variables.

Following up on our aforementioned example, we decide to encapsulate the logic for the local_file creation under the tf-module/ folder:

.
├── tf-module
│   ├── main.tf
│   └── variables.tf
└── main.tf

tf-module/variables.tf:

variable "filename" {
    type = string
    nullable = false
}

variable "content" {
    type = string
    nullable = false
}

tf-module/main.tf:

resource "local_file" "file" {
    content = var.content
    filename = var.filename
}
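
Optionally, a module can also expose values back to its caller through output blocks. As a small sketch (this file is not part of the tree above), a tf-module/outputs.tf could look like:

output "filename" {
    # expose the path of the file created by this module instance
    value = local_file.file.filename
}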

./main.tf:

locals {
    filenames_postfix = toset(["1", "2", "3", "4"])
}

module "local_files" {
    for_each = local.filenames_postfix
    source = "./tf-module"
    filename = "files/my_file_${each.value}.txt"
    content = "This is the content of file ${each.value}."
}

Note: for_each only works with sets and maps. Thus, we need to convert our list of strings into a set. An alternative is given in the following snippet.

variable "filenames_postfix" {
    type = set(string)
    default = ["1", "2", "3", "4"]
}

module "local_files" {
    for_each = var.filenames_postfix
    source = "./tf-module"
    filename = "files/my_file_${each.value}.txt"
    content = "This is the content of file ${each.value}."
}

Note: a local value is only accessible within the module where it is declared, i.e. the same local namespace. On the other hand, a terraform input variable can be set from outside the module (e.g. from the CLI or a calling module), even though it is defined at the module level. You can pick one or the other according to what suits your design pattern contract best.

You can now apply the terraform code:

zsh> terraform init
zsh> terraform apply
zsh> tree .
.
├── tf-module
│   ├── main.tf
│   └── variables.tf
├── files
│   ├── my_file_1.txt
│   ├── my_file_2.txt
│   ├── my_file_3.txt
│   └── my_file_4.txt
├── main.tf
└── terraform.tfstate

Using a remote terraform module

The source attribute of the local_files module block lets you point at the module you want to use. So far, we have only used a local reference (./tf-module in our case), but it is also possible to point at a remote terraform module, for instance one stored in a remote GitLab or GitHub repository.

In our case, we store our re-usable terraform modules in a GitHub repository, publicly accessible at github.com/olivierbenard/terraform-modules. Then, wherever we are, we can refer to it and use those modules in our local terraform projects:

module "local_files" {
    source = "git::ssh://git@github.com/olivierbenard/terraform-modules.git//tf-local-file?ref=master"
    for_each = toset(["1", "2", "3"])
    filename = "files/my_file_${each.value}.txt"
    content = "This is the content of file ${each.value}."
}
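
Note that the ref query parameter accepts any git reference. In practice, you will usually want to pin to a tag or a commit rather than a moving branch such as master; for instance, assuming a hypothetical v1.0.0 tag exists on the repository:

module "local_files" {
    # same arguments as above; only the ref changes
    source = "git::ssh://git@github.com/olivierbenard/terraform-modules.git//tf-local-file?ref=v1.0.0"
    for_each = toset(["1", "2", "3"])
    filename = "files/my_file_${each.value}.txt"
    content = "This is the content of file ${each.value}."
}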

Real-Case Example: Provisioning Secret Variables on GCP

So far we have only played around, creating local files. However, in real life, we have little use for them. A more realistic use of terraform modules is, for instance, provisioning and storing sensitive data (e.g. passwords, variables…) in Google Cloud Platform Secret Manager, using terraform as Infrastructure as Code (IaC).

Those credentials and variables can then be accessed by other processes, e.g. Airflow DAGs running on Google Cloud Composer, using airflow.models.Variable.

from airflow.models import Variable
RETRIEVED_VARIABLE = "{{var.value.my_variable}}"

Notes:

  1. In the above snippet we have used a Jinja template to retrieve the variable stored on Google Secret Manager. The value only gets replaced at run time. If you need to access the variable during build time, you need to use Variable.get("my_variable") instead (see the sketch after these notes).

  2. Your Airflow DAGs can only access the variables stored on Google Secret Manager if you have configured Cloud Composer to do so. More on that in the official documentation. An important remark is that, to be visible to your DAGs, the variables and connections stored on Google Cloud Secret Manager need to match the following templates: airflow-variables-<your_variable> and airflow-connections-<your_connection>.
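
As mentioned in the first note, here is a minimal sketch contrasting run-time (Jinja templated) access with build-time access via Variable.get; my_variable is a placeholder name:

from airflow.models import Variable

# resolved only when the task runs (run time), via Jinja templating
MY_RUNTIME_VARIABLE = "{{var.value.my_variable}}"

# resolved as soon as the DAG file is parsed (build time)
MY_BUILDTIME_VARIABLE = Variable.get("my_variable")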

The terraform module wrapping up the logic to provision secrets on Google Cloud Platform (GCP) is available at: github.com/olivierbenard/terraform-modules/tf-airflow-variable.

You can re-use it with similar blocks of code:

module "airflow_variable" {
    source = "git::ssh://git@github.com/olivierbenard/terraform-modules.git//tf-airflow-variable?ref=master"
    for_each = fileset("${var.root_directory}/path/to/af/vars/files/${var.env}", "*")
    airflow_variable_name = each.value
    airflow_variable_value = file("${var.root_directory}/path/to/af/vars/files/${var.env}/${each.value}")
    airflow_variable_location = "europe-west3"
}

Notes:

  1. var.root_directory can be defined by terragrunt to be equal to abspath(get_terragrunt_dir())

  2. var.env can be defined by terragrunt to be equal to read_terragrunt_config("env.hcl").locals.environment

pyenv python version manager

Why use pyenv?

System Python

By default, python comes pre-installed with your operating system.

If you are a Mac or Linux user, you can see the “System Python” that comes installed on your operating system:

zsh> which python
/usr/bin/python

Note: this version of python is available to all users (as reflected by its location).

However, this might not be the version you need:

zsh> /usr/local/bin/python3 --version
Python 3.6.8

Another problem is that by running sudo pip install <your-package>, you will install the Python package globally. What if another project needs a slightly older version of the package, or if two projects require two different versions because of breaking changes introduced in the newer one?

Last but not least, some operating systems rely heavily on Python to perform operations. Installing a new version of Python could seriously dampen your ability to use your OS.

Pyenv

The logical place to look to solve all the problems inherent to System Python is pyenv.

Pyenv is a great tool for managing multiple Python versions that can coexist simultaneously on your OS. You can then easily switch between the installed versions and use virtual environments to manage the Python packages associated with each Python version.

Installation

You need to install the following dependencies:

brew install openssl readline sqlite3 xz zlib

Add them to your PATH (macOS):

echo 'export PATH="/usr/local/opt/openssl@3/bin:$PATH"' >> ~/.zshrc
echo 'export LDFLAGS="-L/usr/local/opt/openssl@3/lib"' >> ~/.zshrc
echo 'export CPPFLAGS="-I/usr/local/opt/openssl@3/include"' >> ~/.zshrc
echo 'export PKG_CONFIG_PATH="/usr/local/opt/openssl@3/lib/pkgconfig"' >> ~/.zshrc
echo 'export PATH="/usr/local/opt/sqlite/bin:$PATH"' >> ~/.zshrcc
echo 'export LDFLAGS="-L/usr/local/opt/sqlite/lib"' >> ~/.zshrc
echo 'export CPPFLAGS="-I/usr/local/opt/sqlite/include"' >> ~/.zshrc
echo 'export PKG_CONFIG_PATH="/usr/local/opt/sqlite/lib/pkgconfig"' >> ~/.zshrc
echo 'export LDFLAGS="-L/usr/local/opt/zlib/lib"' >> ~/.zshrc
echo 'export CPPFLAGS="-I/usr/local/opt/zlib/include"' >> ~/.zshrc
echo 'export PKG_CONFIG_PATH="/usr/local/opt/zlib/lib/pkgconfig"' >> ~/.zshrc

Note: the pyenv installer ships pyenv together with a set of useful plugins:

  1. pyenv: The actual pyenv application
  2. pyenv-virtualenv: Plugin for pyenv and virtual environments
  3. pyenv-update: Plugin for updating pyenv
  4. pyenv-doctor: Plugin to verify that pyenv and build dependencies are installed
  5. pyenv-which-ext: Plugin to automatically lookup system commands

Then, install pyenv using the pyenv-installer:

curl https://pyenv.run | bash

Restart the terminal for the PATH changes to be reflected:

exec $SHELL

Finally, check that everything worked:

zsh> pyenv -v
pyenv 2.3.19

Uninstall pyenv

On MacOS:

brew remove pyenv

Using pyenv

Install python versions

zsh> pyenv install --list
    3.6.2
    3.6.7
    3.7.2
    3.8.2
    3.9.12
    3.10.4
    3.11-dev
    3.11.4
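
To install one of the listed versions, pass it to pyenv install, e.g.:

zsh> pyenv install 3.11.4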

All the installed versions are located in your pyenv root directory:

zsh> ls ~/.pyenv/versions/
3.10.6  3.11.4  3.6.15  3.6.8   3.6.9   3.8.16  3.8.17  3.9.9

Note: make sure to run pyenv update regularly to have access to the latest python versions.

Uninstall python versions

You can simply remove the versions from the pyenv root folder:

rm -rf ~/.pyenv/versions/3.10.6

or use the provided command:

pyenv uninstall 3.10.6

Switching between Python versions

You can list the python versions you have installed:

zsh> pyenv versions
* system (set by /Users/johndoe/.pyenv/version)
3.6.8
3.6.9
3.6.15
3.8.16
3.8.17
3.9.9
3.10.6
3.11.4

Note: the * indicates which version of python is currently active. By default, it is the system python. You can confirm this using the which command:

zsh> which python3
/Users/johndoe/.pyenv/shims/python3

pyenv inserts itself into the PATH. From the OS’s perspective, the pyenv shim is the executable that gets called when you run python3. If you want to see the actual underlying interpreter, you need to run the following:

zsh> pyenv which python3
/usr/local/bin/python3

zsh> /usr/local/bin/python3 -V
Python 3.6.8

To shift between different versions, you can simply run:

zsh> pyenv global 3.11.4

zsh> python -V
Python 3.11.4

zsh> which python
python: aliased to python3

zsh> pyenv which python
/Users/johndoe/.pyenv/versions/3.11.4/bin/python

zsh> pyenv versions
system
3.6.8
3.6.9
3.6.15
3.8.16
3.8.17
3.9.9
3.10.6
* 3.11.4 (set by /Users/johndoe/.pyenv/version)

shell vs. local vs. global vs. system

Use-cases

Let’s explore the different commands and their use-cases.

To ensure that a given python version is used by default:

zsh> pyenv global 3.11.4

To set an application-specific python version:

zsh> pyenv local 3.11.4

The above command creates a .python-version file in the current directory. If pyenv is active in this environment, the file will automatically activate this version.

To set a shell-specific python version:

zsh> pyenv shell 3.11.4

The above command activates the specified version by setting the PYENV_VERSION environment variable. It overrides any application or global setting you have made. To deactivate the version, you need to use the --unset flag:

zsh> echo $PYENV_VERSION
3.11.4
zsh> pyenv shell --unset

Resolution

The System Python is overridden by pyenv global (~/.pyenv/version).

The pyenv global is overridden by pyenv local (.python-version file).

The pyenv local is overridden by pyenv shell ($PYENV_VERSION).

Thus, to determine which version of python to use, pyenv first looks for $PYENV_VERSION, then .python-version, then ~/.pyenv/version, before finally settling on the System Python if none of the above has been set.

Example

zsh> mkdir /tmp/test && cd /tmp/test

zsh> pyenv versions
* system (set by /Users/johndoe/.pyenv/version)
3.6.8
3.6.9
3.6.15
3.8.16
3.8.17
3.9.9
3.10.6
3.11.4
zsh> python -V
Python 3.6.8

zsh> pyenv local 3.8.16
zsh> ls -a
.       ..      .python-version
zsh> cat .python-version
3.8.16
zsh> python -V
Python 3.8.16

zsh> pyenv shell 3.9.9
zsh> echo $PYENV_VERSION
3.9.9
zsh> python -V
Python 3.9.9

You can then peel the layers back, the other way around:

zsh> pyenv shell --unset
zsh> echo $PYENV_VERSION

zsh> python -V
Python 3.8.16

zsh> rm .python-version
zsh> python -V
Python 3.6.8

zsh> pyenv versions
* system (set by /Users/johndoe/.pyenv/version)
3.6.8
3.6.9
3.6.15
3.8.16
3.8.17
3.9.9
3.10.6
3.11.4

Virtual environments and pyenv

To quote this realpython.com article, virtual environments and pyenv are a match made in heaven. Whether you use virtualenv or venv, pyenv plays nicely with either.

You can create a virtual environment using the following template:

pyenv virtualenv <python_version> <environment_name>

You can activate your environment by running the following:

pyenv local <environment_name>

You can also do it manually:

zsh> pyenv activate <environment_name>
zsh> pyenv deactivate
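
For instance, assuming Python 3.11.4 is already installed and using a hypothetical environment name:

zsh> pyenv virtualenv 3.11.4 my-project-env
zsh> pyenv local my-project-env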

Environment variables in Docker

To pass environment variables to a container via a Dockerfile, you have 3 main methods:

(1) Using the -e flag:

docker run -e MY_VAR1="foo" -e MY_VAR2="fii" my_docker_image

(2) Using a .env file:

docker run --env-file=.env my_docker_image

(3) Mounting a volume:

docker run -v /path/on/host:/path/in/container my_docker_image

Let’s now explore each of these three methods in more detail.

To make things concrete, I pre-baked a short example in the pre-requisite section below, which we will reuse in the following sections.

Pre-requisite

You have the following structure:

.
├── Dockerfile
└── scripts
    └── create_file.sh

The create_file.sh bash script contains the following lines:

#!/bin/bash

set -e

cat> dummy.txt <<EOL
here is my first var: $MY_VAR1
and here my second one: $MY_VAR2.
EOL

and the Dockerfile is as follows:

FROM python

WORKDIR /app
COPY . /app

COPY --chmod=755 ./scripts/create_file.sh /app/scripts/create_file.sh

CMD /app/scripts/create_file.sh && cat dummy.txt

Using the -e flag

zsh> docker build -t my_docker_image .

zsh> docker run -e MY_VAR1="foo" -e MY_VAR2="fii" my_docker_image
here is my first var: foo
and here my second one: fii.

Note: the variables are replaced at run time, i.e. in the process launched by the CMD instruction. Should you run the aforementioned script in a RUN instruction (i.e. at build time), the variables would not be replaced. See the example below:

FROM python

WORKDIR /app
COPY . /app

COPY --chmod=755 ./scripts/create_file.sh /app/scripts/create_file.sh

RUN /app/scripts/create_file.sh

CMD cat dummy.txt

The above image would render the following once executed:

zsh> docker build -t my_docker_image .

zsh> docker run -e MY_VAR1="foo" -e MY_VAR2="fii" my_docker_image
here is my first var:
and here my second one: .

It can however be cumbersome to pass all your variables on the command line, especially when you have many environment variables. It then becomes handy to use an environment file instead.

Using a .env file

You can achieve the same result as previously. Simply add a .env file at the root of your project, containing the following lines:

MY_VAR1="foo"
MY_VAR2="fii"

You should now have the following structure:

.
├── Dockerfile
├── scripts
│   └── create_file.sh
└── .env

Then, simply run:

zsh> docker run --env-file=.env  my_docker_image
here is my first var: "foo"
and here my second one: "fii".

Note: you almost always want to add the .env file to your .gitignore.

Mounting volumes

Sometimes you want to share files stored on your host system directly with the container. This can be useful, for instance, when you want to share configuration files for a server that you intend to run in a Docker container.

With this method, the container can read directories from the host.

So, let’s say you have the following architecture:

.
├── Dockerfile
├── conf
│   └── dummy.txt
└── scripts
    └── create_file.sh

The content of the text file is as follows:

here is my first var: "foo"
and here my second one: "fii".

And the Dockerfile contains the following lines:

FROM python

WORKDIR /app
COPY . /app

CMD cat /conf/dummy.txt

You can therefore see the outcome:

zsh> docker build -t my_docker_image .

zsh> docker run -v /relative/path/project/conf:/conf my_docker_image
here is my first var: "foo"
and here my second one: "fii".

Should you change the content of the dummy.txt file on the host, the output also changes when running the image in a container, without you needing to rebuild the image:

zsh> docker run -v /relative/path/project/conf:/conf my_docker_image
here is my first var: "fuu"
and here my second one: "faa".

Note: A container is a running instance of an image. Multiple containers can derive from an image. An image is a blueprint, a template for containers, containing all the code, libraries and dependencies.

You should now be ready to go!

MyPy missing imports

When running mypy on your codebase, you might sometimes encounter an error similar to the following:

error: Library stubs not installed for "requests"

You can have a look at the official documentation on how to solve missing imports, but the quickest way is to run the following:

mypy --install-types
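
If you run mypy in a CI pipeline, you may want to skip the interactive confirmation prompt; mypy supports a --non-interactive flag for that purpose:

mypy --install-types --non-interactive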

You might also stumble upon the related untyped-import issue:

module is installed, but missing library stubs or py.typed marker [import-untyped]

In that case, you can just create a mypy.ini file, populated with the following lines:

echo "[mypy]\nignore_missing_imports = True" > mypy.ini

poetry local config file

Summary

When working with poetry, you can set specific configuration options using the command line interface, e.g.:

poetry config virtualenvs.create true

The recommendation here is to always hard-code those configurations in a local file. This can be done by adding the --local flag to the poetry config command:

poetry config --local virtualenvs.create true

The above command creates a poetry.toml file at the root of your project, with the following content:

[virtualenvs]
create = true

This ensures that your configuration is always hard-coded in the repository and replicable.

List the current poetry configuration

This can be done via a simple command:

poetry config --list

Use-case: system-git-client true

I rarely have to define poetry configurations; however, it helped me fix the following problem, which I encounter from time to time:

(1) I have a CI/CD pipeline running on Gitlab;

(2) This pipeline has 3 different stages: quality-checks, build and deploy;

(3) Each of these stages might require python dependencies. These dependencies are managed by poetry. You need the GitLab CI/CD runner to install them within the virtual environment where the stages are going to execute their scripts. This means you need each job to run poetry install, e.g.:

variables:
  GIT_SUBMODULE_STRATEGY: recursive

stages:
  - test
  - build
  - deploy

quality-checks:
  image: "<your-custom-docker-path>/ci-cd-python-test-harness:latest"
  stage: test
  script:
    - <setup-gitlab-ssh-access>
    - <add-safe-git-directories>
    - poetry install
    - make checks

...

Note: here, the last line make checks triggers a Makefile target, running black, mypy, pylint and pytest on the codebase. You can have a look at what this make command looks like in the snippet immediately below:

black:
    poetry run black .

mypy:
    poetry run mypy <your-src-folder>

pylint:
    poetry run pylint <your-src-folder>

test:
    PYTHONWARNINGS=ignore poetry run pytest -vvvs <your-test-folder>

checks: black mypy pylint test

Thus, my CI/CD runs poetry install so the runner can test my code inside its virtual environment.

(4) As part of poetry install, poetry installs all the dependencies my project declares, including those provided as git submodules. This is exactly where our issue lies!

$ poetry install
Creating virtualenv <your-repository>_-py3.10 in /home/gitlab-runner/.cache/pypoetry/virtualenvs
Installing dependencies from lock file
No git repository was found at ../../<your-submodule-repository>.git
Cleaning up project directory and file based variables
00:01
ERROR: Job failed: exit code 1

In order for your project to be able to use git submodules and the CI/CD to run successfully, you need to run the following command:

poetry config --local experimental.system-git-client true

This creates a poetry.toml file with the following lines:

[experimental]
system-git-client = true

This trick should fix the No git repository was found error occurring in your CI/CD pipeline.

Use Python Fixtures in Classes

TL;DR: use scope, @pytest.mark.usefixtures and request.cls to define your fixture as an attribute of the class.

With pytest you can use fixtures to have a nice delimitation of responsibilities within your test modules, sticking to the Arrange-Act-Assert pattern:

import pytest

@pytest.fixture()
def get_some_data():
    yield "get some data"

def test_reading_data(get_some_data):
    assert get_some_data == "get some data"

The code above works, but what if you want to organize your test functions within classes? Naively, you might assume the following to be a fair implementation:

import unittest

import pytest

@pytest.fixture()
def get_some_data():
    yield "get some data"

class TestDummy(unittest.TestCase):

    def test_dummy(self, get_some_data):
        assert get_some_data == "get some data"

Running poetry run pytest -vvvs tests/path/to/test_module.py will return the following error in the traceback:

E       TypeError: TestDummy.test_dummy() missing 1 required positional argument: 'get_some_data'

In order to use a pytest fixture within such a class, you need to edit the above snippet as follows, because fixtures cannot be requested as arguments of unittest.TestCase test methods:

import unittest

import pytest

@pytest.fixture()
def get_some_data():
    yield "get some data"

class TestDummy(unittest.TestCase):

    @pytest.fixture(autouse=True)
    def _get_some_data(self, get_some_data):
        self.get_some_data = get_some_data

    def test_dummy(self):
        assert self.get_some_data == "get some data"

Note that _get_some_data will be called once per test by default, which is inconvenient if you have to perform requests over the network, e.g. requests.get("https://www.google.com"). You can change this behaviour by adapting the scope:

@pytest.fixture(scope="module")
def get_some_data():
    yield "get some data"

@pytest.fixture(scope="class")
def define_get_data_attribute(request, get_some_data):
    request.cls._get_some_data = get_some_data

@pytest.mark.usefixtures("define_get_data_attribute")
class TestDummy(unittest.TestCase):

    def test_dummy(self):
        assert self._get_some_data == "get some data"

Note that the request object gives access to the requesting test context such as the cls attribute. More here.

mypy disable error

You can disable mypy error codes using the special comment # type: ignore[code, ...]. E.g.:

def f(): # type: ignore[call-arg]
    pass

To disable multiple codes in one line, simply separate the codes with a comma, e.g.:

def f(): # type: ignore[call-arg, arg-type]
    pass

You can also configure mypy globally through the mypy.ini configuration file, e.g. to ignore missing imports for the whole codebase:

zsh> cat >> mypy.ini <<EOL
heredoc> [mypy]
heredoc> ignore_missing_imports = True
heredoc> EOL
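
If you prefer to keep checks strict globally and only relax mypy for specific error codes or third-party packages, the configuration can also be scoped per module; a small sketch, where somelibrary is a placeholder:

[mypy]
# disable the listed error codes for the whole codebase
disable_error_code = call-arg, arg-type

[mypy-somelibrary.*]
# only ignore missing stubs for this specific package
ignore_missing_imports = True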

More in this page: mypy.readthedocs.io/en/stable/config_file.html.

Note: it is never good practice to silence the errors mypy (or any other checker) raises; always try to address the underlying issue. E.g. a too-many-arguments warning on a method probably means that you are missing an intermediary method and should refactor your code.

Finally, mypy checks usually go hand in hand with black, pylint and unit-test checks. You can combine suppressions for multiple checkers on the same special inline comment:

@decorator() # type: ignore[call-arg, arg-type] # pylint: disable="no-value-for-parameter"
def f(): ...

More on how to disable pylint error checks: olivierbenard.fr/disable-pylint-error-checks.

Note: to know more about mypy error codes, you can visit mypy.readthedocs.io/en/stable/error_codes.html.

What is FinOps

FinOps helps you mitigate Cloud Spends by having (1) one central team of FinOps practitioners and (2) the rest of the engineering teams decentralized.

The one central team of FinOps practitioners is in charge of

  1. providing a central Cloud monitoring solution to monitor your Cloud Spends in real time with fine granularity (team level), via heavy use of tags and automation;

  2. identifying potential for savings (e.g. idle instances);

  3. negotiating commitments and rates with Cloud Vendors (this can help you save up to 50% of your Cloud Spends).

The decentralized engineering teams, on the other hand, aim at

  1. optimizing their Cloud usage against the efficiency metrics displayed by the monitoring solution;

  2. challenging and implementing the FinOps practitioners’ recommendations;

  3. re-allocating the spared budget to what truly matters (e.g. hiring new talent).

Cloud Spends = Usage x Rate

The formula for the Cloud Spends is the following: Cloud Spends = Usage x Rate.

With this in mind, the goal is to push cost-related accountability to the edges, having the expert teams managing their Cloud usage and having the central FinOps team gathering the organizational needs and negotiating global commitments out of it.

The main point of FinOps is not saving costs but rather spending more on what truly matters for the business.

Outrun the common Cloud horror-story

FinOps helps you regain visibility, predictability and budget control over Cloud spends, and thus counters the common Cloud horror story:

  1. Teams start their journey onto the Cloud. They love the scalability, high availability, and increased speed to innovation the tool brings.

  2. Quickly, Cloud bills start skyrocketing out of control due to the lightly governed nature of Cloud spends.

  3. Afraid, the organization brutally dampens its Cloud adventure after the last of too many wake-up calls, leading to (a) a waterfall approach, (b) mixed and sometimes conflicting cloud cost-reduction strategies and (c) no trust in Cloud solutions anymore.

Pytest against a wide range of data with Python hypothesis

The Hypothesis Python testing library allows you to run your python tests against a wide range of data matching a set of hypotheses. In other words, your test function is provided with data matching the specifications you set up, and it runs your Code Under Test (CUT) against it.

It is a nice way to automatically discover edge cases in your code without you even having to think about it.

Let’s go through an example. Let’s say you want to test the following function:

def divide_list_elements(my_list, denominator):
    return [item/denominator for item in my_list]

python> divide_list_elements([2, 4, 6], 2)
[1.0, 2.0, 3.0]

If you are like me, you would then have implemented your test cases manually, grouped under a class because it is neat:

import unittest

class TestDivideListElements(unittest.TestCase):

    def test_divide_list_elements_one_element(self):
        result = divide_list_elements([42], 2)
        assert result == [21.0]

    def test_divide_list_elements_no_element(self):
        result = divide_list_elements([], 4)
        assert result == []

zsh> poetry run pytest tests/test_hypothesis.py::TestDivideListElements
collected 2 items

tests/test_hypothesis.py::TestDivideListElements::test_divide_list_elements_no_element PASSED
tests/test_hypothesis.py::TestDivideListElements::test_divide_list_elements_one_element PASSED

======================= 2 passed in 0.13s =======================

Well, all good right? We could have stopped there.

Now, let’s say that, instead of manually defining your inputs, you let the hypothesis library manage this for you:

from hypothesis import given
from hypothesis import strategies as st

@given(st.lists(st.integers()), st.integers())
def test_divide_list_elements(input_list, input_denominator):
    result = divide_list_elements(input_list, input_denominator)
    expected = list(map(lambda x: x/input_denominator, input_list))
    assert result == expected

Running the test leaves you with an unexpected outcome:

zsh> poetry run pytest tests/test_hypothesis.py
>   return [item/denominator for item in my_list]
E   ZeroDivisionError: division by zero
E   Falsifying example: test_divide_list_elements(
E       input_list=[0],
E       input_denominator=0,
E   )

tests/test_hypothesis.py:17: ZeroDivisionError

You had obviously forgotten to check for division by zero…

Here is what is so beautiful about hypothesis: it discovers for you the edge cases you have forgotten about.

Let’s (1) amend our function:

def divide_list_elements(my_list: list, denominator: int) -> list:
    assert denominator != 0
    return [item/denominator for item in my_list]

(2) change the test and (3) add the faulty test case to our testing suite:

import pytest
import unittest
from hypothesis import given, example
from hypothesis import strategies as st


@given(st.lists(st.integers()), st.integers())
@example(input_list=[42], input_denominator=0)
def test_divide_list_elements(input_list, input_denominator):
    if input_denominator == 0:
        # a zero denominator must violate the function's precondition
        with pytest.raises(AssertionError):
            divide_list_elements(input_list, input_denominator)
    else:
        result = divide_list_elements(input_list, input_denominator)
        expected = list(map(lambda x: x/input_denominator, input_list))
        assert result == expected

(4) run the tests again:

zsh> poetry run pytest -s tests/test_hypothesis.py::test_divide_list_elements
collected 1 item

tests/test_hypothesis.py::test_divide_list_elements PASSED

========================= 1 passed in 0.28s =====================

Notes:

  • The assert denominator != 0 statement ensures our function is given correct preconditions (referring to The Pragmatic Programmer, design by contracts and crash early! “Dead Programs Tell No Lies: A dead program does a lot less damage than a crippled one.“)

  • The @example(input_list=[42], input_denominator=0) statement is using the example decorator, which ensures a specific example is always tested. Here we want to make sure this edge case we missed is always checked.

  • The with pytest.raises(AssertionError) ensures that whatever is inside the nested block raises an AssertionError exception. If no exception is raised, the test fails.

To learn more about parametrization: Factorize your pytest functions using the parameterized fixture.