Do not return null values

It is always a bad idea to write a method returning a null value because it requires the client to remember to check for null:

  1. It foists problems upon the caller, postponing conditional checks and creating work for later that one might forget to implement.
  2. It invites errors: all it takes is one missing None check to have your application spinning out of control.
  3. If you are still tempted to return None from a method, consider throwing an exception or returning a special-case object instead.

Note: this works well for methods returning iterable types (e.g. list, set, string…), as you can just return an empty list, set or string. Returning an empty custom object, such as an instantiated class, is hairier. In such edge cases only, you can return None.

from config.constants import REGISTERED_ITEMS

def retrieve_registered_item_information(item):
    if item is not None:
        item_id = item.get_id()
        if item_id is not None:
            return REGISTERED_ITEMS.get_item(item_id).get_info()

As a demonstration of our second point, did you notice that there is no None check on the last line? What if the item is not found among the REGISTERED_ITEMS and you still try to call the get_info() method on a None element? You will get an AttributeError.
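To illustrate the third point, here is a minimal sketch of a lookup that raises an exception instead of silently handing a None back. The REGISTRY dictionary and the ItemNotFoundError class are hypothetical stand-ins introduced for this example only; they are not part of the config.constants module used above.

```python
class ItemNotFoundError(LookupError):
    """Hypothetical exception raised when an item id is not registered."""


# Hypothetical stand-in for REGISTERED_ITEMS: item ids mapped to their info.
REGISTRY = {"42": {"name": "some_name"}}


def retrieve_registered_item_information(item_id: str) -> dict:
    """Return the info of a registered item or raise; never return None."""
    try:
        return REGISTRY[item_id]
    except KeyError:
        # Fail loudly at the source instead of letting None travel downstream.
        raise ItemNotFoundError(f"No registered item with id {item_id!r}") from None
```

The caller can now decide where to catch ItemNotFoundError, rather than having to remember a None check after every call.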

Example

You have the following structured JSON object you want to extract the id from:

{
    "id": "42",
    "name": "some_name",
    "data": [...]
}

import logging

logger = logging.getLogger(__name__)

def get_object_id(object: dict) -> str | None:
    candidate_id = None
    try:
        candidate_id = object["id"]
    except KeyError as message:
        logger.error(f"Error retrieving the object id: {message}")
    return candidate_id

The above method is not ideal:

  1. You have a mixed return type, str or None. You want your method to be type-consistent rather than to switch between types.
  2. Python versions older than 3.10 do not accept the | operator in type annotations; on those versions you would have to fall back on typing.Optional[str].

Instead, always favour the following accessor method as a nice remedy:

def get_object_id(object: dict) -> str:
    candidate_id = ""
    try:
        candidate_id = object["id"]
    except KeyError as message:
        logger.error(f"Error retrieving the object id: {message}")
    return candidate_id

There are multiple reasons and benefits for that:

  1. You remove the return type ambiguity: whatever happens, you always return a string value.
  2. It removes the type annotation error you might otherwise get on Python versions older than 3.10.
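If logging the missing key is not required, the same type-consistent behaviour can be sketched more compactly with dict.get and a default value. The function name get_object_id_or_empty and the sample payload are assumptions made for this illustration.

```python
def get_object_id_or_empty(payload: dict) -> str:
    """Return the "id" entry of the payload, or an empty string if absent."""
    # dict.get never raises KeyError; the second argument is the fallback.
    return payload.get("id", "")


object_id = get_object_id_or_empty({"name": "some_name"})
if not object_id:
    # An empty string is falsy, so the caller can still detect the miss.
    print("no id found")
```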

Caution: last but not least, note that a Python function implicitly returns None whenever it finishes without returning a value, whether you write a return statement or not. The three following code snippets are equivalent:

def foo():
    pass

def faa():
    return

def fii():
    return None

You can try it yourself:

python> result = foo()
python> print(result)
None

The advantage of explicitly using a return is that it acts as a break statement:

def fuu():
    return 42
    a = 5  # never executed: the function has already returned
    return a

python> fuu()
42
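This early-exit behaviour is what makes guard clauses possible. Below is a minimal sketch, with a hypothetical classify function and arbitrary thresholds, where each return stops the function as soon as the answer is known:

```python
def classify(age: int) -> str:
    """Return a label for the given age, exiting as early as possible."""
    if age < 0:
        # Invalid input: nothing below this line runs.
        return "invalid"
    if age < 18:
        return "minor"
    # Only reached when none of the guards above returned.
    return "adult"
```

Note that every branch returns a string, in line with the type-consistency advice above.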

Notes:

  • We have used a logger object to handle logs for us. More on python logging in this article (in progress).
  • We have prefixed the names of our accessor methods using get. More on how to find meaningful names for your variables in this article (in progress).

As a conclusion, you do not want to rock the boat. Be careful when returning a null value and always favour other options. Your code will be way cleaner and you will minimize the chance of getting an error 🛶

What to expect from your manager

It is essential to pick your manager very wisely, a fortiori at the early stages of your corporate life, as a good manager can make a huge difference in your career and be a strong enabler. Your manager should be your number one ally, mentor and best advocate.

Characteristics of a good manager

  • Your manager is familiar with the bureaucracy of the company. He knows how to play the game and can get the attention of important people. As a liaison, he knows how to work effectively toward what the organization expects and which areas you need to focus on in order to grow your expertise and bring satisfaction. He knows what matters most. Such managers are fantastic models to learn from.

  • As a mentor, he can point out and evaluate opportunities, and identify and assign stretch projects where you will learn things that matter for your career, thus helping you gather the knowledge, achievements and skills you need to get that promotion. Via his strong network of peers across the board and across companies, he can point you to any experts, mentors, conferences or resources that might be relevant for you.

  • He helps you understand the value of your work even though it is not glamorous at first glance, and provides guidance on how to resolve conflicts occurring within the team. He is trustworthy and reliable.

Note: one thing to keep in mind is that your work should speak for itself. If your work is appreciated by your team, i.e. you are a valuable contributor, a social driver (bringing people together instead of fueling dissension) and possess an extreme sense of ownership, chances are high that your manager will like you too. Remain yourself. And if you are striving toward expertise (ultimately, truth) and what is commonly seen as good, you cannot go wrong. Are you a Senior Data Engineer provides a checklist in that direction.

Skills matrix

A good manager:

  • Helps you to play the game and navigate the corporate ladder;
  • Points out opportunities and helps you focus on what really matters;
  • Knows what is important and maintains a productive environment;
  • Helps you achieve “mastery”;
  • Provides feedback early on to help you grow;
  • Schedules regular, predictable and dedicated 1-1s;
  • Does not use 1-1s as status meetings to discuss critical projects;
  • Allows vulnerability in front of each other to develop necessary trust;
  • Praises your work in public and keeps criticism for your 1-1s;
  • Possesses strong communication skills;
  • Advocates for your work and uses his network to support you;
  • Possesses strong technical foundations giving him credibility;
  • Acts as a mentor.

In contrast, a bad manager:

  • Avoids meetings with you, always reschedules or replaces the agenda at the last minute.

  • Micromanages you, questions every detail and refuses to let you make any decision.

  • Assigns you, without consultation, high-visibility projects destined to fail from the beginning, shifting responsibility onto you; literally throwing you under the bus to avoid accountability. E.g. “Olivier is happy to assist you” sent to your 3rd-level manager. Yeah, thank you, good luck with that!

  • Gaslights you, presenting false information that makes you doubt your own memory, perception or sanity, e.g. “As discussed in the last meeting…” or “This is not what we have agreed upon”.

  • Takes part in office gossip or speaks ill of other employees. E.g. “You will see, John Doe is a low performer” is a no-go (on your first day and targeting a colleague with a cancer!).

Notes:

  • Even if we are not used to receiving behavioral feedback from anyone other than our parents, do not be disoriented. Inevitably, everyone will screw up in some fashion, and such feedback is the fastest way to learn and progress.

  • Your manager remains human and strives toward what is best for the company and the team first. He will sometimes be stressed, make mistakes, be unfair or hurtful, or say silly things.

  • Should this become a recurring pattern though, bring it to his attention. If that is not possible, I would recommend that you speak to your skip-level manager, send a note with data points to your HR department, or try to change team or company whenever possible. Depending on the circumstances (it is hard to generalize), you might also wait, as those kinds of managers, if not supported by a deficient hierarchy, won’t last long. Those job switchers generally only focus on building a portfolio and leave after 1.5 years.

See also: What are the questions to ask during interviews.

Questions to ask in interviews

An interview is a two-way street: it enables the future employer to know more about your likelihood to fit into the team, but it is also your best opportunity to know more about your future management style, team and colleagues. It is a rare chance to develop a feeling for the company before you accept (or reject) the job offer. It might either confirm or quash your initial beliefs. Last but not least, it is also a way for you to give a very good first impression. Your aim is to show that you are already projecting yourself into the job, striving to be a technical asset and social enabler for those you are going to work with.

Note: if you are looking for red flags, you will always find some. Sometimes, ignoring what you think might be off is a good strategy. For me, receiving the job offer very shortly after the second interview (within a few days) always felt a bit weird but more often than not led to very pleasant experiences. Of course, there is no way to tell whether or not you have dodged the bullet until you have waited long enough for the trigger to be pulled. The trial period is also a way for you to test the water. On the rare occasions where the trial period turns out to be a bummer, you can always terminate it. Turn the associated perks to your own benefit and do use them!

Straight to the point, here are the questions. Feel free to pick from them:

  1. What does your typical day look like? What are the key milestones of your days and weeks?

  2. What are the main technical and managerial challenges you are currently facing? What solutions are you working toward?

  3. How does the team stay in tune with current and emerging technologies?

  4. What will the onboarding journey look like? What are the learning paths or processes in place? What is your mentoring process?

  5. Before making any final decision, would it be possible for me to meet the whole team and the manager?

  6. When could I start?

  7. What can I do to surpass your expectations and be a positive element of your team and organization?

  8. How much initiative can one take within the team? How are you enabling the teams to be self-directed and proactive?

  9. What is your technical stack? What working devices are provided? What kind of access rights do people have on their equipment?

  10. What technical debt do you have? How are you coping with it?

  11. How do you bring the team together? What are the biggest concerns shared across the team at the moment?

  12. Who are your main stakeholders?

  13. What is the vision the team is striving for? How is the team steering toward those goals?

  14. How do you make sure you are keeping on track with the road map?

  15. How do you cope with the errors and mistakes that occur?

  16. Are there any career milestones and evolution pathways already in place? What would the prospects look like?

  17. How are the scopes, milestones, timelines and deliveries of a project estimated?

  18. How do you disambiguate ambiguous problem statements to get to the root of problems, incoming requests and situations?

  19. What amount of detail should I provide to the manager for him to stay in the loop without drowning him in unnecessary information? What is a satisfactory update frequency?

  20. Where do you draw the line to find the right balance between action and delivery without over-compromising on quality?

  21. How are your code, legacy and processes documented? What is your estimated coverage?

  22. What standards and best practices do you have in place to guarantee good codebase quality? How do you ensure the reliability of your data pipelines?

  23. Regarding Git and Gitlab, what do your main CI pipeline jobs consist of?

  24. What is your home-office policy? What actions are in place to stimulate the “working together” sentiment?

  25. Besides my mother tongue, I speak English at a very proficient level. I also make sure to speak German, which I have been consistently learning for two years with the objective of being perfectly fluent by 2025, on a daily basis at a minimum. I can so far hold casual conversations. I intend to adopt the means of communication the team is most comfortable with. Should it be German, what would you expect from me to ease my integration within the team, quickly close any cultural gaps there might be and promote effective cooperation?

  26. Which data maturity stage are you at? E.g. monolithic on-premises systems, or already moved to the Cloud? Is reverse ETL in place? What are your observability and DataOps (DevOps and FinOps) strategies?

  27. Is there an explicit agreement (SLA/SLO) between the upstream data source teams and the data engineering team?

  28. Who are your upstream and downstream stakeholders?

  29. How does the data architect/data engineer tandem work? How involved is each party in the decision-making process?

  30. Are the workloads internal-facing (data flowing from source systems to analytics and ML teams) or external-facing (feedback loop from the application to the data pipeline)?

Note: those questions have a purpose. On top of providing useful information for you to make your choice, they match the inquiries any Senior Data Engineer might have, proving at the same time that you have already earned your place in the Senior team. And if you already have those concerns in mind, congratulations, you are a Senior Data Engineer! 🥳

See also: Are you a Senior Data Engineer?

Are you a Senior Data Engineer

It is not easy to know when you have reached this specific milestone. Here is a checklist to guide you through the process. Whether you keep it merely informational or strive for the Senior Data Engineer position, the following material might help you to stay on track:

Note: starting from the beginning, you can check the scopes of a Data Engineer in the What is a Data Engineer article.

  1. You can conduct end-to-end projects with no or at least very limited guidance (e.g. to keep track of the legacy in place before you start). You are proactive and enabled.

  2. You have a complete overview of the business (you know what matters most for the company) and own the technical stack (you have the full picture of the tooling in place and can intervene at any step of the process). You are a source of truth for your peers without being adamant about your viewpoints. You accept other valid solutions. You challenge but also respect code that came before you. There are probably reasons for everything that exists in production (it might even be an unclear business-related thingy people gradually became unaware of).

  3. You are actively involved in the road map, bringing up initiatives and keeping track of progress. You are a mini Tech Lead and can support him on demand.

  4. You can effectively communicate with non-technical employees, interpret and deliver on requests with minimal technical information. You are a relatable touch point for stakeholders, project managers and project owners.

  5. You get involved with hiring for your team, leading (technical) interviews and presenting technical assessments. You can support the Team Lead on demand to maintain a high bar for hiring quality candidates.

  6. You are accountable for the issues and errors that occur and can provide significant support for the team. You are a problem-solver. You do not pass the buck: the buck stops with you.

  7. You have a track record of delivered products, projects and meaningful contributions across the board. You provide scalable solutions for high risk projects without over-engineering. You can stay pragmatic.

  8. You constantly stay in tune with the current and emerging technologies. You share your findings with your peers and build small prototypes.

  9. You can accurately estimate the scope and timelines of your projects, and deliver on the commitments you made. You can make your work measurable.

  10. You disambiguate ambiguous problem statements, constantly asking “why” until you get to the root of the problems and situations.

  11. You maintain a high quality, genuine and trustworthy network even outside your organization or core department. You have strong endorsements to help you navigate and grow in the company. You can pinpoint referrals and recommend precise people for mentorship. You know who works on what, with whom, when and how. You can explain what other people on the team are busy with.

  12. You keep your manager in the loop without drowning him in unnecessary details (e.g. sticking to data points). You can keep people up-to-date in an efficient manner while writing professional emails.

  13. You are good at mentoring. You are the one others refer to and come to for guidance and advice. You are reachable and trustworthy. You are involved in multiple projects as consultant, reviewer and mentor. You can provide constructive feedback while staying away from politics and office gossip. Praise or say nothing, but never diminish another co-worker.

  14. When working on a project, you focus on action and delivery without over-compromising on quality. You manage to push back if required. You strive for high-quality work (even from others, e.g. during PR reviews) but without stretching yourself too thin to be effective. You relentlessly simplify code, systems and architectures without overdoing it. You know where the right balance stands. When the incremental cost to develop is too high, you proactively prioritize fixing the technical debt.

  15. You extensively document (e.g. via readme, docstrings or Confluence) the “why” more than the “how” and demand it from others. You are involved in grooming incoming requests and actively manage the onboarding or off-boarding of your peers.

Note: the aforementioned list is highly subjective. It is the result of my personal observations, looking at the Seniors performing at their best in the different workplaces I have worked at and scrutinizing what is expected by managers, mentors and C-level people. Every now and then I skim through it to know where I stand. May this help you define your own agenda and lead effective 1-1s with your manager. All the best 💪🏻

Tools and Extensions for Data Engineering

Here are the tools I use daily as a Data Software Engineer.

Visual Studio Code Extensions

| Name | Description | Purpose |
| --- | --- | --- |
| Git Lens | Supercharge Git within VSC. | See the last author, date and commit message for each line. I can retrieve the associated ticket in the Jira history (thanks to the existing git commit message conventions) or exchange directly with the original author. |
| Git Graph | View a Git Graph of your repository. | Helps troubleshoot git operations, e.g. travelling back or merging branches. |
| Code Spell Checker | Spelling checker for source code. | Avoid spelling mistakes in READMEs and docstrings. |
| Better TOML | Syntax highlighting for toml files. | Self-explanatory. |
| HashiCorp Terraform | Syntax highlighting and autocompletion for tf files. | Self-explanatory. |
| HashiCorp HCL | Syntax highlighting and autocompletion for terragrunt files. | Self-explanatory. |

Command Line Interface Extensions

Since I am developing on macOS, I am using zsh, but equivalents also exist for bash.

| Name | Description | Purpose |
| --- | --- | --- |
| powerlevel10k | Theme for zsh. | Highlight the status and branch you are pointing at. |
| zsh-autosuggestions | Suggests commands as you type based on history. | Display the existing make commands. |

Git Utilities

Regarding git, you can:

Chrome Extensions

| Name | Description |
| --- | --- |
| uBlock Origin | Block ads on your web browser. |
| Json Viewer | Json highlighter. |
| Jira Static Link | Copy static Jira link into the clipboard. |
| Confluence Static Link | Copy static Confluence link into the clipboard. |

Git branch naming convention

When working on a project tracked with git, you will surely create branches. You have the main branch of course, but a good practice is also to have one branch per feature you are developing. Below is what it might look like in the end:

> git branch
* master
  dev/JIRA-1234
  dev/JIRA-5487_add_users_filtering
  dev/add_google_sign_in_authentication_form
  dev/ISSUE-987

Let’s have a closer look at how to name them (even though the above snippet already gives you a hint).

Main branch

The main or master branch is treated as the unique source of truth, the official working code base. This is a place where everything must be working. It is the default branch you get when you initialize a new git project.

Notes:

  • Since 2020 and 2021, Github and Gitlab (respectively) name the default branch of new repositories main instead of master, to avoid language that might be seen as offensive.
  • If you do not want to politicize git and still prefer the old naming convention, you can still rename main to master manually. How to rename git branch explains how to do it.

Feature branches

It is time to add features to our main branch. Since main is the place where everything must be working, you first want to test your changes on a development branch. Usually, you end up with 1 branch per feature, each of them named after one of the following templates:

dev/JIRA-1234
dev/JIRA-1234_add_users_filtering
dev/add_google_sign_in_authentication_form
dev/ISSUE-987

In a nutshell, the following rules apply:

  • Prefix the non-main branches with dev/. It makes it easier to trigger the dedicated Gitlab CI jobs via branch filtering.
  • If you use an issue tracker like Jira, include the ticket’s ID. Gitlab includes a Jira integration that can, for instance, create links to the Jira ticket on the fly.
  • Besides the ticket ID, stick to the good old snake_case. See also Why you should use Snake Case.
  • If you do not use an issue tracker e.g. for personal projects, simply describe the feature you are implementing.

git checkout vs. git switch

The difference between git checkout and git switch:

  • Both can switch to an existing branch or create a new one.
  • git checkout does more than git switch: it can also restore files in the working tree.
  • git switch was created to be simpler: it only deals with branches.

In practice, I always use git switch instead of git checkout. But let’s explore some use cases.

git switch

To switch to an existing branch:

git switch <existing-branch>

To create a new branch and directly jump into it:

git switch -c <new-branch>

Notes:

git checkout

To switch to an existing branch:

git checkout <existing-branch>

To create a new branch and directly jump into it:

git checkout -b <new-branch>

To delete a branch, git checkout is of no help (it has no option for that); use git branch instead:

git branch -d <deleted-branch>

Note: in practice you won’t have to delete a branch manually. This will be done automatically after merging the Pull Request (on Github) or Merge Request (on Gitlab).

Last few words

We haven’t explored the full capabilities provided by both commands. However, we have seen that they provide overlapping features. To go in-depth, you can dive into the official documentation:

One more thing: we didn’t have the chance to stop and talk about git branch, a command allowing us to perform more actions on branches, such as renaming a branch. This will be done in another chapter (in progress).

Undo git add operation before commit

Simply use:

git restore --staged <file>...

The aforementioned command unstages the files you have accidentally staged for commit via git add. However, this relies on you not having committed yet. If you have committed but still want to undo the operation, see (in progress).

Example

You have added a couple of files to be staged for commit:

git add .

Following the best practices, you always check the status after such operations:

git status

At this point, you notice that a couple of unwanted files have been added to the list of files staged for commit:

On branch dev/VERS-5928
Your branch is up to date with 'origin/dev/VERS-5928'.

Changes to be committed:
(use "git restore --staged <file>..." to unstage)

    modified:   .gitlab-ci.yml
    new file:   .gitmodules
    modified:   Makefile
    modified:   README.md
    new file:   project_foo/__pycache__/__init__.cpython-310.pyc
    new file:   project_foo/__pycache__/main.cpython-310.pyc
    new file:   project_foo/__pycache__/utils.cpython-310.pyc

In our example, you do not want to push the last 3 files to the remote repository. The simplest solution is then to undo the git add operation so those files are no longer staged for commit:

git restore --staged project_foo/__pycache__/__init__.cpython-310.pyc

You can then repeat the operation for the last 2 remaining files.

However, even though this manual manoeuvre is easily doable in our relatively simple case, what if, instead of 3 files, many more files had to be unstaged?

In such a case, the workaround to undo git add operations before commit for multiple files is as follows:

git restore --staged $(git diff --name-only --cached | grep "__pycache__")
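For those who prefer scripting this in Python, the one-liner above can be sketched as follows. The helper names select_paths and unstage are hypothetical; the sketch assumes git is installed and that the script runs from inside the repository. Unlike the shell version, it skips the git restore call when nothing matches, since git restore expects at least one path.

```python
import subprocess


def select_paths(paths: list[str], marker: str = "__pycache__") -> list[str]:
    """Keep only the paths containing the marker, like grep does above."""
    return [path for path in paths if marker in path]


def unstage(marker: str = "__pycache__") -> list[str]:
    """Unstage every staged file whose path contains the marker."""
    staged = subprocess.run(
        ["git", "diff", "--name-only", "--cached"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    unwanted = select_paths(staged, marker)
    if unwanted:
        # "--" separates options from paths, in case a path starts with a dash.
        subprocess.run(["git", "restore", "--staged", "--", *unwanted], check=True)
    return unwanted
```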

Notes:

  • To undo the git add operation after commit, see (in progress).
  • To know more about the git restore command (in progress).
  • The difference between git reset, git restore and git rm (in progress).

What is the difference between the random.choices and random.sample Python functions

The main difference is:

  • The random.choices function draws random elements from the sequence with replacement: the result may include duplicates, since the same element can be drawn multiple times.

  • The random.sample function draws random elements from the sequence without replacement: once an element is picked, it is removed from the pool and cannot be drawn again.

Imagine a lottery game. In the first case, we put the ball back into the drawing lot, while in the second case the ball is definitively removed from the set.

Note: without duplicates does not mean the same value cannot be seen several times in the resulting sample. If several balls hold the same value in the lot, and these balls are drawn, the occurrence will also be reflected in the result. But the same ball, once drawn, cannot be drawn ever again.

Examples: choices vs. sample

import random

pool = "abcd"
print("".join(random.choices(pool, k=5)))

In the above example we are extracting 5 random elements from the pool. Once drawn, the value is put back into the pool so it can possibly be picked again:

addaa

Note: since elements are drawn with replacement, you can extract more elements than the population originally contains. Hence k=5 while the sequence only contains 4 elements.

import random

population = "abcd"
print("".join(random.sample(population, k=4)))

In the above example, we ask the function to draw 4 elements from the population without replacement. This means that once an element is picked, it is removed from the population:

abdc

Note: since elements are drawn without replacement, k cannot be greater than the length of your sequence. Should you try, you will get a ValueError: Sample larger than population or is negative.
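The notes of this section can be verified side by side. The following is a minimal sketch; the variable names and the "aab" population are chosen for illustration, and since the draws are random no exact output is shown.

```python
import random

population = "abcd"

# With replacement: k may exceed the size of the population.
with_replacement = random.choices(population, k=10)

# Without replacement: every element is drawn at most once.
without_replacement = random.sample(population, k=4)

# Asking for more elements than available raises a ValueError.
try:
    random.sample(population, k=5)
except ValueError as error:
    print(f"ValueError: {error}")

# Two distinct "a" balls can both be drawn, so the value may appear twice.
same_value_twice = random.sample("aab", k=3)
```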

Use-case Example: Alphanumeric Generation

To generate a sequence of 32 random alphanumeric values:

import random
import string

population = string.ascii_letters + string.digits
result = "".join(random.choices(population, k=32))
print(result)

coqHR7HrsCsKcvGvmlClJI1OnWZjvwH9

Notes:

  • It is always a very bad idea to use Python’s random module to generate passwords or other sensitive secrets, since its generator is only pseudo-random and hence predictable.
  • Worse than that, never write your own random function as it is prone to vulnerabilities. Rather use a public and scientifically proven method (this is the beauty of maths: being capable of generating indecipherable secrets with the generating method known by all).
  • Even worse: never base the robustness of your encryption protocol on the secrecy of the generation method.

At least those are the (rare) few takeaways I still remember from my Master of Science in Computer Science specialized in Cybersecurity.
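When the generated value is meant to be a secret, the standard library offers the secrets module, which is designed for security-sensitive randomness. Here is a minimal sketch of a 32-character alphanumeric generation; the token variable name is an assumption.

```python
import secrets
import string

population = string.ascii_letters + string.digits

# secrets.choice relies on the operating system's secure randomness source.
token = "".join(secrets.choice(population) for _ in range(32))
```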

And you, what is your score on Root Me? 🏴‍☠️

Change the URL of a Git repository

What you have to do:

  1. On Github/Gitlab, change the URL of the (remote) repository via the UI.
  2. On your local repo, use git remote set-url origin <new-url> to replicate the changes and thus restore the bridge between the remote and local repo.
  3. Confirm the changes are now reflected, using git remote -v.

Example

You created a Gitlab repo called client-snapchat-api but you made a mistake. The name should be snapchat-api-client instead. Additional complexity: you have cloned the project on your local environment already.

Note: the naming convention for API clients is quite standard in the industry: the -client part is always a suffix (and not a prefix) of the name, i.e. name-of-the-api-client and not client-name-of-the-api.

The first step is to perform the modifications on the remote repository: via Gitlab’s settings in the User Interface, you edit to your liking:

  1. The project’s name
  2. The project’s path via the Settings Advanced section.

Notes:

  • It is common for the repo’s URL to respect the kebab-case formatting. See also Why using snake_case to spotlight the differences.
  • It is common for the project’s name and the repo’s URL to be the same.

The second step (since you have already cloned the project locally) is to reflect those new remote settings in your local repository.

This is because your local project still uses the old URLs as “origin” (i.e. the source) to track, fetch (pull) and push the incoming and outgoing changes.

In other words, your local repo is still linked to the old remote URLs. But those URLs are no longer attached to an existing remote Gitlab project, since you have just changed them (they are now obsolete):

> git remote -v
origin git@gitlab.com:obenard/old-project-in-kebab-case.git (fetch)
origin git@gitlab.com:obenard/old-project-in-kebab-case.git (push)

Notes:

  • The -v option is short for --verbose
  • The URLs for fetching and pushing (i.e. where you get the changes from and where you push the changes to) can be different.

The following image may help you: it’s like uprooting a tree and planting it somewhere else. The leaves are the local repos cloned by your developers and the root is the remote repository (aka the unique source of truth). You need to revive the whole thing by reconnecting the leaves to the trunk.

In our example, our local repo is pointing at the now obsolete URLs:

> git remote -v
origin git@gitlab.com:obenard/client-snapchat-api.git (fetch)
origin git@gitlab.com:obenard/client-snapchat-api.git (push)

To edit the URLs:

git remote set-url origin git@gitlab.com:obenard/snapchat-api-client.git

To see the result:

> git remote -v
origin git@gitlab.com:obenard/snapchat-api-client.git (fetch)
origin git@gitlab.com:obenard/snapchat-api-client.git (push)

Congratulations, your local project is now again linked with a valid remote Gitlab repository and can send or retrieve information from it.
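If you ever need to script this final check, the output of git remote -v is simple enough to parse. Below is a minimal sketch, assuming each line holds exactly three whitespace-separated fields (name, URL, kind); parse_remotes is a hypothetical helper, not a git feature.

```python
def parse_remotes(output: str) -> dict[tuple[str, str], str]:
    """Map (remote name, "fetch" or "push") to the URL listed by git remote -v."""
    remotes = {}
    for line in output.strip().splitlines():
        name, url, kind = line.split()
        # The kind is printed between parentheses, e.g. "(fetch)".
        remotes[(name, kind.strip("()"))] = url
    return remotes


sample_output = """\
origin git@gitlab.com:obenard/snapchat-api-client.git (fetch)
origin git@gitlab.com:obenard/snapchat-api-client.git (push)
"""
remotes = parse_remotes(sample_output)
```

Comparing remotes[("origin", "fetch")] with the expected address then tells you whether the local repo points at the renamed project.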

Run the extra mile: URI, URL and URN

Before going any further in polishing our 360° overview and getting a full grasp of the concept, we need to understand the difference between URIs, URLs and URNs:

  • A URI (Uniform Resource Identifier) is an identifier, like the id primary key in a table or the social security number of a person. It is used to uniquely identify a resource, e.g. a Gitlab repository (two Gitlab repositories cannot have the same URI, since you would not know which one you are referring to otherwise).

  • A URL (Uniform Resource Locator) is a URI with the additional property of also being a locator. A URL allows you to locate and access a resource on the Internet, e.g. a web page like https://olivierbenard.fr/how-to-create-git-aliases/. This page is unique across the Internet: you cannot find the same URL anywhere else, and the URL is associated with a page you can navigate to.

  • All URLs are URIs, but there are URIs which are not URLs. For instance, a URN (Uniform Resource Name) uniquely identifies a resource by a name in a particular namespace. It is a nice way to talk about a resource without implying anything about its location or how to access it. URNs are intended to be unique across time and space, e.g. the ISBN (International Standard Book Number) is a unique worldwide book identifier.
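The distinction can be observed with the standard library's urllib.parse module. This is only a sketch: the ISBN below is an illustrative value that does not come from this article.

```python
from urllib.parse import urlparse

url = urlparse("https://olivierbenard.fr/how-to-create-git-aliases/")
# A URL carries a scheme and a network location: enough to reach the resource.
print(url.scheme, url.netloc, url.path)

urn = urlparse("urn:isbn:0451450523")  # illustrative ISBN, an assumption
# A URN has a scheme but no network location: it names, it does not locate.
print(urn.scheme, repr(urn.netloc), urn.path)
```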

Note: speaking about ISBNs, you may find book recommendations to get started as Data Engineer in the related article What is a Data Engineer.

Now that the semantics are set, you will notice that I have lied to you 😱

I have indeed implied that we will have to change the URLs of a Git repository. This was an incorrect statement.

What we have done instead with the git remote set-url origin <new-uri> command was to change the URIs, not the URLs:

> git remote -v
origin  git@github.com:olivierbenard/olivierbenard.git (fetch)
origin  git@github.com:olivierbenard/olivierbenard.git (push)

This makes sense: if you have a closer look, you will notice that the git@github.com:olivierbenard/olivierbenard.git thingy does not lead anywhere. For good reason: it is a pure URI!

And sure enough, the repo is located at the following URL: https://github.com/olivierbenard.

Notes:

  • To be even more specific, the command is changing the URIs and then, Gitlab takes over, changing the URLs in the background.
  • The same result as with git remote set-url origin <new-uri> can be obtained directly by editing the ~/local-git-repo/.git/config file:

    [remote "origin"]
        url = git@github.com:olivierbenard/olivierbenard.git
        fetch = +refs/heads/*:refs/remotes/origin/*
    [branch "master"]
        remote = origin
        merge = refs/heads/master

And you, what’s your excuse for not having a green thumb? 🌱