What to expect from your manager

It is essential to choose your manager wisely, especially in the early stages of your corporate life: a good manager can make a huge difference in your career and be a strong enabler. Your manager should be your number one ally, mentor and best advocate.

Characteristics of a good manager

  • Your manager is familiar with the bureaucracy of the company. He knows how to play the game and can get the attention of important people. As a liaison, he knows how to work effectively toward what the organization expects and which areas you need to focus on in order to grow your expertise and deliver satisfaction. He knows what matters most. Such managers are fantastic role models to learn from.

  • As a mentor, he can point out and evaluate opportunities, and identify and assign stretch projects where you will learn things that matter for your career, thus helping you gather the knowledge, achievements and skills you need to get that promotion. Through his strong network of peers across teams and companies, he can point you to any experts, mentors, conferences or resources that might be relevant for you.

  • He helps you understand the value of your work even when it is not glamorous at first glance, and provides guidance on how to resolve conflicts within the team. He is trustworthy and reliable.

Note: one thing to keep in mind is that your work should speak for itself. If your work is appreciated by your team, i.e. you are a valuable contributor, a social driver (bringing people together instead of fueling dissension) and have an extreme sense of ownership, chances are high that your manager will like you too. Remain yourself. And if you are striving toward expertise (ultimately, truth) and what is commonly seen as good, you cannot go far wrong. Are you a Senior Data Engineer provides a checklist pointing in that direction.

Skills matrix

A good manager:

  • Helps you to play the game and navigate the corporate ladder;
  • Points out opportunities and helps you focus on what really matters;
  • Knows what is important and maintains a productive environment;
  • Helps you achieve “mastery”;
  • Provides feedback early on to help you grow;
  • Schedules regular, predictable and dedicated 1-1s;
  • Does not use 1-1s as status meetings to discuss critical projects;
  • Allows vulnerability in front of each other to develop the necessary trust;
  • Praises your work in public and keeps criticism for your 1-1s;
  • Possesses strong communication skills;
  • Advocates for your work and uses his network to support you;
  • Possesses strong technical foundations, which give him credibility;
  • Acts as a mentor.

In contrast, a bad manager:

  • Avoids meetings with you, always reschedules or replaces the agenda at the last minute.

  • Micromanages you, questions every detail and refuses to let you make any decisions.

  • Assigns you, without consultation, high-visibility projects that are destined to fail from the beginning, shifting the responsibility onto you and literally throwing you under the bus to avoid accountability. E.g. “Olivier is happy to assist you” sent to your 3rd-level manager. Yeah, thank you, good luck with that!

  • Gaslights you, presenting false information that makes you doubt your own memory, perception or sanity, e.g. “As discussed in the last meeting…” or “This is not what we have agreed upon”.

  • Takes part in office gossip or speaks ill of other employees. E.g. “You will see, John Doe is a low performer” is a no-go (said on your first day and targeting a colleague with cancer!).

Notes:

  • Even if we are not used to receiving behavioral feedback from anyone other than our parents, do not be disoriented. Inevitably, everyone will screw up in some fashion, and these moments are the fastest way to learn and progress.

  • Your manager remains human and strives toward what is best for the company and the team first. He will sometimes be stressed, make mistakes, be unfair or hurtful, or say silly things.

  • Should this become a recurring pattern though, bring it to his attention. If that is not possible, I would recommend that you speak to your skip-level manager, send a note with data points to your HR department, or try to change team or company whenever possible. Depending on the circumstances (it is hard to generalize), you might also wait, as this kind of manager, unless supported by a deficient hierarchy, won’t last long. Such job switchers generally focus only on building a portfolio and leave after 1.5 years.

See also: What are the questions to ask during interviews.

Questions to ask in interviews

An interview is a two-way street: it enables the future employer to learn more about how likely you are to fit into the team, but it is also your best opportunity to learn more about your future management style, team and colleagues. It is a rare chance to develop a feeling for the company before you accept (or reject) the job offer. It might either confirm or quash your initial beliefs. Last but not least, it is also a way for you to make a very good first impression. Your aim is to show that you are already projecting yourself into the job, striving to be a technical asset and social enabler for those you are going to work with.

Note: if you are looking for red flags, you will always find some. Sometimes, ignoring what you think might be off is a good strategy. For me, receiving the job offer very shortly after the second interview (within a few days) always felt a bit weird but, more often than not, led to very pleasant experiences. Of course, there is no way to tell whether or not you have dodged the bullet until you have waited long enough for the trigger to be pulled. The trial period is also a way for you to test the waters. On the rare occasions where the trial period turns out to be a bummer, you can always terminate it. Turn the associated perks to your own benefit and do use them!

Straight to the point, here are the questions. Feel free to pick from them:

  1. What does your typical day look like? What are the key milestones of your days and weeks?

  2. What are the main technical and managerial challenges you are currently facing? What solutions are you moving forward with?

  3. How does the team stay in tune with current and emerging technologies?

  4. What will the onboarding journey look like? What are the learning paths or processes in place? What is your mentoring process?

  5. Before making any final decision, would it be possible for me to meet the whole team and the manager?

  6. When could I start?

  7. What can I do to surpass your expectations and be a positive element of your team and organization?

  8. What degree of initiative can one have within the team? How are you enabling the teams to be self-directed and proactive?

  9. What is your technical stack? What work devices are provided? What kind of access rights do people have on their equipment?

  10. What technical debt do you have? How are you coping with it?

  11. How do you bring the team together? What are the biggest concerns shared across the team at the moment?

  12. Who are your main stakeholders?

  13. What is the vision the team is striving for? How is the team steering toward those goals?

  14. How are you making sure you stay on track with the road map?

  15. How do you cope with the errors and mistakes that occur?

  16. Are there any career milestones and progression paths already in place? What would the prospects look like?

  17. How are the scope, milestones, timelines and deliverables of a project estimated?

  18. How are you disambiguating ambiguous problem statements to get to the root of problems, incoming requests and situations?

  19. What amount of detail should I provide to the manager for him to stay in the loop without drowning him in unnecessary information? What update frequency would be satisfactory?

  20. Where do you draw the line to find the right balance between action and delivery without over-compromising on quality?

  21. How are your code, legacy and processes documented? What is your estimated coverage?

  22. What are the standards and best practices you have in place to guarantee good codebase quality? How do you ensure the reliability of your data pipelines?

  23. Regarding Git and GitLab, what do your main CI pipeline jobs consist of?

  24. What is your home-office policy? What actions are in place to stimulate the “working together” sentiment?

  25. Besides my mother tongue, I speak English at a very proficient level. I also make sure to practice German at least a little every day; I have been learning it consistently for two years with the objective of being perfectly fluent by 2025. So far, I can hold casual conversations. I intend to adopt the means of communication the team is most comfortable with. Should it be German, what would you expect from me to ease my integration within the team, quickly close any cultural gaps and promote effective cooperation?

  26. Which data stage are you at? E.g. monolithic on-premises systems or already moved to the Cloud. Is reverse ETL in place? What are your observability and DataOps (DevOps and FinOps) strategies?

  27. Is there an explicit agreement (SLA/SLO) between the upstream data source teams and the data engineering team?

  28. Who are your upstream and downstream stakeholders?

  29. How does the data architect/data engineer tandem work? How involved is each party in the decision-making process?

  30. Are the workloads internal-facing (upstream flow from source systems to analytics and ML teams) or external-facing (feedback loop from the application to the data pipeline)?

Note: those questions have a purpose. On top of providing useful information for you to make your choice, they match the questions any Senior Data Engineer might have, proving at the same time that you have already earned your place among the Seniors. And if you already have those concerns in mind, congratulations, you are a Senior Data Engineer! 🥳

See also: Are you a Senior Data Engineer?

Are you a Senior Data Engineer

It is not easy to know when you have reached this specific milestone. Here is a checklist to guide you through the process. Whether you keep it merely informational or strive for the Senior Data Engineer position, the following material might help you stay on track:

Note: to start from the beginning, you can check the scope of a Data Engineer in the What is a Data Engineer article.

  1. You can conduct end-to-end projects with no or at least very limited guidance (e.g. to keep track of the legacy in place before you start). You are proactive and enabled.

  2. You have a complete overview of the business (you know what matters most for the company) and own the technical stack (you have the full picture of the tooling in place and can intervene at any step of the process). You are a source of truth for your peers without being adamant about your viewpoints. You accept other valid solutions. You challenge but also respect code that came before you. There are probably reasons for everything that exists in production (it might even be an unclear business-related thingy people gradually became unaware of).

  3. You are actively involved in the road map, bringing up initiatives and keeping track of progress. You are a mini Tech Lead and can support the Tech Lead on demand.

  4. You can effectively communicate with non-technical employees, interpret and deliver on requests with minimal technical information. You are a relatable touch point for stakeholders, project managers and project owners.

  5. You get involved in hiring for your team, leading (technical) interviews and presenting technical assessments. You can support the Team Lead on demand to maintain a high bar for hiring quality candidates.

  6. You are accountable for the issues and errors that occur and can provide significant support for the team. You are a problem-solver. You do not pass the blame: the buck stops with you.

  7. You have a track record of delivered products, projects and meaningful contributions across the board. You provide scalable solutions for high risk projects without over-engineering. You can stay pragmatic.

  8. You constantly stay in tune with the current and emerging technologies. You share your findings with your peers and build small prototypes.

  9. You can accurately estimate the scope and timelines of your projects and deliver on the commitments you made. You can make your work measurable.

  10. You disambiguate ambiguous problem statements, constantly asking “why” until you get to the root of the problems and situations.

  11. You maintain a high quality, genuine and trustworthy network even outside your organization or core department. You have strong endorsements to help you navigate and grow in the company. You can pinpoint referrals and recommend precise people for mentorship. You know who works on what, with whom, when and how. You can explain what other people on the team are busy with.

  12. You keep your manager in the loop but without drowning him in unnecessary details (e.g. sticking to data points). You can keep people up-to-date in an efficient manner while writing professional emails.

  13. You are good at mentoring. You are the one others refer to and come to for guidance and advice. You are reachable and trustworthy. You are involved in multiple projects as a consultant, reviewer and mentor. You can provide constructive feedback while staying away from politics or office gossip. Praise or say nothing, but never diminish a co-worker.

  14. When working on a project, you focus on action and delivery but without over-compromising on quality. You manage to push back if required. You strive for high-quality work (even from others, e.g. during PR reviews) but without stretching yourself too thin to be effective. You relentlessly simplify code, systems and architectures without overdoing it. You know where the right balance lies. When the incremental cost to develop is too high, you proactively prioritize fixing the technical debt.

  15. You extensively document (e.g. via READMEs, docstrings or Confluence) the “why” more than the “how” and demand it from others. You are involved in grooming incoming requests and actively manage the onboarding or off-boarding of your peers.

Note: The aforementioned list is highly subjective. It is the result of my personal observations, looking at the Seniors performing at their best in the different workplaces I have worked at and scrutinizing what is expected by managers, mentors and C-level people. I skim through it every now and then to know where I stand. May this help you define your own agenda and lead effective 1-1s with your manager. All the best 💪🏻

Tools and Extensions for Data Engineering

Here are the tools I use daily as a Data Software Engineer.

Visual Studio Code Extensions

Name | Description | Purpose
GitLens | Supercharge Git within VSC. | See the last author, date and commit message for each line. I can retrieve the associated ticket in the Jira history (thanks to the existing Git commit message conventions) or talk directly to the original author.
Git Graph | View a Git graph of your repository. | Help troubleshoot Git operations, e.g. traveling back or merging branches.
Code Spell Checker | Spelling checker for source code. | Avoid spelling mistakes in READMEs and docstrings.
Better TOML | Syntax highlighting for toml files. | Self-explanatory.
HashiCorp Terraform | Syntax highlighting and autocompletion for tf files. | Self-explanatory.
HashiCorp HCL | Syntax highlighting and autocompletion for terragrunt files. | Self-explanatory.

Command Line Interface Extensions

Since I develop on macOS, I use zsh, but equivalents also exist for bash.

Name | Description | Purpose
powerlevel10k | Theme for zsh. | Highlight the status and branch you are pointing at.
zsh-autosuggestions | Suggests commands as you type based on history. | Display the existing make commands.

Git Utilities

Regarding git, you can:

Chrome Extensions

Name | Description
uBlock Origin | Block ads in your web browser.
JSON Viewer | JSON highlighter.
Jira Static Link | Copy a static Jira link to the clipboard.
Confluence Static Link | Copy a static Confluence link to the clipboard.

Interpunct Keyboard Shortcuts

The interpunct · can be typed using the following keystrokes:

Operating System | Keystroke Combination | Keyboard Country ISO Code
Microsoft Windows | Alt + 250 or Alt + 0183 | n/a
Apple macOS | Option + Shift + 9 | generic
Apple macOS | Option + Shift + . | NOR/SWE
Apple macOS | Option + . | DNK
Apple macOS | Option + Shift + H | CAN
Apple macOS | Option + Shift + F | FRA

I particularly like to use it to separate items of the same nature, e.g. in emails:

Dear colleagues,
please find attached the recap of the presentation here.

Discussed topics: billing, costs, replication, active directory sync.
Issued action plan: VERS-0909 · VERS-0601 · VERS-1606

Kind regards,

Olivier Bénard
Data Software Engineer for Data Platform Services
O. Benard GmbH & Co.
github.com/olivierbenard

So, convinced? 😅
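If you generate such emails or reports from a script, the same separator can be produced programmatically. Here is a minimal Python sketch; the helper name join_with_interpunct is hypothetical (it is my own naming for this illustration) and the ticket IDs are simply the ones from the email above. The interpunct is the Unicode character U+00B7, i.e. decimal 183, which matches the Alt + 0183 code from the table.

```python
# The interpunct (middle dot) is Unicode code point U+00B7, decimal 183.
INTERPUNCT = "\u00b7"


def join_with_interpunct(items):
    """Join items of the same nature with a spaced interpunct.

    Hypothetical helper, named for this illustration only.
    """
    return f" {INTERPUNCT} ".join(items)


if __name__ == "__main__":
    tickets = ["VERS-0909", "VERS-0601", "VERS-1606"]
    # prints: Issued action plan: VERS-0909 · VERS-0601 · VERS-1606
    print("Issued action plan: " + join_with_interpunct(tickets))
```

Keeping the separator in one constant means you can swap it for another character later without touching the call sites.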

What is a Data Engineer

What is a Data Engineer in a nutshell

A Data Engineer is like a gas or oil pipeline operator. He must:

  • oversee the full Cloud Data Warehouse;
  • make data available 24/7 on the platform;
  • move data from A to B;
  • ingest, update, retire or transform data coming from upstream data sources;
  • turn data into by-products and monitor the overall flow.

On top of ingesting, transforming, serving and storing, those tasks require strong proficiency in Security, Data Management, DataOps, FinOps, Data Architecture, Orchestration & Software Engineering.

The main objective is to serve data to the analyst, BI and data science teams, and back to the software teams (reverse ETL).

It enables the company to make data-driven decisions (an example of data-driven decisions is given at the end).

One big aspect though: a Data Engineer brings the oil to the different teams but is not responsible for consuming it. He is not responsible for turning data into meaningful charts or actionable decisions. We mostly do not bother much about the business logic behind it. Our role is to save data in a place (the single source of truth) where it can be consumed by the teams who need it.

Note: A lot of organizations or applicants tend to have a very poor understanding of what is really meant by the term Data Engineering. They rather see the position as a patchwork of different roles and responsibilities. The profession is indeed quite new: as you can see on Google Trends, the interest only exploded in late 2019. It will still take some time before people (me included) have a full grasp of it.

In practice, Data Engineering is a technical job that requires you to be proficient in writing code (mainly Python and Java). Therefore, you need to have strong Software Engineering skills. Developers (more than Data Scientists or Data Analysts) are in turn highly valued. That is why I prefer to call it Data Software Engineer to remove any ambiguity.

The different missions

  • Build, orchestrate and monitor ELT pipelines (using Airflow & Google Cloud Composer).
  • Manage data infrastructures and services in the Cloud (e.g. tables, views, datasets, projects, storage, access rights, network).
  • Ingest sources from external databases (using Python, Docker, Kubernetes & Change Data Capture tools).
  • Develop REST API clients (Facebook, Snapchat, Jira, Rocket.Chat, GitLab, external providers).
  • Publish open-source projects, e.g. on github.com/e-breuninger, such as Python libraries, bots or Google Chrome extensions (ok, this one is rather specific to my job).
  • Lead workshops and interviews (e.g. BigQuery Introduction, Code Standardization & Best Practices).

Keep in mind: Data Engineers are Data Paddlers, not Data Keepers.

If the source data is corrupted, correcting or improving data quality is out of scope. You can see us as an incorruptible blindfolded carrier, moving the baggage assigned to us from A to B without looking into it throughout the journey. You don’t want us to start opening the bags and folding your shirts the way they should be.

See Should Data Engineers work closely with the business logic?
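To make the “blindfolded carrier” idea more concrete, here is a minimal ELT sketch in Python. It is an illustration under assumptions, not how production pipelines are actually built: SQLite from the standard library stands in for the Cloud Data Warehouse, and the payloads, table names and function names are all hypothetical. The point is the order of operations: the load step stores what it receives untouched, and the transformation only happens afterwards, on the warehouse side.

```python
import json
import sqlite3

# Hypothetical payloads as they might arrive from an upstream source.
RAW_EVENTS = [
    '{"order_id": 1, "amount_cents": 1990, "country": "DE"}',
    '{"order_id": 2, "amount_cents": 500, "country": "FR"}',
    '{"order_id": 3, "amount_cents": 1250, "country": "DE"}',
]


def load_raw(conn, payloads):
    """Load step: store the payloads exactly as received, without opening them."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_orders (payload TEXT)")
    conn.executemany(
        "INSERT INTO raw_orders (payload) VALUES (?)",
        [(payload,) for payload in payloads],
    )
    conn.commit()


def transform(conn):
    """Transform step: runs after loading and builds a typed table from the raw one."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, amount_cents INTEGER, country TEXT)"
    )
    rows = conn.execute("SELECT payload FROM raw_orders").fetchall()
    parsed = [json.loads(payload) for (payload,) in rows]
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(r["order_id"], r["amount_cents"], r["country"]) for r in parsed],
    )
    conn.commit()


def revenue_by_country(conn):
    """Example of what a downstream team might query once the data is served."""
    cursor = conn.execute(
        "SELECT country, SUM(amount_cents) FROM orders GROUP BY country"
    )
    return dict(cursor.fetchall())


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_raw(connection, RAW_EVENTS)
    transform(connection)
    print(revenue_by_country(connection))
```

In a real setup, an orchestrator such as Airflow would schedule these steps and the raw table would live in the warehouse itself; the sketch only shows the separation of concerns.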

Tools used by a Data Engineer

At least, those I am currently working with on a daily basis:

  • Airflow to manage your ETL pipelines.
  • Google Cloud Platform as Cloud Provider.
  • Terraform as Infrastructure as Code solution.
  • Python, Bash, Docker, Kubernetes to build feed and snapshot readers.
  • SQL as Data Manipulation Language (DML) to query data.
  • Git as version control tool.

The main challenges in the job

Based on my own experience, my biggest challenges at the moment are:

  • Improve the reliability of existing pipelines (pushing for more tests, ISO-standardization, data validation & monitoring)
  • Get rid of the technical debt (moving toward 100% automation, documentation and infrastructure as code coverage)
  • Keep up with the upcoming technologies (learning new skills, going deeper both vertically and horizontally)
  • Enforce the software development best practices and standardization principles (via many hours of workshops and conferences)
  • Strengthen the international part, actively connecting the teams (via department tours and promoting the use of English as the primary means of communication)

I believe them to be representative of this industry.
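As an illustration of what “data validation & monitoring” can look like at its simplest, here is a small Python sketch. Everything in it is an assumption made for the example: the ValidationIssue class, the validate_rows function, the column names and the two rules (required values, non-negative numbers) are hypothetical and not taken from any specific library. Note that the check only reports issues, it does not fix the data.

```python
from dataclasses import dataclass


@dataclass
class ValidationIssue:
    """One problem found in one row (hypothetical structure for this sketch)."""
    row_index: int
    column: str
    reason: str


def validate_rows(rows, required, non_negative=()):
    """Return the list of issues found; the rows themselves are left untouched."""
    issues = []
    for index, row in enumerate(rows):
        # Rule 1: required columns must be present and non-empty.
        for column in required:
            if row.get(column) in (None, ""):
                issues.append(ValidationIssue(index, column, "missing value"))
        # Rule 2: selected numeric columns must not be negative.
        for column in non_negative:
            value = row.get(column)
            if isinstance(value, (int, float)) and value < 0:
                issues.append(ValidationIssue(index, column, "negative value"))
    return issues


if __name__ == "__main__":
    # Hypothetical sample rows.
    sample = [
        {"order_id": 1, "country": "DE", "amount_cents": 1990},
        {"order_id": 2, "country": "", "amount_cents": 500},
        {"order_id": 3, "country": "FR", "amount_cents": -100},
    ]
    for issue in validate_rows(sample, required=["order_id", "country"],
                               non_negative=["amount_cents"]):
        print(issue)
```

The returned issues can then feed a monitoring channel or fail a pipeline run, depending on how strict you want to be.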

Get started as Data Engineer

This will need an article of its own. However, you can get started with the following takeaways:

Technical books

  • Fundamentals of Data Engineering, Reis & Housley, O’Reilly, 2022
  • Google BigQuery: The Definitive Guide, Lakshmanan & Tigani, O’Reilly, 2019
  • Terraform: Up & Running, 3rd Edition, Yevgeniy Brikman, O’Reilly, 2022
  • Learning SQL, 3rd Edition, Alan Beaulieu, O’Reilly, 2020
  • Docker: Up & Running, 3rd Edition, Kane & Matthias, O’Reilly, 2023
  • The Kubernetes Book, Nigel Poulton, 2022 Edition

Online courses

  • The Git & Github Bootcamp, Colt Steele on Udemy
  • Apache Airflow: The Hands-On Guide, Marc Lamberti on Udemy
  • Terraform Tutorials, HashiCorp Learn

An example of Data Driven Decisions

Imagine you are the CEO of a bicycle rental company. You have multiple stations across Manhattan. You want answers to the following three questions:

  • You want to know which stations perform best and are in high demand, so you can anticipate disruptions by having more technicians on standby in the area, increase the fleet and plan future expansion.
  • On the other hand, you want to retire poorly performing stations, adjusting your footprint to fit the market needs more accurately.
  • You want to monitor the overall usage so you know the off-peak and rush hours, the average journey length or the most popular commute options. You can then adapt the offer accordingly, e.g. offering discounts at specific times of the day/week/month to boost customer acquisition or better match your customers’ needs.

It is part of the Data Engineering journey to consume data coming from the different sources (e.g. bike stations, bicycles, OpenWeather API, Google Maps API etc.) so the marketing and business intelligence teams can focus solely on answering your questions without getting their hands dirty deep-diving into the data ingestion part.
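Once the data has been served, answering the three questions above can be as simple as a few aggregations. The Python sketch below is purely illustrative: the trips, the station names chosen for them and the function names are all made up for the example, and a real analysis would of course run on the warehouse rather than on an in-memory list.

```python
from collections import Counter
from statistics import mean

# Hypothetical trips: (start_station, start_hour, duration_minutes).
TRIPS = [
    ("Union Square", 8, 12),
    ("Union Square", 9, 15),
    ("Central Park South", 8, 30),
    ("Union Square", 18, 10),
    ("Battery Park", 14, 25),
    ("Central Park South", 18, 28),
]


def trips_per_station(trips):
    """Count how many trips start at each station."""
    return Counter(station for station, _, _ in trips)


def best_and_worst_station(trips):
    """Return the most and the least used station as a (best, worst) pair."""
    counts = trips_per_station(trips)
    best = max(counts, key=counts.get)
    worst = min(counts, key=counts.get)
    return best, worst


def rush_hours(trips, top=2):
    """Return the `top` start hours with the most trips."""
    per_hour = Counter(hour for _, hour, _ in trips)
    return [hour for hour, _ in per_hour.most_common(top)]


def average_duration(trips):
    """Average journey length in minutes."""
    return mean(duration for _, _, duration in trips)


if __name__ == "__main__":
    print(best_and_worst_station(TRIPS))
    print(rush_hours(TRIPS))
    print(average_duration(TRIPS))
```

The Data Engineer's job ends before this script starts: making sure TRIPS exists, is complete and is up to date is the pipeline's responsibility, while the aggregation itself belongs to the analysts.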

To conclude, as is often the case in history, recent jobs have many similarities with sectors that existed long before them. For instance, the data engineering field and the energy sector share close similarities (you have to move a consumable from A to B and distribute it to consumers). Data engineering simply inherited the lingo. Ideas remain the same but are now applied to different “objects”. Data has replaced oil but the paradigm keeps working.

However, good luck getting your car to run with it! 🏎️💨