SQL technical interview

Leaning Objectives

  • Build-up a general idea of what a SQL technical job interview is like;

  • Learn a handful of SQL best practices;

  • Get familiar with sqlite3 Python’s library.

Business-case

As a phone provider company, you want to know whom – among your list of customers – spent a total accumulated time of 10 or more minutes on the phone.

You can imagine that this list of identified top customers can be later used for marketing purposes.

This list of top customers should contain only the names, alphabetically sorted.

Hence, you have two tables:

  1. The list of customers;

  2. The calls history.

The table customers contains the following information:

customer_id name phone
1 Tim 1234
2 John 5678
3 Mona 9101

The table calls contains the following information:

id caller callee duration
1 1234 5678 4
2 1234 9101 2
3 5678 1234 5
4 5678 9101 7

On this simple test-case dataset, the returned table should be:

name
John
Tim

Solution

The steps are the following:

  1. For each caller – compute the total duration of their calls;

  2. For each callee – compute the total duration of their calls;

  3. A caller can be at other times a callee and the other way around. Thus, both tables must be merged into one using the phone number as identifier;

  4. The total duration – as a caller and as a callee – must be summed up;

  5. The filter can be applied to only retain the values for which the total duration is superior or equal to 10 minutes;

  6. Last but not the least, in place of the phone number, we map with the associated name and sort the result alphabetically.

Local implementation

Setting sqlite3

For you to be able to work with the data itself, you can configure your own local SQL database, store the dataset and query the server with your own custom SQL statements.

Because it is easy to set up, I am using sqlite3 – a Python library.

The first step is to import the library. It is natively built-in along with your python’s distribution:

import sqlite3

Then, create a local file named database.db where to store your tables. The command returns a connection object:

connection = sqlite3.connect("database.db")

Create a cursor object. You can see the cursor like a mouse tracker that allow you to target specific actions on the database based on where the focus is actually on:

cursor = connection.cursor()

Using the cursor, you can create our two tables customers and calls:

cursor.execute(
  "CREATE TABLE IF NOT EXISTS customers(customer_id, name, phone)"
)
cursor.execute(
  "CREATE TABLE IF NOT EXISTS calls(id, caller, callee, duration)"
)

Then, prepare the data to be injected within our tables:

sql_query_customers = """
INSERT INTO
    customers (customer_id, name, phone)
VALUES
    (1, "Tim", "1234"),
    (2, "John", "5678"),
    (3, "Mona", "9101");
"""

sql_query_calls = """
INSERT INTO
    calls (id, caller, callee, duration)
VALUES
    (1, "1234", "5678", 4),
    (2, "1234", "9101", 2),
    (3, "5678", "1234", 5),
    (4, "5678", "9101", 7);
"""

Execute the queries:

cursor.execute(sql_query_customers)
cursor.execute(sql_query_calls)

Do not forget the save the operation:

connection.commit()

Test that the data is actually there:

result = cursor.execute("SELECT * FROM customers")
print(result.fetchall())

This should return the following output:

[
    (1, 'Tim', '1234'),
    (2, 'John', '5678'),
    (3, 'Mona', '9101')
]

Do not forget to close the connection:

connection.close()

Now that our environment is setup, let’s work toward the solution.

Building the intermediate tables

The first step is to find all the durations, grouped by caller for each callers:

WITH caller_durations AS (
    SELECT
      caller AS phone_number,
      SUM(duration) AS duration
  FROM calls
  GROUP BY caller
)
SELECT * FROM caller_durations;

This gives us the following caller_durations table:

phone_number duration
1234 6
5678 12

Interpretation:

  • The customer associated with the phone number 1234 spent 6 minutes as a “caller”;

  • The customer associated with the phone number 5678 spent 12 minutes as a “caller”;

  • The customer associated with the phone number 9101 never called.

We do the same for the callee:

WITH callee_durations AS (
  SELECT
    callee AS phone_number,
    SUM(duration) AS duration
  FROM calls
  GROUP BY callee
)
SELECT * FROM callee_durations;

Which gives us the following callee_durations table:

phone_number duration
1234 5
5678 4
9101 9

Interpretation:

  • The phone number 1234 received a total of 5 minutes call;

  • The phone number 5678 received a total of 4 minutes call;

  • The phone number 9101 received a total of 9 minutes call.

Next step is to combined both tables using a JOIN statement:

WITH joined_duration AS (
  SELECT
    caller_durations.phone_number AS phone_number_caller,
    caller_durations.duration AS duration_as_caller,
    callee_durations.phone_number AS phone_number_callee,
    callee_durations.duration AS duration_as_callee
  FROM caller_durations
  FULL JOIN callee_durations
  ON caller_durations.phone_number=callee_durations.phone_number
)
SELECT * FROM joined_durations;
phone_number_caller duration_as_caller phone_number_callee duration_as_callee
1234 6 1234 5
5678 12 5678 4
None None 9101 9

Interpretation:

  • The phone number “1234” placed a total number of 6 minutes call as a caller and 5 as a callee;

  • The phone number “9101” never placed a phone call but received a couple of incoming phone calls for a total duration of 9 minutes;

  • The columns phone_number_caller and phone_number_callee should always refer to the same phone number.

We can reduce the above table in something more “readable”:

WITH trimmed_durations AS (
  SELECT
    COALESCE(phone_number_caller, phone_number_callee) AS phone,
    COALESCE(duration_as_caller, 0) AS duration_as_caller,
    COALESCE(duration_as_callee, 0) AS duration_as_callee
  FROM joined_durations
)
SELECT * FROM trimmed_durations;
phone duration_as_caller duration_as_callee
1234 6 5
5678 12 4
9101 0 9

Note: the COALESCE function allows to replace the value by another one should the initial value be NULL.

Now, we can add the durations to have the total_duration i.e. the total number of minute the customer was one the phone:

WITH total_durations AS (
  SELECT
    phone AS phone,
    duration_as_caller + duration_as_callee AS total_durations
  FROM trimmed_durations
)
SELECT * FROM total_durations;
phone durations
1234 11
5678 16
9101 9

It is now time to filter to only consider the durations >= 10:

WITH top_users AS (
  SELECT
    phone AS phone
  FROM total_durations
  WHERE total_durations >= 10
)
SELECT * FROM top_users;
phone
1234
5678

However, the phone numbers must be replaced by the associated names. Thus, a second JOIN is needed to map the values with the customers table:

WITH final_table AS (
  SELECT
    customers.name AS name
  FROM top_users
  LEFT JOIN customers
  ON top_users.phone=customers.phone
  ORDER BY name ASC
)
SELECT * FROM final_table;

Which gives us the expected result:

name
John
Tim