sftp basic CLI commands survival guide

SFTP cheat sheet

> sftp <username>@<hostname>
sftp> lcd <change-the-local-working-directory-by-this-one>
sftp> cd <change-the-remote-working-directory-by-this-one>
sftp> get <copy-this-remote-file-into-the-local-working-directory>
sftp> put <copy-this-local-file-into-the-remote-working-directory>
sftp> exit

Note: for the put and get commands, you can add the -r flag should you download/upload directories (r standing for recursive).

Real Business Case Example – Missing data need to be reloaded on SFTP server.

You have csv files uploaded on a SFTP server at a daily rate. Those csv files are sent by an external provider (contain results from a marketing campaign).

Once a file is uploaded into the SFTP server, it is picked up by a function listening to upload events (e.g. GCP Cloud or AWS Lambda function).

When the function is triggered (after an upload event), it looks for the uploaded file matching the following pattern: feedback_YYYY_MM_DD.csv.

Then the function parses the matching file and aggregates the content into a table on your data warehouse.

You got a complain from your stakeholders that they have missing data for the 2022-11-13:

SELECT PARTITIONTIME as pt, COUNT(*) as nb_rows
FROM `datapool.marketing_campaign.feedbacks_report`
WHERE DATE(PARTITIONTIME) >= "2022-11-11"
GROUP BY PARTITIONTIME
ORDER BY PARTITIONTIME ASC

which returns the following:

pt nb_rows
2022-11-11 247
2022-11-12 308

Connecting on the sftp server (or direclty looking into the bucket it reflects – or even better – looking into the logs), you notice that there is a tipo in the filename for that missing day. So even if the upload event was detected, the file has not been parsed and ingested by the function (failed to match the defined pattern).

You need to download this file on your local machine, change the name and upload it back on the SFTP server so the ingestion process is processed again by the function:

> sftp obenard@olivierbenard.fr
obenard@olivierbenard.fr's password:
sftp> cd inputs_folder/
sftp> ls -l
feedback_2022_11_11.csv  feedback_2022_11_12.csv  fedback_2022_11_13.csv
sftp> lcd ~/Downloads/
sftp> get fedback_2022_11_13.csv

-- from your local machine, go to ~/Downloads/ and edit "fedback" for "feedback".

sftp> put feedback_2022_11_13.csv
sftp> ls -l
feedback_2022_11_11.csv  feedback_2022_11_12.csv  fedback_2022_11_13.csv  feedback_2022_11_13.csv
sftp> exit

Note: put feedback_2022_11_13.csv and not put ~/Downloads/feedback_2022_11_13.csv because your are already looking into the ~Downloads/ folder due to the lcd command you have executed before.

pt nb_rows
2022-11-11 247
2022-11-12 308
2022-11-13 296

The data is now there! ✅

Notes: Editing the name cannot be done directly from the sftp environement, e.g. using mv <old-name> <new-name>, because the function need an upload event to be triggered.

SFTP Basic File Transfer Commands in detail

  • sftp command connects you to the sftp server:

    > sftp <username>@<hostname>
    
  • exit command closes the connection:

    sftp> exit
    
  • lcd changes your Local Current working Directory on the local machine (remote file will be downloaded there):

    lcd <local-directory>
    
  • cd changes your Current working Directory on the remote sftp machine:

    sftp> cd <your-directory>
    
  • put command copies the file from the local machine to the remote sftp machine:

    sftp> put <local-file>
    
  • get command copy the file from the remote machine to the local machine:

    sftp> get <remote-file>
    

What is an SFTP server for

An SFTP server allows users to use an SFTP client – like the FileZilla software – to connect to the bucket but without directly connecting to the bucket itself. The SFTP server being used as primary interface then as relay to pass on requests on the bucket it is associated with. That way, any users using the client can query the bucket and perform operations such as retrieving, listing or consulting the files stored into that bucket.

Advantages are: the SFTP client provides a more “user-friendly” interface and you do not allow users to directly connecting into the bucket, but rather as its additional interface.

When not to use an SFTP server

On Google Cloud Platform, opening a new SFTP server is not the way to go: if set-up correctly, you do not need SFTP. We can give their google service account permissions to upload in the bucket (requires a google account though). The user can then use the gcloud command line tool to upload/download directly from the bucket.

Leave a Reply

Your email address will not be published. Required fields are marked *