SFTP cheat sheet
> sftp <username>@<hostname>
sftp> lcd <change-the-local-working-directory-by-this-one>
sftp> cd <change-the-remote-working-directory-by-this-one>
sftp> get <copy-this-remote-file-into-the-local-working-directory>
sftp> put <copy-this-local-file-into-the-remote-working-directory>
sftp> exit
Note: for the put
and get
commands, you can add the -r
flag should you download/upload directories (r standing for recursive).
Real Business Case Example – Missing data need to be reloaded on SFTP server.
You have csv files uploaded on a SFTP server at a daily rate. Those csv files are sent by an external provider (contain results from a marketing campaign).
Once a file is uploaded into the SFTP server, it is picked up by a function listening to upload events (e.g. GCP Cloud or AWS Lambda function).
When the function is triggered (after an upload event), it looks for the uploaded file matching the following pattern: feedback_YYYY_MM_DD.csv
.
Then the function parses the matching file and aggregates the content into a table on your data warehouse.
You got a complain from your stakeholders that they have missing data for the 2022-11-13:
SELECT PARTITIONTIME as pt, COUNT(*) as nb_rows
FROM `datapool.marketing_campaign.feedbacks_report`
WHERE DATE(PARTITIONTIME) >= "2022-11-11"
GROUP BY PARTITIONTIME
ORDER BY PARTITIONTIME ASC
which returns the following:
pt |
nb_rows |
2022-11-11 |
247 |
2022-11-12 |
308 |
Connecting on the sftp server (or direclty looking into the bucket it reflects – or even better – looking into the logs), you notice that there is a tipo in the filename for that missing day. So even if the upload event was detected, the file has not been parsed and ingested by the function (failed to match the defined pattern).
You need to download this file on your local machine, change the name and upload it back on the SFTP server so the ingestion process is processed again by the function:
> sftp obenard@olivierbenard.fr
obenard@olivierbenard.fr's password:
sftp> cd inputs_folder/
sftp> ls -l
feedback_2022_11_11.csv feedback_2022_11_12.csv fedback_2022_11_13.csv
sftp> lcd ~/Downloads/
sftp> get fedback_2022_11_13.csv
-- from your local machine, go to ~/Downloads/ and edit "fedback" for "feedback".
sftp> put feedback_2022_11_13.csv
sftp> ls -l
feedback_2022_11_11.csv feedback_2022_11_12.csv fedback_2022_11_13.csv feedback_2022_11_13.csv
sftp> exit
Note: put feedback_2022_11_13.csv
and not put ~/Downloads/feedback_2022_11_13.csv
because your are already looking into the ~Downloads/ folder due to the lcd
command you have executed before.
pt |
nb_rows |
2022-11-11 |
247 |
2022-11-12 |
308 |
2022-11-13 |
296 |
The data is now there! ✅
Notes: Editing the name cannot be done directly from the sftp environement, e.g. using mv <old-name> <new-name>
, because the function need an upload event to be triggered.
SFTP Basic File Transfer Commands in detail
-
sftp
command connects you to the sftp server:
> sftp <username>@<hostname>
-
exit
command closes the connection:
sftp> exit
-
lcd
changes your Local Current working Directory on the local machine (remote file will be downloaded there):
lcd <local-directory>
-
cd
changes your Current working Directory on the remote sftp machine:
sftp> cd <your-directory>
-
put
command copies the file from the local machine to the remote sftp machine:
sftp> put <local-file>
-
get
command copy the file from the remote machine to the local machine:
sftp> get <remote-file>
What is an SFTP server for
An SFTP server allows users to use an SFTP client – like the FileZilla software – to connect to the bucket but without directly connecting to the bucket itself. The SFTP server being used as primary interface then as relay to pass on requests on the bucket it is associated with. That way, any users using the client can query the bucket and perform operations such as retrieving, listing or consulting the files stored into that bucket.
Advantages are: the SFTP client provides a more “user-friendly” interface and you do not allow users to directly connecting into the bucket, but rather as its additional interface.
When not to use an SFTP server
On Google Cloud Platform, opening a new SFTP server is not the way to go: if set-up correctly, you do not need SFTP. We can give their google service account permissions to upload in the bucket (requires a google account though). The user can then use the gcloud command line tool to upload/download directly from the bucket.