PostgreSQL Database Archiving & Backup w/ Wal-G to S3

Mar 5, 2024

Before diving in, give Wal-g a star on GitHub: it’s an awesome tool, and we will use it today to keep our database backups.

I’m assuming you’ve already set up your Postgres database and are familiar with a terminal text editor like Vim or Nano.

Installing

Go to the Releases tab of wal-g’s GitHub repository, find the latest stable release (v2.0.1 as of writing this article), and pick the build that matches your database, OS, and CPU architecture.

For me it was wal-g-pg-ubuntu-18.04-amd64.tar.gz

Open your terminal

curl -LJO https://github.com/user/repository/releases/download/version/file.zip

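-L: Follow redirects.
-J: Use the filename the server suggests in its Content-Disposition header, if it sends one (only has an effect together with -O).
-O: Save the output to a local file named after the remote file instead of printing it to the terminal.

Replace the URL with the link to the release file you chose above. The file is saved to the current working directory.

Unpack the executable and move it to a directory on your PATH (the mv may need sudo, since /usr/local/bin is usually owned by root)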
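As a rough illustration, the download URL can be assembled from the release tag and the asset name. This sketch assumes the project lives at github.com/wal-g/wal-g and that its releases follow GitHub's usual /releases/download/<tag>/<file> layout; walg_release_url is a hypothetical helper name, and the tag and file name are simply the ones mentioned above.

```shell
# Hypothetical helper: build the download URL for a wal-g release asset.
# Assumes the github.com/wal-g/wal-g repository and GitHub's standard release URL layout.
walg_release_url() {
  tag=$1    # release tag, e.g. v2.0.1
  asset=$2  # asset file name, e.g. wal-g-pg-ubuntu-18.04-amd64.tar.gz
  printf 'https://github.com/wal-g/wal-g/releases/download/%s/%s\n' "$tag" "$asset"
}

# Print the URL so it can be checked before downloading anything.
walg_release_url v2.0.1 wal-g-pg-ubuntu-18.04-amd64.tar.gz
```

Once the printed URL looks right, it can be handed to curl: curl -LJO "$(walg_release_url v2.0.1 wal-g-pg-ubuntu-18.04-amd64.tar.gz)"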

tar -zxvf wal-g-DBNAME-OSNAME-amd64.tar.gz
mv wal-g-DBNAME-OSNAME-amd64 /usr/local/bin/wal-g
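A quick way to confirm the move worked is to check that the shell can now find the binary. The sketch below is only a sanity check; is_installed is a hypothetical helper name, and it relies on nothing beyond the POSIX command -v builtin.

```shell
# Hypothetical helper: succeed if the given command can be found on PATH.
is_installed() {
  command -v "$1" >/dev/null 2>&1
}

if is_installed wal-g; then
  # Print the version of the binary that was just installed.
  wal-g --version
else
  echo "wal-g was not found on PATH; check the mv step above" >&2
fi
```

If the check fails even though the file is in /usr/local/bin, the file may be missing its executable bit, which chmod +x /usr/local/bin/wal-g would fix.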

Configuring Wal-g

By default, Wal-g reads its configuration from $HOME/.walg.json, but we are going to write a new configuration file, $HOME/.walg.env, as follows.

You need to create an AWS IAM user with access to S3 (full S3 access works, although access limited to the backup bucket is safer). After that, you can obtain the access key and secret key that you need in the configuration.

While you’re in the AWS Console, also copy the s3:// URI of your desired destination. You can find it in the Properties tab of your S3 folder.

# .walg.env

# PostgreSQL connection parameters
PGHOST=localhost
PGPORT=5432
PGUSER=your_username
PGPASSWORD=your_password
PGDATABASE=your_database

# AWS S3 settings
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
WALG_S3_PREFIX=s3://your-s3-bucket/path/to/backups
WALG_COMPRESSION_METHOD=brotli
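Before running anything against S3, it can help to confirm that the file really contains the AWS settings listed above. The following is a minimal sketch, not part of wal-g itself: missing_walg_keys is a hypothetical helper that prints every expected key that is absent or empty in a given file.

```shell
# Hypothetical helper: print each required key that is missing or empty in an env file.
missing_walg_keys() {
  env_file=$1
  for key in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY WALG_S3_PREFIX; do
    # A key counts as present only if it is followed by at least one character.
    grep -q "^${key}=." "$env_file" || echo "$key"
  done
}
```

Calling missing_walg_keys "$HOME/.walg.env" prints nothing when all three keys are set. Because the file holds credentials, it is also worth restricting it to your own user with chmod 600 "$HOME/.walg.env".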

Running PostgreSQL in Archive Mode

Run this command to find the path of your config file, then open that postgresql.conf file in an editor.

psql -U your_username -h localhost -p 5432 -c "SHOW config_file;"

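If you haven’t modified the original config file, the archive_mode line should be commented out and set to off. Uncomment it and change the value to on.

Also add this archive_command so that Postgres uses wal-g to archive its WAL segments. Postgres runs the command as its own service user, so wal-g must be able to find its settings from there; if it cannot, append --config with the absolute path of your .walg.env file to the command.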

archive_mode = on
archive_command = 'wal-g wal-push %p'
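Since a setting that is still commented out is easy to miss, a small check of the edited file can save a restart. This is a simplified sketch: conf_value is a hypothetical helper that prints the last uncommented value of a setting, and it deliberately ignores edge cases such as a # inside a quoted value or settings pulled in through include directives.

```shell
# Hypothetical helper: print the active (uncommented) value of a setting
# in a postgresql.conf-style file. If the setting appears more than once,
# the last occurrence is printed, since later lines override earlier ones.
conf_value() {
  setting=$1
  conf_file=$2
  sed -n "s/^[[:space:]]*${setting}[[:space:]]*=[[:space:]]*\([^#]*\).*/\1/p" "$conf_file" \
    | tail -n 1 \
    | sed 's/[[:space:]]*$//'   # drop whitespace left before a trailing comment
}
```

For example, conf_value archive_mode /path/to/postgresql.conf should print on once the line has been uncommented and changed; an empty result means the setting is still commented out.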

Remember to restart your PostgreSQL server after making changes to the postgresql.conf file for the changes to take effect:

sudo systemctl restart postgresql   # For systemd-based systems
sudo service postgresql restart      # For systems using the init system

Perform a test backup

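Run this command as the postgres user, that is, the Linux user that has permissions on the PostgreSQL data directory. ${PG_VERSION} stands for your PostgreSQL major version; substitute the actual number if that variable is not set in your shell.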

/usr/local/bin/wal-g backup-push --config $HOME/.walg.env /var/lib/postgresql/${PG_VERSION}/main

If you get logs that look like these:

INFO: 2024/03/05 16:02:24.878579 Calling pg_start_backup()
INFO: 2024/03/05 16:02:25.007969 Starting a new tar bundle
INFO: 2024/03/05 16:02:25.008007 Walking ...
INFO: 2024/03/05 16:02:25.008228 Starting part 1 ...
INFO: 2024/03/05 16:02:25.257560 Packing ...
INFO: 2024/03/05 16:02:25.258104 Finished writing part 1.
INFO: 2024/03/05 16:02:25.790298 Starting part 2 ...
INFO: 2024/03/05 16:02:25.790427 /global/pg_control
INFO: 2024/03/05 16:02:25.791274 Finished writing part 2.
INFO: 2024/03/05 16:02:25.791286 Calling pg_stop_backup()
INFO: 2024/03/05 16:02:25.834398 Starting part 3 ...
INFO: 2024/03/05 16:02:25.834452 backup_label
INFO: 2024/03/05 16:02:25.834465 tablespace_map
INFO: 2024/03/05 16:02:25.836654 Finished writing part 3.
INFO: 2024/03/05 16:02:26.322453 Wrote backup with name base_000000010000000000000006

Then your backup was successful 🎉
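If you want to record which backup was just written, the name can be pulled out of the final log line. This sketch assumes the log format shown above; backup_name_from_log is a hypothetical helper that reads log text on standard input.

```shell
# Hypothetical helper: extract the backup name from wal-g log output on stdin.
# Assumes the "Wrote backup with name ..." line format shown in the logs above.
backup_name_from_log() {
  sed -n 's/.*Wrote backup with name \([^[:space:]]*\).*/\1/p'
}

# Demonstrate on the last line of the sample log.
printf '%s\n' 'INFO: 2024/03/05 16:02:26.322453 Wrote backup with name base_000000010000000000000006' \
  | backup_name_from_log
```

With a real run, the backup-push output can be piped through the helper after adding 2>&1, in case the log lines are written to standard error. Running wal-g backup-list with the same --config file is another way to confirm that the backup is visible in the bucket.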

Setting up a Cron Job

Now that our test backup was successful, stay logged in as the postgres Linux user and open its crontab for editing:

crontab -e

It will ask you to choose an editor the first time you run it; I use vim-basic. Add this line:

0 0 * * * /usr/local/bin/wal-g backup-push --config $HOME/.walg.env /var/lib/postgresql/${PG_VERSION}/main

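This runs the backup every day at 00:00 (12:00 AM). Cron starts jobs with a minimal environment, so write your actual PostgreSQL version in the path instead of ${PG_VERSION} unless you define that variable in the crontab. If you want a different schedule, the cron syntax is explained below.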
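One way to avoid typos in the crontab entry is to generate it. The sketch below prints a nightly entry with the PostgreSQL version written out and the output appended to a log file; walg_cron_line and the walg-backup.log file name are hypothetical, and the paths assume the same layout as the command above.

```shell
# Hypothetical helper: print a crontab entry for a nightly wal-g backup.
# The single-quoted format keeps $HOME literal, so it is expanded later,
# by the shell that cron starts for the job.
walg_cron_line() {
  pg_version=$1  # PostgreSQL major version, e.g. 16
  printf '0 0 * * * /usr/local/bin/wal-g backup-push --config $HOME/.walg.env /var/lib/postgresql/%s/main >> $HOME/walg-backup.log 2>&1\n' "$pg_version"
}

# Print the entry for review before pasting it into crontab -e.
walg_cron_line 16
```

Redirecting the output to a file is a design choice rather than a requirement: it keeps a record of each run, which is handy when a scheduled backup fails while nobody is watching.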

Cron is a time-based job scheduler in Unix-like operating systems. It allows you to schedule tasks (commands or scripts) to run automatically at specified intervals. The syntax for a cron job consists of five fields, representing the minute, hour, day of the month, month, and day of the week. Here’s a breakdown of the syntax:

* * * * *
- - - - -
| | | | |
| | | | +----- Day of the week (0 - 6) (Sunday to Saturday; 7 is also Sunday on some systems)
| | | +------- Month (1 - 12)
| | +--------- Day of the month (1 - 31)
| +----------- Hour (0 - 23)
+------------- Minute (0 - 59)

Each field can be either a specific value, a range of values, a list of values, or an asterisk (*) which means “any.” Here are some examples:

* * * * * - This pattern means "every minute, every hour, every day, every month, every day of the week."
0 2 * * * - This pattern means "at 2:00 AM every day."
30 3 * * 1-5 - This pattern means "at 3:30 AM, Monday to Friday."
0 0 1 1 * - This pattern means "at midnight on January 1, once a year" (use 0 0 1 * * for midnight on the first day of every month).
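To double-check which value lands in which field, a schedule can be split into its five parts and labeled. This is only an illustrative sketch; describe_cron is a hypothetical helper and does no validation of the values.

```shell
# Hypothetical helper: label the five fields of a cron schedule.
describe_cron() {
  # read splits the line on whitespace without glob-expanding the asterisks.
  read -r minute hour dom month dow <<EOF
$1
EOF
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$minute" "$hour" "$dom" "$month" "$dow"
}

# Label the fields of the weekday example from the list above.
describe_cron "30 3 * * 1-5"
```

Always quote the schedule when passing it in; an unquoted asterisk would be expanded by the shell into the file names in the current directory.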

Conclusion

This method is very naive and not recommended if you have huge databases, but it’s a great starting point. Also explore what else wal-g can offer: it’s a really powerful tool that can handle backups for large databases too, using streaming and synchronization rather than just archiving data.

Thanks for reading, and I hope that was helpful.

Connect with me on X (Formerly Twitter) where I write about my indie development journey, about my product Folksable, and some small talk with friends.