ClickHouse backups and storage used
I self-host Plausible Analytics in my personal Kubernetes cluster, and it uses ClickHouse as its datastore. Two questions:
1) Out of the box, ClickHouse uses a ton of storage for its system logs, so I changed some config as explained in https://theorangeone.net/posts/calming-down-clickhouse/ to calm them down.
It still uses more storage than needed. For example, it had already grown to 4 GB for metrics spanning a short period. By running the command suggested in the article, usage went down to just 20 MB, which is reasonable for the tiny amount of data it has collected.
Does anyone know a way to avoid this? At the moment I am running that command periodically.
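One option that might remove the need for the periodic command: recent ClickHouse versions let you set a TTL on the system log tables directly in the server config, so old entries get dropped automatically. A sketch (the file path, table selection, and 7-day interval are my assumptions, not from the article; on older versions the root element is `<yandex>` instead of `<clickhouse>`):

```xml
<!-- e.g. /etc/clickhouse-server/config.d/log-ttl.xml (path is an assumption) -->
<clickhouse>
    <!-- Drop query log entries automatically after 7 days -->
    <query_log>
        <database>system</database>
        <table>query_log</table>
        <ttl>event_date + INTERVAL 7 DAY DELETE</ttl>
    </query_log>
    <!-- Same idea for the other chatty system tables, e.g. metric_log -->
    <metric_log>
        <database>system</database>
        <table>metric_log</table>
        <ttl>event_date + INTERVAL 7 DAY DELETE</ttl>
    </metric_log>
</clickhouse>
```

Caveat: as far as I know this only applies when the log table is (re)created, so you may need to drop or rename the existing `system.*_log` tables once for the new TTL to take effect.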
2) What is the best way to back up a ClickHouse database? I found https://hub.docker.com/r/alexakulov/clickhouse-backup, but it doesn't support the table engine used by one of Plausible's tables. So for the time being I am using Velero (since I'm on K8s) to back up the filesystem, freezing the filesystem during the backup so the backup isn't stored with inconsistent/corrupted data. Is there anything better? I would prefer something like the normal dumps we do for Postgres and MySQL.
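If your ClickHouse is recent enough (roughly 22.8+), there are native `BACKUP`/`RESTORE` SQL statements that give you a logical, engine-aware backup, closer to a `pg_dump`-style workflow than filesystem snapshots. A sketch, assuming a backup destination named `backups` is configured in the server config and that the database is Plausible's default `plausible_events_db` (check yours):

```sql
-- 'backups' must be a disk declared under <backups> in the server config;
-- both the disk name and the database name here are assumptions.
BACKUP DATABASE plausible_events_db
    TO Disk('backups', 'plausible-2024-01-01.zip');

-- Restore later with:
RESTORE DATABASE plausible_events_db
    FROM Disk('backups', 'plausible-2024-01-01.zip');
```

Unlike a raw filesystem copy, this runs through the server, so you don't have to freeze the volume to get a consistent backup.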