Matillion Instance out of disk space

My Matillion instance has stopped because it's out of disk space. It looks like the internal postgres 9.6 db has lost its mind. Any ideas on how to fix this?

Hi @JulieThomas

I can see you have raised this with our support team. The community and I would be really interested to hear what they suggest and what resolves this issue for you.

Kind regards, Joe

OK--here's what I know and what we've done so far. The Matillion internal database appears to have filled up with task output. The retention period was set to a year and we run 8 jobs every 6 minutes, round-the-clock, so I'm currently of the opinion that the disk space filled up in the course of normal operations.
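
For a rough sense of scale, the run count implied by that schedule can be worked out as below. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it counts job runs rather than bytes, since the size of each run's task output isn't known here.

```python
def runs_per_year(jobs_per_cycle: int, cycle_minutes: int, days: int = 365) -> int:
    """Count job runs accumulated over `days` of round-the-clock scheduling."""
    cycles_per_day = (24 * 60) // cycle_minutes  # 6-minute cycles -> 240 per day
    return jobs_per_cycle * cycles_per_day * days


# 8 jobs every 6 minutes, retained for a year
print(runs_per_year(8, 6))  # 700800
```

That is roughly 700,000 retained runs, each with its own task output, before any retention cleanup kicks in.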

First, we extended the volume to get Matillion back up and running. Everything appeared to be healthy, but the database was, in fact, still growing.

James the Engineer and I got on a Zoom and performed the following steps:

1) Backed up the Matillion instance with a volume snapshot

2) Reconfigured task history retention to more normal values

3) Restarted Tomcat

4) Vacuumed the Postgres database--currently in progress. We've got 200 GB of database that I'm hoping to take to ~50 GB, so it's taking a while.

5) Restart server (TBD)

To be clear, I executed these steps at the direction and with the guidance of Matillion support.

I'm not a Linux sysadmin nor a Postgres DBA and wouldn't leap into this on my own.

Final state: we reconfigured the log retention so it will stop growing, but we can't reclaim the disk space because the task logging table is 142 GB.

In order to reduce the space used by the table, Postgres needs to be able to make a copy of it, which would require extending the volume again. Since it's an EBS volume, it can't be shrunk after it's extended without creating a new volume, copying the data over, and remounting the volume.
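
The space constraint can be sketched as a quick pre-check before attempting a table rewrite. This is a minimal illustration, not something support provided: the helper names, the 10% headroom, and the "/" mount point are all assumptions, and the 142 GB figure is the table size mentioned above.

```python
import shutil

GB = 1024 ** 3


def has_room_for_rewrite(table_bytes: int, free_bytes: int, headroom: float = 1.10) -> bool:
    """True if free space covers a full copy of the table plus headroom.

    The 10% headroom is an assumed safety margin, not a Postgres requirement.
    """
    return free_bytes >= table_bytes * headroom


def free_bytes_on(path: str) -> int:
    """Free bytes on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


if __name__ == "__main__":
    # "/" is a placeholder; point this at the volume holding the database files.
    free = free_bytes_on("/")
    print("Enough room to rewrite a 142 GB table:", has_room_for_rewrite(142 * GB, free))
```

If the check comes back false, the rewrite would need a larger volume first, which is the situation described here.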

Given all that trouble, it seems better to just roll it into my next upgrade of Matillion, which I will do on a new instance. We will then extend the volume, vacuum the database, back up Matillion, create a new instance, restore the backup onto a new, smaller volume, and then delete the old instance and its storage.

@JulieThomas, we are running into the same issue. Looking at the steps you took to resolve it, where did you reconfigure the task history retention?