Matillion out of memory

Botn · June 4, 2021, 6:41am

We are constantly getting a java.lang.OutOfMemoryError: Java heap space error.

This is often due to one Tomcat process trying to allocate more than the 2 GB RAM limit, which triggers a restart of Tomcat. We get this issue when we right-click a transformation/orchestration job and select "Task History".

This crashes the application and restarts all running jobs.

Sam-Matillion · June 4, 2021, 11:47am

Hi. Sam from Matillion here.

With regard to the OutOfMemory errors you are experiencing, it's worth knowing that Matillion jobs are vulnerable to a variety of OOM problems. The main symptoms include:

OOM
Java heap space
java.lang.OutOfMemoryError

You will need to identify which job is causing the OOM, as it's impossible to tell that from the catalina log file. Although there are edge cases, it's usually caused by one of three things:

A Python Script (especially if running in Jython mode) which is using a lot of memory
A Database Query, especially if it's trying to fetch many columns or wide columns (LOB, JSON, XML etc)
An iterator in an orchestration job, especially in Concurrent mode, and especially if it contains either of the above

After hitting an OOM:

You must restart Matillion because nothing works properly after an OOM. It will detect an OOM and restart itself up to 5 times per day.
Please watch out for large .hprof file(s) in /tmp which can quickly consume a lot of disk space. They can just be deleted if the restart does not clean them up
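
As a rough sketch of the disk-space check above, the following Python 3 snippet lists .hprof files directly under a directory, largest first. The function name find_heap_dumps and the min_bytes parameter are illustrative and not part of Matillion; only the /tmp location and the .hprof extension come from the advice above.

```python
import os


def find_heap_dumps(directory="/tmp", min_bytes=0):
    """Return (path, size_in_bytes) for .hprof files directly in `directory`.

    Hypothetical helper: it only inspects the top level of `directory`
    and sorts the results so the largest dump comes first.
    """
    dumps = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith(".hprof"):
            size = entry.stat().st_size
            if size >= min_bytes:
                dumps.append((entry.path, size))
    return sorted(dumps, key=lambda item: item[1], reverse=True)


if __name__ == "__main__" and os.path.isdir("/tmp"):
    # Print each heap dump with its size in MiB so large files stand out.
    for path, size in find_heap_dumps("/tmp"):
        print("{:10.1f} MiB  {}".format(size / 1024.0 ** 2, path))
```

The snippet only reports files and does not delete anything, so it is safe to run before deciding which dumps to remove.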

Things to bear in mind:

Don't write Python scripts which require a lot of memory. If required, do this outside of Matillion
Don't query many columns or wide columns using the Database Query
Run fewer things in parallel (don't use concurrent-mode iterators, and schedule jobs so that start times are staggered)
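
One common way to keep a Python script's memory use small is to consume data in fixed-size batches instead of building one large list. The sketch below assumes a Python 3 interpreter; the helper names in_batches and count_rows are hypothetical and not part of any Matillion API, and the row source is a stand-in for whatever the script actually reads.

```python
from itertools import islice


def in_batches(iterable, batch_size):
    """Yield lists of at most `batch_size` items from `iterable`.

    Only one batch is held in memory at a time, provided the caller
    does not keep references to earlier batches.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def count_rows(rows, batch_size=1000):
    """Example consumer: counts rows without materialising them all."""
    total = 0
    for batch in in_batches(rows, batch_size):
        total += len(batch)
    return total
```

For example, count_rows(iter(range(2500))) walks the input in batches of 1000, 1000 and 500 rows. Whether this is enough to avoid an OOM depends on the row size and on what else is running in the same process.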

Customers with HA enabled might have a worse experience with OOM issues. What happens is:

The job runs on instance 1, and hits a java.lang.OutOfMemoryError: Java heap space error
Matillion detects the error and shuts down instance 1 for restart
Instance 2 then detects that the job needs to be run, launches the job, and then very likely hits the same java.lang.OutOfMemoryError: Java heap space error
Instance 2 shuts down for restart
Repeat

I hope this helps.

Botn · June 4, 2021, 12:25pm

Thank you for your response.

We will try to adjust the implementation of the jobs that are causing these errors. However, it is important to us that Matillion also takes part in solving these issues, as they are well known to multiple customers.

Other software uses disk caching and blocking implementations to overcome these issues. Is this something that is being evaluated for Matillion?

A fix for this issue might also help with compiling larger transformations without splitting them into multiple steps, thereby reducing the complexity of customer implementations.

AKelly · August 17, 2022, 11:42pm

@Sam-Matillion

You mention that the following may cause an OOM:

"An iterator in an orchestration job, especially in Concurrent mode, and especially if it contains either of the above".

Does this recommendation (to avoid a Concurrent mode iterator in an orchestration job, especially if it contains a Database Query) also apply where the Concurrent mode iterator is attached to a Run Orchestration component that runs an orchestration job containing a Database Query?
