We are constantly getting a java.lang.OutOfMemoryError: Java heap space error.
This is often caused by a single Tomcat process trying to allocate more than the 2GB RAM limit, which triggers a Tomcat restart. We hit this issue when we right-click a transformation/orchestration job and select "Task History".
The application then crashes and all running jobs are restarted.
Regarding the OutOfMemory errors you are experiencing, it's worth knowing that
Matillion jobs are vulnerable to a variety of OOM problems, with the main symptoms including:
OOM
Java heap space
java.lang.OutOfMemoryError
You will need to identify which job is causing the OOM, as it's impossible to tell that from the catalina log file. Although there are edge cases, it's usually caused by one of three things:
A Python Script (especially one running in Jython mode) that uses a lot of memory
A Database Query, especially if it's trying to fetch many columns or wide columns (LOB, JSON, XML, etc.)
An iterator in an orchestration job, especially in Concurrent mode, and especially if it contains either of the above
After hitting an OOM:
You must restart Matillion because nothing works properly after an OOM. It will detect an OOM and restart itself up to 5 times per day.
Please watch out for large .hprof files in /tmp, which can quickly consume a lot of disk space. They can simply be deleted if the restart does not clean them up.
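If the heap dumps survive the restart, a small script can list and remove them. The following is a minimal sketch, assuming the dumps use the .hprof extension and sit directly in /tmp (not in subdirectories); the function names are hypothetical. It defaults to a dry run that only reports what it would delete, and it should only be run after the restart so that a dump still being written is not removed.

```python
from pathlib import Path


def find_heap_dumps(directory="/tmp"):
    """Return (path, size_in_bytes) for each *.hprof file directly in `directory`,
    largest first. Assumes heap dumps use the .hprof extension."""
    dumps = []
    for path in Path(directory).glob("*.hprof"):
        if path.is_file():
            dumps.append((path, path.stat().st_size))
    return sorted(dumps, key=lambda item: item[1], reverse=True)


def delete_heap_dumps(directory="/tmp", dry_run=True):
    """Report (and, unless dry_run is True, delete) the heap dumps in `directory`.
    Returns the total size in bytes of the dumps found, i.e. the space that
    was (or would be) freed."""
    total = 0
    for path, size in find_heap_dumps(directory):
        action = "would delete" if dry_run else "deleting"
        print(f"{action} {path} ({size / 1024**2:.1f} MiB)")
        if not dry_run:
            path.unlink()
        total += size
    return total
```

Calling delete_heap_dumps() first lists the candidates; calling delete_heap_dumps(dry_run=False) actually removes them.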
Things to bear in mind:
Don't write Python scripts that require a lot of memory. If such processing is needed, do it outside of Matillion.
Don't query many columns or wide columns using the Database Query component.
Run fewer things in parallel (don't use concurrent-mode iterators, and schedule jobs so that their start times are staggered).
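If a Python Script has to read query results, one way to keep its memory use roughly flat is to fetch rows in fixed-size batches with the DB-API 2.0 fetchmany() method instead of fetchall(), and to select only the columns actually needed. The sketch below is plain Python 3 (not Jython) and assumes a DB-API connection; sqlite3 stands in for the real database, and the orders table and its columns are hypothetical.

```python
import sqlite3


def iter_rows(cursor, query, params=(), batch_size=1000):
    """Yield rows one at a time while holding at most `batch_size` rows in memory."""
    cursor.execute(query, params)
    while True:
        batch = cursor.fetchmany(batch_size)  # DB-API 2.0: returns [] when exhausted
        if not batch:
            break
        yield from batch


def total_amount(conn, batch_size=1000):
    """Sum one numeric column without materialising the whole result set.
    Only the needed column is selected, so the wide `payload` column is never fetched."""
    cursor = conn.cursor()
    try:
        total = 0
        for (amount,) in iter_rows(cursor, "SELECT amount FROM orders", batch_size=batch_size):
            total += amount
        return total
    finally:
        cursor.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER, payload TEXT)")
    # A generator avoids building the whole insert list in memory as well.
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        ((i, i, "x" * 100) for i in range(1, 10001)),
    )
    print(total_amount(conn, batch_size=500))  # prints 50005000
```

The batch size trades a few extra round trips for a bounded peak: only one batch of narrow rows is held at any time, however large the table is.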
Customers with HA enabled might have a worse experience with OOM issues. What happens is:
The job runs on instance 1 and hits a java.lang.OutOfMemoryError: Java heap space error
Matillion detects the error and shuts down instance 1 for restart
Instance 2 then detects that the job needs to be run, launches the job, and then very likely hits the same java.lang.OutOfMemoryError: Java heap space error
We will try to adjust the implementation of the jobs that are causing these errors. However, it is important to us that Matillion also takes part in solving these issues, as they are known to affect multiple customers.
Other software implements disk caching and blocking implementations to overcome these issues. Is this something that is being evaluated for Matillion?
A fix for this issue might also make it possible to compile larger transformations without splitting them into multiple steps, thereby reducing the complexity of customer implementations.
"An iterator in an orchestration job, especially in Concurrent mode, and especially if it contains either of the above".
Does this recommendation to avoid a Concurrent mode iterator in an orchestration job, especially if it contains a Database Query, also apply when the Concurrent mode iterator is attached to a Run Orchestration component that runs an orchestration job containing a Database Query?