About the "GC overhead limit exceeded" error

Hi,

I had many jobs failing with the exception "GC overhead limit exceeded" yesterday morning and the night before, and I am trying to understand why it happened.

Background:

I was testing an API job that was taking a little longer than expected, so I cancelled it in Matillion. From then on, whichever job I tried to test gave the GC overhead limit exceeded error.

Matillion could not connect to the server (Snowflake), and every action was slow. I checked the Snowflake history to see whether any query had taken longer than expected, but there was nothing unusual. All scheduled jobs gave the same error the next day.

We had to restart the Matillion instance to fix the issue.

So my questions are: why was the memory not freed up even though I cancelled the job, what exactly is "GC overhead limit exceeded", and how can I avoid such a situation?

Hi @NeelamMacwan,

I'd say first that any java.lang.OutOfMemoryError can be difficult to track down, because these are cumulative problems: the total amount of memory needed by every running thread (including the Matillion UI itself) has exceeded some threshold.

There are two very similar errors: "Java heap space" and "GC Overhead limit exceeded":

  • "Java heap space" - a thread requested more memory than was available
  • "GC Overhead limit exceeded" - the JVM was spending almost all of its time in garbage collection (by default, more than 98%) while recovering very little memory (by default, less than 2% of the heap), so it gave up rather than keep thrashing

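For illustration, here is a minimal, self-contained sketch (plain Java, nothing Matillion-specific; the class name and sizes are made up) that shows how each error typically arises when run against a deliberately small heap, for example java -Xmx64m OomDemo heap or java -Xmx64m OomDemo gc. Exactly which error you get can vary with the JVM version and garbage collector, so treat it as a demonstration of the pattern rather than a guarantee:

```java
import java.util.HashMap;
import java.util.Map;

// Toy demonstration only - run with a small heap such as -Xmx64m.
public class OomDemo {
    public static void main(String[] args) {
        if (args.length > 0 && args[0].equals("heap")) {
            // A single allocation much larger than the heap fails immediately:
            // java.lang.OutOfMemoryError: Java heap space
            long[] huge = new long[100_000_000]; // roughly 800 MB
            System.out.println(huge.length);
        } else {
            // Millions of small, retained objects fill the heap gradually; the JVM
            // ends up spending nearly all of its time collecting while freeing
            // almost nothing, and typically fails with:
            // java.lang.OutOfMemoryError: GC overhead limit exceeded
            Map<Integer, String> retained = new HashMap<>();
            for (int i = 0; ; i++) {
                retained.put(i, "value-" + i);
            }
        }
    }
}
```
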
In your case I'd agree it's likely that the failed API job was the root cause. It would definitely be worth some investigation. Perhaps the API was returning a large number of records without paging? Or perhaps relatively few, but very wide records?
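
If it does turn out to be a paging problem, the general idea is to pull the result set one page at a time so the full set never has to sit in the heap at once. In Matillion itself that is normally handled in the API profile rather than written as code, but purely to illustrate the principle, here is a sketch in plain Java; the endpoint, the offset/limit parameter names and the page size are all invented:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative only: hypothetical endpoint and hypothetical offset/limit parameters.
public class PagedFetch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        final int pageSize = 500;           // small pages keep each response modest

        for (int offset = 0; ; offset += pageSize) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://api.example.com/records?offset=" + offset
                            + "&limit=" + pageSize))
                    .build();
            String page = client.send(request, HttpResponse.BodyHandlers.ofString()).body();

            if (page.isBlank() || page.equals("[]")) {
                break;                      // no more records
            }

            // Process and discard each page before fetching the next, so the heap
            // only ever holds one page rather than the entire result set.
            System.out.println("fetched a page of " + page.length() + " characters");
        }
    }
}
```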

My guess is that you would be able to avoid further "GC Overhead limit exceeded" errors just by restarting your server: a restart gives the JVM a fresh heap, so it forgets that it had recently been struggling to find more memory and is much less likely to raise the same error again.
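
Incidentally, the "struggling" the JVM tracks is essentially accumulated garbage-collection time, and it is visible through the standard java.lang.management API. The sketch below simply prints cumulative collection counts, collection time and heap usage for whichever JVM it runs in; to watch the Matillion JVM itself you would instead attach a tool such as jstat or a JMX console to its process (assuming you have access to the instance). If collection time keeps climbing while heap usage barely drops, another "GC Overhead limit exceeded" error is on its way:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Prints cumulative GC activity and heap usage for the current JVM every 5 seconds.
public class GcStats {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        while (true) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: %d collections, %d ms total%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            System.out.printf("heap used: %d MB%n",
                    memory.getHeapMemoryUsage().getUsed() / (1024 * 1024));
            Thread.sleep(5_000);
        }
    }
}
```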

Hope that makes sense and is helpful.

Best regards,

Ian

Hi Ian,

Thank you so much for the explanation.

There is no way for me to find out what actually happened, but I am almost sure that it was the API query that caused the memory issue.

My next question would be: where do I start the investigation?

And yes, we restarted the Matillion instance, which solved the problem.

Once again, thanks for your help.

Regards,

Neelam