Job on GitLab runner terminates with Error 137


Error 137 usually indicates an out-of-memory kill of your container, but I have seen cases in which seemingly random jobs terminate with this error even though memory is not the problem.
After some research, I found that this condition can be triggered by the container writing large amounts of output to stdout very rapidly. It will eventually need to be fixed by GitLab, but, in the meantime, there are workarounds.

Validate

The easiest way to tell whether the problem comes from your container or from the condition described above is to send all of your output to /dev/null or to a file. If the problem goes away, continue reading for some workarounds; if not, your container really is running out of memory.
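As a quick sketch of that check in plain shell (chatty_job here is a hypothetical stand-in for whatever command your job runs):

```shell
# Simulate a job that floods stdout, the condition suspected of
# triggering Error 137 on the runner.
chatty_job() {
  seq 1 100000        # stands in for a command printing many log lines
}

# Discard both stdout and stderr; only the exit code survives,
# so the job still passes or fails on the command's real result.
rc=0
chatty_job > /dev/null 2>&1 || rc=$?
echo "exit code: $rc"   # 0: the command succeeded, output discarded
```

If Error 137 stops appearing once the output is discarded, the log volume was the trigger; if it still appears, look for a genuine memory problem.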

Potential Temporary Workarounds

1. You control the output – If the output that is causing the problem is not strictly necessary, just print smaller strings or less of it.

2. You can’t control the output and you don’t need it anyway – In your pipeline, just discard the output; the job will still fail if the process returns an error.

  script:
    - myverbosejob > /dev/null

3. You need the output for troubleshooting if the job fails – You can write the output to a file, save it as an artifact, and view it later by downloading the artifact. The job will still fail if the process returns an error.

  script:
    - myverbosejob > logs
  artifacts:
    paths:
      - logs
    expire_in: 1 day

4. You want some of the logs to show, but you don’t care about the exit code – Pipe the output through grep for known errors. The job WILL NOT fail even if the process returns an error code, because the exit status of a pipeline is, by default, that of its last command (grep). Note that grep itself exits with an error when it finds no matches, so append ‘|| true’ if the job should always pass.

  script:
    - myverbosejob | grep -i error || true
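To see why the job keeps passing, here is a small runnable sketch (fake_job is a hypothetical stand-in for the real process):

```shell
# A pipeline's exit status is, by default, that of its LAST command,
# so grep's status masks the real process's failure.
fake_job() {
  echo "step 1 ok"
  echo "ERROR: something broke"
  return 137                    # simulate the failing process
}

rc=0
fake_job | grep -i error || rc=$?
echo "exit code: $rc"   # 0: grep found a match, hiding the 137
```

This is exactly the behavior described above: the error line is printed, but the failure code never reaches the job.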

5. You want some of the logs to show if the job fails, and you want the job to fail if the process fails – Write the output to a file, but save the process exit code so the job can still be failed with it.

  script:
    - myverbosejob > logs && true ;
      STATUS=$? ; if [ $STATUS -ne 0 ]; then
      grep -i error logs ; exit $STATUS ; fi

This workaround is a little complex, but it does accomplish the job. The ‘&& true’ on the first line prevents the job from stopping immediately (GitLab scripts abort on the first failing command) while preserving the exit code: if myverbosejob fails, true never runs, so $? still holds the failure code. On the second line we capture that code in the STATUS variable and check whether it is nonzero. If the process failed, we show the log lines that contain an error (you can change this to a more relevant grep), and ‘exit $STATUS’ then fails the job with the original exit code. If you know of a simpler way, please share. Peace.
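One possibly simpler variant of the same idea uses ‘||’, which runs its right-hand side only when the command fails, so no intermediate ‘&& true’ is needed. A runnable sketch (fake_job stands in for myverbosejob):

```shell
fake_job() {
  echo "ERROR: disk full"       # noisy output including an error line
  return 3                      # simulate a failure with a specific code
}

run_with_logs() {
  # On failure: show the error lines, then re-raise the original code.
  fake_job > logs || { st=$?; grep -i error logs; return "$st"; }
}

rc=0
run_with_logs || rc=$?
echo "exit code: $rc"   # 3: the original failure code is preserved
```

In a .gitlab-ci.yml script this collapses to a single line such as ‘myverbosejob > logs || { st=$?; grep -i error logs; exit $st; }’ — treat that as an untested suggestion rather than a verified recipe.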