How to handle timeouts in a python lambda?

Question

I know this has been asked before, but no real solution was proposed, and I was wondering whether there are any new ways to do it nowadays.

Is there any way to hook an event, using any AWS service, to check whether a Lambda has timed out? It logs to CloudWatch Logs that it timed out, so there must be a way.

Specifically in Python, because it's not so simple to keep checking whether it's reaching the 20 minute mark as you can with JavaScript and other naturally concurrent languages.

Ideally, I want to execute another Lambda if the Python Lambda times out, with the same payload the original one received.

John Rotenstein · Accepted Answer · 2019-07-29 22:50:40Z

20

Here's an example from cloudformation-custom-resources/lambda/python · GitHub showing how an AWS Lambda function written in Python can realise that it is about to time out.

(I've edited out the other stuff; here are the relevant bits):

import signal

def handler(event, context):

    # Setup alarm for remaining runtime minus a second
    signal.alarm(int(context.get_remaining_time_in_millis() / 1000) - 1)

    # Do other stuff
    ...

def timeout_handler(_signal, _frame):
    '''Handle SIGALRM'''
    raise Exception('Time exceeded')

signal.signal(signal.SIGALRM, timeout_handler)


2 Comments

Pramesh Bajracharya Over a year ago

For future readers, (context.get_remaining_time_in_millis() / 1000) should be an int. Hence, consider using int(context.get_remaining_time_in_millis() / 1000).

Sharon Lifshits Over a year ago

@john-rotenstein Can you explain how this will actually work? Isn't signal a global that will be used in multiple executions of handler within the same run environment? Doesn't this solution raise a 'Time exceeded' error when the environment times out, and not a single handler execution?

Cody DeGhetto · Accepted Answer · 2020-11-03 23:43:43Z

I want to add to @John Rotenstein's answer, which worked for me but left the following errors in the CloudWatch logs:

START RequestId: ********* Version: $LATEST
Traceback (most recent call last):
    File "/var/runtime/bootstrap", line 9, in <module>
    main()
    File "/var/runtime/bootstrap.py", line 350, in main
    event_request = lambda_runtime_client.wait_next_invocation()
    File "/var/runtime/lambda_runtime_client.py", line 57, in wait_next_invocation
    response = self.runtime_connection.getresponse()
    File "/var/lang/lib/python3.7/http/client.py", line 1369, in getresponse
    response.begin()
    File "/var/lang/lib/python3.7/http/client.py", line 310, in begin
    version, status, reason = self._read_status()
    File "/var/lang/lib/python3.7/http/client.py", line 271, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    File "/var/lang/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
    File "/var/task/lambda_function.py", line 6, in timeout_handler
    raise Exception('Time limit exceeded')
Exception: Time limit exceeded
END RequestId

So I just had to cancel the signal alarm before returning each response:

import logging
import signal


def timeout_handler(_signal, _frame):
    raise Exception('Time limit exceeded')


signal.signal(signal.SIGALRM, timeout_handler)


def lambda_handler(event, context):
    try:
        signal.alarm(int(context.get_remaining_time_in_millis() / 1000) - 1)

        logging.info('Testing stuff')
        # Do work

    except Exception as e:
        logging.error(f'Exception:\n{e}')

    signal.alarm(0)  # This line fixed the issue above!
    return {'statusCode': 200, 'body': 'Complete'}
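
A variation on the snippet above, offered only as a sketch: cancelling the alarm in a finally block means it is cleared on every exit path, and a dedicated exception class keeps the timeout from being confused with other errors. The names LambdaTimeout, run_with_deadline, margin_seconds and FakeContext are made up for illustration; FakeContext only mimics the get_remaining_time_in_millis() method of the real context so the code can be tried locally on a Unix-like system.

```python
import signal
import time


class LambdaTimeout(Exception):
    """Raised by the SIGALRM handler shortly before the deadline (hypothetical name)."""


def _on_alarm(_signum, _frame):
    raise LambdaTimeout("Time limit exceeded")


# Registered once per execution environment, as in the answers above.
signal.signal(signal.SIGALRM, _on_alarm)


def run_with_deadline(context, work, margin_seconds=1):
    """Run work() and return ("ok", result) or ("timeout", None).

    margin_seconds is an illustrative safety margin, not an AWS setting.
    """
    seconds = int(context.get_remaining_time_in_millis() / 1000) - margin_seconds
    if seconds <= 0:
        # signal.alarm(0) cancels rather than fires, so treat this as out of time.
        return ("timeout", None)
    signal.alarm(seconds)
    try:
        return ("ok", work())
    except LambdaTimeout:
        return ("timeout", None)
    finally:
        # Always cancel, so the alarm cannot fire while the runtime waits
        # for the next invocation.
        signal.alarm(0)


class FakeContext:
    """Local stand-in for the Lambda context object (assumption for testing)."""

    def __init__(self, millis):
        self._millis = millis

    def get_remaining_time_in_millis(self):
        return self._millis
```

Inside a real handler this would be called as run_with_deadline(context, lambda: do_work(event)), where do_work is whatever the function actually does; on a "timeout" result the handler still has roughly margin_seconds left to log or hand the payload off.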

Nir Alfasi · Accepted Answer · 2019-07-29 17:55:07Z

2

I can think of two options. The first is quick and dirty, but also less ideal:

1. Run it in a Step Function (check out Step Functions in AWS), which has the capability to retry on timeouts and errors.
2. A better way would be to re-architect your code to be idempotent. In this design, the process that triggers the Lambda checks a condition and, as long as this condition is true, triggers the Lambda. That condition needs to remain true until the Lambda has finished executing the logic successfully. This can be achieved by persisting the parameters sent to the Lambda in a database table, for example, with an extra field called "processed" that is set to "true" only once the Lambda has finished running successfully for that event.

Using method #2 will make your code more resilient, easy to re-run on errors, and also easy to monitor: basically, all you have to do is check how many records are not yet processed and what their create/update timestamps are in the DB.
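
To make method #2 concrete, here is a minimal sketch that uses an in-memory SQLite table in place of whatever database you actually use. The table layout, the function names and the handler callable are all assumptions for illustration; the only idea taken from the answer is the "processed" flag that flips to true after a successful run.

```python
import sqlite3


def create_store():
    """Create an in-memory table of events with a 'processed' flag."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " processed INTEGER NOT NULL DEFAULT 0,"
        " created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def record_event(conn, payload):
    """Persist the payload before any work is attempted."""
    cur = conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def pending_events(conn):
    """Everything not yet marked processed: the retry and monitoring query."""
    return conn.execute(
        "SELECT id, payload FROM events WHERE processed = 0 ORDER BY id"
    ).fetchall()


def process_pending(conn, handler):
    """Run handler on each pending payload; mark it processed only on success."""
    done = 0
    for event_id, payload in pending_events(conn):
        try:
            handler(payload)
        except Exception:
            continue  # stays unprocessed, so a later run picks it up again
        conn.execute("UPDATE events SET processed = 1 WHERE id = ?", (event_id,))
        conn.commit()
        done += 1
    return done
```

A timed-out Lambda never reaches the UPDATE, so its record simply stays pending; a scheduled or event-driven run of process_pending can then retry it with the original payload.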


4 Comments

Mojimi Over a year ago

But solution 2 wouldn't be event based, right? It would be checking periodically.

Nir Alfasi Over a year ago

@Mojimi It could be event based (trigger a Lambda based on an event) or scheduled (trigger the Lambda based on a scheduled CloudWatch event). Or it can be both :) That's part of the beauty of it: you can set up whatever you need and also run things manually in case of a problem (an easy recovery mechanism, because you don't lose information).

Mojimi Over a year ago

But how would it be event based if there's no trigger for the timeout (in method 2)? I can't picture which AWS service or configuration would work this way.

Nir Alfasi Over a year ago

The event can be anything: something that you build to check the records and trigger a Lambda, or a time-based job (like cron or a CloudWatch event). I don't have enough context on your service, so either of the two might be suitable. Further, if this happens a lot, you should consider no longer running it in a Lambda; instead, you can have the Lambda spin up a container and run it there, where it doesn't have a time limitation. Another option is to check why it takes over 20 minutes and optimize it to run quicker.

Nikolay Grishchenko · Accepted Answer · 2019-07-30 12:11:18Z

If you care not only about identifying the timeout, but also about giving your Lambdas the option of a "healthy" shutdown and passing the remaining payload to another execution automatically, you may want to have a look at the Siblings component of the sosw package.

Here is an example use case where you call the sibling when time is running out. You pass the Sibling a pointer to where you left off. For example, you may store the remaining payload in S3, and the cursor will show where you stopped processing.

You will have to grant the Role of this Lambda the lambda:InvokeFunction permission on itself.

import logging
import time
from sosw import Processor as SoswProcessor
from sosw.app import LambdaGlobals, get_lambda_handler
from sosw.components.siblings import SiblingsManager


logger = logging.getLogger()
logger.setLevel(logging.INFO)


class Processor(SoswProcessor):

    DEFAULT_CONFIG = {
        'init_clients':    ['Siblings'],    # Automatically initialize Siblings Manager
        'shutdown_period': 10,  # Some time to shutdown in a healthy manner.
    }

    siblings_client: SiblingsManager = None


    def __call__(self, event):

        cursor = event.get('cursor', 0)

        while self.sufficient_execution_time_left:
            self.process_data(cursor)
            cursor += 1
            if cursor == 20:
                return "Reached the end of data"

        else:
            # Spawning another sibling to continue the processing
            payload = {'cursor': cursor}

            self.siblings_client.spawn_sibling(global_vars.lambda_context, payload=payload, force=True)
            self.stats['siblings_spawned'] += 1


    def process_data(self, cursor):
        """ Your custom logic respecting current cursor. """
        logger.info(f"Processing data at cursor: {cursor}")
        time.sleep(1)


    @property
    def sufficient_execution_time_left(self) -> bool:
        """ Return if there is a sufficient execution time for processing ('shutdown period' is in seconds). """
        return global_vars.lambda_context.get_remaining_time_in_millis() > self.config['shutdown_period'] * 1000


global_vars = LambdaGlobals()
lambda_handler = get_lambda_handler(Processor, global_vars)

So it's a child process? Interesting. Are you sure that's allowed on lambda?
No, this is not a child process. When running out of time, the function asynchronously calls itself again and exits the current execution. You should be careful with the code to avoid infinite loops of invocations.
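
For readers who want to see the self-invocation idea without the sosw package, a rough sketch follows. It is not how sosw implements Siblings; it only illustrates an asynchronous hand-off with a cap on re-invocations. The lambda_client argument is assumed to be a boto3 Lambda client (boto3.client("lambda")), whose invoke call accepts FunctionName, InvocationType and Payload; MAX_HOPS, the "hops" key, process_item, total_items and shutdown_ms are hypothetical names introduced here.

```python
import json

MAX_HOPS = 5  # hypothetical cap to avoid an infinite chain of invocations


def process_with_handoff(event, context, lambda_client, process_item,
                         total_items, shutdown_ms=10_000):
    """Process items from event['cursor'] and hand off before time runs out.

    lambda_client is assumed to be boto3.client("lambda"); it is passed in
    so the function can be exercised with a stub.
    """
    cursor = event.get("cursor", 0)
    hops = event.get("hops", 0)

    while cursor < total_items:
        if context.get_remaining_time_in_millis() <= shutdown_ms:
            if hops >= MAX_HOPS:
                raise RuntimeError("Too many re-invocations, giving up")
            # InvocationType="Event" makes the call asynchronous, so this
            # execution can return immediately after handing off.
            lambda_client.invoke(
                FunctionName=context.function_name,
                InvocationType="Event",
                Payload=json.dumps({"cursor": cursor, "hops": hops + 1}).encode("utf-8"),
            )
            return {"status": "handed_off", "cursor": cursor}
        process_item(cursor)
        cursor += 1

    return {"status": "done", "cursor": cursor}
```

As with the sosw example, the function's role needs lambda:InvokeFunction on itself, and the shutdown window should be longer than the slowest single item so that the hand-off happens before the hard timeout.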
