I know this has been asked before, but no real solution was proposed, and I was wondering if there are any new ways nowadays.

Is there any way to hook an event, using any AWS service, to check whether a Lambda has timed out? The timeout is logged to CloudWatch Logs, so there must be a way.

Specifically in Python, because it's not as simple to keep checking whether it's reaching the 20 minute mark as it is in JavaScript and other naturally concurrent languages.

Ideally, if the Python Lambda times out, I want to invoke another Lambda with the same payload the original one received.

4 Answers

Here's an example from cloudformation-custom-resources/lambda/python on GitHub showing how an AWS Lambda function written in Python can realise that it is about to time out.

(I've edited out the other stuff; here are the relevant bits):

import signal

def handler(event, context):

    # Set up an alarm for the remaining runtime minus a second (signal.alarm() needs an int)
    signal.alarm(int(context.get_remaining_time_in_millis() / 1000) - 1)

    # Do other stuff
    ...

def timeout_handler(_signal, _frame):
    '''Handle SIGALRM'''
    raise Exception('Time exceeded')

signal.signal(signal.SIGALRM, timeout_handler)
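For anyone who wants to try the mechanism outside Lambda, here is a minimal, self-contained sketch of the same SIGALRM idea. It assumes a Unix-like environment and that the handler runs in the main thread (signal handlers can only be installed there). `LambdaTimeout`, `FakeContext` and the `hand_off` callback are hypothetical names for illustration: `FakeContext` only mimics `get_remaining_time_in_millis()`, and `hand_off` is where you would forward the untouched payload, for example by invoking another function. The alarm is cancelled in a `finally` block so it cannot outlive the invocation.

```python
import signal
import time


class LambdaTimeout(Exception):
    """Raised by the SIGALRM handler shortly before the deadline."""


def timeout_handler(_signal, _frame):
    raise LambdaTimeout("Time exceeded")


# Installed once at import time; the handler is process-wide.
signal.signal(signal.SIGALRM, timeout_handler)


class FakeContext:
    """Hypothetical stand-in for the Lambda context object (local testing only)."""

    def __init__(self, budget_ms):
        self._deadline = time.monotonic() + budget_ms / 1000

    def get_remaining_time_in_millis(self):
        return int((self._deadline - time.monotonic()) * 1000)


def handler(event, context, hand_off=print):
    # signal.alarm() takes whole seconds, and alarm(0) would cancel instead of
    # arm, so leave a one-second margin but never go below 1.
    seconds = max(int(context.get_remaining_time_in_millis() / 1000) - 1, 1)
    signal.alarm(seconds)
    try:
        time.sleep(event["work_seconds"])  # stand-in for the real work
        return "done"
    except LambdaTimeout:
        hand_off(event)  # e.g. invoke another function with the same payload
        return "handed off"
    finally:
        signal.alarm(0)  # cancel any pending alarm so it cannot fire later
```

Because the alarm only has one-second resolution, when less than a second remains the clamped alarm would fire after the real deadline, so this approach cannot help in that case.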

2 Comments

For future readers: signal.alarm() only accepts an integer, so (context.get_remaining_time_in_millis() / 1000) must be converted. Consider using int(context.get_remaining_time_in_millis() / 1000).
@john-rotenstein Can you explain how this actually works? Isn't signal a global that will be shared by multiple executions of handler within the same runtime environment? Doesn't this solution raise a 'Time exceeded' error when the environment times out, rather than when a single handler execution does?

I want to build on @John Rotenstein's answer, which worked for me but resulted in the following errors populating the CloudWatch logs:

START RequestId: ********* Version: $LATEST
Traceback (most recent call last):
  File "/var/runtime/bootstrap", line 9, in <module>
    main()
  File "/var/runtime/bootstrap.py", line 350, in main
    event_request = lambda_runtime_client.wait_next_invocation()
  File "/var/runtime/lambda_runtime_client.py", line 57, in wait_next_invocation
    response = self.runtime_connection.getresponse()
  File "/var/lang/lib/python3.7/http/client.py", line 1369, in getresponse
    response.begin()
  File "/var/lang/lib/python3.7/http/client.py", line 310, in begin
    version, status, reason = self._read_status()
  File "/var/lang/lib/python3.7/http/client.py", line 271, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/var/lang/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/var/task/lambda_function.py", line 6, in timeout_handler
    raise Exception('Time limit exceeded')
Exception: Time limit exceeded
END RequestId

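So I just had to reset the signal alarm before returning each response. The execution environment (and its process) is reused between invocations, so an alarm armed during one invocation is still pending after the handler returns and fires while the runtime is blocked in wait_next_invocation, as in the traceback above: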

import logging
import signal


def timeout_handler(_signal, _frame):
    raise Exception('Time limit exceeded')


signal.signal(signal.SIGALRM, timeout_handler)


def lambda_handler(event, context):
    try:
        signal.alarm(int(context.get_remaining_time_in_millis() / 1000) - 1)

        logging.info('Testing stuff')
        # Do work

    except Exception as e:
        logging.error(f'Exception:\n{e}')

    signal.alarm(0)  # Cancel the pending alarm; this line fixed the issue above!
    return {'statusCode': 200, 'body': 'Complete'}

Two options I can think of; the first is quick and dirty, but less ideal:

  1. Run it in AWS Step Functions, which can retry a task on timeouts or errors.

  2. A better way would be to re-architect your code to be idempotent. In this approach, the process that triggers the Lambda checks a condition and, as long as the condition is true, triggers the Lambda. The condition must stay true until the Lambda has finished executing its logic successfully. You can achieve this by, for example, persisting the parameters sent to the Lambda in a database table with an extra field called "processed", which is set to "true" only once the Lambda has finished running successfully for that event.
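For option 1, a state machine can both retry the function and, if it still fails, pass its original input to a second function, which is close to what the question asks for. The sketch below builds an Amazon States Language definition as a Python dict; the ARNs and state names are hypothetical placeholders, and catching `States.ALL` is a deliberately broad assumption (it matches a timeout as well as other task errors). Setting `ResultPath` to `$.error` keeps the original input and adds the error details under an `error` key instead of replacing the input.

```python
import json

# Hypothetical ARNs; replace them with your own functions.
WORKER_ARN = "arn:aws:lambda:us-east-1:123456789012:function:worker"
FALLBACK_ARN = "arn:aws:lambda:us-east-1:123456789012:function:fallback"


def build_state_machine(worker_arn, fallback_arn, max_attempts=1):
    """Retry the worker, then hand its original input (plus the error) to a fallback."""
    return {
        "StartAt": "Worker",
        "States": {
            "Worker": {
                "Type": "Task",
                "Resource": worker_arn,
                "Retry": [{
                    "ErrorEquals": ["States.ALL"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": max_attempts,
                    "BackoffRate": 2.0,
                }],
                "Catch": [{
                    "ErrorEquals": ["States.ALL"],
                    "ResultPath": "$.error",  # keep the input, add the error under "error"
                    "Next": "Fallback",
                }],
                "End": True,
            },
            "Fallback": {
                "Type": "Task",
                "Resource": fallback_arn,
                "End": True,
            },
        },
    }


definition = json.dumps(build_state_machine(WORKER_ARN, FALLBACK_ARN), indent=2)
```

The resulting JSON string can be passed as the `definition` argument of the boto3 Step Functions client's `create_state_machine` (together with `name` and `roleArn`). Retrying a function that timed out on the same input will often just time out again, so a small `MaxAttempts` is usually enough.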

Using method #2 will make your code more resilient, easy to re-run on errors, and easy to monitor: all you have to do is check how many such records are not yet processed, and what their create/update timestamps in the DB are.
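Here is a minimal sketch of method #2, using an in-memory SQLite table purely for illustration (a real deployment would more likely use DynamoDB or RDS). The table, column and function names are hypothetical, and `worker` stands in for invoking the Lambda; in a real setup the Lambda itself would set `processed` once it finishes. A worker that times out or raises simply leaves its record unprocessed, so the next run picks it up again with the same payload.

```python
import json
import sqlite3


def init_db(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id INTEGER PRIMARY KEY,
               payload TEXT NOT NULL,
               processed INTEGER NOT NULL DEFAULT 0,
               updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"""
    )


def enqueue(conn, payload):
    """Persist the parameters before the Lambda is invoked."""
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (json.dumps(payload),))
    return cur.lastrowid


def pending_jobs(conn):
    """Records whose processing has not completed successfully (e.g. it timed out)."""
    rows = conn.execute("SELECT id, payload FROM jobs WHERE processed = 0 ORDER BY id")
    return [(job_id, json.loads(payload)) for job_id, payload in rows]


def mark_processed(conn, job_id):
    """Only called after the work for this record finished successfully."""
    conn.execute(
        "UPDATE jobs SET processed = 1, updated_at = CURRENT_TIMESTAMP WHERE id = ?",
        (job_id,),
    )


def run_pending(conn, worker):
    """Trigger the worker for every unprocessed record (event-driven or scheduled)."""
    for job_id, payload in pending_jobs(conn):
        try:
            worker(payload)
        except Exception:
            continue  # leave processed = 0 so a later run retries this record
        mark_processed(conn, job_id)
```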

4 Comments

But solution 2 wouldn't be event-based, right? It would be checking periodically.
@Mojimi It could be event-based (trigger a Lambda based on an event) or scheduled (trigger the Lambda with a scheduled CloudWatch event). Or it can be both :) That's part of the beauty of it: you can set up whatever you need and also run things manually in case of a problem (an easy recovery mechanism, because you don't lose information).
But how would method 2 be event-based if there's no trigger for the timeout? I can't picture which AWS service or configuration would work this way.
The event can be anything: something you build that checks the records and triggers a Lambda, or a time-based job (like cron or a scheduled CloudWatch event). I don't have enough context about your service, so either of the two might be suitable. Further, if this happens a lot, you should consider not using a Lambda to run it at all; instead, the Lambda can spin up a container and run the job there, where there is no time limit. Another option is to find out why it takes over 20 minutes and optimize it to run faster.
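To make the timeout itself the event, one option is a CloudWatch Logs subscription filter on the worker's log group, with a filter pattern such as `"Task timed out"`, pointing at a second Lambda. That Lambda receives the matching log lines base64-encoded and gzip-compressed under `event["awslogs"]["data"]`. The sketch below assumes the timeout line contains the text `Task timed out` (as in Lambda's `Task timed out after 3.00 seconds` message), and it only detects the timeout: the log line does not include the original payload, so you would still need to look it up, for example from records like those in method #2. `TIMEOUT_MARKER` and `timed_out_events` are hypothetical names.

```python
import base64
import gzip
import json

TIMEOUT_MARKER = "Task timed out"  # assumed wording of the Lambda timeout log line


def decode_awslogs(event):
    """Decode the payload CloudWatch Logs sends to a subscription-filter Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def timed_out_events(event):
    """Return (log group, message) for every log line that reports a timeout."""
    data = decode_awslogs(event)
    return [
        (data["logGroup"], log_event["message"])
        for log_event in data.get("logEvents", [])  # control messages may have none
        if TIMEOUT_MARKER in log_event["message"]
    ]


def lambda_handler(event, context):
    for log_group, message in timed_out_events(event):
        # Look up the original payload here (it is not in the log line), then re-invoke.
        print(f"Timeout detected in {log_group}: {message}")
```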

If you want not only to detect the timeout but also to give your Lambdas a "healthy" shutdown and automatically pass the remaining payload to another execution, have a look at the Siblings component of the sosw package.

Here is an example use case where you call a sibling when time is running out. You pass the sibling a pointer to where you left off. For example, you may store the remaining payload in S3 and use the cursor to record where you stopped processing.

You will have to grant this Lambda's execution role permission for lambda:InvokeFunction on itself.

import logging
import time
from sosw import Processor as SoswProcessor
from sosw.app import LambdaGlobals, get_lambda_handler
from sosw.components.siblings import SiblingsManager


logger = logging.getLogger()
logger.setLevel(logging.INFO)


class Processor(SoswProcessor):

    DEFAULT_CONFIG = {
        'init_clients':    ['Siblings'],    # Automatically initialize Siblings Manager
        'shutdown_period': 10,  # Some time to shutdown in a healthy manner.
    }

    siblings_client: SiblingsManager = None


    def __call__(self, event):

        cursor = event.get('cursor', 0)

        while self.sufficient_execution_time_left:
            self.process_data(cursor)
            cursor += 1
            if cursor == 20:
                return "Reached the end of data"

        else:
            # Spawning another sibling to continue the processing
            payload = {'cursor': cursor}

            self.siblings_client.spawn_sibling(global_vars.lambda_context, payload=payload, force=True)
            self.stats['siblings_spawned'] += 1


    def process_data(self, cursor):
        """ Your custom logic respecting current cursor. """
        logger.info(f"Processing data at cursor: {cursor}")
        time.sleep(1)


    @property
    def sufficient_execution_time_left(self) -> bool:
        """ Return if there is a sufficient execution time for processing ('shutdown period' is in seconds). """
        return global_vars.lambda_context.get_remaining_time_in_millis() > self.config['shutdown_period'] * 1000


global_vars = LambdaGlobals()
lambda_handler = get_lambda_handler(Processor, global_vars)

2 Comments

So it's a child process? Interesting. Are you sure that's allowed on Lambda?
No, this is not a child process. When it is running out of time, the function asynchronously invokes itself again and exits the current execution. You should be careful with the code to avoid infinite loops of invocations.
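A minimal sketch of that pattern without sosw, using boto3 directly: the function invokes its own ARN with `InvocationType="Event"` (asynchronous, so the call returns immediately) and passes a continuation payload such as a cursor. The `hops` counter and `MAX_HOPS` limit are hypothetical additions that guard against an infinite chain of invocations, and the `lambda_client` parameter exists only so the function can be tested without AWS. As noted in the answer, the execution role needs lambda:InvokeFunction on the function itself.

```python
import json

MAX_HOPS = 5  # hypothetical safety limit against runaway self-invocation


def spawn_continuation(context, payload, lambda_client=None):
    """Asynchronously invoke this same function with a continuation payload."""
    hops = payload.get("hops", 0) + 1
    if hops > MAX_HOPS:
        raise RuntimeError(f"Refusing to spawn: {hops} hops exceeds MAX_HOPS={MAX_HOPS}")
    if lambda_client is None:
        import boto3  # bundled with the managed Python Lambda runtimes
        lambda_client = boto3.client("lambda")
    return lambda_client.invoke(
        FunctionName=context.invoked_function_arn,  # same function (and alias, if any)
        InvocationType="Event",  # asynchronous: do not wait for the new execution
        Payload=json.dumps({**payload, "hops": hops}),
    )
```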
