
I was wondering if there are any AWS services or projects that allow us to configure a data pipeline using AWS Lambda functions in code. I am looking for something like the following; assume there is a hypothetical library called pipeline:

# 'lambda' is a reserved word in Python, so the import uses lambda_fn instead
from pipeline import connect, s3, lambda_fn, deploy

p = connect(s3('input-bucket/prefix'),
            lambda_fn(myPythonFunc, dependencies=[list_of_dependencies]),
            s3('output-bucket/prefix'))
deploy(p)

There can be many variations of this idea, of course. This example assumes a single input S3 bucket, for instance; there could instead be a list of input buckets, as in the sketch below.
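For illustration, the same hypothetical API might accept a list of sources; connect, s3, lambda_fn, and deploy are all assumed names from the imaginary pipeline library above:

p = connect([s3('input-bucket-a/prefix'),   # hypothetical: multiple input sources
             s3('input-bucket-b/prefix')],
            lambda_fn(myPythonFunc, dependencies=[list_of_dependencies]),
            s3('output-bucket/prefix'))
deploy(p)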

Can this be done with AWS Data Pipeline? The documentation I have (quickly) read only says that Lambda can be used to trigger a pipeline.

1 Answer


I think the closest thing available is the State Machine functionality in the newly released AWS Step Functions. With these you can coordinate multiple steps that transform your data. I don't believe they support standard event sources, so you would have to create a regular Lambda function (potentially using the Serverless Application Model) to read from S3 and trigger your state machine, as in the sketch below.
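A minimal sketch of that glue function, assuming a Lambda subscribed to S3 object-created events; the state machine ARN and the payload shape are placeholders for illustration:

import json
import boto3

sfn = boto3.client('stepfunctions')

# Placeholder ARN; substitute the ARN of your own state machine.
STATE_MACHINE_ARN = 'arn:aws:states:us-east-1:123456789012:stateMachine:myPipeline'

def handler(event, context):
    # Each record describes one object created in the input bucket.
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        # Hand the object's location to the state machine as its input.
        sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            input=json.dumps({'bucket': bucket, 'key': key}),
        )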


1 Comment

I think the Serverless Application Model fits what I need. I now have to investigate how to do that in Python :). Thanks!
