1

I have a Docker container which executes a python script inside it as the ENTRYPOINT. This is the DockerFile

FROM python:3
ADD script.py / 
EXPOSE 80
RUN pip install boto3
RUN pip install uuid
ENTRYPOINT ["python","./script.py"]

This is the Python script:

import boto3
import time
import uuid
import os

guid = uuid.uuid4()
timestr = time.strftime("%Y%m%d-%H%M%S")
job_index = os.environ['AWS_BATCH_JOB_ARRAY_INDEX']

filename = 'latest_test_' + str(guid) + '_.txt'
with open(filename, 'a+') as f:
    data = job_index
    f.write(data)

client = boto3.client(
    's3',
    # Hard coded strings as credentials, not recommended.
    aws_access_key_id='',
    aws_secret_access_key=''
)
response = client.upload_file(filename, 'api-dev-dpstorage-s3', 'docker_data' + filename + '.txt')
with open('response2.txt', 'a+') as f:
    f.write('all done')
    exit

It is simply designed to create a file, write the job array index into the file and push it to an S3 Bucket. The job array index from AWS Batch is being sourced from one of the pre-defined environment variables. I have uploaded the image to AWS ECR, and have set up an AWS Batch to run a job with an array of 10. This should execute the job 10 times, with my expectation that 10 files are dumped into S3, each containing the array index of the job itself.

If I don't include the environment variable and instead just hard code a value into the text file, the AWS Batch job works. If I include the call to os.environ to get the variable, the job fails with this AWS Batch error:

Status reasonEssential container in task exited

I'm assuming there is an issue with how I'm trying to obtain the environment variable. Does anyone know how I could correctly reference either one of the built in environment variables and/or a custom environment variable defined in the job?

1 Answer 1

5

AWS provides docker env configuration by job definition parameters, where you specify:

"environment" : [
    { "AWS_BATCH_JOB_ARRAY_INDEX" : "string"},
]

This will be turned into docker env parameter:

$ docker run --env AWS_BATCH_JOB_ARRAY_INDEX=string $container $cmd

Thus can be accessed by:

import os

job_id = os.environ['AWS_BATCH_JOB_ARRAY_INDEX']

But watch out if you are passing sensitive data in this way, it is not wise to pass credentials in plain text. Instead, in this case, you may want to create a compute environment.

Sign up to request clarification or add additional context in comments.

5 Comments

Thanks. So even though this is a 'built in' environment variable of AWS Batch, I still have to manually define it in the job definition? That doesn't seem right? As I wouldn't be passing in a value for this environment variable, I would instead expect that AWS batch provides the value to me (e.g. the job array id 0 to 9 of a 10 array job?)
@JamesMatson It seems it's not passed into docker env by default. You'll still need to specify that when docker is run: docs.aws.amazon.com/batch/latest/userguide/… The env variable only exists in batch bcs, and docker is run within & after bcs instance is up.
Thanks. Does that not defeat the purpose of the built-in environment variables? e.g. the one I've referenced? As it's meant to be provided by AWS Batch, not something you specify the value of manually (e.g. specifying AWS_BATCH_JOB_ARRAY_INDEX: 3 in job definitions). It won't be dynamic if it has to be manually specified. I want to make the python script within the container 'aware' of the job id it's running in. Hmmm. Maybe I've misunderstood the use of the environment variables in this case.
@JamesMatson Batch compute provides a virtual machine (the bcs) and you can access the AWS_BATCH_JOB_ARRAY_INDEX in the bcs shell. But your docker is run inside the shell, and thus cannot access the host machine (the bcs) shell variables. Does it explain now?
There's a benefit of specifying it manually. Imagine you don't want to just spawn 1000 identical jobs (e.g., Monte Carlo simulation), but you want to parallelize, where each n has a different role (incrementing AWS_BATCH_JOB_ARRAY_INDEX needs to translate into a multi-dimensional sweep over parameters - it can get complex)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.