
I pulled the image rayproject/ray:2.51.0.801bd7-extra-py310 from Docker Hub with Podman (Ray prefers Podman internally).

Then, on AWS EC2, I started a head instance using the Deep Learning Base AMI with Single CUDA (Ubuntu 22.04), ami-06eff6f62c23006e9 (64-bit x86).

On the instance, I have:

ubuntu@ip-172-31-15-118:~$ python --version
Python 3.10.19
ubuntu@ip-172-31-15-118:~$ python3 --version
Python 3.10.19
ubuntu@ip-172-31-15-118:~$ docker --version
Docker version 28.5.1, build e180ab8
ubuntu@ip-172-31-15-118:~$ podman --version
podman version 3.4.4

I then set up a virtual environment for the Ray CLI:

python3 -m venv ~/raycli-env
source ~/raycli-env/bin/activate
pip install -U "ray[default]"

(raycli-env) ubuntu@ip-172-31-15-118:~$ ray --version
ray, version 2.51.0

I also exported the following environment variables:

export RAY_RUNTIME_ENV_DOCKER=1
export RAY_RUNTIME_ENV_PODMAN_EXE=/usr/bin/docker
export PATH=/usr/bin:$PATH
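Because ray start launches the raylet and its helper processes from this shell, any variable Ray reads has to be exported before the head node is started, as done here. A quick way to see which binary the RAY_RUNTIME_ENV_PODMAN_EXE setting actually points at is sketched below. It assumes, without confirmation from the Ray docs, that Ray honors this variable and otherwise calls podman from PATH; resolve_container_exe is a hypothetical helper name.

```python
import os
import shutil


def resolve_container_exe(env=None):
    """Return (configured, resolved_path) for the container CLI Ray is assumed to call."""
    env = os.environ if env is None else env
    # Assumption: when the variable is unset, plain "podman" on PATH is used.
    configured = env.get("RAY_RUNTIME_ENV_PODMAN_EXE", "podman")
    # shutil.which checks an absolute path directly, or searches PATH for a bare name.
    found = shutil.which(configured)
    # Follow symlinks so the real target binary is reported.
    resolved = os.path.realpath(found) if found else None
    return configured, resolved


if __name__ == "__main__":
    configured, resolved = resolve_container_exe()
    print(f"configured: {configured}")
    print(f"resolved:   {resolved}")
    if resolved and os.path.basename(resolved) == "docker":
        print("warning: this is Docker, which does not see images stored by Podman")
```

Run it in the same shell (with the same exports) that was used for ray start.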

Then I started the Ray cluster with:

ray start --head --port=6379 --dashboard-host=0.0.0.0

Next, I created a job directory and a test script:

mkdir ray-job
cd ray-job

echo 'import ray
ray.init(address="auto")
@ray.remote
def hello():
    return "Hello World from Ray on GPU!"
print(ray.get(hello.remote()))' > hello_ray.py
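To check whether the task really runs inside the custom image, a variant of hello_ray.py can report the interpreter path and the Python and Ray versions seen by the worker next to those of the driver. This is a sketch: describe_runtime is a hypothetical helper, and the script only prints a message when Ray is not importable.

```python
import importlib.util
import platform
import sys


def describe_runtime() -> dict:
    """Collect basic facts about the interpreter this function runs in."""
    try:
        import ray
        ray_version = ray.__version__
    except ImportError:
        ray_version = None
    return {
        "python": platform.python_version(),
        "ray": ray_version,
        "executable": sys.executable,
        "host": platform.node(),
    }


def main() -> None:
    import ray

    ray.init(address="auto")
    # ray.remote can wrap an existing function; the task runs in a worker process
    # that uses the job's runtime_env (e.g. the container image) if one was set.
    remote_describe = ray.remote(describe_runtime)
    print("driver:", describe_runtime())
    print("worker:", ray.get(remote_describe.remote()))


if __name__ == "__main__":
    if importlib.util.find_spec("ray") is None:
        print("Ray is not installed for", sys.executable)
    else:
        main()
```

If the worker's executable and versions match the driver's exactly, the task is most likely not running in the image at all.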

Then I extended the image with the following Dockerfile:

ARG RAY_UID=1000
ARG RAY_GID=100

FROM rayproject/ray:2.51.0.801bd7-extra-py310

USER root
RUN pip install "ray[default]"

# go back to ray user for running workloads
USER ray
WORKDIR /home/ray

CMD ["python", "hello_ray.py"]

I built the image with:

podman build -t ray-image:latest .
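Podman records a locally built ray-image:latest as localhost/ray-image:latest and keeps it in its own image store, separate from the Docker daemon's. A small check of which CLI can actually see the image is sketched below; image_exists is a hypothetical helper that relies on podman image inspect and docker image inspect exiting non-zero when the image is missing.

```python
import subprocess


def image_exists(tool: str, image: str) -> bool:
    """Return True if `tool` ("podman" or "docker") finds `image` in its local store."""
    try:
        result = subprocess.run(
            [tool, "image", "inspect", image],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # The CLI is not installed, not on PATH, or not executable.
        return False
    # Non-zero also covers errors such as no permission on the Docker socket.
    return result.returncode == 0


if __name__ == "__main__":
    for tool in ("podman", "docker"):
        for ref in ("localhost/ray-image:latest", "ray-image:latest"):
            print(f"{tool:6} {ref:30} {image_exists(tool, ref)}")
```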

But when I run the following command, I get no errors, and the job just stays in PENDING forever.

ray job submit \
  --address="http://127.0.0.1:8265" \
  --runtime-env-json '{"image_uri": "localhost/ray-image:latest"}' \
  -- python hello_ray.py
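The same submission can be made from Python with Ray's job SDK (ray.job_submission.JobSubmissionClient), which makes it easy to give up after a timeout instead of waiting on PENDING forever, and to dump whatever logs exist. The sketch below mirrors the CLI call above (same address, entrypoint and image_uri); wait_for_job, submit_and_wait and the 300-second timeout are illustrative choices, not Ray defaults.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_job(get_status, timeout_s=120.0, poll_s=2.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until a terminal state or the timeout; return the last status."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        # Ray returns a JobStatus enum; plain strings are accepted as-is.
        status = getattr(status, "value", status)
        if status in TERMINAL_STATES or clock() >= deadline:
            return status
        sleep(poll_s)


def submit_and_wait(address="http://127.0.0.1:8265",
                    image="localhost/ray-image:latest",
                    timeout_s=300.0):
    # Imported lazily so wait_for_job can be used without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    job_id = client.submit_job(
        entrypoint="python hello_ray.py",
        runtime_env={"image_uri": image},
    )
    status = wait_for_job(lambda: client.get_job_status(job_id), timeout_s=timeout_s)
    print(f"{job_id}: {status}")
    if status not in TERMINAL_STATES:
        # Still PENDING/RUNNING after the timeout: show whatever logs exist so far.
        print(client.get_job_logs(job_id))
    return status
```

Call submit_and_wait() from the raycli-env virtual environment on the head node; a job that times out in PENDING usually points at the runtime environment (here, the container) never starting.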

However, when I use a python-slim image with a matching Python version instead, I can submit the job successfully.

I don't know what I am doing wrong. The Python and Ray versions on the EC2 instance and in the Podman image are the same. Please help; I have been stuck on this for a long time.

Answer


There are likely two issues:

  1. You built the image with Podman, but Ray is invoking Docker (RAY_RUNTIME_ENV_PODMAN_EXE=/usr/bin/docker). Docker can’t see images built by Podman, since the two keep separate image stores, and “localhost/ray-image:latest” is parsed as a registry host, so Docker won’t find it.

  2. The runtime_env key should be {"container":{"image":"…"}}, not image_uri.
    Try: ray job submit --address="http://127.0.0.1:8265" --runtime-env-json '{"container":{"image":"ray-image:latest"}}' -- python hello_ray.py
    after rebuilding the image with docker build, or point Ray at /usr/bin/podman and push the image to a real registry (ECR or Docker Hub).
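The registry point in item 1 follows the usual Docker image-reference convention: the first path component is treated as a registry host only if it contains a "." or ":" or is exactly "localhost"; otherwise the image is assumed to live on Docker Hub, with "library/" prepended for single-component names. A minimal sketch of that rule follows; split_image_ref is a hypothetical helper that handles tags only, not digests.

```python
def split_image_ref(ref: str) -> tuple[str, str, str]:
    """Split an image reference into (registry, repository, tag). Digests are not handled."""
    name, tag = ref, "latest"
    # A tag is whatever follows the last ":" in the final path component, so the
    # port in "localhost:5000/foo" is not mistaken for a tag.
    last_colon = ref.rfind(":")
    if last_colon > ref.rfind("/"):
        name, tag = ref[:last_colon], ref[last_colon + 1:]
    first, sep, rest = name.partition("/")
    # Only a first component that looks like a host counts as a registry.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest, tag
    # Everything else is assumed to be on Docker Hub; single names get "library/".
    repo = name if "/" in name else "library/" + name
    return "docker.io", repo, tag


if __name__ == "__main__":
    for ref in ("localhost/ray-image:latest", "ray-image:latest",
                "rayproject/ray:2.51.0.801bd7-extra-py310"):
        print(ref, "->", split_image_ref(ref))
```

So a name starting with localhost/ tells Docker to look for a registry on the local machine when the image is not already in its own store.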

Docs: https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#container-runtime-env.
