Set Up the Ray Cluster
This section provides a step-by-step guide to set up a Ray Cluster on AWS EKS.
It is recommended to use the following script to set up the cluster. The script will guide you through the setup process on AWS, including Docker image building, EKS cluster creation, and deployment of Ray on Kubernetes.
Prerequisites
Before you begin, ensure you have the following:
AWS CLI installed and configured.
Docker installed and running.
Helm installed.
kubectl installed.
AWS ECR credentials.
AWS EKS access.
Steps to Set Up Ray Cluster on AWS EKS
Configure AWS Credentials
Run the following commands to set up AWS credentials:
aws configure set aws_access_key_id <YOUR_AWS_ACCESS_KEY> aws configure set aws_secret_access_key <YOUR_AWS_SECRET_KEY> aws configure set region <YOUR_AWS_REGION>
Make sure to replace <YOUR_AWS_ACCESS_KEY>, <YOUR_AWS_SECRET_KEY>, and <YOUR_AWS_REGION> with your actual credentials and region.
Log in to AWS ECR Public
To push Docker images to AWS ECR, you need to log in to the public ECR:
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws
Build and Push Docker Image to ECR
To build and push the Docker image, run the following commands:
docker buildx build --platform linux/amd64 -t public.ecr.aws/i7t1s5i1/fedgraph:img . --push
Create EKS Cluster
Create a dynamic EKS cluster with the following command:
eksctl create cluster -f eks_cluster_config.yaml --timeout=60m
Update Kubeconfig
Update your kubeconfig file to access the newly created EKS cluster:
aws eks --region <YOUR_AWS_REGION> update-kubeconfig --name <CLUSTER_NAME>
Clone KubeRay Repository and Install Prometheus/Grafana
Clone the KubeRay repository and install monitoring tools:
git clone https://github.com/ray-project/kuberay.git cd kuberay ./install/prometheus/install.sh
Install KubeRay Operator
To manage Ray on Kubernetes, you need to install the KubeRay operator:
helm repo add kuberay https://ray-project.github.io/kuberay-helm/ helm repo update helm install kuberay-operator kuberay/kuberay-operator --version 1.1.1
Deploy Ray Kubernetes Cluster
Apply the Kubernetes configuration to deploy Ray on EKS:
kubectl apply -f ray_kubernetes_cluster.yaml kubectl apply -f ray_kubernetes_ingress.yaml
Verify Pod Status
Check the status of the pods to ensure that they are running:
kubectl get pods
Port Forwarding for Ray Dashboard, Prometheus, and Grafana
Forward the necessary ports for accessing the Ray dashboard and monitoring tools:
kubectl port-forward service/raycluster-autoscaler-head-svc 8265:8265 & kubectl port-forward raycluster-autoscaler-head-47mzs 8080:8080 & kubectl port-forward prometheus-prometheus-kube-prometheus-prometheus-0 -n prometheus-system 9090:9090 & kubectl port-forward deployment/prometheus-grafana -n prometheus-system 3000:3000 &
Final Check
To ensure everything is set up correctly, perform a final check:
kubectl get pods --all-namespaces -o wide
Submit a Ray Job (Optional)
If you want to submit a Ray job, use the following command:
ray job submit --runtime-env-json '{"working_dir": "./", "excludes": [".git"]}' --address http://localhost:8265 -- python3 run.py
Stop a Ray Job (Optional)
To stop a Ray job, use:
ray job stop <job_id> --address http://localhost:8265
Clean Up Resources
To clean up resources, delete the RayCluster and EKS cluster:
kubectl delete -f ray_kubernetes_cluster.yaml kubectl delete -f ray_kubernetes_ingress.yaml kubectl get nodes -o name | xargs kubectl delete eksctl delete cluster --region <YOUR_AWS_REGION> --name <CLUSTER_NAME>
Setup completed successfully!