TensorWave and AMD Support Now Available on SkyPilot

We’re excited to announce that TensorWave and AMD GPU support is now available on SkyPilot, an open-source framework for managing and scaling AI/ML workloads.

What Is SkyPilot?

The open-source SkyPilot framework, developed at the Sky Computing Lab at the University of California, Berkeley, automatically determines the most cost-effective environment (cloud provider, region, and zone) for running a workload that meets your requirements (the number and type of GPUs or other resources).

Until recently, SkyPilot was available only for the “big three” cloud service providers and their preferred GPU hardware. This limitation prevented organizations from taking advantage of new options, such as AMD’s Instinct MI300 series of GPUs. TensorWave has been collaborating with the SkyPilot team and AMD to add support for TensorWave’s cloud environment, including its AMD GPUs.

We’ve been working closely with the SkyPilot team to bring full AMD GPU compatibility to its orchestration and scheduling engine. While official support is still at the pre-alpha stage, we’ve prepared a custom Docker image with all the integration pieces our customers need to start testing their workloads today.

Getting Started with AMD on SkyPilot

If you’re an existing SkyPilot user, getting started is easy. Simply download our custom Docker image and load it into your environment:

docker load -i com.tensorwave.skypilot-alpha-9f8871_[customer].tar.gz

docker run -it com.tensorwave.skypilot-alpha-9f8871:[customer]

Your TensorWave AMD GPU cluster will be preconfigured and ready to go. Run the standard sky check command to confirm that SkyPilot sees it. The image also includes kubectl in case you need to manage cluster resources directly.
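
As a rough illustration of that verification step, the sketch below runs both tools from inside the container and reports anything that is missing rather than stopping on the first error. It assumes the sky and kubectl binaries are on the PATH in the image; the messages and the variable name are only illustrative, and the output you see will depend on your cluster.

```shell
# Sketch: verify the preconfigured cluster from inside the container.
# Assumption: both CLIs are on PATH in the TensorWave image.
missing=""
for tool in sky kubectl; do
  if ! command -v "$tool" >/dev/null 2>&1; then
    missing="$missing $tool"
  fi
done

if [ -n "$missing" ]; then
  echo "Not found on PATH:$missing (are you inside the TensorWave container?)"
else
  # Reports which clouds and clusters SkyPilot is able to use.
  sky check || echo "sky check reported a problem"
  # Lists the nodes of the cluster that kubectl is configured for.
  kubectl get nodes || echo "kubectl could not reach the cluster"
fi
status_checked=yes
```

If either command reports a problem, including its output in your feedback should make the issue easier to diagnose.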

Example Workloads

To help jump-start your testing, we’ve included some example workloads in the Docker image:

examples/amd_task.yaml: Provisions a single MI210 GPU with ROCm-enabled PyTorch and outputs GPU utilization stats

examples/amd_opt.yaml: Runs distributed training of large language models across 32 MI210 GPUs
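
If you want to adapt these examples, it may help to see what a minimal SkyPilot task file can look like. The sketch below writes a hypothetical task file to a temporary path; it is not the contents of the shipped examples/amd_task.yaml. The accelerator name MI210, the file name, the task name, and the presence of ROCm-enabled PyTorch and rocm-smi in the task environment are all assumptions.

```shell
# Sketch: write a hypothetical single-GPU task file.
# This is NOT the shipped examples/amd_task.yaml.
TASK_FILE="${TMPDIR:-/tmp}/my_amd_task.yaml"

cat > "$TASK_FILE" <<'EOF'
name: amd-smoke-test

resources:
  # Assumption: the alpha build exposes the GPU under the name MI210.
  accelerators: MI210:1

run: |
  # Assumption: ROCm-enabled PyTorch and rocm-smi are available to the task.
  # ROCm builds of PyTorch report AMD GPUs through the torch.cuda namespace.
  python -c "import torch; print(torch.cuda.is_available())"
  rocm-smi
EOF

echo "Wrote $TASK_FILE"
```

From inside the container, a file like this would typically be launched with sky launch -c amd-smoke-test "$TASK_FILE", where amd-smoke-test is an arbitrary cluster name chosen for this example.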

For fans of the Llama models, we’ve also pre-loaded the 7B and 13B versions ready for use.

Feedback Welcome

This is an exciting first step in bringing AMD GPU support to the SkyPilot platform. Our engineering teams welcome any and all feedback during this preview period as we work toward full integration.

If you encounter any issues or have suggestions, please don’t hesitate to reach out to our dedicated support staff through our website. We’re committed to making SkyPilot the premier orchestration solution for all accelerator hardware.
