Optimize your Azure HPC clusters with a custom Slurm image for secure, internet-free environments and faster startup. This step-by-step guide covers creating a locked-down AlmaLinux image pre-installed with Slurm, adjusting repos, generalizing the VM, and updating CycleCloud templates for efficient scaling.

Speed Up Your Azure HPC Clusters with a Custom Slurm Image
If you’re running High Performance Computing (HPC) clusters on Azure CycleCloud, you know how crucial startup speed and security are. Microsoft just shared a detailed guide on creating a custom Slurm image tailored for locked-down environments and faster scaling. This is a game-changer for teams that need clusters without internet access or want to pre-install Slurm packages for quicker deployments.
What’s New?
The big update here is the ability to build a custom Azure VM image with Slurm and all necessary packages pre-installed. This means your HPC nodes can spin up faster, and you don’t have to rely on internet access during cluster scaling. The guide uses CycleCloud 8.7.1, Slurm 23.110-2, and the AlmaLinux 8.10 HPC Gen 2 image as the base environment.
“Many CycleCloud users need to run their HPC cluster in a secure environment without internet access.”
Major Updates and Steps to Follow
Build Your Custom VM Image
- Create an Azure VM from the AlmaLinux HPC image.
- Download and install Slurm packages from the official GitHub release.
- Add Slurm and Munge users with specific user IDs matching CycleCloud defaults.
- Install additional packages like xfsdump, cryptsetup, and lvm2 required by CycleCloud’s Chef setup.
- Adjust the OS repository configuration so it works in locked-down mode, keeping at least one repo enabled.
- Generalize the VM using Azure CLI commands to prepare for image capture.
- Capture the VM image in the Azure portal, making sure it’s stored in a Shared Image Gallery for CycleCloud compatibility.
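To make two of the steps above more concrete, here is a minimal Python sketch that only assembles the commands involved and prints them for review. It is not the guide's own script. The UID values (11100 for slurm, 11101 for munge), the useradd options, and the resource group and VM names are assumptions for illustration, so check the IDs against your CycleCloud Slurm template before using them. The `az vm deallocate` and `az vm generalize` commands are standard Azure CLI calls, but the guide may sequence them differently.

```python
import shlex
from typing import Dict, List

# Assumed IDs for illustration only; verify against your CycleCloud template.
SERVICE_ACCOUNTS: Dict[str, int] = {"slurm": 11100, "munge": 11101}


def service_account_commands(accounts: Dict[str, int]) -> List[List[str]]:
    """Build groupadd/useradd argument lists that pin each account's UID and GID."""
    commands: List[List[str]] = []
    for name, uid in accounts.items():
        commands.append(["groupadd", "-g", str(uid), name])
        # -M skips the home directory; the login shell here is an assumption.
        commands.append(
            ["useradd", "-u", str(uid), "-g", str(uid), "-M", "-s", "/sbin/nologin", name]
        )
    return commands


def generalize_commands(resource_group: str, vm_name: str) -> List[List[str]]:
    """Build the Azure CLI calls that stop a VM and mark it as generalized."""
    target = ["--resource-group", resource_group, "--name", vm_name]
    return [
        ["az", "vm", "deallocate", *target],
        ["az", "vm", "generalize", *target],
    ]


if __name__ == "__main__":
    # Hypothetical names; nothing is executed, the commands are only printed.
    plan = service_account_commands(SERVICE_ACCOUNTS)
    plan += generalize_commands("my-resource-group", "slurm-image-vm")
    for argv in plan:
        print(shlex.join(argv))
```

Printing the commands instead of running them keeps the sketch safe to try anywhere. The account commands would run as root inside the VM, while the `az` commands run from a machine with the Azure CLI signed in.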
Modify CycleCloud Templates
Next, update your CycleCloud cluster template to skip Slurm installation since it’s already baked into your custom image. Add slurm.do_install = false in the configuration section. Then, import the updated template and create your cluster using the custom image resource ID.
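The template change can be sketched as a small text edit. The Python helper below is an illustration, not part of the guide: it assumes the template marks its configuration section with a `[[[configuration]]]` header and adds the `slurm.do_install = false` line under the first one it finds. The function name and the sample template text are hypothetical, and real templates may hold several configuration sections, so review the result before importing it.

```python
def disable_slurm_install(template_text: str) -> str:
    """Return template text with `slurm.do_install = false` added under the
    first [[[configuration]]] header, unless the key is already present."""
    if "slurm.do_install" in template_text:
        return template_text  # already set somewhere; leave the template untouched

    patched = []
    inserted = False
    for line in template_text.splitlines():
        patched.append(line)
        if not inserted and line.strip() == "[[[configuration]]]":
            # Reuse the header's indentation so the new line matches the file's layout.
            indent = line[: len(line) - len(line.lstrip())]
            patched.append(f"{indent}slurm.do_install = false")
            inserted = True

    if not inserted:
        raise ValueError("no [[[configuration]]] section found in template")
    return "\n".join(patched) + "\n"


if __name__ == "__main__":
    # Hypothetical, heavily shortened template used only to demonstrate the edit.
    sample = (
        "[cluster Slurm]\n"
        "  [[node defaults]]\n"
        "    [[[configuration]]]\n"
        "    slurm.version = $configuration_slurm_version\n"
    )
    print(disable_slurm_install(sample))
```

The helper leaves a template unchanged if the key already appears, so running it twice has no further effect.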
“If you can open the cycleserver to have internet access for one time only while starting the cluster, that will help in caching the project for the first time.”
Why This Matters
Pre-installing Slurm and its dependencies cuts cluster startup time, because nodes no longer have to download and install packages when they boot. It also keeps your HPC environment secure by removing the need for internet access during node provisioning. Plus, the guide applies a patch that fixes installation flags, which avoids pitfalls found in earlier CycleCloud versions.
For tech teams managing sensitive workloads or large-scale HPC jobs, this method offers both speed and peace of mind. Microsoft’s step-by-step guide is a must-read for anyone using Azure CycleCloud with Slurm.
Final Thoughts
Custom images are a smart way to optimize HPC clusters in Azure. With this new approach, you get faster scaling, enhanced security, and less hassle during deployments. Whether you’re locked down or just want to save precious minutes, this update is worth exploring.
Check out the full guide and scripts on Microsoft’s GitHub and CycleCloud documentation to get started today.
From the New blog articles in Microsoft Community Hub
