How to Optimize Azure CycleCloud HPC Clusters with a Custom Slurm Image for Secure, Fast Scaling


Optimize your Azure CycleCloud HPC clusters with a custom Slurm image for secure, internet-free environments and faster scaling. This step-by-step guide covers VM setup, package installation, repository configuration, image capture, and template modification to streamline HPC deployments.

Speed Up Your Azure HPC Clusters with a Custom Slurm Image

If you’re managing HPC clusters with Azure CycleCloud, you know how crucial security and startup speed are. Microsoft just shared a neat way to build a custom Slurm image that locks down your environment and slashes node startup times. Let’s break down what you need to know.

What’s New?

The new approach lets admins pre-install Slurm packages directly on an AlmaLinux HPC image. Cluster nodes then no longer spend time downloading and installing Slurm while the cluster scales. It also supports locked-down environments with no internet access, which suits sensitive workloads.

“Many CycleCloud users or admins need to run their HPC cluster in a secure environment without internet access.”

Using Azure CycleCloud 8.7.1 and Slurm 23.110-2, you create a standalone VM with all required packages. Then, capture this as a custom image for your cluster nodes.
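The following rough Python sketch illustrates the build-VM preparation. It prints the commands that create the Slurm and Munge accounts with fixed IDs, then install the downloaded Slurm RPMs along with two of the Chef dependencies the guide names. This is a dry run and not the guide's own script. The UID values 11100 and 11101, the RPM directory, and the use of `dnf` with a local glob are all assumptions, so replace them with your own.

```python
from __future__ import annotations

import shlex

# Hypothetical placeholder IDs: the guide only says Slurm and Munge need
# fixed IDs. Use the values your cluster template actually specifies.
SERVICE_USERS = {"slurm": 11100, "munge": 11101}

# Chef dependencies named in the guide; its "etc." is not expanded here.
CHEF_DEPS = ["xfsdump", "cryptsetup"]


def prep_commands(rpm_dir: str) -> list[str]:
    """Return the shell commands for preparing the build VM, in order.

    Nothing is executed: review the output, then run it as root on the VM.
    """
    cmds = []
    for name, uid in SERVICE_USERS.items():
        # Group and user share one fixed ID so every node resolves them alike.
        cmds.append(shlex.join(["groupadd", "-g", str(uid), name]))
        cmds.append(shlex.join(
            ["useradd", "-u", str(uid), "-g", str(uid), "-M", "-s", "/bin/false", name]
        ))
    # Install the downloaded Slurm RPMs and the Chef dependencies together.
    # The *.rpm glob is left unquoted on purpose so the shell expands it.
    cmds.append(f"dnf install -y {shlex.quote(rpm_dir)}/*.rpm " + " ".join(CHEF_DEPS))
    return cmds


if __name__ == "__main__":
    for cmd in prep_commands("/tmp/slurm-rpms"):
        print(cmd)
```

The sketch creates the accounts before installing the packages. This ordering is an assumed choice so that the fixed IDs already exist before any package is installed.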

Major Updates and Steps

Prepare Your VM

  • Create an Azure VM from the AlmaLinux HPC 8.10 Gen 2 image.
  • Download and install Slurm packages from the official GitHub release.
  • Create the Slurm and Munge users with fixed user IDs so they match on every node.
  • Install CycleCloud dependencies like Chef-required packages (xfsdump, cryptsetup, etc.).
  • Adjust OS repositories to enable only those needed in your locked environment.
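Restricting repositories usually means editing the `.repo` files under `/etc/yum.repos.d`. The minimal Python sketch below assumes standard INI-style repo files and a hypothetical keep-list of repository IDs. It sets `enabled=1` only on the repos you name and `enabled=0` on every other repo.

```python
from __future__ import annotations

import configparser
import io
from pathlib import Path


def restrict_repos(repo_text: str, keep: set[str]) -> str:
    """Return .repo text with enabled=1 only for section IDs listed in `keep`."""
    # dnf variables look like $releasever, so %-interpolation must be off.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(repo_text)
    for section in parser.sections():
        parser[section]["enabled"] = "1" if section in keep else "0"
    out = io.StringIO()
    # Write key=value without spaces, the usual .repo style.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


def restrict_repo_dir(repo_dir: str, keep: set[str]) -> None:
    """Rewrite every *.repo file in `repo_dir` in place (needs root on a real VM)."""
    for path in Path(repo_dir).glob("*.repo"):
        path.write_text(restrict_repos(path.read_text(), keep))


if __name__ == "__main__":
    # Hypothetical keep-list; use the repo IDs your locked-down mirror provides.
    sample = "[baseos]\nname=BaseOS\nenabled=1\n\n[epel]\nname=EPEL\nenabled=1\n"
    print(restrict_repos(sample, {"baseos"}))
```

`configparser` drops comments when it rewrites a file, so back up the originals before you run `restrict_repo_dir` against a real VM.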

Generalize and Capture the Image

  • Run waagent --deprovision+user --force to remove the provisioning user account and machine-specific data.
  • Use the Azure CLI to deallocate and generalize the VM, then capture it as a managed image.
  • Share the captured image to a shared image gallery (Azure Compute Gallery) rather than keeping only a managed image; the guide calls this out for CycleCloud compatibility.
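The capture step maps onto a handful of Azure CLI calls. The sketch below only assembles and prints those calls. The resource names are hypothetical. The gallery branch assumes that an image definition (Linux, generalized, Hyper-V generation V2) already exists in your gallery.

```python
from __future__ import annotations

import shlex


def capture_plan(rg: str, vm: str, image: str,
                 gallery: str | None = None, definition: str | None = None,
                 version: str = "1.0.0") -> list[list[str]]:
    """Return the capture steps as argv lists (dry run; nothing is executed)."""
    steps = [
        # Runs *on the build VM* (e.g. over SSH) to strip the provisioning
        # user and machine-specific data.
        ["sudo", "waagent", "--deprovision+user", "--force"],
        # The remaining steps run from a workstation with the Azure CLI.
        ["az", "vm", "deallocate", "--resource-group", rg, "--name", vm],
        ["az", "vm", "generalize", "--resource-group", rg, "--name", vm],
        # The source VM is Gen 2, so the managed image is created as V2.
        ["az", "image", "create", "--resource-group", rg, "--name", image,
         "--source", vm, "--hyper-v-generation", "V2"],
    ]
    if gallery and definition:
        # Publish the managed image as a version of an existing gallery definition.
        steps.append(["az", "sig", "image-version", "create",
                      "--resource-group", rg, "--gallery-name", gallery,
                      "--gallery-image-definition", definition,
                      "--gallery-image-version", version,
                      "--managed-image", image])
    return steps


if __name__ == "__main__":
    # Hypothetical resource names, for illustration only.
    for argv in capture_plan("hpc-rg", "slurm-build-vm", "slurm-custom-image",
                             gallery="hpcgallery", definition="slurm-alma810"):
        print(shlex.join(argv))
```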

Modify Your CycleCloud Template

  • Prevent CycleCloud from reinstalling Slurm by setting slurm.do_install = false in your cluster template.
  • Update user IDs in the template to match your custom image setup.
  • Import the updated template and create your cluster using the custom image resource ID.
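The template edit can also be scripted. The hedged Python sketch below assumes the usual CycleCloud template layout, where settings sit as `key = value` lines under a `[[[configuration]]]` header. It rewrites a key that already exists or adds the key under the first configuration section. Only `slurm.do_install` comes from the guide. The UID key name in the example is hypothetical, so check your template for the real names.

```python
from __future__ import annotations

import re


def set_template_config(text: str, key: str, value: str) -> str:
    """Set `key = value` in a CycleCloud cluster template string.

    Every existing assignment of `key` is rewritten in place; if there is
    none, the line is added under the first [[[configuration]]] header.
    """
    assign = re.compile(rf"^([ \t]*){re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    if assign.search(text):
        # A function replacement avoids backslash escapes in `value`.
        return assign.sub(lambda m: f"{m.group(1)}{key} = {value}", text)

    header = re.compile(r"^([ \t]*)\[\[\[configuration\]\]\][ \t]*\n", re.MULTILINE)
    match = header.search(text)
    if match is None:
        raise ValueError("template has no [[[configuration]]] section")
    indent = match.group(1) + "    "  # nest one level under the header
    return text[: match.end()] + f"{indent}{key} = {value}\n" + text[match.end():]


if __name__ == "__main__":
    template = "[cluster Slurm]\n    [[node defaults]]\n        [[[configuration]]]\n"
    # slurm.do_install is from the guide; slurm.user.uid is a hypothetical key.
    for key, value in [("slurm.do_install", "false"), ("slurm.user.uid", "11100")]:
        template = set_template_config(template, key, value)
    print(template)
```

After you write the result back to disk, import it with the CycleCloud CLI, for example `cyclecloud import_template -f slurm.txt`.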

Why It Matters

This method shortens node startup because each node boots with Slurm already installed instead of downloading and installing it during scale-out. It also lets your HPC environment run without internet access. If you can allow brief internet access during initial startup, CycleCloud caches project data, which makes later launches smoother.

“If you want to know more about that command… Deprovision or generalize a VM before creating an image.”

For CycleCloud 8.7.1 and earlier, a patch resolves issues with the slurm.do_install = false flag. Newer versions include this fix by default.
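If you script cluster builds across several CycleCloud installs, a version check can decide whether the patch is needed. The sketch below assumes that 8.7.1 is the last affected release, which is the cutoff given in the takeaways. It keeps the cutoff as a parameter so you can match your release notes. Comparing integer tuples avoids the string-comparison trap where "8.10.0" sorts before "8.7.1".

```python
from __future__ import annotations


def parse_version(version: str, width: int = 3) -> tuple[int, ...]:
    """'8.7' -> (8, 7, 0): numeric parts, zero-padded so lengths match."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def needs_do_install_patch(installed: str, last_affected: str = "8.7.1") -> bool:
    """True if `installed` is at or below the last release assumed to need the patch."""
    # Compare integer tuples: as plain strings, "8.10.0" < "8.7.1" is True.
    return parse_version(installed) <= parse_version(last_affected)


if __name__ == "__main__":
    for v in ["8.6.5", "8.7.1", "8.7.2"]:
        print(v, needs_do_install_patch(v))
```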

Final Thoughts

Building a custom Slurm image for Azure CycleCloud is a practical upgrade for HPC admins who need both speed and security. Follow the step-by-step guide and adjust your templates, and your clusters will deploy faster in a locked-down environment. It is well worth trying if you run HPC workloads on Azure.

  • Pre-install Slurm and required packages on an AlmaLinux HPC VM to speed up cluster node startup.
  • Configure locked-down environments by enabling only essential OS repositories for package management.
  • Generalize and capture a custom VM image in Azure, ensuring compatibility with CycleCloud cluster creation.
  • Modify CycleCloud templates to disable Slurm installation, leveraging the custom image for faster deployment.
  • Apply patches for CycleCloud versions ≤8.7.1 to resolve issues with disabling Slurm installation during cluster setup.
  • From the New blog articles in Microsoft Community Hub


