Frequently Asked Question

MACx AI Accelerators
Last Updated 20 days ago

Hardware Specifications
The MAC card lineup consists of high-performance computing nodes. Each node is equipped with 8 GPUs, designed for parallel processing and large-scale inference tasks.
How to use the AI accelerators

Accessing Nodes

Log in to the cluster via SSH.

Allocate a MAC node using the Slurm scheduler. For an interactive session on the node, execute the following command:
srun -pmac -N1 --pty bash

Wait for SLURM to allocate one free node on the mac partition - you will be logged into the node.

Load the environment modules neccesary for MACx GPUs with this command:

    module load MAC


    Load the Python environment where custom Python wheels are installed by running the source command:
    source /usr/local/software/MAC/3.3/macenv/bin/activate

    Supported software

    PyTorch   2.6.0
    SGLand    0.5.9
    vLLM        0.14.1
    bitsandbytes-0.42.0
    flash_attn 2.6.3

    transformers (version depending on model)
    LlamaFactory
    HIP

    Other supported software can be listed with:
    Load the necessary software environment and modules by running the source command:
    uv pip freeze

    Supported models and execution commands


    - Recommend `transformers==5.3.0`
    - Gemma4  `transformers==5.5.1`

    Some pre-downloaded models are located at /ai/models/
    Other models should also work fine, these are just tested and validated. If you need some other models, let us know. We can also provide API access to these models. You can use SSH tunnels and port forwarding via SSH if you want to connect to your local tools.


    Qwen3-30B-A3B
    and Qwen3-30B-A3B-Instruct-2507
    model link: https://huggingface.co/Qwen/Qwen3-30B-A3B

    HIP_VISIBLE_DEVICES=0,1 SGLANG_USE_AITER=1 python -m sglang.launch_server --model /ai/models/Qwen3-30B-A3B-Instruct-2507 --tp 2 --dtype bfloat16 --port 8000 --host 0.0.0.0 --moe-runner-backend hip_asm

    GaMS3-12B-Instruct
    model link: https://huggingface.co/cjvt/GaMS3-12B-Instruct

    HIP_VISIBLE_DEVICES=0,1 SGLANG_USE_AITER=1 python -m sglang.launch_server --model /ai/models/GaMS3-12B-Instruct --tp 2 --dtype bfloat16 --port 8000 --host 0.0.0.0

    Qwen3.5-35B-A3B


    model link: https://huggingface.co/Qwen/Qwen3.5-35B-A3B

    ROCBLAS_USE_HIPBLASLT=1 HIP_VISIBLE_DEVICES=0,1 SGLANG_ENABLE_SPEC_V2=1 python -m sglang.launch_server --model /ai/models/Qwen3.5-35B-A3B --tp 2 --dtype bfloat16 --port 8000 --host 0.0.0.0 --moe-runner-backend hip_asm --attention-backend fa3 --mamba-scheduler-strategy extra_buffer --mem-fraction-static 0.9

    MiniMax-M2.5-W8A8 / MiniMax-M2.7-W8A8

    model link: https://www.modelscope.com/models/metax-tech/MiniMax-M2.7-W8A8


    ROCBLAS_USE_HIPBLASLT=1 SGLANG_USE_AITER=1 HIP_GRAPH_DEC_REFCOUNT_CALLBACK=1 python -m sglang.launch_server --model /ai/models/MiniMax-M2.7-W8A8 --tp 8 --pp-size 1 --dtype bfloat16 --trust-remote-code --moe-runner-backend hip_asm --port 8000 --host 0.0.0.0 --quantization w8a8_int8 --tool-call-parser minimax-m2 --reasoning-parser minimax-append-think

    Qwen3.5-122B-A10B-W8A8

    model link: https://www.modelscope.com/models/metax-tech/Qwen3.5-122B-A10B-W8A8

    HIP_VISIBLE_DEVICES=0,1,2,3 ROCBLAS_USE_HIPBLASLT=1 SGLANG_ENABLE_SPEC_V2=1 SGLANG_USE_AITER=1 HIP_GRAPH_DEC_REFCOUNT_CALLBACK=1 python -m sglang.launch_server --model /ai/models/Qwen3.5-122B-A10B-W8A8 --tp 4 --dtype bfloat16 --port 8000 --moe-runner-backend hip_asm --attention-backend fa3 --mamba-scheduler-strategy extra_buffer --quantization w8a8_int8 --mem-fraction-static 0.9

    gemma-4-26B-A4B-it


    model link: https://huggingface.co/google/gemma-4-26B-A4B-it
    tested on transformers==5.5.1
    HIP_VISIBLE_DEVICES=2,3 ROCBLAS_USE_HIPBLASLT=1 SGLANG_USE_AITER=1 python -m sglang.launch_server --model /ai/models/gemma-4-26B-A4B-it --tp 2 --dtype bfloat16 --port 8000 --host 0.0.0.0 --moe-runner-backend hip_asm

    This website relies on temporary cookies to function, but no personal data is ever stored in the cookies.
    OK

    Loading ...