Frequently Asked Question

MACx AI Accelerators

Last Updated 20 days ago

Hardware Specifications
The MAC card lineup consists of high-performance computing nodes. Each node is equipped with 8 GPUs, designed for parallel processing and large-scale inference tasks.
How to use the AI accelerators

Accessing Nodes

Log in to the cluster via SSH.

Allocate a MAC node using the Slurm scheduler. For an interactive session on the node, execute the following command:
srun -pmac -N1 --pty bash

Wait for SLURM to allocate one free node on the mac partition - you will be logged into the node.

Load the environment modules neccesary for MACx GPUs with this command:

module load MAC

Load the Python environment where custom Python wheels are installed by running the source command:
source /usr/local/software/MAC/3.3/macenv/bin/activate

Supported software
PyTorch 2.6.0
SGLand 0.5.9
vLLM 0.14.1
bitsandbytes-0.42.0
flash_attn 2.6.3

transformers (version depending on model)
LlamaFactory
HIP

Other supported software can be listed with:
Load the necessary software environment and modules by running the source command:
uv pip freeze

Supported models and execution commands

- Recommend `transformers==5.3.0`
- Gemma4 `transformers==5.5.1`

Some pre-downloaded models are located at /ai/models/
Other models should also work fine, these are just tested and validated. If you need some other models, let us know. We can also provide API access to these models. You can use SSH tunnels and port forwarding via SSH if you want to connect to your local tools.

Qwen3-30B-A3B and Qwen3-30B-A3B-Instruct-2507
model link: https://huggingface.co/Qwen/Qwen3-30B-A3B

HIP_VISIBLE_DEVICES=0,1 SGLANG_USE_AITER=1 python -m sglang.launch_server --model /ai/models/Qwen3-30B-A3B-Instruct-2507 --tp 2 --dtype bfloat16 --port 8000 --host 0.0.0.0 --moe-runner-backend hip_asm

GaMS3-12B-Instruct
model link: https://huggingface.co/cjvt/GaMS3-12B-Instruct

HIP_VISIBLE_DEVICES=0,1 SGLANG_USE_AITER=1 python -m sglang.launch_server --model /ai/models/GaMS3-12B-Instruct --tp 2 --dtype bfloat16 --port 8000 --host 0.0.0.0

Qwen3.5-35B-A3B

model link: https://huggingface.co/Qwen/Qwen3.5-35B-A3B

ROCBLAS_USE_HIPBLASLT=1 HIP_VISIBLE_DEVICES=0,1 SGLANG_ENABLE_SPEC_V2=1 python -m sglang.launch_server --model /ai/models/Qwen3.5-35B-A3B --tp 2 --dtype bfloat16 --port 8000 --host 0.0.0.0 --moe-runner-backend hip_asm --attention-backend fa3 --mamba-scheduler-strategy extra_buffer --mem-fraction-static 0.9

MiniMax-M2.5-W8A8 / MiniMax-M2.7-W8A8

model link: https://www.modelscope.com/models/metax-tech/MiniMax-M2.7-W8A8

ROCBLAS_USE_HIPBLASLT=1 SGLANG_USE_AITER=1 HIP_GRAPH_DEC_REFCOUNT_CALLBACK=1 python -m sglang.launch_server --model /ai/models/MiniMax-M2.7-W8A8 --tp 8 --pp-size 1 --dtype bfloat16 --trust-remote-code --moe-runner-backend hip_asm --port 8000 --host 0.0.0.0 --quantization w8a8_int8 --tool-call-parser minimax-m2 --reasoning-parser minimax-append-think

Qwen3.5-122B-A10B-W8A8

model link: https://www.modelscope.com/models/metax-tech/Qwen3.5-122B-A10B-W8A8

HIP_VISIBLE_DEVICES=0,1,2,3 ROCBLAS_USE_HIPBLASLT=1 SGLANG_ENABLE_SPEC_V2=1 SGLANG_USE_AITER=1 HIP_GRAPH_DEC_REFCOUNT_CALLBACK=1 python -m sglang.launch_server --model /ai/models/Qwen3.5-122B-A10B-W8A8 --tp 4 --dtype bfloat16 --port 8000 --moe-runner-backend hip_asm --attention-backend fa3 --mamba-scheduler-strategy extra_buffer --quantization w8a8_int8 --mem-fraction-static 0.9

gemma-4-26B-A4B-it

model link: https://huggingface.co/google/gemma-4-26B-A4B-it
tested on transformers==5.5.1
HIP_VISIBLE_DEVICES=2,3 ROCBLAS_USE_HIPBLASLT=1 SGLANG_USE_AITER=1 python -m sglang.launch_server --model /ai/models/gemma-4-26B-A4B-it --tp 2 --dtype bfloat16 --port 8000 --host 0.0.0.0 --moe-runner-backend hip_asm

Frequently Asked Question

Loading ...