Frequently Asked Question
Hardware Specifications
The MAC card lineup consists of high-performance computing nodes. Each node is equipped with 8 GPUs, designed for parallel processing and large-scale inference tasks.
How to use the AI accelerators
Accessing Nodes
Log in to the cluster via SSH.
Allocate a MAC node using the Slurm scheduler. For an interactive session on the node, execute the following command:srun -pmac -N1 --pty bash
Wait for SLURM to allocate one free node on the mac partition - you will be logged into the node.
Load the environment modules neccesary for MACx GPUs with this command:
module load MAC
Load the Python environment where custom Python wheels are installed by running the source command:
source /usr/local/software/MAC/3.3/macenv/bin/activate
Supported software
PyTorch 2.6.0
SGLand 0.5.9
vLLM 0.14.1
bitsandbytes-0.42.0
flash_attn 2.6.3
transformers (version depending on model)
LlamaFactory
HIP
Other supported software can be listed with:
Load the necessary software environment and modules by running the source command:
uv pip freeze
Supported models and execution commands
- Recommend `transformers==5.3.0`
- Gemma4 `transformers==5.5.1`
Some pre-downloaded models are located at /ai/models/
Other models should also work fine, these are just tested and validated. If you need some other models, let us know. We can also provide API access to these models. You can use SSH tunnels and port forwarding via SSH if you want to connect to your local tools.
Qwen3-30B-A3B and Qwen3-30B-A3B-Instruct-2507
model link: https://huggingface.co/Qwen/Qwen3-30B-A3BHIP_VISIBLE_DEVICES=0,1 SGLANG_USE_AITER=1 python -m sglang.launch_server --model /ai/models/Qwen3-30B-A3B-Instruct-2507 --tp 2 --dtype bfloat16 --port 8000 --host 0.0.0.0 --moe-runner-backend hip_asm
GaMS3-12B-Instruct
model link: https://huggingface.co/cjvt/GaMS3-12B-InstructHIP_VISIBLE_DEVICES=0,1 SGLANG_USE_AITER=1 python -m sglang.launch_server --model /ai/models/GaMS3-12B-Instruct --tp 2 --dtype bfloat16 --port 8000 --host 0.0.0.0
Qwen3.5-35B-A3B
model link: https://huggingface.co/Qwen/Qwen3.5-35B-A3BROCBLAS_USE_HIPBLASLT=1 HIP_VISIBLE_DEVICES=0,1 SGLANG_ENABLE_SPEC_V2=1 python -m sglang.launch_server --model /ai/models/Qwen3.5-35B-A3B --tp 2 --dtype bfloat16 --port 8000 --host 0.0.0.0 --moe-runner-backend hip_asm --attention-backend fa3 --mamba-scheduler-strategy extra_buffer --mem-fraction-static 0.9
MiniMax-M2.5-W8A8 / MiniMax-M2.7-W8A8
model link: https://www.modelscope.com/models/metax-tech/MiniMax-M2.7-W8A8ROCBLAS_USE_HIPBLASLT=1 SGLANG_USE_AITER=1 HIP_GRAPH_DEC_REFCOUNT_CALLBACK=1 python -m sglang.launch_server --model /ai/models/MiniMax-M2.7-W8A8 --tp 8 --pp-size 1 --dtype bfloat16 --trust-remote-code --moe-runner-backend hip_asm --port 8000 --host 0.0.0.0 --quantization w8a8_int8 --tool-call-parser minimax-m2 --reasoning-parser minimax-append-think
Qwen3.5-122B-A10B-W8A8
model link: https://www.modelscope.com/models/metax-tech/Qwen3.5-122B-A10B-W8A8
HIP_VISIBLE_DEVICES=0,1,2,3 ROCBLAS_USE_HIPBLASLT=1 SGLANG_ENABLE_SPEC_V2=1 SGLANG_USE_AITER=1 HIP_GRAPH_DEC_REFCOUNT_CALLBACK=1 python -m sglang.launch_server --model /ai/models/Qwen3.5-122B-A10B-W8A8 --tp 4 --dtype bfloat16 --port 8000 --moe-runner-backend hip_asm --attention-backend fa3 --mamba-scheduler-strategy extra_buffer --quantization w8a8_int8 --mem-fraction-static 0.9
gemma-4-26B-A4B-it
model link: https://huggingface.co/google/gemma-4-26B-A4B-it
tested on transformers==5.5.1HIP_VISIBLE_DEVICES=2,3 ROCBLAS_USE_HIPBLASLT=1 SGLANG_USE_AITER=1 python -m sglang.launch_server --model /ai/models/gemma-4-26B-A4B-it --tp 2 --dtype bfloat16 --port 8000 --host 0.0.0.0 --moe-runner-backend hip_asm