User Guide Overview
This project is intentionally small. Most user-facing behavior comes from the
interaction between vllm-omni, vllm-metax, and the runtime environment.
How activation works
The plugin entry point is:
At startup, it follows this logic:
- If VLLM_OMNI_METAX_DISABLE is set, the plugin stays disabled.
- If VLLM_OMNI_METAX_FORCE is set, the plugin activates immediately.
- Otherwise, it asks vllm-metax to probe the MetaX runtime through pymxsml.
- During activation, it may install MetaX-specific runtime patches needed by vllm-omni 0.20.0 model paths.
- Only when the runtime probe succeeds does it register the Omni platform class.
This keeps runtime ownership with the MetaX backend while allowing a narrow compatibility fix where upstream CUDA-only imports would otherwise block MetaX.
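The decision order described in this section can be sketched as a small function. This is only an illustration, not the plugin's code: the name should_activate, the probe callable standing in for the vllm-metax check, and the rule that any non-empty value counts as "set" are all assumptions.

```python
import os
from typing import Callable, Mapping


def should_activate(env: Mapping[str, str], probe: Callable[[], bool]) -> bool:
    """Decide whether to register the Omni platform class.

    `probe` stands in for the vllm-metax runtime check; its real name and
    signature are not given in this guide.
    """
    # Assumption: any non-empty value counts as "set".
    if env.get("VLLM_OMNI_METAX_DISABLE"):
        return False  # explicit opt-out is checked first
    if env.get("VLLM_OMNI_METAX_FORCE"):
        return True  # forced activation skips the probe
    return bool(probe())  # default path: rely on the runtime probe


if __name__ == "__main__":
    # Hypothetical probe that always reports "no MetaX runtime found".
    print(should_activate(os.environ, probe=lambda: False))
```

Passing the environment and the probe as arguments keeps the sketch testable without MetaX hardware or the real packages installed.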
Runtime patch layer
The new patch layer is intentionally narrow:
- It is enabled by default.
- It can be disabled with VLLM_OMNI_METAX_DISABLE_PATCHES=1.
- It installs a shim for the rotary embedding import path used by Omni diffusion/image code.
- Its current purpose is to support Qwen3-Omni and Qwen-Image-Edit-2511 on the 0.20.0 stack.
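As a rough illustration of what an import-path shim can look like, the sketch below registers a stand-in module in sys.modules unless patches are disabled. The module path hypothetical_cuda_pkg.rotary, the helper rotate_pair, and the rule that only the value "1" disables patches are assumptions made for the example; the import path the plugin actually patches is not named in this guide.

```python
import importlib
import os
import sys
import types
from typing import Mapping, Optional, Tuple

# Hypothetical import path; the path the plugin really patches is not named here.
SHIM_MODULE = "hypothetical_cuda_pkg.rotary"


def _rotate_pair(x1: float, x2: float, cos: float, sin: float) -> Tuple[float, float]:
    """Pure-Python rotation of one (x1, x2) pair, used as placeholder shim content."""
    return (x1 * cos - x2 * sin, x1 * sin + x2 * cos)


def install_rotary_shim(env: Optional[Mapping[str, str]] = None) -> bool:
    """Register a stand-in module so importing SHIM_MODULE succeeds."""
    env = os.environ if env is None else env
    # Patches are on by default; assumption: only the value "1" turns them off.
    if env.get("VLLM_OMNI_METAX_DISABLE_PATCHES") == "1":
        return False
    if SHIM_MODULE in sys.modules:
        return True  # a real module or an earlier shim is already in place
    parent_name, _, child_name = SHIM_MODULE.rpartition(".")
    parent = sys.modules.setdefault(parent_name, types.ModuleType(parent_name))
    shim = types.ModuleType(SHIM_MODULE)
    shim.rotate_pair = _rotate_pair
    sys.modules[SHIM_MODULE] = shim
    setattr(parent, child_name, shim)  # make `import pkg.rotary` style access work
    return True


if __name__ == "__main__":
    if install_rotary_shim():
        module = importlib.import_module(SHIM_MODULE)
        print(module.rotate_pair(1.0, 0.0, 0.0, 1.0))
```

Because the shim only fills a missing import path and leaves any existing module untouched, the patch stays narrow in the sense described above.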
Runtime behavior
MetaxOmniPlatform inherits from both:
- vllm_omni.platforms.interface.OmniPlatform
- vllm_metax.platform.MacaPlatform
That means the plugin reuses the concrete MetaX hardware implementation instead of creating a parallel backend.
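The inheritance pattern can be shown with stand-in classes. The two base classes below are empty placeholders for the real ones, their methods are hypothetical, and the base-class order is an assumption taken from the order in which the guide lists them.

```python
class OmniPlatform:
    """Stand-in for vllm_omni.platforms.interface.OmniPlatform."""

    def describe_stage_support(self) -> str:  # hypothetical method
        return "omni stage hooks"


class MacaPlatform:
    """Stand-in for vllm_metax.platform.MacaPlatform."""

    def describe_device(self) -> str:  # hypothetical method
        return "MetaX device handling"


class MetaxOmniPlatform(OmniPlatform, MacaPlatform):
    """Adds no hardware code of its own; both behaviors come from the bases."""


platform = MetaxOmniPlatform()
mro_names = [cls.__name__ for cls in MetaxOmniPlatform.__mro__]
print(mro_names)
# ['MetaxOmniPlatform', 'OmniPlatform', 'MacaPlatform', 'object']
print(platform.describe_stage_support(), "+", platform.describe_device())
```

The method resolution order shows why no parallel backend is needed: device-level calls resolve to the MetaX base class, while Omni-specific hooks resolve to the Omni interface.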
Device visibility
During Omni stage setup, the plugin keeps environment variables synchronized:
- CUDA_VISIBLE_DEVICES
- MACA_VISIBLE_DEVICES
For operators, this means stage-level worker placement is easier to reason about, especially in multi-device debugging sessions.
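A minimal sketch of keeping the two variables in step is shown below. The function name sync_visible_devices and the comma-separated device list format are assumptions for illustration; the plugin's own synchronization code may differ.

```python
from typing import MutableMapping, Sequence

VISIBILITY_VARS = ("CUDA_VISIBLE_DEVICES", "MACA_VISIBLE_DEVICES")


def sync_visible_devices(env: MutableMapping[str, str], device_ids: Sequence[int]) -> str:
    """Write the same device list to both visibility variables."""
    # Assumption: devices are written as a comma-separated list of indices.
    value = ",".join(str(device_id) for device_id in device_ids)
    for name in VISIBILITY_VARS:
        env[name] = value
    return value


stage_env = {}
sync_visible_devices(stage_env, [0, 1])
print(stage_env)
# {'CUDA_VISIBLE_DEVICES': '0,1', 'MACA_VISIBLE_DEVICES': '0,1'}
```

In a real process the mapping would be os.environ; a plain dict is used here so the example has no side effects.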
Attention backend policy
For diffusion attention, the plugin deliberately keeps a conservative policy:
- If the selected backend is explicitly requested and supported, it is used.
- If FLASH_ATTN is requested but capability or package checks fail, the plugin falls back to TORCH_SDPA.
- If no backend is specified, it prefers FLASH_ATTN only when both hardware capability and package availability checks pass.
This keeps behavior close to the upstream GPU-oriented policy while avoiding aggressive assumptions on MetaX systems.
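The selection rules above can be written as one function. This is a sketch under stated assumptions: the function name and parameters are hypothetical, TORCH_SDPA is assumed to be the default when no backend is specified and the FLASH_ATTN checks fail, and the handling of an unsupported backend other than FLASH_ATTN (raising an error) is a guess, since the guide does not cover that case.

```python
from typing import Collection, Optional


def select_attention_backend(
    requested: Optional[str],
    supported: Collection[str],
    flash_capable: bool,
    flash_package_available: bool,
) -> str:
    """Pick a diffusion attention backend following the conservative policy."""
    flash_ok = flash_capable and flash_package_available
    if requested is None:
        # Assumption: TORCH_SDPA is the default when FLASH_ATTN is not usable.
        return "FLASH_ATTN" if flash_ok else "TORCH_SDPA"
    if requested == "FLASH_ATTN":
        # Explicit request, but fall back when either check fails.
        return "FLASH_ATTN" if flash_ok else "TORCH_SDPA"
    if requested in supported:
        return requested
    # Assumption: the guide does not say what happens here, so fail loudly.
    raise ValueError(f"unsupported attention backend: {requested}")


print(select_attention_backend("FLASH_ATTN", {"FLASH_ATTN", "TORCH_SDPA"}, True, False))
# TORCH_SDPA
```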
Operational tips
- Start with automatic detection first and only use VLLM_OMNI_METAX_FORCE=1 when isolating startup problems.
- Keep VLLM_OMNI_METAX_DISABLE_PATCHES=1 for A/B debugging only, not as the default deployment mode for 0.20.0.
- Treat vllm-metax as the source of truth for runtime health.
- Keep version combinations stable across vllm-metax, vllm-omni, and this repository.
- If Omni behavior changes after an upstream upgrade, re-verify platform plugin discovery before debugging model logic.