User Guide Overview

This project is intentionally small. Most user-facing behavior comes from the interaction between vllm-omni, vllm-metax, and the runtime environment.

How activation works

The plugin entry point is:

vllm_omni_metax.plugin:metax_omni_platform_plugin

At startup, it follows this logic:

If VLLM_OMNI_METAX_DISABLE is set, the plugin stays disabled.
If VLLM_OMNI_METAX_FORCE is set, the plugin activates immediately.
Otherwise, it asks vllm-metax to probe the MetaX runtime through pymxsml.
During activation, it may install MetaX-specific runtime patches needed by vllm-omni 0.20.0 model paths.
It registers the Omni platform class only after activation is decided: the runtime probe must succeed unless VLLM_OMNI_METAX_FORCE is set.

This keeps runtime ownership with the MetaX backend while allowing a narrow compatibility fix where upstream CUDA-only imports would otherwise block MetaX.
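The decision order above can be sketched as a small pure function. This is an illustration rather than the plugin's actual code: `should_activate` and `probe_runtime` are hypothetical names, and treating any non-empty value as "set" is an assumption about how the variables are parsed.

```python
from typing import Callable, Mapping


def should_activate(env: Mapping[str, str], probe_runtime: Callable[[], bool]) -> bool:
    """Mirror the documented startup order (illustrative only)."""
    # Assumption: "set" means any non-empty value.
    if env.get("VLLM_OMNI_METAX_DISABLE"):
        return False  # the disable switch is checked first
    if env.get("VLLM_OMNI_METAX_FORCE"):
        return True  # forced activation skips the probe
    # Otherwise defer to the probe (a stand-in for the vllm-metax / pymxsml check).
    return bool(probe_runtime())
```

Passing the environment and the probe as arguments keeps the sketch testable on a machine without MetaX hardware.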

Runtime patch layer

The new patch layer is intentionally narrow:

It is enabled by default.
It can be disabled with VLLM_OMNI_METAX_DISABLE_PATCHES=1.
It installs a shim for the rotary embedding import path used by Omni diffusion/image code.
Its current purpose is to support Qwen3-Omni and Qwen-Image-Edit-2511 on the 0.20.0 stack.
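As a rough illustration of what an import-path shim is, the sketch below registers a stand-in module in `sys.modules` so that a later import of that name resolves without the CUDA-only original. The module name, attribute name, and `install_import_shim` helper are all hypothetical; the real rotary embedding path and the plugin's patch code are not shown here.

```python
import importlib
import os
import sys
import types


def install_import_shim(module_name, attr_name, fallback, env=None):
    """Register a stand-in module unless patches are disabled (illustrative only)."""
    env = os.environ if env is None else env
    if env.get("VLLM_OMNI_METAX_DISABLE_PATCHES") == "1":
        return False  # patches switched off by the operator
    if module_name in sys.modules:
        return False  # never replace a module that is already loaded
    shim = types.ModuleType(module_name)
    setattr(shim, attr_name, fallback)  # expose the fallback under the expected name
    sys.modules[module_name] = shim
    return True


# Usage with a hypothetical top-level module name and a pass-through fallback.
installed = install_import_shim(
    "hypothetical_cuda_only_rotary", "apply_rotary_emb", lambda x: x, env={}
)
shimmed = importlib.import_module("hypothetical_cuda_only_rotary")
```

A shim for a dotted path would also need its parent package to be importable, which this sketch sidesteps by using a top-level name.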

Runtime behavior

MetaxOmniPlatform inherits from both:

vllm_omni.platforms.interface.OmniPlatform
vllm_metax.platform.MacaPlatform

That means the plugin reuses the concrete MetaX hardware implementation instead of creating a parallel backend.
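The shape of that inheritance can be sketched with stand-in classes. The two base classes below are local placeholders, not the real `vllm_omni` and `vllm_metax` classes, their methods are hypothetical, and the base order is assumed from the order in which the parents are listed above.

```python
class OmniPlatform:
    """Stand-in for vllm_omni.platforms.interface.OmniPlatform."""

    def describe_omni(self):  # hypothetical method
        return "omni stage hooks"


class MacaPlatform:
    """Stand-in for vllm_metax.platform.MacaPlatform."""

    def describe_hardware(self):  # hypothetical method
        return "metax hardware implementation"


class MetaxOmniPlatform(OmniPlatform, MacaPlatform):
    """Adds nothing of its own here; behavior comes from both parents."""


platform = MetaxOmniPlatform()
# The method resolution order shows which parent is consulted first.
mro_names = [cls.__name__ for cls in MetaxOmniPlatform.__mro__]
```

In this sketch, hardware calls resolve to the MetaX parent without being reimplemented, which is the point of reusing the existing backend.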

Device visibility

During Omni stage setup, the plugin keeps the following two environment variables synchronized:

CUDA_VISIBLE_DEVICES
MACA_VISIBLE_DEVICES

For operators, this means stage-level worker placement is easier to reason about, especially in multi-device debugging sessions.
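A minimal sketch of what "synchronized" means in practice, assuming both variables carry the same comma-separated device list. The helper names are hypothetical, and how the plugin itself derives the value is not covered here.

```python
def sync_visible_devices(env, device_ids):
    """Write the same device list to both variables (illustrative only)."""
    value = ",".join(str(device_id) for device_id in device_ids)
    env["CUDA_VISIBLE_DEVICES"] = value
    env["MACA_VISIBLE_DEVICES"] = value
    return value


def visibility_in_sync(env):
    """Debugging aid: report whether the two variables currently agree."""
    return env.get("CUDA_VISIBLE_DEVICES") == env.get("MACA_VISIBLE_DEVICES")


# Usage against a plain dict; pass os.environ to affect a real process.
stage_env = {"CUDA_VISIBLE_DEVICES": "0"}
before = visibility_in_sync(stage_env)
sync_visible_devices(stage_env, [2, 3])
after = visibility_in_sync(stage_env)
```

Running a check like `visibility_in_sync` inside a stage worker is a quick way to confirm placement during a multi-device debugging session.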

Attention backend policy

For diffusion attention, the plugin deliberately keeps a conservative policy:

If a backend is explicitly requested and supported, it is used.
If FLASH_ATTN is requested but capability or package checks fail, the plugin falls back to TORCH_SDPA.
If no backend is specified, it prefers FLASH_ATTN only when both hardware capability and package availability checks pass.

This keeps behavior close to the upstream GPU-oriented policy while avoiding aggressive assumptions on MetaX systems.
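The policy above can be written as a small selection function. This is a sketch under stated assumptions: the function name is hypothetical, only the two backend names from the text are treated as supported, TORCH_SDPA is assumed to be the default when no backend is specified and the checks fail, and raising on an unsupported name is an assumption as well.

```python
from typing import Optional

# Assumption: only the two backends named in the text are modeled.
SUPPORTED_BACKENDS = ("FLASH_ATTN", "TORCH_SDPA")


def select_diffusion_backend(
    requested: Optional[str], hw_capable: bool, pkg_available: bool
) -> str:
    """Apply the conservative policy described above (illustrative only)."""
    flash_ok = hw_capable and pkg_available
    if requested is None:
        # Prefer FLASH_ATTN only when both checks pass (SDPA default is assumed).
        return "FLASH_ATTN" if flash_ok else "TORCH_SDPA"
    if requested not in SUPPORTED_BACKENDS:
        # Assumption: the text does not say what happens to unsupported names.
        raise ValueError(f"unsupported attention backend: {requested}")
    if requested == "FLASH_ATTN" and not flash_ok:
        return "TORCH_SDPA"  # documented fallback
    return requested
```

For example, `select_diffusion_backend("FLASH_ATTN", True, False)` models a machine with capable hardware but no flash attention package installed, and falls back to TORCH_SDPA.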

Operational tips

Start with automatic detection, and use VLLM_OMNI_METAX_FORCE=1 only when isolating startup problems.
Reserve VLLM_OMNI_METAX_DISABLE_PATCHES=1 for A/B debugging; do not use it as the default deployment mode for 0.20.0.
Treat vllm-metax as the source of truth for runtime health.
Keep version combinations stable across vllm-metax, vllm-omni, and this repository.
If Omni behavior changes after an upstream upgrade, re-verify platform plugin discovery before debugging model logic.
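One way to re-verify plugin discovery is to scan installed distributions for the entry point value shown earlier. The sketch below uses only the standard library and makes no assumption about the entry point group name; `find_entry_point` is a hypothetical helper, not part of this project.

```python
from importlib.metadata import distributions

TARGET = "vllm_omni_metax.plugin:metax_omni_platform_plugin"


def find_entry_point(target=TARGET, dists=None):
    """Return (group, name, distribution, version) for each matching entry point."""
    matches = []
    for dist in (distributions() if dists is None else dists):
        for ep in dist.entry_points:
            if ep.value == target:
                matches.append((ep.group, ep.name, dist.metadata["Name"], dist.version))
    return matches


if __name__ == "__main__":
    # An empty result suggests the plugin package is not installed in this
    # environment, or that its entry point value has changed.
    for match in find_entry_point():
        print(match)
```

The reported version also gives a quick check that the installed combination matches the one last validated.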