Xinyu Dong 董新宇
AI Infra Engineer at Baidu. Core contributor to the vLLM-Kunlun community, building high-performance LLM inference engines for Kunlun XPU.
vLLM · vLLM-Kunlun · XGrammar · LLM Inference · Kunlun XPU
OPEN SOURCE
Contributions & Work
Based on actual GitHub pull request records.
Community-maintained vLLM hardware plugin for Kunlun XPU. Supports 15+ mainstream LLMs with quantization, LoRA, and multi-modal capabilities. 390 stars on GitHub.
🧠 Model Support (7)
Merged · #233 Support qwen3-next model in v0.15.1
Merged · #84 Support MTP (Multi-Token Prediction) for DeepSeek
Merged · #62 Support XiaoMi MIMO Flash V2
Merged · #71 Support gpt-oss and update model list
Merged · #19 Support llama3 on v0.11.0
Merged · #261 Recover use of reshape_and_cache kernel to update mamba cache
Open · #195 DeepSeekV2: add indexer_rotary_emb to control MLA RoPE style
⚡ Kernel & Performance (5)
🔄 v0.15.1 Major Upgrade (6)
Merged · #209 Implement and register fused MoE Kunlun kernels using the OOT method
Merged · #227 Partially support torch.compile
Merged · #212 Remove V0 code and fix circular references
Merged · #203 Unify registration of custom operators under torch.ops
Merged · #201 Fix Kunlun plugin initialization failure caused by circular references
Merged · #202 Optimize utils, remove the VLLM_USE_V1 check, and correct dependency sources
🔧 Bugfix (6)
Merged · #288 Fix MiniMax-M2 parser failing to validate function names
Merged · #285 Fix Qwen3.5 returning an error in the content stream when enable_thinking is False
Merged · #262 Fix function calling failures when invoking xgrammar
Merged · #252 Fix cache indices problem for Qwen3.5-MoE
Merged · #231 Fix distributed environment initialization issue
Merged · #193 Fix Kunlun graph failure
CONNECT
Get In Touch
Interested in AI inference, vLLM contributions, or hardware-software co-design? Feel free to reach out.
GitHub
xyDong0223
Email
[email protected]
Documentation
vllm-kunlun.readthedocs.io
Slack
vllm-kunlun community
© 2026 Xinyu Dong · Built with React + Vite