Xinyu Dong 董新宇
AI Infra Engineer at Baidu. Core contributor to the vLLM-Kunlun community, building high-performance LLM inference engines for Kunlun XPU.
vLLM · vLLM-Kunlun · XGrammar · LLM Inference · Kunlun XPU
OPEN SOURCE
Contributions & Work
Based on actual GitHub pull request records.
Community-maintained vLLM hardware plugin for Kunlun XPU. Supports 15+ mainstream LLMs with quantization, LoRA, and multi-modal capabilities. 390 stars on GitHub.
🧠 Model Support (7)
Merged · #233 Support qwen3-next model in v0.15.1
Merged · #84 Support MTP (Multi-Token Prediction) for DeepSeek
Merged · #62 Support XiaoMi MIMO Flash V2
Merged · #71 Support gpt-oss and update model list
Merged · #19 Support llama3 on v0.11.0
Merged · #261 Recover use of reshape_and_cache kernel to update mamba cache
Open · #195 DeepSeekV2: add indexer_rotary_emb to control MLA RoPE style
⚡ Kernel & Performance (5)
🔄 v0.15.1 Major Upgrade (6)
Merged · #209 Implement and register fused MoE Kunlun kernels using the OOT method
Merged · #227 Partially support torch.compile
Merged · #212 Remove V0 code and fix circular references
Merged · #203 Unify registration of custom operators under torch.ops
Merged · #201 Fix Kunlun plugin initialization failure caused by circular references
Merged · #202 Optimize utils, remove the VLLM_USE_V1 check, and correct dependency sources
🔧 Bugfix (6)
Merged · #288 Fix MiniMax-M2 parser failing to validate function names
Merged · #285 Fix Qwen3.5 returning an error in the content stream when enable_thinking is False
Merged · #262 Fix function calling failures when invoking xgrammar
Merged · #252 Fix cache indices problem for Qwen3.5-MoE
Merged · #231 Fix distributed environment initialization issue
Merged · #193 Fix Kunlun graph failure
CONNECT
Get In Touch
Interested in AI inference, vLLM contributions, or hardware-software co-design? Feel free to reach out.
GitHub
xyDong0223
Email
[email protected]
Documentation
vllm-kunlun.readthedocs.io
Slack
vllm-kunlun community
© 2026 Xinyu Dong · Built with React + Vite