Chaoyu Wang

王超宇

AI Researcher · LLM Alignment × Systems

About

Hi, I'm Chaoyu 👋, founder of MAL.LAB and an independent ML researcher. I am interested in building AI systems that are more capable, more reliable, and genuinely beneficial to humanity.

I received an M.S. from Northwestern University and a B.S. from UC San Diego, both in Applied Mathematics, and I am deeply grateful to both places. I was fortunate to be mentored by Prof. Zhaoran Wang at Northwestern and Prof. Ioana Dumitriu at UCSD, both of whom shaped how I think about research. My previous work spans LLM fine-tuning and alignment, retrieval-augmented generation, and synthetic data construction.

You can find more about my background in my CV.

🔬 Interests: Agentic Reinforcement Learning, Long-Horizon Decision-Making & Credit Assignment, Self-Evolving Agents & Environments, Scalable Agent Data Synthesis, Efficient Training & Inference Systems.

I am applying to CS PhD programs for Fall 2027 and am actively looking for long-term Research Assistant positions in the meantime.

More than a position, I am looking for the right fit — a lab that takes its time with ideas, a collaborator who wants to build something meaningful over the long run, or a mentor genuinely invested in helping someone learn to think independently. I believe this kind of match has to go both ways.

I am available to work fully onsite for six months or more, and I take that commitment seriously — good research takes time, and I am not looking to pass through.

If any of this resonates, I would love to talk: email · calendly

Latest News

2026.2

Launchpad S1 successfully held in Shanghai

🚀 Co-organized Launchpad S1 with MAL.LAB — a product launch and go-to-market event in Minhang, Shanghai. Brought together founders and builders, sponsored by DeepTech. From idea to execution in two months.

2025.6

Graduated from Northwestern University

🎓 Completed M.S. in Engineering Science & Applied Mathematics at Northwestern University.

2024.6

Graduated from UC San Diego

🎓 Completed B.S. in Applied Mathematics at UC San Diego.

Selected Projects

Check out my latest work

Syncopate_Async_AgenticRL

Jun 2026 – Present

Head-to-head study of synchronous vs fully-async agentic RL training on verl, featuring multi-turn tool-calling GRPO, long-tail rollout profiling, and staleness/partial-rollout ablations. Quantifies when asynchronous training pays off — and at what scale the crossover arrives.

GitHub

Darkroom_VeRL-Omni

Jun 2026 – Present

Diffusion RL post-training on verl: Flow-GRPO on Qwen-Image where rollouts are denoising trajectories, evidence-based upstream reconnaissance, and a CPU OCR reward replacing the VLM judge with 6.9× headroom — the full generate-then-grade loop validated on one RTX 5090.

GitHub

Switchboard_MoE-DeepEP

May – Jun 2026

Hand-built MoE expert parallelism: routing-skew capture from DeepSeek-V2-Lite, fused grouped-GEMM Triton kernels (1.28×), 2-all2all dispatch/combine, and overlap ablations on 2×H100 — proving by counter-example why DeepEP exists.

GitHub

Halftone_QuantizedKVCache

May 2026

Data-driven INT8 KV-cache quantization: SQNR-guided granularity, near-lossless PPL (+0.2%), 0.50× memory, and a fused Triton int8 decode-attention kernel at 1.56× — with the gap to the 2× ceiling fully explained.

GitHub
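For readers unfamiliar with the idea behind SQNR-guided granularity, here is a minimal sketch. It is not taken from the project repository. The toy cache shape, the outlier pattern, the group sizes, and every function name are assumptions made for illustration. It quantizes a synthetic key cache to symmetric INT8 at several group sizes and reports the signal-to-quantization-noise ratio (SQNR) of each, which is the kind of measurement that could guide a granularity choice.

```python
import math
import random


def quantize_int8(values):
    """Symmetric INT8 quantization of one group: returns (integer codes, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def roundtrip(matrix, group_size):
    """Quantize each row in groups of `group_size` channels, then dequantize."""
    out = []
    for row in matrix:
        rec = []
        for start in range(0, len(row), group_size):
            codes, scale = quantize_int8(row[start:start + group_size])
            rec.extend(c * scale for c in codes)
        out.append(rec)
    return out


def sqnr_db(original, reconstructed):
    """Signal-to-quantization-noise ratio in decibels over two matrices."""
    signal = sum(v * v for row in original for v in row)
    noise = sum(
        (v - r) ** 2
        for row, rec in zip(original, reconstructed)
        for v, r in zip(row, rec)
    )
    return math.inf if noise == 0 else 10.0 * math.log10(signal / noise)


# Hypothetical toy "key cache": 16 tokens x 64 channels, where every 16th
# channel is an outlier with 10x larger magnitude (an assumed pattern).
random.seed(0)
channel_scale = [10.0 if c % 16 == 0 else 1.0 for c in range(64)]
cache = [
    [random.gauss(0.0, 1.0) * channel_scale[c] for c in range(64)]
    for _ in range(16)
]

# SQNR per candidate group size; finer groups isolate outliers better.
results = {g: sqnr_db(cache, roundtrip(cache, g)) for g in (64, 16, 8)}
for group_size, value in results.items():
    print(f"group_size={group_size:3d}  SQNR={value:.1f} dB")
```

In this toy setup finer groups raise SQNR because fewer ordinary channels share a scale with an outlier, at the cost of storing more scale factors. A real pipeline would measure this on captured activations rather than synthetic data.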

Laminar_GPU-Bubble-Lab

Jun 2026

Profile-first CUDA microbenchmark lab for killing GPU pipeline bubbles: multi-stream overlap (1.69×), CUDA Graphs vs torch.compile, and speculative-decoding overlap (1.44×) — every mechanism Nsight-verified on sm_120.

GitHub

AdCampaignAgent-SFT

Jan 2026 – Present

Rule-based synthetic SFT dataset for mobile game UA agents, featuring tool-calling, multi-turn reasoning chains, and ROAS/Retention safety baselines. Fine-tuned Qwen3-0.6B with LoRA, achieving 86%+ end-to-end task completion.

GitHub

Hugging Face

Research

Publications

View the full list of my publications on Google Scholar

Dissecting Conditional Injection in Diffusion Transformers for Compositional 3D Scene Generation

In Preparation · Targeting ICLR 2027

A mechanistic study of how diffusion transformers consume auxiliary conditions in compositional 3D scene generation.

Authors: Ching-Yuen Huang, Chaoyu Wang

3D Gen

DiT

arXiv

Skills

Python

PyTorch

JavaScript

SQL

SFT

LoRA / PEFT

GRPO

RAG

Contrastive Learning

Dense Retrieval

Reranking

Synthetic Data Construction

FastAPI

Next.js

React

PostgreSQL

Milvus

Elasticsearch

Docker

Education

Northwestern University

2024 - 2025

M.S. in Engineering Science & Applied Mathematics

UC San Diego

2022 - 2024

B.S. in Applied Mathematics

Work Experience

SGLang

vLLM

MAL.LAB

Contact

Get in Touch

Want to chat? Feel free to reach out via email or Zoom →

• Ask questions
• Explore collaboration opportunities