CS336: Language Modeling from Scratch (Spring 2026)

Course notes for Stanford CS336, generated from the Spring 2026 lecture videos.

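Lecture 1: Overview & Tokenization - course overview, the evolution of large models, and tokenizers
Lecture 2: PyTorch, einops, and Resource Accounting - hands-on PyTorch, the einops abstraction, and compute/memory budgets
Lecture 3: Architectures - Transformer architecture variants and design choices
Lecture 4: Attention Alternatives and Mixture-of-Experts Models - linear attention, SSMs, MoE
Lecture 5: GPUs & TPUs - GPU/TPU hardware model, Flash Attention, tiling
Lecture 6: GPU Kernel Programming - from hardware principles to hands-on Triton
Lecture 7: Distributed Parallel Training - from collective communication to multi-dimensional parallelism strategies
Lecture 8: Large-Scale Parallel Training - from DDP to 4D parallelism
Lecture 9: Scaling Laws - an engineering prediction framework for large language models
Lecture 10: LLM Inference Optimization - from arithmetic intensity to system design
Lecture 11: Scaling Laws in Practice - muP, WSD, Muon, and hyperparameter control
Lecture 12: Evaluation - perplexity, benchmarks, and LLM-as-judge
Lecture 13: Data Sources and Datasets - web crawling, copyright, and the evolution of pretraining datasets
Lecture 14: Data Processing and Data Mixing - transformation, filtering, deduplication, MinHash/LSH, and data mixture ratios
Lecture 15: Mid-training and Post-training - SFT, RLHF, DPO/PPO, and post-training data
Lecture 16: RLVR and Reasoning Models - PPO, GRPO, DeepSeek R1, and reinforcement learning with verifiable rewards
Lecture 17: Multimodal Models - CLIP, SigLIP, LLaVA, Qwen-VL, Chameleon, and VLM architectures
Guest Lecture: Dan Fu - Building an LLM Inference System from Scratch - token lifecycle, continuous batching, KV cache, prefill-decode disaggregation, megakernels, and looped Transformers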
