Yx’s Blog
- Studies
- R3GAN: A Modern Baseline GAN
- DeepSeek LLM: Progress in 2024
- SGD with Large Step Sizes Learns Sparse Features
- Decoupled Weight Decay Regularization
- Improved Training and Scaling Strategies
- A ConvNet for the 2020s
- Battle of the Backbones: A Large-Scale Comparison of Pretrained Models
- Is Cosine-Similarity of Embeddings Really About Similarity?
- Circle Loss: A Unified Perspective of Pair Similarity Optimization
- Why the Transformer Succeeded
- In-Context Learning as Implicit Bayesian Inference
- Why L1 Regularization Leads to Sparsity
- Large Language Models Cannot Self-Correct Reasoning Yet
- Deep Double Descent: Where Bigger Models and More Data Hurt
- Posts