1-bit LLMs, an extreme yet promising form of model quantization in which weights (and potentially activations) are constrained to binary {-1, +1} or ternary {-1, 0, +1} values, offer a compelling answer to these efficiency challenges.
https://github.com/microsoft/BitNet
- Quantization: Native 1.58-bit weights and 8-bit activations (W1.58A8).
- Weights are quantized to ternary values {-1, 0, +1} using absmean quantization during the forward pass.
- Activations are quantized to 8-bit integers using absmax quantization (per-token).
- Crucially, the model was trained from scratch with this quantization scheme, not post-training quantized.
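The two quantizers above can be sketched in a few lines. This is a minimal illustration of the idea (absmean for ternary weights, per-token absmax for int8 activations), not BitNet's actual implementation, which applies these inside the forward pass with straight-through gradient estimation during training:

```python
def absmean_ternary(weights):
    """Quantize a weight matrix to {-1, 0, +1} using absmean scaling."""
    flat = [abs(v) for row in weights for v in row]
    # scale = mean absolute value of all weights (the "absmean")
    scale = sum(flat) / len(flat) or 1e-8
    # divide by the scale, round, then clip into the ternary set
    q = [[max(-1, min(1, round(v / scale))) for v in row] for row in weights]
    return q, scale

def absmax_int8(token):
    """Quantize one activation vector (per-token) to int8 using absmax scaling."""
    # scale maps the largest-magnitude entry to the int8 boundary
    scale = 127.0 / (max(abs(v) for v in token) or 1e-8)
    q = [max(-128, min(127, round(v * scale))) for v in token]
    return q, scale
```

With ternary weights, the matrix multiply in each linear layer reduces to additions, subtractions, and skips (for zeros), which is why CPU inference can be so cheap.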
Testing it myself, the response speed was faster than expected, and desktop CPU usage stayed low.