
Microsoft BitNet

1-bit LLMs, an extreme yet promising form of model quantization in which weights (and potentially activations) are constrained to binary {-1, +1} or ternary {-1, 0, +1} values, offer a compelling answer to the memory, energy, and latency costs of deploying large language models.


https://github.com/microsoft/BitNet


  • Quantization: Native 1.58-bit weights and 8-bit activations (W1.58A8).
    • Weights are quantized to ternary values {-1, 0, +1} using absmean quantization during the forward pass.
    • Activations are quantized to 8-bit integers using absmax quantization (per-token).
    • Crucially, the model was trained from scratch with this quantization scheme, not post-training quantized.
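The absmean/absmax scheme described above can be sketched in NumPy. This is a simplified illustration of the quantization math only, not the official bitnet.cpp kernels; the function names and the small epsilon guard against division by zero are my own:

```python
import numpy as np

def absmean_quantize_weights(W):
    """Quantize a weight matrix to ternary {-1, 0, +1} using an
    absmean scale, as in the BitNet b1.58 (W1.58A8) forward pass."""
    gamma = np.mean(np.abs(W)) + 1e-8          # absmean scaling factor
    W_q = np.clip(np.round(W / gamma), -1, 1)  # ternary weights
    return W_q, gamma

def absmax_quantize_activations(X):
    """Quantize activations to 8-bit integers with a per-token
    (per-row) absmax scale."""
    scale = 127.0 / (np.max(np.abs(X), axis=-1, keepdims=True) + 1e-8)
    X_q = np.clip(np.round(X * scale), -127, 127).astype(np.int8)
    return X_q, scale

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8)).astype(np.float32)   # toy weight matrix
X = rng.normal(size=(2, 8)).astype(np.float32)   # toy activations

W_q, gamma = absmean_quantize_weights(W)
X_q, scale = absmax_quantize_activations(X)
print(np.unique(W_q))   # a subset of {-1, 0, 1}
```

The key point from the list above is that in BitNet this quantizer runs inside the forward pass during training from scratch, so the model learns under the ternary constraint rather than having it imposed afterwards.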

 

Testing it myself, the response speed was faster than expected and desktop CPU usage stayed fairly low.
