
Unsloth's extreme compression of the 753B-parameter GLM-5.2 model enables smooth local deployment and operation on Mac

2026-06-25 06:54:23
According to CoinWorld, Unsloth AI announced that it has used dynamic quantization to shrink GLM-5.2, Zhipu AI's 753B-parameter large model, by more than 80%, and has released a GGUF version that can be deployed locally on a Mac. With dynamic 1-bit and 2-bit quantization, the original 1.51 TB model shrinks to 217 GB (1-bit variant) or 239 GB (2-bit variant), so individual developers and small and medium-sized enterprises can run it locally and offline on a single Mac Studio. On a Mac Studio M3 Ultra with 256 GB of unified memory, the quantized version ran at 21.6 tokens/s while retaining 76% to 82% of the original model's accuracy. The GLM-5.2 GGUF weights are available for download on Hugging Face, and users can load and run them directly with llama.cpp or Unsloth Studio.
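The size figures quoted above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes), which the article does not specify, and the function names are made up for this example. It computes the size reduction for each variant, the average number of bits stored per parameter, and the nominal memory left over on a 256 GB machine.

```python
# Figures quoted in the article. Decimal units (1 GB = 1e9 bytes) are an
# assumption; the article does not say whether GB or GiB is meant.
PARAMS = 753e9                      # 753B parameters
ORIGINAL_BYTES = 1.51e12            # 1.51 TB original model
VARIANTS = {
    "1-bit dynamic": 217e9,         # 217 GB
    "2-bit dynamic": 239e9,         # 239 GB
}
MAC_MEMORY_BYTES = 256e9            # Mac Studio M3 Ultra unified memory


def size_reduction(original_bytes: float, quantized_bytes: float) -> float:
    """Fraction of the original size removed by quantization."""
    return 1.0 - quantized_bytes / original_bytes


def bits_per_weight(size_bytes: float, n_params: float) -> float:
    """Average number of bits stored per parameter for a given file size."""
    return size_bytes * 8 / n_params


def summarize() -> dict:
    """Per-variant reduction (%), average bits per weight, and memory headroom (GB)."""
    rows = {}
    for name, size in VARIANTS.items():
        rows[name] = {
            "reduction_pct": round(100 * size_reduction(ORIGINAL_BYTES, size), 1),
            "bits_per_weight": round(bits_per_weight(size, PARAMS), 2),
            "headroom_gb": round((MAC_MEMORY_BYTES - size) / 1e9),
        }
    return rows


if __name__ == "__main__":
    print("original bits/weight:", round(bits_per_weight(ORIGINAL_BYTES, PARAMS), 2))
    for name, row in summarize().items():
        print(name, row)
```

Under these assumptions the original model works out to about 16 bits per parameter, and the two variants to roughly 2.3 and 2.5 bits per parameter on average (reductions of about 85.6% and 84.2%). An average above the nominal 1 or 2 bits would be consistent with a mixed-precision scheme in which some tensors stay at higher precision, though the article does not give that breakdown. The headroom figure is nominal only: it ignores the KV cache, the operating system, and other runtime memory.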
Disclaimer: This article is copyrighted by the original author and does not represent MyToken's views or positions. If you have any questions regarding content or copyright, please contact us (www.mytokencap.com).