Unsloth's extreme compression of the 753B GLM-5.2 model enables smooth local deployment and operation on Mac.

2026-06-25 06:54:23

According to CoinWorld, Unsloth AI announced that it has used dynamic quantization to shrink GLM-5.2, Zhipu AI's 753B-parameter large model, by more than 80%, and has released a GGUF version that supports local deployment on Mac. With dynamic 1-bit and 2-bit quantization, the original 1.51 TB model is reduced to 217 GB (1-bit variant) or 239 GB (2-bit variant), allowing individual developers and small and medium-sized enterprises to deploy and run it locally and offline on a single Mac Studio. The quantized version reached a generation speed of 21.6 tokens/s on a Mac Studio M3 Ultra with 256 GB of unified memory, while retaining 76% to 82% of the original model's accuracy. The GLM-5.2 GGUF weights are currently available for download on Hugging Face, and users can load and run them directly through llama.cpp or Unsloth Studio.
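
As a rough sanity check on the figures above, the short Python sketch below recomputes the size reduction, the average bits stored per parameter, and the nominal headroom left in 256 GB of unified memory. It assumes decimal units (1 TB = 1000 GB), which the article does not state, and the function and constant names are illustrative only. The headroom figure counts weights alone and ignores the KV cache and other runtime memory, so it is an upper bound rather than a deployment guarantee.

```python
# Figures quoted in the article. Decimal units (1 TB = 1000 GB) are an assumption.
PARAMS = 753e9                 # parameter count of GLM-5.2
ORIGINAL_GB = 1510.0           # 1.51 TB original model
UNIFIED_MEMORY_GB = 256.0      # Mac Studio M3 Ultra unified memory
VARIANTS_GB = {"dynamic 1-bit": 217.0, "dynamic 2-bit": 239.0}


def reduction_pct(original_gb: float, quantized_gb: float) -> float:
    """Percentage by which the file size shrinks."""
    return (1.0 - quantized_gb / original_gb) * 100.0


def bits_per_param(size_gb: float, params: float) -> float:
    """Average number of bits stored per parameter for a given file size."""
    return size_gb * 1e9 * 8 / params


def summarize() -> dict:
    """Collect the derived numbers for each quantized variant."""
    rows = {}
    for name, size_gb in VARIANTS_GB.items():
        rows[name] = {
            "reduction_pct": reduction_pct(ORIGINAL_GB, size_gb),
            "bits_per_param": bits_per_param(size_gb, PARAMS),
            # Weights only: KV cache and runtime overhead are not counted.
            "headroom_gb": UNIFIED_MEMORY_GB - size_gb,
        }
    return rows


if __name__ == "__main__":
    print(f"original: {bits_per_param(ORIGINAL_GB, PARAMS):.2f} bits/param")
    for name, row in summarize().items():
        print(
            f"{name}: {row['reduction_pct']:.1f}% smaller, "
            f"{row['bits_per_param']:.2f} bits/param, "
            f"{row['headroom_gb']:.0f} GB headroom"
        )
```

Under these assumptions both variants shrink the model by roughly 84% to 86%, in line with the "more than 80%" claim. The original works out to about 16 bits per parameter, while the quantized files average a little over 2 bits per parameter, above their nominal 1-bit and 2-bit labels. That would be consistent with a dynamic scheme that keeps some tensors at higher precision, though the article does not give a per-layer breakdown.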

Disclaimer: This article is copyrighted by the original author and does not represent MyToken's views or positions. If you have any questions regarding content or copyright, please contact us (www.mytokencap.com).

About MyToken: https://www.mytokencap.com/en/aboutus
Article link: https://www.mytokencap.com/en/choicenews/3346360.html

