@dphnAI @Alibaba_Qwen Aaaaah, I might have found the issue: I think it's being partially offloaded to the CPU despite the model technically being under 24 GB
I guess I need either the 9B or a 3-bit quant of the 35B?
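A rough back-of-the-envelope sketch of why "under 24 GB" can still spill to CPU. It estimates only the weight footprint as parameters × bits per weight / 8, and assumes nominal bit widths. Real quant formats typically carry some extra bits per weight, and the KV cache, context length and runtime buffers also need VRAM on top of the weights. The 8-bit figure for the 9B is a hypothetical choice for illustration.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: nominal bits per weight; ignores quantization scales,
    KV cache, activations and runtime overhead.
    """
    return params_billion * bits_per_weight / 8


VRAM_GB = 24.0  # total GPU memory from the post

# Hypothetical configurations for illustration only.
configs = [
    ("35B @ 4-bit", 35, 4),
    ("35B @ 3-bit", 35, 3),
    ("9B @ 8-bit", 9, 8),
]

for label, params, bits in configs:
    w = weight_gb(params, bits)
    # Whatever is left must hold the KV cache and buffers; if it runs out,
    # runtimes that support it may place some layers on the CPU instead.
    print(f"{label}: ~{w:.1f} GB weights, ~{VRAM_GB - w:.1f} GB headroom")
```

Under these assumptions a 4-bit 35B already uses about 17.5 GB for weights alone, so a long context can eat the remaining headroom, while a 3-bit quant (about 13.1 GB) leaves noticeably more room.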