vitalik.eth@VitalikButerin@dphnAI @Alibaba_Qwen Aaaaah I might have found the issue, I think it's being partially offloaded to CPU despite the model technically being under 24 GB. I guess I need either the 9B or a 3-bit quant of the 35B?
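The sizing intuition behind the tweet can be sketched with back-of-envelope arithmetic: weights alone for an N-billion-parameter model at b-bit quantization take roughly N × b / 8 GB, and runtime overhead (KV cache, activations, buffers) comes on top, which is what can push a "technically under 24 GB" model into partial CPU offload. The numbers below are illustrative assumptions, not measurements from any specific runtime:

```python
# Rough VRAM estimate for quantized LLM weights.
# Overhead (KV cache, activations, runtime buffers) is NOT included
# and varies by backend and context length -- this is a sketch only.

def weight_gb(params_billion: float, bits: float) -> float:
    """Approximate weight memory in GB for a quantized model."""
    return params_billion * 1e9 * bits / 8 / 1e9

for params, bits in [(35, 4), (35, 3), (9, 4)]:
    print(f"{params}B @ {bits}-bit ~= {weight_gb(params, bits):.1f} GB weights")
```

A 35B model at 4-bit is about 17.5 GB of weights, so a few GB of KV cache and runtime buffers can crowd a 24 GB card, while a 3-bit quant (~13.1 GB) or a 9B model (~4.5 GB at 4-bit) leaves comfortable headroom.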