MoonshotAI has released Kimi K2 Thinking, the first reasoning variant of the Kimi K2 model family. The model has 1 trillion total parameters and 32 billion active parameters, uses INT4 precision, and weighs in at approximately 594GB.
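The headline numbers are roughly self-consistent, which a back-of-envelope calculation shows. The sketch below assumes (this is not stated in the article) that essentially all weights are stored at 4 bits each; scale factors and any higher-precision layers would account for the remainder up to ~594GB.

```python
# Back-of-envelope check of the reported ~594GB checkpoint size.
# Assumption (ours, not the article's): all 1T parameters stored at 4 bits.
total_params = 1e12          # 1 trillion total parameters
bits_per_weight = 4          # INT4
raw_gb = total_params * bits_per_weight / 8 / 1e9   # bytes -> GB
print(f"raw INT4 weights: {raw_gb:.0f} GB")         # 500 GB

# Quantization scale factors, embeddings, and any layers kept at higher
# precision plausibly make up the gap to the reported ~594 GB.
```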
In Artificial Analysis's Tau²-Bench Telecom agent tool-use benchmark, Kimi K2 Thinking scored 93%, surpassing models such as Claude 3.5 Sonnet and GPT-4o. The test simulates customer-service scenarios and evaluates a model's ability to use tools across long-horizon agent tasks.
Unlike the previously released K2 Instruct model, which used FP8 precision, K2 Thinking natively employs INT4 precision; Moonshot applied quantization-aware training (QAT) during the post-training phase. This choice cuts the model's size roughly in half and improves both inference and training efficiency. Some analysts note that INT4 is better suited for efficiency gains on NVIDIA GPUs predating Blackwell, which support INT4 but lack native FP4 arithmetic.
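To make the precision trade-off concrete, here is a minimal sketch of symmetric round-to-nearest INT4 quantization. This is a generic illustration of the technique, not Moonshot's actual QAT recipe, which has not been published:

```python
import numpy as np

# Symmetric per-tensor INT4 quantization (illustrative only; Moonshot's
# actual quantization-aware training scheme is not public).
def quantize_int4(w: np.ndarray):
    scale = np.abs(w).max() / 7.0          # map values into [-7, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)

# Each weight now needs 4 bits instead of FP8's 8 bits, halving storage.
# The rounding error (at most scale/2 per weight) is what QAT trains the
# model to tolerate, rather than quantizing only after training.
print(np.abs(w - w_hat).max())
```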
However, a single benchmark cannot fully capture the model's real performance. Some users report that K2 Thinking performed poorly on self-constructed Markov-chain probability problems that Grok-4 and GPT-series models solved successfully, a reminder of the gap between benchmark scores and real-world problem solving.
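For readers unfamiliar with the task type, here is a toy example of the kind of Markov-chain probability question involved. The users' actual test problems are not public; this two-state chain and its numbers are our own illustration:

```python
import numpy as np

# Toy Markov-chain probability question (hypothetical example):
# given transition matrix P, what is the probability of being in
# state 1 after n steps, starting from state 0?
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
n = 5
start = np.array([1.0, 0.0])

# Distribution after n steps: start @ P^n.
dist = start @ np.linalg.matrix_power(P, n)
print(dist[1])  # 0.19375, approaching the stationary value 0.2

# Sanity check: the stationary distribution pi solves pi @ P = pi;
# for this chain pi = (0.8, 0.2).
```

Verifying such answers mechanically is trivial, which is what makes this a sharp probe of a model's symbolic reasoning rather than its tool use.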
Another noteworthy detail is that despite the massive model parameter scale, it…
