Use the recommended settings for QwQ 32B to avoid endless thinking loops

According to Qwen, these are the recommended settings for inference:

  • Temperature of 0.6

  • Top_K of 40 (or 20 to 40)

  • Min_P of 0.1 (optional, but works well)

  • Top_P of 0.95

  • Repetition Penalty of 1.0 (1.0 means disabled in llama.cpp and transformers)

  • Chat template: <|im_start|>user\nCreate a Flappy Bird game in Python.<|im_end|>\n<|im_start|>assistant\n<think>\n

https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-effectively
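
The settings above can be sketched in code. This is a minimal illustration (not taken from the linked docs): the flag names match llama.cpp's CLI samplers (`--temp`, `--top-k`, `--top-p`, `--min-p`, `--repeat-penalty`), while the helper names and the way they are combined are my own.

```python
# Recommended QwQ-32B sampling settings, keyed by llama.cpp flag name.
QWQ_SETTINGS = {
    "temp": 0.6,
    "top-k": 40,
    "top-p": 0.95,
    "min-p": 0.1,
    "repeat-penalty": 1.0,  # 1.0 disables the penalty in llama.cpp
}

def llama_cli_args(settings: dict) -> list[str]:
    """Flatten the settings dict into llama.cpp command-line flags."""
    args: list[str] = []
    for name, value in settings.items():
        args += [f"--{name}", str(value)]
    return args

def build_prompt(user_msg: str) -> str:
    """Wrap a user message in the ChatML template from the bullet list,
    ending with the <think> opener so the model starts its reasoning block."""
    return (
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        "<|im_start|>assistant\n<think>\n"
    )

print(" ".join(llama_cli_args(QWQ_SETTINGS)))
print(build_prompt("Create a Flappy Bird game in Python."))
```

The first print produces something like `--temp 0.6 --top-k 40 ...`, which can be appended to a `llama-cli` or `llama-server` invocation.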
This model would behave a lot better if these were the default settings, compared to what I have tested locally. Thank you for the wonderful product!

Status: Completed
Board: 🐛 Bug Reports