According to Qwen, these are the recommended settings for inference:
Temperature of 0.6
Top_K of 40 (or 20 to 40)
Min_P of 0.1 (optional, but works well)
Top_P of 0.95
Repetition Penalty of 1.0 (1.0 means disabled in llama.cpp and transformers)
Chat template: <|im_start|>user\nCreate a Flappy Bird game in Python.<|im_end|>\n<|im_start|>assistant\n<think>\n
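To show how these settings fit together, here is a minimal, stdlib-only Python sketch. It runs one step of next-token sampling with the values above and includes a helper that builds a prompt in the chat template shown. The sampler order (temperature, then top-k, top-p, min-p) is an assumption. llama.cpp and transformers each use their own order and implementation details, so this illustrates the idea rather than reproducing what either library does internally. The names `SamplingSettings`, `filter_candidates`, `sample`, and `build_prompt` are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class SamplingSettings:
    # Values recommended above; names are hypothetical, not a library API.
    temperature: float = 0.6
    top_k: int = 40
    min_p: float = 0.1
    top_p: float = 0.95
    repetition_penalty: float = 1.0  # 1.0 means disabled, so it is skipped below


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_candidates(logits, s):
    """Return {token_id: probability} after temperature, top-k, top-p, min-p."""
    # Temperature (must be > 0): values below 1.0 sharpen the distribution.
    probs = softmax([x / s.temperature for x in logits])
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens, then renormalize.
    ranked = ranked[: s.top_k]
    total_k = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / total_k
        if cumulative >= s.top_p:
            break

    # Min-p: drop tokens less likely than min_p times the most likely token.
    threshold = s.min_p * probs[kept[0]]
    kept = [i for i in kept if probs[i] >= threshold]

    # Renormalize the surviving candidates so they sum to 1.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample(logits, s, rng):
    # Draw one token id from the filtered, renormalized candidates.
    candidates = filter_candidates(logits, s)
    ids = list(candidates)
    return rng.choices(ids, weights=[candidates[i] for i in ids], k=1)[0]


def build_prompt(user_message):
    # Chat template from the recommendation, ending with an opened <think> block.
    return (
        "<|im_start|>user\n" + user_message + "<|im_end|>\n"
        "<|im_start|>assistant\n<think>\n"
    )


if __name__ == "__main__":
    settings = SamplingSettings()
    logits = [2.0, 1.0, 0.5, -1.0, -3.0]
    print(filter_candidates(logits, settings))
    print(sample(logits, settings, random.Random(0)))
    print(build_prompt("Create a Flappy Bird game in Python."))
```

With these toy logits, top-p and min-p together leave only the two most likely tokens. In practice you would pass the same values to your inference backend's own sampling options rather than sampling by hand.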
https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-effectively
In my local testing, this model performs a lot better with these settings than with the current defaults, so it would be great to have them as the defaults. Thank you for the wonderful product!
Completed
Bug Reports