Currently, models like Grok 4.1 and GPT 5.1 are smarter in their respective app offerings (i.e. grok.com and ChatGPT) than in their APIs. This is because the API offerings do not provide tool-calling features during the "Thinking" phase by default; you must set them up manually in the API.
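To make "set them up manually" concrete, here is a rough sketch of what opting in might look like with something shaped like the OpenAI Responses API. This is an illustration of the point, not an official snippet: the model id, tool type string, and `reasoning` field are assumptions, and I'm only building the request payload, not sending it.

```python
# Sketch: search is something you must opt into in the API payload;
# without the `tools` entry the model reasons with no search access.
# Model id and tool type below are illustrative assumptions.

def build_request(prompt: str, enable_search: bool) -> dict:
    """Build a Responses-API-style payload; search is opt-in, not default."""
    payload = {
        "model": "gpt-5.1",            # assumed model id
        "input": prompt,
        "reasoning": {"effort": "high"},
    }
    if enable_search:
        # Opt-in tool the model may invoke while it is still thinking.
        payload["tools"] = [{"type": "web_search"}]
    return payload

# The default payload carries no tools, which is exactly the gap above.
default_request = build_request("hard question", enable_search=False)
tooled_request = build_request("hard question", enable_search=True)
```

The point of the sketch: the app products ship with this wiring done for you; the raw API makes it your job.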
Note: even if you enable web search in t3 chat or in OpenRouter, these searches are done outside the thinking phase. I believe that t3 chat for GPT 5 does this (at least as of early Oct 2025; not sure if it has changed):
<thinking>
<web search>
<thinking>
OUTPUT
This is inferior to what ChatGPT does, which is to search whenever it feels the need to, react to its own search results, call search or code execution based on previous results, and do all of this as many times and whenever it wants.
Whereas t3 chat's approach has just one phase where search can be done, and one extra phase to think based on the results (no more searches after this).
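The difference in control flow above can be shown with a toy sketch. The function and variable names here are illustrative, not any vendor's actual internals: the point is just that one flow has a hard-coded single search phase, while the other interleaves search with thinking until the model itself decides it's done.

```python
# Toy contrast of the two control flows, with search/think passed in
# as stub functions. All names are illustrative assumptions.

def fixed_pipeline(prompt, search, think):
    """t3-chat-style: think, one search phase, one more think, done."""
    notes = think(prompt)
    results = search(notes)
    return think(results)

def agentic_loop(prompt, search, think, needs_search, max_rounds=10):
    """ChatGPT-style: keep interleaving search with thinking until the
    model decides it has enough (or a round limit is hit)."""
    state = think(prompt)
    rounds = 0
    while needs_search(state) and rounds < max_rounds:
        state = think(search(state))
        rounds += 1
    return state
```

With `fixed_pipeline`, `search` runs exactly once no matter what the intermediate thinking finds; with `agentic_loop`, the number of searches is driven by the model's own judgment, which is the behavior I'm asking for.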
I think that out of all the model aggregators, only Perplexity does tool calls during thinking (both search and code execution).
For users like me, who are willing to pay for our occasional top-10% hard prompts, I don't want to compromise on the intelligence of the model.
I would happily spend $8 multiple times a month if you charge me based on usage for GPT-5.1 Pro, if and only if I know that I'm getting the same quality (or nearly the same) as the ChatGPT offering.
Closed
Feature Request
Get notified by email when there are changes.