This may sound bizarre, but it is exactly what ChatGPT's client does: as a chat history grows, it cuts out the earliest messages so the conversation still fits inside the offered context window. As things stand right now, the current behavior is VERY limiting. I have run out of space to use the o3-mini model, as well as others, due to this very problem.
Furthermore, this limiter currently only appears for me once I clear my cache, so I have no idea I have hit it until I run into streaming errors or clear the cache. I have attached an image for context showing what happens when I hit the limit but still need to use o3-mini.
Appreciate all the work, Theo, and I hope this can be implemented as well! Perhaps make it a toggle if you do not want it on by default? (Note: simply allot ~8-16K tokens of headroom for responses, and make it adaptive if need be.)
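For clarity, here is a minimal sketch of the trimming behavior being requested: drop the oldest messages until the conversation, plus some reserved response headroom, fits in the model's context window. The chars/4 token estimate and the function names are purely illustrative assumptions; a real client would use the model's actual tokenizer (e.g. tiktoken).

```python
RESPONSE_HEADROOM = 8_000  # reserve ~8-16K tokens for the reply, as suggested above


def estimate_tokens(text: str) -> int:
    # Crude chars/4 heuristic for illustration only -- NOT a real tokenizer.
    return max(1, len(text) // 4)


def trim_history(messages: list[dict], context_window: int,
                 headroom: int = RESPONSE_HEADROOM) -> list[dict]:
    """Drop messages from the front until the rest fits in the token budget."""
    budget = context_window - headroom
    kept = list(messages)
    # Cut the earliest message first, ChatGPT-client style, until we fit.
    while kept and sum(estimate_tokens(m["content"]) for m in kept) > budget:
        kept.pop(0)
    return kept
```

For example, with a 1,000-token window and 500 tokens of headroom, a history of ten ~100-token messages would be trimmed down to the five most recent ones before being sent to the model.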

Gathering Interest
Feature Request