Optimize JSON property order for prompt caching #316
Comments
Thanks for the report! Can you provide:
- Java example code
- The actual JSON
- The desired JSON

I believe everything relevant is included here; if something is missing, I'm willing to provide more information. For this particular case, I would want to just move the `text` property so that the static schema precedes the dynamic `input`.
I see, so we cannot statically pick a nice order of properties. Putting that aside... I wonder if the docs are really saying that the JSON order matters. They might just be talking about the order of content within ordered structures in the JSON (like strings and arrays). It would be pretty weird to me if the order of JSON keys mattered, since JSON objects are generally supposed to be unordered. @kwhinnery-openai do you know whether this prompt caching behavior refers to the content order or the actual JSON key order?
I agree that this may look confusing, given that the JSON specification states that key order does not matter. But I have experimented with this myself since the very launch of Prompt Caching, using my own OpenAI client where I could control the order of keys, and I saw a significant difference in the number of cache hits when supplying request content in the "desired" order above. (Note: those tests involved a payload order similar to my examples, but used the Chat Completions API and function calling, NOT the Responses API and Structured Outputs.)

If the OpenAI API truly does take the order of request content into account, maximizing cache hits is problematic for anyone not using a custom client. And, of course, the actual "stuff" in those parts of the request content that can get cached (prompts, schemas, function calls, etc.) is affected by the caching mechanism in such a way that you get more hits when static content is placed first. But the question here is on a slightly higher level.

As for the docs - I would say they are simply ambiguous, covering only what needs to be understood for simple cases. More specifically, they use the word "prompt", and it is extremely unclear what exactly it refers to. It is disclosed that multiple things can be cached, and all of them would be considered "prompt". My personal assumption is that everything in the request payload that in one way or another gets fed to the models is a prompt, which is validated by my tests.

TL;DR: Yes, JSON key order in the request body does appear to affect prompt cache hits.
Thanks for the context! I chatted with @kwhinnery-openai and he's going to look into what the behavior is and get back to you/us.
After a few internal discussions, I was able to confirm that the ordering of content in the JSON request body does impact prompt caching hits, and that repeated content (not just in the prompt messages) can be cached.

From an SDK standpoint, we should try to optimize for this, or at minimum provide developers granular control over the ordering of content when they build request payloads. We will need to spend time thinking through how to do this and document a solution.
@kwhinnery-openai amazing to hear this from you. Looking forward to seeing something implemented that helps with this. I also want to mention that there may be a similar problem in the other official OpenAI SDKs (Python, JS, etc.).
Hello! Are there any updates on this?
I don't think we've decided when/how to prioritize this yet |
Original issue

Context/motivation: the ability to reduce costs while using the APIs via this SDK.
In order to increase the chances of prompts hitting the cache, OpenAI suggests structuring prompts so that static content (such as instructions and examples) is placed at the beginning and variable, user-specific content at the end.
As far as I understand, and based on experimentation and monitoring, caching works over the entirety of the content passed to the LLM, and the structure of this content is inherited from the structure of the request body JSON.
While debugging the SDK, I found that there is no straightforward way to control the structure of Responses API requests.
Example:
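(The original example JSON did not survive extraction; the block below is an illustrative reconstruction, assuming a Responses API request with Structured Outputs. Field names follow the public API, but the exact order the SDK emits and all values are assumptions; the point is that the dynamic `input` is serialized before the static parts.)

```json
{
  "input": [
    { "role": "user", "content": "<dynamic, user-specific message>" }
  ],
  "model": "gpt-4o",
  "instructions": "<long static system prompt>",
  "text": {
    "format": {
      "type": "json_schema",
      "name": "example_schema",
      "schema": { "type": "object", "properties": {} }
    }
  }
}
```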
Alternative structure to maximize the probability of cache hits:
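(Again an illustrative reconstruction: the same payload with the static parts - `model`, `instructions`, the `text` schema - first and the dynamic `input` last, so the static prefix stays byte-identical across requests.)

```json
{
  "model": "gpt-4o",
  "instructions": "<long static system prompt>",
  "text": {
    "format": {
      "type": "json_schema",
      "name": "example_schema",
      "schema": { "type": "object", "properties": {} }
    }
  },
  "input": [
    { "role": "user", "content": "<dynamic, user-specific message>" }
  ]
}
```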
I have tried to call the `.text()` function on `ResponseCreateParams` after everything else, but it has no effect on the resulting request body. Can there be a workaround, or even dedicated functionality, for this?
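In the meantime, one possible client-side workaround - a minimal sketch, assuming you can intercept the serialized request body before it leaves the process (for example via an OkHttp interceptor around the SDK's HTTP client). `PayloadReorder` and `moveKeysLast` are hypothetical names, not SDK API; the trick relies on Jackson's `ObjectNode` preserving insertion order:

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

// Hypothetical helper, not part of the SDK: re-serializes a JSON request
// body so that the given (dynamic) keys come last, leaving the relative
// order of all other keys untouched.
public final class PayloadReorder {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static String moveKeysLast(String json, String... dynamicKeys) throws JsonProcessingException {
        ObjectNode root = (ObjectNode) MAPPER.readTree(json);
        for (String key : dynamicKeys) {
            // ObjectNode is insertion-ordered, so remove-then-set pushes
            // the field to the end of the serialized object.
            JsonNode value = root.remove(key);
            if (value != null) {
                root.set(key, value);
            }
        }
        return MAPPER.writeValueAsString(root);
    }
}
```

For example, `PayloadReorder.moveKeysLast(rawBody, "input")` would keep the static `model`/`instructions`/`text` fields in place and move the dynamic `input` to the end, matching the alternative structure above.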