
Optimize JSON property order for prompt caching #316


Open

IvanLuchkin opened this issue Mar 16, 2025 · 9 comments

@IvanLuchkin commented Mar 16, 2025

Context/motivation: the ability to reduce costs when using the APIs via this SDK.


In order to increase the chances of prompts hitting the cache, OpenAI suggests the following:

[Screenshot of the prompt caching docs: OpenAI recommends placing static or repeated content at the beginning of the prompt and dynamic content at the end.]

As far as I understand, and based on experimentation and monitoring, caching applies to the entire content passed to the LLM, and the structure of that content is inherited from the structure of the request body JSON.

While debugging the SDK, I found that there is no straightforward way to control the structure of Responses API requests.

Example:

  • Developer message (static)
  • User message (static)
  • User message (dynamic)
  • Structured outputs schema (static)

Alternative structure to maximize the probability of cache hits:

  • Structured outputs schema (static)
  • Developer message (static)
  • User message (static)
  • User message (dynamic)

I have tried calling the .text() builder method on ResponseCreateParams after everything else, but it has no effect on the property order of the resulting request body.

Is there a workaround, or could functionality for this be added?
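
(In the meantime, a minimal sketch of one possible workaround, assuming you are willing to bypass the SDK's serialization and build the request body yourself: Jackson's ObjectNode preserves insertion order, so you can emit the top-level keys in whatever order you like and POST the payload directly. The endpoint and field names follow the public Responses API; buildTextFormat and buildInput are placeholder helpers standing in for the JSON shown below.)

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

public class OrderedRequestWorkaround {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // Jackson's ObjectNode preserves insertion order, so setting
        // "text" first puts the static schema at the start of the
        // serialized payload, ahead of the dynamic "input" messages.
        ObjectNode body = mapper.createObjectNode();
        body.set("text", buildTextFormat(mapper)); // static
        body.set("input", buildInput(mapper));     // static + dynamic
        body.put("model", "gpt-4o-mini");
        body.put("store", true);

        Request request = new Request.Builder()
                .url("https://api.openai.com/v1/responses")
                .header("Authorization", "Bearer " + System.getenv("OPENAI_API_KEY"))
                .post(RequestBody.create(
                        MediaType.parse("application/json"),
                        mapper.writeValueAsString(body)))
                .build();

        try (Response response = new OkHttpClient().newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }

    // Placeholder helpers: fill in the "text.format" object and the
    // "input" messages array from the JSON examples in this thread.
    private static ObjectNode buildTextFormat(ObjectMapper m) {
        return m.createObjectNode();
    }

    private static ArrayNode buildInput(ObjectMapper m) {
        return m.createArrayNode();
    }
}
```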

@TomerAberbach (Collaborator)

Thanks for the report! Can you provide:

  1. Java example code
  2. The resulting JSON that's sent from the example code
  3. The JSON you would expect instead

@IvanLuchkin (Author)

Java example code. I believe everything relevant is included here; if something is missing, I'm happy to provide more information.

```java
// structuredOutputSchemaConfig, listOfPrompts, and apiKey are defined elsewhere.
ResponseCreateParams request = ResponseCreateParams.builder()
        .text(structuredOutputSchemaConfig) // structured outputs schema (static)
        .inputOfResponse(listOfPrompts)     // developer + user messages
        .model(ChatModel.GPT_4O_MINI)
        .store(true)
        .build();

Response response = OpenAIOkHttpClient.builder()
        .apiKey(apiKey)
        .build()
        .responses()
        .create(request);
```

Actual JSON

```json
{
  "input": [
    {
      "content": [
        {
          "text": "Developer prompt text. Static content",
          "type": "input_text"
        }
      ],
      "role": "developer"
    },
    {
      "content": [
        {
          "text": "Phone call transcription. Dynamic content",
          "type": "input_text"
        }
      ],
      "role": "user"
    }
  ],
  "model": "gpt-4o-mini",
  "store": true,
  "text": {
    "format": {
      "schema": {
        "required": [
          "key1",
          "key2",
          "key3"
        ],
        "additionalProperties": false,
        "properties": {
          "key1": {
            "type": "string",
            "description": "some prompt"
          },
          "key2": {
            "type": "array",
            "description": "some prompt",
            "items": {
              "type": "string"
            }
          },
          "key3": {
            "type": "string",
            "description": "some prompt"
          }
        },
        "type": "object"
      },
      "type": "json_schema",
      "name": "someStaticSchema",
      "strict": true
    }
  }
}
```

Desired JSON

```json
{
  "text": {
    "format": {
      "schema": {
        "required": [
          "key1",
          "key2",
          "key3"
        ],
        "additionalProperties": false,
        "properties": {
          "key1": {
            "type": "string",
            "description": "some prompt"
          },
          "key2": {
            "type": "array",
            "description": "some prompt",
            "items": {
              "type": "string"
            }
          },
          "key3": {
            "type": "string",
            "description": "some prompt"
          }
        },
        "type": "object"
      },
      "type": "json_schema",
      "name": "someStaticSchema",
      "strict": true
    }
  },
  "input": [
    {
      "content": [
        {
          "text": "Developer prompt text. Static content",
          "type": "input_text"
        }
      ],
      "role": "developer"
    },
    {
      "content": [
        {
          "text": "Phone call transcription. Dynamic content",
          "type": "input_text"
        }
      ],
      "role": "user"
    }
  ],
  "model": "gpt-4o-mini",
  "store": true,
}

For this particular case, I would simply move the text content to the start of the payload. But which content is dynamic and which is static will differ across use cases and applications.

@TomerAberbach (Collaborator)

I see, so we cannot statically pick a nice order of properties. Putting that aside...

I wonder whether the docs are really saying that JSON key order matters. They might just be talking about the order of content within the ordered parts of the JSON (strings and arrays). It would be surprising to me if the order of JSON keys mattered, since JSON objects are generally supposed to be unordered.

@kwhinnery-openai do you know whether this prompt caching guidance refers to the content order or the actual JSON key order?
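
(Illustrative aside: the two readings differ at the byte level. Two JSON objects can be equal as objects yet serialize to different byte sequences, which is exactly the distinction that matters if the cache matches on a prefix of the serialized request. A minimal Jackson sketch:)

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyOrderDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        Map<String, Object> inputFirst = new LinkedHashMap<>();
        inputFirst.put("input", "dynamic content");
        inputFirst.put("text", "static schema");

        Map<String, Object> textFirst = new LinkedHashMap<>();
        textFirst.put("text", "static schema");
        textFirst.put("input", "dynamic content");

        // Equal as JSON objects (key order is ignored by the spec)...
        System.out.println(mapper.readTree(mapper.writeValueAsString(inputFirst))
                .equals(mapper.readTree(mapper.writeValueAsString(textFirst)))); // true

        // ...but different byte sequences on the wire, so a cache that
        // matches on the serialized prefix treats them differently.
        System.out.println(mapper.writeValueAsString(inputFirst));
        // {"input":"dynamic content","text":"static schema"}
        System.out.println(mapper.writeValueAsString(textFirst));
        // {"text":"static schema","input":"dynamic content"}
    }
}
```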

@IvanLuchkin (Author) commented Mar 18, 2025

I agree that this may look confusing, since the JSON specification states that key order does not matter. But I have experimented with this myself since the very launch of Prompt Caching, using my own OpenAI client where I could control the order of keys, and I saw a significant difference in the number of cache hits when supplying request content in the "desired" order above. (Note: those tests used a payload order similar to my examples, but with the Chat Completions API and function calling, not the Responses API and Structured Outputs.)

If the OpenAI API truly doesn't take the order of request content into account, maximizing cache hits is problematic for anyone not using a custom client.
If the OpenAI API does take it into account, that means they have decided that, say, "input" should always come before "text", which is odd because, again, cache hits are then less likely in a significant number of situations.

And, of course, the actual content in the cacheable parts of the request (prompts, schemas, function calls, etc.) is affected by the caching mechanism in the sense that you get more hits when the static portion comes first. But the question here is at a slightly higher level.

As for the docs, I would say they are simply ambiguous, covering only what needs to be understood for simple cases. Specifically, they use the word "prompt", and it is unclear what exactly it refers to. The docs do disclose that multiple things can be cached, and all of them are considered part of the "prompt". My personal assumption, validated by my tests, is that everything in the request payload that in some way gets fed to the model is part of the prompt.

TL;DR: Yes, the JSON key order within input (messages), function calls, and structured output schemas matters. But if you have both an input and a schema for structured outputs, you will get different cache performance depending on which part changes frequently and which doesn't. That led me to conclude that the order of JSON keys in the whole request payload matters too.
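
(For anyone reproducing these experiments: cache hits can be measured directly from the usage block of the API response. A small sketch, assuming the Responses API reports usage.input_tokens_details.cached_tokens; the Chat Completions equivalent is usage.prompt_tokens_details.cached_tokens.)

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class CacheHitCheck {
    // Returns the number of cached input tokens reported in a raw
    // Responses API response body; a value > 0 indicates a cache hit.
    static long cachedTokens(String responseJson) throws Exception {
        JsonNode usage = new ObjectMapper().readTree(responseJson).path("usage");
        return usage.path("input_tokens_details").path("cached_tokens").asLong(0);
    }
}
```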

@TomerAberbach (Collaborator)

Thanks for the context! I chatted with @kwhinnery-openai and he's going to look into what the behavior is and get back to you/us.

@kwhinnery-openai (Contributor)

After a few internal discussions, I was able to confirm that the ordering of content in the JSON request body does impact prompt caching hits, and repeated content (not just in the input or messages array) should be provided at the beginning of a request payload.

From an SDK standpoint, we should try to optimize for this or, at a minimum, give developers granular control over the ordering of content when they build request payloads. We will need to spend time thinking through how to do this and then document a solution.
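
(Purely to illustrate the kind of granular control being discussed, one hypothetical shape it could take; propertyOrder(...) does not exist in the SDK today:)

```java
// Hypothetical sketch only: propertyOrder(...) is not a real SDK method.
ResponseCreateParams request = ResponseCreateParams.builder()
        .text(structuredOutputSchemaConfig)
        .inputOfResponse(listOfPrompts)
        .model(ChatModel.GPT_4O_MINI)
        .store(true)
        // Pin the top-level key order used when the body is serialized.
        .propertyOrder("text", "input", "model", "store")
        .build();
```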

@IvanLuchkin (Author)

@kwhinnery-openai amazing to hear this from you. Looking forward to seeing something implemented that helps with this. I also want to mention that the other official OpenAI SDKs (Python, JS, etc.) may have a similar problem.

@TomerAberbach changed the title from "[Question] Control the order of request content for Responses API" to "Optimize JSON property order for prompt caching" on Mar 25, 2025
@IvanLuchkin (Author)

Hello! Are there any updates on this?

@TomerAberbach (Collaborator) commented Apr 30, 2025

I don't think we've decided when/how to prioritize this yet.
