
Optimize JSON property order for prompt caching #316


Open

IvanLuchkin opened this issue Mar 16, 2025 · 9 comments

@IvanLuchkin commented Mar 16, 2025

Context/motivation: the ability to reduce costs when using the APIs via this SDK.


In order to increase the chances of prompts hitting the cache, OpenAI suggests the following:

[Screenshot of the prompt caching docs: OpenAI recommends placing static or repeated content at the beginning of the prompt and dynamic content at the end.]

As far as I understand, and based on experimentation and monitoring, caching applies to the entire content passed to the LLM, and the structure of that content is inherited from the structure of the request body JSON.

While debugging the SDK, I found that there is no straightforward way to control the structure of Responses API requests.

Example:

  • Developer message (static)
  • User message (static)
  • User message (dynamic)
  • Structured outputs schema (static)

Alternative structure to maximize the probability of cache hits:

  • Structured outputs schema (static)
  • Developer message (static)
  • User message (static)
  • User message (dynamic)

I have tried calling the .text() builder method on ResponseCreateParams after everything else, but it has no effect on the property order of the resulting request body.

Is there a workaround, or could functionality for this be added?
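
(In the meantime, a minimal sketch of one possible workaround, assuming you are willing to bypass the SDK's serialization and build the request body yourself: Jackson's ObjectNode preserves insertion order, so you can emit the top-level keys in whatever order you like and POST the payload directly. The endpoint and field names follow the public Responses API; buildTextFormat and buildInput are placeholder helpers standing in for the JSON shown below.)

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

public class OrderedRequestWorkaround {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // Jackson's ObjectNode preserves insertion order, so setting
        // "text" first puts the static schema at the start of the
        // serialized payload, ahead of the dynamic "input" messages.
        ObjectNode body = mapper.createObjectNode();
        body.set("text", buildTextFormat(mapper)); // static
        body.set("input", buildInput(mapper));     // static + dynamic
        body.put("model", "gpt-4o-mini");
        body.put("store", true);

        Request request = new Request.Builder()
                .url("https://api.openai.com/v1/responses")
                .header("Authorization", "Bearer " + System.getenv("OPENAI_API_KEY"))
                .post(RequestBody.create(
                        MediaType.parse("application/json"),
                        mapper.writeValueAsString(body)))
                .build();

        try (Response response = new OkHttpClient().newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }

    // Placeholder helpers: fill in the "text.format" object and the
    // "input" messages array from the JSON examples in this thread.
    private static ObjectNode buildTextFormat(ObjectMapper m) {
        return m.createObjectNode();
    }

    private static ArrayNode buildInput(ObjectMapper m) {
        return m.createArrayNode();
    }
}
```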

@TomerAberbach (Collaborator)

Thanks for the report! Can you provide:

  1. Java example code
  2. The resulting JSON that's sent from the example code
  3. The JSON you would expect instead

@IvanLuchkin (Author)

Java example code. I believe everything relevant is included here; if something is missing, I'm happy to provide more information.

```java
// structuredOutputSchemaConfig, listOfPrompts, and apiKey are defined elsewhere.
ResponseCreateParams request = ResponseCreateParams.builder()
        .text(structuredOutputSchemaConfig) // structured outputs schema (static)
        .inputOfResponse(listOfPrompts)     // developer + user messages
        .model(ChatModel.GPT_4O_MINI)
        .store(true)
        .build();

Response response = OpenAIOkHttpClient.builder()
        .apiKey(apiKey)
        .build()
        .responses()
        .create(request);
```

Actual JSON

```json
{
  "input": [
    {
      "content": [
        {
          "text": "Developer prompt text. Static content",
          "type": "input_text"
        }
      ],
      "role": "developer"
    },
    {
      "content": [
        {
          "text": "Phone call transcription. Dynamic content",
          "type": "input_text"
        }
      ],
      "role": "user"
    }
  ],
  "model": "gpt-4o-mini",
  "store": true,
  "text": {
    "format": {
      "schema": {
        "required": [
          "key1",
          "key2",
          "key3"
        ],
        "additionalProperties": false,
        "properties": {
          "key1": {
            "type": "string",
            "description": "some prompt"
          },
          "key2": {
            "type": "array",
            "description": "some prompt",
            "items": {
              "type": "string"
            }
          },
          "key3": {
            "type": "string",
            "description": "some prompt"
          }
        },
        "type": "object"
      },
      "type": "json_schema",
      "name": "someStaticSchema",
      "strict": true
    }
  }
}
```

Desired JSON

```json
{
  "text": {
    "format": {
      "schema": {
        "required": [
          "key1",
          "key2",
          "key3"
        ],
        "additionalProperties": false,
        "properties": {
          "key1": {
            "type": "string",
            "description": "some prompt"
          },
          "key2": {
            "type": "array",
            "description": "some prompt",
            "items": {
              "type": "string"
            }
          },
          "key3": {
            "type": "string",
            "description": "some prompt"
          }
        },
        "type": "object"
      },
      "type": "json_schema",
      "name": "someStaticSchema",
      "strict": true
    }
  },
  "input": [
    {
      "content": [
        {
          "text": "Developer prompt text. Static content",
          "type": "input_text"
        }
      ],
      "role": "developer"
    },
    {
      "content": [
        {
          "text": "Phone call transcription. Dynamic content",
          "type": "input_text"
        }
      ],
      "role": "user"
    }
  ],
  "model": "gpt-4o-mini",
  "store": true,
}

For this particular case, I would simply move the text content to the start of the payload. But which content is dynamic and which is static will differ across use cases and applications.

@TomerAberbach (Collaborator)

I see, so we cannot statically pick a nice order of properties. Putting that aside...

I wonder whether the docs are really saying that JSON key order matters. They might just be talking about the order of content within the ordered parts of the JSON (strings and arrays). It would be surprising to me if the order of JSON keys mattered, since JSON objects are generally supposed to be unordered.

@kwhinnery-openai do you know whether this prompt caching guidance refers to the content order or the actual JSON key order?
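
(Illustrative aside: the two readings differ at the byte level. Two JSON objects can be equal as objects yet serialize to different byte sequences, which is exactly the distinction that matters if the cache matches on a prefix of the serialized request. A minimal Jackson sketch:)

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyOrderDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        Map<String, Object> inputFirst = new LinkedHashMap<>();
        inputFirst.put("input", "dynamic content");
        inputFirst.put("text", "static schema");

        Map<String, Object> textFirst = new LinkedHashMap<>();
        textFirst.put("text", "static schema");
        textFirst.put("input", "dynamic content");

        // Equal as JSON objects (key order is ignored by the spec)...
        System.out.println(mapper.readTree(mapper.writeValueAsString(inputFirst))
                .equals(mapper.readTree(mapper.writeValueAsString(textFirst)))); // true

        // ...but different byte sequences on the wire, so a cache that
        // matches on the serialized prefix treats them differently.
        System.out.println(mapper.writeValueAsString(inputFirst));
        // {"input":"dynamic content","text":"static schema"}
        System.out.println(mapper.writeValueAsString(textFirst));
        // {"text":"static schema","input":"dynamic content"}
    }
}
```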

@IvanLuchkin (Author) commented Mar 18, 2025

I agree that this may look confusing, since the JSON specification states that key order does not matter. But I have experimented with this myself since the very launch of Prompt Caching, using my own OpenAI client where I could control the order of keys, and I saw a significant difference in the number of cache hits when supplying request content in the "desired" order above. (Note: those tests used a payload order similar to my examples, but with the Chat Completions API and function calling, not the Responses API and Structured Outputs.)

If the OpenAI API truly doesn't take the order of request content into account, maximizing cache hits is problematic for anyone not using a custom client.
If the OpenAI API does take it into account, that means they have decided that, say, "input" should always come before "text", which is odd because, again, cache hits are then less likely in a significant number of situations.

And, of course, the actual content in the cacheable parts of the request (prompts, schemas, function calls, etc.) is affected by the caching mechanism in the sense that you get more hits when the static portion comes first. But the question here is at a slightly higher level.

As for the docs, I would say they are simply ambiguous, covering only what needs to be understood for simple cases. Specifically, they use the word "prompt", and it is unclear what exactly it refers to. The docs do disclose that multiple things can be cached, and all of them are considered part of the "prompt". My personal assumption, validated by my tests, is that everything in the request payload that in some way gets fed to the model is part of the prompt.

TL;DR: Yes, the JSON key order within input (messages), function calls, and structured output schemas matters. But if you have both an input and a schema for structured outputs, you will get different cache performance depending on which part changes frequently and which doesn't. That led me to conclude that the order of JSON keys in the whole request payload matters too.
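
(For anyone reproducing these experiments: cache hits can be measured directly from the usage block of the API response. A small sketch, assuming the Responses API reports usage.input_tokens_details.cached_tokens; the Chat Completions equivalent is usage.prompt_tokens_details.cached_tokens.)

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class CacheHitCheck {
    // Returns the number of cached input tokens reported in a raw
    // Responses API response body; a value > 0 indicates a cache hit.
    static long cachedTokens(String responseJson) throws Exception {
        JsonNode usage = new ObjectMapper().readTree(responseJson).path("usage");
        return usage.path("input_tokens_details").path("cached_tokens").asLong(0);
    }
}
```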

@TomerAberbach (Collaborator)

Thanks for the context! I chatted with @kwhinnery-openai and he's going to look into what the behavior is and get back to you/us.

@kwhinnery-openai (Contributor)

After a few internal discussions, I was able to confirm that the ordering of content in the JSON request body does impact prompt caching hits, and repeated content (not just in the input or messages array) should be provided at the beginning of a request payload.

From an SDK standpoint, we should try to optimize for this or, at a minimum, give developers granular control over the ordering of content when they build request payloads. We will need to spend time thinking through how to do this and then document a solution.
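
(Purely to illustrate the kind of granular control being discussed, one hypothetical shape it could take; propertyOrder(...) does not exist in the SDK today:)

```java
// Hypothetical sketch only: propertyOrder(...) is not a real SDK method.
ResponseCreateParams request = ResponseCreateParams.builder()
        .text(structuredOutputSchemaConfig)
        .inputOfResponse(listOfPrompts)
        .model(ChatModel.GPT_4O_MINI)
        .store(true)
        // Pin the top-level key order used when the body is serialized.
        .propertyOrder("text", "input", "model", "store")
        .build();
```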

@IvanLuchkin (Author)

@kwhinnery-openai amazing to hear this from you. Looking forward to seeing something implemented that helps with this. I also want to mention that the other official OpenAI SDKs (Python, JS, etc.) may have a similar problem.

@TomerAberbach changed the title from "[Question] Control the order of request content for Responses API" to "Optimize JSON property order for prompt caching" on Mar 25, 2025
@IvanLuchkin (Author)

Hello! Are there any updates on this?

@TomerAberbach (Collaborator) commented Apr 30, 2025

I don't think we've decided when/how to prioritize this yet.
