
Provide a tokenizer in Java for counting input tokens #359


Open
lukehutch opened this issue Mar 25, 2025 · 6 comments

Comments

@lukehutch

Since billing is based on tokens, it would be very helpful to be able to measure how many input and output tokens are used by a given request. I don't see documentation about how to track that. Is this possible? If so, can you please create an example of how to measure it?

@TomerAberbach
Collaborator

As far as I know, there's no way to do this in Java right now.

https://platform.openai.com/tokenizer gives a rule of thumb:

A helpful rule of thumb is that one token generally corresponds to ~4 characters of text for common English text. This translates to roughly ¾ of a word (so 100 tokens ~= 75 words).
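The heuristic above is easy to sketch in Java. This is only the ~4-characters-per-token rule of thumb from the quote, not a real tokenizer; the class and method names are made up for illustration.

```java
// Rough token estimate using the "~4 characters per token" rule of thumb
// quoted above. English-only heuristic; real counts need a proper tokenizer.
public class TokenEstimate {
    static int estimateTokens(String text) {
        // ceil(length / 4): roughly one token per 4 characters
        return (text.length() + 3) / 4;
    }

    public static void main(String[] args) {
        String prompt = "The quick brown fox jumps over the lazy dog.";
        System.out.println(estimateTokens(prompt)); // 44 chars -> 11
    }
}
```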

But the only programmatic tokenizer libraries are in Python and JS.

@kwhinnery-openai are there plans to provide support for all languages?

@IvanLuchkin

IvanLuchkin commented Mar 27, 2025

Use https://github.com/knuddelsgmbh/jtokkit, and port https://github.com/hmarr/openai-chat-tokens. If your requests include function calls or schemas, that will give you at least a ballpark value; for plain text, the results will be accurate.

Also, don't forget something along the lines of

EncodingType encType = request.getModel() == GPT4o_mini || request.getModel() == GPT4o ? O200K_BASE : CL100K_BASE;

since different models use different tokenizers. Don't rely on the rule of thumb from the documentation if you work with multiple languages: tokenizer efficiency varies drastically with the language of the text.
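The model-to-encoding choice in the snippet above can be sketched as follows. This stands alone using plain strings so it needs no dependency; with jtokkit you would pass the corresponding `EncodingType` (`O200K_BASE` / `CL100K_BASE`) to its encoding registry. The class and method names here are hypothetical.

```java
// Sketch of the model-to-encoding mapping from the comment above.
// GPT-4o-family models use o200k_base; older chat models such as
// gpt-4 and gpt-3.5-turbo use cl100k_base.
public class EncodingChooser {
    static String encodingFor(String model) {
        if (model.startsWith("gpt-4o")) {
            return "o200k_base";
        }
        return "cl100k_base";
    }

    public static void main(String[] args) {
        System.out.println(encodingFor("gpt-4o-mini")); // o200k_base
        System.out.println(encodingFor("gpt-4"));       // cl100k_base
    }
}
```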

@lukehutch
Author

Wouldn't it make sense to just make available the count that was measured by the billing engine?

@IvanLuchkin

It is available in the ResponseUsage object, but you only have access to it after the request completes.

If you need to estimate prompt token usage before sending the request (for rate-limit handling, for example), you have to use the method I described. Otherwise, it's straightforward to just use ResponseUsage or its Chat Completions API equivalent.
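For context, the usage block in a (non-streaming) Chat Completions response body carries the billed counts under `prompt_tokens`, `completion_tokens`, and `total_tokens`; the SDK's usage object is a typed view of these fields. The sketch below pulls them out of a sample body with a regex purely to show where they live; in real code you would use the SDK accessors or a JSON parser. The class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrates where token counts sit in a Chat Completions response body.
// Regex extraction is for demonstration only -- use a real JSON parser
// (or the SDK's typed accessors) in production.
public class UsageFields {
    static long field(String json, String name) {
        Matcher m = Pattern.compile("\"" + name + "\"\\s*:\\s*(\\d+)").matcher(json);
        if (!m.find()) throw new IllegalArgumentException(name + " not found");
        return Long.parseLong(m.group(1));
    }

    public static void main(String[] args) {
        String body = "{\"usage\": {\"prompt_tokens\": 12, "
                + "\"completion_tokens\": 34, \"total_tokens\": 46}}";
        System.out.println(field(body, "prompt_tokens"));     // 12
        System.out.println(field(body, "completion_tokens")); // 34
    }
}
```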

@lukehutch
Author

I can't see how to get a ResponseUsage object from the new Responses API (in streaming mode)... Can you please give me a pointer?

@IvanLuchkin

Unfortunately, I don't have experience with streaming in the Chat Completions or Responses APIs. From what I was able to find, you should be looking at ResponseStreamEvent.completed.response.usage.
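To make the suggestion above concrete: in the streaming Responses API, usage arrives only on the final `response.completed` event, whose `response.usage` carries `input_tokens` and `output_tokens`. This dependency-free sketch scans raw event payloads for that event and reads the count; the JSON strings are simplified stand-ins and the class/method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Usage appears only on the terminal "response.completed" stream event
// (ResponseStreamEvent.completed -> response.usage). This sketch finds
// that event among raw payloads and extracts input_tokens.
public class StreamUsage {
    static long inputTokensFromEvents(List<String> eventPayloads) {
        Pattern p = Pattern.compile("\"input_tokens\"\\s*:\\s*(\\d+)");
        for (String payload : eventPayloads) {
            if (!payload.contains("\"type\": \"response.completed\"")) continue;
            Matcher m = p.matcher(payload);
            if (m.find()) return Long.parseLong(m.group(1));
        }
        throw new IllegalStateException("no response.completed event with usage");
    }

    public static void main(String[] args) {
        List<String> events = List.of(
            "{\"type\": \"response.output_text.delta\", \"delta\": \"Hi\"}",
            "{\"type\": \"response.completed\", \"response\": "
                + "{\"usage\": {\"input_tokens\": 9, \"output_tokens\": 5}}}");
        System.out.println(inputTokensFromEvents(events)); // 9
    }
}
```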

@TomerAberbach TomerAberbach changed the title Please document how to measure the number of input and output tokens Provide a tokenizer in Java for counting input tokens Apr 4, 2025