Skip to content

Generalise exponential bucket histogram, add percentile calculation #127597

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

nicktindall
Copy link
Contributor

@nicktindall nicktindall commented May 1, 2025

I want to re-use the exponential-bucket histogram implemented as HandlingTimeTracker for our queue latency metric.

This PR renames it, and references to it, to make it more suitable for re-use and adds getPercentile which can be used to calculate a percentile value from its contents.

See #120488

Relates ES-10531

Note there's a massive amount of commits because I just took the other PR and removed the metrics bits. It should all get squashed away anyway.

nicktindall and others added 30 commits January 21, 2025 13:04
…ialBucketHistogram.java

Co-authored-by: Dianna Hohensee <[email protected]>
@nicktindall nicktindall added :Distributed Coordination/Network Http and internode communication implementations >refactoring labels May 1, 2025
@nicktindall nicktindall requested a review from a team as a code owner May 1, 2025 06:45
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine elasticsearchmachine added Team:Distributed Coordination Meta label for Distributed Coordination team v9.1.0 labels May 1, 2025
@nicktindall nicktindall changed the title Generalise exponential bucket histogram Generalise exponential bucket histogram, add percentile calculation May 1, 2025
* @return An array of frequencies of handling times in buckets with upper bounds as returned by {@link #getBucketUpperBounds()}, plus
* an extra bucket for handling times longer than the longest upper bound.
*/
public long[] getSnapshot() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this from getHistogram to getSnapshot because histogram.getHistogram() would look weird.

Copy link
Contributor

@JeremyDahlgren JeremyDahlgren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness it might be good to add a few more test cases noted below, otherwise LGTM.

Comment on lines +96 to +98
if (percentile < 0 || percentile > 1) {
throw new IllegalArgumentException("Requested percentile must be in [0, 1.0], percentile=" + percentile);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to add a few quick expectThrows() tests for these?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 221e079

Comment on lines 119 to 123
public void clear() {
for (int i = 0; i < BUCKET_COUNT; i++) {
buckets[i].reset();
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And a test for clear()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 221e079

Copy link
Contributor

@DiannaHohensee DiannaHohensee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, thanks for splitting this out.

Comment on lines +96 to +98
if (percentile < 0 || percentile > 1) {
throw new IllegalArgumentException("Requested percentile must be in [0, 1.0], percentile=" + percentile);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

Comment on lines 119 to 123
public void clear() {
for (int i = 0; i < BUCKET_COUNT; i++) {
buckets[i].reset();
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++


/**
* Tracks how long message handling takes on a transport thread as a histogram with fixed buckets.
*/
public class HandlingTimeTracker {
public class HandlingTimeTracker extends ExponentialBucketHistogram {
Copy link
Contributor Author

@nicktindall nicktindall May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to keep the HandlingTimeTracker because the decision to have 17 + 1 buckets is contextual and should be fixed because it's used in serialization. It also reduces the size of the change.

HandlingTimeTracker is now just an ExponentialBucketHistogram of a specific size.

public ExponentialBucketHistogram(int bucketCount) {
if (bucketCount < 2 || bucketCount > Integer.SIZE) {
throw new IllegalArgumentException("Bucket count must be in [2, " + Integer.SIZE + "], got " + bucketCount);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The observations are long but the bucket bounds are integers, I suppose because the scale of the numbers we've been putting in here are easily contained in the integer range. Perhaps we could revisit this if we need histograms to hold larger values, but that's not the case for the queue latency metric.

@nicktindall nicktindall merged commit f1f7459 into elastic:main May 5, 2025
17 checks passed
@nicktindall nicktindall deleted the ES-10531_generalise_exponential_bucket_histogram branch May 5, 2025 05:55
ywangd pushed a commit to ywangd/elasticsearch that referenced this pull request May 9, 2025
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request May 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Network Http and internode communication implementations >refactoring Team:Distributed Coordination Meta label for Distributed Coordination team v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants