[ML] Inference API add configurable connection pool TTL #127585
Conversation
```java
public static final Setting<TimeValue> CONNECTION_TTL_SETTING = Setting.timeSetting(
    "xpack.inference.elastic.http.connection_ttl",
    // -1 indicates that the TTL never expires
```
This is the constructor for the pooling connection manager:

```java
public PoolingNHttpClientConnectionManager(
    ConnectingIOReactor ioReactor,
    NHttpConnectionFactory<ManagedNHttpClientConnection> connFactory,
    Registry<SchemeIOSessionStrategy> ioSessionFactoryRegistry,
    SocketAddressResolver<HttpRoute> socketAddressResolver
) {
    this(ioReactor, connFactory, ioSessionFactoryRegistry, socketAddressResolver, -1L, TimeUnit.MILLISECONDS);
}
```
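The `-1L` default means the TTL never expires. As a rough illustration of that sentinel (a standalone toy model, not the Apache HttpCore implementation), the expiry deadline for a pooled connection behaves like this:

```java
import java.util.concurrent.TimeUnit;

// Toy model of how a pooled connection's expiry deadline is derived from
// the TTL passed to the connection manager. A TTL <= 0 is treated as
// "never expires" (deadline at Long.MAX_VALUE), mirroring the -1L default.
final class ConnectionDeadline {
    static long deadlineMillis(long createdMillis, long ttl, TimeUnit unit) {
        return ttl > 0 ? createdMillis + unit.toMillis(ttl) : Long.MAX_VALUE;
    }

    static boolean isExpired(long nowMillis, long deadlineMillis) {
        return nowMillis >= deadlineMillis;
    }
}
```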
```java
public static final Setting<TimeValue> CONNECTION_TTL_SETTING = Setting.timeSetting(
    "xpack.inference.elastic.http.connection_ttl",
    // -1 indicates that the TTL never expires
    TimeValue.MINUS_ONE,
```
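If the setting lands as shown, operators could override it in `elasticsearch.yml` using the standard time-value format; the value below is purely illustrative:

```yaml
# Hypothetical override of the TTL discussed in this PR; "60s" is only an example.
xpack.inference.elastic.http.connection_ttl: 60s
```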
Sorry, I'm not familiar enough with the use case to give specific guidance, but here are some questions to consider:
- Is this a high request rate client?
- Would having to re-perform the TLS handshake more frequently cause any notable latency/performance issues?
- Do you have metrics in place that could monitor any potential negative consequences of setting this?
And to confirm, this setting wouldn't abruptly close connections with in-flight requests, correct?
Thanks, Diana.

> Is this a high request rate client?

It could be. At the moment, likely not, but once we support sparse embedding I can see it having a higher request rate.

> Would having to re-perform the TLS handshake more frequently cause any notable latency/performance issues?

I'm not sure.

> Do you have metrics in place that could monitor any potential negative consequences of setting this?

Not at the moment, but we should probably add some 😅

> And to confirm, this setting wouldn't abruptly close connections with in-flight requests, correct?

My understanding of the docs is that if the TTL is reached, the pool won't reuse an existing connection. I haven't seen anything about it terminating active connections.
I believe this is for closing idle connections, so it should be safe to set this to some value so that the connection gets released from the NLB/EIS.

My understanding is that it just needs to be lower than the NLB's connection timeout, which is ~350s; otherwise the client tries to reuse the connection and the NLB sends a TCP RST, which I think surfaces as another connection timeout. The right value depends on the user's traffic patterns, so it's hard for us to pick one good value for everyone. Making it configurable is a good idea, and then maybe we set the default to 60s (I'm picking that number randomly).
The default value should be set so the connection does expire; this will refresh the connection pool and possibly help overcome bugs with broken connections. The setting limits the lifespan of a connection in the pool, and 24 hours seems like a reasonable value. If all the connections in the pool are created at the same time, maybe add some jitter (+/- 1-60 minutes) so they don't all reach end-of-life at the same time.
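A jittered TTL could be computed along these lines (a sketch with hypothetical names, not code from this PR):

```java
import java.util.concurrent.ThreadLocalRandom;

// Spread connection end-of-life by adding random jitter to a base TTL,
// so connections created together don't all expire at the same moment.
// Names and bounds here are illustrative, not from the PR.
final class TtlJitter {
    static long jitteredTtlMillis(long baseTtlMillis, long maxJitterMillis) {
        // Uniform jitter in [-maxJitterMillis, +maxJitterMillis]
        long jitter = ThreadLocalRandom.current().nextLong(-maxJitterMillis, maxJitterMillis + 1);
        return Math.max(1L, baseTtlMillis + jitter);
    }
}
```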
Thanks for the feedback.

> 24 hours seems like a reasonable value

One thing to note is that we have an idle connection evictor that runs every minute and closes any connection in the pool that has been idle for at least a minute, removing it from the pool. I'm not saying that a 1 minute idle time is the correct value, but if we leave that as it is, I suspect a 24 hour TTL won't have much effect: an idle connection will have been closed long before the 24 hours is up, unless a user is constantly sending requests for a 24 hour period. That might happen more often once we go multi-tenant (the connection is associated with a URL, not the user) or in a reindex use case.
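As a simplified model of that evictor (illustrative only; the real implementation lives in the inference plugin and runs on a scheduled thread), each sweep drops pool entries whose last use is older than the idle threshold:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a periodic idle-connection sweep: each run keeps only the
// pool entries used within the idle threshold and drops the rest.
final class IdleEvictor {
    record PooledConnection(String route, long lastUsedMillis) {}

    static List<PooledConnection> sweep(List<PooledConnection> pool, long nowMillis, long maxIdleMillis) {
        List<PooledConnection> kept = new ArrayList<>();
        for (PooledConnection c : pool) {
            if (nowMillis - c.lastUsedMillis() < maxIdleMillis) {
                kept.add(c); // still fresh enough to keep in the pool
            }
        }
        return kept;
    }
}
```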
> One thing to note is that we have an idle connection evictor that runs every minute and checks to see if any connection in the pool has been idle for over at least a minute.

Does this mean that if we set the TTL to anything above 60s, it won't have an effect, since eviction will happen earlier?

And for my understanding: by setting the right TTL, are we trying to find a balance between avoiding unnecessarily frequent TLS handshakes on new connections and avoiding unnecessarily long-lived connections that consume resources?
> Does this mean that if we set the TTL to anything above 60s, it won't have an effect as the eviction will happen earlier?

I believe the connection will be evicted if it is idle for 60 seconds or more. If we set the TTL to 2 minutes, then once the connection has existed for 2 minutes (even if it hasn't been idle for 60 seconds), the pool will not use that connection for the next request. The TTL is basically an upper bound that takes priority over the other settings: even if a connection is viable for leasing in every other way (hasn't been idle for a long period, etc.), it won't be used once it has reached its TTL.

So it still has an effect if we set it above 60 seconds.
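That lease-time behaviour can be modelled roughly as follows (a standalone sketch, not the Apache HttpCore source): an expired connection is simply skipped when leasing, which is also why in-flight requests are never cut off.

```java
import java.util.List;
import java.util.Optional;

// Toy model of lease-time connection selection: the pool skips idle entries
// past their TTL deadline rather than closing busy ones mid-request.
// Illustrative only; not the Apache HttpCore implementation.
final class LeaseCheck {
    record Pooled(String id, long createdMillis, long ttlMillis, boolean idle) {}

    static Optional<Pooled> lease(List<Pooled> pool, long nowMillis) {
        return pool.stream()
            // only idle connections can be leased; in-flight ones are untouched
            .filter(Pooled::idle)
            // a connection past its TTL is never reused, even if recently active
            .filter(c -> nowMillis - c.createdMillis() < c.ttlMillis())
            .findFirst();
    }
}
```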
I don't have a strong opinion; 60s sounds like a good initial setting, and then we can revisit it if the workload changes (e.g. ELSER inference bursts at ingest). Some metrics/logs would help to keep an eye on this.
Pinging @elastic/ml-core (Team:ML)
* Adding ttl
* Using 60 seconds as default
💚 Backport successful
This PR allows the connection pool TTL value to be configured via a setting. It is currently only used for the EIS connection pool, to help triage the connection timeouts issue.