[ML] Inference API add configurable connection pool TTL #127585
Conversation
```java
public static final Setting<TimeValue> CONNECTION_TTL_SETTING = Setting.timeSetting(
    "xpack.inference.elastic.http.connection_ttl",
    // -1 indicates that the TTL never expires
```
This is the constructor for the pooling connection manager:

```java
public PoolingNHttpClientConnectionManager(
    ConnectingIOReactor ioReactor,
    NHttpConnectionFactory<ManagedNHttpClientConnection> connFactory,
    Registry<SchemeIOSessionStrategy> ioSessionFactoryRegistry,
    SocketAddressResolver<HttpRoute> socketAddressResolver
) {
    this(ioReactor, connFactory, ioSessionFactoryRegistry, socketAddressResolver, -1L, TimeUnit.MILLISECONDS);
}
```
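The `-1L` default means the TTL never expires. As a rough illustration of that sentinel (a standalone toy model, not the Apache HttpCore implementation), the expiry deadline for a pooled connection behaves like this:

```java
import java.util.concurrent.TimeUnit;

// Toy model of how a pooled connection's expiry deadline is derived from
// the TTL passed to the connection manager. A TTL <= 0 is treated as
// "never expires" (deadline at Long.MAX_VALUE), mirroring the -1L default.
final class ConnectionDeadline {
    static long deadlineMillis(long createdMillis, long ttl, TimeUnit unit) {
        return ttl > 0 ? createdMillis + unit.toMillis(ttl) : Long.MAX_VALUE;
    }

    static boolean isExpired(long nowMillis, long deadlineMillis) {
        return nowMillis >= deadlineMillis;
    }
}
```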
```java
public static final Setting<TimeValue> CONNECTION_TTL_SETTING = Setting.timeSetting(
    "xpack.inference.elastic.http.connection_ttl",
    // -1 indicates that the TTL never expires
    TimeValue.MINUS_ONE,
```
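If the setting lands as shown, operators could override it in `elasticsearch.yml` using the standard time-value format; the value below is purely illustrative:

```yaml
# Hypothetical override of the TTL discussed in this PR; "60s" is only an example.
xpack.inference.elastic.http.connection_ttl: 60s
```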
Sorry, I'm not familiar enough with the use case to give specific guidance, but here are some questions to consider:
- Is this a high request rate client?
- Would having to re-perform the TLS handshake more frequently cause any notable latency/performance issues?
- Do you have metrics in place that could monitor any potential negative consequences of setting this?
And to confirm, this setting wouldn't abruptly close connections with in-flight requests, correct?
Thanks, Diana.

> Is this a high request rate client?

It could be. At the moment, likely not, but once we support sparse embedding I can see it having a higher request rate.

> Would having to re-perform the TLS handshake more frequently cause any notable latency/performance issues?

I'm not sure.

> Do you have metrics in place that could monitor any potential negative consequences of setting this?

Not at the moment, but we should probably add some 😅

> And to confirm, this setting wouldn't abruptly close connections with in-flight requests, correct?

My understanding of the docs is that if the TTL is reached, the pool won't reuse an existing connection. I haven't seen anything about it terminating active connections.
I believe this is for closing idle connections, so it should be safe to set this to some value so that the connection gets released from the NLB/EIS.

My understanding is that it just needs to be lower than the NLB's connection timeout, which is ~350s; otherwise the client tries to reuse the connection and the NLB sends a TCP RST, which I think surfaces as another connection timeout. The right value depends on the user's traffic patterns, so it's hard for us to pick one good value for everyone. Making it configurable is a good idea, and then maybe we set the default to 60s (I'm picking that number randomly).
The default value should be set so the connection does expire; this will refresh the connection pool and possibly help overcome bugs with broken connections. The setting limits the lifespan of a connection in the pool, and 24 hours seems like a reasonable value. If all the connections in the pool are created at the same time, maybe add some jitter (+/- 1-60 minutes) so they don't all reach end-of-life at the same time.
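A jittered TTL could be computed along these lines (a sketch with hypothetical names, not code from this PR):

```java
import java.util.concurrent.ThreadLocalRandom;

// Spread connection end-of-life by adding random jitter to a base TTL,
// so connections created together don't all expire at the same moment.
// Names and bounds here are illustrative, not from the PR.
final class TtlJitter {
    static long jitteredTtlMillis(long baseTtlMillis, long maxJitterMillis) {
        // Uniform jitter in [-maxJitterMillis, +maxJitterMillis]
        long jitter = ThreadLocalRandom.current().nextLong(-maxJitterMillis, maxJitterMillis + 1);
        return Math.max(1L, baseTtlMillis + jitter);
    }
}
```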
Thanks for the feedback.

> 24 hours seems like a reasonable value

One thing to note is that we have an idle connection evictor that runs every minute and closes any connection in the pool that has been idle for at least a minute, removing it from the pool. I'm not saying that a 1 minute idle time is the correct value, but if we leave that as it is, I suspect a 24 hour TTL won't have much effect: an idle connection will have been closed long before the 24 hours is up, unless a user is constantly sending requests for a 24 hour period. That might happen more often once we go multi-tenant (the connection is associated with a URL, not the user) or in a reindex use case.
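As a simplified model of that evictor (illustrative only; the real implementation lives in the inference plugin and runs on a scheduled thread), each sweep drops pool entries whose last use is older than the idle threshold:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a periodic idle-connection sweep: each run keeps only the
// pool entries used within the idle threshold and drops the rest.
final class IdleEvictor {
    record PooledConnection(String route, long lastUsedMillis) {}

    static List<PooledConnection> sweep(List<PooledConnection> pool, long nowMillis, long maxIdleMillis) {
        List<PooledConnection> kept = new ArrayList<>();
        for (PooledConnection c : pool) {
            if (nowMillis - c.lastUsedMillis() < maxIdleMillis) {
                kept.add(c); // still fresh enough to keep in the pool
            }
        }
        return kept;
    }
}
```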
> One thing to note is that we have an idle connection evictor that runs every minute and checks to see if any connection in the pool has been idle for over at least a minute.

Does this mean that if we set the TTL to anything above 60s, it won't have an effect, since eviction will happen earlier?

And for my understanding: by setting the right TTL, are we trying to find a balance between avoiding unnecessarily frequent TLS handshakes on new connections and avoiding unnecessarily long-lived connections that consume resources?
> Does this mean that if we set the TTL to anything above 60s, it won't have an effect as the eviction will happen earlier?

I believe the connection will be evicted if it is idle for 60 seconds or more. If we set the TTL to 2 minutes, then once the connection has existed for 2 minutes (even if it hasn't been idle for 60 seconds), the pool will not use that connection for the next request. The TTL is basically an upper bound that takes priority over the other settings: even if a connection is viable for leasing in every other way (hasn't been idle for a long period, etc.), it won't be used once it has reached its TTL.

So it still has an effect if we set it above 60 seconds.
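That lease-time behaviour can be modelled roughly as follows (a standalone sketch, not the Apache HttpCore source): an expired connection is simply skipped when leasing, which is also why in-flight requests are never cut off.

```java
import java.util.List;
import java.util.Optional;

// Toy model of lease-time connection selection: the pool skips idle entries
// past their TTL deadline rather than closing busy ones mid-request.
// Illustrative only; not the Apache HttpCore implementation.
final class LeaseCheck {
    record Pooled(String id, long createdMillis, long ttlMillis, boolean idle) {}

    static Optional<Pooled> lease(List<Pooled> pool, long nowMillis) {
        return pool.stream()
            // only idle connections can be leased; in-flight ones are untouched
            .filter(Pooled::idle)
            // a connection past its TTL is never reused, even if recently active
            .filter(c -> nowMillis - c.createdMillis() < c.ttlMillis())
            .findFirst();
    }
}
```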
I don't have a strong opinion; 60s sounds like a good initial setting, and then we can revisit it if the workload changes (e.g. ELSER inference bursts at ingest). Some metrics/logs would help to keep an eye on this.
Pinging @elastic/ml-core (Team:ML)
* Adding ttl
* Using 60 seconds as default
💚 Backport successful
This PR allows the connection pool TTL value to be configured via a setting. It is currently only used for the EIS connection pool, to help triage the connection timeouts issue.