Skip to content

Deprecate metadata_request_timeout #469

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dkropachev
Copy link
Collaborator

@dkropachev dkropachev commented May 7, 2025

When I was testing e4a000f, I have figure that out there is another option that controls system queries timeout - Cluster.control_connection_timeout, switch to use that it instead of Cluster.metadata_request_timeout, but did not remove Cluster.metadata_request_timeout.

Effectively Cluster.metadata_request_timeout is not working, what is working is Cluster.control_connection_timeout.
To address that let's mark Cluster.metadata_request_timeout it as deprecated in favor of Cluster.control_connection_timeout, to be removed in next minor release.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@dkropachev dkropachev requested a review from Lorak-mmk May 7, 2025 19:09
@dkropachev dkropachev self-assigned this May 7, 2025
When I was testing scylladb@e4a000f,
I have figure that out there is another option that controls system queries timeout - `Cluster.control_connection_timeout`,
switched to use it instead of `Cluster.metadata_request_timeout`, but did not remove `Cluster.metadata_request_timeout`.

Effectively `Cluster.metadata_request_timeout` is not working, what is working is `Cluster.control_connection_timeout`.
To address that let's mark `Cluster.metadata_request_timeout` it as deprecated in favor of `Cluster.control_connection_timeout`, to be removed in next minor release.
@dkropachev dkropachev force-pushed the dk/depricate-metadata_request_timeout branch from c0898eb to 8ef0565 Compare May 7, 2025 19:16
@Lorak-mmk
Copy link

The way I understand it control_connection_timeout was previously responsible for client-side timeout, and is now responsible for both (but we thought that metadata_request_timeout is responsible for server-side).
Do we always want client-side and server-side timeout to be the same?
If yes, then the fix makes sense. If not, then we should not deprecate it, and instead start using it for server-side timeout.

If we decide to keep this, fix, then apart from printing a warning I'd also set control_connection_timeout to the bigger of the 2 values (control_connection_timeout, metadata_request_timeout) to prevent client-side timeout from being significantly lower.

@Lorak-mmk
Copy link

Lorak-mmk commented May 7, 2025

Btw having integration tests for this functionality would have prevented this. We should not skip them in further PRs.

@dkropachev
Copy link
Collaborator Author

The way I understand it control_connection_timeout was previously responsible for client-side timeout, and is now responsible for both (but we thought that metadata_request_timeout is responsible for server-side). Do we always want client-side and server-side timeout to be the same? If yes, then the fix makes sense. If not, then we should not deprecate it, and instead start using it for server-side timeout.

I don't see any reason why they should not go together, only thing that I would have changed is to set server side timeout a bit less than client side, say 100ms less to offset possibl network delay.

If we decide to keep this, fix, then apart from printing a warning I'd also set control_connection_timeout to the bigger of the 2 values (control_connection_timeout, metadata_request_timeout) to prevent client-side timeout from being significantly lower.

no reason to do so, metadata_request_timeout is not working at all.

@Lorak-mmk
Copy link

no reason to do so, metadata_request_timeout is not working at all.

Right, my brain stopped working for a minute, sorry.

@dkropachev
Copy link
Collaborator Author

Btw having integration tests for this functionality would have prevented this. We should not skip them in further PRs.

What happened is I started working on the test and found that it is not working, from there I have found that there is control_connection_timeout and switched to it, dropping all the tests.
control_connection_timeout it self have test coverage.
So, only thing that would help is me carefully going over PR again after all these changes.

@mykaul
Copy link

mykaul commented May 8, 2025

  1. Don't deprecate a used feature. Not like that.
  2. There's a difference between the control connection timeout and how long it should take, after the connection is established, to fetch (sometimes very large) system data, such as the schema.

@dkropachev
Copy link
Collaborator Author

dkropachev commented May 8, 2025

  1. Don't deprecate a used feature. Not like that.

Please read #469 (comment) and PR Description, metadata_request_timeout is not working and never worked, it was introduced by mistake.
As of now, control_connection_timeout controls both server-side and client-side timeout.

  1. There's a difference between the control connection timeout and how long it should take, after the connection is established, to fetch (sometimes very large) system data, such as the schema.

control_connection_timeout is client-side timeout, metadata_request_timeout - supposed to be server-side timeout.
I don't see any good reason keep them separate, it only create space for missconfiguration which will result in having retries for completed requests.

@mykaul
Copy link

mykaul commented May 11, 2025

  1. Don't deprecate a used feature. Not like that.

Please read #469 (comment) and PR Description, metadata_request_timeout is not working and never worked, it was introduced by mistake. As of now, control_connection_timeout controls both server-side and client-side timeout.

  1. There's a difference between the control connection timeout and how long it should take, after the connection is established, to fetch (sometimes very large) system data, such as the schema.

control_connection_timeout is client-side timeout, metadata_request_timeout - supposed to be server-side timeout. I don't see any good reason keep them separate, it only create space for missconfiguration which will result in having retries for completed requests.

I reiterate that there was a good reason to use a separate, server-side timeout, for the schema fetch queries. There's a separate phase to the connection - which is lengthy, and you don't wish to retry the whole connection, just because of the big schema (which causes a long schema query). This is a Scylla-specific feature we can take advantage of (that Cassandra may not have - did not check it). We've implemented it in Java too and I think it's useful. If it doesn't work here, we should fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants