[CI] DiskThresholdDeciderIT testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleRestores failing #127286

elasticsearchmachine · 2025-04-23T21:02:04Z

Build Scans:

Reproduction Line:

./gradlew ":server:internalClusterTest" --tests "org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleRestores" -Dtests.seed=7102862F2F4E8B3E -Dtests.locale=en-DE -Dtests.timezone=Indian/Comoro -Druntime.java=24

Applicable branches:
main

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

java.lang.AssertionError: 
Expected: a collection with size <1>
     but: collection size was <0>

Issue Reasons:

[main] 3 failures in test testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleRestores (1.1% fail rate in 266 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.


elasticsearchmachine · 2025-04-23T21:02:07Z

This has been muted on branch 8.x

Mute Reasons:

[8.x] 2 failures in test testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleRestores (4.0% fail rate in 50 executions)

Build Scans:


elasticsearchmachine · 2025-04-23T21:02:32Z

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

JeremyDahlgren · 2025-04-30T20:31:59Z

It looks like there is a race condition: the tiny node can end up hosting a shard from either the original index or the index copy. The assertion at the end checks only for shards from the original index, so it fails when the tiny node instead holds a single shard from the index copy. To reproduce this more reliably, I forced usableSpace = shardSizes.getSmallestShardSize() and used indexRandom(true, indexName, 100) to build smaller shards and keep the usable space at the minimum.
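The race described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual allocator or test code: the class RestoreRaceSketch, the Shard type, the greedy "fits in remaining space" rule, and the index names "index" and "index-copy" are all hypothetical stand-ins. The real disk threshold decider works with watermarks rather than this simplified check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RestoreRaceSketch {

    /** Hypothetical stand-in for a shard being restored: owning index name plus size in bytes. */
    public static final class Shard {
        public final String index;
        public final long sizeBytes;

        public Shard(String index, long sizeBytes) {
            this.index = index;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Toy greedy allocation (an assumption, not Elasticsearch logic): walk the shards in
     * arrival order and place one on the tiny node only if it fits in the remaining space.
     */
    public static List<Shard> allocateToTinyNode(List<Shard> arrivalOrder, long usableSpaceBytes) {
        List<Shard> placed = new ArrayList<>();
        long remaining = usableSpaceBytes;
        for (Shard shard : arrivalOrder) {
            if (shard.sizeBytes <= remaining) {
                placed.add(shard);
                remaining -= shard.sizeBytes;
            }
        }
        return placed;
    }

    /** Counts how many of the placed shards belong to the given index. */
    public static long countForIndex(List<Shard> placed, String index) {
        return placed.stream().filter(s -> s.index.equals(index)).count();
    }

    public static void main(String[] args) {
        long usableSpace = 100L; // equal to the smallest shard size, so only one shard fits

        // The two concurrent restores can reach the tiny node in either order.
        List<Shard> originalFirst = Arrays.asList(new Shard("index", 100L), new Shard("index-copy", 100L));
        List<Shard> copyFirst = Arrays.asList(new Shard("index-copy", 100L), new Shard("index", 100L));

        List<Shard> a = allocateToTinyNode(originalFirst, usableSpace);
        List<Shard> b = allocateToTinyNode(copyFirst, usableSpace);

        System.out.println("original first: total=" + a.size() + " fromOriginal=" + countForIndex(a, "index"));
        System.out.println("copy first:     total=" + b.size() + " fromOriginal=" + countForIndex(b, "index"));
    }
}
```

In both orderings the tiny node holds exactly one shard, but a check that counts only shards of the original index sees zero when the copy arrives first, which matches the "collection size was <0>" failure.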

elasticsearchmachine · 2025-05-01T13:19:33Z

This has been muted on branch main

Mute Reasons:

[main] 3 failures in test testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleRestores (1.1% fail rate in 266 executions)

Build Scans:


The test launches two concurrent restores and wants to verify that the node with limited disk space is only assigned a single shard from one of the indices. The test was asserting that it had one shard from the first index, but it is possible for it to get one shard from the index copy instead. This change allows the shard to be from either index, but still asserts there is only one assignment to the tiny node. Closes elastic#127286
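The difference between the old and the relaxed check can be sketched as follows. This is a minimal illustration and not the code from the merged change: the class TinyNodeAssertionSketch, the ShardAssignment type, the helper shardsOnNode, and the names "index", "index-copy", and "tiny-node" are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class TinyNodeAssertionSketch {

    /** Hypothetical record of where a shard ended up: index name, shard id, and node id. */
    public static final class ShardAssignment {
        public final String index;
        public final int shardId;
        public final String nodeId;

        public ShardAssignment(String index, int shardId, String nodeId) {
            this.index = index;
            this.shardId = shardId;
            this.nodeId = nodeId;
        }
    }

    /** Returns the assignments on the given node that belong to any of the given indices. */
    public static List<ShardAssignment> shardsOnNode(List<ShardAssignment> all, String nodeId, Set<String> indices) {
        return all.stream()
            .filter(a -> a.nodeId.equals(nodeId) && indices.contains(a.index))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // One possible outcome of the race: the tiny node received a shard of the index copy.
        List<ShardAssignment> assignments = Arrays.asList(
            new ShardAssignment("index", 0, "big-node"),
            new ShardAssignment("index", 1, "big-node"),
            new ShardAssignment("index-copy", 0, "tiny-node"),
            new ShardAssignment("index-copy", 1, "big-node")
        );

        // Old check: only the original index is considered, so the result is empty here.
        Set<String> originalOnly = new HashSet<>(Arrays.asList("index"));
        System.out.println("strict:  " + shardsOnNode(assignments, "tiny-node", originalOnly).size());

        // Relaxed check: either index counts, but there must still be exactly one shard.
        Set<String> eitherIndex = new HashSet<>(Arrays.asList("index", "index-copy"));
        System.out.println("relaxed: " + shardsOnNode(assignments, "tiny-node", eitherIndex).size());
    }
}
```

The relaxed form still fails if the tiny node receives two shards, so the intent of the test (the node never exceeds its disk watermark) is preserved.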

The same change was cherry picked from commit 6263f44 in two further commits, one of which records conflicts in muted-tests.yml.

elasticsearchmachine added the :Distributed Coordination/Allocation and >test-failure labels Apr 23, 2025

elasticsearchmachine added a commit that referenced this issue Apr 23, 2025

Mute org.elasticsearch.cluster.routing.allocation.decider.DiskThresho…

27c4da0

…ldDeciderIT testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleRestores #127286

elasticsearchmachine added the needs:risk and Team:Distributed Coordination labels Apr 23, 2025

JeremyDahlgren self-assigned this Apr 29, 2025

JeremyDahlgren added the medium-risk label and removed the needs:risk label Apr 30, 2025

elasticsearchmachine added a commit that referenced this issue May 1, 2025

Mute org.elasticsearch.cluster.routing.allocation.decider.DiskThresho…

1af1f02

…ldDeciderIT testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleRestores #127286

JeremyDahlgren mentioned this issue May 1, 2025

Fix assertion in DiskThresholdDeciderIT.testRestoreSnapshotAllocationDoesNotExceedWatermarkWithMultipleRestores #127615

Merged

JeremyDahlgren closed this as completed in #127615 May 2, 2025

JeremyDahlgren closed this as completed in 6263f44 May 2, 2025
