SERVER-106474 Enforce a maximum idle timeout for resmoke tasks on required variants (#40934)

GitOrigin-RevId: 1df2ed0c9ad0a0aaf89008b9b5fa0830e7fdc609
Sean Lyons 2025-09-04 13:19:03 -04:00 committed by MongoDB Bot
parent 3be04ed957
commit 851e20dbaf
5 changed files with 37 additions and 30 deletions

.github/CODEOWNERS

@@ -253,6 +253,10 @@ WORKSPACE.bazel @10gen/devprod-build @svc-auto-approve-bot
 /buildscripts/smoke_tests/**/server_storage_engine_integration.yml @10gen/server-storage-engine-integration @svc-auto-approve-bot
 /buildscripts/smoke_tests/**/server_ttl.yml @10gen/server-ttl @svc-auto-approve-bot
+# The following patterns are parsed from ./buildscripts/tests/OWNERS.yml
+/buildscripts/tests/ @10gen/devprod-build @svc-auto-approve-bot
+/buildscripts/tests/test_evergreen_task_timeout.py @10gen/devprod-correctness @svc-auto-approve-bot
 # The following patterns are parsed from ./buildscripts/tests/burn_in_tests_end2end/OWNERS.yml
 /buildscripts/tests/burn_in_tests_end2end/ @10gen/devprod-correctness @svc-auto-approve-bot

@@ -50,7 +50,7 @@ DEFAULT_NON_REQUIRED_BUILD_TIMEOUT = timedelta(hours=2)
 # An idle timeout will expire in the presence of an exceptionally long running test in a resmoke task.
 # This helps prevent the introduction of new long-running tests in required build variants.
-DEFAULT_REQUIRED_BUILD_IDLE_TIMEOUT = timedelta(minutes=16)
+MAXIMUM_REQUIRED_BUILD_IDLE_TIMEOUT = timedelta(minutes=16)
 class TimeoutOverride(BaseModel):
@@ -296,14 +296,14 @@ class TaskTimeoutOrchestrator:
 LOGGER.info("Overriding configured timeout", idle_timeout_secs=override.total_seconds())
 determined_timeout = override
-elif self._is_required_build_variant(variant) and (
-determined_timeout is None or determined_timeout > DEFAULT_REQUIRED_BUILD_IDLE_TIMEOUT
+if self._is_required_build_variant(variant) and (
+determined_timeout is None or determined_timeout > MAXIMUM_REQUIRED_BUILD_IDLE_TIMEOUT
 ):
 LOGGER.info(
 "Overriding required-builder idle timeout",
-idle_timeout_secs=DEFAULT_REQUIRED_BUILD_IDLE_TIMEOUT.total_seconds(),
+idle_timeout_secs=MAXIMUM_REQUIRED_BUILD_IDLE_TIMEOUT.total_seconds(),
 )
-determined_timeout = DEFAULT_REQUIRED_BUILD_IDLE_TIMEOUT
+determined_timeout = MAXIMUM_REQUIRED_BUILD_IDLE_TIMEOUT
 return determined_timeout
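
Because the required-variant check is now a standalone `if` rather than an `elif`, the 16-minute cap applies even after an explicit override has been chosen. Below is a minimal sketch of that precedence. It assumes a simplified, hypothetical `determine_idle_timeout` function with plain arguments, which is not the signature used by `TaskTimeoutOrchestrator`. It also assumes that the preceding branch is taken whenever an override is present.

```python
from datetime import timedelta
from typing import Optional

# Value taken from the diff; the function below is a hypothetical simplification.
MAXIMUM_REQUIRED_BUILD_IDLE_TIMEOUT = timedelta(minutes=16)


def determine_idle_timeout(
    configured: Optional[timedelta],
    override: Optional[timedelta],
    is_required: bool,
) -> Optional[timedelta]:
    """Sketch of the idle-timeout precedence after this commit (hypothetical helper)."""
    determined = configured
    if override is not None:
        # An explicit override replaces whatever was configured.
        determined = override
    # Standalone `if` (formerly `elif`): the cap is checked even when an override was used.
    if is_required and (determined is None or determined > MAXIMUM_REQUIRED_BUILD_IDLE_TIMEOUT):
        determined = MAXIMUM_REQUIRED_BUILD_IDLE_TIMEOUT
    return determined


# An override above the maximum is clamped on a required variant.
print(determine_idle_timeout(None, timedelta(minutes=30), is_required=True))   # 0:16:00
# The same override is left alone on a non-required variant.
print(determine_idle_timeout(None, timedelta(minutes=30), is_required=False))  # 0:30:00
# A value shorter than the maximum is kept on a required variant.
print(determine_idle_timeout(None, timedelta(minutes=5), is_required=True))    # 0:05:00
```

Under the old `elif`, the first call would have returned the 30-minute override unchanged, which is the gap this commit closes.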

@@ -0,0 +1,8 @@
+version: 2.0.0
+filters:
+- "*":
+approvers:
+- 10gen/devprod-build
+- "test_evergreen_task_timeout.py":
+approvers:
+- 10gen/devprod-correctness
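
The three new CODEOWNERS lines in the first file mirror this new OWNERS.yml. The `"*"` filter becomes the directory pattern, and the file-name filter becomes a path under that directory. Each approver is prefixed with `@` and followed by `@svc-auto-approve-bot`. The sketch below reproduces only that mapping. `codeowners_lines` is a hypothetical name, and the commit does not show the real generator. The parsed shape of each filter entry is also an assumption about how the YAML loads: a pattern key with a `None` value alongside an `approvers` list.

```python
from typing import Dict, List

AUTO_APPROVE_BOT = "@svc-auto-approve-bot"


def codeowners_lines(directory: str, filters: List[Dict]) -> List[str]:
    """Hypothetical: render OWNERS.yml-style filters as CODEOWNERS lines."""
    lines = [f"# The following patterns are parsed from ./{directory}/OWNERS.yml"]
    for entry in filters:
        # Assumed shape: the pattern is the only key other than "approvers".
        pattern = next(key for key in entry if key != "approvers")
        # "*" covers the whole directory; any other pattern is a path inside it.
        path = f"/{directory}/" if pattern == "*" else f"/{directory}/{pattern}"
        owners = " ".join(f"@{team}" for team in entry["approvers"])
        lines.append(f"{path} {owners} {AUTO_APPROVE_BOT}")
    return lines


filters = [
    {"*": None, "approvers": ["10gen/devprod-build"]},
    {"test_evergreen_task_timeout.py": None, "approvers": ["10gen/devprod-correctness"]},
]
for line in codeowners_lines("buildscripts/tests", filters):
    print(line)
```

Line order matters here. In a GitHub CODEOWNERS file the last matching pattern takes precedence. The file-specific entry must therefore come after the directory entry for 10gen/devprod-correctness to own `test_evergreen_task_timeout.py`.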

@@ -409,7 +409,7 @@ class TestDetermineIdleTimeout(unittest.TestCase):
 build_variant="variant-required",
 display_name="! required",
 timeout_override=None,
-expected_timeout=under_test.DEFAULT_REQUIRED_BUILD_IDLE_TIMEOUT,
+expected_timeout=under_test.MAXIMUM_REQUIRED_BUILD_IDLE_TIMEOUT,
 )
 def test_prefer_shorter_that_default_on_required_variants(self):

@@ -1,35 +1,30 @@
 # Evergreen Task Timeouts
-## Type of timeouts
+## Types of timeouts
-There are two types of timeouts that [evergreen supports](https://github.com/evergreen-ci/evergreen/wiki/Project-Commands#timeoutupdate):
+There are two types of timeouts that [Evergreen supports](https://github.com/evergreen-ci/evergreen/wiki/Project-Commands#timeoutupdate):
-- **Exec timeout**: The _exec_ timeout is the overall timeout for a task. Once the total runtime for
-a test hits this value, the timeout logic will be triggered. This value is specified by
-**exec_timeout_secs** in the evergreen configuration.
-- **Idle timeout**: The _idle_ timeout is the amount of time in which evergreen will wait for
-output to be created before it considers the task hung and triggers timeout logic. This value
-is specified by **timeout_secs** in the evergreen configuration.
+- **Exec Timeout**: The _exec timeout_ is the overall timeout for a task. Once the total runtime for a test exceeds this value, the timeout logic will be triggered. This value is specified by `exec_timeout_secs` in the Evergreen configuration.
+- **Idle Timeout**: The _idle timeout_ is the amount of time Evergreen will wait for output to be generated before considering the task hung and triggering the timeout logic. This value is specified by `timeout_secs` in the Evergreen configuration.
-**Note**: In most cases, **exec_timeout** is usually the more useful of the timeouts.
+**Note**: In most cases, the **exec timeout** is the more useful of the two timeouts.
 ## Setting the timeout for a task
-There are a few ways in which the timeout can be determined for a task running in evergreen.
+There are several ways to set the timeout for a task running in Evergreen.
-- **Specified in 'etc/evergreen.yml'**: Timeout can be specified directly in the 'evergreen.yml' file,
-both on tasks and build variants. This can be useful for setting default timeout values, but is limited
-since different build variants frequently have different runtime characteristics and it is not possible
-to set timeouts for a task running on a specific build variant.
+### Specifying timeouts in the Evergreen YAML configuration
-- **etc/evergreen_timeouts.yml**: The 'etc/evergreen_timeouts.yml' file for overriding timeouts
-for specific tasks on specific build variants. This provides a work-around for the limitations of
-specifying the timeouts directly in the 'evergreen.yml'. In order to use this method, the task
-must run the "determine task timeout" and "update task timeout expansions" functions at the beginning
-of the task evergreen definition. Most resmoke tasks already do this.
+Timeouts can be specified directly in the `evergreen.yml` (and related) files, both for tasks and build variants. This approach is useful for setting default timeout values but is limited because different build variants often have varying runtime characteristics. This means it is not possible to set timeouts for a specific task running on a specific build variant using only this method.
-- **buildscripts/evergreen_task_timeout.py**: This is the script that reads the 'etc/evergreen_timeouts.yml'
-file and calculates the timeout to use. Additionally, it will check the historic test results of the
-task being run and see if there is enough information to calculate timeouts based on that. It can
-also be used for more advanced ways of determining timeouts (e.g. the script is used to set much
-more aggressive timeouts on tasks that are run in the commit-queue).
+### Overrides: [etc/evergreen_timeouts.yml](../../etc/evergreen_timeouts.yml)
+The `etc/evergreen_timeouts.yml` file allows overriding timeouts for specific tasks on specific build variants. This workaround helps address the limitations of directly specifying timeouts in `evergreen.yml`. To use this method, the task must include the `determine task timeout` and `update task timeout expansions` functions at the beginning of its Evergreen definition. Many Resmoke tasks already incorporate these functions.
+### Resmoke tasks: [buildscripts/evergreen_task_timeout.py](../../buildscripts/evergreen_task_timeout.py)
+This script reads the `etc/evergreen_timeouts.yml` file to calculate the appropriate timeout settings. Additionally, it checks historical test results for the task being run to determine if enough information is available to calculate timeouts based on past data. The script also supports more advanced methods of determining timeouts, such as applying aggressive timeout measures for tasks executed in the commit queue or on required build variants. In cases of conflict, the commit queue and required build variant limits take precedence over the previous two methods.
+### Compile tasks: [evergreen/generate_override_timeout.py](../../evergreen/generate_override_timeout.py)
+This script is used for compile tasks defined in files such as `etc/evergreen_yml_components/tasks/compile_tasks.yml` and `etc/evergreen_yml_components/tasks/compile_tasks_shared.yml`. The script reads the `etc/evergreen_timeouts.yml` file and calculates appropriate timeouts. The Evergreen function `override task timeout` then runs this script to update the timeouts accordingly.
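
To make the difference between the two timeouts concrete, the sketch below models them as the documentation describes them. `exec_timeout_secs` bounds the total runtime, while `timeout_secs` bounds the gap between consecutive pieces of output. `first_timeout` is a hypothetical helper that works on output timestamps; it is not Evergreen's agent logic.

```python
from typing import List, Optional


def first_timeout(
    output_times: List[float],
    task_end: float,
    exec_timeout_secs: float,
    timeout_secs: float,
) -> Optional[str]:
    """Hypothetical model: report which timeout fires first, or None if the task finishes.

    output_times: seconds since task start at which output was produced.
    task_end: seconds since task start at which the task would finish on its own.
    """
    last_output = 0.0
    # Each output event resets the idle window; the task end is the final event.
    for event in sorted(output_times) + [task_end]:
        idle_deadline = last_output + timeout_secs
        deadline = min(idle_deadline, exec_timeout_secs)
        if event > deadline:
            # Nothing happened before the earliest deadline, so that deadline fires.
            return "idle" if idle_deadline < exec_timeout_secs else "exec"
        last_output = event
    return None


# Steady output and a quick finish: no timeout.
print(first_timeout([10, 20, 30], 40, exec_timeout_secs=3600, timeout_secs=960))  # None
# One silent test runs for about 33 minutes: the idle timeout fires first.
print(first_timeout([10], 2000, exec_timeout_secs=3600, timeout_secs=960))  # idle
# Output every 100 s, but the task runs past one hour: the exec timeout fires.
print(first_timeout(list(range(100, 4001, 100)), 4000, exec_timeout_secs=3600, timeout_secs=960))  # exec
```

The second call shows the case the required-variant cap targets. A single test that produces no output for longer than 960 seconds (16 minutes) trips the idle timeout well before the exec timeout would.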