SERVER-110110 Document rule based rewriter (#44992)

GitOrigin-RevId: 101e731ec2bb65957ad1ac200f74632597e7dcc7
This commit is contained in:
Henri Nikku 2025-12-10 16:50:34 +00:00 committed by MongoDB Bot
parent 701fc76a51
commit 0b456b6711
2 changed files with 25 additions and 4 deletions

View File

@ -7,8 +7,6 @@ After an aggregate command issued by a user is parsed into a `Pipeline`, it unde
1. [**Inter-stage Optimization**](#inter-stage-optimization): optimizes the entire `Pipeline` object, which is represented internally as a container of `DocumentSource`s. This modifies the container by combining, swapping, dropping, and/or inserting stages.
1. [**Stage-specific Optimization**](#stage-specific-optimization): optimizes each stage, or `DocumentSource` individually.
<!-- TODO(SERVER-110110): Add links to RBR docs where applicable. -->
For information on how to register new rewrites, see [Registering new rewrites](#registering-new-rewrites).
> ### Aside: Disabling Optimizations
@ -144,7 +142,7 @@ For implementation details about dependency tracking and validation, refer to th
## Stage-specific Optimization
Once we have the final order of the stages, we go through each stage and call [`DocumentSource::optimize()`](https://github.com/mongodb/mongo/blob/65b9efd4861b9f0d61f8b29843d29febcba91bcb/src/mongo/db/pipeline/document_source.h#L507) on each one (called from [`Pipeline::optimizeEachStage()`](https://github.com/mongodb/mongo/blob/65b9efd4861b9f0d61f8b29843d29febcba91bcb/src/mongo/db/pipeline/pipeline.cpp#L383)), either returning an optimized `DocumentSource` that's semantically equivalent or removing the stage if it's a no-op. For instance, a no-op stage like `{$match: {}}` would be removed.
Once we have the final order of the stages, we again invoke the [rule-based rewrite engine](../query/compiler/rewrites/README.md), but this time with a configuration that only run rules that perform stage-specific optimizations. These rules are currently implemented as public `optimize()` methods on `DocumentSource` subclasses and registered as unconditional rules. Each of these rules either returns an optimized `DocumentSource` that's semantically equivalent or removes the current stage if it's a no-op. For instance, a no-op stage like `{$match: {}}` would be removed.
The `MatchExpression` in a `$match` stage contains specific rewrite logic that is covered in greater detail [here](../matcher/README.md).
@ -226,7 +224,7 @@ graph TD
## Registering new rewrites
All pipeline rewrites are invoked through the [rule-based rewrite engine](https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L196). While most rewrites are still implemented inside `DocumentSource::optimizeAt()` and `optimize()` and registered as unconditional rules (i.e., rules where the precondition is always true), new rewrites should be implemented and registered as their own, separate rules. A rule is defined by a name, precondition and transform functions, a priority and a set of tags: https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L51-L81
All pipeline rewrites are invoked through the [rule-based rewrite engine](https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L196) (see [README](../query/compiler/rewrites/README.md)). While most rewrites are still implemented inside `DocumentSource::optimizeAt()` and `optimize()` and registered as unconditional rules (i.e., rules where the precondition is always true), new rewrites should be implemented and registered as their own, separate rules. A rule is defined by a name, precondition and transform functions, a priority and a set of tags: https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L51-L81
### Rule registry and registration macros

View File

@ -0,0 +1,23 @@
# Rule-based Rewrite Engine
## Overview
The rule-based rewrite engine is a simple but generic-purpose engine for applying sets of rewrite rules to a data structure. It is currently only used for [optimizing aggregation pipelines](https://github.com/mongodb/mongo/blob/e4bf22b6936f3795e11890c908521825120c8a05/src/mongo/db/pipeline/README.md). The following sections describe different components that make up the engine ([the rules](#rules), [the rewrite context](#rewrite-context), and [the engine](#rewrite-engine) itself).
## Rules
The rewrite engine executes rules, which are defined by a name, precondition and transform functions, a priority and a set of tags. The precondition function determines whether the transform function should run. Priority is used to determine the order in which rules are applied when multiple rules may apply to the same element. The tags allow the engine to be invoked to only apply a certain subset of rules.
https://github.com/mongodb/mongo/blob/d8c7211ff2b04e961019b3939500221b94149931/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L51-L81
## Rewrite Engine
The engine is a [generic class](https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L164-L165) responsible for driving the rewrite process and maintaining a priority queue of rules that are applicable to the element that is being rewritten. It can be specialized to work with any data structure by providing it with an implementation of the [rewrite context](#rewrite-context) that knows how to walk and modify that structure. The engine is invoked by calling the [`applyRules()`](https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L181) method (see [`optimize.cpp`](https://github.com/mongodb/mongo/blob/126ab84794ef530fd2503453c9f8828743a4e7e7/src/mongo/db/pipeline/optimization/optimize.cpp#L44-L48) for example usage). The rewrite process is essentially a loop that asks the rewrite context for all rules that can apply to the current element, attempts them in priority order, and either advances to the next element or retries the rules on the same element depending on whether any transform reported that it changed the position of the current element.
https://github.com/mongodb/mongo/blob/126ab84794ef530fd2503453c9f8828743a4e7e7/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L181-L203
Besides constructing the engine and calling [`applyRules()`](https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L181), users of the engine should not interact with it directly. Rules never interact with the engine directly either.
## Rewrite Context
The rewrite engine itself is agnostic to the details of the data structure that it is rewriting. It relies on the interface provided by a concrete [`RewriteContext`](https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L90) implementation to walk and modify the structure, and to decide which rules can apply to which elements. The interface is defined as follows: https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/query/compiler/rewrites/rule_based_rewriter.h#L92-L112
Similarly, rules have access to the context and can use it to enqueue additional rules. The context can also expose additional helpers to rules, e.g. for modifying the structure that is being rewritten. See [`rule_based_rewrites::pipeline::Transforms`](https://github.com/mongodb/mongo/blob/0e6163a2018345a86baf5bd4bff03cefd224daec/src/mongo/db/pipeline/optimization/rule_based_rewriter.h#L202) for an example.