diff --git a/README.md b/README.md index 88018ed5708..752e82ee790 100644 --- a/README.md +++ b/README.md @@ -4,92 +4,89 @@ Welcome to MongoDB! ## Components - - `mongod` - The database server. - - `mongos` - Sharding router. - - `mongo` - The database shell (uses interactive javascript). - +- `mongod` - The database server. +- `mongos` - Sharding router. +- `mongo` - The database shell (uses interactive javascript). ## Download MongoDB - - https://www.mongodb.com/try/download/community - - Using homebrew `brew tap mongodb/brew` - - Using docker image `docker pull mongo` +- https://www.mongodb.com/try/download/community +- Using homebrew `brew tap mongodb/brew` +- Using docker image `docker pull mongo` ## Building - See [Building MongoDB](docs/building.md). +See [Building MongoDB](docs/building.md). ## Running - For command line options invoke: +For command line options invoke: - ```bash - $ ./mongod --help - ``` +```bash +$ ./mongod --help +``` - To run a single server database: +To run a single server database: - ```bash - $ sudo mkdir -p /data/db - $ ./mongod - $ - $ # The mongo javascript shell connects to localhost and test database by default: - $ ./mongo - > help - ``` +```bash +$ sudo mkdir -p /data/db +$ ./mongod +$ +$ # The mongo javascript shell connects to localhost and test database by default: +$ ./mongo +> help +``` ## Installing Compass - You can install compass using the `install_compass` script packaged with MongoDB: +You can install compass using the `install_compass` script packaged with MongoDB: - ```bash - $ ./install_compass - ``` +```bash +$ ./install_compass +``` - This will download the appropriate MongoDB Compass package for your platform - and install it. +This will download the appropriate MongoDB Compass package for your platform +and install it. ## Drivers - Client drivers for most programming languages are available at - https://docs.mongodb.com/manual/applications/drivers/. Use the shell - (`mongo`) for administrative tasks. +Client drivers for most programming languages are available at +https://docs.mongodb.com/manual/applications/drivers/. Use the shell +(`mongo`) for administrative tasks. ## Bug Reports - See https://github.com/mongodb/mongo/wiki/Submit-Bug-Reports. +See https://github.com/mongodb/mongo/wiki/Submit-Bug-Reports. ## Packaging - Packages are created dynamically by the [buildscripts/packager.py](buildscripts/packager.py) script. - This will generate RPM and Debian packages. +Packages are created dynamically by the [buildscripts/packager.py](buildscripts/packager.py) script. +This will generate RPM and Debian packages. -## Learn MongoDB +## Learn MongoDB - - Documentation - https://docs.mongodb.com/manual/ - - Developer Center - https://www.mongodb.com/developer/ - - MongoDB University - https://learn.mongodb.com +- Documentation - https://docs.mongodb.com/manual/ +- Developer Center - https://www.mongodb.com/developer/ +- MongoDB University - https://learn.mongodb.com ## Cloud Hosted MongoDB - https://www.mongodb.com/cloud/atlas +https://www.mongodb.com/cloud/atlas ## Forums - - https://mongodb.com/community/forums/ +- https://mongodb.com/community/forums/ - Technical questions about using MongoDB. + Technical questions about using MongoDB. - - https://mongodb.com/community/forums/c/server-dev - - Technical questions about building and developing MongoDB. +- https://mongodb.com/community/forums/c/server-dev + Technical questions about building and developing MongoDB. ## LICENSE - MongoDB is free and the source is available. 
Versions released prior to - October 16, 2018 are published under the AGPL. All versions released after - October 16, 2018, including patch fixes for prior versions, are published - under the [Server Side Public License (SSPL) v1](LICENSE-Community.txt). - See individual files for details. - +MongoDB is free and the source is available. Versions released prior to +October 16, 2018 are published under the AGPL. All versions released after +October 16, 2018, including patch fixes for prior versions, are published +under the [Server Side Public License (SSPL) v1](LICENSE-Community.txt). +See individual files for details. diff --git a/README.third_party.md b/README.third_party.md index e22f95108bd..fdd57e9419c 100644 --- a/README.third_party.md +++ b/README.third_party.md @@ -19,54 +19,54 @@ not authored by MongoDB, and has a license which requires reproduction, a notice will be included in `THIRD-PARTY-NOTICES`. -| Name | License | Vendored Version | Emits persisted data | Distributed in Release Binaries | -| ---------------------------| ----------------- | ------------------| :------------------: | :-----------------------------: | -| [abseil-cpp] | Apache-2.0 | 20230802.1 | | ✗ | -| [Aladdin MD5] | Zlib | Unknown | ✗ | ✗ | -| [ASIO] | BSL-1.0 | 1.12.2 | | ✗ | -| [benchmark] | Apache-2.0 | 1.5.2 | | | -| [Boost] | BSL-1.0 | 1.79.0 | | ✗ | -| [c-ares] | MIT | 1.19.1 | | ✗ | -| [double-conversion] | ??? | ??? | | ??? | -| [fmt] | BSD-2-Clause | 7.1.3 | | ✗ | -| [GPerfTools] | BSD-3-Clause | 2.9.1 | | ✗ | -| [gRPC] | Apache-2.0 | 1.59.2 | | ✗ | -| [ICU4] | ICU | 57.1 | ✗ | ✗ | -| [immer] | BSL-1.0 | d98a68c + changes | | ✗ | -| [Intel Decimal FP Library] | BSD-3-Clause | 2.0 Update 1 | | ✗ | -| [JSON-Schema-Test-Suite] | MIT | 728066f9c5 | | | -| [libstemmer] | BSD-3-Clause | Unknown | ✗ | ✗ | -| [librdkafka] | BSD-2-Clause | 2.0.2 | | | -| [libmongocrypt] | Apache-2.0 | 1.8.4 | ✗ | ✗ | -| [linenoise] | BSD-3-Clause | 6cdc775 + changes | | ✗ | -| [mongo-c-driver] | Apache-2.0 | 1.23.0 | ✗ | ✗ | -| [MozJS] | MPL-2.0 | ESR 91.3.0 | | ✗ | -| [MurmurHash3] | Public Domain | a6bd3ce + changes | ✗ | ✗ | -| [ocspbuilder] | MIT | 0.10.2 | | | -| [ocspresponder] | Apache-2.0 | 0.5.0 | | | -| [pcre2] | BSD-3-Clause | 10.40 | | ✗ | -| [protobuf] | BSD-3-Clause | 4.25.0 | | ✗ | -| [re2] | BSD-3-Clause | 2021-09-01 | | ✗ | -| [S2] | Apache-2.0 | Unknown | ✗ | ✗ | -| [SafeInt] | MIT | 3.0.26 | | | -| [schemastore.org] | Apache-2.0 | 6847cfc3a1 | | | -| [scons] | MIT | 3.1.2 | | | -| [Snappy] | BSD-3-Clause | 1.1.10 | ✗ | ✗ | -| [timelib] | MIT | 2022.04 | | ✗ | -| [TomCrypt] | Public Domain | 1.18.2 | ✗ | ✗ | -| [Unicode] | Unicode-DFS-2015 | 8.0.0 | ✗ | ✗ | -| [libunwind] | MIT | 1.6.2 + changes | | ✗ | -| [Valgrind] | BSD-4-Clause\[1] | 3.17.0 | | ✗ | -| [wiredtiger] | | \[2] | ✗ | ✗ | -| [yaml-cpp] | MIT | 0.6.3 | | ✗ | -| [Zlib] | Zlib | 1.3 | ✗ | ✗ | -| [Zstandard] | BSD-3-Clause | 1.5.5 | ✗ | ✗ | +| Name | License | Vendored Version | Emits persisted data | Distributed in Release Binaries | +| -------------------------- | -------------------------------------------------------------- | -------------------------------------------------- | :------------------: | :-----------------------------: | +| [abseil-cpp] | Apache-2.0 | 20230802.1 | | ✗ | +| [Aladdin MD5] | Zlib | Unknown | ✗ | ✗ | +| [ASIO] | BSL-1.0 | 1.12.2 | | ✗ | +| [benchmark] | Apache-2.0 | 1.5.2 | | | +| [Boost] | BSL-1.0 | 1.79.0 | | ✗ | +| [c-ares] | MIT | 1.19.1 | | ✗ | +| [double-conversion] | ??? | ??? | | ??? 
| +| [fmt] | BSD-2-Clause | 7.1.3 | | ✗ | +| [GPerfTools] | BSD-3-Clause | 2.9.1 | | ✗ | +| [gRPC] | Apache-2.0 | 1.59.2 | | ✗ | +| [ICU4] | ICU | 57.1 | ✗ | ✗ | +| [immer] | BSL-1.0 | d98a68c + changes | | ✗ | +| [Intel Decimal FP Library] | BSD-3-Clause | 2.0 Update 1 | | ✗ | +| [JSON-Schema-Test-Suite] | MIT | 728066f9c5 | | | +| [libstemmer] | BSD-3-Clause | Unknown | ✗ | ✗ | +| [librdkafka] | BSD-2-Clause | 2.0.2 | | | +| [libmongocrypt] | Apache-2.0 | 1.8.4 | ✗ | ✗ | +| [linenoise] | BSD-3-Clause | 6cdc775 + changes | | ✗ | +| [mongo-c-driver] | Apache-2.0 | 1.23.0 | ✗ | ✗ | +| [MozJS] | MPL-2.0 | ESR 91.3.0 | | ✗ | +| [MurmurHash3] | Public Domain | a6bd3ce + changes | ✗ | ✗ | +| [ocspbuilder] | MIT | 0.10.2 | | | +| [ocspresponder] | Apache-2.0 | 0.5.0 | | | +| [pcre2] | BSD-3-Clause | 10.40 | | ✗ | +| [protobuf] | BSD-3-Clause | 4.25.0 | | ✗ | +| [re2] | BSD-3-Clause | 2021-09-01 | | ✗ | +| [S2] | Apache-2.0 | Unknown | ✗ | ✗ | +| [SafeInt] | MIT | 3.0.26 | | | +| [schemastore.org] | Apache-2.0 | 6847cfc3a1 | | | +| [scons] | MIT | 3.1.2 | | | +| [Snappy] | BSD-3-Clause | 1.1.10 | ✗ | ✗ | +| [timelib] | MIT | 2022.04 | | ✗ | +| [TomCrypt] | Public Domain | 1.18.2 | ✗ | ✗ | +| [Unicode] | Unicode-DFS-2015 | 8.0.0 | ✗ | ✗ | +| [libunwind] | MIT | 1.6.2 + changes | | ✗ | +| [Valgrind] | BSD-4-Clause\[1] | 3.17.0 | | ✗ | +| [wiredtiger] | | \[2] | ✗ | ✗ | +| [yaml-cpp] | MIT | 0.6.3 | | ✗ | +| [Zlib] | Zlib | 1.3 | ✗ | ✗ | +| [Zstandard] | BSD-3-Clause | 1.5.5 | ✗ | ✗ | [abseil-cpp]: https://github.com/abseil/abseil-cpp [ASIO]: https://github.com/chriskohlhoff/asio [benchmark]: https://github.com/google/benchmark [Boost]: http://www.boost.org/ -[double-conversion]: https://github.com/google/double-conversion (transitive dependency of MozJS) +[double-conversion]: https://github.com/google/double-conversion "transitive dependency of MozJS" [fmt]: http://fmtlib.net/ [GPerfTools]: https://github.com/gperftools/gperftools [gRPC]: https://github.com/grpc/grpc @@ -132,26 +132,25 @@ these libraries. Releases prepared in this fashion will include a copy of these libraries' license in a file named `THIRD-PARTY-NOTICES.windows`. -| Name | Enterprise Only | Has Windows DLLs | -| :--------- | :-------------: | :--------------: | -| Cyrus SASL | Yes | Yes | -| libldap | Yes | No | -| net-snmp | Yes | Yes | -| OpenSSL | No | Yes\[3] | -| libcurl | No | No | - +| Name | Enterprise Only | Has Windows DLLs | +| :--------- | :-------------: | :-----------------------------------------------------: | +| Cyrus SASL | Yes | Yes | +| libldap | Yes | No | +| net-snmp | Yes | Yes | +| OpenSSL | No | Yes\[3] | +| libcurl | No | No | ## Notes: 1. ^ - The majority of Valgrind is licensed under the GPL, with the exception of a single - header file which is licensed under a BSD license. This BSD licensed header is the only - file from Valgrind which is vendored and consumed by MongoDB. + The majority of Valgrind is licensed under the GPL, with the exception of a single + header file which is licensed under a BSD license. This BSD licensed header is the only + file from Valgrind which is vendored and consumed by MongoDB. 2. ^ - WiredTiger is maintained by MongoDB in a separate repository. As a part of our - development process, we periodically ingest the latest snapshot of that repository. + WiredTiger is maintained by MongoDB in a separate repository. As a part of our + development process, we periodically ingest the latest snapshot of that repository. 3. 
^ - OpenSSL is only shipped as a dependency of the MongoDB tools written in Go. The MongoDB - shell and server binaries use Windows' cryptography APIs. + OpenSSL is only shipped as a dependency of the MongoDB tools written in Go. The MongoDB + shell and server binaries use Windows' cryptography APIs. diff --git a/bazel/README.md b/bazel/README.md index cdc4a7e9eac..a0a2ee654c6 100644 --- a/bazel/README.md +++ b/bazel/README.md @@ -1,3 +1,4 @@ # MongoDB Bazel Documentation -- [Developer Workflow](docs/developer_workflow.md) -- [Best Practices](docs/best_practices.md) + +- [Developer Workflow](docs/developer_workflow.md) +- [Best Practices](docs/best_practices.md) diff --git a/bazel/docs/developer_workflow.md b/bazel/docs/developer_workflow.md index 47f4ab3fd37..b6d19f88221 100644 --- a/bazel/docs/developer_workflow.md +++ b/bazel/docs/developer_workflow.md @@ -11,10 +11,10 @@ The Bazel equivalent of SConscript files are BUILD.bazel files. src/mongo/BUILD.bazel would contain: mongo_cc_binary( - name = "hello_world", - srcs = [ - "hello_world.cpp" - ], + name = "hello_world", + srcs = [ + "hello_world.cpp" + ], } Once you've obtained bazelisk by running **evergreen/get_bazelisk.sh**, you can then build this target via "bazelisk build": @@ -25,7 +25,7 @@ Or run this target via "bazelisk run": ./bazelisk run //src/mongo:hello_world -The full target name is a combination between the directory of the BUILD.bazel file and the target name: +The full target name is a combination between the directory of the BUILD.bazel file and the target name: //{BUILD.bazel dir}:{targetname} @@ -36,32 +36,31 @@ Bazel makes use of static analysis wherever possible to improve execution and qu The divergence from SCons is that now source files have to be declared in addition to header files. mongo_cc_binary( - name = "hello_world", - srcs = [ - "hello_world.cpp", - "new_source.cpp" # If adding a source file - ], - hdrs = [ - "new_header.h" # If adding a header file - ], + name = "hello_world", + srcs = [ + "hello_world.cpp", + "new_source.cpp" # If adding a source file + ], + hdrs = [ + "new_header.h" # If adding a header file + ], } ## Adding a New Library The DevProd Build Team created MongoDB-specific macros for the different types of build targets you may want to specify. These include: - - mongo_cc_binary - - mongo_cc_library - - idl_generator - +- mongo_cc_binary +- mongo_cc_library +- idl_generator Creating a new library is similar to the steps above for creating a new binary. A new **mongo_cc_library** definition would be created in the BUILD.bazel file. mongo_cc_library( - name = "new_library", - srcs = [ - "new_library_source_file.cpp" - ] + name = "new_library", + srcs = [ + "new_library_source_file.cpp" + ] } ## Declaring Dependencies @@ -69,20 +68,20 @@ Creating a new library is similar to the steps above for creating a new binary. If a library or binary depends on another library, this must be declared in the **deps** section of the target. The syntax for referring to the library is the same syntax used in the bazelisk build/run command. mongo_cc_library( - name = "new_library", - # ... + name = "new_library", + # ... 
} - + mongo_cc_binary( - name = "hello_world", - srcs = [ - "hello_world.cpp", - ], - deps = [ - ":new_library", # if referring to the library declared in the same directory as this build file - # "//src/mongo:new_library" # absolute path - # "sub_directory:new_library" # relative path of a subdirectory - ], + name = "hello_world", + srcs = [ + "hello_world.cpp", + ], + deps = [ + ":new_library", # if referring to the library declared in the same directory as this build file + # "//src/mongo:new_library" # absolute path + # "sub_directory:new_library" # relative path of a subdirectory + ], } ## Depending on a Bazel Library in a SCons Build Target @@ -97,7 +96,6 @@ This allows SCons build targets to depend on Bazel build targets directly. The B 'fsync_locked.cpp', ], LIBDEPS=[ - 'new_library', # depend on the bazel "new_library" target defined above - ], + 'new_library', # depend on the bazel "new_library" target defined above + ], ) - diff --git a/bazel/docs/engflow_credential_setup.md b/bazel/docs/engflow_credential_setup.md index 9d0d85bf8de..a8422eeef7b 100644 --- a/bazel/docs/engflow_credential_setup.md +++ b/bazel/docs/engflow_credential_setup.md @@ -5,18 +5,20 @@ MongoDB uses EngFlow to enable remote execution with Bazel. This dramatically sp To install the necessary credentials to enable remote execution, run scons.py with any build command, then follow the setup instructions it prints out. Or: (Only if not in the Engineering org) -- Request access to the MANA group https://mana.corp.mongodbgov.com/resources/659ec4b9bccf3819e5608712 + +- Request access to the MANA group https://mana.corp.mongodbgov.com/resources/659ec4b9bccf3819e5608712 (For everyone) -- Go to https://sodalite.cluster.engflow.com/gettingstarted -- Login with OKTA, then click the "GENERATE AND DOWNLOAD MTLS CERTIFICATE" button - - (If logging in with OKTA doesn't work) Login with Google using your MongoDB email, then click the "GENERATE AND DOWNLOAD MTLS CERTIFICATE" button -- On your local system (usually your MacBook), open a shell terminal and, after setting the variables on the first three lines, run: - REMOTE_USER= - REMOTE_HOST= - ZIP_FILE=~/Downloads/engflow-mTLS.zip +- Go to https://sodalite.cluster.engflow.com/gettingstarted +- Login with OKTA, then click the "GENERATE AND DOWNLOAD MTLS CERTIFICATE" button + - (If logging in with OKTA doesn't work) Login with Google using your MongoDB email, then click the "GENERATE AND DOWNLOAD MTLS CERTIFICATE" button +- On your local system (usually your MacBook), open a shell terminal and, after setting the variables on the first three lines, run: - curl https://raw.githubusercontent.com/mongodb/mongo/master/buildscripts/setup_engflow_creds.sh -o setup_engflow_creds.sh - chmod +x ./setup_engflow_creds.sh - ./setup_engflow_creds.sh $REMOTE_USER $REMOTE_HOST $ZIP_FILE + REMOTE_USER= + REMOTE_HOST= + ZIP_FILE=~/Downloads/engflow-mTLS.zip + + curl https://raw.githubusercontent.com/mongodb/mongo/master/buildscripts/setup_engflow_creds.sh -o setup_engflow_creds.sh + chmod +x ./setup_engflow_creds.sh + ./setup_engflow_creds.sh $REMOTE_USER $REMOTE_HOST $ZIP_FILE diff --git a/buildfarm/README.md b/buildfarm/README.md index da9a7e0f417..eb591749873 100644 --- a/buildfarm/README.md +++ b/buildfarm/README.md @@ -1 +1 @@ -This directory exists to manage a Buildfarm; see docs/bazel.md for more details. \ No newline at end of file +This directory exists to manage a Buildfarm; see docs/bazel.md for more details. 
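Once the EngFlow credentials are installed, a quick way to confirm the setup is to rerun one of the `bazelisk` commands shown earlier in the Bazel documentation. The target below is the illustrative `hello_world` example from that section, not a real target in the tree; if the mTLS certificate was installed correctly, the build should be able to reach the remote execution cluster rather than failing with an authentication error.

```shell
# Sanity check after installing EngFlow credentials (illustrative target name):
./bazelisk build //src/mongo:hello_world
```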
diff --git a/buildscripts/antithesis/README.md b/buildscripts/antithesis/README.md index 8cff329010c..3c8c2197670 100644 --- a/buildscripts/antithesis/README.md +++ b/buildscripts/antithesis/README.md @@ -1,52 +1,60 @@ # How to Use Antithesis ## Context -Antithesis is a third party vendor with an environment that can perform network fuzzing. We can -upload images containing `docker-compose.yml` files, which represent various MongoDB topologies, to -the Antithesis Docker registry. Antithesis runs `docker-compose up` from these images to spin up -the corresponding multi-container application in their environment and run a test suite. Network -fuzzing is performed on the topology while the test suite runs & a report is generated by -Antithesis identifying bugs. Check out -https://github.com/mongodb/mongo/wiki/Testing-MongoDB-with-Antithesis to see an example of how we + +Antithesis is a third party vendor with an environment that can perform network fuzzing. We can +upload images containing `docker-compose.yml` files, which represent various MongoDB topologies, to +the Antithesis Docker registry. Antithesis runs `docker-compose up` from these images to spin up +the corresponding multi-container application in their environment and run a test suite. Network +fuzzing is performed on the topology while the test suite runs & a report is generated by +Antithesis identifying bugs. Check out +https://github.com/mongodb/mongo/wiki/Testing-MongoDB-with-Antithesis to see an example of how we use Antithesis today. ## Base Images -The `base_images` directory consists of the building blocks for creating a MongoDB test topology. -These images are uploaded to the Antithesis Docker registry weekly during the -`antithesis_image_push` task. For more visibility into how these images are built and uploaded to + +The `base_images` directory consists of the building blocks for creating a MongoDB test topology. +These images are uploaded to the Antithesis Docker registry weekly during the +`antithesis_image_push` task. For more visibility into how these images are built and uploaded to the Antithesis Docker registry, please see that task. ### mongo_binaries -This image contains the latest `mongo`, `mongos` and `mongod` binaries. It can be used to -start a `mongod` instance, `mongos` instance or execute `mongo` commands. This is the main building + +This image contains the latest `mongo`, `mongos` and `mongod` binaries. It can be used to +start a `mongod` instance, `mongos` instance or execute `mongo` commands. This is the main building block for creating the System Under Test topology. ### workload -This image contains the latest `mongo` binary as well as the `resmoke` test runner. The `workload` -container is not part of the actual toplogy. The purpose of a `workload` container is to execute -`mongo` commands to complete the topology setup, and to run a test suite on an existing topology + +This image contains the latest `mongo` binary as well as the `resmoke` test runner. The `workload` +container is not part of the actual toplogy. The purpose of a `workload` container is to execute +`mongo` commands to complete the topology setup, and to run a test suite on an existing topology like so: + ```shell buildscript/resmoke.py run --suite antithesis_concurrency_sharded_with_stepdowns_and_balancer ``` **Every topology must have 1 workload container.** -Note: During `workload` image build, `buildscripts/antithesis_suite.py` runs, which generates -"antithesis compatible" test suites and prepends them with `antithesis_`. 
These are the test suites +Note: During `workload` image build, `buildscripts/antithesis_suite.py` runs, which generates +"antithesis compatible" test suites and prepends them with `antithesis_`. These are the test suites that can run in antithesis and are available from witihin the `workload` container. ## Topologies -The `topologies` directory consists of subdirectories representing various mongo test topologies. + +The `topologies` directory consists of subdirectories representing various mongo test topologies. Each topology has a `Dockerfile`, a `docker-compose.yml` file and a `scripts` directory. ### Dockerfile -This assembles an image with the necessary files for spinning up the corresponding topology. It -consists of a `docker-compose.yml`, a `logs` directory, a `scripts` directory and a `data` -directory. If this is structured properly, you should be able to copy the files & directories + +This assembles an image with the necessary files for spinning up the corresponding topology. It +consists of a `docker-compose.yml`, a `logs` directory, a `scripts` directory and a `data` +directory. If this is structured properly, you should be able to copy the files & directories from this image and run `docker-compose up` to set up the desired topology. Example from `buildscripts/antithesis/topologies/sharded_cluster/Dockerfile`: + ```Dockerfile FROM scratch COPY docker-compose.yml / @@ -56,18 +64,20 @@ ADD data /data ADD debug /debug ``` -All topology images are built and uploaded to the Antithesis Docker registry during the -`antithesis_image_push` task. Some of these directories are created during the +All topology images are built and uploaded to the Antithesis Docker registry during the +`antithesis_image_push` task. Some of these directories are created during the `evergreen/antithesis_image_build.sh` script such as `/data` and `/logs`. -Note: These images serve solely as a filesystem containing all necessary files for a topology, +Note: These images serve solely as a filesystem containing all necessary files for a topology, therefore use `FROM scratch`. ### docker-compose.yml - This describes how to construct the corresponding topology using the - `mongo-binaries` and `workload` images. + +This describes how to construct the corresponding topology using the +`mongo-binaries` and `workload` images. Example from `buildscripts/antithesis/topologies/sharded_cluster/docker-compose.yml`: + ```yml version: '3.0' @@ -156,62 +166,71 @@ networks: - subnet: 10.20.20.0/24 ``` -Each container must have a `command` in `docker-compose.yml` that runs an init script. The init -script belongs in the `scripts` directory, which is included as a volume. The `command` should be -set like so: `/bin/bash /scripts/[script_name].sh` or `python3 /scripts/[script_name].py`. This is +Each container must have a `command` in `docker-compose.yml` that runs an init script. The init +script belongs in the `scripts` directory, which is included as a volume. The `command` should be +set like so: `/bin/bash /scripts/[script_name].sh` or `python3 /scripts/[script_name].py`. This is a requirement for the topology to start up properly in Antithesis. -When creating `mongod` or `mongos` instances, route the logs like so: -`--logpath /var/log/mongodb/mongodb.log` and utilize `volumes` -- as in `database1`. -This enables us to easily retrieve logs if a bug is detected by Antithesis. 
+When creating `mongod` or `mongos` instances, route the logs like so: +`--logpath /var/log/mongodb/mongodb.log` and utilize `volumes` -- as in `database1`. +This enables us to easily retrieve logs if a bug is detected by Antithesis. -The `ipv4_address` should be set to `10.20.20.130` or higher if you do not want that container to +The `ipv4_address` should be set to `10.20.20.130` or higher if you do not want that container to be affected by network fuzzing. For instance, you would likely not want the `workload` container to be affected by network fuzzing -- as shown in the example above. -Use the `evergreen-latest-master` tag for all images. This is updated automatically in -`evergreen/antithesis_image_build.sh` during the `antithesis_image_build` task -- if needed. +Use the `evergreen-latest-master` tag for all images. This is updated automatically in +`evergreen/antithesis_image_build.sh` during the `antithesis_image_build` task -- if needed. ### scripts -Take a look at `buildscripts/antithesis/topologies/sharded_cluster/scripts/mongos_init.py` to see -how to use util methods from `buildscripts/antithesis/topologies/sharded_cluster/scripts/utils.py` -to set up the desired topology. You can also use simple shell scripts as in the case of -`buildscripts/antithesis/topologies/sharded_cluster/scripts/database_init.py`. These init scripts -must not end in order to keep the underlying container alive. You can use an infinite while -loop for `python` scripts or you can use `tail -f /dev/null` for shell scripts. +Take a look at `buildscripts/antithesis/topologies/sharded_cluster/scripts/mongos_init.py` to see +how to use util methods from `buildscripts/antithesis/topologies/sharded_cluster/scripts/utils.py` +to set up the desired topology. You can also use simple shell scripts as in the case of +`buildscripts/antithesis/topologies/sharded_cluster/scripts/database_init.py`. These init scripts +must not end in order to keep the underlying container alive. You can use an infinite while +loop for `python` scripts or you can use `tail -f /dev/null` for shell scripts. ## How do I create a new topology for Antithesis testing? -To create a new topology for Antithesis testing is easy & requires a few simple steps. -1. Add a new directory in `buildscripts/antithesis/topologies` to represent your desired topology. -You can use existing topologies as an example. -2. Make sure that your workload test suite runs against your topology without any failures. This -may require tagging some tests as `antithesis-incompatible`. -3. Update the `antithesis_image_push` task so that your new topology image is -uploaded to the Antithesis Docker registry. -4. Reach out to #server-testing on Slack & provide the new topology image name as well as the + +To create a new topology for Antithesis testing is easy & requires a few simple steps. + +1. Add a new directory in `buildscripts/antithesis/topologies` to represent your desired topology. + You can use existing topologies as an example. +2. Make sure that your workload test suite runs against your topology without any failures. This + may require tagging some tests as `antithesis-incompatible`. +3. Update the `antithesis_image_push` task so that your new topology image is + uploaded to the Antithesis Docker registry. +4. Reach out to #server-testing on Slack & provide the new topology image name as well as the desired test suite to run. 5. Include the SDP team on the code review. 
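Before wiring a new topology into the Evergreen tasks, it can help to scaffold and sanity-check the directory locally. The commands below are only a sketch: `my_topology` is a placeholder name, and the sketch simply copies an existing topology as a starting point and confirms the required pieces (`Dockerfile`, `docker-compose.yml`, and the `scripts` directory) are present before you edit them for your desired topology.

```shell
# Illustrative scaffolding for a new topology (placeholder name: my_topology)
cd buildscripts/antithesis/topologies
cp -r sharded_cluster my_topology   # start from an existing topology and edit it
ls my_topology                      # expect: Dockerfile  docker-compose.yml  scripts
```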
- + These are the required updates to `evergreen/antithesis_image_build.sh`: -- Add the following command for each of your `mongos` and `mongod` containers in your topology to -create your log directories. + +- Add the following command for each of your `mongos` and `mongod` containers in your topology to + create your log directories. + ```shell mkdir -p antithesis/topologies/[topology_name]/{logs,data}/[container_name] ``` - -- Build an image for your new topology ending in `-config` + +- Build an image for your new topology ending in `-config` + ```shell cd [your_topology_dir] sed -i s/evergreen-latest-master/$tag/ docker-compose.yml sudo docker build . -t [your-topology-name]-config:$tag ``` + These are the required updates to `evergreen/antithesis_image_push.sh`: -- Push your new image to the Antithesis Docker registry + +- Push your new image to the Antithesis Docker registry + ```shell sudo docker tag "[your-topology-name]-config:$tag" "us-central1-docker.pkg.dev/molten-verve-216720/mongodb-repository/[your-topology-name]-config:$tag" sudo docker push "us-central1-docker.pkg.dev/molten-verve-216720/mongodb-repository/[your-topology-name]-config:$tag" ``` ## Additional Resources -If you are interested in leveraging Antithesis feel free to reach out to #server-testing on Slack. + +If you are interested in leveraging Antithesis feel free to reach out to #server-testing on Slack. diff --git a/buildscripts/cost_model/README.md b/buildscripts/cost_model/README.md index a435b00023f..4b013e587c3 100644 --- a/buildscripts/cost_model/README.md +++ b/buildscripts/cost_model/README.md @@ -14,10 +14,10 @@ The following assumes you are using python from the MongoDB toolchain. (mongo-python3) deactivate # only if you have another python env activated sh> /opt/mongodbtoolchain/v4/bin/python3 -m venv cm # create new env sh> source cm/bin/activate # activate new env -(cm) python -m pip install -r requirements.txt # install required packages +(cm) python -m pip install -r requirements.txt # install required packages (cm) python start.py # run the calibrator (cm) deactivate # back to bash -sh> +sh> ``` ### Install new packages @@ -25,3 +25,4 @@ sh> ```sh (cm) python -m pip install # install (cm) python -m pip freeze > requirements.txt # do not forget to update requirements.txt +``` diff --git a/buildscripts/docs/suites.md b/buildscripts/docs/suites.md index e02b77ff0f9..a1c1fbe59ac 100644 --- a/buildscripts/docs/suites.md +++ b/buildscripts/docs/suites.md @@ -1,163 +1,184 @@ # Resmoke Test Suites -Resmoke stores test suites represented as `.yml` files in the `buildscripts/resmokeconfig/suites` -directory. These `.yml` files allow users to spin up a variety of configurations to run tests + +Resmoke stores test suites represented as `.yml` files in the `buildscripts/resmokeconfig/suites` +directory. These `.yml` files allow users to spin up a variety of configurations to run tests against. # Suite Fields ## test_kind - [Root Level] -This represents the type of tests that are running in this suite. Some examples include: *js_test, -cpp_unit_test, cpp_integration_tests, benchmark_test, fsm_workload_test, etc.* You can see all + +This represents the type of tests that are running in this suite. Some examples include: _js_test, +cpp_unit_test, cpp_integration_tests, benchmark_test, fsm_workload_test, etc._ You can see all available options in the `_SELECTOR_REGISTRY` at `mongo/buildscripts/resmokelib/selector.py`. 
Ex: + ```yaml test_kind: js_test ``` ## selector - [Root Level] + The selector determines test files to include/exclude in the suite. Ex: + ```yaml selector: - roots: - - jstests/aggregation/**/*.js - exclude_files: - - jstests/aggregation/extras/*.js - - jstests/aggregation/data/*.js - exclude_with_any_tags: - - requires_pipeline_optimization + roots: + - jstests/aggregation/**/*.js + exclude_files: + - jstests/aggregation/extras/*.js + - jstests/aggregation/data/*.js + exclude_with_any_tags: + - requires_pipeline_optimization ``` + ### selector.roots -File path(s) of test files to include. If a path without a glob is provided, it must exist. + +File path(s) of test files to include. If a path without a glob is provided, it must exist. ### selector.exclude_files + File path(s) of test files to exclude. If a path without a glob is provided, it must exist. ### selector.exclude_with_any_tags -Exclude test files by tag name(s). To see all available tags, run + +Exclude test files by tag name(s). To see all available tags, run `./buildscripts/resmoke.py list-tags`. ## executor - [Root Level] + Configuration for the test execution framework. Ex: + ```yaml executor: - archive: -... - config: -... - hooks: -... - fixture: -... + archive: +--- +config: +--- +hooks: +--- +fixture: ``` ### executor.archive -Upon failure, data files can be uploaded to s3. A failure is when a `hook` or `test` throws an -exception. Data files will be archived in the following situations: + +Upon failure, data files can be uploaded to s3. A failure is when a `hook` or `test` throws an +exception. Data files will be archived in the following situations: + 1. Any `hook` included in this section throws an exception. 2. If `tests: true` and any `test` in the suite throws an exception. Ex: + ```yaml archive: - hooks: - - Hook1 - - Hook2 -... - tests: true + hooks: + - Hook1 + - Hook2 +--- +tests: true ``` ### executor.config -This section contains additional configuration for each test. The structure of this can vary -significantly based on the `test_kind`. For specific information, you can look at the -implementation of the `test_kind` of concern in the `buildscripts/resmokelib/testing/testcases` + +This section contains additional configuration for each test. The structure of this can vary +significantly based on the `test_kind`. For specific information, you can look at the +implementation of the `test_kind` of concern in the `buildscripts/resmokelib/testing/testcases` directory. Ex: + ```yaml config: - shell_options: - global_vars: - TestData: - defaultReadConcernLevel: null - enableMajorityReadConcern: '' - nodb: '' - gssapiServiceName: "mockservice" - eval: >- - var testingReplication = true; - load('jstests/libs/override_methods/set_read_and_write_concerns.js'); - load('jstests/libs/override_methods/enable_causal_consistency_without_read_pref.js'); + shell_options: + global_vars: + TestData: + defaultReadConcernLevel: null + enableMajorityReadConcern: "" + nodb: "" + gssapiServiceName: "mockservice" + eval: >- + var testingReplication = true; + load('jstests/libs/override_methods/set_read_and_write_concerns.js'); + load('jstests/libs/override_methods/enable_causal_consistency_without_read_pref.js'); ``` -Above is an example of the most common `test_kind` -- `js_test`. `js_test` uses `shell_options` to -customize the mongo shell when running tests. -`global_vars` allows for setting global variables. A `TestData` object is a special global variable -that is used to hold testing data. 
Parts of `TestData` can be updated via `resmoke` command-line -invocation, via `.yml` (as shown above), and during runtime. The global `TestData` object is merged -intelligently and made available to the `js_test` running. Behavior can vary on key collision, but -in general this is the order of precedence: (1) resmoke command-line (2) [suite].yml (3) +Above is an example of the most common `test_kind` -- `js_test`. `js_test` uses `shell_options` to +customize the mongo shell when running tests. + +`global_vars` allows for setting global variables. A `TestData` object is a special global variable +that is used to hold testing data. Parts of `TestData` can be updated via `resmoke` command-line +invocation, via `.yml` (as shown above), and during runtime. The global `TestData` object is merged +intelligently and made available to the `js_test` running. Behavior can vary on key collision, but +in general this is the order of precedence: (1) resmoke command-line (2) [suite].yml (3) runtime/default. -The mongo shell can also be invoked with flags & +The mongo shell can also be invoked with flags & named arguments. Flags must have the `''` value, such as in the case for `nodb` above. -`eval` can also be used to run generic javascript code in the shell. You can directly include +`eval` can also be used to run generic javascript code in the shell. You can directly include javascript code, or you can put it in a separate script & `load` it. ### executor.hooks -All hooks inherit from the `buildscripts.resmokelib.testing.hooks.interface.Hook` parent class and -can override any subset of the following empty base methods: `before_suite`, `after_suite`, -`before_test`, `after_test`. At least 1 base method must be overridden, otherwise the hook will -not do anything at all. During test suite execution, each hook runs its custom logic in the -respective scenarios. Some customizable tasks that hooks can perform include: *validating data, -deleting data, performing cleanup*, etc. You can see all existing hooks in the + +All hooks inherit from the `buildscripts.resmokelib.testing.hooks.interface.Hook` parent class and +can override any subset of the following empty base methods: `before_suite`, `after_suite`, +`before_test`, `after_test`. At least 1 base method must be overridden, otherwise the hook will +not do anything at all. During test suite execution, each hook runs its custom logic in the +respective scenarios. Some customizable tasks that hooks can perform include: _validating data, +deleting data, performing cleanup_, etc. You can see all existing hooks in the `buildscripts/resmokelib/testing/hooks` directory. Ex: + ```yaml hooks: - - class: CheckReplOplogs - - class: CheckReplDBHash - - class: ValidateCollections - - class: CleanEveryN - n: 20 - - class: MyHook - param1: something - param2: somethingelse + - class: CheckReplOplogs + - class: CheckReplDBHash + - class: ValidateCollections + - class: CleanEveryN + n: 20 + - class: MyHook + param1: something + param2: somethingelse ``` -The hook name in the `.yml` must match its Python class name in the -`buildscripts/resmokelib/testing/hooks` directory. Parameters can also be included in the `.yml` -and will be passed to the hook's constructor (the `hook_logger` & `fixture` parameters are +The hook name in the `.yml` must match its Python class name in the +`buildscripts/resmokelib/testing/hooks` directory. 
Parameters can also be included in the `.yml` +and will be passed to the hook's constructor (the `hook_logger` & `fixture` parameters are automatically included, so those should not be included in the `.yml`). ### executor.fixture -This represents the test fixture to run tests against. The `class` sub-field corresponds to the -Python class name of a fixture in the `buildscripts/resmokelib/testing/fixtures` directory. All -other sub-fields are passed into the constructor of the fixture. These sub-fields will vary based + +This represents the test fixture to run tests against. The `class` sub-field corresponds to the +Python class name of a fixture in the `buildscripts/resmokelib/testing/fixtures` directory. All +other sub-fields are passed into the constructor of the fixture. These sub-fields will vary based on the fixture used. Ex: + ```yaml fixture: - class: ShardedClusterFixture - num_shards: 2 - mongos_options: - bind_ip_all: '' - set_parameters: - enableTestCommands: 1 - mongod_options: - bind_ip_all: '' - set_parameters: - enableTestCommands: 1 - periodicNoopIntervalSecs: 1 - writePeriodicNoops: true + class: ShardedClusterFixture + num_shards: 2 + mongos_options: + bind_ip_all: "" + set_parameters: + enableTestCommands: 1 + mongod_options: + bind_ip_all: "" + set_parameters: + enableTestCommands: 1 + periodicNoopIntervalSecs: 1 + writePeriodicNoops: true ``` ## Examples -For inspiration on creating a new test suite, you can check out a variety of examples in the -`buildscripts/resmokeconfig/suites` directory. + +For inspiration on creating a new test suite, you can check out a variety of examples in the +`buildscripts/resmokeconfig/suites` directory. diff --git a/buildscripts/eslint/README.md b/buildscripts/eslint/README.md index bba2f46862d..56aa113b22a 100644 --- a/buildscripts/eslint/README.md +++ b/buildscripts/eslint/README.md @@ -4,60 +4,68 @@ 1. Install the latest [Node.js](https://nodejs.org/en/download/) if you don't have it. 2. Install [pkg](https://www.npmjs.com/package/pkg) with npm. - ``` - npm install -g pkg - ``` + ``` + npm install -g pkg + ``` 3. Get [ESLint](https://github.com/eslint/eslint) source code. - ``` - git clone git@github.com:eslint/eslint.git - ``` + ``` + git clone git@github.com:eslint/eslint.git + ``` 4. Checkout the latest version using git tag. - ``` - cd eslint - git checkout v${version} - ``` + ``` + cd eslint + git checkout v${version} + ``` 5. Add pkg options to `package.json` file. - ``` - "pkg": { - "scripts": [ "conf/**/*", "lib/**/*", "messages/**/*" ], - "targets": [ "linux-x64", "macos-x64" ] - # "targets": [ "linux-arm" ] - }, - ``` + ``` + "pkg": { + "scripts": [ "conf/**/*", "lib/**/*", "messages/**/*" ], + "targets": [ "linux-x64", "macos-x64" ] + # "targets": [ "linux-arm" ] + }, + ``` 6. Run pkg command to make ESLint executables. - ``` - npm install - pkg . - ``` + ``` + npm install + pkg . + ``` 7. Check that executables are working. Copy files to somewhere in your PATH and try to run it. - Depending on your system - ``` - eslint-linux --help - ``` - or - ``` - eslint-macos --help - ``` - or (if you are on arm) - ``` - eslint --help - ``` + Depending on your system -(*) If executable fails to find some .js files there are [extra steps](#extra-steps) + ``` + eslint-linux --help + ``` + + or + + ``` + eslint-macos --help + ``` + + or (if you are on arm) + + ``` + eslint --help + ``` + +(\*) If executable fails to find some .js files there are [extra steps](#extra-steps) required to be done before step 6. 
### Prepare archives Rename produced files. + ``` mv eslint-linux eslint-Linux-x86_64 mv eslint-macos eslint-Darwin-x86_64 # arm # mv eslint eslint-Linux-arm64 ``` + Archive files. (No leading v in version e.g. 8.28.0 NOT v8.28.0) + ``` tar -czvf eslint-${version}-linux-x86_64.tar.gz eslint-Linux-x86_64 tar -czvf eslint-${version}-darwin.tar.gz eslint-Darwin-x86_64 @@ -68,17 +76,20 @@ tar -czvf eslint-${version}-darwin.tar.gz eslint-Darwin-x86_64 ### Upload archives to `boxes.10gen.com` Archives should be available by the following links: + ``` https://s3.amazonaws.com/boxes.10gen.com/build/eslint-${version}-linux-x86_64.tar.gz https://s3.amazonaws.com/boxes.10gen.com/build/eslint-${version}-darwin.tar.gz # arm # https://s3.amazonaws.com/boxes.10gen.com/build/eslint-${version}-linux-arm64.tar.gz ``` + Build team has an access to do that. You can create a build ticket in Jira for them to do it (e.g. https://jira.mongodb.org/browse/BUILD-12984) ### Update ESLint version in `buildscripts/eslint.py` + ``` # Expected version of ESLint. ESLINT_VERSION = "${version}" @@ -91,6 +102,7 @@ and force include files using `assets` or `scripts` options might not help. For the ESLint version 7.22.0 and 8.28.0 the following change was applied to the source code to make everything work: + ``` diff --git a/lib/cli-engine/cli-engine.js b/lib/cli-engine/cli-engine.js index b1befaa04..e02230f83 100644 @@ -99,7 +111,7 @@ index b1befaa04..e02230f83 100644 @@ -987,43 +987,35 @@ class CLIEngine { */ getFormatter(format) { - + - // default is stylish - const resolvedFormatName = format || "stylish"; - diff --git a/buildscripts/iwyu/README.md b/buildscripts/iwyu/README.md index 2e925d7500a..a79bad1049a 100644 --- a/buildscripts/iwyu/README.md +++ b/buildscripts/iwyu/README.md @@ -1,19 +1,19 @@ # IWYU Analysis tool -This tool will run -[include-what-you-use](https://github.com/include-what-you-use/include-what-you-use) +This tool will run +[include-what-you-use](https://github.com/include-what-you-use/include-what-you-use) (IWYU) analysis across the codebase via `compile_commands.json`. -The `iwyu_config.yml` file consists of the current options and automatic +The `iwyu_config.yml` file consists of the current options and automatic pragma marking. You can exclude files from the analysis here. -The tool has two main modes of operation, `fix` and `check` modes. `fix` -mode will attempt to make changes to the source files based off IWYU's -suggestions. The check mode will simply check if there are any suggestion +The tool has two main modes of operation, `fix` and `check` modes. `fix` +mode will attempt to make changes to the source files based off IWYU's +suggestions. The check mode will simply check if there are any suggestion at all. -`fix` mode will take a long time to run, as the tool needs to rerun any -source in which a underlying header was changed to ensure things are not +`fix` mode will take a long time to run, as the tool needs to rerun any +source in which a underlying header was changed to ensure things are not broken, and so therefore ends up recompile the codebase several times over. For more information please refer the the script `--help` option. @@ -31,29 +31,30 @@ Next you can run the analysis: ``` python3 buildscripts/iwyu/run_iwyu_analysis.py ``` -The default mode is fix mode, and it will start making changes to the code + +The default mode is fix mode, and it will start making changes to the code if any changes are found. 
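The flag that selects `check` mode instead is not reproduced here; as noted above, the script's `--help` output lists the available modes and options:

```shell
# List the available modes and options for the IWYU analysis tool:
python3 buildscripts/iwyu/run_iwyu_analysis.py --help
```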
# Debugging failures -Occasionally IWYU tool will run into problems where it is unable to suggest -valid changes and the changes will cause things to break (not compile). When -it his a failure it will copy the source and all the header's that were used -at the time of the compilation into a directory where the same command can be +Occasionally IWYU tool will run into problems where it is unable to suggest +valid changes and the changes will cause things to break (not compile). When +it his a failure it will copy the source and all the header's that were used +at the time of the compilation into a directory where the same command can be run to reproduce the error. -You can examine the suggested changes in the source and headers and compare +You can examine the suggested changes in the source and headers and compare them to the working source tree. Then you can make corrective changes to allow - IWYU to get past the failure. +IWYU to get past the failure. -IWYU is not perfect and it make several mistakes that a human can understand +IWYU is not perfect and it make several mistakes that a human can understand and fix appropriately. # Running the tests -This tool includes its own end to end testing. The test directory includes -sub directories which contain source and iwyu configs to run the tool against. -The tests will then compare the results to built in expected results and fail +This tool includes its own end to end testing. The test directory includes +sub directories which contain source and iwyu configs to run the tool against. +The tests will then compare the results to built in expected results and fail if the the tests are not producing the expected results. To run the tests use the command: diff --git a/buildscripts/resmokeconfig/matrix_suites/README.md b/buildscripts/resmokeconfig/matrix_suites/README.md index e0cd8e5353f..68f22dffd77 100644 --- a/buildscripts/resmokeconfig/matrix_suites/README.md +++ b/buildscripts/resmokeconfig/matrix_suites/README.md @@ -1,6 +1,7 @@ # Matrix Resmoke.py Suites ## Summary + Matrix Suites are defined as a combination of explict suite files (in `buildscripts/resmokeconfig/suites` by default) and a set of "overrides" for specific keys. The intention is @@ -10,10 +11,12 @@ fully composed of reusable sections, similar to how Genny's workloads are defined as a set of parameterized `PhaseConfig`s. ## Usage + Matrix suites behave like regular suites for all functionality in resmoke.py, including `list-suites`, `find-suites` and `run --suites=[SUITE]`. ## Writing a matrix suite mapping file. + Matrix suites consist of a mapping, and a set of overrides in their eponymous directories. When you are done writing the mapping file, you must [generate the matrix suite file.](#generating-matrix-suites) @@ -24,6 +27,7 @@ modifiers. There is also an optional `decription` field that will get output with the local resmoke invocation. The fields of modifiers are the following: + 1. overrides 2. excludes 3. eval @@ -35,6 +39,7 @@ For example `encryption.mongodfixture_ese` would reference the `mongodfixture_es inside of the `encryption.yml` file inside of the `overrides` directory. ### overrides + All fields referenced in the `overrides` section of the mappings file will overwrite the specified fields in the `base_suite`. The `overrides` modifier takes precidence over the `excludes` and `eval` modifiers. @@ -42,22 +47,26 @@ The `overrides` list will be processed in order so order can matter if multiple try to overwrite the same field in the base_suite. 
### excludes + All fields referenced in the `excludes` section of the mappings file will append to the specified `exclude` fields in the base suite. The only two valid options in the referenced modifier field are `exclude_with_any_tags` and `exclude_files`. They are appended in the order they are specified in the mappings file. ### eval + All fields referenced in the `eval` section of the mappings file will append to the specified `config.shell_options.eval` field in the base suite. They are appended in the order they are specified in the mappings file. ### extends + All fields referenced in the `extends` section of the mappings file must be lists, and will be appended to the correspending keys on the same path. When extends is applied (after the other modifiers), the key being extended must already exist and also be a list. ## Generating matrix suites + The generated matrix suites live in the `buildscripts/resmokeconfig/matrix_suites/generated_suites` directory. These files may be edited for local testing but must remain consistent with the mapping files. There is a task in the commit queue that enforces this. To generate a new version of these @@ -66,10 +75,12 @@ will overwrite the current generated matrix suites on disk so make sure you do n changes to these files. ## Validating matrix suites + All matrix suites are validated whenever they are run to ensure that the mapping file and the generated suite file are in sync. The `resmoke_validation_tests` task in the commit queue also ensures that the files are validated. ## FAQ + For questions about the user or authorship experience, please reach out in #server-testing. diff --git a/buildscripts/resmokelib/hang_analyzer/README.md b/buildscripts/resmokelib/hang_analyzer/README.md index cba4083c157..d441dee04da 100644 --- a/buildscripts/resmokelib/hang_analyzer/README.md +++ b/buildscripts/resmokelib/hang_analyzer/README.md @@ -1,26 +1,30 @@ ## Running the core analyzer There are two main ways of running the core analyzer. + 1. Running the core analyzer with local core dumps and binaries. 2. Running the core analyzer with core dumps and binaries from an evergreen task. Note that some analysis might fail if you are not on the same AMI (Amazon Machine Image) that the task was run on. To run the core analyzer with local core dumps and binaries: + ``` python3 buildscripts/resmoke.py core-analyzer ``` + This will look for binaries in the build/install directory, and it will look for core dumps in the current directory. If your local environment is different you can include `--install-dir` and `--core-dir` in your invocation to specify other locations. To run the core analyzer with core dumps and binaries from an evergreen task: + ``` python3 buildscripts/resmoke.py core-analyzer --task-id={task_id} ``` + This will download all of the core dumps and binaries from the task and put them into the configured `--working-dir`, this defaults to the `core-analyzer` directory. All of the task analysis will be added to the `analysis` directory inside the configured `--working-dir`. Note: Currently the core analyzer only runs on linux. Windows uses the legacy hang analyzer but will be switched over when we run into issues or have time to do the transition. We have not tackled the problem of getting core dumps on macOS so we have no core dump analysis on that operating system. 
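If your local layout does not match those defaults, the `--install-dir` and `--core-dir` flags mentioned above can be passed explicitly. The paths below are placeholders; point them at wherever your binaries and core dumps actually live:

```shell
# Local core-analyzer run with explicit locations (placeholder paths):
python3 buildscripts/resmoke.py core-analyzer \
    --install-dir=build/install \
    --core-dir=/path/to/core/dumps
```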
- ### Getting core dumps ```mermaid @@ -31,17 +35,21 @@ sequenceDiagram Hang Analyzer ->> Core Dumps: Attach to pid and generate core dumps ``` -When a task times out, it hits the [timeout](https://github.com/10gen/mongo/blob/a6e56a8e136fe554dc90565bf6acf5bf86f7a46e/etc/evergreen_yml_components/definitions.yml#L2694) section in the defined evergreen config. +When a task times out, it hits the [timeout](https://github.com/10gen/mongo/blob/a6e56a8e136fe554dc90565bf6acf5bf86f7a46e/etc/evergreen_yml_components/definitions.yml#L2694) section in the defined evergreen config. In this timeout section, we run [this](https://github.com/10gen/mongo/blob/a6e56a8e136fe554dc90565bf6acf5bf86f7a46e/etc/evergreen_yml_components/definitions.yml#L2302) task which runs the hang-analyzer with the following invocation: + ``` python3 buildscripts/resmoke.py hang-analyzer -o file -o stdout -m exact -p python ``` + This tells the hang-analyzer to look for all of the python processes (we are specifically looking for resmoke) on the machine and to signal them. When resmoke is [signaled](https://github.com/10gen/mongo/blob/08a99b15eea7ae0952b2098710d565dd7f709ff6/buildscripts/resmokelib/sighandler.py#L25), it again invokes the hang analyzer with the specific pids of it's child processes. It will look similar to this most of the time: + ``` python3 buildscripts/resmoke.py hang-analyzer -o file -o stdout -k -c -d pid1,pid2,pid3 ``` + The things to note here are the `-k` which kills the process and `-c` which takes core dumps. The resulting core dumps are put into the current running directory. When a task fails normally, core dumps may also be generated by the linux kernel and put into the working directory. @@ -56,7 +64,6 @@ After investigation of the above issue, we found that compressing and uploading We made a [script](https://github.com/10gen/mongo/blob/master/buildscripts/fast_archive.py) that gzips all of the core dumps in parallel and uploads them to S3 individually asynchronously. This solved all of the problems listed above. - ### Generating the core analyzer task ```mermaid diff --git a/buildscripts/resmokelib/powercycle/README.md b/buildscripts/resmokelib/powercycle/README.md index 1f15e833a68..09f86bc5538 100644 --- a/buildscripts/resmokelib/powercycle/README.md +++ b/buildscripts/resmokelib/powercycle/README.md @@ -12,6 +12,7 @@ Powercycle test is the part of resmoke. Python 3.10+ with python venv is require run the resmoke (python3 from [mongodbtoolchain](http://mongodbtoolchain.build.10gen.cc/) is highly recommended). 
Python venv can be set up by running in the root mongo repo directory: + ``` python3 -m venv python3-venv source python3-venv/bin/activate @@ -19,36 +20,40 @@ pip install -r buildscripts/requirements.txt ``` If python venv is already set up activate it before running the resmoke: + ``` source python3-venv/bin/activate ``` There are several commands that can be run by calling resmoke powercycle subcommand: + ``` python buildscripts/resmoke.py powercycle --help ``` The main entry point of resmoke powercycle subcommand is located in this file: + ``` buildscripts/resmokelib/powercycle/__init__.py ``` ## Poweryclce main steps -- [Set up EC2 instance](#set-up-ec2-instance) -- [Run powercycle test](#run-powercycle-test) - - [Resmoke powercycle run arguments](#resmoke-powercycle-run-arguments) - - [Powercycle test implementation](#powercycle-test-implementation) -- [Save diagnostics](#save-diagnostics) -- [Remote hang analyzer (optional)](#remote-hang-analyzer-optional) +- [Set up EC2 instance](#set-up-ec2-instance) +- [Run powercycle test](#run-powercycle-test) + - [Resmoke powercycle run arguments](#resmoke-powercycle-run-arguments) + - [Powercycle test implementation](#powercycle-test-implementation) +- [Save diagnostics](#save-diagnostics) +- [Remote hang analyzer (optional)](#remote-hang-analyzer-optional) ### Set up EC2 instance 1. `Evergreen host.create command` - in Evergreen the remote host is created with -the same distro as the localhost runs and some initial connections are made to ensure -it's up before further steps + the same distro as the localhost runs and some initial connections are made to ensure + it's up before further steps 2. `Resmoke powercycle setup-host command` - prepares remote host via ssh to run -the powercycle test: + the powercycle test: + ``` python buildscripts/resmoke.py powercycle setup-host ``` @@ -59,25 +64,28 @@ Powercycle setup-host operations are located in created by `expansions.write` command in Evergreen. It runs several operations via ssh: -- create directory on the remote host -- copy `buildscripts` and `mongoDB executables` from localhost to the remote host -- set up python venv on the remote host -- set up curator to collect system & process stats on the remote host -- install [NotMyFault](https://docs.microsoft.com/en-us/sysinternals/downloads/notmyfault) -to crash Windows (only on Windows) + +- create directory on the remote host +- copy `buildscripts` and `mongoDB executables` from localhost to the remote host +- set up python venv on the remote host +- set up curator to collect system & process stats on the remote host +- install [NotMyFault](https://docs.microsoft.com/en-us/sysinternals/downloads/notmyfault) + to crash Windows (only on Windows) Remote operation via ssh implementation is located in `buildscripts/resmokelib/powercycle/lib/remote_operations.py`. 
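Conceptually, the operations listed just below amount to ssh-based file copies and remote shell commands. The sketch here uses plain `scp`/`ssh` only to give an intuition for what each one does; it is not the actual `remote_operations.py` interface, and the user, host, and paths are placeholders:

```shell
# Rough, illustrative equivalents (placeholders: $REMOTE_USER, $REMOTE_HOST):
scp -r buildscripts "$REMOTE_USER@$REMOTE_HOST:powercycle/"    # copy_to
scp "$REMOTE_USER@$REMOTE_HOST:powercycle/mongod.log" .        # copy_from
ssh "$REMOTE_USER@$REMOTE_HOST" "ls powercycle"                # shell
```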
The following operations are supported: -- `copy_to` - copy files from the localhost to the remote host -- `copy_from` - copy files from the remote host to the localhost -- `shell` - runs shell command on the remote host + +- `copy_to` - copy files from the localhost to the remote host +- `copy_from` - copy files from the remote host to the localhost +- `shell` - runs shell command on the remote host ### Run powercycle test `Resmoke powercycle run command` - runs the powercycle test on the localhost which runs remote operations on the remote host via ssh and local validation checks: + ``` python buildscripts/resmoke.py powercycle run \ --sshUserHost=${user_name}@${host_ip} \ @@ -85,7 +93,7 @@ python buildscripts/resmoke.py powercycle run \ --taskName=${task_name} ``` -###### Resmoke powercycle run arguments +###### Resmoke powercycle run arguments The arguments for resmoke powercycle run command are defined in `add_subcommand()` function in `buildscripts/resmokelib/powercycle/__init__.py`. When powercycle test @@ -107,41 +115,43 @@ The powercycle test main implementation is located in `main()` function in The value of `--remoteOperation` argument is used to distinguish if we are running the script on the localhost or on the remote host. `remote_handler()` function performs the following remote operations: -- `noop` - do nothing -- `crash_server` - internally crash the server -- `kill_mongod` - kill mongod process -- `install_mongod` - install mongod -- `start_mongod` - start mongod process -- `stop_mongod` - stop mongod process -- `shutdown_mongod` - run shutdown command using mongo client -- `rsync_data` - backups mongod data -- `seed_docs` - seed a collection with random document values -- `set_fcv` - run set FCV command using mongo client -- `check_disk` - run `chkdsk` command on Windows + +- `noop` - do nothing +- `crash_server` - internally crash the server +- `kill_mongod` - kill mongod process +- `install_mongod` - install mongod +- `start_mongod` - start mongod process +- `stop_mongod` - stop mongod process +- `shutdown_mongod` - run shutdown command using mongo client +- `rsync_data` - backups mongod data +- `seed_docs` - seed a collection with random document values +- `set_fcv` - run set FCV command using mongo client +- `check_disk` - run `chkdsk` command on Windows When running on localhost the powercycle test loops do the following steps: -- Rsync the database post-crash (starting from the 2nd loop), pre-recovery on the remote host - - makes a backup before recovery -- Start mongod on the secret port on the remote host and wait for it to recover - - also sets FCV and seeds documents on the 1st loop -- Validate canary from the localhost (starting from the 2nd loop) - - uses mongo client to connect to the remote mongod -- Validate collections from the localhost - - calls resmoke to perform the validation on the remote mongod -- Shutdown mongod on the remote host -- Rsync the database post-recovery on the remote host - - makes a backup after recovery -- Start mongod on the standard port on the remote host -- Start CRUD and FSM clients on the localhost - - calls resmoke to run CRUD and FSM clients -- Generate canary document from the localhost - - uses mongo client to connect to the remote mongod -- Crash the remote server or kill mongod on the remote host - - most of the powercycle tasks do crashes -- Run check disk on the remote host (on Windows) -- Exit loop if one of these occurs: - - loop number exceeded - - any step fails + +- Rsync the database post-crash (starting from 
the 2nd loop), pre-recovery on the remote host + - makes a backup before recovery +- Start mongod on the secret port on the remote host and wait for it to recover + - also sets FCV and seeds documents on the 1st loop +- Validate canary from the localhost (starting from the 2nd loop) + - uses mongo client to connect to the remote mongod +- Validate collections from the localhost + - calls resmoke to perform the validation on the remote mongod +- Shutdown mongod on the remote host +- Rsync the database post-recovery on the remote host + - makes a backup after recovery +- Start mongod on the standard port on the remote host +- Start CRUD and FSM clients on the localhost + - calls resmoke to run CRUD and FSM clients +- Generate canary document from the localhost + - uses mongo client to connect to the remote mongod +- Crash the remote server or kill mongod on the remote host + - most of the powercycle tasks do crashes +- Run check disk on the remote host (on Windows) +- Exit loop if one of these occurs: + - loop number exceeded + - any step fails `exit_handler()` function writes a report and does cleanups any time after the test run exits. @@ -149,6 +159,7 @@ When running on localhost the powercycle test loops do the following steps: `Resmoke powercycle save-diagnostics command` - copies powercycle diagnostics files from the remote host to the localhost (mainly used by Evergreen): + ``` python buildscripts/resmoke.py powercycle save-diagnostics ``` @@ -159,25 +170,27 @@ Powercycle save-diagnostics operations are located in created by `expansions.write` command in Evergreen. It runs several operations via ssh: -- `gatherRemoteEventLogs` - - runs on Windows -- `tarEC2Artifacts` - - on success archives `mongod.log` - - on failure additionally archives data files and all before-recovery and after-recovery backups - - on failure on Windows additionally archives event logs -- `copyEC2Artifacts` - - from the remote host to the localhost -- `copyEC2MonitorFiles` - - from the remote host to the localhost -- `gatherRemoteMongoCoredumps` - - copies all mongo core dumps to a single directory -- `copyRemoteMongoCoredumps` - - from the remote host to the localhost + +- `gatherRemoteEventLogs` + - runs on Windows +- `tarEC2Artifacts` + - on success archives `mongod.log` + - on failure additionally archives data files and all before-recovery and after-recovery backups + - on failure on Windows additionally archives event logs +- `copyEC2Artifacts` + - from the remote host to the localhost +- `copyEC2MonitorFiles` + - from the remote host to the localhost +- `gatherRemoteMongoCoredumps` + - copies all mongo core dumps to a single directory +- `copyRemoteMongoCoredumps` + - from the remote host to the localhost ### Remote hang analyzer (optional) `Resmoke powercycle remote-hang-analyzer command` - runs hang analyzer on the remote host (mainly used by Evergreen): + ``` $python buildscripts/resmoke.py powercycle remote-hang-analyzer ``` diff --git a/buildscripts/tests/resmoke_end2end/README.md b/buildscripts/tests/resmoke_end2end/README.md index ef3fa3b1ea8..08987356c22 100644 --- a/buildscripts/tests/resmoke_end2end/README.md +++ b/buildscripts/tests/resmoke_end2end/README.md @@ -1,9 +1,11 @@ -* All end-to-end resmoke tests can be run via a resmoke suite itself: +- All end-to-end resmoke tests can be run via a resmoke suite itself: + ``` mongodb_repo_root$ /opt/mongodbtoolchain/v4/bin/python3 buildscripts/resmoke.py run --suites resmoke_end2end_tests ``` -* Finer grained control of tests can also be run with by 
invoking python's unittest main by hand. E.g: +- Finer grained control of tests can also be run with by invoking python's unittest main by hand. E.g: + ``` mongodb_repo_root$ /opt/mongodbtoolchain/v4/bin/python3 -m unittest -v buildscripts.tests.resmoke_end2end.test_resmoke.TestTestSelection.test_at_sign_as_replay_file ``` diff --git a/docs/README.md b/docs/README.md index a88ddcf52c1..5273f7c9b1c 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,26 +1,25 @@ -MongoDB Server Documentation -============ +# MongoDB Server Documentation This is just some internal documentation. For the full MongoDB docs, please see [mongodb.org](http://www.mongodb.org/) -* [Batons](baton.md) -* [Build System](build_system.md) -* [Build System Reference](build_system_reference.md) -* [Building MongoDB](building.md) -* [Command Dispatch](command_dispatch.md) -* [Contextual Singletons](contexts.md) -* [Egress Networking](egress_networking.md) -* [Exception Architecture](exception_architecture.md) -* [Fail Points](fail_points.md) -* [Futures and Promises](futures_and_promises.md) -* [Load Balancer Support](load_balancer_support.md) -* [Memory Management](memory_management.md) -* [Parsing Stack Traces](parsing_stack_traces.md) -* [Primary-only Services](primary_only_service.md) -* [Security Architecture Guide](security_guide.md) -* [Server Parameters](server-parameters.md) -* [String Manipulation](string_manipulation.md) -* [Thread Pools](thread_pools.md) -* [MongoDB Voluntary Product Accessibility Template® (VPAT™)](vpat.md) +- [Batons](baton.md) +- [Build System](build_system.md) +- [Build System Reference](build_system_reference.md) +- [Building MongoDB](building.md) +- [Command Dispatch](command_dispatch.md) +- [Contextual Singletons](contexts.md) +- [Egress Networking](egress_networking.md) +- [Exception Architecture](exception_architecture.md) +- [Fail Points](fail_points.md) +- [Futures and Promises](futures_and_promises.md) +- [Load Balancer Support](load_balancer_support.md) +- [Memory Management](memory_management.md) +- [Parsing Stack Traces](parsing_stack_traces.md) +- [Primary-only Services](primary_only_service.md) +- [Security Architecture Guide](security_guide.md) +- [Server Parameters](server-parameters.md) +- [String Manipulation](string_manipulation.md) +- [Thread Pools](thread_pools.md) +- [MongoDB Voluntary Product Accessibility Template® (VPAT™)](vpat.md) diff --git a/docs/baton.md b/docs/baton.md index 0196451f482..e2c4304b4cf 100644 --- a/docs/baton.md +++ b/docs/baton.md @@ -1,65 +1,64 @@ # Server-Internal Baton Pattern -Batons are lightweight job queues in *mongod* and *mongos* processes that allow -recording the intent to execute a task (e.g., polling on a network socket) and -deferring its execution to a later time. Batons, often by reusing `Client` -threads and through the *Waitable* interface, move the execution of scheduled -tasks out of the line, potentially hiding the execution cost from the critical +Batons are lightweight job queues in _mongod_ and _mongos_ processes that allow +recording the intent to execute a task (e.g., polling on a network socket) and +deferring its execution to a later time. Batons, often by reusing `Client` +threads and through the _Waitable_ interface, move the execution of scheduled +tasks out of the line, potentially hiding the execution cost from the critical path. 
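For illustration, here is a minimal sketch of scheduling work on the baton attached to the current operation. It is not taken from the server source; it assumes, as described later in this document, that tasks are scheduled via `OperationContext::getBaton()->schedule(...)` and that the callback receives a `Status` reporting whether the task actually ran or the baton was detached first.

```cpp
#include "mongo/db/operation_context.h"

namespace mongo {

// Sketch: defer a short, non-blocking task onto the operation's baton. The
// task runs out of line, typically while the owning Client thread is idle.
void scheduleOnOperationBaton(OperationContext* opCtx) {
    opCtx->getBaton()->schedule([](Status status) {
        if (!status.isOK()) {
            // Assumed behavior: the baton was detached (e.g. the operation was
            // killed) before the task could run, so skip the work.
            return;
        }
        // Do the deferred work here; it must be non-blocking and short-lived.
    });
}

}  // namespace mongo
```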
A total of four baton classes are available today: -- [Baton][baton] -- [DefaultBaton][defaultBaton] -- [NetworkingBaton][networkingBaton] -- [AsioNetworkingBaton][asioNetworkingBaton] +- [Baton][baton] +- [DefaultBaton][defaultBaton] +- [NetworkingBaton][networkingBaton] +- [AsioNetworkingBaton][asioNetworkingBaton] ## Baton Hierarchy -All baton implementations extend *Baton*. They all expose an interface to allow -scheduling tasks on the baton, to demand the awakening of the baton on client -socket disconnect, and to create a *SubBaton*. A *SubBaton*, for any of the -baton types, is essentially a handle to a local object that proxies scheduling -requests to its underlying baton until it is detached (e.g., through destruction +All baton implementations extend _Baton_. They all expose an interface to allow +scheduling tasks on the baton, to demand the awakening of the baton on client +socket disconnect, and to create a _SubBaton_. A _SubBaton_, for any of the +baton types, is essentially a handle to a local object that proxies scheduling +requests to its underlying baton until it is detached (e.g., through destruction of its handle). -Additionally, a *NetworkingBaton* enables consumers of a transport layer to -execute I/O themselves, rather than delegating it to other threads. They are -special batons that are able to poll network sockets, which is not feasible -through other baton types. This is essential for minimizing context switches and +Additionally, a _NetworkingBaton_ enables consumers of a transport layer to +execute I/O themselves, rather than delegating it to other threads. They are +special batons that are able to poll network sockets, which is not feasible +through other baton types. This is essential for minimizing context switches and improving the readability of stack traces. ### DefaultBaton -DefaultBaton is the most basic baton implementation. A default baton is tightly -associated with an `OperationContext`, and its associated `Client` thread. This -baton provides the platform to execute tasks while a client thread awaits an -event or a timeout (e.g., via `OperationContext::sleepUntil(...)`), essentially -paving the way towards utilizing idle cycles of client threads for useful work. -Tasks can be scheduled on this baton through its associated `OperationContext` +DefaultBaton is the most basic baton implementation. A default baton is tightly +associated with an `OperationContext`, and its associated `Client` thread. This +baton provides the platform to execute tasks while a client thread awaits an +event or a timeout (e.g., via `OperationContext::sleepUntil(...)`), essentially +paving the way towards utilizing idle cycles of client threads for useful work. +Tasks can be scheduled on this baton through its associated `OperationContext` and using `OperationContext::getBaton()::schedule(...)`. -Note that this baton is not available for an `OperationContext` that belongs to -a `ServiceContext` with an `AsioTransportLayer` transport layer. In that case, -the aforementioned interface will return a handle to *AsioNetworkingBaton*. +Note that this baton is not available for an `OperationContext` that belongs to +a `ServiceContext` with an `AsioTransportLayer` transport layer. In that case, +the aforementioned interface will return a handle to _AsioNetworkingBaton_. ### AsioNetworkingBaton -This baton is only available for Linux and extends *NetworkingBaton* to -implement a networking reactor. 
It utilizes `poll(2)` and `eventfd(2)` to allow +This baton is only available for Linux and extends _NetworkingBaton_ to +implement a networking reactor. It utilizes `poll(2)` and `eventfd(2)` to allow client threads await events without busy polling. ## Example -For an example of scheduling a task on the `OperationContext` baton, see +For an example of scheduling a task on the `OperationContext` baton, see [here][example]. ## Considerations -Since any task scheduled on a baton is intended for out-of-line execution, it +Since any task scheduled on a baton is intended for out-of-line execution, it must be non-blocking and preferably short-lived to ensure forward progress. -[baton]:https://github.com/mongodb/mongo/blob/5906d967c3144d09fab6a4cc1daddb295df19ffb/src/mongo/db/baton.h#L61-L178 +[baton]: https://github.com/mongodb/mongo/blob/5906d967c3144d09fab6a4cc1daddb295df19ffb/src/mongo/db/baton.h#L61-L178 [defaultBaton]: https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/db/default_baton.h#L46-L75 [networkingBaton]: https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/transport/baton.h#L61-L96 [asioNetworkingBaton]: https://github.com/mongodb/mongo/blob/9cfe13115e92a43d1b9273ee1d5817d548264ba7/src/mongo/transport/baton_asio_linux.h#L60-L529 [example]: https://github.com/mongodb/mongo/blob/262e5a961fa7221bfba5722aeea2db719f2149f5/src/mongo/s/multi_statement_transaction_requests_sender.cpp#L91-L99 - diff --git a/docs/bazel.md b/docs/bazel.md index d1fd35f470a..7735be1fef1 100644 --- a/docs/bazel.md +++ b/docs/bazel.md @@ -1,27 +1,31 @@ (Note: This is a work-in-progress for the SDP team; contact #server-dev-platform for questions) To perform a Bazel build via SCons: -* You must be on a arm64 virtual workstation -* You must generate engflow credentials and store them in the correct location (see below) -* Build the Bazel-compatible target: `python3 ./buildscripts/scons.py BAZEL_BUILD_ENABLED=1 --build-profile=fast --ninja=disabled --link-model=static -j 200 --modules= build/fast/mongo/db/commands/libfsync_locked.a` + +- You must be on a arm64 virtual workstation +- You must generate engflow credentials and store them in the correct location (see below) +- Build the Bazel-compatible target: `python3 ./buildscripts/scons.py BAZEL_BUILD_ENABLED=1 --build-profile=fast --ninja=disabled --link-model=static -j 200 --modules= build/fast/mongo/db/commands/libfsync_locked.a` To generate and install the engflow credentials: -* Navigate to and log in with your mongodb gmail account: https://sodalite.cluster.engflow.com/gettingstarted -* Generate and download the credentials; you will need to move them to the workstation machine (scp, copy paste plain text, etc...) 
-* Store them (the same filename they downloaded as) on your machine at the default location our build expects: `/engflow/creds/` -* You should run `chmod 600` on them to make sure they are readable only by your user -* If you don't want to use the cluster you can pass `BAZEL_FLAGS=--config=local` on the SCons command line or `--config=local` on the bazel command line -To perform a Bazel build and *bypass* SCons: -* Install Bazelisk: `curl -L https://github.com/bazelbuild/bazelisk/releases/download/v1.17.0/bazelisk-linux-arm64 --output /tmp/bazelisk && chmod +x /tmp/bazelisk` -* Build the Bazel-compatible target: `/tmp/bazelisk build --verbose_failures src/mongo/db/commands:fsync_locked` +- Navigate to and log in with your mongodb gmail account: https://sodalite.cluster.engflow.com/gettingstarted +- Generate and download the credentials; you will need to move them to the workstation machine (scp, copy paste plain text, etc...) +- Store them (the same filename they downloaded as) on your machine at the default location our build expects: `/engflow/creds/` +- You should run `chmod 600` on them to make sure they are readable only by your user +- If you don't want to use the cluster you can pass `BAZEL_FLAGS=--config=local` on the SCons command line or `--config=local` on the bazel command line + +To perform a Bazel build and _bypass_ SCons: + +- Install Bazelisk: `curl -L https://github.com/bazelbuild/bazelisk/releases/download/v1.17.0/bazelisk-linux-arm64 --output /tmp/bazelisk && chmod +x /tmp/bazelisk` +- Build the Bazel-compatible target: `/tmp/bazelisk build --verbose_failures src/mongo/db/commands:fsync_locked` To perform a Bazel build using a local Buildfarm (to test remote execution capability): -* For more details on Buildfarm, see https://bazelbuild.github.io/bazel-buildfarm -* (One time only) Build and start the Buildfarm: -** Change into the `buildfarm` directory: `cd buildfarm` -** Build the image: `docker-compose build` -** Start the container: `docker-compose up --detach` -** Poll until the containers report status `running`: `docker ps --filter status=running --filter name=buildfarm` -* (Whenever you build): -** Build the Bazel-compatible target with remote execution enabled: `/tmp/bazelisk build --verbose_failures --remote_executor=grpc://localhost:8980 src/mongo/db/commands:fsync_locked` + +- For more details on Buildfarm, see https://bazelbuild.github.io/bazel-buildfarm +- (One time only) Build and start the Buildfarm: + ** Change into the `buildfarm` directory: `cd buildfarm` + ** Build the image: `docker-compose build` + ** Start the container: `docker-compose up --detach` + ** Poll until the containers report status `running`: `docker ps --filter status=running --filter name=buildfarm` +- (Whenever you build): + \*\* Build the Bazel-compatible target with remote execution enabled: `/tmp/bazelisk build --verbose_failures --remote_executor=grpc://localhost:8980 src/mongo/db/commands:fsync_locked` diff --git a/docs/build_system.md b/docs/build_system.md index 6204e4e84f7..71a89f867e0 100644 --- a/docs/build_system.md +++ b/docs/build_system.md @@ -1,76 +1,135 @@ # The MongoDB Build System ## Introduction + ### System requirements and supported platforms ## How to get Help + ### Where to go + ### What to bring when you go there (SCons version, server version, SCons command line, versions of relevant tools, `config.log`, etc.) 
## Known Issues + ### Commonly-encountered issues + #### `--disable-warnings-as-errors` + ### Reference to known issues in the ticket system + ### How to report a problem + #### For employees + #### For non-employees ## Set up the build environment + ### Set up the virtualenv and poetry + See [Building Python Prerequisites](building.md#python-prerequisites) + ### The Enterprise Module + #### Getting the module source + #### Enabling the module ## Building the software + ### Commonly-used build targets + ### Building a standard “debug” build + #### `--dbg` + ### What goes where? + #### `$BUILD_ROOT/scons` and its contents + #### `$BUILD_ROOT/$VARIANT_DIR` and its contents + #### `$BUILD_ROOT/install` and its contents + #### `DESTDIR` and `PREFIX` + #### `--build-dir` + ### Running core tests to verify the build + ### Building a standard “release” build + #### `--separate-debug` + ### Installing from the build directory + #### `--install-action` + ### Creating a release archive ## Advanced Builds + ### Compiler and linker options + #### `CC, CXX, CCFLAGS, CFLAGS, CXXFLAGS` + #### `CPPDEFINES and CPPPATH` + #### `LINKFLAGS` + #### `MSVC_VERSION` + #### `VERBOSE` + ### Advanced build options + #### `-j` + #### `--separate-debug` + #### `--link-model` + #### `--allocator` + #### `--cxx-std` + #### `--linker` + #### `--variables-files` + ### Cross compiling + #### `HOST_ARCH` and `TARGET_ARCH` + ### Using Ninja + #### `--ninja` + ### Cached builds + #### Using the SCons build cache + ##### `--cache` + ##### `--cache-dir` + #### Using `ccache` + ##### `CCACHE` + ### Using Icecream + #### `ICECC`, `ICECRUN`, `ICECC_CREATE_ENV` + #### `ICECC_VERSION` and `ICECC_VERSION_ARCH` + #### `ICECC_DEBUG` ## Developer builds + ### Developer build options + #### `MONGO_{VERSION,GIT_HASH}` By default, the server build system consults the local git repository @@ -128,117 +187,196 @@ SCons invocations on almost any branch you are likely to find yourself using. #### Using sanitizers + ##### `--sanitize` + ##### `*SAN_OPTIONS` + #### `--dbg` `--opt` + #### `--build-tools=[stable|next]` + ### Setting up your development environment + #### `mongo_custom_variables.py` + ##### Guidance on what to put in your custom variables + ##### How to suppress use of your custom variables + ##### Useful variables files (e.g. `mongodbtoolchain`) + #### Using the Mongo toolchain + ##### Why do we have our own toolchain? + ##### When is it appropriate to use the MongoDB toolchain? + ##### How do I obtain the toolchain? + ##### How do I upgrade the toolchain? + ##### How do I tell the build system to use it? + ### Creating and using build variants + #### Using `--build-dir` to separate variant build artifacts + #### `BUILD_ROOT` and `BUILD_DIR` + #### `VARIANT_DIR` + #### `NINJA_PREFIX` and `NINJA_SUFFIX` + ### Building older versions + #### Using` git-worktree` + ### Speeding up incremental builds + #### Selecting minimal build targets + #### Compiler arguments + ##### `-gsplit-dwarf` and `/DEBUG:FASTLINK` -#### Don’t reinstall what you don’t have to (*NIX only) + +#### Don’t reinstall what you don’t have to (\*NIX only) + ##### `--install-action=hardlink` + #### Speeding up SCons dependency evaluation + ##### `--implicit-cache` + ##### `--build-fast-and-loose` + #### Using Ninja responsibly + #### What about `ccache`? ## Making source changes + ### Adding a new dependency ### Linting and Lint Targets + #### What lint targets are available? 
+ #### Using `clang-format` + ### Testing your changes + #### How are test test suites defined? + #### Running test suites + #### Adding tests to a suite + #### Running individual tests ## Modifying the buid system + ### What is SCons? + #### `SConstruct` and `SConscripts` + #### `Environments `and their `Clone`s + ##### Overriding and altering variables + #### `Targets` and `Sources` + #### `Nodes` + ##### `File` Nodes + ##### `Program` and `Library` Nodes + #### `Aliases`, `Depends` and `Requires` + #### `Builders` + #### `Emitters` + #### `Scanners` + #### `Actions` + #### `Configure` objects + #### DAG walk + #### Reference to SCons documentation + ### Modules + #### How modules work + #### The Enterprise module + ##### The `build.py` file + #### Adding a new module + ### Poetry + #### What is Poetry + [Poetry](https://python-poetry.org/) is a python dependency management system. Poetry tries to find dependencies in [pypi](https://pypi.org/) (similar to pip). For more details visit the poetry website. #### Why use Poetry + Poetry creates a dependency lock file similar to that of a [Ruby Gemfile](https://bundler.io/guides/gemfile.html#gemfiles) or a [Rust Cargo File](https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html). This lock file has exact dependencies that will be the same no matter when they are installed. Even if dependencyA has an update available the older pinned dependency will still be installed. The means that there will be less errors that are based on two users having different versions of python dependencies. #### Poetry Lock File + In a Poetry project there are two files that determine and resolve the dependencies. The first is [pyproject.toml](../pyproject.toml). This file loosely tells poetry what dependencies and needed and the constraints of those dependencies. For example the following are all valid selections. + 1. `dependencyA = "1.0.0" # dependencyA can only ever be 1.0.0` 2. `dependencyA = "^1.0.0" # dependencyA can be any version greater than or equal to 1.0.0 and less than 2.0.0` 3. `dependencyA = "*" # dependencyA can be any version` The [poetry.lock](../poetry.lock) file has the exact package versions. This file is generated by poetry by running `poetry lock`. This file contains a pinned list of all transitive dependencies that satisfy the requirements in [pyproject.toml](../pyproject.toml). + ### `LIBDEPS` and the `LIBDEPS` Linter + #### Why `LIBDEPS`? + Libdeps is a subsystem within the build, which is centered around the LIBrary DEPendency graph. It tracks and maintains the dependency graph as well as lints, analyzes and provides useful metrics about the graph. + #### Different `LIBDEPS` variable types + The `LIBDEPS` variables are how the library relationships are defined within the build scripts. The primary variables are as follows: -* `LIBDEPS`: - The 'public' type which propagates lower level dependencies onward automatically. -* `LIBDEPS_PRIVATE`: - Creates a dependency only between the target and the dependency. -* `LIBDEPS_INTERFACE`: - Same as `LIBDEPS` but excludes itself from the propagation onward. -* `LIBDEPS_DEPENDENTS`: - Creates a reverse `LIBDEPS_PRIVATE` dependency where the dependency is the one declaring the relationship. -* `PROGDEPS_DEPENDENTS`: - Same as `LIBDEPS_DEPENDENTS` but for use with Program builders. + +- `LIBDEPS`: + The 'public' type which propagates lower level dependencies onward automatically. +- `LIBDEPS_PRIVATE`: + Creates a dependency only between the target and the dependency. 
+- `LIBDEPS_INTERFACE`: + Same as `LIBDEPS` but excludes itself from the propagation onward. +- `LIBDEPS_DEPENDENTS`: + Creates a reverse `LIBDEPS_PRIVATE` dependency where the dependency is the one declaring the relationship. +- `PROGDEPS_DEPENDENTS`: + Same as `LIBDEPS_DEPENDENTS` but for use with Program builders. Libraries are added to these variables as lists per each SCons builder instance in the SConscripts depending on what type of relationship is needed. For more detailed information on theses types, refer to [`The LIBDEPS variables`](build_system_reference.md#the-libdeps-variables) + #### The `LIBDEPS` lint rules and tags + The libdeps subsystem is capable of linting and automatically detecting issues. Some of these linting rules are automatically checked during build-time (while the SConscripts are read and the build is performed) while others need to be manually run post-build (after the the generated graph file has been built). Some rules will include exemption tags which can be added to a libraries `LIBDEPS_TAGS` to override a rule for that library. The build-time linter also has a print option `--libdeps-linting=print` which will print all issues without failing the build and ignoring exemption tags. This is useful for getting an idea of what issues are currently outstanding. For a complete list of build-time lint rules, please refer to [`Build-time Libdeps Linter`](build_system_reference.md#build-time-libdeps-linter) + #### `LIBDEPS_TAGS` + `LIBDEPS_TAGS` can also be used to supply flags to the libdeps subsystem to do special handling for certain libraries such as exemptions or inclusions for linting rules and also SCons command line expansion functions. For a full list of tags refer to [`LIBDEPS_TAGS`](build_system_reference.md#libdeps_tags) #### Using the post-build LIBDEPS Linter + To use the post-build tools, you must first build the libdeps dependency graph by building the `generate-libdeps-graph` target. You must also install the requirements file: @@ -253,14 +391,18 @@ After the graph file is created, it can be used as input into the `gacli` tool t python3 buildscripts/libdeps/gacli.py --graph-file build/cached/libdeps/libdeps.graphml ``` -Another tool which provides a graphical interface as well as visual representation of the graph is the graph visualizer. Minimally, it requires passing in a directory in which any files with the `.graphml` extension will be available for analysis. By default it will launch the web interface which is reachable in a web browser at http://localhost:3000. +Another tool which provides a graphical interface as well as visual representation of the graph is the graph visualizer. Minimally, it requires passing in a directory in which any files with the `.graphml` extension will be available for analysis. By default it will launch the web interface which is reachable in a web browser at http://localhost:3000. 
``` python3 buildscripts/libdeps/graph_visualizer.py --graphml-dir build/opt/libdeps ``` For more information about the details of using the post-build linting tools refer to [`post-build linting and analysis`](build_system_reference.md#post-build-linting-and-analysis) + ### Debugging build system failures + #### Using` -k` and `-n` + #### `--debug=[explain, time, stacktrace]` + #### `--libdeps-debug` diff --git a/docs/build_system_reference.md b/docs/build_system_reference.md index 721febded93..1d8b86434af 100644 --- a/docs/build_system_reference.md +++ b/docs/build_system_reference.md @@ -1,25 +1,43 @@ # MongoDB Build System Reference ## MongoDB Build System Requirements + ### Recommended minimum requirements + ### Python modules + ### External libraries + ### Enterprise module requirements + ### Testing requirements ## MongoDB customizations + ### SCons modules + ### Development tools + #### Compilation database generator + ### Build tools + #### IDL Compiler + ### Auxiliary tools + #### Ninja generator + #### Icecream tool + #### ccache tool + ### LIBDEPS + Libdeps is a subsystem within the build, which is centered around the LIBrary DEPendency graph. It tracks and maintains the dependency graph as well as lints, analyzes and provides useful metrics about the graph. + #### Design + The libdeps subsystem is divided into several stages, described in order of use as follows. ##### SConscript `LIBDEPS` definitions and built time linting @@ -37,17 +55,18 @@ The libdeps analyzer module is a python library which provides and Application P ##### The CLI and Visualizer tools The libdeps analyzer module is used in the libdeps Graph Analysis Command Line Interface (gacli) tool and the libdeps Graph Visualizer web service. Both tools read in the graph file generated from the build and provide the Human Machine Interface (HMI) for analysis and linting. + #### The `LIBDEPS` variables + The variables include several types of lists to be added to libraries per a SCons builder instance: -| Variable | Use | -| ------------- |-------------| -| `LIBDEPS` | transitive dependencies | -| `LIBDEPS_PRIVATE` | local dependencies | -| `LIBDEPS_INTERFACE` | transitive dependencies excluding self | -| `LIBDEPS_DEPENDENTS` | reverse dependencies | -| `PROGDEPS_DEPENDENTS` | reverse dependencies for Programs | - +| Variable | Use | +| --------------------- | -------------------------------------- | +| `LIBDEPS` | transitive dependencies | +| `LIBDEPS_PRIVATE` | local dependencies | +| `LIBDEPS_INTERFACE` | transitive dependencies excluding self | +| `LIBDEPS_DEPENDENTS` | reverse dependencies | +| `PROGDEPS_DEPENDENTS` | reverse dependencies for Programs | _`LIBDEPS`_ is the 'public' type, such that libraries that are added to this list become a dependency of the current library, and also become dependencies of libraries which may depend on the current library. This propagation also includes not just the libraries in the `LIBDEPS` list, but all `LIBDEPS` of those `LIBDEPS` recursively, meaning that all dependencies of the `LIBDEPS` libraries, also become dependencies of the current library and libraries which depend on it. @@ -60,34 +79,40 @@ _`LIBDEPS_DEPENDENTS`_ are added to libraries which will force themselves as dep _`PROGDEPS_DEPENDENTS`_ are the same as `LIBDEPS_DEPENDENTS`, but intended for use only with Program builders. #### `LIBDEPS_TAGS` + The `LIBDEPS_TAGS` variable is used to mark certain libdeps for various reasons. 
Some `LIBDEPS_TAGS` are used to mark certain libraries for `LIBDEPS_TAG_EXPANSIONS` variable which is used to create a function which can expand to a string on the command line. Below is a table of available `LIBDEPS` tags: -| Tag | Description | -|---|---| -| `illegal_cyclic_or_unresolved_dependencies_allowlisted` | SCons subst expansion tag to handle dependency cycles | -| `init-no-global-side-effects` | SCons subst expansion tag for causing linkers to avoid pulling in all symbols | -| `lint-public-dep-allowed` | Linting exemption tag exempting the `lint-no-public-deps` tag | -| `lint-no-public-deps` | Linting inclusion tag ensuring a libdep has no `LIBDEPS` declared | -| `lint-allow-non-alphabetic` | Linting exemption tag allowing `LIBDEPS` variable lists to be non-alphabetic | -| `lint-leaf-node-allowed-dep` | Linting exemption tag exempting the `lint-leaf-node-no-deps` tag | -| `lint-leaf-node-no-deps` | Linting inclusion tag ensuring a libdep has no libdeps and is a leaf node | -| `lint-allow-nonlist-libdeps` | Linting exemption tag allowing a `LIBDEPS` variable to not be a list | `lint-allow-bidirectional-edges` | Linting exemption tag allowing reverse dependencies to also be a forward dependencies | -| `lint-allow-nonprivate-on-deps-dependents` | Linting exemption tag allowing reverse dependencies to be transitive | -| `lint-allow-dup-libdeps` | Linting exemption tag allowing `LIBDEPS` variables to contain duplicate libdeps on a given library | -| `lint-allow-program-links-private` | Linting exemption tag allowing `Program`s to have `PRIVATE_LIBDEPS` | +| Tag | Description | +| ------------------------------------------------------- | -------------------------------------------------------------------------------------------------- | -------------------------------- | ------------------------------------------------------------------------------------- | +| `illegal_cyclic_or_unresolved_dependencies_allowlisted` | SCons subst expansion tag to handle dependency cycles | +| `init-no-global-side-effects` | SCons subst expansion tag for causing linkers to avoid pulling in all symbols | +| `lint-public-dep-allowed` | Linting exemption tag exempting the `lint-no-public-deps` tag | +| `lint-no-public-deps` | Linting inclusion tag ensuring a libdep has no `LIBDEPS` declared | +| `lint-allow-non-alphabetic` | Linting exemption tag allowing `LIBDEPS` variable lists to be non-alphabetic | +| `lint-leaf-node-allowed-dep` | Linting exemption tag exempting the `lint-leaf-node-no-deps` tag | +| `lint-leaf-node-no-deps` | Linting inclusion tag ensuring a libdep has no libdeps and is a leaf node | +| `lint-allow-nonlist-libdeps` | Linting exemption tag allowing a `LIBDEPS` variable to not be a list | `lint-allow-bidirectional-edges` | Linting exemption tag allowing reverse dependencies to also be a forward dependencies | +| `lint-allow-nonprivate-on-deps-dependents` | Linting exemption tag allowing reverse dependencies to be transitive | +| `lint-allow-dup-libdeps` | Linting exemption tag allowing `LIBDEPS` variables to contain duplicate libdeps on a given library | +| `lint-allow-program-links-private` | Linting exemption tag allowing `Program`s to have `PRIVATE_LIBDEPS` | ##### The `illegal_cyclic_or_unresolved_dependencies_allowlisted` tag + This tag should not be used anymore because the library dependency graph has been successfully converted to a Directed Acyclic Graph (DAG). 
Prior to this accomplishment, it was necessary to handle cycles specifically with platform specific options on the command line. ##### The `init-no-global-side-effects` tag + Adding this flag to a library turns on platform specific compiler flags which will cause the linker to pull in just the symbols it needs. Note that by default, the build is configured to pull in all symbols from libraries because of the use of static initializers, however if a library is known to not have any of these initializers, then this flag can be added for some performance improvement. #### Linting and linter tags The libdeps linter features automatically detect certain classes of LIBDEPS usage errors. The libdeps linters are implemented as build-time linting and post-build linting procedures to maintain order in usage of the libdeps tool and the build’s library dependency graph. You will need to comply with the rules enforced by the libdeps linter, and fix issues that it raises when modifying the build scripts. There are exemption tags to prevent the linter from blocking things, however these exemption tags should only be used in extraordinary cases, and with good reason. A goal of the libdeps linter is to drive and maintain the number of exemption tags in use to zero. + ##### Exemption Tags + There are a number of existing issues that need to be addressed, but they will be addressed in future tickets. In the meantime, the use of specific strings in the LIBDEPS_TAGS variable can allow the libdeps linter to skip certain issues on given libraries. For example, to have the linter skip enforcement of the lint rule against bidirectional edges for "some_library": + ``` env.Library( target=’some_library’ @@ -97,11 +122,13 @@ env.Library( ``` #### build-time Libdeps Linter + If there is a build-time issue, the build will fail until it is addressed. This linting feature will be on by default and takes about half a second to complete in a full enterprise build (at the time of writing this), but can be turned off by using the --libdeps-linting=off option on your SCons invocation. The current rules and there exemptions are listed below: 1. **A 'Program' can not link a non-public dependency, it can only have LIBDEPS links.** + ###### Example ``` @@ -112,12 +139,15 @@ The current rules and there exemptions are listed below: LIBDEPS_PRIVATE=[‘lib2’], # This is a Program, BAD ) ``` + ###### Rationale A Program can not be linked into anything else, and there for the transitiveness does not apply. A default value of LIBDEPS was selected for consistency since most Program's were already doing this at the time the rule was created. ###### Exemption + 'lint-allow-program-links-private' on the target node + ###### 2. **A 'Node' can only directly link a given library once.** @@ -133,15 +163,19 @@ The current rules and there exemptions are listed below: LIBDEPS_INTERFACE=[‘lib2’, 'lib2'], # Linked twice, BAD ) ``` + ###### Rationale Libdeps will ignore duplicate links, so this rule is mostly for consistency and neatness in the build scripts. ###### Exemption + 'lint-allow-dup-libdeps' on the target node ###### + 3. 
**A 'Node' which uses LIBDEPS_DEPENDENTS or PROGDEPS_DEPENDENTS can only have LIBDEPS_PRIVATE links.** + ###### Example ``` @@ -153,14 +187,21 @@ The current rules and there exemptions are listed below: LIBDEPS_PRIVATE=[‘lib2’], # OK ) ``` + ###### Rationale The node that the library is using LIBDEPS_DEPENDENTS or PROGDEPS_DEPENDENT to inject its dependency onto should be conditional, therefore there should not be transitiveness for that dependency since it cannot be the source of any resolved symbols. + ###### Exemption + 'lint-allow-nonprivate-on-deps-dependents' on the target node + ###### + 4. **A 'Node' can not link directly to a library that uses LIBDEPS_DEPENDENTS or PROGDEPS_DEPENDENTS.** + ###### Example + ``` env.Library( target='other_library', @@ -173,16 +214,21 @@ The current rules and there exemptions are listed below: LIBDEPS_DEPENDENTS=['lib3'], ) ``` + ###### Rationale A library that is using LIBDEPS_DEPENDENTS or PROGDEPS_DEPENDENT should only be used for reverse dependency edges. If a node does need to link directly to a library that does have reverse dependency edges, that indicates the library should be split into two separate libraries, containing its direct dependency content and its conditional reverse dependency content. ###### Exemption + 'lint-allow-bidirectional-edges' on the target node + ###### 5. **All libdeps environment vars must be assigned as lists.** + ###### Example + ``` env.Library( target='some_library', @@ -191,13 +237,19 @@ The current rules and there exemptions are listed below: LIBDEPS_PRIVATE=['lib2'], # OK ) ``` + ###### Rationale Libdeps will handle non-list environment variables, so this is more for consistency and neatness in the build scripts. + ###### Exemption + 'lint-allow-nonlist-libdeps' on the target node + ###### + 6. **Libdeps with the tag 'lint-leaf-node-no-deps' shall not link any libdeps.** + ###### Example ``` @@ -225,13 +277,19 @@ The current rules and there exemptions are listed below: The special tag allows certain nodes to be marked and programmatically checked that they remain lead nodes. An example use-case is when we want to make sure certain nodes never link mongodb code. ###### Exemption + 'lint-leaf-node-allowed-dep' on the exempted libdep ###### Inclusion + 'lint-leaf-node-no-deps' on the target node + ###### + 7. **Libdeps with the tag 'lint-no-public-deps' shall not link any libdeps.** + ###### Example + ``` env.Library( target='lib2', @@ -253,17 +311,25 @@ The current rules and there exemptions are listed below: ] ) ``` + ###### Rationale The special tag allows certain nodes to be marked and programmatically checked that they do not link publicly. Some nodes such as mongod_main have special requirements that this programmatically checks. ###### Exemption + 'lint-public-dep-allowed' on the exempted libdep + ###### Inclusion + 'lint-no-public-deps' on the target node + ###### + 8. **Libdeps shall be sorted alphabetically in LIBDEPS lists in the SCons files.** + ###### Example + ``` env.Library( target='lib2', @@ -276,17 +342,19 @@ The current rules and there exemptions are listed below: ] ) ``` + ###### Rationale Keeping the SCons files neat and ordered allows for easier Code Review diffs and generally better maintainability. + ###### Exemption + 'lint-allow-non-alphabetic' on the exempted libdep + ###### - - - ##### The build-time print Option + The libdeps linter also has the `--libdeps-linting=print` option which will perform linting, and instead of failing the build on an issue, just print and continue on. 
It will also ignore exemption tags, and still print the issue because it will not fail the build. This is a good way to see the entirety of existing issues that are exempted by tags, as well as printing other metrics such as time spent linting. #### post-build linting and analysis @@ -400,12 +468,14 @@ The script will launch the backend and then build the optimized production front After the server has started up, it should notify you via the terminal that you can access it at http://localhost:3000 locally in your browser. - - ## Build system configuration + ### SCons configuration + #### Frequently used flags and variables + ### MongoDB build configuration + #### Frequently used flags and variables ##### `MONGO_GIT_HASH` @@ -425,18 +495,31 @@ of `git describe`, which will use the local tags to derive a version. ### Targets and Aliases ## Build artifacts and installation + ### Hygienic builds + ### AutoInstall + ### AutoArchive ## MongoDB SCons style guide + ### Sconscript Formatting Guidelines + #### Vertical list style + #### Alphabetize everything + ### `Environment` Isolation + ### Declaring Targets (`Program`, `Library`, and `CppUnitTest`) + ### Invoking external tools correctly with `Command`s + ### Customizing an `Environment` for a target + ### Invoking subordinate `SConscript`s + #### `Import`s and `Export`s + ### A Model `SConscript` with Comments diff --git a/docs/building.md b/docs/building.md index a32d64e72da..e43b5705342 100644 --- a/docs/building.md +++ b/docs/building.md @@ -6,26 +6,25 @@ way to get started, rather than building from source. To build MongoDB, you will need: -* A modern C++ compiler capable of compiling C++20. One of the following is required: - * GCC 11.3 or newer - * Clang 12.0 (or Apple XCode 13.0 Clang) or newer - * Visual Studio 2022 version 17.0 or newer (See Windows section below for details) -* On Linux and macOS, the libcurl library and header is required. MacOS includes libcurl. - * Fedora/RHEL - `dnf install libcurl-devel` - * Ubuntu/Debian - `libcurl-dev` is provided by three packages. Install one of them: - * `libcurl4-openssl-dev` - * `libcurl4-nss-dev` - * `libcurl4-gnutls-dev` - * On Ubuntu, the lzma library is required. Install `liblzma-dev` - * On Amazon Linux, the xz-devel library is required. `yum install xz-devel` -* Python 3.10.x and Pip modules: - * See the section "Python Prerequisites" below. -* About 13 GB of free disk space for the core binaries (`mongod`, - `mongos`, and `mongo`) and about 600 GB for the install-all target. +- A modern C++ compiler capable of compiling C++20. One of the following is required: + - GCC 11.3 or newer + - Clang 12.0 (or Apple XCode 13.0 Clang) or newer + - Visual Studio 2022 version 17.0 or newer (See Windows section below for details) +- On Linux and macOS, the libcurl library and header is required. MacOS includes libcurl. + - Fedora/RHEL - `dnf install libcurl-devel` + - Ubuntu/Debian - `libcurl-dev` is provided by three packages. Install one of them: + - `libcurl4-openssl-dev` + - `libcurl4-nss-dev` + - `libcurl4-gnutls-dev` + - On Ubuntu, the lzma library is required. Install `liblzma-dev` + - On Amazon Linux, the xz-devel library is required. `yum install xz-devel` +- Python 3.10.x and Pip modules: + - See the section "Python Prerequisites" below. +- About 13 GB of free disk space for the core binaries (`mongod`, + `mongos`, and `mongo`) and about 600 GB for the install-all target. MongoDB supports the following architectures: arm64, ppc64le, s390x, -and x86-64. 
More detailed platform instructions can be found below. - +and x86-64. More detailed platform instructions can be found below. ## MongoDB Tools @@ -37,7 +36,6 @@ repository. The source for the tools is now available at [mongodb/mongo-tools](https://github.com/mongodb/mongo-tools). - ## Python Prerequisites In order to build MongoDB, Python 3.10+ is required, and several Python @@ -59,9 +57,9 @@ dedicated to building MongoDB is optional but recommended. Note: In order to compile C-based Python modules, you'll also need the Python and OpenSSL C headers. Run: -* Fedora/RHEL - `dnf install python3-devel openssl-devel` -* Ubuntu (20.04 and newer)/Debian (Bullseye and newer) - `apt install python-dev-is-python3 libssl-dev` -* Ubuntu (18.04 and older)/Debian (Buster and older) - `apt install python3.7-dev libssl-dev` +- Fedora/RHEL - `dnf install python3-devel openssl-devel` +- Ubuntu (20.04 and newer)/Debian (Bullseye and newer) - `apt install python-dev-is-python3 libssl-dev` +- Ubuntu (18.04 and older)/Debian (Buster and older) - `apt install python3.7-dev libssl-dev` Note: If you are seeing errors involving "Prompt dismissed.." you might need to run the following command before poetry install. @@ -73,7 +71,7 @@ If you only want to build the database server `mongod`: $ python3 buildscripts/scons.py install-mongod -***Note***: For C++ compilers that are newer than the supported +**_Note_**: For C++ compilers that are newer than the supported version, the compiler may issue new warnings that cause MongoDB to fail to build since the build system treats compiler warnings as errors. To ignore the warnings, pass the switch @@ -81,7 +79,7 @@ errors. To ignore the warnings, pass the switch $ python3 buildscripts/scons.py install-mongod --disable-warnings-as-errors -***Note***: On memory-constrained systems, you may run into an error such as `g++: fatal error: Killed signal terminated program cc1plus`. To use less memory during building, pass the parameter `-j1` to scons. This can be incremented to `-j2`, `-j3`, and higher as appropriate to find the fastest working option on your system. +**_Note_**: On memory-constrained systems, you may run into an error such as `g++: fatal error: Killed signal terminated program cc1plus`. To use less memory during building, pass the parameter `-j1` to scons. This can be incremented to `-j2`, `-j3`, and higher as appropriate to find the fastest working option on your system. 
$ python3 buildscripts/scons.py install-mongod -j1 @@ -99,21 +97,20 @@ tests, etc): $ python3 buildscripts/scons.py install-all-meta - ## SCons Targets The following targets can be named on the scons command line to build and install a subset of components: -* `install-mongod` -* `install-mongos` -* `install-core` (includes *only* `mongod` and `mongos`) -* `install-servers` (includes all server components) -* `install-devcore` (includes `mongod`, `mongos`, and `jstestshell` (formerly `mongo` shell)) -* `install-all` (includes a complete end-user distribution and tests) -* `install-all-meta` (absolutely everything that can be built and installed) +- `install-mongod` +- `install-mongos` +- `install-core` (includes _only_ `mongod` and `mongos`) +- `install-servers` (includes all server components) +- `install-devcore` (includes `mongod`, `mongos`, and `jstestshell` (formerly `mongo` shell)) +- `install-all` (includes a complete end-user distribution and tests) +- `install-all-meta` (absolutely everything that can be built and installed) -***NOTE***: The `install-core` and `install-servers` targets are *not* +**_NOTE_**: The `install-core` and `install-servers` targets are _not_ guaranteed to be identical. The `install-core` target will only ever include a minimal set of "core" server components, while `install-servers` is intended for a functional end-user installation. If you are testing, you should use the @@ -126,23 +123,21 @@ The build system will produce an installation tree into `PREFIX` is by default empty. This means that with all of the listed targets all built binaries will be in `build/install/bin` by default. - ## Windows Build requirements: -* Visual Studio 2022 version 17.0 or newer -* Python 3.10 + +- Visual Studio 2022 version 17.0 or newer +- Python 3.10 Or download a prebuilt binary for Windows at www.mongodb.org. - ## Debian/Ubuntu To install dependencies on Debian or Ubuntu systems: # apt-get install build-essential - ## OS X Install Xcode 13.0 or newer. @@ -151,16 +146,16 @@ Install Xcode 13.0 or newer. Install the following ports: - * `devel/libexecinfo` - * `lang/llvm70` - * `lang/python` +- `devel/libexecinfo` +- `lang/llvm70` +- `lang/python` Add `CC=clang12 CXX=clang++12` to the `scons` options, when building. - ## OpenBSD + Install the following ports: - * `devel/libexecinfo` - * `lang/gcc` - * `lang/python` +- `devel/libexecinfo` +- `lang/gcc` +- `lang/python` diff --git a/docs/command_dispatch.md b/docs/command_dispatch.md index 4b2391afbd2..8360e069f06 100644 --- a/docs/command_dispatch.md +++ b/docs/command_dispatch.md @@ -15,10 +15,10 @@ single client connection during its lifetime. Central to the entry point is the requests and returns a response message indicating the result of the corresponding request message. This function is currently implemented by several subclasses of the parent `ServiceEntryPoint` in order to account for the -differences in processing requests between *mongod* and *mongos* -- these +differences in processing requests between _mongod_ and _mongos_ -- these distinctions are reflected in the `ServiceEntryPointMongos` and `ServiceEntryPointMongod` subclasses (see [here][service_entry_point_mongos_h] -and [here][service_entry_point_mongod_h]). One such distinction is the *mongod* +and [here][service_entry_point_mongod_h]). One such distinction is the _mongod_ entry point's use of the `ServiceEntryPointCommon::Hooks` interface, which provides greater flexibility in modifying the entry point's behavior. 
This includes waiting on a read of a particular [read concern][read_concern] level to @@ -28,17 +28,17 @@ for [write concerns][write_concern] as well. ## Strategy -One area in which the *mongos* entry point differs from its *mongod* counterpart +One area in which the _mongos_ entry point differs from its _mongod_ counterpart is in its usage of the [Strategy class][strategy_h]. `Strategy` operates as a legacy interface for processing client read, write, and command requests; there is a near 1-to-1 mapping between its constituent functions and request types (e.g. `writeOp()` for handling write operation requests, `getMore()` for a -getMore request, etc.). These functions comprise the backbone of the *mongos* +getMore request, etc.). These functions comprise the backbone of the _mongos_ entry point's `handleRequest()` -- that is to say, when a valid request is received, it is sieved and ultimately passed along to the appropriate Strategy class member function. The significance of using the Strategy class specifically -with the *mongos* entry point is that it [facilitates query routing to -shards][mongos_router] in *addition* to running queries against targeted +with the _mongos_ entry point is that it [facilitates query routing to +shards][mongos_router] in _addition_ to running queries against targeted databases (see [s/transaction_router.h][transaction_router_h] for finer details). @@ -50,7 +50,7 @@ system][template_method_pattern], that will likely be used during the lifespan of a particular server. Construction of a Command should only occur during server startup. When a new Command is constructed, that Command is stored in a global `CommandRegistry` object for future reference. There are two kinds of -Command subclasses: `BasicCommand` and `TypedCommand`. +Command subclasses: `BasicCommand` and `TypedCommand`. A major distinction between the two is in their implementation of the `parse()` member function. `parse()` takes in a request and returns a handle to a single @@ -62,7 +62,7 @@ implementation of `TypedCommand::parse()`, on the other hand, varies depending on the Request type parameter the Command takes in. Since the `TypedCommand` accepts requests generated by IDL, the parsing function associated with a usable Request type must allow it to be parsed as an IDL command. In handling requests, -both the *mongos* and *mongod* entry points interact with the Command subclasses +both the _mongos_ and _mongod_ entry points interact with the Command subclasses through the `CommandHelpers` struct in order to parse requests and ultimately run them as Commands. @@ -81,5 +81,5 @@ For details on transport internals, including ingress networking, see [this docu [mongos_router]: https://docs.mongodb.com/manual/core/sharded-cluster-query-router/ [transaction_router_h]: ../src/mongo/s/transaction_router.h [commands_h]: ../src/mongo/db/commands.h -[template_method_pattern]: https://en.wikipedia.org/wiki/Template_method_pattern +[template_method_pattern]: https://en.wikipedia.org/wiki/Template_method_pattern [transport_internals]: ../src/mongo/transport/README.md diff --git a/docs/contexts.md b/docs/contexts.md index a1397b8bf63..0624d745731 100644 --- a/docs/contexts.md +++ b/docs/contexts.md @@ -55,7 +55,7 @@ All `Client`s have an associated lock which protects their internal state includ associated `OperationContext` from concurrent access. 
Any mutation to a `Client`’s associated `OperationContext` (or other protected internal state) _must_ take the `Client` lock before being performed, as an `OperationContext` can otherwise be killed and destroyed at any time. A `Client` -thread may read its own internal state without taking the `Client` lock, but _must_ take the +thread may read its own internal state without taking the `Client` lock, but _must_ take the `Client` lock when reading another `Client` thread's internal state. Only a `Client`'s owning thread may write to its `Client`'s internal state, and must take the lock when doing so. `Client`s implement the standard lockable interface (`lock()`, `unlock()`, and `try_lock()`) to support these @@ -68,7 +68,7 @@ operations. The semantics of the `Client` lock are summarized in the table below ### `Client` thread manipulation - [`Client::cc()`][client-cc-url] may be used to get the `Client` object associated with the currently +[`Client::cc()`][client-cc-url] may be used to get the `Client` object associated with the currently executing thread. Prefer passing `Client` objects as parameters over calls to `Client::cc()` when possible. A [`ThreadClient`][thread-client-url] is an RAII-style class which may be used to construct and bind a `Client` to the current running thread and automatically unbind it once the `ThreadClient` diff --git a/docs/egress_networking.md b/docs/egress_networking.md index d17ddb6a58f..b8f1a6f0c66 100644 --- a/docs/egress_networking.md +++ b/docs/egress_networking.md @@ -1,28 +1,29 @@ # Egress Networking -Egress networking entails outbound communication (i.e. requests) from a client process to a server process (e.g. *mongod*), as well as inbound communication (i.e. responses) from such a server process back to a client process. +Egress networking entails outbound communication (i.e. requests) from a client process to a server process (e.g. _mongod_), as well as inbound communication (i.e. responses) from such a server process back to a client process. ## Remote Commands -Remote commands represent the "packages" in which data is transmitted via egress networking. There are two types of remote commands: requests and responses. The [request object][remote_command_request_h] is in essence a wrapper for a command in BSON format, that is to be delivered to and executed by a remote MongoDB node against a database specified by a member in the object. The [response object][remote_command_response_h], in turn, contains data that describes the response to a previously sent request, also in BSON format. Besides the actual response data, the response object also stores useful information such as the duration of running the command specified in the corresponding request, as well as a `Status` member that indicates whether the operation was a success, and the cause of error if not. +Remote commands represent the "packages" in which data is transmitted via egress networking. There are two types of remote commands: requests and responses. The [request object][remote_command_request_h] is in essence a wrapper for a command in BSON format, that is to be delivered to and executed by a remote MongoDB node against a database specified by a member in the object. The [response object][remote_command_response_h], in turn, contains data that describes the response to a previously sent request, also in BSON format. 
Besides the actual response data, the response object also stores useful information such as the duration of running the command specified in the corresponding request, as well as a `Status` member that indicates whether the operation was a success, and the cause of error if not. -There are two variants of both the request and response classes that are used in egress networking. The distinction between the `RemoteCommandRequest` and `RemoteCommandRequestOnAny` classes is that the former specifies a particular host/server to connect to, whereas the latter houses a vector of hosts, for when a command may be run on multiple nodes in a replica set. The distinction between `RemoteCommandResponse` and `RemoteCommandOnAnyResponse` is that the latter includes additional information as to what host the originating request was ultimately run on. It should be noted that the distinctions between the request and response classes are characteristically different; that is to say, whereas the *OnAny* variant of the request object is a augmented version of the other, the response classes should be understood as being different return types altogether. +There are two variants of both the request and response classes that are used in egress networking. The distinction between the `RemoteCommandRequest` and `RemoteCommandRequestOnAny` classes is that the former specifies a particular host/server to connect to, whereas the latter houses a vector of hosts, for when a command may be run on multiple nodes in a replica set. The distinction between `RemoteCommandResponse` and `RemoteCommandOnAnyResponse` is that the latter includes additional information as to what host the originating request was ultimately run on. It should be noted that the distinctions between the request and response classes are characteristically different; that is to say, whereas the _OnAny_ variant of the request object is a augmented version of the other, the response classes should be understood as being different return types altogether. ## Connection Pooling -[Connection pooling][connection_pool] is largely taken care of by the [executor::connection_pool][connection_pool_h] class. This class houses a collection of `ConnectionPool::SpecificPool` objects, each of which shares a one-to-one mapping with a unique host. This lends itself to a parent-child relationship between a "parent" ConnectionPool and its constituent "children" SpecificPool members. The `ConnectionPool::ControllerInterface` subclass is used to direct the behavior of the SpecificPools that belong to it. The main operations associated with the ControllerInterface are the addition, removal, and updating of hosts (and thereby corresponding SpecificPools) to/from/in the parent pool. SpecificPools are created when a connection to a new host is requested, and expire when `hostTimeout` has passed without there having been any new requests or checked-out connections (i.e. connections in use). A pool can have its expiration status lifted whenever a connection is requested, but once a pool is shutdown, the pool becomes unusable. The `hostTimeout` field is one of many parameters belonging to the `ConnectionPool::Options` struct that determines how pools operate. +[Connection pooling][connection_pool] is largely taken care of by the [executor::connection_pool][connection_pool_h] class. This class houses a collection of `ConnectionPool::SpecificPool` objects, each of which shares a one-to-one mapping with a unique host. 
This lends itself to a parent-child relationship between a "parent" ConnectionPool and its constituent "children" SpecificPool members. The `ConnectionPool::ControllerInterface` subclass is used to direct the behavior of the SpecificPools that belong to it. The main operations associated with the ControllerInterface are the addition, removal, and updating of hosts (and thereby corresponding SpecificPools) to/from/in the parent pool. SpecificPools are created when a connection to a new host is requested, and expire when `hostTimeout` has passed without there having been any new requests or checked-out connections (i.e. connections in use). A pool can have its expiration status lifted whenever a connection is requested, but once a pool is shutdown, the pool becomes unusable. The `hostTimeout` field is one of many parameters belonging to the `ConnectionPool::Options` struct that determines how pools operate. -The `ConnectionPool::ConnectionInterface` is responsible for handling the connections *within* a pool. The ConnectionInterface's operations include, but are not limited to, connection setup (establishing a connection, authenticating, etc.), refreshing connections, and managing a timer. This interface also maintains the notion of a pool/connection **generation**, which is used to identify whether some particular connection's generation is older than that of the pool it belongs to (i.e. the connection is out-of-date), in which case it is dropped. The ConnectionPool uses a global mutex for access to SpecificPools as well as generation counters. Another component of the ConnectionPool is its `EgressConnectionCloserManager`. The manager consists of multiple `EgressConnectionClosers`, which are used to determine whether hosts should be dropped. In the context of the ConnectionPool, the manager's purpose is to drop *connections* to hosts based on whether they have been marked as keep open or not. +The `ConnectionPool::ConnectionInterface` is responsible for handling the connections _within_ a pool. The ConnectionInterface's operations include, but are not limited to, connection setup (establishing a connection, authenticating, etc.), refreshing connections, and managing a timer. This interface also maintains the notion of a pool/connection **generation**, which is used to identify whether some particular connection's generation is older than that of the pool it belongs to (i.e. the connection is out-of-date), in which case it is dropped. The ConnectionPool uses a global mutex for access to SpecificPools as well as generation counters. Another component of the ConnectionPool is its `EgressConnectionCloserManager`. The manager consists of multiple `EgressConnectionClosers`, which are used to determine whether hosts should be dropped. In the context of the ConnectionPool, the manager's purpose is to drop _connections_ to hosts based on whether they have been marked as keep open or not. ## Internal Network Clients -Client-side outbound communication in egress networking is primarily handled by the [AsyncDBClient class][async_client_h]. The async client is responsible for initializing a connection to a particular host as well as initializing the [wire protocol][wire_protocol] for client-server communication, after which remote requests can be sent by the client and corresponding remote responses from a database can subsequently be received. 
In setting up the wire protocol, the async client sends an [isMaster][is_master] request to the server and parses the server's isMaster response to ensure that the status of the connection is OK. An initial isMaster request is constructed in the legacy OP_QUERY protocol, so that clients can still communicate with servers that may not support other protocols. The async client also supports client authentication functionality (i.e. authenticating a user's credentials, client host, remote host, etc.). +Client-side outbound communication in egress networking is primarily handled by the [AsyncDBClient class][async_client_h]. The async client is responsible for initializing a connection to a particular host as well as initializing the [wire protocol][wire_protocol] for client-server communication, after which remote requests can be sent by the client and corresponding remote responses from a database can subsequently be received. In setting up the wire protocol, the async client sends an [isMaster][is_master] request to the server and parses the server's isMaster response to ensure that the status of the connection is OK. An initial isMaster request is constructed in the legacy OP_QUERY protocol, so that clients can still communicate with servers that may not support other protocols. The async client also supports client authentication functionality (i.e. authenticating a user's credentials, client host, remote host, etc.). -The scheduling of requests is managed by the [task executor][task_executor_h], which maintains the notion of **events** and **callbacks**. Callbacks represent work (e.g. remote requests) that is to be executed by the executor, and are scheduled by client threads as well as other callbacks. There are several variations of work scheduling methods, which include: immediate scheduling, scheduling no earlier than a specified time, and scheduling iff a specified event has been signalled. These methods return a handle that can be used while the executor is still in scope for either waiting on or cancelling the scheduled callback in question. If a scheduled callback is cancelled, it remains on the work queue and is technically still run, but is labeled as having been 'cancelled' beforehand. Once a given callback/request is scheduled, the task executor is then able to execute such requests via a [network interface][network_interface_h]. The network interface, connected to a particular host/server, begins the asynchronous execution of commands specified via a request bundled in the aforementioned callback handle. The interface is capable of blocking threads until its associated task executor has work that needs to be performed, and is likewise able to return from an idle state when it receives a signal that the executor has new work to process. +The scheduling of requests is managed by the [task executor][task_executor_h], which maintains the notion of **events** and **callbacks**. Callbacks represent work (e.g. remote requests) that is to be executed by the executor, and are scheduled by client threads as well as other callbacks. There are several variations of work scheduling methods, which include: immediate scheduling, scheduling no earlier than a specified time, and scheduling iff a specified event has been signalled. These methods return a handle that can be used while the executor is still in scope for either waiting on or cancelling the scheduled callback in question. 
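As a rough, hedged sketch of that handle-based workflow (error handling is simplified, and `executor` is assumed to be a pointer to an already-running `TaskExecutor`):

```c++
// Hedged sketch: schedule a callback, then use the returned handle to cancel or wait on it.
auto swHandle = executor->scheduleWork(
    [](const executor::TaskExecutor::CallbackArgs& args) {
        if (!args.status.isOK()) {
            // A cancelled callback still runs, but observes a non-OK (cancellation) status.
            return;
        }
        // ... perform the scheduled work ...
    });
if (swHandle.isOK()) {
    executor->cancel(swHandle.getValue());  // or: executor->wait(swHandle.getValue());
}
```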
If a scheduled callback is cancelled, it remains on the work queue and is technically still run, but is labeled as having been 'cancelled' beforehand. Once a given callback/request is scheduled, the task executor is then able to execute such requests via a [network interface][network_interface_h]. The network interface, connected to a particular host/server, begins the asynchronous execution of commands specified via a request bundled in the aforementioned callback handle. The interface is capable of blocking threads until its associated task executor has work that needs to be performed, and is likewise able to return from an idle state when it receives a signal that the executor has new work to process. Client-side legacy networking draws upon the `DBClientBase` class, of which there are multiple subclasses residing in the `src/mongo/client` folder. The [replica set DBClient][dbclient_rs_h] discerns which one of multiple servers in a replica set is the primary at construction time, and establishes a connection (using the `DBClientConnection` wrapper class, also extended from `DBClientBase`) with the replica set via the primary. In cases where the primary server is unresponsive within a specified time range, the RS DBClient will automatically attempt to establish a secondary server as the new primary (see [automatic failover][automatic_failover]). ## See Also + For details on transport internals, including ingress networking, see [this document][transport_internals]. diff --git a/docs/evergreen-testing/README.md b/docs/evergreen-testing/README.md index a8233e95ac0..cbbf47a447b 100644 --- a/docs/evergreen-testing/README.md +++ b/docs/evergreen-testing/README.md @@ -2,9 +2,9 @@ Documentation about how MongoDB is tested in Evergreen. -* [Burn_in_tags](burn_in_tags.md) -* [Burn_in_tests](burn_in_tests.md) -* [Configuration for Evergreen Integration](configuration.md) -* [Task Timeouts](task_timeouts.md) -* [Task Generation](task_generation.md) -* [Multiversion Testing](multiversion.md) +- [Burn_in_tags](burn_in_tags.md) +- [Burn_in_tests](burn_in_tests.md) +- [Configuration for Evergreen Integration](configuration.md) +- [Task Timeouts](task_timeouts.md) +- [Task Generation](task_generation.md) +- [Multiversion Testing](multiversion.md) diff --git a/docs/evergreen-testing/burn_in_tags.md b/docs/evergreen-testing/burn_in_tags.md index 9600645a1fc..2eadec1d6f2 100644 --- a/docs/evergreen-testing/burn_in_tags.md +++ b/docs/evergreen-testing/burn_in_tags.md @@ -19,6 +19,7 @@ new or changed javascript tests (note that a javascript test can be included in those tests will be run 2 times minimum, and 1000 times maximum or for 10 minutes, whichever is reached first. ## ! Run All Affected JStests + The `! Run All Affected JStests` variant has a single `burn_in_tags_gen` task. This task will create & activate [`burn_in_tests`](burn_in_tests.md) tasks for all required and suggested variants. The end result is that any jstests that have been modified in the patch will diff --git a/docs/evergreen-testing/configuration.md b/docs/evergreen-testing/configuration.md index 0aa811bdc2e..6e2dcafdbd8 100644 --- a/docs/evergreen-testing/configuration.md +++ b/docs/evergreen-testing/configuration.md @@ -1,51 +1,53 @@ # Evergreen configuration -This document describes the continuous integration (CI) configuration for MongoDB. - +This document describes the continuous integration (CI) configuration for MongoDB. ## Projects There are a number of Evergreen projects supporting MongoDB's CI. 
For more information on
-Evergreen-specific terminology used in this document, please refer to the
+Evergreen-specific terminology used in this document, please refer to the
[Project Configuration](https://github.com/evergreen-ci/evergreen/wiki/Project-Configuration-Files)
section of the Evergreen wiki.

### `mongodb-mongo-master`
+
The main project for testing MongoDB's dev environments, with a number of build variants, each one
corresponding to a particular compile or testing environment to support development. Each build
variant runs a set of tasks; each task usually runs one or more tests.

### `mongodb-mongo-master-nightly`
+
Tracks the same branch as `mongodb-mongo-master`; each build variant corresponds to a
(version, OS, architecture) triplet for a supported MongoDB nightly release.

### `sys_perf`
+
The system performance project.

### `microbenchmarks`

-Performance unittests, used mainly for validating areas related to the Query system.
+Performance unittests, used mainly for validating areas related to the Query system.

## Project configurations

The above Evergreen projects are defined in the following files:

-* `etc/evergreen_yml_components/**.yml`. YAML files containing definitions for tasks, functions, buildvariants, etc.
-They are copied from the existing evergreen.yml file.
+- `etc/evergreen_yml_components/**.yml`. YAML files containing definitions for tasks, functions, buildvariants, etc.
+  They are copied from the existing evergreen.yml file.

-* `etc/evergreen.yml`. Imports components from above and serves as the project config for mongodb-mongo-master,
-containing all build variants for development, including all feature-specific, patch build required, and suggested
-variants.
+- `etc/evergreen.yml`. Imports components from above and serves as the project config for mongodb-mongo-master,
+  containing all build variants for development, including all feature-specific, patch-build-required, and suggested
+  variants.

-* `etc/evergreen_nightly.yml`. The project configuration for mongodb-mongo-master-nightly, containing only build
-variants for public nightly builds, imports similar components as evergreen.yml to ensure consistency.
+- `etc/evergreen_nightly.yml`. The project configuration for mongodb-mongo-master-nightly, containing only build
+  variants for public nightly builds; it imports similar components as evergreen.yml to ensure consistency.

-* `etc/sys_perf.yml`. Configuration file for the system performance project.
-
-* `etc/perf.yml`. Configuration for the microbenchmark project.
+- `etc/sys_perf.yml`. Configuration file for the system performance project.
+- `etc/perf.yml`. Configuration for the microbenchmark project.

## Release Branching Process
+
Only the `mongodb-mongo-master-nightly` project will be branched with required and other necessary
variants (e.g. sanitizers) added back in.
Most variants in `mongodb-mongo-master` would be dropped by default but can be re-introduced to the release branches manually on an diff --git a/docs/evergreen-testing/multiversion.md b/docs/evergreen-testing/multiversion.md index 4bda73b7081..2275d8a253d 100644 --- a/docs/evergreen-testing/multiversion.md +++ b/docs/evergreen-testing/multiversion.md @@ -1,48 +1,43 @@ # Multiversion Testing - ## Table of contents -- [Multiversion Testing](#multiversion-testing) - - [Table of contents](#table-of-contents) - - [Terminology and overview](#terminology-and-overview) - - [Introduction](#introduction) - - [Latest vs last-lts vs last-continuous](#latest-vs-last-lts-vs-last-continuous) - - [Old vs new](#old-vs-new) - - [Explicit and Implicit multiversion suites](#explicit-and-implicit-multiversion-suites) - - [Version combinations](#version-combinations) - - [Working with multiversion tasks in Evergreen](#working-with-multiversion-tasks-in-evergreen) - - [Exclude tests from multiversion testing](#exclude-tests-from-multiversion-testing) - - [Multiversion task generation](#multiversion-task-generation) - +- [Multiversion Testing](#multiversion-testing) + - [Table of contents](#table-of-contents) + - [Terminology and overview](#terminology-and-overview) + - [Introduction](#introduction) + - [Latest vs last-lts vs last-continuous](#latest-vs-last-lts-vs-last-continuous) + - [Old vs new](#old-vs-new) + - [Explicit and Implicit multiversion suites](#explicit-and-implicit-multiversion-suites) + - [Version combinations](#version-combinations) + - [Working with multiversion tasks in Evergreen](#working-with-multiversion-tasks-in-evergreen) + - [Exclude tests from multiversion testing](#exclude-tests-from-multiversion-testing) + - [Multiversion task generation](#multiversion-task-generation) ## Terminology and overview - ### Introduction Some tests test specific upgrade/downgrade behavior expected between different versions of MongoDB. Several versions of MongoDB are spun up during those test runs. -* Multiversion suites - resmoke suites that are running tests with several versions of MongoDB. - -* Multiversion tasks - Evergreen tasks that are running multiversion suites. Multiversion tasks in -most cases include `multiversion` or `downgrade` in their names. +- Multiversion suites - resmoke suites that are running tests with several versions of MongoDB. +- Multiversion tasks - Evergreen tasks that are running multiversion suites. Multiversion tasks in + most cases include `multiversion` or `downgrade` in their names. ### Latest vs last-lts vs last-continuous For some of the versions we are using such generic names as `latest`, `last-lts` and `last-continuous`. -* `latest` - the current version. In Evergreen, the version that was compiled in the current build. - -* `last-lts` - the latest LTS (Long Term Support) Major release version. In Evergreen, the version -that was downloaded from the last LTS release branch project. +- `latest` - the current version. In Evergreen, the version that was compiled in the current build. -* `last-continuous` - the latest Rapid release version. In Evergreen, the version that was -downloaded from the Rapid release branch project. +- `last-lts` - the latest LTS (Long Term Support) Major release version. In Evergreen, the version + that was downloaded from the last LTS release branch project. +- `last-continuous` - the latest Rapid release version. In Evergreen, the version that was + downloaded from the Rapid release branch project. 
### Old vs new @@ -56,34 +51,53 @@ compiled binaries are downloaded from the old branch projects with `db-contrib-tool` searches for the latest available compiled binaries on the old branch projects in Evergreen. - ### Explicit and Implicit multiversion suites Multiversion suites can be explicit and implicit. -* Explicit - JS tests are aware of the binary versions they are running, -e.g. [multiversion.yml](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/suites/multiversion.yml). -The version of binaries is explicitly set in JS tests, -e.g. [jstests/multiVersion/genericSetFCVUsage/major_version_upgrade.js](https://github.com/mongodb/mongo/blob/397c8da541940b3fbe6257243f97a342fe7e0d3b/jstests/multiVersion/genericSetFCVUsage/major_version_upgrade.js#L33-L44): +- Explicit - JS tests are aware of the binary versions they are running, + e.g. [multiversion.yml](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/suites/multiversion.yml). + The version of binaries is explicitly set in JS tests, + e.g. [jstests/multiVersion/genericSetFCVUsage/major_version_upgrade.js](https://github.com/mongodb/mongo/blob/397c8da541940b3fbe6257243f97a342fe7e0d3b/jstests/multiVersion/genericSetFCVUsage/major_version_upgrade.js#L33-L44): ```js const versions = [ - {binVersion: '4.4', featureCompatibilityVersion: '4.4', testCollection: 'four_four'}, - {binVersion: '5.0', featureCompatibilityVersion: '5.0', testCollection: 'five_zero'}, - {binVersion: '6.0', featureCompatibilityVersion: '6.0', testCollection: 'six_zero'}, - {binVersion: 'last-lts', featureCompatibilityVersion: lastLTSFCV, testCollection: 'last_lts'}, { - binVersion: 'last-continuous', - featureCompatibilityVersion: lastContinuousFCV, - testCollection: 'last_continuous' + binVersion: "4.4", + featureCompatibilityVersion: "4.4", + testCollection: "four_four", + }, + { + binVersion: "5.0", + featureCompatibilityVersion: "5.0", + testCollection: "five_zero", + }, + { + binVersion: "6.0", + featureCompatibilityVersion: "6.0", + testCollection: "six_zero", + }, + { + binVersion: "last-lts", + featureCompatibilityVersion: lastLTSFCV, + testCollection: "last_lts", + }, + { + binVersion: "last-continuous", + featureCompatibilityVersion: lastContinuousFCV, + testCollection: "last_continuous", + }, + { + binVersion: "latest", + featureCompatibilityVersion: latestFCV, + testCollection: "latest", }, - {binVersion: 'latest', featureCompatibilityVersion: latestFCV, testCollection: 'latest'}, ]; ``` -* Implicit - JS tests know nothing about the binary versions they are running, -e.g. [retryable_writes_downgrade.yml](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/suites/retryable_writes_downgrade.yml). -Most of the implicit multiversion suites are using matrix suites, e.g. `replica_sets_last_lts`: +- Implicit - JS tests know nothing about the binary versions they are running, + e.g. [retryable_writes_downgrade.yml](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be208481493fc96/buildscripts/resmokeconfig/suites/retryable_writes_downgrade.yml). + Most of the implicit multiversion suites are using matrix suites, e.g. 
`replica_sets_last_lts`: ```bash $ python buildscripts/resmoke.py suiteconfig --suite=replica_sets_last_lts @@ -118,7 +132,7 @@ The [example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be2 of replica set fixture configuration override: ```yaml - fixture: +fixture: num_nodes: 3 old_bin_version: last_lts mixed_bin_versions: new_new_old @@ -128,7 +142,7 @@ The [example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be2 of sharded cluster fixture configuration override: ```yaml - fixture: +fixture: num_shards: 2 num_rs_nodes_per_shard: 2 old_bin_version: last_lts @@ -139,71 +153,68 @@ The [example](https://github.com/mongodb/mongo/blob/e91cda950e50aa4c707efbdd0be2 of shell fixture configuration override: ```yaml - value: +value: executor: - config: - shell_options: - global_vars: - TestData: - useRandomBinVersionsWithinReplicaSet: 'last-lts' + config: + shell_options: + global_vars: + TestData: + useRandomBinVersionsWithinReplicaSet: "last-lts" ``` - ### Version combinations In implicit multiversion suites the same set of tests may run in similar suites that are using various mixed version combinations. Those version combinations depend on the type of resmoke fixture the suite is running with. These are the recommended version combinations to test against based on the suite fixtures: -* Replica set fixture combinations: - * `last-lts new-new-old` (i.e. suite runs the replica set fixture that spins up the `latest` and -the `last-lts` versions in a 3-node replica set where the 1st node is the `latest`, 2nd - `latest`, -3rd - `last-lts`, etc.) - * `last-lts new-old-new` - * `last-lts old-new-new` - * `last-continuous new-new-old` - * `last-continuous new-old-new` - * `last-continuous old-new-new` - * Ex: [change_streams](https://github.com/10gen/mongo/blob/88d59bfe9d5ee2c9938ae251f7a77a8bf1250a6b/buildscripts/resmokeconfig/suites/change_streams.yml) uses a [`ReplicaSetFixture`](https://github.com/10gen/mongo/blob/88d59bfe9d5ee2c9938ae251f7a77a8bf1250a6b/buildscripts/resmokeconfig/suites/change_streams.yml#L50) so the corresponding multiversion suites are - * [`change_streams_last_continuous_new_new_old`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_new_new_old.yml) - * [`change_streams_last_continuous_new_old_new`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_new_old_new.yml) - * [`change_streams_last_continuous_old_new_new`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_old_new_new.yml) - * [`change_streams_last_lts_new_new_old`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_lts_new_new_old.yml) - * [`change_streams_last_lts_new_old_new`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_lts_new_old_new.yml) - * [`change_streams_last_lts_old_new_new`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_lts_old_new_new.yml) +- Replica set fixture combinations: -* Sharded cluster fixture combinations: - * `last-lts new-old-old-new` (i.e. 
suite runs the sharded cluster fixture that spins up the -`latest` and the `last-lts` versions in a sharded cluster that consists of 2 shards with 2-node -replica sets per shard where the 1st node of the 1st shard is the `latest`, 2nd node of 1st -shard - `last-lts`, 1st node of 2nd shard - `last-lts`, 2nd node of 2nd shard - `latest`, etc.) - * `last-continuous new-old-old-new` - * Ex: [change_streams_downgrade](https://github.com/10gen/mongo/blob/a96b83b2fa7010a5823fefac2469b4a06a697cf1/buildscripts/resmokeconfig/suites/change_streams_downgrade.yml) uses a [`ShardedClusterFixture`](https://github.com/10gen/mongo/blob/a96b83b2fa7010a5823fefac2469b4a06a697cf1/buildscripts/resmokeconfig/suites/change_streams_downgrade.yml#L408) so the corresponding multiversion suites are - * [`change_streams_downgrade_last_continuous_new_old_old_new`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_downgrade_last_continuous_new_old_old_new.yml) - * [`change_streams_downgrade_last_lts_new_old_old_new`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_downgrade_last_lts_new_old_old_new.yml) + - `last-lts new-new-old` (i.e. suite runs the replica set fixture that spins up the `latest` and + the `last-lts` versions in a 3-node replica set where the 1st node is the `latest`, 2nd - `latest`, + 3rd - `last-lts`, etc.) + - `last-lts new-old-new` + - `last-lts old-new-new` + - `last-continuous new-new-old` + - `last-continuous new-old-new` + - `last-continuous old-new-new` + - Ex: [change_streams](https://github.com/10gen/mongo/blob/88d59bfe9d5ee2c9938ae251f7a77a8bf1250a6b/buildscripts/resmokeconfig/suites/change_streams.yml) uses a [`ReplicaSetFixture`](https://github.com/10gen/mongo/blob/88d59bfe9d5ee2c9938ae251f7a77a8bf1250a6b/buildscripts/resmokeconfig/suites/change_streams.yml#L50) so the corresponding multiversion suites are + - [`change_streams_last_continuous_new_new_old`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_new_new_old.yml) + - [`change_streams_last_continuous_new_old_new`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_new_old_new.yml) + - [`change_streams_last_continuous_old_new_new`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_continuous_old_new_new.yml) + - [`change_streams_last_lts_new_new_old`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_lts_new_new_old.yml) + - [`change_streams_last_lts_new_old_new`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_lts_new_old_new.yml) + - [`change_streams_last_lts_old_new_new`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_last_lts_old_new_new.yml) -* Shell fixture combinations: - * `last-lts` (i.e. suite runs the shell fixture that spins up `last-lts` as the `old` versions, -etc.) 
- * `last-continuous` - * Ex: [initial_sync_fuzzer](https://github.com/10gen/mongo/blob/908625ffdec050a71aa2ce47c35788739f629c60/buildscripts/resmokeconfig/suites/initial_sync_fuzzer.yml) uses a Shell Fixture, so the corresponding multiversion suites are - * [`initial_sync_fuzzer_last_lts`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/initial_sync_fuzzer_last_lts.yml) - * [`initial_sync_fuzzer_last_continuous`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/initial_sync_fuzzer_last_continuous.yml) +- Sharded cluster fixture combinations: + - `last-lts new-old-old-new` (i.e. suite runs the sharded cluster fixture that spins up the + `latest` and the `last-lts` versions in a sharded cluster that consists of 2 shards with 2-node + replica sets per shard where the 1st node of the 1st shard is the `latest`, 2nd node of 1st + shard - `last-lts`, 1st node of 2nd shard - `last-lts`, 2nd node of 2nd shard - `latest`, etc.) + - `last-continuous new-old-old-new` + - Ex: [change_streams_downgrade](https://github.com/10gen/mongo/blob/a96b83b2fa7010a5823fefac2469b4a06a697cf1/buildscripts/resmokeconfig/suites/change_streams_downgrade.yml) uses a [`ShardedClusterFixture`](https://github.com/10gen/mongo/blob/a96b83b2fa7010a5823fefac2469b4a06a697cf1/buildscripts/resmokeconfig/suites/change_streams_downgrade.yml#L408) so the corresponding multiversion suites are + - [`change_streams_downgrade_last_continuous_new_old_old_new`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_downgrade_last_continuous_new_old_old_new.yml) + - [`change_streams_downgrade_last_lts_new_old_old_new`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/change_streams_downgrade_last_lts_new_old_old_new.yml) + +- Shell fixture combinations: + - `last-lts` (i.e. suite runs the shell fixture that spins up `last-lts` as the `old` versions, + etc.) + - `last-continuous` + - Ex: [initial_sync_fuzzer](https://github.com/10gen/mongo/blob/908625ffdec050a71aa2ce47c35788739f629c60/buildscripts/resmokeconfig/suites/initial_sync_fuzzer.yml) uses a Shell Fixture, so the corresponding multiversion suites are + - [`initial_sync_fuzzer_last_lts`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/initial_sync_fuzzer_last_lts.yml) + - [`initial_sync_fuzzer_last_continuous`](https://github.com/10gen/mongo/blob/612814f4ce56282c47d501817ba28337c26d7aba/buildscripts/resmokeconfig/matrix_suites/mappings/initial_sync_fuzzer_last_continuous.yml) If `last-lts` and `last-continuous` versions happen to be the same, or last-continuous is EOL, we skip `last-continuous` and run multiversion suites with only `last-lts` combinations in Evergreen. - ## Working with multiversion tasks in Evergreen - ### Multiversion task generation Please refer to mongo-task-generator [documentation](https://github.com/mongodb/mongo-task-generator/blob/master/docs/generating_tasks.md#multiversion-testing) for generating multiversion tasks in Evergreen. - ### Exclude tests from multiversion testing Sometimes tests are not designed to run in multiversion suites. 
To avoid implicit multiversion diff --git a/docs/evergreen-testing/task_generation.md b/docs/evergreen-testing/task_generation.md index 4dbae1254c8..dde39f101be 100644 --- a/docs/evergreen-testing/task_generation.md +++ b/docs/evergreen-testing/task_generation.md @@ -14,7 +14,7 @@ for details on how it works. ## Configuring a task to be generated In order to generate a task, we typically create a placeholder task. By convention the name of -these tasks should end in "_gen". Most of the time, generated tasks should inherit the +these tasks should end in "\_gen". Most of the time, generated tasks should inherit the [gen_task_template](https://github.com/mongodb/mongo/blob/31864e3866ce9cc54c08463019846ded2ad9e6e5/etc/evergreen_yml_components/definitions.yml#L99-L107) which configures the required dependencies. @@ -31,9 +31,9 @@ Once a placeholder task in defined, you can reference it just like a normal task Task generation is performed as a 2-step process. 1. The first step is to generator the configuration for the generated tasks and send that to - evergreen to actually create the tasks. This is done by the `version_gen` task using the - `mongo-task-generator` tool. This only needs to be done once for the entire version and rerunning - this task will result in a no-op. + evergreen to actually create the tasks. This is done by the `version_gen` task using the + `mongo-task-generator` tool. This only needs to be done once for the entire version and rerunning + this task will result in a no-op. The tasks will be generated in an "inactive" state. This allows us to generate all available tasks, regardless of whether they are meant to be run or not. This way if we choose to run @@ -45,10 +45,10 @@ Task generation is performed as a 2-step process. placeholder tasks from view. 2. After the tasks have been generated, the placeholder tasks are free to run. The placeholder tasks - simply find the task generated for them and mark it activated. Since generated tasks are - created in the "inactive" state, this will activate any generated tasks whose placeholder task - runs. This enables users to select tasks to run on the initial task selection page even though - the tasks have not yet been generated. + simply find the task generated for them and mark it activated. Since generated tasks are + created in the "inactive" state, this will activate any generated tasks whose placeholder task + runs. This enables users to select tasks to run on the initial task selection page even though + the tasks have not yet been generated. **Note**: While this 2-step process allows a similar user experience to working with normal tasks, it does create a few UI quirks. For example, evergreen will hide "inactive" tasks in the UI, as a diff --git a/docs/evergreen-testing/task_timeouts.md b/docs/evergreen-testing/task_timeouts.md index e370aad22c9..b2c70e85af7 100644 --- a/docs/evergreen-testing/task_timeouts.md +++ b/docs/evergreen-testing/task_timeouts.md @@ -4,12 +4,12 @@ There are two types of timeouts that [evergreen supports](https://github.com/evergreen-ci/evergreen/wiki/Project-Commands#timeoutupdate): -* **Exec timeout**: The _exec_ timeout is the overall timeout for a task. Once the total runtime for -a test hits this value, the timeout logic will be triggered. This value is specified by -**exec_timeout_secs** in the evergreen configuration. -* **Idle timeout**: The _idle_ timeout is the amount of time in which evergreen will wait for -output to be created before it considers the task hung and triggers timeout logic. 
This value -is specified by **timeout_secs** in the evergreen configuration. +- **Exec timeout**: The _exec_ timeout is the overall timeout for a task. Once the total runtime for + a test hits this value, the timeout logic will be triggered. This value is specified by + **exec_timeout_secs** in the evergreen configuration. +- **Idle timeout**: The _idle_ timeout is the amount of time in which evergreen will wait for + output to be created before it considers the task hung and triggers timeout logic. This value + is specified by **timeout_secs** in the evergreen configuration. **Note**: In most cases, **exec_timeout** is usually the more useful of the timeouts. @@ -17,19 +17,19 @@ is specified by **timeout_secs** in the evergreen configuration. There are a few ways in which the timeout can be determined for a task running in evergreen. -* **Specified in 'etc/evergreen.yml'**: Timeout can be specified directly in the 'evergreen.yml' file, -both on tasks and build variants. This can be useful for setting default timeout values, but is limited -since different build variants frequently have different runtime characteristics and it is not possible -to set timeouts for a task running on a specific build variant. +- **Specified in 'etc/evergreen.yml'**: Timeout can be specified directly in the 'evergreen.yml' file, + both on tasks and build variants. This can be useful for setting default timeout values, but is limited + since different build variants frequently have different runtime characteristics and it is not possible + to set timeouts for a task running on a specific build variant. -* **etc/evergreen_timeouts.yml**: The 'etc/evergreen_timeouts.yml' file for overriding timeouts -for specific tasks on specific build variants. This provides a work-around for the limitations of -specifying the timeouts directly in the 'evergreen.yml'. In order to use this method, the task -must run the "determine task timeout" and "update task timeout expansions" functions at the beginning -of the task evergreen definition. Most resmoke tasks already do this. +- **etc/evergreen_timeouts.yml**: The 'etc/evergreen_timeouts.yml' file for overriding timeouts + for specific tasks on specific build variants. This provides a work-around for the limitations of + specifying the timeouts directly in the 'evergreen.yml'. In order to use this method, the task + must run the "determine task timeout" and "update task timeout expansions" functions at the beginning + of the task evergreen definition. Most resmoke tasks already do this. -* **buildscripts/evergreen_task_timeout.py**: This is the script that reads the 'etc/evergreen_timeouts.yml' -file and calculates the timeout to use. Additionally, it will check the historic test results of the -task being run and see if there is enough information to calculate timeouts based on that. It can -also be used for more advanced ways of determining timeouts (e.g. the script is used to set much -more aggressive timeouts on tasks that are run in the commit-queue). +- **buildscripts/evergreen_task_timeout.py**: This is the script that reads the 'etc/evergreen_timeouts.yml' + file and calculates the timeout to use. Additionally, it will check the historic test results of the + task being run and see if there is enough information to calculate timeouts based on that. It can + also be used for more advanced ways of determining timeouts (e.g. the script is used to set much + more aggressive timeouts on tasks that are run in the commit-queue). 
diff --git a/docs/exception_architecture.md b/docs/exception_architecture.md index 05c6eb752a1..913b3330caa 100644 --- a/docs/exception_architecture.md +++ b/docs/exception_architecture.md @@ -1,6 +1,7 @@ # Exception Architecture MongoDB code uses the following types of assertions that are available for use: + - `uassert` and `iassert` - Checks for per-operation user errors. Operation-fatal. - `tassert` @@ -15,7 +16,7 @@ MongoDB code uses the following types of assertions that are available for use: - Checks process invariant. Process-fatal. Use to detect code logic errors ("pointer should never be null", "we should always be locked"). -__Note__: Calling C function `assert` is not allowed. Use one of the above instead. +**Note**: Calling C function `assert` is not allowed. Use one of the above instead. The following types of assertions are deprecated: @@ -89,13 +90,13 @@ when we expect a failure, a failure might be recoverable, or failure accounting ### Choosing a unique location number -The current convention for choosing a unique location number is to use the 5 digit SERVER ticket number -for the ticket being addressed when the assertion is added, followed by a two digit counter to distinguish -between codes added as part of the same ticket. For example, if you're working on SERVER-12345, the first -error code would be 1234500, the second would be 1234501, etc. This convention can also be used for LOGV2 +The current convention for choosing a unique location number is to use the 5 digit SERVER ticket number +for the ticket being addressed when the assertion is added, followed by a two digit counter to distinguish +between codes added as part of the same ticket. For example, if you're working on SERVER-12345, the first +error code would be 1234500, the second would be 1234501, etc. This convention can also be used for LOGV2 logging id numbers. -The only real constraint for unique location numbers is that they must be unique across the codebase. This is +The only real constraint for unique location numbers is that they must be unique across the codebase. This is verified at compile time with a [python script][errorcodes_py]. ## Exception @@ -120,7 +121,7 @@ upwards harmlessly. The code should also expect, and properly handle, `UserExcep MongoDB uses `ErrorCodes` both internally and externally: a subset of error codes (e.g., `BadValue`) are used externally to pass errors over the wire and to clients. These error codes are -the means for MongoDB processes (e.g., *mongod* and *mongo*) to communicate errors, and are visible +the means for MongoDB processes (e.g., _mongod_ and _mongo_) to communicate errors, and are visible to client applications. Other error codes are used internally to indicate the underlying reason for a failed operation. For instance, `PeriodicJobIsStopped` is an internal error code that is passed to callback functions running inside a [`PeriodicRunner`][periodic_runner_h] once the runner is @@ -162,10 +163,9 @@ Gotchas to watch out for: properly. - Think about the location of your asserts in constructors, as the destructor would not be called. But at a minimum, use `wassert` a lot therein, we want to know if something is wrong. -- Do __not__ throw in destructors or allow exceptions to leak out (if you call a function that +- Do **not** throw in destructors or allow exceptions to leak out (if you call a function that may throw). 
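Tying the assertion types and the location-number convention together, a call site might look like the following hedged sketch (the function, the ticket-derived code, and the message are illustrative):

```c++
// Hedged sketch: a per-operation user error next to a process invariant.
// 1234500 follows the "SERVER-12345 plus two-digit counter" convention described above.
void setBatchSize(OperationContext* opCtx, long long batchSize) {
    invariant(opCtx);  // code-logic assumption; a violation is process-fatal

    uassert(1234500,
            str::stream() << "batchSize must be non-negative, but got " << batchSize,
            batchSize >= 0);  // user error; fails only the current operation
}
```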
- [raii]: https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization [error_codes_yml]: ../src/mongo/base/error_codes.yml [periodic_runner_h]: ../src/mongo/util/periodic_runner.h diff --git a/docs/fail_points.md b/docs/fail_points.md index 8c1a20613f4..e80e9e9062b 100644 --- a/docs/fail_points.md +++ b/docs/fail_points.md @@ -13,12 +13,13 @@ For more on what test-only means and how to enable the `configureFailPoint` comm A fail point must first be defined using `MONGO_FAIL_POINT_DEFINE(myFailPoint)`. This statement adds the fail point to a registry and allows it to be evaluated in code. There are three common patterns for evaluating a fail point: -- Exercise a rarely used branch: - `if (whenPigsFly || myFailPoint.shouldFail()) { ... }` -- Block until the fail point is unset: - `myFailPoint.pauseWhileSet();` -- Use the fail point's payload to perform custom behavior: - `myFailPoint.execute([](const BSONObj& data) { useMyPayload(data); };` + +- Exercise a rarely used branch: + `if (whenPigsFly || myFailPoint.shouldFail()) { ... }` +- Block until the fail point is unset: + `myFailPoint.pauseWhileSet();` +- Use the fail point's payload to perform custom behavior: + `myFailPoint.execute([](const BSONObj& data) { useMyPayload(data); };` For more complete usage, see the [fail point header][fail_point] or the [fail point tests][fail_point_test]. @@ -34,11 +35,11 @@ a `FailPointEnableBlock` to enable and configure the fail point for a given bloc a fail point can also be set via setParameter by its name prefixed with "failpoint." (e.g., "failpoint.myFailPoint"). -Users can also wait until a fail point has been evaluated a certain number of times ***over its -lifetime***. A `waitForFailPoint` command request will send a response back when the fail point has +Users can also wait until a fail point has been evaluated a certain number of times **_over its +lifetime_**. A `waitForFailPoint` command request will send a response back when the fail point has been evaluated the given number of times. For ease of use, the `configureFailPoint` JavaScript -helper returns an object that can be used to wait a certain amount of times ***from when the fail -point was enabled***. In C++ tests, users can invoke `FailPoint::waitForTimesEntered()` for similar +helper returns an object that can be used to wait a certain amount of times **_from when the fail +point was enabled_**. In C++ tests, users can invoke `FailPoint::waitForTimesEntered()` for similar behavior. `FailPointEnableBlock` records the amount of times the fail point had been evaluated when it was constructed, accessible via `FailPointEnableBlock::initialTimesEntered()`. diff --git a/docs/futures_and_promises.md b/docs/futures_and_promises.md index 550f05dff54..545c05e71e4 100644 --- a/docs/futures_and_promises.md +++ b/docs/futures_and_promises.md @@ -10,21 +10,23 @@ continue performing other work instead of waiting synchronously for those result ## A Few Definitions -- A `Future` is a type that will eventually contain either a `T`, or an error indicating why the - `T` could not be produced (in MongoDB, the error will take the form of either an exception or a - `Status`). -- A `Promise` is a single-shot producer of a value (i.e., a `T`) for an associated `Future`. - That is, to put a value or error in a `Future` and make it ready for use by consumers, the - value is emplaced in the corresponding `Promise`. -- A continuation is a functor that can be chained on to `Future` that will execute only once the - `T` (or error) is available and ready. 
A continuation in this way can "consume" the produced `T`, - and handle any errors. +- A `Future` is a type that will eventually contain either a `T`, or an error indicating why the + `T` could not be produced (in MongoDB, the error will take the form of either an exception or a + `Status`). +- A `Promise` is a single-shot producer of a value (i.e., a `T`) for an associated `Future`. + That is, to put a value or error in a `Future` and make it ready for use by consumers, the + value is emplaced in the corresponding `Promise`. +- A continuation is a functor that can be chained on to `Future` that will execute only once the + `T` (or error) is available and ready. A continuation in this way can "consume" the produced `T`, + and handle any errors. ## A First Example + To build some intuition around futures and promises, let's see how they might be used. As an example, we'll look at how they help us rewrite some slow blocking code into fast, concurrent code. As a distributed system, MongoDB often needs to send RPCs from one machine to another. A sketch of a simple, synchronous way of doing so might look like this: + ```c++ Message call(Message& toSend) { ... @@ -40,13 +42,15 @@ Message call(Message& toSend) { ... } ``` + This is fine, but some parts of networking are expensive! `TransportSession::sinkMessage` involves making expensive system calls to enqueue our message into the kernel's networking stack, and `TransportSession::sourceMessage` entails waiting for a network round-trip to occur! We don't want busy worker threads to be forced to wait around to hear back from the kernel for these sorts of expensive operations. Instead, we'd rather let these threads move on to perform other work, and -handle the response from our expensive networking operations when they're available. Futures and +handle the response from our expensive networking operations when they're available. Futures and promises allow us to do this. We can rewrite our example as follows: + ```c++ Future call(Message& toSend) { ... @@ -60,6 +64,7 @@ Future call(Message& toSend) { }); } ``` + First, notice that our calls to `TransportSession::sourceMessage` and `TransportSession::sinkMessage` have been replaced with calls to asynchronous versions of those functions. These asynchronous versions are future-returning; they don't block, but also don't return @@ -82,16 +87,18 @@ thread blocking and waiting. This is explained in more detail in the "How Are Re Down Continuation Chains?" section below. ## Filling In Some Details + The example above hopefully showed us how futures can be used to structure asynchronous programs at a high level, but we've left out some important details about how they work. ### How Are Futures Fulfilled With Values? + In our example, we looked at how some code that needs to wait for results can use `Future`s to be written in an asynchronous, performant way. But some thread running elsewhere needs to actually "fulfill" those futures with a value or error. Threads can fulfull the core "promise" of a `Future` - that it will eventually contain a `T` or an error - by using the appropriately named `Promise` type. Every pending `Future` is associated with exactly one corresponding -`Promise` that can be used to ready the `Future`, providing it with a value. (Note that a +`Promise` that can be used to ready the `Future`, providing it with a value. (Note that a `Future` may also be "born ready"/already filled with a value when constructed). 
The `Future` can be "made ready" by emplacing a value or error in the associated promise with `Promise::emplaceValue`, `Promise::setError`, or related helper member functions (see the @@ -121,6 +128,7 @@ extract as many associated `SharedSemiFuture`s as you'd like from a `SharedPromi `getFuture()` member function. ### Where Do Continuations Run? + In our example, we chained continuations onto futures using functions like `Future::then()`, and explained that the continuations we chained will only be invoked once the future we've chained them onto is ready. But we haven't yet specified how this continuation is invoked: what thread will @@ -143,10 +151,12 @@ Fortunately, the service can enforce these guarantees using two types closely re `Future`: the types `SemiFuture` and `ExecutorFuture`. #### SemiFuture + `SemiFuture`s are like regular futures, except that continuations cannot be chained to them. Instead, values and errors can only be extracted from them via blocking methods, which threads can call if they are willing to block. A `Future` can always be transformed into a `SemiFuture` using the member function `Future::semi()`. Let's look at a quick example to make this clearer: + ```c++ // Code producing a `SemiFuture` SemiFuture SomeAsyncService::requestWork() { @@ -168,6 +178,7 @@ SemiFuture sf = SomeAsyncService::requestWork(); // sf.onError(...) won't compile for the same reason auto res = sf.get(); // OK; get blocks until sf is ready ``` + Our example begins when a thread makes a request for some asynchronous work to be performed by some service, using `SomeAsyncService::requestWork()`. As was the case in our initial example, this thread receives back a future that will be readied when its request has been completed and a value @@ -181,6 +192,7 @@ run the continuations. By instead returning a `SemiFuture`, the `SomeAsyncServic that requests work from it from using its own internal `_privateExecutor` resource. #### ExecutorFuture + `ExecutorFuture`s are another variation on the core `Future` type; they are like regular `Future`s, except for the fact that code constructing an `ExecutorFuture` is required to provide an [executor][executor] on which any continuations chained to the future will be run. (An executor is @@ -193,20 +205,23 @@ clearer, so we'll reuse the one above. Let's imagine the thread that scheduled w `SomeAsyncService::requestWork()` can't afford to block until the result `SemiFuture` is readied. Instead, it consumes the asynchronous result by specifying a callback to run and an executor on which to run it like so: + ```c++ // Code consuming a `SemiFuture` SomeAsyncService::requestWork() // <-- temporary `SemiFuture` .thenRunOn(_executor) // <-- Transformed into a `ExecutorFuture` .then([](Work w) { doMoreWork(w); }); // <-- Which supports chaining ``` + By calling `.thenRunOn(_executor)` on the `SemiFuture` returned by `SomeAsyncService::requestWork()`, we transform it from a `SemiFuture` to an `ExecutorFuture`. This allows us to again chain continuations to run when the future is ready, but instead of those continuations being run on whatever thread readied the future, they will be run on `_executor`. In -this way, the result of the future returned by `SomeAsyncService::requestWork()` is able to be +this way, the result of the future returned by `SomeAsyncService::requestWork()` is able to be consumed by the `doMoreWork` function which will run on `_executor`. ### How Are Results Propagated Down Continuation Chains? 
+ In our example for an asyncified `call()` function above, we saw that we could attach continuations onto futures, like the one returned by `TransportSession::asyncSinkMessage`. We also saw that once we attached one continuation to a future, we could attach subsequent ones, forming a continuation @@ -219,15 +234,15 @@ in the form of a `Status` or `DBException`. Because a `Future` can resolve to this way, we can chain different continuations to a `Future` to consume its result, depending on what the type of the result is (i.e. a `T` or `Status`). We mentioned above that `.then()` is used to chain continuations that run when the future to which the continuation is chained resolves -successfully. As a result, when a continuation is chained via `.then()` to a `Future`, the +successfully. As a result, when a continuation is chained via `.then()` to a `Future`, the continuation must accept a `T`, the result of the `Future`, as an argument to consume. In the -case of a `Future`, continuations chained via `.then()` accept no arguments. Similarly, as +case of a `Future`, continuations chained via `.then()` accept no arguments. Similarly, as `.onError()` is used to chain continuations that run when the future is resolved with an error, these continuations must accept a `Status` as argument, which contains the error the future it is -chained to resolves with. Lastly, as `.onCompletion()` is used to chain continuations that run in +chained to resolves with. Lastly, as `.onCompletion()` is used to chain continuations that run in case a `Future` resolves with success or error, continuations chained via this function must accept an argument that can contain the results of successful resolution of the chained-to future or -an error. When `T` is non-void, continuations chained via `.onCompletion()` must therefore accept a +an error. When `T` is non-void, continuations chained via `.onCompletion()` must therefore accept a `StatusWith` as argument, which will contain a `T` if the chained-to future resolved successfully and an error status otherwise. If `T` is void, a continuation chained via `.onCompletion()` must accept a `Status` as argument, indicating whether or not the future the continuation is chained to @@ -249,7 +264,7 @@ will be bypassed and will never run. Next, the successful result reaches the con via `.then()`, which must take no arguments as `TransportLayer::asyncSinkMessage` returns a `Future`. Because the future returned by `TransportLayer::asyncSinkMessage` resolved successfully, the continuation chained via `.then()` does run. The result of this continuation is -the future returned by `TransportLayer::asyncSourceMessage`. When this future resolves, the result +the future returned by `TransportLayer::asyncSourceMessage`. When this future resolves, the result will traverse the remaining continuation chain, and find the continuation chained via `.onCompletion()`, which always accepts the result of a future, however it resolves, and therefore is run. @@ -281,7 +296,7 @@ extract the same error in the form of a `Status`. In the case of `.getAsync()`, converted to `Status`, and crucially, callables chained as continuations via `.getAsync()` cannot throw any exceptions, as there is no appropriate context with which to handle an asynchronous exception. If an exception is thrown from a continuation chained via `.getAsync()`, the entire -process will be terminated (i.e. the program will crash). +process will be terminated (i.e. the program will crash). 
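To make those propagation rules concrete, here is a small hedged sketch of a full chain over a `Future<Message>`; `asyncSourceMessage`, `processMessage`, and the `Reply` type are illustrative stand-ins rather than real APIs:

```c++
// Hedged sketch: how values and errors flow through .then / .onError / .onCompletion / .getAsync.
Future<Message> fut = asyncSourceMessage();  // illustrative future-returning call

std::move(fut)
    .then([](Message m) -> Reply {                 // runs only on success; consumes the Message
        return processMessage(std::move(m));
    })
    .onError([](Status s) -> StatusWith<Reply> {   // runs only if an earlier step failed
        return s;                                  // propagate the error further down the chain
    })
    .onCompletion([](StatusWith<Reply> swReply) {  // always runs, with either a Reply or an error
        // inspect swReply.isOK() / swReply.getStatus() here
    })
    .getAsync([](Status s) {                       // terminal consumer of the now-void chain; must not throw
    });
```

If a continuation itself fails, by returning a non-OK `Status` or throwing a `DBException`, later `.then()` continuations are bypassed in the same way, and the error reaches the next `.onError()` or `.onCompletion()` in the chain.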
## Notes and Links @@ -291,6 +306,7 @@ and all the related types, check out the [header file][future] and search for th function you're interested in. ### Future Utilities + We have many utilities written to help make it easier for you to work with futures; check out [future_util.h][future_util.h] to see them. Their [unit tests][utilUnitTests] also help elucidate how they can be useful. Additionally, when making requests for asynchronous work through future-ful @@ -300,13 +316,11 @@ the associated utilities. For more on them, see their architecture guide in [thi README][cancelationArch]. ## General Promise/Future Docs + For intro-documentation on programming with promises and futures, this blog post about future use at [Facebook][fb] and the documentation for the use of promises and futures at [Twitter][twtr] are also very helpful. - - - [future]: ../src/mongo/util/future.h [future_util.h]: ../src/mongo/util/future_util.h [executor]: ../src/mongo/util/out_of_line_executor.h diff --git a/docs/golden_data_test_framework.md b/docs/golden_data_test_framework.md index 1187c5aa5fc..f695be74c31 100644 --- a/docs/golden_data_test_framework.md +++ b/docs/golden_data_test_framework.md @@ -1,117 +1,128 @@ # Overview -Golden Data test framework provides ability to run and manage tests that produce an output which is -verified by comparing it to the checked-in, known valid output. Any differences result in test + +Golden Data test framework provides ability to run and manage tests that produce an output which is +verified by comparing it to the checked-in, known valid output. Any differences result in test failure and either the code or expected output has to be updated. -Golden Data tests excel at bulk diffing of failed test outputs and bulk accepting of new test -outputs. +Golden Data tests excel at bulk diffing of failed test outputs and bulk accepting of new test +outputs. # When to use Golden Data tests? -* Code under test produces a deterministic output: That way tests can consistently succeed or fail. -* Incremental changes to code under test or test fixture result in incremental changes to the -output. -* As an alternative to ASSERT for large output comparison: Serves the same purpose, but provides - tools for diffing/updating. -* The outputs can't be objectively verified (e.g. by verifying well known properties). Examples: - * Verifying if sorting works, can be done by verifying that output is sorted. SHOULD NOT use - Golden Data tests. - * Verifying that pretty printing works, MAY use Golden Data tests to verify the output, as there - might not be well known properties or those properties can easily change. -* As stability/versioning/regression testing. Golden Data tests by storing recorded outputs, are -good candidate for preserving behavior of legacy versions or detecting undesired changes in -behavior, even in cases when new behavior meets other correctness criteria. + +- Code under test produces a deterministic output: That way tests can consistently succeed or fail. +- Incremental changes to code under test or test fixture result in incremental changes to the + output. +- As an alternative to ASSERT for large output comparison: Serves the same purpose, but provides + tools for diffing/updating. +- The outputs can't be objectively verified (e.g. by verifying well known properties). Examples: + - Verifying if sorting works, can be done by verifying that output is sorted. SHOULD NOT use + Golden Data tests. 
+ - Verifying that pretty printing works, MAY use Golden Data tests to verify the output, as there + might not be well known properties or those properties can easily change. +- As stability/versioning/regression testing. Golden Data tests by storing recorded outputs, are + good candidate for preserving behavior of legacy versions or detecting undesired changes in + behavior, even in cases when new behavior meets other correctness criteria. # Best practices for working with Golden Data tests -* Tests MUST produce text output that is diffable can be inspected in the pull request. -* Tests MUST produce an output that is deterministic and repeatable. Including running on different -platforms. Same as with ASSERT_EQ. -* Tests SHOULD produce an output that changes incrementally in response to the incremental test or -code changes. +- Tests MUST produce text output that is diffable can be inspected in the pull request. -* Multiple test variations MAY be bundled into a single test. Recommended when testing same feature -with different inputs. This helps reviewing the outputs by grouping similar tests together, and also -reduces the number of output files. +- Tests MUST produce an output that is deterministic and repeatable. Including running on different + platforms. Same as with ASSERT_EQ. +- Tests SHOULD produce an output that changes incrementally in response to the incremental test or + code changes. -* Changes to test fixture or test code that affect non-trivial amount test outputs MUST BE done in -separate pull request from production code changes: - * Pull request for test code only changes can be easily reviewed, even if large number of test - outputs are modified. While such changes can still introduce merge conflicts, they don't introduce - risk of regression (if outputs were valid - * Pull requests with mixed production +- Multiple test variations MAY be bundled into a single test. Recommended when testing same feature + with different inputs. This helps reviewing the outputs by grouping similar tests together, and also + reduces the number of output files. -* Tests in the same suite SHOULD share the fixtures when appropriate. This reduces cost of adding -new tests to the suite. Changes to the fixture may only affect expected outputs from that fixtures, -and those output can be updated in bulk. +- Changes to test fixture or test code that affect non-trivial amount test outputs MUST BE done in + separate pull request from production code changes: -* Tests in different suites SHOULD NOT reuse/share fixtures. Changes to the fixture can affect large - number of expected outputs. - There are exceptions to that rule, and tests in different suites MAY reuse/share fixtures if: - * Test fixture is considered stable and changes rarely. - * Tests suites are related, either by sharing tests, or testing similar components. - * Setup/teardown costs are excessive, and sharing the same instance of a fixture for performance - reasons can't be avoided. + - Pull request for test code only changes can be easily reviewed, even if large number of test + outputs are modified. While such changes can still introduce merge conflicts, they don't introduce + risk of regression (if outputs were valid + - Pull requests with mixed production -* Tests SHOULD print both inputs and outputs of the tested code. This makes it easy for reviewers to -verify of the expected outputs are indeed correct by having both input and output next to each -other. 
-Otherwise finding the input used to produce the new output may not be practical, and might not even -be included in the diff. +- Tests in the same suite SHOULD share the fixtures when appropriate. This reduces cost of adding + new tests to the suite. Changes to the fixture may only affect expected outputs from that fixtures, + and those output can be updated in bulk. -* When resolving merge conflicts on the expected output files, one of the approaches below SHOULD be -used: - * "Accept theirs", rerun the tests and verify new outputs. This doesn't require knowledge of - production/test code changes in "theirs" branch, but requires re-review and re-acceptance of c - hanges done by local branch. - * "Accept yours", rerun the tests and verify the new outputs. This approach requires knowledge of - production/test code changes in "theirs" branch. However, if such changes resulted in - straightforward and repetitive output changes, like due to printing code change or fixture change, - it may be easier to verify than reinspecting local changes. +- Tests in different suites SHOULD NOT reuse/share fixtures. Changes to the fixture can affect large + number of expected outputs. + There are exceptions to that rule, and tests in different suites MAY reuse/share fixtures if: -* Expected test outputs SHOULD be reused across tightly-coupled test suites. The suites are -tightly-coupled if: - * Share the same tests, inputs and fixtures. - * Test similar scenarios. - * Test different code paths, but changes to one of the code path is expected to be accompanied by - changes to the other code paths as well. - - Tests SHOULD use different test files, for legitimate and expected output differences between - those suites. + - Test fixture is considered stable and changes rarely. + - Tests suites are related, either by sharing tests, or testing similar components. + - Setup/teardown costs are excessive, and sharing the same instance of a fixture for performance + reasons can't be avoided. - Examples: - * Functional tests, integration tests and unit tests that test the same behavior in different - environments. - * Versioned tests, where expected behavior is the same for majority of test inputs/scenarios. +- Tests SHOULD print both inputs and outputs of the tested code. This makes it easy for reviewers to + verify of the expected outputs are indeed correct by having both input and output next to each + other. + Otherwise finding the input used to produce the new output may not be practical, and might not even + be included in the diff. -* AVOID manually modifying expected output files. Those files are considered to be auto generated. -Instead, run the tests and then copy the generated output as a new expected output file. See "How to - diff and accept new test outputs" section for instructions. +- When resolving merge conflicts on the expected output files, one of the approaches below SHOULD be + used: + - "Accept theirs", rerun the tests and verify new outputs. This doesn't require knowledge of + production/test code changes in "theirs" branch, but requires re-review and re-acceptance of c + hanges done by local branch. + - "Accept yours", rerun the tests and verify the new outputs. This approach requires knowledge of + production/test code changes in "theirs" branch. However, if such changes resulted in + straightforward and repetitive output changes, like due to printing code change or fixture change, + it may be easier to verify than reinspecting local changes. 
+ +- Expected test outputs SHOULD be reused across tightly-coupled test suites. The suites are + tightly-coupled if: + + - Share the same tests, inputs and fixtures. + - Test similar scenarios. + - Test different code paths, but changes to one of the code path is expected to be accompanied by + changes to the other code paths as well. + + Tests SHOULD use different test files, for legitimate and expected output differences between + those suites. + + Examples: + + - Functional tests, integration tests and unit tests that test the same behavior in different + environments. + - Versioned tests, where expected behavior is the same for majority of test inputs/scenarios. + +- AVOID manually modifying expected output files. Those files are considered to be auto generated. + Instead, run the tests and then copy the generated output as a new expected output file. See "How to + diff and accept new test outputs" section for instructions. # How to use write Golden Data tests? -Each golden data test should produce a text output that will be later verified. The output format -must be text, but otherwise test author can choose a most appropriate output format (text, json, -bson, yaml or mixed). If a test consists of multiple variations each variation should be clearly -separated from each other. -Note: Test output is usually only written. It is ok to focus on just writing serialization/printing -code without a need to provide deserialization/parsing code. +Each golden data test should produce a text output that will be later verified. The output format +must be text, but otherwise test author can choose a most appropriate output format (text, json, +bson, yaml or mixed). If a test consists of multiple variations each variation should be clearly +separated from each other. + +Note: Test output is usually only written. It is ok to focus on just writing serialization/printing +code without a need to provide deserialization/parsing code. When actual test output is different from expected output, test framework will fail the test, log - both outputs and also create following files, that can be inspected later: -* /actual/ - with actual test output -* /expected/ - with expected test output +both outputs and also create following files, that can be inspected later: + +- /actual/ - with actual test output +- /expected/ - with expected test output ## CPP tests -`::mongo::unittest::GoldenTestConfig` - Provides a way to configure test suite(s). Defines where the - expected output files are located in the source repo. -`::mongo::unittest::GoldenTestContext` - Provides an output stream where tests should write their +`::mongo::unittest::GoldenTestConfig` - Provides a way to configure test suite(s). Defines where the +expected output files are located in the source repo. + +`::mongo::unittest::GoldenTestContext` - Provides an output stream where tests should write their outputs. Verifies the output with the expected output that is in the source repo See: [golden_test.h](../src/mongo/unittest/golden_test.h) **Example:** + ```c++ #include "mongo/unittest/golden_test.h" @@ -145,24 +156,26 @@ TEST_F(MySuiteFixture, MyFeatureBTest) { } ``` -Also see self-test: +Also see self-test: [golden_test_test.cpp](../src/mongo/unittest/golden_test_test.cpp) # How to diff and accept new test outputs on a workstation Use buildscripts/golden_test.py command line tool to manage the test outputs. This includes: -* diffing all output differences of all tests in a given test run output. 
-* accepting all output differences of all tests in a given test run output. + +- diffing all output differences of all tests in a given test run output. +- accepting all output differences of all tests in a given test run output. ## Setup + buildscripts/golden_test.py requires a one-time workstation setup. Note: this setup is only required to use buildscripts/golden_test.py itself. It is NOT required to just run the Golden Data tests when not using buildscripts/golden_test.py. 1. Create a yaml config file, as described by [Appendix - Config file reference](#appendix---config-file-reference). -2. Set GOLDEN_TEST_CONFIG_PATH environment variable to config file location, so that is available -when running tests and when running buildscripts/golden_test.py tool. +2. Set GOLDEN_TEST_CONFIG_PATH environment variable to config file location, so that is available + when running tests and when running buildscripts/golden_test.py tool. ### Automatic Setup @@ -171,6 +184,7 @@ Use buildscripts/golden_test.py builtin setup to initialize default config for y **Instructions for Linux** Run buildscripts/golden_test.py setup utility + ```bash buildscripts/golden_test.py setup ``` @@ -179,6 +193,7 @@ buildscripts/golden_test.py setup Run buildscripts/golden_test.py setup utility. You may be asked for a password, when not running in "Run as administrator" shell. + ```cmd c:\python\python310\python.exe buildscripts/golden_test.py setup ``` @@ -188,36 +203,43 @@ c:\python\python310\python.exe buildscripts/golden_test.py setup This is the same config as that would be setup by the [Automatic Setup](#automatic-setup) This config uses a unique subfolder folder for each test run. (default) - * Allows diffing each test run separately. - * Works with multiple source repos. + +- Allows diffing each test run separately. +- Works with multiple source repos. **Instructions for Linux/macOS:** This config uses a unique subfolder folder for each test run. (default) - * Allows diffing each test run separately. - * Works with multiple source repos. + +- Allows diffing each test run separately. +- Works with multiple source repos. Create ~/.golden_test_config.yml with following contents: + ```yaml outputRootPattern: /var/tmp/test_output/out-%%%%-%%%%-%%%%-%%%% diffCmd: git diff --no-index "{{expected}}" "{{actual}}" ``` Update .bashrc, .zshrc + ```bash export GOLDEN_TEST_CONFIG_PATH=~/.golden_test_config.yml ``` + alternatively modify /etc/environment or other configuration if needed by Debugger/IDE etc. 
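If you want to sanity-check this manual setup (an optional step, assuming a bash-like shell), confirm the variable is exported and points at the file you just created:

```bash
# Optional verification; not part of the documented setup steps.
echo "$GOLDEN_TEST_CONFIG_PATH"   # should print the path to ~/.golden_test_config.yml
cat "$GOLDEN_TEST_CONFIG_PATH"    # should show the outputRootPattern and diffCmd entries above
```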
**Instructions for Windows:** Create %LocalAppData%\.golden_test_config.yml with the following contents: + ```yaml outputRootPattern: 'C:\Users\Administrator\AppData\Local\Temp\test_output\out-%%%%-%%%%-%%%%-%%%%' diffCmd: 'git diff --no-index "{{expected}}" "{{actual}}"' ``` Add GOLDEN_TEST_CONFIG_PATH=~/.golden_test_config.yml environment variable: + ```cmd runas /profile /user:administrator "setx GOLDEN_TEST_CONFIG_PATH %LocalAppData%\.golden_test_config.yml" ``` @@ -225,6 +247,7 @@ runas /profile /user:administrator "setx GOLDEN_TEST_CONFIG_PATH %LocalAppData%\ ## Usage ### List all available test outputs + ```bash $> buildscripts/golden_test.py list ``` @@ -234,28 +257,34 @@ $> buildscripts/golden_test.py list ```bash $> buildscripts/golden_test.py diff ``` + This will run the diffCmd that was specified in the config file ### Diff test results from most recent test run: + ```bash $> buildscripts/golden_test.py accept ``` -This will copy all actual test outputs from that test run to the source repo and new expected + +This will copy all actual test outputs from that test run to the source repo and new expected outputs. - ### Get paths from most recent test run (to be used by custom tools) + Get expected and actual output paths for most recent test run: + ```bash $> buildscripts/golden_test.py get ``` Get expected and actual output paths for most most recent test run: + ```bash $> buildscripts/golden_test.py get_root ``` Get all available commands and options: + ```bash $> buildscripts/golden_test.py --help ``` @@ -263,8 +292,9 @@ $> buildscripts/golden_test.py --help # How to diff test results from a non-workstation test run ## Bulk folder diff the results: -1. Parse the test log to find the root output locations where expected and actual output files were -written. + +1. Parse the test log to find the root output locations where expected and actual output files were + written. 2. Then compare the folders to see the differences for tests that failed. **Example: (linux/macOS)** @@ -277,9 +307,11 @@ $> diff -ruN --unidirectional-new-file --color=always cat test.log | grep "^{" | jq -s '.[] | select(.id == 6273501 ) | .attr.testPath,.attr.expectedOutput,.attr.actualOutput' @@ -288,31 +320,28 @@ $> cat test.log | grep "^{" | jq -s '.[] | select(.id == 6273501 ) | .attr.testP # Appendix - Config file reference Golden Data test config file is a YAML file specified as: + ```yaml -outputRootPattern: +outputRootPattern: type: String optional: true description: - Root path patten that will be used to write expected and actual test outputs for all tests - in the test run. - If not specified a temporary folder location will be used. - Path pattern string may use '%' characters in the last part of the path. '%' characters in - the last part of the path will be replaced with random lowercase hexadecimal digits. - examples: - /var/tmp/test_output/out-%%%%-%%%%-%%%%-%%%% - /var/tmp/test_output + Root path patten that will be used to write expected and actual test outputs for all tests + in the test run. + If not specified a temporary folder location will be used. + Path pattern string may use '%' characters in the last part of the path. '%' characters in + the last part of the path will be replaced with random lowercase hexadecimal digits. + examples: /var/tmp/test_output/out-%%%%-%%%%-%%%%-%%%% + /var/tmp/test_output -diffCmd: +diffCmd: type: String optional: true - description: - Shell command to diff a single golden test run output. 
+ description: Shell command to diff a single golden test run output. {{expected}} and {{actual}} variables should be used and will be replaced with expected and actual output folder paths respectively. This property is not used to decide whether the test passes or fails; it is only used to display differences once we've decided that a test failed. - examples: - git diff --no-index "{{expected}}" "{{actual}}" - diff -ruN --unidirectional-new-file --color=always "{{expected}}" "{{actual}}" + examples: git diff --no-index "{{expected}}" "{{actual}}" + diff -ruN --unidirectional-new-file --color=always "{{expected}}" "{{actual}}" ``` - diff --git a/docs/libfuzzer.md b/docs/libfuzzer.md index 3ab836e641d..1839639f26d 100644 --- a/docs/libfuzzer.md +++ b/docs/libfuzzer.md @@ -21,7 +21,7 @@ LibFuzzer implements `int main`, and expects to be linked with an object file which provides the function under test. You will achieve this by writing a cpp file which implements -``` cpp +```cpp extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) { // Your code here } @@ -39,16 +39,16 @@ lot of freedom in exactly what you choose to do. Just make sure your function crashes or produces an invariant when something interesting happens! As just a few ideas: -- You might choose to call multiple implementations of a single - operation, and validate that they produce the same output when - presented the same input. -- You could tease out individual bytes from `Data` and provide them as - different arguments to the function under test. +- You might choose to call multiple implementations of a single + operation, and validate that they produce the same output when + presented the same input. +- You could tease out individual bytes from `Data` and provide them as + different arguments to the function under test. Finally, your cpp file will need a SCons target. There is a method which defines fuzzer targets, much like how we define unittests. For example: -``` python +```python env.CppLibfuzzerTest( target='op_msg_fuzzer', source=[ @@ -83,5 +83,5 @@ acquire and re-use a corpus from an earlier commit, if it can. # References -- [LibFuzzer's official - documentation](https://llvm.org/docs/LibFuzzer.html) +- [LibFuzzer's official + documentation](https://llvm.org/docs/LibFuzzer.html) diff --git a/docs/linting.md b/docs/linting.md index 6149808671d..815482d9b10 100644 --- a/docs/linting.md +++ b/docs/linting.md @@ -3,95 +3,104 @@ ## C++ Linters ### `clang-format` -The `buildscripts/clang_format.py` wrapper script runs the `clang-format` linter. You can see the + +The `buildscripts/clang_format.py` wrapper script runs the `clang-format` linter. You can see the usage message for the wrapper by running `buildscripts/clang_format.py --help`. Ex: `buildscripts/clang_format.py lint` -| Linter | Configuration File(s) | Help Command | Documentation | -| --- | --- | --- | --- | -| `clang-format` | `.clang-format` | `clang-format --help` | [https://clang.llvm.org/docs/ClangFormat.html](https://clang.llvm.org/docs/ClangFormat.html) | +| Linter | Configuration File(s) | Help Command | Documentation | +| -------------- | --------------------- | --------------------- | -------------------------------------------------------------------------------------------- | +| `clang-format` | `.clang-format` | `clang-format --help` | [https://clang.llvm.org/docs/ClangFormat.html](https://clang.llvm.org/docs/ClangFormat.html) | ### `clang-tidy` -The `evergreen/run_clang_tidy.sh` shell script runs the `clang-tidy` linter. 
In order to run + +The `evergreen/run_clang_tidy.sh` shell script runs the `clang-tidy` linter. In order to run `clang-tidy` you must have a compilation database (`compile_commands.json` file). Ex: `bash buildscripts/run_clang_tidy.sh` -| Linter | Configuration File(s) | Help Command | Documentation | -| --- | --- | --- | --- | -| `clang-tidy` | `.clang-tidy` | `clang-tidy --help` | [https://clang.llvm.org/extra/clang-tidy/index.html](https://clang.llvm.org/extra/clang-tidy/index.html) | +| Linter | Configuration File(s) | Help Command | Documentation | +| ------------ | --------------------- | ------------------- | -------------------------------------------------------------------------------------------------------- | +| `clang-tidy` | `.clang-tidy` | `clang-tidy --help` | [https://clang.llvm.org/extra/clang-tidy/index.html](https://clang.llvm.org/extra/clang-tidy/index.html) | ### `errorcodes.py` -The `buildscripts/errorcodes.py` script runs a custom error code linter, which verifies that all -assertion codes are distinct. You can see the usage by running the following command: + +The `buildscripts/errorcodes.py` script runs a custom error code linter, which verifies that all +assertion codes are distinct. You can see the usage by running the following command: `buildscripts/errorcodes.py --help`. Ex: `buildscripts/errorcodes.py` ### `quickmongolint.py` -The `buildscripts/quickmongolint.py` script runs a simple MongoDB C++ linter. You can see the usage -by running the following command: `buildscripts/quickmongolint.py --help`. You can take a look at + +The `buildscripts/quickmongolint.py` script runs a simple MongoDB C++ linter. You can see the usage +by running the following command: `buildscripts/quickmongolint.py --help`. You can take a look at `buildscripts/linter/mongolint.py` to better understand the rules for this linter. Ex: `buildscripts/quickmongolint.py lint` ## Javascript Linters -The `buildscripts/eslint.py` wrapper script runs the `eslint` javascript linter. You can see the + +The `buildscripts/eslint.py` wrapper script runs the `eslint` javascript linter. You can see the usage message for the wrapper by running `buildscripts/eslint.py --help`. Ex: `buildscripts/eslint.py lint` -| Linter | Configuration File(s) | Help Command | Documentation | -| --- | --- | --- | --- | +| Linter | Configuration File(s) | Help Command | Documentation | +| -------- | ------------------------------- | --------------- | ------------------------------------------ | | `eslint` | `.eslintrc.yml` `.eslintignore` | `eslint --help` | [https://eslint.org/](https://eslint.org/) | ## Yaml Linters -The `buildscripts/yamllinters.sh` shell script runs the yaml linters. The supported yaml linters -are: `yamllint` & `evergreen-lint`. `evergreen-lint` is a custom MongoDB linter used specifically + +The `buildscripts/yamllinters.sh` shell script runs the yaml linters. The supported yaml linters +are: `yamllint` & `evergreen-lint`. `evergreen-lint` is a custom MongoDB linter used specifically for `evergreen` yaml files. 
Ex: `bash buildscripts/yamllinters.sh` -| Linter | Configuration File(s) | Help Command | Documentation | -| --- | --- | --- | --- | -| `yamllint` | `etc/yamllint_config.yml` | `yamllint --help` | [https://readthedocs.org/projects/yamllint/](https://readthedocs.org/projects/yamllint/) | -| `evergreen-lint` | `etc/evergreen_lint.yml` | `python -m evergreen_lint --help` | [https://github.com/evergreen-ci/config-linter](https://github.com/evergreen-ci/config-linter) | +| Linter | Configuration File(s) | Help Command | Documentation | +| ---------------- | ------------------------- | --------------------------------- | ---------------------------------------------------------------------------------------------- | +| `yamllint` | `etc/yamllint_config.yml` | `yamllint --help` | [https://readthedocs.org/projects/yamllint/](https://readthedocs.org/projects/yamllint/) | +| `evergreen-lint` | `etc/evergreen_lint.yml` | `python -m evergreen_lint --help` | [https://github.com/evergreen-ci/config-linter](https://github.com/evergreen-ci/config-linter) | ## Python Linters -The `buildscripts/pylinters.py` wrapper script runs the Python linters. You can -see the usage message for the wrapper by running the following command: -`buildscripts/pylinters.py --help`. The following linters are supported: `pylint`, `mypy`, + +The `buildscripts/pylinters.py` wrapper script runs the Python linters. You can +see the usage message for the wrapper by running the following command: +`buildscripts/pylinters.py --help`. The following linters are supported: `pylint`, `mypy`, `pydocstyle` & `yapf`. Ex: `buildscripts/pylinters.py lint` -| Linter | Configuration File(s) | Help Command | Documentation | -| --- | --- | --- | --- | -| `pylint` | `.pylintrc` | `pylint --help` | [https://www.pylint.org/](https://www.pylint.org/) | -| `mypy` | `.mypy.ini` | `mypy --help` | [https://readthedocs.org/projects/mypy/](https://readthedocs.org/projects/mypy/) | -| `pydocstyle` | `.pydocstyle` | `pydocstyle --help` | [https://readthedocs.org/projects/pydocstyle/](https://readthedocs.org/projects/pydocstyle/) | -| `yapf` | `.style.yapf` | `yapf --help` | [https://github.com/google/yapf](https://github.com/google/yapf) | +| Linter | Configuration File(s) | Help Command | Documentation | +| ------------ | --------------------- | ------------------- | -------------------------------------------------------------------------------------------- | +| `pylint` | `.pylintrc` | `pylint --help` | [https://www.pylint.org/](https://www.pylint.org/) | +| `mypy` | `.mypy.ini` | `mypy --help` | [https://readthedocs.org/projects/mypy/](https://readthedocs.org/projects/mypy/) | +| `pydocstyle` | `.pydocstyle` | `pydocstyle --help` | [https://readthedocs.org/projects/pydocstyle/](https://readthedocs.org/projects/pydocstyle/) | +| `yapf` | `.style.yapf` | `yapf --help` | [https://github.com/google/yapf](https://github.com/google/yapf) | ### SCons Linters -`buildscripts/pylinters.py` has the `lint-scons` and `fix-scons` commands to lint -and fix SCons and build system related code. Currently `yapf` is the only + +`buildscripts/pylinters.py` has the `lint-scons` and `fix-scons` commands to lint +and fix SCons and build system related code. Currently `yapf` is the only linter supported for SCons code. ## Using SCons for linting -You can use SCons to run most of the linters listed above via their corresponding Python wrapper -script. SCons also provides the ability to run multiple linters in a single command. 
At this time, + +You can use SCons to run most of the linters listed above via their corresponding Python wrapper +script. SCons also provides the ability to run multiple linters in a single command. At this time, SCons does not support `clang-tidy` or `buildscripts/yamllinters.sh` Here are some examples: -| SCons Target | Linter(s) | Example | -| --- | --- | --- | -| `lint` | `clang-format` `errorcodes.py` `quickmongolint.py` `eslint` `pylint` `mypy` `pydocstyle` `yapf` | `buildscripts/scons.py lint` | -| `lint-fast` | `clang-format` `errorcodes.py` `eslint` `pylint` `mypy` `pydocstyle` `yapf` | `buildscripts/scons.py lint-fast` | -| `lint-clang-format` | `clang-format` | `buildscripts/scons.py lint-clang-format` | -| `lint-errorcodes` | `errorcodes.py` | `buildscripts/scons.py lint-errorcodes` | -| `lint-lint.py` | `quickmongolint.py` | `buildscripts/scons.py lint-lint.py` | -| `lint-eslint` | `eslint` | `buildscripts/scons.py lint-eslint` | -| `lint-pylinters` | `pylint` `mypy` `pydocstyle` `yapf` | `buildscripts/scons.py lint-pylinters` | -| `lint-sconslinters` | `yapf` | `buildscripts/scons.py lint-sconslinters` | +| SCons Target | Linter(s) | Example | +| ------------------- | ----------------------------------------------------------------------------------------------- | ----------------------------------------- | +| `lint` | `clang-format` `errorcodes.py` `quickmongolint.py` `eslint` `pylint` `mypy` `pydocstyle` `yapf` | `buildscripts/scons.py lint` | +| `lint-fast` | `clang-format` `errorcodes.py` `eslint` `pylint` `mypy` `pydocstyle` `yapf` | `buildscripts/scons.py lint-fast` | +| `lint-clang-format` | `clang-format` | `buildscripts/scons.py lint-clang-format` | +| `lint-errorcodes` | `errorcodes.py` | `buildscripts/scons.py lint-errorcodes` | +| `lint-lint.py` | `quickmongolint.py` | `buildscripts/scons.py lint-lint.py` | +| `lint-eslint` | `eslint` | `buildscripts/scons.py lint-eslint` | +| `lint-pylinters` | `pylint` `mypy` `pydocstyle` `yapf` | `buildscripts/scons.py lint-pylinters` | +| `lint-sconslinters` | `yapf` | `buildscripts/scons.py lint-sconslinters` | diff --git a/docs/load_balancer_support.md b/docs/load_balancer_support.md index cfb9a6b65df..7718a829c43 100644 --- a/docs/load_balancer_support.md +++ b/docs/load_balancer_support.md @@ -4,15 +4,16 @@ endpoints behind load balancers requires proper configuration of the load balancers, `mongos`, and any drivers or shells used to connect to the database. Three conditions must be fulfilled for `mongos` to be used behind a load balancer: -* `mongos` must be configured with the [MongoDB Server Parameter](https://docs.mongodb.com/manual/reference/parameters/) `loadBalancerPort` whose value can be specified at program start in any of the ways mentioned in the server parameter documentation. -This option causes `mongos` to open a second port that expects _only_ load balanced connections. All connections made from load -balancers _must_ be made over this port, and no regular connections may be made over this port. -* The L4 load balancer _must_ be configured to emit a [proxy protocol][proxy-protocol-url] header -at the [start of its connection stream](https://github.com/mongodb/mongo/commit/3a18d295d22b377cc7bc4c97bd3b6884d065bb85). `mongos` [supports](https://github.com/mongodb/mongo/commit/786482da93c3e5e58b1c690cb060f00c60864f69) both version 1 and version 2 of the proxy -protocol standard. 
-* The connection string used to establish the `mongos` connection must set the `loadBalanced` option,
-e.g., when connecting to a local `mongos` instance, if the `loadBalancerPort` server parameter was set to 20100, the
-connection string must be of the form `"mongodb://localhost:20100/?loadBalanced=true"`.
+
+- `mongos` must be configured with the [MongoDB Server Parameter](https://docs.mongodb.com/manual/reference/parameters/) `loadBalancerPort` whose value can be specified at program start in any of the ways mentioned in the server parameter documentation.
+  This option causes `mongos` to open a second port that expects _only_ load balanced connections. All connections made from load
+  balancers _must_ be made over this port, and no regular connections may be made over this port.
+- The L4 load balancer _must_ be configured to emit a [proxy protocol][proxy-protocol-url] header
+  at the [start of its connection stream](https://github.com/mongodb/mongo/commit/3a18d295d22b377cc7bc4c97bd3b6884d065bb85). `mongos` [supports](https://github.com/mongodb/mongo/commit/786482da93c3e5e58b1c690cb060f00c60864f69) both version 1 and version 2 of the proxy
+  protocol standard.
+- The connection string used to establish the `mongos` connection must set the `loadBalanced` option,
+  e.g., when connecting to a local `mongos` instance, if the `loadBalancerPort` server parameter was set to 20100, the
+  connection string must be of the form `"mongodb://localhost:20100/?loadBalanced=true"`.

`mongos` will emit appropriate error messages on connection attempts if these requirements are not met.
diff --git a/docs/logging.md b/docs/logging.md
index b2f7d01c440..b0fae13706a 100644
--- a/docs/logging.md
+++ b/docs/logging.md
@@ -33,15 +33,15 @@ repetition possible, and shares attribute names with other related log lines.

The `msg` field predicates a reader's interpretation of the log line. It should
be crafted with care and attention.

-* Concisely describe what the log line is reporting, providing enough
-  context necessary for interpreting attribute field names and values
-* Capitalize the first letter, as in a sentence
-* Avoid unnecessary punctuation, but punctuate between sentences if using
-  multiple sentences
-* Do not conclude with punctuation
-* You may occasionally encounter `msg` strings containing fmt-style
-  `{expr}` braces. These are legacy artifacts and should be rephrased
-  according to these guidelines.
+- Concisely describe what the log line is reporting, providing enough
+  context necessary for interpreting attribute field names and values
+- Capitalize the first letter, as in a sentence
+- Avoid unnecessary punctuation, but punctuate between sentences if using
+  multiple sentences
+- Do not conclude with punctuation
+- You may occasionally encounter `msg` strings containing fmt-style
+  `{expr}` braces. These are legacy artifacts and should be rephrased
+  according to these guidelines.
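As a quick illustration (a made-up log line, not from the codebase; the `_attr` syntax is covered under Basic Usage below), a `msg` written to these guidelines pairs a short, capitalized description with structured attributes instead of embedding values in the text:

    LOGV2(9999101,
          "Failed to refresh routing table",
          "db"_attr = dbName,
          "error"_attr = status);

A legacy fmt-style message such as "Failed to refresh routing table for {dbName}: {status}" should be rephrased into this form.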
### Attributes (fields in the attr subdocument) @@ -57,10 +57,10 @@ For `attr` field names, do the following: The bar for understanding should be: -* Someone with reasonable understanding of mongod behavior should understand - immediately what is being logged -* Someone with reasonable troubleshooting skill should be able to extract doc- - or code-searchable phrases to learn about what is being logged +- Someone with reasonable understanding of mongod behavior should understand + immediately what is being logged +- Someone with reasonable troubleshooting skill should be able to extract doc- + or code-searchable phrases to learn about what is being logged #### Precisely describe values and units @@ -77,46 +77,44 @@ attribute name. Alternatively, specify an `attr` name of “durationMillis” and provide the number of milliseconds as an integer type. -__Importantly__: downstream analysis tools will rely on this convention, as a +**Importantly**: downstream analysis tools will rely on this convention, as a replacement for the "[0-9]+ms$" format of prior logs. #### Use certain specific terms whenever possible When logging the below information, do so with these specific terms: -* __namespace__ - when logging a value of the form - "\.\". Do not use "collection" or abbreviate to "ns" -* __db__ - instead of "database" -* __error__ - when an error occurs, instead of "status". Use this for objects - of type Status and DBException -* __reason__ - to provide rationale for an event/action when "error" isn't - appropriate +- **namespace** - when logging a value of the form + "\.\". Do not use "collection" or abbreviate to "ns" +- **db** - instead of "database" +- **error** - when an error occurs, instead of "status". Use this for objects + of type Status and DBException +- **reason** - to provide rationale for an event/action when "error" isn't + appropriate ### Examples -- Example 1: +- Example 1: - LOGV2(1041, "Transition to PRIMARY complete"); + LOGV2(1041, "Transition to PRIMARY complete"); - JSON Output: + JSON Output: - { ... , "id": 1041, "msg": "Transition to PRIMARY complete", "attr": {} } + { ... , "id": 1041, "msg": "Transition to PRIMARY complete", "attr": {} } -- Example 2: +- Example 2: - LOGV2(1042, "Slow query", "duration"_attr = getDurationMillis()); + LOGV2(1042, "Slow query", "duration"_attr = getDurationMillis()); - JSON Output: + JSON Output: - { ..., "id": 1042, "msg": "Slow query", "attr": { "durationMillis": 1000 } } + { ..., "id": 1042, "msg": "Slow query", "attr": { "durationMillis": 1000 } } +- For adding STL containers as dynamic attributes, see + [RollbackImpl::\_summarizeRollback][_summarizeRollback] -- For adding STL containers as dynamic attributes, see - [RollbackImpl::_summarizeRollback][_summarizeRollback] - -- For sharing a string between a log line and a status see [this section of - InitialSyncer::_lastOplogEntryFetcherCallbackForStopTimestamp][ - _lastOplogEntryFetcherCallbackForStopTimestamp] +- For sharing a string between a log line and a status see [this section of + InitialSyncer::\_lastOplogEntryFetcherCallbackForStopTimestamp][ _lastOplogEntryFetcherCallbackForStopTimestamp] # Basic Usage @@ -126,7 +124,7 @@ The log system is made available with the following header: The macro `MONGO_LOGV2_DEFAULT_COMPONENT` is expanded by all logging macros. This configuration macro must expand at their point of use to a `LogComponent` -expression, which is implicitly attached to the emitted message. It is +expression, which is implicitly attached to the emitted message. 
It is conventionally defined near the top of a `.cpp` file after headers are included, and before any logging macros are invoked. Example: @@ -175,16 +173,16 @@ can use this pattern: ##### Examples -- No attributes. +- No attributes. - LOGV2(1000, "Logging event, no replacement fields is OK"); + LOGV2(1000, "Logging event, no replacement fields is OK"); -- Some attributes. +- Some attributes. - LOGV2(1002, - "Replication state change", - "from"_attr = getOldState(), - "to"_attr = getNewState()); + LOGV2(1002, + "Replication state change", + "from"_attr = getOldState(), + "to"_attr = getNewState()); ### Log Component @@ -206,17 +204,17 @@ log statement. to different severities there are separate logging macros to be used, they all take paramaters like `LOGV2`: -* `LOGV2_WARNING` -* `LOGV2_ERROR` -* `LOGV2_FATAL` -* `LOGV2_FATAL_NOTRACE` -* `LOGV2_FATAL_CONTINUE` +- `LOGV2_WARNING` +- `LOGV2_ERROR` +- `LOGV2_FATAL` +- `LOGV2_FATAL_NOTRACE` +- `LOGV2_FATAL_CONTINUE` There is also variations that take `LogOptions` if needed: -* `LOGV2_WARNING_OPTIONS` -* `LOGV2_ERROR_OPTIONS` -* `LOGV2_FATAL_OPTIONS` +- `LOGV2_WARNING_OPTIONS` +- `LOGV2_ERROR_OPTIONS` +- `LOGV2_FATAL_OPTIONS` Fatal level log statements using `LOGV2_FATAL` perform `fassert` after logging, using the provided ID as assert id. `LOGV2_FATAL_NOTRACE` perform @@ -288,7 +286,7 @@ When finished, it is logged using the regular logging API but the `_attr` literals with the `DynamicAttributes` is not supported. When using the `DynamicAttributes` you need to be careful about parameter -lifetimes. The `DynamicAttributes` binds attributes *by reference* and the +lifetimes. The `DynamicAttributes` binds attributes _by reference_ and the reference must be valid when passing the `DynamicAttributes` to the log statement. @@ -309,26 +307,26 @@ statement. Many basic types have built in support: -* Boolean -* Integral types - * Single `char` is logged as integer -* Enums - * Logged as their underlying integral type -* Floating point types - * `long double` is prohibited -* String types - * `std::string` - * `StringData` - * `const char*` -* Duration types - * Special formatting, see below -* BSON types - * `BSONObj` - * `BSONArray` - * `BSONElement` -* BSON appendable types - * `BSONObjBuilder::append` overload available -* `boost::optional` of any loggable type `T` +- Boolean +- Integral types + - Single `char` is logged as integer +- Enums + - Logged as their underlying integral type +- Floating point types + - `long double` is prohibited +- String types + - `std::string` + - `StringData` + - `const char*` +- Duration types + - Special formatting, see below +- BSON types + - `BSONObj` + - `BSONArray` + - `BSONElement` +- BSON appendable types + - `BSONObjBuilder::append` overload available +- `boost::optional` of any loggable type `T` ### User-defined types @@ -338,16 +336,16 @@ that the log system can bind to. 
The system binds and uses serialization functions by looking for functions in the following priority order: -- Structured serialization functions - - `void x.serialize(BSONObjBuilder*) const` (member) - - `BSONObj x.toBSON() const` (member) - - `BSONArray x.toBSONArray() const` (member) - - `toBSON(x)` (non-member) -- Stringification functions - - `toStringForLogging(x)` (non-member) - - `x.serialize(&fmtMemoryBuffer)` (member) - - `x.toString() ` (member) - - `toString(x)` (non-member) +- Structured serialization functions + - `void x.serialize(BSONObjBuilder*) const` (member) + - `BSONObj x.toBSON() const` (member) + - `BSONArray x.toBSONArray() const` (member) + - `toBSON(x)` (non-member) +- Stringification functions + - `toStringForLogging(x)` (non-member) + - `x.serialize(&fmtMemoryBuffer)` (member) + - `x.toString() ` (member) + - `toString(x)` (non-member) Enums cannot have member functions, but they will still try to bind to the `toStringForLogging(e)` or `toString(e)` non-members. If neither is available, @@ -363,7 +361,7 @@ logging perhaps because it's needed for other non-logging formatting. Usually a `toString` (member or nonmember) is a sufficient customization point and should be preferred as a canonical stringification of the object. -*NOTE: No `operator<<` overload is used even if available* +_NOTE: No `operator<<` overload is used even if available_ ##### Example @@ -400,8 +398,8 @@ is a JSON object where the field names are the key. Ranges is loggable via helpers to indicate what type of range it is -* `seqLog(begin, end)` -* `mapLog(begin, end)` +- `seqLog(begin, end)` +- `mapLog(begin, end)` seqLog indicates that it is a sequential range where the iterators point to loggable value directly. @@ -411,28 +409,28 @@ the iterators point to a key-value pair. ##### Examples -- Logging a sequence: +- Logging a sequence: - std::array arrayOfInts = ...; - LOGV2(1010, - "Log container directly", - "values"_attr = arrayOfInts); - LOGV2(1011, - "Log iterator range", - "values"_attr = seqLog(arrayOfInts.begin(), arrayOfInts.end()); - LOGV2(1012, - "Log first five elements", - "values"_attr = seqLog(arrayOfInts.data(), arrayOfInts.data() + 5); + std::array arrayOfInts = ...; + LOGV2(1010, + "Log container directly", + "values"_attr = arrayOfInts); + LOGV2(1011, + "Log iterator range", + "values"_attr = seqLog(arrayOfInts.begin(), arrayOfInts.end()); + LOGV2(1012, + "Log first five elements", + "values"_attr = seqLog(arrayOfInts.data(), arrayOfInts.data() + 5); -- Logging a map-like container: +- Logging a map-like container: - StringMap bsonMap = ...; - LOGV2(1013, - "Log map directly", - "values"_attr = bsonMap); - LOGV2(1014, - "Log map iterator range", - "values"_attr = mapLog(bsonMap.begin(), bsonMap.end()); + StringMap bsonMap = ...; + LOGV2(1013, + "Log map directly", + "values"_attr = bsonMap); + LOGV2(1014, + "Log map iterator range", + "values"_attr = mapLog(bsonMap.begin(), bsonMap.end()); #### Containers and `uint64_t` @@ -457,7 +455,6 @@ type. auto asDecimal128 = [](uint64_t i) { return Decimal128(i); }; auto asString = [](uint64_t i) { return std::to_string(i); }; - ### Duration types Duration types have special formatting to match existing practices in the @@ -474,27 +471,26 @@ formatted as a BSON object. 
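Putting those helper lambdas to use, here is a hedged sketch (the log ID, the `getCounters()` helper, and the explicit boost include are assumptions for illustration) of logging a container of `uint64_t` through a transformed view:

    #include <boost/iterator/transform_iterator.hpp>

    std::vector<uint64_t> counters = getCounters();
    LOGV2(9999102,
          "Logging uint64_t values as Decimal128",
          "counters"_attr = seqLog(boost::make_transform_iterator(counters.begin(), asDecimal128),
                                   boost::make_transform_iterator(counters.end(), asDecimal128)));

Each element is converted to a loggable `Decimal128` on the fly, so no transformed copy of the container needs to be built up front.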
##### Examples -- "duration" attribute +- "duration" attribute C++ expression: - "duration"_attr = Milliseconds(10) + "duration"_attr = Milliseconds(10) JSON format: - "durationMillis": 10 + "durationMillis": 10 -- Container of Duration objects +- Container of Duration objects C++ expression: - "samples"_attr = std::vector{Nanoseconds(200), - Nanoseconds(400)} + "samples"_attr = std::vector{Nanoseconds(200), + Nanoseconds(400)} JSON format: - "samples": [{"durationNanos": 200}, {"durationNanos": 400}] - + "samples": [{"durationNanos": 200}, {"durationNanos": 400}] # Attribute naming abstraction @@ -546,7 +542,6 @@ functions implemented. LOGV2(2002, "Statement", logAttrs(t)); LOGV2(2002, "Statement", "name"_attr=t.name, "data"_attr=t.data); - ## Handling temporary lifetime with multiple attributes To avoid lifetime issues (log attributes bind their values by reference) it is @@ -604,7 +599,6 @@ The assertion reason string will be a plain text formatted log (replacement fields filled in format-string). If replacement fields are not provided in the message string, attribute values will be missing from the assertion message. - ##### Examples LOGV2_ERROR_OPTIONS(1050000, @@ -628,7 +622,6 @@ Would emit a `uassert` after performing the log that is equivalent to: uasserted(ErrorCodes::DataCorruptionDetected, "Data corruption detected for RecordId(123456)"); - ## Unstructured logging for local development To make it easier to use the log system for tracing in local development, there @@ -698,8 +691,8 @@ Output: } } +--- ------ [relaxed_json_2]: https://github.com/mongodb/specifications/blob/master/source/extended-json.rst [_lastOplogEntryFetcherCallbackForStopTimestamp]: https://github.com/mongodb/mongo/blob/13caf3c499a22c2274bd533043eb7e06e6f8e8a4/src/mongo/db/repl/initial_syncer.cpp#L1500-L1512 [_summarizeRollback]: https://github.com/mongodb/mongo/blob/13caf3c499a22c2274bd533043eb7e06e6f8e8a4/src/mongo/db/repl/rollback_impl.cpp#L1263-L1305 diff --git a/docs/parsing_stack_traces.md b/docs/parsing_stack_traces.md index 84dd8b3cd34..c606c5ff5a1 100644 --- a/docs/parsing_stack_traces.md +++ b/docs/parsing_stack_traces.md @@ -8,12 +8,10 @@ addr2line -e mongod -ifC ``` - ## `c++filt` Use [`c++filt`][2] to demangle function names by pasting the whole stack trace to stdin. - ## Finding the Right Binary To find the correct binary for a specific log you need to: diff --git a/docs/poetry_execution.md b/docs/poetry_execution.md index e4070720934..40d7520e78b 100644 --- a/docs/poetry_execution.md +++ b/docs/poetry_execution.md @@ -1,15 +1,18 @@ # Poetry Project Execution ## Project Impetus -We frequently encounter Python errors that are caused by a python dependency author updating their package that is backward breaking. The following tickets are a few examples of this happening: + +We frequently encounter Python errors that are caused by a python dependency author updating their package that is backward breaking. 
The following tickets are a few examples of this happening: [SERVER-79126](https://jira.mongodb.org/browse/SERVER-79126), [SERVER-79798](https://jira.mongodb.org/browse/SERVER-79798), [SERVER-53348](https://jira.mongodb.org/browse/SERVER-53348), [SERVER-57036](https://jira.mongodb.org/browse/SERVER-57036), [SERVER-44579](https://jira.mongodb.org/browse/SERVER-44579), [SERVER-70845](https://jira.mongodb.org/browse/SERVER-70845), [SERVER-63974](https://jira.mongodb.org/browse/SERVER-63974), [SERVER-61791](https://jira.mongodb.org/browse/SERVER-61791), and [SERVER-60950](https://jira.mongodb.org/browse/SERVER-60950). We have always known this was a problem and have known there was a way to fix it. We finally had the bandwidth to tackle this problem. ## Project Prework + First, we wanted to test out using poetry so we converted mongo-container project to use poetry [SERVER-76974](https://jira.mongodb.org/browse/SERVER-76974). This showed promise and we considered this a green light to move forward on converting the server python to use poetry. Before we could start the project we had to upgrade python to a version that was not EoL. This work is captured in [SERVER-72262](https://jira.mongodb.org/browse/SERVER-72262). We upgraded python to 3.10 on every system except windows. Windows could not be upgraded due to a test problem relating to some cipher suites [SERVER-79172](https://jira.mongodb.org/browse/SERVER-79172). ## Conversion to Poetry + After the prework was done we wrote, tested, and merged [SERVER-76751](https://jira.mongodb.org/browse/SERVER-76751) which is converting the mongo python dependencies to poetry. This ticket had an absurd amount of dependencies and required a significant amount of patch builds. The total number of changes was pretty small but it affected a lot of different projects. Knowing there was a lot this touched we expected to see some bugs and were quick to try to fix them. Some of these were caught before merge and some were caught after. diff --git a/docs/primary_only_service.md b/docs/primary_only_service.md index 269af5671c0..4140e8e2bad 100644 --- a/docs/primary_only_service.md +++ b/docs/primary_only_service.md @@ -13,25 +13,24 @@ There are three main classes/interfaces that make up the PrimaryOnlyService mach ### PrimaryOnlyServiceRegistry The PrimaryOnlyServiceRegistry is a singleton that is installed as a decoration on the -ServiceContext at startup and lives for the lifetime of the mongod process. During mongod global +ServiceContext at startup and lives for the lifetime of the mongod process. During mongod global startup, all PrimaryOnlyServices must be registered against the PrimaryOnlyServiceRegistry before the ReplicationCoordinator is started up (as it is the ReplicationCoordinator startup that starts up the registered PrimaryOnlyServices). Specific PrimaryOnlyServices can be looked up from the registry at runtime, and are handed out by raw pointer, which is safe since the set of registered -PrimaryOnlyServices does not change during runtime. The PrimaryOnlyServiceRegistry is itself a +PrimaryOnlyServices does not change during runtime. The PrimaryOnlyServiceRegistry is itself a [ReplicaSetAwareService](../src/mongo/db/repl/README.md#ReplicaSetAwareService-interface), which is how it receives notifications about changes in and out of Primary state. ### PrimaryOnlyService -The PrimaryOnlyService interface is used to define a new Primary Only Service. A PrimaryOnlyService +The PrimaryOnlyService interface is used to define a new Primary Only Service. 
A PrimaryOnlyService is a grouping of tasks (Instances) that run only when the node is Primary and are resumed after -failover. Each PrimaryOnlyService must declare a unique, replicated collection (most likely in the +failover. Each PrimaryOnlyService must declare a unique, replicated collection (most likely in the admin or config databases), where the state documents for all Instances of the service will be -persisted. At stepUp, each PrimaryOnlyService will create and launch Instance objects for each +persisted. At stepUp, each PrimaryOnlyService will create and launch Instance objects for each document found in this collection. This is how PrimaryOnlyService tasks get resumed after failover. - ### PrimaryOnlyService::Instance/TypedInstance The PrimaryOnlyService::Instance interface is used to contain the state and core logic for running a @@ -39,26 +38,25 @@ single task belonging to a PrimaryOnlyService. The Instance interface includes a method which is provided an executor which is used to run all work that is done on behalf of the Instance. Implementations should not extend PrimaryOnlyService::Instance directly, instead they should extend PrimaryOnlyService::TypedInstance, which allows individual Instances to be looked up -and returned as pointers to the proper Instance sub-type. The InstanceID for an Instance is the _id +and returned as pointers to the proper Instance sub-type. The InstanceID for an Instance is the \_id field of its state document. - ## Defining a new PrimaryOnlyService To define a new PrimaryOnlyService one must add corresponding subclasses of both PrimaryOnlyService -and PrimaryOnlyService::TypedInstance. The PrimaryOnlyService subclass just exists to specify what +and PrimaryOnlyService::TypedInstance. The PrimaryOnlyService subclass just exists to specify what collection state documents for this service are stored in, and to hand out corresponding Instances -of the proper type. Most of the work of a new PrimaryOnlyService will be implemented in the +of the proper type. Most of the work of a new PrimaryOnlyService will be implemented in the PrimaryOnlyService::Instance subclass. PrimaryOnlyService::Instance subclasses will be responsible for running the work they need to perform to complete their task, as well as for managing and synchronizing their own in-memory and on-disk state. No part of the PrimaryOnlyService **machinery** -ever performs writes to the PrimaryOnlyService state document collections. All writes to a given +ever performs writes to the PrimaryOnlyService state document collections. All writes to a given Instance's state document (including creating it initially and deleting it when the work has been -completed) are performed by Instance implementations. This means that for the majority of +completed) are performed by Instance implementations. This means that for the majority of PrimaryOnlyServices, the first step of its Instance's run() method will be to insert an initial state document into the state document collection, to ensure that the Instance is now persisted and -will be resumed after failover. When an Instance is resumed after failover, it is provided the -current version of the state document as it exists in the state document collection. That document +will be resumed after failover. When an Instance is resumed after failover, it is provided the +current version of the state document as it exists in the state document collection. 
That document can be used to rebuild the in-memory state for this Instance so that when run() is called it knows what state it is in and thus what work still needs to be performed, and what work has already been completed by the previous Primary. @@ -66,7 +64,6 @@ completed by the previous Primary. To see an example bare-bones PrimaryOnlyService implementation to use as a reference, check out the TestService defined in this unit test: https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/primary_only_service_test.cpp - ## Behavior during state transitions At stepUp, each PrimaryOnlyService queries its state document collection, and for each document @@ -83,12 +80,12 @@ the same time on the same node. ### Interrupting Instances at stepDown At stepDown, there are 3 main ways that Instances are interrupted and we guarantee that no more work -is performed on behalf of any PrimaryOnlyServices. The first is that the executor provided to each +is performed on behalf of any PrimaryOnlyServices. The first is that the executor provided to each Instance's run() method gets shut down, preventing any more work from being scheduled on behalf of -that Instance. The second is that all OperationContexts created on threads (Clients) that are part +that Instance. The second is that all OperationContexts created on threads (Clients) that are part of an Executor owned by a PrimaryOnlyService get interrupted. The third is that each individual Instance is explicitly interrupted, so that it can unblock any work running on threads that are -*not* a part of an executor owned by the PrimaryOnlyService that are dependent on that Instance +_not_ a part of an executor owned by the PrimaryOnlyService that are dependent on that Instance signaling them (e.g. commands that are waiting on the Instance to reach a certain state). Currently this happens via a call to an interrupt() method that each Instance must override, but in the future this is likely to change to signaling a CancellationToken owned by the Instance instead. @@ -96,9 +93,9 @@ this is likely to change to signaling a CancellationToken owned by the Instance ## Instance lifetime Instances are held by shared_ptr in their parent PrimaryOnlyService. Each PrimaryOnlyService -releases all Instance shared_ptrs it owns on stepDown. Additionally, a PrimaryOnlyService will +releases all Instance shared_ptrs it owns on stepDown. Additionally, a PrimaryOnlyService will release an Instance shared_ptr when the state document for that Instance is deleted (via an -OpObserver). Since generally speaking it is logic from an Instance's run() method that will be +OpObserver). Since generally speaking it is logic from an Instance's run() method that will be responsible for deleting its state document, such logic needs to be careful as the moment the state document is deleted, the corresponding PrimaryOnlyService is no longer keeping that Instance alive. 
If an Instance has any additional logic or internal state to update after deleting its state diff --git a/docs/security_guide.md b/docs/security_guide.md index 798bf8c626e..ea34acb7f26 100644 --- a/docs/security_guide.md +++ b/docs/security_guide.md @@ -1,7 +1,7 @@ # Links to Security Architecture Guide -- [Identity and Access Management](https://github.com/mongodb/mongo/blob/master/src/mongo/db/auth/README.md) -- [TLS](https://github.com/mongodb/mongo/blob/master/src/mongo/util/net/README.md) -- [FTDC](https://github.com/mongodb/mongo/blob/master/src/mongo/db/ftdc/README.md) -- [LibFuzzer](https://github.com/mongodb/mongo/blob/master/docs/libfuzzer.md) -- [SELinux](https://github.com/mongodb/mongodb-selinux/blob/master/README.md) +- [Identity and Access Management](https://github.com/mongodb/mongo/blob/master/src/mongo/db/auth/README.md) +- [TLS](https://github.com/mongodb/mongo/blob/master/src/mongo/util/net/README.md) +- [FTDC](https://github.com/mongodb/mongo/blob/master/src/mongo/db/ftdc/README.md) +- [LibFuzzer](https://github.com/mongodb/mongo/blob/master/docs/libfuzzer.md) +- [SELinux](https://github.com/mongodb/mongodb-selinux/blob/master/README.md) diff --git a/docs/server-parameters.md b/docs/server-parameters.md index 7bb10a4f7da..cad77f82965 100644 --- a/docs/server-parameters.md +++ b/docs/server-parameters.md @@ -1,15 +1,18 @@ # Server Parameters -Mongo database and router servers (i.e., `mongod` and `mongos`) provide a number of configuration -options through server parameters. These parameters allow users to configure the behavior of the -server at startup or runtime. For instance, `logLevel` is a server parameter that configures the + +Mongo database and router servers (i.e., `mongod` and `mongos`) provide a number of configuration +options through server parameters. These parameters allow users to configure the behavior of the +server at startup or runtime. For instance, `logLevel` is a server parameter that configures the logging verbosity. ## How to define new parameters -Parameters are defined by the elements of the `server_parameters` section of an IDL file. The IDL -machinery will parse these files and generate C++ code, and corresponding header files where + +Parameters are defined by the elements of the `server_parameters` section of an IDL file. The IDL +machinery will parse these files and generate C++ code, and corresponding header files where appropriate. The generated code will self-register server parameters with the runtime. Consider `logLevel` parameter from [`parameters.idl`][parameters.idl] for example: + ```yaml ... server_parameters: @@ -23,126 +26,132 @@ server_parameters: ... ``` -This defines a server parameter called `logLevel`, which is settable at startup or at runtime, and -declares a C++ class for the parameter (i.e., `LogLevelServerParameter`). Refer to the +This defines a server parameter called `logLevel`, which is settable at startup or at runtime, and +declares a C++ class for the parameter (i.e., `LogLevelServerParameter`). Refer to the [Server Parameters Syntax](#server-parameters-syntax) documentation for the complete IDL syntax. ## How to change a defined parameter -Users can set or modify a server parameter at startup and/or runtime, depending on the value -specified for `set_at`. For instance, `logLevel` may be set at both startup and runtime, as + +Users can set or modify a server parameter at startup and/or runtime, depending on the value +specified for `set_at`. 
For instance, `logLevel` may be set at both startup and runtime, as indicated by `set_at` (see the above code snippet). -At startup, server parameters may be set using the `--setParameter` command line option. -At runtime, the `setParameter` command may be used to modify server parameters. +At startup, server parameters may be set using the `--setParameter` command line option. +At runtime, the `setParameter` command may be used to modify server parameters. See the [`setParameter` documentation][set-parameter] for details. ## How to get the value provided for a parameter -Server developers may retrieve the value of a server parameter by: -* Accessing the C++ expression that corresponds to the parameter of interest. For example, reading -from [`serverGlobalParams.quiet`][quiet-param] returns the current value for `quiet`. -* Registering a callback to be notified about changes to the server parameter (e.g., -[`onUpdateFTDCFileSize`][ftdc-file-size-param] for `diagnosticDataCollectionFileSizeMB`). -Database users may use the [`getParameter`][get-parameter] command to query the current value for a +Server developers may retrieve the value of a server parameter by: + +- Accessing the C++ expression that corresponds to the parameter of interest. For example, reading + from [`serverGlobalParams.quiet`][quiet-param] returns the current value for `quiet`. +- Registering a callback to be notified about changes to the server parameter (e.g., + [`onUpdateFTDCFileSize`][ftdc-file-size-param] for `diagnosticDataCollectionFileSizeMB`). + +Database users may use the [`getParameter`][get-parameter] command to query the current value for a server parameter. ## Server Parameters Syntax -The following shows the IDL syntax for declaring server parameters. Field types are denoted in each -section. For details regarding `string or expression map`, see that section + +The following shows the IDL syntax for declaring server parameters. Field types are denoted in each +section. For details regarding `string or expression map`, see that section [below](#string-or-expression-map). 
```yaml server_parameters: - "nameOfParameter": # string - set_at: # string or list of strings - description: # string - cpp_vartype: # string - cpp_varname: # string - cpp_class: # string (name field) or map - name: # string - data: # string - override_ctor: # bool - override_set: # bool - override_validate: # bool - redact: # bool - test_only: # bool - default: # string or expression map - deprecated_name: # string or list of strings - on_update: # string - condition: - expr: # C++ bool expression, evaluated at run time - constexpr: # C++ bool expression, evaluated at compilation time - preprocessor: # C preprocessor condition - min_fcv: # string - feature_flag: # string - validator: # Map containing one or more of the below - lt: # string or expression map - gt: # string or expression map - lte: # string or expression map - gte: # string or expression map - callback: # string + "nameOfParameter": # string + set_at: # string or list of strings + description: # string + cpp_vartype: # string + cpp_varname: # string + cpp_class: # string (name field) or map + name: # string + data: # string + override_ctor: # bool + override_set: # bool + override_validate: # bool + redact: # bool + test_only: # bool + default: # string or expression map + deprecated_name: # string or list of strings + on_update: # string + condition: + expr: # C++ bool expression, evaluated at run time + constexpr: # C++ bool expression, evaluated at compilation time + preprocessor: # C preprocessor condition + min_fcv: # string + feature_flag: # string + validator: # Map containing one or more of the below + lt: # string or expression map + gt: # string or expression map + lte: # string or expression map + gte: # string or expression map + callback: # string ``` -Each entry in the `server_parameters` map represents one server parameter. The name of the parameter + +Each entry in the `server_parameters` map represents one server parameter. The name of the parameter must be unique across the server instance. More information on the specific fields: -* `set_at` (required): Must contain the value `startup`, `runtime`, [`startup`, `runtime`], or -`cluster`. If `runtime` is specified along with `cpp_varname`, then `decltype(cpp_varname)` must -refer to a thread-safe storage type, specifically: `AtomicWord`, `AtomicDouble`, `std::atomic`, -or `boost::synchronized`. Parameters declared as `cluster` can only be set at runtime and exhibit -numerous differences. See [Cluster Server Parameters](cluster-server-parameters) below. +- `set_at` (required): Must contain the value `startup`, `runtime`, [`startup`, `runtime`], or + `cluster`. If `runtime` is specified along with `cpp_varname`, then `decltype(cpp_varname)` must + refer to a thread-safe storage type, specifically: `AtomicWord`, `AtomicDouble`, `std::atomic`, + or `boost::synchronized`. Parameters declared as `cluster` can only be set at runtime and exhibit + numerous differences. See [Cluster Server Parameters](cluster-server-parameters) below. -* `description` (required): Free-form text field currently used only for commenting the generated C++ -code. Future uses may preserve this value for a possible `{listSetParameters:1}` command or other -programmatic and potentially user-facing purposes. +- `description` (required): Free-form text field currently used only for commenting the generated C++ + code. Future uses may preserve this value for a possible `{listSetParameters:1}` command or other + programmatic and potentially user-facing purposes. 
-* `cpp_vartype`: Declares the full storage type. If `cpp_vartype` is not defined, it may be inferred -from the C++ variable referenced by `cpp_varname`. +- `cpp_vartype`: Declares the full storage type. If `cpp_vartype` is not defined, it may be inferred + from the C++ variable referenced by `cpp_varname`. -* `cpp_varname`: Declares the underlying variable or C++ `struct` member to use when setting or reading the -server parameter. If defined together with `cpp_vartype`, the storage will be declared as a global -variable, and externed in the generated header file. If defined alone, a variable of this name will -assume to have been declared and defined by the implementer, and its type will be automatically -inferred at compile time. If `cpp_varname` is not defined, then `cpp_class` must be specified. +- `cpp_varname`: Declares the underlying variable or C++ `struct` member to use when setting or reading the + server parameter. If defined together with `cpp_vartype`, the storage will be declared as a global + variable, and externed in the generated header file. If defined alone, a variable of this name will + assume to have been declared and defined by the implementer, and its type will be automatically + inferred at compile time. If `cpp_varname` is not defined, then `cpp_class` must be specified. -* `cpp_class`: Declares a custom `ServerParameter` class in the generated header using the provided -string, or the name field in the associated map. The declared class will require an implementation -of `setFromString()`, and optionally `set()`, `append()`, and a constructor. -See [Specialized Server Parameters](#specialized-server-parameters) below. +- `cpp_class`: Declares a custom `ServerParameter` class in the generated header using the provided + string, or the name field in the associated map. The declared class will require an implementation + of `setFromString()`, and optionally `set()`, `append()`, and a constructor. + See [Specialized Server Parameters](#specialized-server-parameters) below. -* `default`: String or expression map representation of the initial value. +- `default`: String or expression map representation of the initial value. -* `redact`: Set to `true` to replace values of this setting with placeholders (e.g., for passwords). +- `redact`: Set to `true` to replace values of this setting with placeholders (e.g., for passwords). -* `test_only`: Set to `true` to disable this set parameter if `enableTestCommands` is not specified. +- `test_only`: Set to `true` to disable this set parameter if `enableTestCommands` is not specified. -* `deprecated_name`: One or more names which can be used with the specified setting and underlying -storage. Reading or writing a setting using this name will result in a warning in the server log. +- `deprecated_name`: One or more names which can be used with the specified setting and underlying + storage. Reading or writing a setting using this name will result in a warning in the server log. -* `on_update`: C++ callback invoked after all validation rules have completed successfully and the -new value has been stored. Prototype: `Status(const cpp_vartype&);` +- `on_update`: C++ callback invoked after all validation rules have completed successfully and the + new value has been stored. Prototype: `Status(const cpp_vartype&);` -* `condition`: Up to five conditional rules for deciding whether or not to apply this server -parameter. `preprocessor` will be evaluated first, followed by `constexpr`, then finally `expr`. 
If -no provided setting evaluates to `false`, the server parameter will be registered. `feature_flag` and -`min_fcv` are evaluated after the parameter is registered, and instead affect whether the parameter -is enabled. `min_fcv` is a string of the form `X.Y`, representing the minimum FCV version for which -this parameter should be enabled. `feature_flag` is the name of a feature flag variable upon which -this server parameter depends -- if the feature flag is disabled, this parameter will be disabled. -`feature_flag` should be removed when all other instances of that feature flag are deleted, which -typically is done after the next LTS version of the server is branched. `min_fcv` should be removed -after it is no longer possible to downgrade to a FCV lower than that version - this occurs when the -next LTS version of the server is branched. +- `condition`: Up to five conditional rules for deciding whether or not to apply this server + parameter. `preprocessor` will be evaluated first, followed by `constexpr`, then finally `expr`. If + no provided setting evaluates to `false`, the server parameter will be registered. `feature_flag` and + `min_fcv` are evaluated after the parameter is registered, and instead affect whether the parameter + is enabled. `min_fcv` is a string of the form `X.Y`, representing the minimum FCV version for which + this parameter should be enabled. `feature_flag` is the name of a feature flag variable upon which + this server parameter depends -- if the feature flag is disabled, this parameter will be disabled. + `feature_flag` should be removed when all other instances of that feature flag are deleted, which + typically is done after the next LTS version of the server is branched. `min_fcv` should be removed + after it is no longer possible to downgrade to a FCV lower than that version - this occurs when the + next LTS version of the server is branched. -* `validator`: Zero or many validation rules to impose on the setting. All specified rules must pass -to consider the new setting valid. `lt`, `gt`, `lte`, `gte` fields provide for simple numeric limits -or expression maps which evaluate to numeric values. For all other validation cases, specify -callback as a C++ function or static method. Note that validation rules (including callback) may run -in any order. To perform an action after all validation rules have completed, `on_update` should be -preferred instead. Callback prototype: `Status(const cpp_vartype&, const boost::optional&);` +- `validator`: Zero or many validation rules to impose on the setting. All specified rules must pass + to consider the new setting valid. `lt`, `gt`, `lte`, `gte` fields provide for simple numeric limits + or expression maps which evaluate to numeric values. For all other validation cases, specify + callback as a C++ function or static method. Note that validation rules (including callback) may run + in any order. To perform an action after all validation rules have completed, `on_update` should be + preferred instead. Callback prototype: `Status(const cpp_vartype&, const boost::optional&);` -Any symbols such as global variables or callbacks used by a server parameter must be imported using -the usual IDL machinery via `globals.cpp_includes`. Similarly, all generated code will be nested +Any symbols such as global variables or callbacks used by a server parameter must be imported using +the usual IDL machinery via `globals.cpp_includes`. 
Similarly, all generated code will be nested inside the namespace defined by `globals.cpp_namespace`. Consider the following for example: + ```yaml global: cpp_namespace: "mongo" @@ -157,13 +166,15 @@ server_parameters: ``` ### String or Expression Map -The default and implicit fields above, as well as the `gt`, `lt`, `gte`, and `lte` validators accept -either a simple scalar string which is treated as a literal value, or a YAML map containing an -attribute called `expr`, which must be a string containing an arbitrary C++ expression to be used -as-is. Optionally, an expression map may also include the `is_constexpr: false` attribute, which + +The default and implicit fields above, as well as the `gt`, `lt`, `gte`, and `lte` validators accept +either a simple scalar string which is treated as a literal value, or a YAML map containing an +attribute called `expr`, which must be a string containing an arbitrary C++ expression to be used +as-is. Optionally, an expression map may also include the `is_constexpr: false` attribute, which will suspend enforcement of the value being a `constexpr`. For example, consider: + ```yaml server_parameters: connPoolMaxInUseConnsPerHost: @@ -174,79 +185,88 @@ server_parameters: ... ``` -Here, the server parameter's default value is the evaluation of the C++ expression -`std::numeric_limits::max()`. Additionally, since default was not explicitly given the -`is_constexpr: false` attribute, it will be round-tripped through the following lambda to guarantee +Here, the server parameter's default value is the evaluation of the C++ expression +`std::numeric_limits::max()`. Additionally, since default was not explicitly given the +`is_constexpr: false` attribute, it will be round-tripped through the following lambda to guarantee that it does not rely on runtime information. + ```cpp []{ constexpr auto value = ; return value; }() ``` ### Specialized Server Parameters -When `cpp_class` is specified on a server parameter, a child class of `ServerParameter` will be -created in the `gen.h` file named for either the string value of `cpp_class`, or if it is expressed + +When `cpp_class` is specified on a server parameter, a child class of `ServerParameter` will be +created in the `gen.h` file named for either the string value of `cpp_class`, or if it is expressed as a dictionary, then `cpp_class.name`. A `cpp_class` directive may also contain: + ```yaml server_parameters: - someParameter: - cpp_class: - name: string # Name to assign to the class (e.g., SomeParameterImpl) - data: string # cpp data type to add to the class as a property named "_data" - override_ctor: bool # True to allow defining a custom constructor, default: false - override_set: bool # True to allow defining a custom set() method, default: false - override_validate: bool # True to allow defining a custom validate() method, default: false + someParameter: + cpp_class: + name: string # Name to assign to the class (e.g., SomeParameterImpl) + data: string # cpp data type to add to the class as a property named "_data" + override_ctor: bool # True to allow defining a custom constructor, default: false + override_set: bool # True to allow defining a custom set() method, default: false + override_validate: bool # True to allow defining a custom validate() method, default: false ``` -`override_ctor`: If `false`, the inherited constructor from the `ServerParameter` base class will be -used. 
If `true`, then the implementer must provide a -`{name}::{name}(StringData serverParameterName, ServerParameterType type)` constructor. In addition +`override_ctor`: If `false`, the inherited constructor from the `ServerParameter` base class will be +used. If `true`, then the implementer must provide a +`{name}::{name}(StringData serverParameterName, ServerParameterType type)` constructor. In addition to any other work, this custom constructor must invoke its parent's constructor. `override_set`: If `true`, the implementer must provide a `set` member function as: + ```cpp Status {name}::set(const BSONElement& val, const boost::optional& tenantId); ``` + Otherwise the base class implementation `ServerParameter::set` is used. It invokes `setFromString` using a string representation of `val`, if the `val` is holding one of the supported types. `override_validate`: If `true`, the implementer must provide a `validate` member function as: + ```cpp Status {name}::validate(const BSONElement& newValueElement, const boost::optional& tenantId); ``` + Otherwise, the base class implementation `ServerParameter::validate` is used. This simply returns `Status::OK()` without performing any kind of validation of the new BSON element. -If `param.redact` was specified as `true`, then a standard append method will be provided which -injects a placeholder value. If `param.redact` was not specified as `true`, then an implementation -must be provided with the following signature: +If `param.redact` was specified as `true`, then a standard append method will be provided which +injects a placeholder value. If `param.redact` was not specified as `true`, then an implementation +must be provided with the following signature: ```cpp Status {name}::append(OperationContext*, BSONObjBuilder*, StringData, const boost::optional& tenantId); ``` Lastly, a `setFromString` method must always be provided with the following signature: + ```cpp Status {name}::setFromString(StringData value, const boost::optional& tenantId); ``` The following table summarizes `ServerParameter` method override rules. -| `ServerParameter` method | Override | Default Behavior | -| ------------------------ | -------- | -------------------------------------------------------------------- | -| constructor | Optional | Instantiates only the name and type. | -| `set()` | Optional | Calls `setFromString()` on a string representation of the new value. | -| `setFromString()` | Required | None, won't compile without implementation. | -| `append() // redact=true` | Optional | Replaces parameter value with '###'. | -| `append() // redact=false` | Required | None, won't compile without implementation. | -| `validate()` | Optional | Returns `Status::OK()` without any checks. | +| `ServerParameter` method | Override | Default Behavior | +| -------------------------- | -------- | -------------------------------------------------------------------- | +| constructor | Optional | Instantiates only the name and type. | +| `set()` | Optional | Calls `setFromString()` on a string representation of the new value. | +| `setFromString()` | Required | None, won't compile without implementation. | +| `append() // redact=true` | Optional | Replaces parameter value with '###'. | +| `append() // redact=false` | Required | None, won't compile without implementation. | +| `validate()` | Optional | Returns `Status::OK()` without any checks. 
| Note that by default, server parameters are not tenant-aware and thus will always have `boost::none` provided as `tenantId`, unless defined as cluster server parameters (discussed [below](#cluster-server-parameters)). -Each server parameter encountered will produce a block of code to run at process startup similar to +Each server parameter encountered will produce a block of code to run at process startup similar to the following: + ```cpp /** * Iteration count to use when creating new users with @@ -263,20 +283,21 @@ MONGO_COMPILER_VARIABLE_UNUSED auto* scp_unique_ident = [] { }(); ``` -Any additional validator and callback would be set on `ret` as determined by the server parameter +Any additional validator and callback would be set on `ret` as determined by the server parameter configuration block. ## Cluster Server Parameters -As indicated earlier, one of the options for the `set_at` field is `cluster`. If this value is -selected, then the generated server parameter will be known as a _cluster server parameter_. These + +As indicated earlier, one of the options for the `set_at` field is `cluster`. If this value is +selected, then the generated server parameter will be known as a _cluster server parameter_. These server parameters are set at runtime via the `setClusterParameter` command and are propagated to all nodes in a sharded cluster or a replica set deployment. Cluster server parameters should be preferred to implementing custom parameter propagation whenever possible. -`setClusterParameter` persists the new value of the indicated cluster server parameter onto a -majority of nodes on non-sharded replica sets. On sharded clusters, it majority-writes the new value -onto every shard and the config server. This ensures that every **mongod** in the cluster will be able -to recover the most recently written value for all cluster server parameters on restart. +`setClusterParameter` persists the new value of the indicated cluster server parameter onto a +majority of nodes on non-sharded replica sets. On sharded clusters, it majority-writes the new value +onto every shard and the config server. This ensures that every **mongod** in the cluster will be able +to recover the most recently written value for all cluster server parameters on restart. Additionally, `setClusterParameter` blocks until the majority write succeeds in a replica set deployment, which guarantees that the parameter value will not be rolled back after being set. In a sharded cluster deployment, the new value has to be majority-committed on the config shard and @@ -289,38 +310,39 @@ server parameter values every `clusterServerParameterRefreshIntervalSecs` using `ClusterParameterRefresher` periodic job. `getClusterParameter` returns the cached value of the requested cluster server parameter on the node -that it is run on. It can accept a single cluster server parameter name, a list of names, or `*` to +that it is run on. It can accept a single cluster server parameter name, a list of names, or `*` to return all cluster server parameter values on the node. Specifying `cpp_vartype` for cluster server parameters must result in the usage of an IDL-defined type that has `ClusterServerParameter` listed as a chained structure. This chaining adds the following members to the resulting type: -* `_id` - cluster server parameters are uniquely identified by their names. 
-* `clusterParameterTime` - `LogicalTime` at which the current value of the cluster server parameter - was updated; used by runtime audit configuration, and to prevent concurrent and redundant cluster - parameter updates. +- `_id` - cluster server parameters are uniquely identified by their names. +- `clusterParameterTime` - `LogicalTime` at which the current value of the cluster server parameter + was updated; used by runtime audit configuration, and to prevent concurrent and redundant cluster + parameter updates. It is highly recommended to specify validation rules or a callback function via the `param.validator` -field. These validators are called before the new value of the cluster server parameter is written -to disk during `setClusterParameter`. +field. These validators are called before the new value of the cluster server parameter is written +to disk during `setClusterParameter`. See [server_parameter_with_storage_test.idl][cluster-server-param-with-storage-test] and [server_parameter_with_storage_test_structs.idl][cluster-server-param-with-storage-test-structs] for examples. ### Specialized Cluster Server Parameters + Cluster server parameters can also be specified as specialized server parameters. The table below summarizes `ServerParameter` method override rules in this case. -| `ServerParameter` method | Override | Default Behavior | -| --------------------------- | ------------------- | --------------------------------------------| -| constructor | Optional | Instantiates only the name and type. | -| `set()` | Required | None, won't compile without implementation. | -| `setFromString()` | Prohibited | Returns `ErrorCodes::BadValue`. | -| `append()` | Required | None, won't compile without implementation. | -| `validate()` | Optional | Return `Status::OK()` without any checks. | -| `reset()` | Required | None, won't compile without implementation. | -| `getClusterParameterTime()` | Required | Return `LogicalTime::kUninitialized`. | +| `ServerParameter` method | Override | Default Behavior | +| --------------------------- | ---------- | ------------------------------------------- | +| constructor | Optional | Instantiates only the name and type. | +| `set()` | Required | None, won't compile without implementation. | +| `setFromString()` | Prohibited | Returns `ErrorCodes::BadValue`. | +| `append()` | Required | None, won't compile without implementation. | +| `validate()` | Optional | Return `Status::OK()` without any checks. | +| `reset()` | Required | None, won't compile without implementation. | +| `getClusterParameterTime()` | Required | Return `LogicalTime::kUninitialized`. | Specifying `override_ctor` to true is optional. An override constructor can be useful for allocating additional resources at the time of parameter registration. Otherwise, the default likely suffices, @@ -363,7 +385,7 @@ disabled due to either of these conditions, `setClusterParameter` on it will alw `getClusterParameter` will fail on **mongod**, and return the default value on **mongos** -- this difference in behavior is due to **mongos** being unaware of the current FCV. -See [server_parameter_specialized_test.idl][specialized-cluster-server-param-test-idl] and +See [server_parameter_specialized_test.idl][specialized-cluster-server-param-test-idl] and [server_parameter_specialized_test.h][specialized-cluster-server-param-test-data] for examples. 
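To make the specialized-cluster-parameter override table above more concrete, here is a minimal, hypothetical header sketch (the class name `ClusterFooParameter` is invented for illustration). The `set`, `append`, and `validate` prototypes follow the ones quoted earlier in this document; the `reset` and `getClusterParameterTime` prototypes, the `TenantId` template argument for `boost::optional`, and the include path are assumptions and should be checked against the linked `server_parameter_specialized_test.h` example, which remains the authoritative reference.

```cpp
// Hypothetical sketch of a specialized cluster server parameter declaration.
// In the real code base these methods override virtuals on ServerParameter;
// consult server_parameter.h and the linked test files for exact signatures.
#include "mongo/db/server_parameter.h"  // assumed include path

namespace mongo {

class ClusterFooParameter : public ServerParameter {
public:
    // Optional: only needed when the IDL sets override_ctor: true.
    ClusterFooParameter(StringData serverParameterName, ServerParameterType type);

    // Required: apply a value propagated by setClusterParameter.
    Status set(const BSONElement& val, const boost::optional<TenantId>& tenantId);

    // Required: serialize the current value for getClusterParameter.
    Status append(OperationContext* opCtx,
                  BSONObjBuilder* bob,
                  StringData name,
                  const boost::optional<TenantId>& tenantId);

    // Optional: reject bad values before they are majority-written.
    Status validate(const BSONElement& newValueElement,
                    const boost::optional<TenantId>& tenantId);

    // Required (assumed prototype): restore the default value.
    Status reset(const boost::optional<TenantId>& tenantId);

    // Required (assumed prototype): report when the current value was set.
    LogicalTime getClusterParameterTime(const boost::optional<TenantId>& tenantId) const;

    // Note: setFromString() is deliberately NOT overridden; for cluster
    // parameters the base implementation returns ErrorCodes::BadValue.
};

}  // namespace mongo
```

As with storage-backed cluster parameters, a `validate()` override is optional but recommended, since it runs before the new value is persisted by `setClusterParameter`.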
### Implementation Details diff --git a/docs/string_manipulation.md b/docs/string_manipulation.md index ed881784b06..551e0253028 100644 --- a/docs/string_manipulation.md +++ b/docs/string_manipulation.md @@ -7,6 +7,7 @@ For string manipulation, use the util/mongoutils/str.h library. `util/mongoutils/str.h` provides string helper functions for each manipulation. `str::stream()` is quite useful for assembling strings inline: + ``` uassert(12345, str::stream() << "bad ns:" << ns, isOk); ``` @@ -27,5 +28,4 @@ class StringData { See also [`bson/string_data.h`][1]. - [1]: ../src/mongo/base/string_data.h diff --git a/docs/test_commands.md b/docs/test_commands.md index e247ae66988..35f4802851c 100644 --- a/docs/test_commands.md +++ b/docs/test_commands.md @@ -17,10 +17,10 @@ parameter for testing. Some often-used commands that are test-only: -- [configureFailPoint][fail_point_cmd] -- [emptyCapped][empty_capped_cmd] -- [replSetTest][repl_set_test_cmd] -- [sleep][sleep_cmd] +- [configureFailPoint][fail_point_cmd] +- [emptyCapped][empty_capped_cmd] +- [replSetTest][repl_set_test_cmd] +- [sleep][sleep_cmd] As a very rough estimate, about 10% of all server commands are test-only. These additional commands will appear in `db.runCommand({listCommands: 1})` when the server has test commands enabled. @@ -29,10 +29,9 @@ will appear in `db.runCommand({listCommands: 1})` when the server has test comma A few pointers to relevant code that sets this up: -- [test_commands_enabled.h][test_commands_enabled] - -- [MONGO_REGISTER_COMMAND][register_command] +- [test_commands_enabled.h][test_commands_enabled] +- [MONGO_REGISTER_COMMAND][register_command] [empty_capped_cmd]: ../src/mongo/db/commands/test_commands.cpp [fail_point_cmd]: ../src/mongo/db/commands/fail_point_cmd.cpp diff --git a/docs/testing/fsm_concurrency_testing_framework.md b/docs/testing/fsm_concurrency_testing_framework.md index 43736450d92..f3d9bf98774 100644 --- a/docs/testing/fsm_concurrency_testing_framework.md +++ b/docs/testing/fsm_concurrency_testing_framework.md @@ -1,7 +1,7 @@ # FSM-based Concurrency Testing Framework - ## Overview + The FSM tests are meant to exercise concurrency within MongoDB. The suite consists of workloads, which define discrete units of work as states in a FSM, and runners, which define which tests to run and how they should be run. Each @@ -42,19 +42,19 @@ some assertions, even when running a mixture of different workloads together. There are three assertion levels: `ALWAYS`, `OWN_COLL`, and `OWN_DB`. They can be thought of as follows: -* `ALWAYS`: A statement that remains unequivocally true, regardless of what - another workload might be doing to the collection I was given (hint: think - defensively). Examples include "1 = 1" or inserting a document into a - collection (disregarding any unique indices). +- `ALWAYS`: A statement that remains unequivocally true, regardless of what + another workload might be doing to the collection I was given (hint: think + defensively). Examples include "1 = 1" or inserting a document into a + collection (disregarding any unique indices). -* `OWN_COLL`: A statement that is true only if I am the only workload operating - on the collection I was given. Examples include counting the number of - documents in a collection or updating a previously inserted document. +- `OWN_COLL`: A statement that is true only if I am the only workload operating + on the collection I was given. Examples include counting the number of + documents in a collection or updating a previously inserted document. 
-* `OWN_DB`: A statement that is true only if I am the only workload operating on - the database I was given. Examples include renaming a collection or verifying - that a collection is capped. The workload typically relies on the use of - another collection aside from the one given. +- `OWN_DB`: A statement that is true only if I am the only workload operating on + the database I was given. Examples include renaming a collection or verifying + that a collection is capped. The workload typically relies on the use of + another collection aside from the one given. ## Creating your own workload @@ -97,6 +97,7 @@ When finished executing, `$config` must return an object containing the properti above (some of which are optional, see below). ### Defining states + It's best to also declare states within its own closure so as not to interfere with the scope of $config. Each state takes two arguments, the db object and the collection name. For later, note that this db and collection are the only one @@ -107,9 +108,9 @@ with a name as opposed to anonymously - this makes easier to read backtraces when things go wrong. ```javascript -$config = (function() { +$config = (function () { /* ... */ - var states = (function() { + var states = (function () { function getRand() { return Random.randInt(10); } @@ -119,18 +120,17 @@ $config = (function() { } function scanGT(db, collName) { - db[collName].find({ _id: { $gt: this.start } }).itcount(); + db[collName].find({_id: {$gt: this.start}}).itcount(); } function scanLTE(db, collName) { - db[collName].find({ _id: { $lte: this.start } }).itcount(); + db[collName].find({_id: {$lte: this.start}}).itcount(); } - return { init: init, scanGT: scanGT, - scanLTE: scanLTE + scanLTE: scanLTE, }; })(); @@ -156,13 +156,12 @@ example below, we're denoting an equal probability of moving to either of the scan states from the init state: ```javascript - -$config = (function() { +$config = (function () { /* ... */ var transitions = { - init: { scanGT: 0.5, scanLTE: 0.5 }, - scanGT: { scanGT: 0.8, scanLTE: 0.2 }, - scanLTE: { scanGT: 0.2, scanLTE: 0.8 } + init: {scanGT: 0.5, scanLTE: 0.5}, + scanGT: {scanGT: 0.8, scanLTE: 0.2}, + scanLTE: {scanGT: 0.2, scanLTE: 0.8}, }; /* ... */ return { @@ -186,25 +185,31 @@ against the provided `db` you should use the provided `cluster.executeOnMongodNodes` and `cluster.executeOnMongosNodes` functionality. ```javascript -$config = (function() { +$config = (function () { /* ... */ function setup(db, collName, cluster) { // Workloads should NOT drop the collection db[collName], as doing so // is handled by jstests/concurrency/fsm_libs/runner.js before 'setup' is called. for (var i = 0; i < 1000; ++i) { - db[collName].insert({ _id: i }); + db[collName].insert({_id: i}); } - cluster.executeOnMongodNodes(function(db) { - db.adminCommand({ setParameter: 1, internalQueryExecYieldIterations: 5 }); + cluster.executeOnMongodNodes(function (db) { + db.adminCommand({ + setParameter: 1, + internalQueryExecYieldIterations: 5, + }); }); - cluster.executeOnMongosNodes(function(db) { + cluster.executeOnMongosNodes(function (db) { printjson(db.serverCmdLineOpts()); }); } function teardown(db, collName, cluster) { - cluster.executeOnMongodNodes(function(db) { - db.adminCommand({ setParameter: 1, internalQueryExecYieldIterations: 128 }); + cluster.executeOnMongodNodes(function (db) { + db.adminCommand({ + setParameter: 1, + internalQueryExecYieldIterations: 128, + }); }); } /* ... 
*/ @@ -233,9 +238,9 @@ composition, each workload has its own data, meaning you don't have to worry about properties being overridden by workloads other than the current one. ```javascript -$config = (function() { +$config = (function () { var data = { - start: 0 + start: 0, }; /* ... */ return { @@ -262,7 +267,7 @@ number of threads available due to system or performance constraints. #### `iterations` This is just the number of states the FSM will go through before exiting. NOTE: -it is *not* the number of times each state will be executed. +it is _not_ the number of times each state will be executed. #### `startState` (optional) @@ -298,8 +303,8 @@ workload you are extending has a function in its data object called ```javascript import {extendWorkload} from "jstests/concurrency/fsm_libs/extend_workload.js"; -load('jstests/concurrency/fsm_workload_modifiers/indexed_noindex.js'); // for indexedNoindex -import {$config as $baseConfig} from 'jstests/concurrency/fsm_workloads/workload_with_index.js'; +load("jstests/concurrency/fsm_workload_modifiers/indexed_noindex.js"); // for indexedNoindex +import {$config as $baseConfig} from "jstests/concurrency/fsm_workloads/workload_with_index.js"; export const $config = extendWorkload($baseConfig, indexedNoIndex); ``` @@ -314,7 +319,6 @@ prefix defined by your workload name is a good idea since the workload file name can be assumed unique and will allow you to only affect your workload in these cases. - ## Test runners By default, all runners below are allowed to open a maximum of @@ -345,7 +349,6 @@ all complete, all threads have their teardown function run. ![fsm_simultaneous_example.png](../images/testing/fsm_simultaneous_example.png) - ### Existing runners The existing runners all use `jstests/concurrency/fsm_libs/runner.js` to @@ -358,10 +361,10 @@ is explained in the other components section below. Execution options for runWorkloads functions, the third argument, can contain the following options (some depend on the run mode): -* `numSubsets` - Not available in serial mode, determines how many subsets of - workloads to execute in parallel mode -* `subsetSize` - Not available in serial mode, determines how large each subset of - workloads executed is +- `numSubsets` - Not available in serial mode, determines how many subsets of + workloads to execute in parallel mode +- `subsetSize` - Not available in serial mode, determines how large each subset of + workloads executed is #### fsm_all.js @@ -443,16 +446,16 @@ use of the shell's built-in cluster test helpers like `ShardingTest` and `ReplSetTest`. clusterOptions are passed to cluster.js for initialization. clusterOptions include: -* `replication`: boolean, whether or not to use replication in the cluster -* `sameCollection`: boolean, whether or not all workloads are passed the same - collection -* `sameDB`: boolean, whether or not all workloads are passed the same DB -* `setupFunctions`: object, containing at most two functions under the keys - 'mongod' and 'mongos'. This allows you to run a function against all mongod or - mongos nodes in the cluster as part of the cluster initialization. 
Each - function takes a single argument, the db object against which configuration - can be run (will be set for each mongod/mongos) -* `sharded`: boolean, whether or not to use sharding in the cluster +- `replication`: boolean, whether or not to use replication in the cluster +- `sameCollection`: boolean, whether or not all workloads are passed the same + collection +- `sameDB`: boolean, whether or not all workloads are passed the same DB +- `setupFunctions`: object, containing at most two functions under the keys + 'mongod' and 'mongos'. This allows you to run a function against all mongod or + mongos nodes in the cluster as part of the cluster initialization. Each + function takes a single argument, the db object against which configuration + can be run (will be set for each mongod/mongos) +- `sharded`: boolean, whether or not to use sharding in the cluster Note that sameCollection and sameDB can increase contention for a resource, but will also decrease the strength of the assertions by ruling out the use of OwnDB @@ -460,12 +463,12 @@ and OwnColl assertions. ### Miscellaneous Execution Notes -* A `CountDownLatch` (exposed through the v8-based mongo shell, as of MongoDB 3.0) - is used as a synchronization primitive by the ThreadManager to wait until all - spawned threads have finished being spawned before starting workload - execution. -* If more than 20% of the threads fail while spawning, we abort the test. If - fewer than 20% of the threads fail while spawning we allow the non-failed - threads to continue with the test. The 20% threshold is somewhat arbitrary; - the goal is to abort if "mostly all" of the threads failed but to tolerate "a - few" threads failing. +- A `CountDownLatch` (exposed through the v8-based mongo shell, as of MongoDB 3.0) + is used as a synchronization primitive by the ThreadManager to wait until all + spawned threads have finished being spawned before starting workload + execution. +- If more than 20% of the threads fail while spawning, we abort the test. If + fewer than 20% of the threads fail while spawning we allow the non-failed + threads to continue with the test. The 20% threshold is somewhat arbitrary; + the goal is to abort if "mostly all" of the threads failed but to tolerate "a + few" threads failing. diff --git a/docs/testing/hang_analyzer.md b/docs/testing/hang_analyzer.md index 7a3e560e3e9..3fa2276d1d0 100644 --- a/docs/testing/hang_analyzer.md +++ b/docs/testing/hang_analyzer.md @@ -3,7 +3,7 @@ The hang analyzer is a tool to collect cores and other information from processes that are suspected to have hung. Any task which exceeds its timeout in Evergreen will automatically be hang-analyzed, with information being written compressed -and uploaded to S3. +and uploaded to S3. The hang analyzer can also be invoked locally at any time. For all non-Jepsen tasks, the invocation is `buildscripts/resmoke.py hang-analyzer -o file -o stdout -m exact -p python`. You may need to substitute `python` with the name of the python binary @@ -13,6 +13,7 @@ you are using, which may be one of `python`, `python3`, or on Windows: `Python`, For jepsen tasks, the invocation is `buildscripts/resmoke.py hang-analyzer -o file -o stdout -p dbtest,java,mongo,mongod,mongos,python,_test`. ## Interesting Processes + The hang analyzer detects and runs against processes which are considered interesting. @@ -21,33 +22,37 @@ of `dbtest,java,mongo,mongod,mongos,python,_test`. 
In all other scenarios, including local use of the hang-analyzer, an interesting process is any of: -* process that starts with `python` or `live-record` -* one which has been spawned as a child process of resmoke. + +- process that starts with `python` or `live-record` +- one which has been spawned as a child process of resmoke. The resmoke subcommand `hang-analyzer` will send SIGUSR1/use SetEvent to signal resmoke to: -* Print stack traces for all python threads -* Collect core dumps and other information for any non-python child -processes, see `Data Collection` below -* Re-signal any python child processes to do the same + +- Print stack traces for all python threads +- Collect core dumps and other information for any non-python child + processes, see `Data Collection` below +- Re-signal any python child processes to do the same ## Data Collection -Data collection occurs in the following sequence: -* Pause all non-python processes -* Grab debug symbols on non-Sanitizer builds -* Signal python Processes -* Dump cores of as many processes as possible, until the disk quota is exceeded. -The default quota is 90% of total volume space. -* Collect additional, non-core data. Ideally: - * Print C++ Stack traces - * Print MozJS Stack Traces - * Dump locks/mutexes info - * Dump Server Sessions - * Dump Recovery Units - * Dump Storage engine info -* Dump java processes (Jepsen tests) with jstack -* SIGABRT (Unix)/terminate (Windows) go processes +Data collection occurs in the following sequence: + +- Pause all non-python processes +- Grab debug symbols on non-Sanitizer builds +- Signal python Processes +- Dump cores of as many processes as possible, until the disk quota is exceeded. + The default quota is 90% of total volume space. + +- Collect additional, non-core data. Ideally: + - Print C++ Stack traces + - Print MozJS Stack Traces + - Dump locks/mutexes info + - Dump Server Sessions + - Dump Recovery Units + - Dump Storage engine info +- Dump java processes (Jepsen tests) with jstack +- SIGABRT (Unix)/terminate (Windows) go processes Note that the list of non-core data collected is only accurate on Linux. Other platforms only perform a subset of these operations. @@ -57,11 +62,12 @@ timeouts, and may not have enough time to collect all information before being terminated by the Evergreen agent. When running locally there is no timeout, and the hang analyzer may ironically hang indefinitely. - ### Implementations + Platform-specific concerns for data collection are handled by dumper objects in `buildscripts/resmokelib/hang_analyzer/dumper.py`. -* Linux: See `GDBDumper` -* MacOS: See `LLDBDumper` -* Windows: See `WindowsDumper` and `JstackWindowsDumper` -* Java (non-Windows): `JstackDumper` + +- Linux: See `GDBDumper` +- MacOS: See `LLDBDumper` +- Windows: See `WindowsDumper` and `JstackWindowsDumper` +- Java (non-Windows): `JstackDumper` diff --git a/docs/testing/otel_resmoke.md b/docs/testing/otel_resmoke.md index 30cfea940b8..4a9f19a90f4 100644 --- a/docs/testing/otel_resmoke.md +++ b/docs/testing/otel_resmoke.md @@ -1,8 +1,11 @@ # Open telemetry (OTel) in resmoke + OTel is one of two systems we use to capture metrics from resmoke. For mongo-tooling-metrics please see the documentation [here](README.md). ## What Do We Capture + Using OTel we capture the following things + 1. How long a resmoke suite takes to run (a collection of js tests) 2. How long each test in a suite takes to run (a single js test) 3. 
Duration of hooks before and after test/suite @@ -13,17 +16,22 @@ To see this visually navigate to the [resmoke dataset](https://ui.honeycomb.io/m ## A look at source code ### Configuration + The bulk of configuration is done in the `_set_up_tracing(...)` method in [configure_resmoke.py#L164](https://github.com/10gen/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/configure_resmoke.py#L164). This method includes documentation on how it works. ## BatchedBaggageSpanProcessor + See documentation [batched_baggage_span_processor.py#L8](https://github.com/mongodb/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/utils/batched_baggage_span_processor.py#L8) ## FileSpanExporter + See documentation [file_span_exporter.py#L16](https://github.com/10gen/mongo/blob/976ce50f6134789e73c639848b35f10040f0ff4a/buildscripts/resmokelib/utils/file_span_exporter.py#L16) ## Capturing Data + We mostly capture data by using a decorator on methods. Example taken from [job.py#L200](https://github.com/10gen/mongo/blob/6d36ac392086df85844870eef1d773f35020896c/buildscripts/resmokelib/testing/job.py#L200) + ``` TRACER = trace.get_tracer("resmoke") @@ -32,7 +40,9 @@ def func_name(...): span = trace.get_current_span() span.set_attribute("attr1", True) ``` + This system is nice because the decorator captures exceptions and other failures and a user can never forget to close a span. On occasion we will also start a span using the `with` clause in python. However, the decorator method is preferred since the method below makes more of a readability impact on the code. This example is taken from [job.py#L215](https://github.com/10gen/mongo/blob/6d36ac392086df85844870eef1d773f35020896c/buildscripts/resmokelib/testing/job.py#L215) + ``` with TRACER.start_as_current_span("func_name", attributes={}): func_name(...) @@ -40,4 +50,5 @@ with TRACER.start_as_current_span("func_name", attributes={}): ``` ## Insights We Have Made (so far) + Using [this dashboard](https://ui.honeycomb.io/mongodb-4b/environments/production/board/3bATQLb38bh/Server-CI) and [this query](https://ui.honeycomb.io/mongodb-4b/environments/production/datasets/resmoke/result/GFa2YJ6d4vU/a/7EYuMJtH8KX/Slowest-Resmoke-Tests) we can see the most expensive single js tests. We plan to make tickets for teams to fix these long running tests for cloud savings as well as developer time savings. diff --git a/docs/thread_pools.md b/docs/thread_pools.md index 75ae14797aa..c17681f9278 100644 --- a/docs/thread_pools.md +++ b/docs/thread_pools.md @@ -9,6 +9,7 @@ burden of starting and destroying a dedicated thead. 
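To illustrate the usage pattern this document describes, before the class-by-class details below, here is a minimal hedged sketch that submits work to a reusable pool rather than spawning a dedicated thread per task. It assumes the concrete `ThreadPool` from `src/mongo/util/concurrency/thread_pool.h`; the option field names, the `schedule()` callback signature, and the `doSomeWork()` helper are assumptions for illustration and should be verified against that header.

```cpp
// Minimal sketch: run small tasks on pooled worker threads instead of
// creating and destroying a dedicated thread for each task.
// Assumes mongo::ThreadPool from util/concurrency/thread_pool.h; verify
// option fields and schedule() against the header before relying on this.
#include "mongo/util/concurrency/thread_pool.h"

namespace {

void doSomeWork(int i);  // hypothetical helper, defined elsewhere

void runBatchOnPool() {
    mongo::ThreadPool::Options options;
    options.poolName = "exampleSketchPool";  // assumed option field
    options.maxThreads = 4;                  // assumed option field

    mongo::ThreadPool pool(options);
    pool.startup();

    for (int i = 0; i < 16; ++i) {
        // schedule() comes from the OutOfLineExecutor interface; the task is
        // handed a Status indicating whether the pool is still accepting work.
        pool.schedule([i](mongo::Status status) {
            if (!status.isOK()) {
                return;  // pool is shutting down; skip the work
            }
            doSomeWork(i);
        });
    }

    pool.shutdown();  // stop accepting new work
    pool.join();      // wait for already-queued tasks to drain
}

}  // namespace
```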
## Classes ### `ThreadPoolInterface` + The [`ThreadPoolInterface`][thread_pool_interface.h] abstract interface is an extension of the `OutOfLineExecutor` (see [the executors architecture guide][executors]) abstract interface, adding `startup`, `shutdown`, and @@ -58,4 +59,3 @@ resources it simulates a thread pool well enough to be used by a [network_interface_thread_pool.h]: ../src/mongo/executor/network_interface_thread_pool.h [network_interface.h]: ../src/mongo/executor/network_interface.h [thread_pool_mock.h]: ../src/mongo/executor/thread_pool_mock.h - diff --git a/docs/vpat.md b/docs/vpat.md index 06f93b7e40c..c7917417406 100644 --- a/docs/vpat.md +++ b/docs/vpat.md @@ -4,133 +4,133 @@ Contact for more Information: https://www.mongodb.com/contact ## Summary Table - -|Criteria|Supporting Features|Remarks and explanations| -|---|---|---| -|Section 1194.21 Software Applications and Operating Systems|Product has been coded to meet this standard subject to the remarks on the right.|| -|Section 1194.22 Web-based Internet Information and Applications|Product has been coded to meet this standard subject to the remarks on the right.|| -|Section 1194.23 Telecommunications Products|Not Applicable|| -|Section 1194.24 Video and Multi-media Products|Not Applicable|| -|Section 1194.25 Self-Contained, Closed Products|Not Applicable|| -|Section 1194.26 Desktop and Portable Computers|Not Applicable|| -|Section 1194.31 Functional Performance Criteria|Product has been coded to meet this standard subject to the remarks on the right.|| -|Section 1194.41 Information, Documentation and Support|Product has been coded to meet this standard subject to the remarks on the right.|| +| Criteria | Supporting Features | Remarks and explanations | +| --------------------------------------------------------------- | --------------------------------------------------------------------------------- | ------------------------ | +| Section 1194.21 Software Applications and Operating Systems | Product has been coded to meet this standard subject to the remarks on the right. | | +| Section 1194.22 Web-based Internet Information and Applications | Product has been coded to meet this standard subject to the remarks on the right. | | +| Section 1194.23 Telecommunications Products | Not Applicable | | +| Section 1194.24 Video and Multi-media Products | Not Applicable | | +| Section 1194.25 Self-Contained, Closed Products | Not Applicable | | +| Section 1194.26 Desktop and Portable Computers | Not Applicable | | +| Section 1194.31 Functional Performance Criteria | Product has been coded to meet this standard subject to the remarks on the right. | | +| Section 1194.41 Information, Documentation and Support | Product has been coded to meet this standard subject to the remarks on the right. | | ## Section 1194.21 Software Applications and Operating Systems – Detail - -|Criteria |Supporting Features|Remarks and explanations| -|---|---|---| -|(a) When software is designed to run on a system that has a keyboard, product functions shall be executable from a keyboard where the function itself or the result of performing a function can be discerned textually.|Product has been coded to meet this standard subject to the remarks on the right.|All functions can be executed from the keyboard.| -|(b) Applications shall not disrupt or disable activated features of other products that are identified as accessibility features, where those features are developed and documented according to industry standards. 
Applications also shall not disrupt or disable activated features of any operating system that are identified as accessibility features where the application programming interface for those accessibility features has been documented by the manufacturer of the operating system and is available to the product developer.|Product has been coded to meet this standard subject to the remarks on the right.|Does not interfere with Mouse Keys, Sticky Keys, Filter Keys or Toggle Keys.| -|(c\) A well-defined on-screen indication of the current focus shall be provided that moves among interactive interface elements as the input focus changes. The focus shall be programmatically exposed so that Assistive Technology can track focus and focus changes.|Product has been coded to meet this standard subject to the remarks on the right.|Focus is programmatically exposed.| -|(d) Sufficient information about a user interface element including the identity, operation and state of the element shall be available to Assistive Technology. When an image represents a program element, the information conveyed by the image must also be available in text.|Product has been coded to meet this standard subject to the remarks on the right.|Information about each UI element is programmatically exposed.| -|(e) When bitmap images are used to identify controls, status indicators, or other programmatic elements, the meaning assigned to those images shall be consistent throughout an application's performance.|Product has been coded to meet this standard subject to the remarks on the right.|Does not use bitmap images.| -|(f) Textual information shall be provided through operating system functions for displaying text. The minimum information that shall be made available is text content, text input caret location, and text attributes.|Product has been coded to meet this standard subject to the remarks on the right.|Information about each UI element is programmatically exposed.| -|(g) Applications shall not override user selected contrast and color selections and other individual display attributes.|Product has been coded to meet this standard subject to the remarks on the right.|Windows or other OS-level color settings are not over-ruled by product.| -|(h) When animation is displayed, the information shall be displayable in at least one non-animated presentation mode at the option of the user.|Product has been coded to meet this standard subject to the remarks on the right.|Does not use animation in UI.| -|(i) Color coding shall not be used as the only means of conveying information, indicating an action, prompting a response, or distinguishing a visual element.|Product has been coded to meet this standard subject to the remarks on the right.|Color coding is not used.| -|(j) When a product permits a user to adjust color and contrast settings, a variety of color selections capable of producing a range of contrast levels shall be provided.|Product has been coded to meet this standard subject to the remarks on the right.|Does not permit use to adjust color and contrast settings. | -|(k) Software shall not use flashing or blinking text, objects, or other elements having a flash or blink frequency greater than 2 Hz and lower than 55 Hz.|Product has been coded to meet this standard subject to the remarks on the right. 
|There is no instance of blinking or flashing objects that are within the danger range of 2hz to 55hz.| -|(l) When electronic forms are used, the form shall allow people using Assistive Technology to access the information, field elements, and functionality required for completion and submission of the form, including all directions and cues.|Product has been coded to meet this standard subject to the remarks on the right.|All functions can be executed from the keyboard.| + +| Criteria | Supporting Features | Remarks and explanations | +| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------- | +| (a) When software is designed to run on a system that has a keyboard, product functions shall be executable from a keyboard where the function itself or the result of performing a function can be discerned textually. | Product has been coded to meet this standard subject to the remarks on the right. | All functions can be executed from the keyboard. | +| (b) Applications shall not disrupt or disable activated features of other products that are identified as accessibility features, where those features are developed and documented according to industry standards. Applications also shall not disrupt or disable activated features of any operating system that are identified as accessibility features where the application programming interface for those accessibility features has been documented by the manufacturer of the operating system and is available to the product developer. | Product has been coded to meet this standard subject to the remarks on the right. | Does not interfere with Mouse Keys, Sticky Keys, Filter Keys or Toggle Keys. | +| (c\) A well-defined on-screen indication of the current focus shall be provided that moves among interactive interface elements as the input focus changes. The focus shall be programmatically exposed so that Assistive Technology can track focus and focus changes. | Product has been coded to meet this standard subject to the remarks on the right. | Focus is programmatically exposed. | +| (d) Sufficient information about a user interface element including the identity, operation and state of the element shall be available to Assistive Technology. When an image represents a program element, the information conveyed by the image must also be available in text. | Product has been coded to meet this standard subject to the remarks on the right. | Information about each UI element is programmatically exposed. | +| (e) When bitmap images are used to identify controls, status indicators, or other programmatic elements, the meaning assigned to those images shall be consistent throughout an application's performance. | Product has been coded to meet this standard subject to the remarks on the right. | Does not use bitmap images. 
| +| (f) Textual information shall be provided through operating system functions for displaying text. The minimum information that shall be made available is text content, text input caret location, and text attributes. | Product has been coded to meet this standard subject to the remarks on the right. | Information about each UI element is programmatically exposed. | +| (g) Applications shall not override user selected contrast and color selections and other individual display attributes. | Product has been coded to meet this standard subject to the remarks on the right. | Windows or other OS-level color settings are not over-ruled by product. | +| (h) When animation is displayed, the information shall be displayable in at least one non-animated presentation mode at the option of the user. | Product has been coded to meet this standard subject to the remarks on the right. | Does not use animation in UI. | +| (i) Color coding shall not be used as the only means of conveying information, indicating an action, prompting a response, or distinguishing a visual element. | Product has been coded to meet this standard subject to the remarks on the right. | Color coding is not used. | +| (j) When a product permits a user to adjust color and contrast settings, a variety of color selections capable of producing a range of contrast levels shall be provided. | Product has been coded to meet this standard subject to the remarks on the right. | Does not permit use to adjust color and contrast settings.  | +| (k) Software shall not use flashing or blinking text, objects, or other elements having a flash or blink frequency greater than 2 Hz and lower than 55 Hz. | Product has been coded to meet this standard subject to the remarks on the right. | There is no instance of blinking or flashing objects that are within the danger range of 2hz to 55hz. | +| (l) When electronic forms are used, the form shall allow people using Assistive Technology to access the information, field elements, and functionality required for completion and submission of the form, including all directions and cues. | Product has been coded to meet this standard subject to the remarks on the right. | All functions can be executed from the keyboard. | ## Section 1194.22 Web-based Internet information and applications – Detail - -|Criteria |Supporting Features|Remarks and explanations| -|---|---|---| -|(a) A text equivalent for every non-text element shall be provided (e.g., via "alt", "longdesc", or in element content).|Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. |Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/.| -|(b) Equivalent alternatives for any multimedia presentation shall be synchronized with the presentation.|Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. |Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/.| -|(c\) Web pages shall be designed so that all information conveyed with color is also available without color, for example from context or markup.|Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. |Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. 
| -|(d) Documents shall be organized so they are readable without requiring an associated style sheet.|Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. |Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | -|(e) Redundant text links shall be provided for each active region of a server-side image map.|Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. |Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | -|(f) Client-side image maps shall be provided instead of server-side image maps except where the regions cannot be defined with an available geometric shape.|Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. |Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | -|(g) Row and column headers shall be identified for data tables.|Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. |Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | -|(h) Markup shall be used to associate data cells and header cells for data tables that have two or more logical levels of row or column headers.|Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. |Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | -|(i) Frames shall be titled with text that facilitates frame identification and navigation|Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. |Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | -|(j) Pages shall be designed to avoid causing the screen to flicker with a frequency greater than 2 Hz and lower than 55 Hz.|Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. |Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | -|(k) A text-only page, with equivalent information or functionality, shall be provided to make a web site comply with the provisions of this part, when compliance cannot be accomplished in any other way. The content of the text-only page shall be updated whenever the primary page changes.|Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. |Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/.| -|(l) When pages utilize scripting languages to display content, or to create interface elements, the information provided by the script shall be identified with functional text that can be read by Assistive Technology.|Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. |Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | -|(m) When a web page requires that an applet, plug-in or other application be present on the client system to interpret page content, the page must provide a link to a plug-in or applet that complies with §1194.21(a) through (l). |Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. |Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. 
| -|(n) When electronic forms are designed to be completed on-line, the form shall allow people using Assistive Technology to access the information, field elements, and functionality required for completion and submission of the form, including all directions and cues. |Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. |Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | -|(o) A method shall be provided that permits users to skip repetitive navigation links. |Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. |Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | -|(p\) When a timed response is required, the user shall be alerted and given sufficient time to indicate more time is required. |Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. |Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | + +| Criteria | Supporting Features | Remarks and explanations | +| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- | +| (a) A text equivalent for every non-text element shall be provided (e.g., via "alt", "longdesc", or in element content). | Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. | Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | +| (b) Equivalent alternatives for any multimedia presentation shall be synchronized with the presentation. | Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. | Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | +| (c\) Web pages shall be designed so that all information conveyed with color is also available without color, for example from context or markup. | Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. | Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | +| (d) Documents shall be organized so they are readable without requiring an associated style sheet. | Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. | Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | +| (e) Redundant text links shall be provided for each active region of a server-side image map. | Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. | Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | +| (f) Client-side image maps shall be provided instead of server-side image maps except where the regions cannot be defined with an available geometric shape. | Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. 
| Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | +| (g) Row and column headers shall be identified for data tables. | Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. | Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | +| (h) Markup shall be used to associate data cells and header cells for data tables that have two or more logical levels of row or column headers. | Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. | Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | +| (i) Frames shall be titled with text that facilitates frame identification and navigation | Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. | Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | +| (j) Pages shall be designed to avoid causing the screen to flicker with a frequency greater than 2 Hz and lower than 55 Hz. | Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. | Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | +| (k) A text-only page, with equivalent information or functionality, shall be provided to make a web site comply with the provisions of this part, when compliance cannot be accomplished in any other way. The content of the text-only page shall be updated whenever the primary page changes. | Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. | Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | +| (l) When pages utilize scripting languages to display content, or to create interface elements, the information provided by the script shall be identified with functional text that can be read by Assistive Technology. | Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. | Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | +| (m) When a web page requires that an applet, plug-in or other application be present on the client system to interpret page content, the page must provide a link to a plug-in or applet that complies with §1194.21(a) through (l). | Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. | Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | +| (n) When electronic forms are designed to be completed on-line, the form shall allow people using Assistive Technology to access the information, field elements, and functionality required for completion and submission of the form, including all directions and cues. | Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. | Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | +| (o) A method shall be provided that permits users to skip repetitive navigation links. | Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. | Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. 
| +| (p\) When a timed response is required, the user shall be alerted and given sufficient time to indicate more time is required. | Core MongoDB documentation has been coded to meet this standard subject to the remarks on the right. | Our documentation complies to this criteria and can be found at https://docs.mongodb.com/manual/. | ### Note to 1194.22 -The Board interprets paragraphs (a) through (k) of this section as consistent with the following -priority 1 Checkpoints of the Web Content Accessibility Guidelines 1.0 (WCAG 1.0) (May 5 1999) published by the Web -Accessibility Initiative of the World Wide Web Consortium: Paragraph (a) - 1.1, (b) - 1.4, (c\) - 2.1, (d) - 6.1, + +The Board interprets paragraphs (a) through (k) of this section as consistent with the following +priority 1 Checkpoints of the Web Content Accessibility Guidelines 1.0 (WCAG 1.0) (May 5 1999) published by the Web +Accessibility Initiative of the World Wide Web Consortium: Paragraph (a) - 1.1, (b) - 1.4, (c\) - 2.1, (d) - 6.1, (e) - 1.2, (f) - 9.1, (g) - 5.1, (h) - 5.2, (i) - 12.1, (j) - 7.1, (k) - 11.4. ## Section 1194.23 Telecommunications Products – Detail -|Criteria |Supporting Features|Remarks and explanations| -|---|---|---| -|(a) Telecommunications products or systems which provide a function allowing voice communication and which do not themselves provide a TTY functionality shall provide a standard non-acoustic connection point for TTYs. Microphones shall be capable of being turned on and off to allow the user to intermix speech with TTY use.|Not Applicable|| -|(b) Telecommunications products which include voice communication functionality shall support all commonly used cross-manufacturer non-proprietary standard TTY signal protocols.|Not Applicable|| -|(c\) Voice mail, auto-attendant, and interactive voice response telecommunications systems shall be usable by TTY users with their TTYs.|Not Applicable|| -|(d) Voice mail, messaging, auto-attendant, and interactive voice response telecommunications systems that require a response from a user within a time interval, shall give an alert when the time interval is about to run out, and shall provide sufficient time for the user to indicate more time is required.|Not Applicable|| -|(e) Where provided, caller identification and similar telecommunications functions shall also be available for users of TTYs, and for users who cannot see displays.|Not Applicable|| -|(f) For transmitted voice signals, telecommunications products shall provide a gain adjustable up to a minimum of 20 dB. 
For incremental volume control, at least one intermediate step of 12 dB of gain shall be provided.|Not Applicable|| -|(g) If the telecommunications product allows a user to adjust the receive volume, a function shall be provided to automatically reset the volume to the default level after every use.|Not Applicable|| -|(h) Where a telecommunications product delivers output by an audio transducer which is normally held up to the ear, a means for effective magnetic wireless coupling to hearing technologies shall be provided.|Not Applicable|| -|(i) Interference to hearing technologies (including hearing aids, cochlear implants, and assistive listening devices) shall be reduced to the lowest possible level that allows a user of hearing technologies to utilize the telecommunications product.|Not Applicable|| -|(j) Products that transmit or conduct information or communication, shall pass through cross-manufacturer, non-proprietary, industry-standard codes, translation protocols, formats or other information necessary to provide the information or communication in a usable format. Technologies which use encoding, signal compression, format transformation, or similar techniques shall not remove information needed for access or shall restore it upon delivery.|Not Applicable|| -|(k)(1) Products which have mechanically operated controls or keys shall comply with the following: Controls and Keys shall be tactilely discernible without activating the controls or keys.|Not Applicable|| -|(k)(2) Products which have mechanically operated controls or keys shall comply with the following: Controls and Keys shall be operable with one hand and shall not require tight grasping, pinching, twisting of the wrist. The force required to activate controls and keys shall be 5 lbs. (22.2N) maximum.|Not Applicable|| -|(k)(3) Products which have mechanically operated controls or keys shall comply with the following: If key repeat is supported, the delay before repeat shall be adjustable to at least 2 seconds. Key repeat rate shall be adjustable to 2 seconds per character.|Not Applicable|| -|(k)(4) Products which have mechanically operated controls or keys shall comply with the following: The status of all locking or toggle controls or keys shall be visually discernible, and discernible either through touch or sound.|Not Applicable|| +| Criteria | Supporting Features | Remarks and explanations | +| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------- | ------------------------ | +| (a) Telecommunications products or systems which provide a function allowing voice communication and which do not themselves provide a TTY functionality shall provide a standard non-acoustic connection point for TTYs. Microphones shall be capable of being turned on and off to allow the user to intermix speech with TTY use. | Not Applicable | | +| (b) Telecommunications products which include voice communication functionality shall support all commonly used cross-manufacturer non-proprietary standard TTY signal protocols. 
| Not Applicable | | +| (c\) Voice mail, auto-attendant, and interactive voice response telecommunications systems shall be usable by TTY users with their TTYs. | Not Applicable | | +| (d) Voice mail, messaging, auto-attendant, and interactive voice response telecommunications systems that require a response from a user within a time interval, shall give an alert when the time interval is about to run out, and shall provide sufficient time for the user to indicate more time is required. | Not Applicable | | +| (e) Where provided, caller identification and similar telecommunications functions shall also be available for users of TTYs, and for users who cannot see displays. | Not Applicable | | +| (f) For transmitted voice signals, telecommunications products shall provide a gain adjustable up to a minimum of 20 dB. For incremental volume control, at least one intermediate step of 12 dB of gain shall be provided. | Not Applicable | | +| (g) If the telecommunications product allows a user to adjust the receive volume, a function shall be provided to automatically reset the volume to the default level after every use. | Not Applicable | | +| (h) Where a telecommunications product delivers output by an audio transducer which is normally held up to the ear, a means for effective magnetic wireless coupling to hearing technologies shall be provided. | Not Applicable | | +| (i) Interference to hearing technologies (including hearing aids, cochlear implants, and assistive listening devices) shall be reduced to the lowest possible level that allows a user of hearing technologies to utilize the telecommunications product. | Not Applicable | | +| (j) Products that transmit or conduct information or communication, shall pass through cross-manufacturer, non-proprietary, industry-standard codes, translation protocols, formats or other information necessary to provide the information or communication in a usable format. Technologies which use encoding, signal compression, format transformation, or similar techniques shall not remove information needed for access or shall restore it upon delivery. | Not Applicable | | +| (k)(1) Products which have mechanically operated controls or keys shall comply with the following: Controls and Keys shall be tactilely discernible without activating the controls or keys. | Not Applicable | | +| (k)(2) Products which have mechanically operated controls or keys shall comply with the following: Controls and Keys shall be operable with one hand and shall not require tight grasping, pinching, twisting of the wrist. The force required to activate controls and keys shall be 5 lbs. (22.2N) maximum. | Not Applicable | | +| (k)(3) Products which have mechanically operated controls or keys shall comply with the following: If key repeat is supported, the delay before repeat shall be adjustable to at least 2 seconds. Key repeat rate shall be adjustable to 2 seconds per character. | Not Applicable | | +| (k)(4) Products which have mechanically operated controls or keys shall comply with the following: The status of all locking or toggle controls or keys shall be visually discernible, and discernible either through touch or sound. 
| Not Applicable | | ## Section 1194.24 Video and Multi-media Products – Detail -|Criteria|Supporting Features|Remarks and explanations| -|---|---|---| -|(a) All analog television displays 13 inches and larger, and computer equipment that includes analog television receiver or display circuitry, shall be equipped with caption decoder circuitry which appropriately receives, decodes, and displays closed captions from broadcast, cable, videotape, and DVD signals. As soon as practicable, but not later than July 1, 2002, widescreen digital television (DTV) displays measuring at least 7.8 inches vertically, DTV sets with conventional displays measuring at least 13 inches vertically, and stand-alone DTV tuners, whether or not they are marketed with display screens, and computer equipment that includes DTV receiver or display circuitry, shall be equipped with caption decoder circuitry which appropriately receives, decodes, and displays closed captions from broadcast, cable, videotape, and DVD signals.|Not Applicable|| -|(b) Television tuners, including tuner cards for use in computers, shall be equipped with secondary audio program playback circuitry.|Not Applicable|| -|(c\) All training and informational video and multimedia productions which support the agency's mission, regardless of format, that contain speech or other audio information necessary for the comprehension of the content, shall be open or closed captioned.|Not Applicable|| -|(d) All training and informational video and multimedia productions which support the agency's mission, regardless of format, that contain visual information necessary for the comprehension of the content, shall be audio described.|Not Applicable|| -|(e) Display or presentation of alternate text presentation or audio descriptions shall be user-selectable unless permanent.|Not Applicable|| +| Criteria | Supporting Features | Remarks and explanations | +| -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------- | ------------------------ | +| (a) All analog television displays 13 inches and larger, and computer equipment that includes analog television receiver or display circuitry, shall be equipped with caption decoder circuitry which appropriately receives, decodes, and displays closed captions from broadcast, cable, videotape, and DVD signals. 
As soon as practicable, but not later than July 1, 2002, widescreen digital television (DTV) displays measuring at least 7.8 inches vertically, DTV sets with conventional displays measuring at least 13 inches vertically, and stand-alone DTV tuners, whether or not they are marketed with display screens, and computer equipment that includes DTV receiver or display circuitry, shall be equipped with caption decoder circuitry which appropriately receives, decodes, and displays closed captions from broadcast, cable, videotape, and DVD signals. | Not Applicable | | +| (b) Television tuners, including tuner cards for use in computers, shall be equipped with secondary audio program playback circuitry. | Not Applicable | | +| (c\) All training and informational video and multimedia productions which support the agency's mission, regardless of format, that contain speech or other audio information necessary for the comprehension of the content, shall be open or closed captioned. | Not Applicable | | +| (d) All training and informational video and multimedia productions which support the agency's mission, regardless of format, that contain visual information necessary for the comprehension of the content, shall be audio described. | Not Applicable | | +| (e) Display or presentation of alternate text presentation or audio descriptions shall be user-selectable unless permanent. | Not Applicable | | ## Section 1194.25 Self-Contained, Closed Products – Detail - -|Criteria |Supporting Features|Remarks and explanations| -|---|---|---| -|(a) Self contained products shall be usable by people with disabilities without requiring an end-user to attach Assistive Technology to the product. Personal headsets for private listening are not Assistive Technology.|Not Applicable|| -|(b) When a timed response is required, the user shall be alerted and given sufficient time to indicate more time is required.|Not Applicable|| -|(c\) Where a product utilizes touchscreens or contact-sensitive controls, an input method shall be provided that complies with §1194.23 (k) (1) through (4).|Not Applicable|| -|(d) When biometric forms of user identification or control are used, an alternative form of identification or activation, which does not require the user to possess particular biological characteristics, shall also be provided.|Not Applicable|| -|(e) When products provide auditory output, the audio signal shall be provided at a standard signal level through an industry standard connector that will allow for private listening. The product must provide the ability to interrupt, pause, and restart the audio at anytime.|Not Applicable|| -|(f) When products deliver voice output in a public area, incremental volume control shall be provided with output amplification up to a level of at least 65 dB. Where the ambient noise level of the environment is above 45 dB, a volume gain of at least 20 dB above the ambient level shall be user selectable. 
A function shall be provided to automatically reset the volume to the default level after every use.|Not Applicable|| -|(g) Color coding shall not be used as the only means of conveying information, indicating an action, prompting a response, or distinguishing a visual element.|Not Applicable|| -|(h) When a product permits a user to adjust color and contrast settings, a range of color selections capable of producing a variety of contrast levels shall be provided.|Not Applicable|| -|(i) Products shall be designed to avoid causing the screen to flicker with a frequency greater than 2 Hz and lower than 55 Hz.|Not Applicable|| -|(j)(1) Products which are freestanding, non-portable, and intended to be used in one location and which have operable controls shall comply with the following: The position of any operable control shall be determined with respect to a vertical plane, which is 48 inches in length, centered on the operable control, and at the maximum protrusion of the product within the 48 inch length on products which are freestanding, non-portable, and intended to be used in one location and which have operable controls.|Not Applicable|| -|(j)(2) Products which are freestanding, non-portable, and intended to be used in one location and which have operable controls shall comply with the following: Where any operable control is 10 inches or less behind the reference plane, the height shall be 54 inches maximum and 15 inches minimum above the floor.|Not Applicable|| -|(j)(3) Products which are freestanding, non-portable, and intended to be used in one location and which have operable controls shall comply with the following: Where any operable control is more than 10 inches and not more than 24 inches behind the reference plane, the height shall be 46 inches maximum and 15 inches minimum above the floor.|Not Applicable|| -|(j)(4) Products which are freestanding, non-portable, and intended to be used in one location and which have operable controls shall comply with the following: Operable controls shall not be more than 24 inches behind the reference plane.|Not Applicable|| + +| Criteria | Supporting Features | Remarks and explanations | +| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------- | ------------------------ | +| (a) Self contained products shall be usable by people with disabilities without requiring an end-user to attach Assistive Technology to the product. Personal headsets for private listening are not Assistive Technology. | Not Applicable | | +| (b) When a timed response is required, the user shall be alerted and given sufficient time to indicate more time is required. | Not Applicable | | +| (c\) Where a product utilizes touchscreens or contact-sensitive controls, an input method shall be provided that complies with §1194.23 (k) (1) through (4). | Not Applicable | | +| (d) When biometric forms of user identification or control are used, an alternative form of identification or activation, which does not require the user to possess particular biological characteristics, shall also be provided. 
| Not Applicable | | +| (e) When products provide auditory output, the audio signal shall be provided at a standard signal level through an industry standard connector that will allow for private listening. The product must provide the ability to interrupt, pause, and restart the audio at anytime. | Not Applicable | | +| (f) When products deliver voice output in a public area, incremental volume control shall be provided with output amplification up to a level of at least 65 dB. Where the ambient noise level of the environment is above 45 dB, a volume gain of at least 20 dB above the ambient level shall be user selectable. A function shall be provided to automatically reset the volume to the default level after every use. | Not Applicable | | +| (g) Color coding shall not be used as the only means of conveying information, indicating an action, prompting a response, or distinguishing a visual element. | Not Applicable | | +| (h) When a product permits a user to adjust color and contrast settings, a range of color selections capable of producing a variety of contrast levels shall be provided. | Not Applicable | | +| (i) Products shall be designed to avoid causing the screen to flicker with a frequency greater than 2 Hz and lower than 55 Hz. | Not Applicable | | +| (j)(1) Products which are freestanding, non-portable, and intended to be used in one location and which have operable controls shall comply with the following: The position of any operable control shall be determined with respect to a vertical plane, which is 48 inches in length, centered on the operable control, and at the maximum protrusion of the product within the 48 inch length on products which are freestanding, non-portable, and intended to be used in one location and which have operable controls. | Not Applicable | | +| (j)(2) Products which are freestanding, non-portable, and intended to be used in one location and which have operable controls shall comply with the following: Where any operable control is 10 inches or less behind the reference plane, the height shall be 54 inches maximum and 15 inches minimum above the floor. | Not Applicable | | +| (j)(3) Products which are freestanding, non-portable, and intended to be used in one location and which have operable controls shall comply with the following: Where any operable control is more than 10 inches and not more than 24 inches behind the reference plane, the height shall be 46 inches maximum and 15 inches minimum above the floor. | Not Applicable | | +| (j)(4) Products which are freestanding, non-portable, and intended to be used in one location and which have operable controls shall comply with the following: Operable controls shall not be more than 24 inches behind the reference plane. 
| Not Applicable | | ## Section 1194.26 Desktop and Portable Computers – Detail - -|Criteria|Supporting Features|Remarks and explanations| -|---|---|---| -|(a) All mechanically operated controls and keys shall comply with §1194.23 (k) (1) through (4).|Not Applicable|| -|(b) If a product utilizes touchscreens or touch-operated controls, an input method shall be provided that complies with §1194.23 (k) (1) through (4).|Not Applicable|| -|(c\) When biometric forms of user identification or control are used, an alternative form of identification or activation, which does not require the user to possess particular biological characteristics, shall also be provided.|Not Applicable|| -|(d) Where provided, at least one of each type of expansion slots, ports and connectors shall comply with publicly available industry standards|Not Applicable|| + +| Criteria | Supporting Features | Remarks and explanations | +| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------- | ------------------------ | +| (a) All mechanically operated controls and keys shall comply with §1194.23 (k) (1) through (4). | Not Applicable | | +| (b) If a product utilizes touchscreens or touch-operated controls, an input method shall be provided that complies with §1194.23 (k) (1) through (4). | Not Applicable | | +| (c\) When biometric forms of user identification or control are used, an alternative form of identification or activation, which does not require the user to possess particular biological characteristics, shall also be provided. | Not Applicable | | +| (d) Where provided, at least one of each type of expansion slots, ports and connectors shall comply with publicly available industry standards | Not Applicable | | ## Section 1194.31 Functional Performance Criteria – Detail - -|Criteria|Supporting Features|Remarks and explanations| -|---|---|---| -|(a) At least one mode of operation and information retrieval that does not require user vision shall be provided, or support for Assistive Technology used by people who are blind or visually impaired shall be provided.|Applicable|All user operation of the product are compatible with Assistive Technology.| -|(b) At least one mode of operation and information retrieval that does not require visual acuity greater than 20/70 shall be provided in audio and enlarged print output working together or independently, or support for Assistive Technology used by people who are visually impaired shall be provided.|Applicable|All user operation of the product are compatible with Assistive Technology.| -|(c\) At least one mode of operation and information retrieval that does not require user hearing shall be provided, or support for Assistive Technology used by people who are deaf or hard of hearing shall be provided|Not Applicable|There is no reliance on user hearing.| -|(d) Where audio information is important for the use of a product, at least one mode of operation and information retrieval shall be provided in an enhanced auditory fashion, or support for assistive hearing devices shall be provided.|Not Applicable|There is no reliance on sound.| -|(e) At least one mode of operation and information retrieval that does not require user speech shall be provided, or support for Assistive Technology used by people with disabilities shall be provided.|Not Applicable|There is no reliance on user 
speech.| -|(f) At least one mode of operation and information retrieval that does not require fine motor control or simultaneous actions and that is operable with limited reach and strength shall be provided.|Not Applicable|There is no reliance on fine motor control.| + +| Criteria | Supporting Features | Remarks and explanations | +| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------- | --------------------------------------------------------------------------- | +| (a) At least one mode of operation and information retrieval that does not require user vision shall be provided, or support for Assistive Technology used by people who are blind or visually impaired shall be provided. | Applicable | All user operation of the product are compatible with Assistive Technology. | +| (b) At least one mode of operation and information retrieval that does not require visual acuity greater than 20/70 shall be provided in audio and enlarged print output working together or independently, or support for Assistive Technology used by people who are visually impaired shall be provided. | Applicable | All user operation of the product are compatible with Assistive Technology. | +| (c\) At least one mode of operation and information retrieval that does not require user hearing shall be provided, or support for Assistive Technology used by people who are deaf or hard of hearing shall be provided | Not Applicable | There is no reliance on user hearing. | +| (d) Where audio information is important for the use of a product, at least one mode of operation and information retrieval shall be provided in an enhanced auditory fashion, or support for assistive hearing devices shall be provided. | Not Applicable | There is no reliance on sound. | +| (e) At least one mode of operation and information retrieval that does not require user speech shall be provided, or support for Assistive Technology used by people with disabilities shall be provided. | Not Applicable | There is no reliance on user speech. | +| (f) At least one mode of operation and information retrieval that does not require fine motor control or simultaneous actions and that is operable with limited reach and strength shall be provided. | Not Applicable | There is no reliance on fine motor control. | ## Section 1194.41 Information, Documentation and Support – Detail - -|Criteria|Supporting Features|Remarks and explanations| -|---|---|---| -|(a) Product support documentation provided to end-users shall be made available in alternate formats upon request, at no additional charge| Support documentation for this product is available in accessible electronic format or print format.| MongoDB documentation is available online: http://docs.mongodb.org/manual/. Alternately, users may, from the same link, access the documentation as single page HTML, epub or PDF format. There is no additional charge for these alternate formats.| -|(b) End-users shall have access to a description of the accessibility and compatibility features of products in alternate formats or alternate methods upon request, at no additional charge.|Description is available in accessible electronic format online. 
|Information regarding accessibility and compatibility features are available online: https://www.mongodb.com/accessibility/vpat Links to alternative formats may also be contained online, as applicable. There is no additional charge for these alternate formats.| -|(c\) Support services for products shall accommodate the communication needs of end-users with disabilities.|MongoDB, Inc.'s support services via web support https://support.mongodb.com/ . | MongoDB, Inc. customers primarily use SalesForce Service Cloud for communications with support. SalesForce Service Cloud delivers content via a web interface that is accessible to existing screen readers. SalesForce Service Cloud has a VPAT located at http://www.sfdcstatic.com/assets/pdf/misc/VPAT_servicecloud_summer2013.pdf .| + +| Criteria | Supporting Features | Remarks and explanations | +| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| (a) Product support documentation provided to end-users shall be made available in alternate formats upon request, at no additional charge | Support documentation for this product is available in accessible electronic format or print format. | MongoDB documentation is available online: http://docs.mongodb.org/manual/. Alternately, users may, from the same link, access the documentation as single page HTML, epub or PDF format. There is no additional charge for these alternate formats. | +| (b) End-users shall have access to a description of the accessibility and compatibility features of products in alternate formats or alternate methods upon request, at no additional charge. | Description is available in accessible electronic format online. | Information regarding accessibility and compatibility features are available online: https://www.mongodb.com/accessibility/vpat Links to alternative formats may also be contained online, as applicable. There is no additional charge for these alternate formats. | +| (c\) Support services for products shall accommodate the communication needs of end-users with disabilities. | MongoDB, Inc.'s support services via web support https://support.mongodb.com/ . | MongoDB, Inc. customers primarily use SalesForce Service Cloud for communications with support. SalesForce Service Cloud delivers content via a web interface that is accessible to existing screen readers. SalesForce Service Cloud has a VPAT located at http://www.sfdcstatic.com/assets/pdf/misc/VPAT_servicecloud_summer2013.pdf . | diff --git a/jstests/README.md b/jstests/README.md index e3c839151f9..6c82490d266 100644 --- a/jstests/README.md +++ b/jstests/README.md @@ -1,54 +1,88 @@ # Javascript Test Guide + At MongoDB we write integration tests in JavaScript. These are tests written to exercise some behavior of a running MongoDB server, replica set, or sharded cluster. This guide aims to provide some general guidelines and best practices on how to write good tests. 
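+
+For orientation, a minimal test in this style might look like the following sketch (the database name, collection handling, and assertion message are illustrative, not taken from an existing test):
+
+```
+// Tests that a single document can be deleted from a collection.
+const coll = db.getSiblingDB("test").getCollection(jsTestName());
+coll.drop();
+
+assert.commandWorked(coll.insert({_id: 1}));
+assert.commandWorked(coll.remove({_id: 1}));
+assert.eq(0, coll.find().itcount(), "expected the collection to be empty after the delete");
+```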
+ ## Principles + ### Minimize the test case as much as possible while still exercising and testing the desired behavior. -- For example, if you are testing that document deletion works correctly, it may be entirely sufficient to insert just a single document and then delete that document. Inserting multiple documents would be unnecessary. A guiding principle on this is to ask yourself how easy it would be for a new person coming to this test to quickly understand it. If there are multiple documents being inserted into a collection, in a test that only tests document deletion, a newcomer might ask the question: “is it important that the test uses multiple documents, or incidental?”. It is best if you can remove these kinds of questions from a person’s mind, by keeping only the absolute essential parts of a test. -- We should always strive for unittesting when possible, so if the functionality you want to test can be covered by a unit test, we should write a unit test instead. -### Add a block comment at the top of the JavaScript test file giving a clear and concise overview of what a test is trying to verify. -- For tests that are more complicated, a brief description of the test steps might be useful as well. -### Keep debuggability in mind. -- Assertion error messages should contain all information relevant to debugging the test. This means the server’s response from the failed command should almost always be included in the assertion error message. It can also be helpful to include parameters that vary during the test to avoid requiring the investigator to use the logs/backtrace to determine what the test was attempting to do. -- Think about how easy it would be to debug your test if something failed and a newcomer only had the logs of the test to look at. This can help guide your decision on what log messages to include and to what level of detail. The jsTestLog function is useful for this, as it is good at visually demarcating different phases of a test. As a tip, run your test a few times and just study the log messages, imagining you are an engineer debugging the test with only these logs to look at. Think about how understandable the logs would be to a newcomer. It is easy to add log messages to a test but then forget to see how they would actually appear. -- Never insert identical documents unless necessary. It is very useful in debugging to be able to figure out where a given piece of data came from. -- If a test does the same thing multiple times, consider factoring it out into a library. Shorter running tests are easier to debug and code duplication is always bad. -### Do not hardcode collection or database names, especially if they are used multiple times throughout a test. + +- For example, if you are testing that document deletion works correctly, it may be entirely sufficient to insert just a single document and then delete that document. Inserting multiple documents would be unnecessary. A guiding principle on this is to ask yourself how easy it would be for a new person coming to this test to quickly understand it. If there are multiple documents being inserted into a collection, in a test that only tests document deletion, a newcomer might ask the question: “is it important that the test uses multiple documents, or incidental?”. It is best if you can remove these kinds of questions from a person’s mind, by keeping only the absolute essential parts of a test. 
+- We should always strive for unit testing when possible, so if the functionality you want to test can be covered by a unit test, we should write a unit test instead.
+
+### Add a block comment at the top of the JavaScript test file giving a clear and concise overview of what a test is trying to verify.
+
+- For tests that are more complicated, a brief description of the test steps might be useful as well.
+
+### Keep debuggability in mind.
+
+- Assertion error messages should contain all information relevant to debugging the test. This means the server’s response from the failed command should almost always be included in the assertion error message. It can also be helpful to include parameters that vary during the test to avoid requiring the investigator to use the logs/backtrace to determine what the test was attempting to do.
+- Think about how easy it would be to debug your test if something failed and a newcomer only had the logs of the test to look at. This can help guide your decision on what log messages to include and to what level of detail. The `jsTestLog` function is useful for this, as it is good at visually demarcating different phases of a test. As a tip, run your test a few times and just study the log messages, imagining you are an engineer debugging the test with only these logs to look at. Think about how understandable the logs would be to a newcomer. It is easy to add log messages to a test but then forget to see how they would actually appear (see the short sketch after this list).
+- Never insert identical documents unless necessary. It is very useful in debugging to be able to figure out where a given piece of data came from.
+- If a test does the same thing multiple times, consider factoring it out into a library. Shorter running tests are easier to debug and code duplication is always bad.
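+
+For illustration, a phase marker plus an assertion message that carries the relevant data might look like this sketch (the collection contents and message text are invented for the example):
+
+```
+jsTestLog("Phase 1: seeding the collection");
+const seedColl = db.getSiblingDB("test").getCollection(jsTestName());
+assert.commandWorked(seedColl.insert({_id: 1, phase: 1}));
+
+// Include the data that was actually found, so a failure is debuggable from the log alone.
+const foundDocs = seedColl.find().toArray();
+assert.eq(1, foundDocs.length, "unexpected documents after seeding: " + tojson(foundDocs));
+```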
+
+### Do not hardcode collection or database names, especially if they are used multiple times throughout a test.
+
 It is best to use variable names that attempt to describe what a value is used for. For example, naming a variable that stores a collection name collectionToDrop is much better than just naming the variable collName.
+
 ### Make every effort to make your test as deterministic as possible.
-- Non-deterministic tests add noise to our build system and, in general, make it harder for yourself and other engineers to determine if the system really is working correctly or not. Flaky integration tests should be considered bugs, and we should not allow them to be committed to the server codebase. One way to make jstests more deterministic is to use failpoints to force the events happening in expected order. However, if we have to use failpoints to make this test deterministic, we should consider write a unit test instead.
-### Think hard about all the assumptions that the test relies on.
-- For example, if a certain phase of the test ran much slower or much faster, would it cause your test to fail for the wrong reason?
-- If your test includes hard-coded timeouts, make sure they are set appropriately. If a test is waiting for a certain condition to be true, and the test should not proceed until that condition is met, it is often correct to just wait “indefinitely”, instead of adding some arbitrary timeout value, like 30 seconds. In practice this usually means setting some reasonable upper limit, for example, 10 minutes.
-- Also, for replication tests, make sure data exists on the right nodes at the right time. For example, if you a do a write and don’t explicitly wait for it to replicate, it might not reach a secondary node before you try to do the next step of the test.
-- Does your test require data to be stored persistently? Remember that we have test variants that run on in-memory/ephemeral storage engines
-- There are timeouts in the test suites and we aim to make all tests in the same suite finish before timeout. That says we should always make the test run quickly to keep the test short in terms of duration.
-### Make tests fail as early as possible.
-- If something goes wrong early in the test, it’s much harder to diagnose when that error becomes visible much later.
-- Wrap every command in assert.commandWorked, or assert.commandFailedWithCode. There is also assert.commandFailed that won't check the return error code, but we should always try to use assert.commandFailedWithCode to make sure the test won't pass on an unexpected error.
+
+- Non-deterministic tests add noise to our build system and, in general, make it harder for yourself and other engineers to determine if the system really is working correctly or not. Flaky integration tests should be considered bugs, and we should not allow them to be committed to the server codebase. One way to make jstests more deterministic is to use failpoints to force events to happen in the expected order. However, if we have to use failpoints to make this test deterministic, we should consider writing a unit test instead.
+- Note that our fuzzer and concurrency test suites are often an exception to this rule. In those cases we sometimes give up some level of determinism in order to trigger a wider class of rare edge cases. For targeted JavaScript integration tests, however, highly deterministic tests should be the goal.
+
+### Think hard about all the assumptions that the test relies on.
+
+- For example, if a certain phase of the test ran much slower or much faster, would it cause your test to fail for the wrong reason?
+- If your test includes hard-coded timeouts, make sure they are set appropriately. If a test is waiting for a certain condition to be true, and the test should not proceed until that condition is met, it is often correct to just wait “indefinitely”, instead of adding some arbitrary timeout value, like 30 seconds. In practice this usually means setting some reasonable upper limit, for example, 10 minutes.
+- Also, for replication tests, make sure data exists on the right nodes at the right time. For example, if you do a write and don’t explicitly wait for it to replicate, it might not reach a secondary node before you try to do the next step of the test.
+- Does your test require data to be stored persistently? Remember that we have test variants that run on in-memory/ephemeral storage engines.
+- There are timeouts in the test suites and we aim to make all tests in the same suite finish before the timeout. That said, we should always make the test run quickly and keep its duration short.
+
+### Make tests fail as early as possible.
+
+- If something goes wrong early in the test, it’s much harder to diagnose when that error becomes visible much later.
+- Wrap every command in assert.commandWorked, or assert.commandFailedWithCode, as sketched below. There is also assert.commandFailed that won't check the return error code, but we should always try to use assert.commandFailedWithCode to make sure the test won't pass on an unexpected error.
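+
+A minimal sketch of that pattern (the collection and the deliberately missing namespace below are invented for illustration):
+
+```
+const coll = db.getSiblingDB("test").getCollection(jsTestName());
+coll.drop();
+
+// Fails the test right here if the server rejects the write.
+assert.commandWorked(coll.insert({_id: 1}));
+
+// Expect one specific error code; any other failure should still fail the test.
+assert.commandFailedWithCode(coll.getDB().runCommand({drop: "collection_that_does_not_exist"}),
+                             ErrorCodes.NamespaceNotFound);
+```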
+
 ### Be aware of all the configurations and variants that your test might run under.
-- Make sure that your test still works correctly if is run in a different configuration or on a different platform than the one you might have tested on.
-- Varying storage engines and suites can often affect a test’s behavior. For example, maybe your test fails unexpectedly if it runs with authentication turned on with an in-memory storage engine. You don’t have to run a new test on every possible platform before committing it, but you should be confident that your test doesn’t break in an unexpected configuration.
+
+- Make sure that your test still works correctly if it is run in a different configuration or on a different platform than the one you might have tested on.
+- Varying storage engines and suites can often affect a test’s behavior. For example, maybe your test fails unexpectedly if it runs with authentication turned on with an in-memory storage engine. You don’t have to run a new test on every possible platform before committing it, but you should be confident that your test doesn’t break in an unexpected configuration.
+
 ### Avoid assertions that verify properties indirectly.
+
 All assertions in a test should attempt to verify the most specific property possible. For example, if you are trying to test that a certain collection exists, it is better to assert that the collection’s exact name exists in the list of collections, as opposed to verifying that the collection count is equal to 1. The desired collection’s existence is sufficient for the collection count to be 1, but not necessary (a different collection could exist in its place). Be wary of adding these kind of indirect assertions in a test.

 ## Modules in Practice
+
 We have fully migrated to the modularized JavaScript world so any new test should use modules and adapt the new style.
+
 ### Only import/export what you need.
+
 It's always important to keep the test context clean so we should only import/export what we need.
-- The unused import is against [no-unused-vars](https://eslint.org/docs/latest/rules/no-unused-vars) rule in ESLint though we haven't enforced it.
-- We don't have a linter to check export since it's hard to tell the necessity, but we should only export the modules that are imported by other tests or will be needed in the future.
+
+- An unused import violates the [no-unused-vars](https://eslint.org/docs/latest/rules/no-unused-vars) rule in ESLint, though we haven't enforced it.
+- We don't have a linter to check exports since it's hard to tell what is actually necessary, but we should only export the modules that are imported by other tests or will be needed in the future.
+
 ### Declare variables in proper scope.
+
-In the past, we have seen tests referring some "undeclared" or "redeclared" variables, which are actually introduced through `load()`.

## Modules in Practice
+
We have fully migrated to the modularized JavaScript world, so any new test should use modules and adopt the new style.
+
### Only import/export what you need.
+
It's always important to keep the test context clean, so we should only import/export what we need.
-- The unused import is against [no-unused-vars](https://eslint.org/docs/latest/rules/no-unused-vars) rule in ESLint though we haven't enforced it.
-- We don't have a linter to check export since it's hard to tell the necessity, but we should only export the modules that are imported by other tests or will be needed in the future.
+
+- Unused imports violate ESLint's [no-unused-vars](https://eslint.org/docs/latest/rules/no-unused-vars) rule, though we haven't enforced it yet.
+- We don't have a linter for exports, since it's hard to tell automatically whether an export is needed, but we should only export what is imported by other tests or will be needed in the future.
+
### Declare variables in proper scope.
-In the past, we have seen tests referring some "undeclared" or "redeclared" variables, which are actually introduced through `load()`. Now with modules, the scope is more clear. We can use global variables properly to setup the test and don't need to worry about polluting other tests.
+
+In the past, we have seen tests referring to "undeclared" or "redeclared" variables that were actually introduced through `load()`. Now, with modules, the scope is clearer: we can use global variables properly to set up the test without worrying about polluting other tests.
+
### Name variables properly when exporting.
-To avoid naming conflicts, we should not make the name of exported variables too general which could easily conflict with another variable from the test which import your module. For example, in the following case, the module exported a variable named `alphabet` and it will lead to a re-declaration error.
+
+To avoid naming conflicts, we should not give exported variables names that are too general, since they could easily conflict with another variable in the test that imports your module. For example, in the following case, the module exports a variable named `alphabet`, which leads to a re-declaration error.
+
```
import {alphabet} from "/matts/module.js";
const alphabet = "xyz"; // ERROR
```
+
### Prefer let/const over var
+
`let`/`const` should be preferred over `var`, since they help detect double declarations in the first place. In the naming-conflict example above, if the second line used `var`, it could easily go wrong without throwing an error.
+
### Export in ES6 style
+
For legacy reasons, a lot of code still uses the old style of exporting, like the following.
+
```
const MyModule = (function() {
    function myFeature() {}
@@ -57,7 +91,9 @@ function myOtherFeature() {}
return {myFeature, myOtherFeature};
})();
```
+
Instead, we should export in the ES6 style, as follows.
+
```
export function myFeature() {}
export function myOtherFeature() {}
@@ -65,4 +101,5 @@ export function myOtherFeature() {}
// When importing from a test
import * as MyModule from "/path/to/my_module.js";
```
+
This helps the language server discover the exported functions and provide code navigation for them.
diff --git a/jstests/multiVersion/README.md b/jstests/multiVersion/README.md
index 035238966ad..d84fe0afd81 100644
--- a/jstests/multiVersion/README.md
+++ b/jstests/multiVersion/README.md
@@ -1,26 +1,31 @@
# Multiversion Tests

## Context
+
These tests test specific upgrade/downgrade behavior expected between
-different versions of MongoDB. Some of these tests will persist indefinitely & some of these tests
-will be removed upon branching. All targeted tests must go in a targeted test directory. Do not put
+different versions of MongoDB. Some of these tests will persist indefinitely and some will be
+removed upon branching. All targeted tests must go in a targeted test directory. Do not put
any files in the multiVersion/ top-level directory.

## Generic Tests
-These tests test the general functionality of upgrades/downgrades regardless
-of version. These will persist indefinitely, as they should always pass regardless
+
+These tests test the general functionality of upgrades/downgrades regardless
+of version. These will persist indefinitely, as they should always pass regardless
of MongoDB version.

## Targeted Tests
-These tests are specific to the current development cycle. These can/will fail after branching and
+
+These tests are specific to the current development cycle. These can/will fail after branching and
are subject to removal during branching.

### targetedTestsLastLtsFeatures
-These tests rely on a specific last-lts version. After the next major release, last-lts is a
-different version than expected, so these are subject to failure. Tests in this directory will be
-removed after the next major release.
+
+These tests rely on a specific last-lts version.
After the next major release, last-lts is a +different version than expected, so these are subject to failure. Tests in this directory will be +removed after the next major release. ### targetedTestsLastContinuousFeatures -These tests rely on a specific last-continuous version. After the next minor release, -last-continuous is a different version than expected, so these are subject to failure. Tests in + +These tests rely on a specific last-continuous version. After the next minor release, +last-continuous is a different version than expected, so these are subject to failure. Tests in this directory will be removed after the next minor release. diff --git a/src/mongo/client/README.md b/src/mongo/client/README.md index 4bc8afea3c7..01fd9cf2d2c 100644 --- a/src/mongo/client/README.md +++ b/src/mongo/client/README.md @@ -1,6 +1,7 @@ # Internal Client ## Replica set monitoring and host targeting + The internal client driver responsible for routing a command request to a replica set must determine which member to target. Host targeting involves finding which nodes in a topology satisfy the $readPreference. Node eligibility depends on the type of a node (i.e primary, secondary, etc.) and @@ -33,16 +34,17 @@ through the hello response latency. Aside from the RTT, the remaining informatio read preferences is gathered through awaitable hello commands asynchronously sent to each node in the topology. - #### Code references -* [**Read Preference**](https://docs.mongodb.com/manual/core/read-preference/) -* [**ReplicaSetMonitorInterface class**](https://github.com/mongodb/mongo/blob/v4.4/src/mongo/client/replica_set_monitor_interface.h) -* [**ReplicaSetMonitorManager class**](https://github.com/mongodb/mongo/blob/v4.4/src/mongo/client/replica_set_monitor_manager.h) -* [**RemoteCommandTargeter class**](https://github.com/mongodb/mongo/blob/v4.4/src/mongo/client/remote_command_targeter.h) -* [**ServerDiscoveryMonitor class**](https://github.com/mongodb/mongo/blob/v4.4/src/mongo/client/server_discovery_monitor.cpp) -* [**ServerPingMonitor class**](https://github.com/mongodb/mongo/blob/v4.4/src/mongo/client/server_ping_monitor.h) -* The specifications for -[**Server Discovery and Monitoring (SDAM)**](https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst) -* [**TopologyDescription class**](https://github.com/mongodb/mongo/blob/v4.4/src/mongo/client/sdam/topology_description.h) -* [**Replication Architecture Guide**](https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/README.md#replication-and-topology-coordinators) + +- [**Read Preference**](https://docs.mongodb.com/manual/core/read-preference/) +- [**ReplicaSetMonitorInterface class**](https://github.com/mongodb/mongo/blob/v4.4/src/mongo/client/replica_set_monitor_interface.h) +- [**ReplicaSetMonitorManager class**](https://github.com/mongodb/mongo/blob/v4.4/src/mongo/client/replica_set_monitor_manager.h) +- [**RemoteCommandTargeter class**](https://github.com/mongodb/mongo/blob/v4.4/src/mongo/client/remote_command_targeter.h) +- [**ServerDiscoveryMonitor class**](https://github.com/mongodb/mongo/blob/v4.4/src/mongo/client/server_discovery_monitor.cpp) +- [**ServerPingMonitor class**](https://github.com/mongodb/mongo/blob/v4.4/src/mongo/client/server_ping_monitor.h) +- The specifications for + [**Server Discovery and Monitoring (SDAM)**](https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst) +- 
[**TopologyDescription class**](https://github.com/mongodb/mongo/blob/v4.4/src/mongo/client/sdam/topology_description.h) +- [**Replication Architecture Guide**](https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/README.md#replication-and-topology-coordinators) + --- diff --git a/src/mongo/crypto/README.JWT.md b/src/mongo/crypto/README.JWT.md index c447af6ef32..42d77ebc290 100644 --- a/src/mongo/crypto/README.JWT.md +++ b/src/mongo/crypto/README.JWT.md @@ -3,22 +3,22 @@ At present, usage of JWS in MongoDB is limited to the Linux platform only and is not implemented on other platforms. Since signature validation is not available on other platforms, use of the unvalidated JWT types, while present, is not useful. -* [Glossary](#glossary) -* [`JWKManager`](#jwkmanager) -* [`JWSValidator`](#jwsvalidator) -* [`JWSValidatedToken`](#jwsvalidatedtoken) -* [Compact Serialization Format](#compact-serialization-format) +- [Glossary](#glossary) +- [`JWKManager`](#jwkmanager) +- [`JWSValidator`](#jwsvalidator) +- [`JWSValidatedToken`](#jwsvalidatedtoken) +- [Compact Serialization Format](#compact-serialization-format) ## Glossary -* **JWK** (JSON Web Key): A human readable representation of a cryptographic key. - * See [RFC 7517](https://www.rfc-editor.org/rfc/rfc7517) JSON Web Key - * Note: This library currently supports [RSA](https://www.rfc-editor.org/rfc/rfc7517#section-9.3) based keys only. -* **JWS** (JSON Web Signature): A cryptographic signature on a JWT, typically presented as a single object with the token and a header. - * See [RFC 7515](https://www.rfc-editor.org/rfc/rfc7515) JSON Web Signature - * Note: This library currently supports the [Compact Serialization](https://www.rfc-editor.org/rfc/rfc7515#section-3.1) only. -* **JWT** (JSON Web Token): A JSON object representing a number of claims such as, but not limited to: bearer identity, issuer, and validity. - * See [RFC 7519](https://www.rfc-editor.org/rfc/rfc7519) JSON Web Token +- **JWK** (JSON Web Key): A human readable representation of a cryptographic key. + - See [RFC 7517](https://www.rfc-editor.org/rfc/rfc7517) JSON Web Key + - Note: This library currently supports [RSA](https://www.rfc-editor.org/rfc/rfc7517#section-9.3) based keys only. +- **JWS** (JSON Web Signature): A cryptographic signature on a JWT, typically presented as a single object with the token and a header. + - See [RFC 7515](https://www.rfc-editor.org/rfc/rfc7515) JSON Web Signature + - Note: This library currently supports the [Compact Serialization](https://www.rfc-editor.org/rfc/rfc7515#section-3.1) only. +- **JWT** (JSON Web Token): A JSON object representing a number of claims such as, but not limited to: bearer identity, issuer, and validity. + - See [RFC 7519](https://www.rfc-editor.org/rfc/rfc7519) JSON Web Token ## JWKManager @@ -31,16 +31,16 @@ Later, when validating a client supplied token, the application will use the ### JSON Web Keys -* `JWK`: The base key material type in IDL. This parses only the `kid` (Key ID) and `kty` (Key Type) fields. - In order to expect and process key specific data, the `kty` must already be known, therefore - type specific IDL structs are defined as chaining the base `JWK` type. - * `JWKRSA`: Chains the base `JWK` type and adds expected fields `n` and `e` which represent the - modulus and public-exponent portions of the RSA key respectively. -* `JWKSet`: A simple wrapper struct containing a single field named `keys` of type `array`. 
- This allows the [`JWKManager`](#jwkmanager) class to load a `JWKSet` URI resource and pull out a set of keys - which are expected to conform to the `JWK` interface, and as of this writing, represent `JWKRSA` data specifically. +- `JWK`: The base key material type in IDL. This parses only the `kid` (Key ID) and `kty` (Key Type) fields. + In order to expect and process key specific data, the `kty` must already be known, therefore + type specific IDL structs are defined as chaining the base `JWK` type. + - `JWKRSA`: Chains the base `JWK` type and adds expected fields `n` and `e` which represent the + modulus and public-exponent portions of the RSA key respectively. +- `JWKSet`: A simple wrapper struct containing a single field named `keys` of type `array`. + This allows the [`JWKManager`](#jwkmanager) class to load a `JWKSet` URI resource and pull out a set of keys + which are expected to conform to the `JWK` interface, and as of this writing, represent `JWKRSA` data specifically. -These types, as well as [`JWSHeader`](#jwsheader) and [`JWT`](#jwt) can be found in [jwt\_types.idl](jwt_types.idl). +These types, as well as [`JWSHeader`](#jwsheader) and [`JWT`](#jwt) can be found in [jwt_types.idl](jwt_types.idl). ### Example JWK file @@ -48,20 +48,20 @@ A typical JSON file containing keys may look something like the following: ```json { - "keys": [ - { - "kid": "custom-key-1", - "kty": "RSA", - "n": "ALtUlNS31SzxwqMzMR9jKOJYDhHj8zZtLUYHi3s1en3wLdILp1Uy8O6Jy0Z66tPyM1u8lke0JK5gS-40yhJ-bvqioW8CnwbLSLPmzGNmZKdfIJ08Si8aEtrRXMxpDyz4Is7JLnpjIIUZ4lmqC3MnoZHd6qhhJb1v1Qy-QGlk4NJy1ZI0aPc_uNEUM7lWhPAJABZsWc6MN8flSWCnY8pJCdIk_cAktA0U17tuvVduuFX_94763nWYikZIMJS_cTQMMVxYNMf1xcNNOVFlUSJHYHClk46QT9nT8FWeFlgvvWhlXfhsp9aNAi3pX-KxIxqF2wABIAKnhlMa3CJW41323Js", - "e": "AQAB" - }, - { - "kid": "custom-key-2", - "kty": "RSA", - "n": "ANBv7-YFoyL8EQVhig7yF8YJogUTW-qEkE81s_bs2CTsI1oepDFNAeMJ-Krfx1B7yllYAYtScZGo_l60R9Ou4X89LA66bnVRWVFCp1YV1r0UWtn5hJLlAbqKseSmjdwZlL_e420GlUAiyYsiIr6wltC1dFNYyykq62RhfYhM0xpnt0HiN-k71y9A0GO8H-dFU1WgOvEYMvHmDAZtAP6RTkALE3AXlIHNb4mkOc9gwwn-7cGBc08rufYcniKtS0ZHOtD1aE2CTi1MMQMKkqtVxWIdTI3wLJl1t966f9rBHR6qVtTV8Qpq1bquUc2oaHjR4lPTf0Z_hTaELJa5-BBbvJU", - "e": "AQAB" - } - ] + "keys": [ + { + "kid": "custom-key-1", + "kty": "RSA", + "n": "ALtUlNS31SzxwqMzMR9jKOJYDhHj8zZtLUYHi3s1en3wLdILp1Uy8O6Jy0Z66tPyM1u8lke0JK5gS-40yhJ-bvqioW8CnwbLSLPmzGNmZKdfIJ08Si8aEtrRXMxpDyz4Is7JLnpjIIUZ4lmqC3MnoZHd6qhhJb1v1Qy-QGlk4NJy1ZI0aPc_uNEUM7lWhPAJABZsWc6MN8flSWCnY8pJCdIk_cAktA0U17tuvVduuFX_94763nWYikZIMJS_cTQMMVxYNMf1xcNNOVFlUSJHYHClk46QT9nT8FWeFlgvvWhlXfhsp9aNAi3pX-KxIxqF2wABIAKnhlMa3CJW41323Js", + "e": "AQAB" + }, + { + "kid": "custom-key-2", + "kty": "RSA", + "n": "ANBv7-YFoyL8EQVhig7yF8YJogUTW-qEkE81s_bs2CTsI1oepDFNAeMJ-Krfx1B7yllYAYtScZGo_l60R9Ou4X89LA66bnVRWVFCp1YV1r0UWtn5hJLlAbqKseSmjdwZlL_e420GlUAiyYsiIr6wltC1dFNYyykq62RhfYhM0xpnt0HiN-k71y9A0GO8H-dFU1WgOvEYMvHmDAZtAP6RTkALE3AXlIHNb4mkOc9gwwn-7cGBc08rufYcniKtS0ZHOtD1aE2CTi1MMQMKkqtVxWIdTI3wLJl1t966f9rBHR6qVtTV8Qpq1bquUc2oaHjR4lPTf0Z_hTaELJa5-BBbvJU", + "e": "AQAB" + } + ] } ``` @@ -75,10 +75,10 @@ as they are received from third parties such as connected clients. 
Platform specific implementations of the cryptographic functions may be found in: -* Linux: [jws\_validator\_openssl.cpp](https://github.com/mongodb/mongo/blob/master/src/mongo/crypto/jws_validator_openssl.cpp) -* Windows: [jws\_validator\_windows.cpp](https://github.com/mongodb/mongo/blob/master/src/mongo/crypto/jws_validator_windows.cpp) UNIMPLEMENTED -* macOS: [jws\_validator\_apple.cpp](https://github.com/mongodb/mongo/blob/master/src/mongo/crypto/jws_validator_apple.cpp) UNIMPLEMENTED -* Non-TLS builds: [jws\_validator\_none.cpp](https://github.com/mongodb/mongo/blob/master/src/mongo/crypto/jws_validator_none.cpp) UNIMPLEMENTED +- Linux: [jws_validator_openssl.cpp](https://github.com/mongodb/mongo/blob/master/src/mongo/crypto/jws_validator_openssl.cpp) +- Windows: [jws_validator_windows.cpp](https://github.com/mongodb/mongo/blob/master/src/mongo/crypto/jws_validator_windows.cpp) UNIMPLEMENTED +- macOS: [jws_validator_apple.cpp](https://github.com/mongodb/mongo/blob/master/src/mongo/crypto/jws_validator_apple.cpp) UNIMPLEMENTED +- Non-TLS builds: [jws_validator_none.cpp](https://github.com/mongodb/mongo/blob/master/src/mongo/crypto/jws_validator_none.cpp) UNIMPLEMENTED ## JWSValidatedToken @@ -88,12 +88,13 @@ maintaining a type on the post-processed token as well. An application will construct a `JWSValidatedToken` by passing both a signed [`JWS Compact Serialization`](#compact-serialization-format) and a [`JWKManager`](#jwkmanager). + 1. The token's header is parsed using IDL type [`JWSHeader`](#jwsheader) to determine the `kid` (Key ID) which was used for signing. 2. The [`JWKManager`](#jwkmanager) is queried for a suitable [`JWSValidator`](#jwsvalidator). - 1. If the requested `kid` is unknown to the [`JWKManager`](#jwkmanager), it will requery its `JWKSet` URI to reload from the key server. -3. That validator is used to check the provided signature against the header and body payload. -4. The body of the token is parsed using IDL type [`JWT`](#jwt). -5. Relevant validity claims `nbf` ([Not before](https://www.rfc-editor.org/rfc/rfc7519.html#section-4.1.5)) and `exp` ([Expires At](https://www.rfc-editor.org/rfc/rfc7519.html#section-4.1.4)) are verified. +3. If the requested `kid` is unknown to the [`JWKManager`](#jwkmanager), it will requery its `JWKSet` URI to reload from the key server. +4. That validator is used to check the provided signature against the header and body payload. +5. The body of the token is parsed using IDL type [`JWT`](#jwt). +6. Relevant validity claims `nbf` ([Not before](https://www.rfc-editor.org/rfc/rfc7519.html#section-4.1.5)) and `exp` ([Expires At](https://www.rfc-editor.org/rfc/rfc7519.html#section-4.1.4)) are verified. If, at any point during this construction, an error is encountered, a `DBException` will be thrown and no `JWSValidatedToken` will be created. @@ -104,7 +105,7 @@ To access fields, use the `getBody()`/`getBodyBSON()` accessors on an as-needed from the `JWSValidatedToken` object rather than storing these values by themselves. This ensures that the data being used has been validated. 
-**Directly parsing the signed JWS compact serialization using *any* other means +**Directly parsing the signed JWS compact serialization using _any_ other means than `JWSValidatedToken` should be considered an error and rejected during code review.** ## Compact Serialization Format @@ -126,7 +127,7 @@ The first section in our example is the `base64url::encode()` output of the `JWS represented as `JSON`, so it would decode as: ```json -{ "typ": "JWT", "alg": "RS256", "kid": "custom-key-1" } +{"typ": "JWT", "alg": "RS256", "kid": "custom-key-1"} ``` This tells us that the payload in the second field of the compact serialized signature is a `JWT` @@ -134,7 +135,8 @@ This tells us that the payload in the second field of the compact serialized sig We also see that the `alg`orithm for signing uses [`RS256`](https://www.rfc-editor.org/rfc/rfc7518.html#section-3.3) (`RSASSA-PKCS1-v1_5 using SHA-256`) and that the signature can be verified using the key material associated with `custom-key-1`. The IDL struct `JWSHeader` may be used to parse this section for access to the relevant `alg` and `kid` fields. -* See also: [RFC 7515 Section 4](https://www.rfc-editor.org/rfc/rfc7515#section-4) JOSE Header + +- See also: [RFC 7515 Section 4](https://www.rfc-editor.org/rfc/rfc7515#section-4) JOSE Header ### JWT @@ -148,20 +150,17 @@ So in this example, it would decode as: "sub": "user1@mongodb.com", "nbf": 1661374077, "exp": 2147483647, - "aud": [ - "jwt@kernel.mongodb.com" - ], + "aud": ["jwt@kernel.mongodb.com"], "nonce": "gdfhjj324ehj23k4", - "mongodb-roles": [ - "myReadRole" - ] + "mongodb-roles": ["myReadRole"] } ``` The IDL struct `JWT` will be used to parse this section by [`JWSValidatedToken`](#jwsvalidatedtoken), however token payloads SHOULD NOT be inspected without processing them through the validating infrastructure. -* See [RFC 7519 Section 4](https://www.rfc-editor.org/rfc/rfc7519#section-4) JWT Claims + +- See [RFC 7519 Section 4](https://www.rfc-editor.org/rfc/rfc7519#section-4) JWT Claims Note that this token payload contains an additional field not defined by any RFC or ietf-draft. The content of this, or any other unknown fields is treated as opaque and ignored by diff --git a/src/mongo/crypto/README.md b/src/mongo/crypto/README.md index e2088ca48b1..49f2313e1a3 100644 --- a/src/mongo/crypto/README.md +++ b/src/mongo/crypto/README.md @@ -1,3 +1,3 @@ # MongoDB Crypto library architecture guides -* [JSON Web Tokens (JWT)](README.JWT.md) +- [JSON Web Tokens (JWT)](README.JWT.md) diff --git a/src/mongo/db/README.md b/src/mongo/db/README.md index 2afb92736d4..91042ff42e4 100644 --- a/src/mongo/db/README.md +++ b/src/mongo/db/README.md @@ -1,13 +1,12 @@ # MongoDB Internals -This document aims to provide a high-level specification for the MongoDB's -infrastructure to support client/server interaction and process globals. -Examples for such components are `ServiceContext` and `OperationContext`. +This document aims to provide a high-level specification for the MongoDB's +infrastructure to support client/server interaction and process globals. +Examples for such components are `ServiceContext` and `OperationContext`. This is a work in progress and more sections will be added gradually. ## Server-Internal Baton Pattern -For details on the server-internal *Baton* pattern, see [this document][baton]. +For details on the server-internal _Baton_ pattern, see [this document][baton]. 
[baton]: ../../../docs/baton.md - diff --git a/src/mongo/db/README_shard_role_api.md b/src/mongo/db/README_shard_role_api.md index 50ff8364db4..4293d411336 100644 --- a/src/mongo/db/README_shard_role_api.md +++ b/src/mongo/db/README_shard_role_api.md @@ -3,19 +3,20 @@ ## Shard Role API Any code that accesses data collections with the intention to read or write is said to be operating -in the _Shard Role_. This contrasts with _Router Role_ operations, which do not access data +in the _Shard Role_. This contrasts with _Router Role_ operations, which do not access data collections directly — they only route operations to the appropriate shard. -Shard Role operations are sharding-aware and thus require establishing a consistent view of the _storage engine_, _local catalog_ +Shard Role operations are sharding-aware and thus require establishing a consistent view of the _storage engine_, _local catalog_ and _sharding catalog_. The storage engine contains the "data". The local catalog contains shard-local metadata such as indexes and storage options. The sharding catalog contains the sharding -description (whether the collection is sharded, its shard key pattern, etc.) and the +description (whether the collection is sharded, its shard key pattern, etc.) and the ownership filter (which shard key ranges are owned by this shard). -Shard Role operations are also responsible for validating routing decisions taken by possibly-stale +Shard Role operations are also responsible for validating routing decisions taken by possibly-stale upstream routers. ## Acquiring collections + [shard_role.h provides](https://github.com/mongodb/mongo/blob/23c92c3cca727209a68e22d2d9cabe46bac11bb1/src/mongo/db/shard_role.h#L333-L375) the `acquireCollection*` family of primitives to acquire a consistent view of the catalogs for collections and views. Shard role code is required to use these primitives to access collections/views. @@ -51,31 +52,38 @@ CollectionOrViewAcquisitions acquireCollectionsOrViewsMaybeLockFree( ``` The dimensions of this family of methods are: -- Collection/View: Whether the caller is okay with the namespace potentially corresponding to a view or not. -- Locks/MaybeLockFree: The "MaybeLockFree" variant will skip acquiring locks if it is allowed given the opCtx state. It must be only used for read operations. An operation is allowed to skip locks if all the following conditions are met: - - (i) it's not part of a multi-document transaction, - - (ii) it is not already holding write locks, - - (iii) does not already have a non-lock-free storage transaction open. - - The normal variant acquires locks. -- One or multiple acquisitions: The "plural" variants allow acquiring multiple collections/views in a single call. Acquiring multiple collections in the same acquireCollections call prevents the global lock from getting recursively locked, which would impede yielding. + +- Collection/View: Whether the caller is okay with the namespace potentially corresponding to a view or not. +- Locks/MaybeLockFree: The "MaybeLockFree" variant will skip acquiring locks if it is allowed given the opCtx state. It must be only used for read operations. An operation is allowed to skip locks if all the following conditions are met: + + - (i) it's not part of a multi-document transaction, + - (ii) it is not already holding write locks, + - (iii) does not already have a non-lock-free storage transaction open. + + The normal variant acquires locks. 
+ +- One or multiple acquisitions: The "plural" variants allow acquiring multiple collections/views in a single call. Acquiring multiple collections in the same acquireCollections call prevents the global lock from getting recursively locked, which would impede yielding. For each collection/view the caller desires to acquire, `CollectionAcquisitionRequest`/`CollectionOrViewAcquisitionRequest` represents the prerequisites for it, which are: -- `nsOrUUID`: The NamespaceString or uuid of the desired collection/view. -- `placementConcern`: The sharding placementConcern, also known as ShardVersion and DatabaseVersion, that the router attached. -- `operationType`: Whether we are acquiring this collection for reading (`kRead`) or for writing (`kWrite`). `kRead` operations will keep the same orphan filter and range preserver across yields. This way, even if chunk migrations commit, the query plan is guaranteed to keep seeing the documents for the owned ranges at the time the query started. -- Optionally, `expectedUUID`: for requests where `nsOrUUID` takes the NamespaceString form, this is the UUID we expect the collection to have. + +- `nsOrUUID`: The NamespaceString or uuid of the desired collection/view. +- `placementConcern`: The sharding placementConcern, also known as ShardVersion and DatabaseVersion, that the router attached. +- `operationType`: Whether we are acquiring this collection for reading (`kRead`) or for writing (`kWrite`). `kRead` operations will keep the same orphan filter and range preserver across yields. This way, even if chunk migrations commit, the query plan is guaranteed to keep seeing the documents for the owned ranges at the time the query started. +- Optionally, `expectedUUID`: for requests where `nsOrUUID` takes the NamespaceString form, this is the UUID we expect the collection to have. If the prerequisites can be met, then the acquisition will succeed and one or multiple `CollectionAcquisition`/`ViewAcquisition` objects are returned. These objects are the entry point for accessing the catalog information, including: -- CollectionPtr: The local catalog. -- CollectionDescription: The sharding catalog. -- ShardingOwnershipFilter: Used to filter out orphaned documents. + +- CollectionPtr: The local catalog. +- CollectionDescription: The sharding catalog. +- ShardingOwnershipFilter: Used to filter out orphaned documents. Additionally, these objects hold several resources during their lifetime: -- For locked acquisitions, the locks. -- For sharded collections, the _RangePreserver_, which prevents documents that became orphans after having established the collectionAcquisition from being deleted. + +- For locked acquisitions, the locks. +- For sharded collections, the _RangePreserver_, which prevents documents that became orphans after having established the collectionAcquisition from being deleted. As an example: + ``` CollectionAcquisition collection = acquireCollection(opCtx, @@ -91,24 +99,29 @@ collection.getShardingFilter(); ``` ## TransactionResources + `CollectionAcquisition`/`CollectionOrViewAcquisition` are reference-counted views to a `TransactionResources` object. `TransactionResources` is the holder of the acquisition's resources, which include the global/db/collection locks (in case of a locked acquisition), the local catalog snapshot (collectionPtr), the sharding catalog snapshot (collectionDescription) and ownershipFilter. Copying a `CollectionAcquisition`/`CollectionOrViewAcquisition` object increases its associated `TransactionResources` reference counter. 
When it reaches zero, the resources are released.

## Acquisitions and query plans
+
Query plans must use `CollectionAcquisitions` as the sole entry point to access the different catalogs (e.g. to access a CollectionPtr, to get the sharding description or the orphan filter). Plans should never store references to the catalogs because they can become invalid after a yield. Upon restore, they will find the `CollectionAcquisition` in a valid state.

## Yielding and restoring
+
`TransactionResources` can be detached from its current operation context and later attached to a different one -- this is the case for getMore. Acquisitions associated with a particular `TransactionResources` object must only be used by that operation context.

shard_role.h provides primitives for yielding and restoring. There are two different types of yields: one where the operation will resume on the same operation context (e.g. an update write operation), and the other where it will be restored to a different operation context (e.g. a getMore).

The restore procedure checks that the acquisition prerequisites are still met, namely:
-- That the collection still exists and has not been renamed.
-- That the sharding placement concern can still be met. For `kWrite` acquisitions, this means that the shard version has not changed. This can be relaxed for `kRead` acquisitions: It is allowed that the shard version changes, because the RangePreserver guarantees that all documents corresponding to that placement version are still on the shard.
+
+- That the collection still exists and has not been renamed.
+- That the sharding placement concern can still be met. For `kWrite` acquisitions, this means that the shard version has not changed. This can be relaxed for `kRead` acquisitions: the shard version is allowed to change, because the RangePreserver guarantees that all documents corresponding to that placement version are still on the shard.

### Yield and restore to the same operation context
+
[`yieldTransactionResourcesFromOperationContext`](https://github.com/mongodb/mongo/blob/2e0259b3050e4c27d47e353222395d21bb80b9e4/src/mongo/db/shard_role.h#L442-L453) yields the resources associated with the acquisition, yielding its locks, and returns a `YieldedTransactionResources` object holding the yielded resources. After that call,
@@ -138,11 +151,13 @@ myPlanExecutor.getNext();
```

### Yield and restore to a different operation context
+
Operations that build a plan executor and return a cursor to be consumed over repeated getMore commands do so by stashing their resources in the CursorManager.
[`stashTransactionResourcesFromOperationContext`](https://github.com/mongodb/mongo/blob/2e0259b3050e4c27d47e353222395d21bb80b9e4/src/mongo/db/shard_role.h#L512-L516) yields the `TransactionResources` and detaches it from the current operation context. The yielded TransactionResources object is stashed in the CursorManager.

When executing a getMore, the yielded TransactionResources is retrieved from the CursorManager and attached to the new operation context. This is done by constructing the `HandleTransactionResourcesFromCursor` RAII object. Its destructor will re-stash the TransactionResources back to the CursorManager. In case of failure during getMore, `HandleTransactionResourcesFromCursor::dismissRestoredResources()` must be called to dismiss its resources.
As an example, build a PlanExecutor and stash it in the CursorManager:
+
```
CollectionAcquisition collection =
    acquireCollection(opCtx1,
@@ -159,11 +174,13 @@ stashTransactionResourcesFromOperationContext(opCtx, pinnedCursor.getCursor());

// [Command ends]
```
+
And now getMore consumes more documents from the cursor:
+
```
// --------
// [getMore command]
-auto cursorPin = uassertStatusOK(CursorManager::get(opCtx2)->pinCursor(opCtx2, cursorId));
+auto cursorPin = uassertStatusOK(CursorManager::get(opCtx2)->pinCursor(opCtx2, cursorId));

// Restore the stashed TransactionResources to the current opCtx.
HandleTransactionResourcesFromCursor transactionResourcesHandler(opCtx2, cursorPin.getCursor());
@@ -175,4 +192,3 @@ while (...) {

// ~HandleTransactionResourcesFromCursor will re-stash the TransactionResources to 'cursorPin'
```
-
diff --git a/src/mongo/db/STABLE_API_README.md b/src/mongo/db/STABLE_API_README.md
index 11da90707b3..1017487a0a9 100644
--- a/src/mongo/db/STABLE_API_README.md
+++ b/src/mongo/db/STABLE_API_README.md
@@ -16,46 +16,45 @@ API versions.

For any API version V the following changes are prohibited, and must be introduced in a new API
version W.

-- Remove StableCommand (where StableCommand is some command in V).
-- Remove a documented StableCommand parameter.
-- Prohibit a formerly permitted StableCommand parameter value.
-- Remove a field from StableCommand's reply.
-- Change the type of a field in StableCommand's reply, or expand the set of types it may be.
-- Add a new value to a StableCommand reply field's enum-like fixed set of values, e.g. a new index
-  type (unless there's an opt-in mechanism besides API version).
-- Change semantics of StableCommand in a manner that may cause existing applications to misbehave.
-- Change an error code returned in a particular error scenario, if drivers rely on the code.
-- Remove a label L from an error returned in a particular error scenario which had returned an error
-  labeled with L before.
-- Prohibit any currently permitted CRUD syntax element, including but not limited to query and
-  aggregation operators, aggregation stages and expressions, and CRUD operators.
-- Remove support for a BSON type, or any other BSON format change (besides adding a type).
-- Drop support for a wire protocol message type.
-- Making the authorization requirements for StableCommand more restrictive.
-- Increase hello.minWireVersion (or decrease maxWireVersion, which we won't do).
+- Remove StableCommand (where StableCommand is some command in V).
+- Remove a documented StableCommand parameter.
+- Prohibit a formerly permitted StableCommand parameter value.
+- Remove a field from StableCommand's reply.
+- Change the type of a field in StableCommand's reply, or expand the set of types it may be.
+- Add a new value to a StableCommand reply field's enum-like fixed set of values, e.g. a new index
+  type (unless there's an opt-in mechanism besides API version).
+- Change semantics of StableCommand in a manner that may cause existing applications to misbehave.
+- Change an error code returned in a particular error scenario, if drivers rely on the code.
+- Remove a label L from an error returned in a particular error scenario which had returned an error
+  labeled with L before.
+- Prohibit any currently permitted CRUD syntax element, including but not limited to query and
+  aggregation operators, aggregation stages and expressions, and CRUD operators.
+- Remove support for a BSON type, or any other BSON format change (besides adding a type).
+- Drop support for a wire protocol message type.
+- Make the authorization requirements for StableCommand more restrictive.
+- Increase hello.minWireVersion (or decrease maxWireVersion, which we won't do).

The following changes are permitted in V:

-- Add a command.
-- Add an optional command parameter.
-- Permit a formerly prohibited command parameter or parameter value.
-- Any change in an undocumented command parameter.
-- Change any aspect of internal sharding/replication/etc. protocols.
-- Add a command reply field.
-- Add a new error code (provided this does not break compatibility with existing drivers and
-  applications).
-- Add a label to an error.
-- Change order of fields in reply docs and sub-docs.
-- Add a CRUD syntax element.
-- Making the authorization requirements for StableCommand less restrictive.
-- Add and dropping support for an authentication mechanism. Authenticate mechanisms may need to be
-  removed due to security vulnerabilties and as such, there is no guarantee about their
-  compatibility.
-- Deprecate a behavior
-- Increase hello.maxWireVersion.
-- Any change in behaviors not in V.
-- Performance changes.
-
+- Add a command.
+- Add an optional command parameter.
+- Permit a formerly prohibited command parameter or parameter value.
+- Any change in an undocumented command parameter.
+- Change any aspect of internal sharding/replication/etc. protocols.
+- Add a command reply field.
+- Add a new error code (provided this does not break compatibility with existing drivers and
+  applications).
+- Add a label to an error.
+- Change order of fields in reply docs and sub-docs.
+- Add a CRUD syntax element.
+- Make the authorization requirements for StableCommand less restrictive.
+- Add or drop support for an authentication mechanism. Authentication mechanisms may need to be
+  removed due to security vulnerabilities and, as such, there is no guarantee about their
+  compatibility.
+- Deprecate a behavior.
+- Increase hello.maxWireVersion.
+- Any change in behaviors not in V.
+- Performance changes.

### Enforcing Compatibility

@@ -72,49 +71,55 @@ from 5.0.0 onwards. This compatibility checker script will run in evergreen patc
and in the commit queue. The script that evergreen runs is [here](https://github.com/mongodb/mongo/blob/4594ea6598ce28d01c5c5d76164b1cfeeba1494f/evergreen/check_idl_compat.sh).

### Running the Compatibility Checker Locally
+
To run the compatibility checker locally, first run
+
```
python buildscripts/idl/checkout_idl_files_from_past_releases.py -v idls
```
+
This creates subfolders of past releases in the `idls` folder.
Then, for the old release you want to check against, run
+
```
python buildscripts/idl/idl_check_compatibility.py -v --old-include idls//src --old-include idls//src/mongo/db/modules/enterprise/src --new-include src --new-include src/mongo/db/modules/enterprise/src idls//src src
```
For example:
+
```
python buildscripts/idl/idl_check_compatibility.py -v --old-include idls/r6.0.3/src --old-include idls/r6.0.3/src/mongo/db/modules/enterprise/src --new-include src --new-include src/mongo/db/modules/enterprise/src idls/r6.0.3/src src
```

## Adding new commands and fields
-***Any additions to the Stable API must be approved by the Stable API PM and code reviewed by the
-Query Optimization Team.***
-Adding a new IDL command requires the `api_version` field, which indicates which Stable API version
-this command is in.
***By default, the `api_version` field should be `""`.*** Only if you are adding the -command to the Stable API, then `api_version` should be the API version you are adding it to -(currently `"1"`). ***By adding it to the Stable API, this means you cannot remove this -command within this API version.*** +**_Any additions to the Stable API must be approved by the Stable API PM and code reviewed by the +Query Optimization Team._** + +Adding a new IDL command requires the `api_version` field, which indicates which Stable API version +this command is in. **_By default, the `api_version` field should be `""`._** Only if you are adding the +command to the Stable API, then `api_version` should be the API version you are adding it to +(currently `"1"`). **_By adding it to the Stable API, this means you cannot remove this +command within this API version._** Adding a new command parameter or reply field requires the `stability` field. This field indicates whether the command parameter/reply field is part of the Stable API. There are three options for -field: `unstable`, `internal`, and `stable`. If you are unsure what the `stability` field for the -new command parameter or reply field should be, it ***should be marked as `stability: unstable`***. +field: `unstable`, `internal`, and `stable`. If you are unsure what the `stability` field for the +new command parameter or reply field should be, it **_should be marked as `stability: unstable`_**. -Only if the field should be added to the Stable API, then you should mark the field as +Only if the field should be added to the Stable API, then you should mark the field as `stability: stable`in IDL. Additionally, in `idl_check_compatibility.py` you must add the field to the `ALLOWED_STABLE_FIELDS_LIST`. This list was added so that engineers are aware that by making a -field part of the stable API, ***the field cannot be changed in any way that would violate the -Stable API guidelines*** (see [above](https://github.com/mongodb/mongo/blob/master/src/mongo/db/STABLE_API_README.md#compatibility)). -Crucially, this means the field ***cannot be removed or changed to `stability: unstable` or -`stability: internal`*** while we are in the current API version. +field part of the stable API, **_the field cannot be changed in any way that would violate the +Stable API guidelines_** (see [above](https://github.com/mongodb/mongo/blob/master/src/mongo/db/STABLE_API_README.md#compatibility)). +Crucially, this means the field **_cannot be removed or changed to `stability: unstable` or +`stability: internal`_** while we are in the current API version. The format of adding a field to the list is `--`. ### `stability: unstable` vs. `stability: internal` -If the field should not be part of the Stable API, it should be marked as either +If the field should not be part of the Stable API, it should be marked as either `stability: unstable` or `stability: internal`. Both of these mean that the field will not be a part of the Stable API. The difference is that when we send commands from a mongos to a shard, the shard will perform parsing validation that checks that all the command fields are part of the Stable API, @@ -125,15 +130,16 @@ marked as `stability: unstable`, unless it will go through this parsing validati should be marked as `stability: internal`. 
### `IGNORE_STABLE_TO_UNSTABLE_LIST` + The `IGNORE_STABLE_TO_UNSTABLE_LIST` exists because there have been cases where a field was added to the Stable API accidentally, and since the field was strictly internal / not documented to users, we changed the field to be unstable. (Note that these kinds of changes have to go through the same approval process.) Normally changing a field from `stability: stable` to `stability: unstable` or `stability: internal` would throw an error, so the `IGNORE_STABLE_TO_UNSTABLE_LIST` acts as an allow -list for these exceptions. +list for these exceptions. -***Additions to the `IGNORE_STABLE_TO_UNSTABLE_LIST` must be approved by the Stable API PM and code -reviewed by the Query Optimization Team.*** +**_Additions to the `IGNORE_STABLE_TO_UNSTABLE_LIST` must be approved by the Stable API PM and code +reviewed by the Query Optimization Team._** ### The BSON serialization `any` type @@ -141,7 +147,7 @@ The `bson_serialization_type` is used to define the BSON type that an IDL field In some cases, we need custom serializers defined in C++ to perform more complex logic, such as validating the given type or accepting multiple types for the field. If we use these custom serializers, we specify the `bson_serialization_type` to be `any`. However, the compatibility -checker script can’t type check `any` , since the main logic for the type exists outside of the +checker script can’t type check `any` , since the main logic for the type exists outside of the IDL file. As many commands have valid reasons for using type `any`, we do not restrict usage. Instead, the command must be added to an [allowlist](https://github.com/mongodb/mongo/blob/6aaad044a819a50a690b932afeda9aa278ba0f2e/buildscripts/idl/idl_check_compatibility.py#L52). This also applies to any fields marked as `stability: unstable`. This is to prevent unexpected @@ -191,7 +197,7 @@ Rules for feature compatibility version and API version: ### Rule 1 **The first release to support an API version W can add W in its upgraded FCV, but cannot add W in - its downgraded FCV.** +its downgraded FCV.** Some API versions will introduce behaviors that require disk format changes or intracluster protocol changes that don't take effect until setFCV("R"), so for consistency, we always wait for setFCV("R") @@ -200,7 +206,7 @@ before supporting a new API version. ### Rule 2 **So that applications can upgrade without downtime from V to W, at least one release must support - both V and W in its upgraded FCV.** +both V and W in its upgraded FCV.** This permits zero-downtime API version upgrades. 
If release R in its upgraded FCV "R" supports both V and W, the customer can first upgrade to R with FCV "R" while their application is running with diff --git a/src/mongo/db/auth/README.md b/src/mongo/db/auth/README.md index 5a2e1255aa4..4a7b257e90f 100644 --- a/src/mongo/db/auth/README.md +++ b/src/mongo/db/auth/README.md @@ -2,41 +2,41 @@ ## Table of Contents -- [High Level Overview](#high-level-overview) -- [Authentication](#authentication) - - [SASL](#sasl) - - [Speculative Auth](#speculative-authentication) - - [SASL Supported Mechs](#sasl-supported-mechs) - - [X509 Authentication](#x509-authentication) - - [Cluster Authentication](#cluster-authentication) - - [X509 Intracluster Auth](#x509-intracluster-auth-and-member-certificate-rotation) - - [Keyfile Intracluster Auth](#keyfile-intracluster-auth) - - [Localhost Auth Bypass](#localhost-auth-bypass) -- [Authorization](#authorization) - - [AuthName](#authname) (`UserName` and `RoleName`) - - [Users](#users) - - [User Roles](#user-roles) - - [User Credentials](#user-credentials) - - [User Authentication Restrictions](#user-authentication-restrictions) - - [Roles](#roles) - - [Role subordinate Roles](#role-subordinate-roles) - - [Role Privileges](#role-privileges) - - [Role Authentication Restrictions](#role-authentication-restrictions) - - [User and Role Management](#user-and-role-management) - - [UMC Transactions](#umc-transactions) - - [Privilege](#privilege) - - [ResourcePattern](#resourcepattern) - - [ActionType](#actiontype) - - [Command Execution](#command-execution) - - [Authorization Caching](#authorization-caching) - - [Authorization Manager External State](#authorization-manager-external-state) - - [Types of Authorization](#types-of-authorization) - - [Local Authorization](#local-authorization) - - [LDAP Authorization](#ldap-authorization) - - [X.509 Authorization](#x509-authorization) - - [Cursors and Operations](#cursors-and-operations) - - [Contracts](#contracts) -- [External References](#external-references) +- [High Level Overview](#high-level-overview) +- [Authentication](#authentication) + - [SASL](#sasl) + - [Speculative Auth](#speculative-authentication) + - [SASL Supported Mechs](#sasl-supported-mechs) + - [X509 Authentication](#x509-authentication) + - [Cluster Authentication](#cluster-authentication) + - [X509 Intracluster Auth](#x509-intracluster-auth-and-member-certificate-rotation) + - [Keyfile Intracluster Auth](#keyfile-intracluster-auth) + - [Localhost Auth Bypass](#localhost-auth-bypass) +- [Authorization](#authorization) + - [AuthName](#authname) (`UserName` and `RoleName`) + - [Users](#users) + - [User Roles](#user-roles) + - [User Credentials](#user-credentials) + - [User Authentication Restrictions](#user-authentication-restrictions) + - [Roles](#roles) + - [Role subordinate Roles](#role-subordinate-roles) + - [Role Privileges](#role-privileges) + - [Role Authentication Restrictions](#role-authentication-restrictions) + - [User and Role Management](#user-and-role-management) + - [UMC Transactions](#umc-transactions) + - [Privilege](#privilege) + - [ResourcePattern](#resourcepattern) + - [ActionType](#actiontype) + - [Command Execution](#command-execution) + - [Authorization Caching](#authorization-caching) + - [Authorization Manager External State](#authorization-manager-external-state) + - [Types of Authorization](#types-of-authorization) + - [Local Authorization](#local-authorization) + - [LDAP Authorization](#ldap-authorization) + - [X.509 Authorization](#x509-authorization) + - [Cursors and 
Operations](#cursors-and-operations) + - [Contracts](#contracts) +- [External References](#external-references) ## High Level Overview @@ -63,7 +63,7 @@ user credentials and roles. The authorization session is then used to check perm ## Authentication On a server with authentication enabled, all but a small handful of commands require clients to -authenticate before performing any action. This typically occurs with a 1 to 3 round trip +authenticate before performing any action. This typically occurs with a 1 to 3 round trip conversation using the `saslStart` and `saslContinue` commands, or though a single call to the `authenticate` command. See [SASL](#SASL) and [X.509](#X509) below for the details of these exchanges. @@ -109,8 +109,8 @@ encountered. To reduce connection overhead time, clients may begin and possibly complete their authentication exchange as part of the -[`CmdHello`]((https://github.com/mongodb/mongo/blob/r4.7.0/src/mongo/db/repl/replication_info.cpp#L234)) -exchange. In this mode, the body of the `saslStart` or `authenticate` command used for +[`CmdHello`](<(https://github.com/mongodb/mongo/blob/r4.7.0/src/mongo/db/repl/replication_info.cpp#L234)>) +exchange. In this mode, the body of the `saslStart` or `authenticate` command used for authentication may be embedded into the `hello` command under the field `{speculativeAuthenticate: $bodyOfAuthCmd}`. @@ -122,82 +122,82 @@ included in the `hello` command response. #### SASL Supported Mechs When using the [SASL](#SASL) authentication workflow, it is necessary to select a specific mechanism -to authenticate with (e.g. SCRAM-SHA-1, SCRAM-SHA-256, PLAIN, GSSAPI, etc...). If the user has not +to authenticate with (e.g. SCRAM-SHA-1, SCRAM-SHA-256, PLAIN, GSSAPI, etc...). If the user has not included the mechanism in the mongodb:// URI, then the client can ask the server what mechanisms are available on a per-user basis before attempting to authenticate. Therefore, during the initial handshake using [`CmdHello`](https://github.com/mongodb/mongo/blob/r4.7.0/src/mongo/db/repl/replication_info.cpp#L234), the client will notify the server of the user it is about to authenticate by including -`{saslSupportedMechs: 'username'}` with the `hello` command. The server will then include +`{saslSupportedMechs: 'username'}` with the `hello` command. The server will then include `{saslSupportedMechs: [$listOfMechanisms]}` in the `hello` command's response. This allows clients to proceed with authentication by choosing an appropriate mechanism. The different named SASL mechanisms are listed below. If a mechanism can use a different storage method, the storage mechanism is listed as a sub-bullet below. -- [**SCRAM-SHA-1**](https://tools.ietf.org/html/rfc5802) - - See the section on `SCRAM-SHA-256` for details on `SCRAM`. `SCRAM-SHA-1` uses `SHA-1` for the - hashing algorithm. -- [**SCRAM-SHA-256**](https://tools.ietf.org/html/rfc7677) - - `SCRAM` stands for Salted Challenge Response Authentication Mechanism. `SCRAM-SHA-256` implements - the `SCRAM` protocol and uses `SHA-256` as a hashing algorithm to complement it. `SCRAM` - involves four steps, a client and server first, and a client and server final. During the client - first, the client sends the username for lookup. The server uses the username to retrieve the - relevant authentication information for the client. This generally includes the salt, StoredKey, - ServerKey, and iteration count. 
The client then computes a set of values (defined in [section - 3](https://tools.ietf.org/html/rfc5802#section-3) of the `SCRAM` RFC), most notably the client - proof and the server signature. It sends the client proof (used to authenticate the client) to - the server, and the server then responds by sending the server proof. The hashing function used - to hash the client password that is stored by the server is what differentiates `SCRAM-SHA-1` vs - `SCRAM-SHA-256`, `SHA-1` is used in `SCRAM-SHA-1`. `SCRAM-SHA-256` is the preferred mechanism - over `SCRAM-SHA-1`. Note also that `SCRAM-SHA-256` performs [RFC 4013 SASLprep Unicode - normalization](https://tools.ietf.org/html/rfc4013) on all provided passwords before hashing, - while for backward compatibility reasons, `SCRAM-SHA-1` does not. -- [**PLAIN**](https://tools.ietf.org/html/rfc4616) - - The `PLAIN` mechanism involves two steps for authentication. First, the client concatenates a - message using the authorization id, the authentication id (also the username), and the password - for a user and sends it to the server. The server validates that the information is correct and - authenticates the user. For storage, the server hashes one copy using SHA-1 and another using - SHA-256 so that the password is not stored in plaintext. Even when using the PLAIN mechanism, - the same secrets as used for the SCRAM methods are stored and used for validation. The chief - difference between using PLAIN and SCRAM-SHA-256 (or SCRAM-SHA-1) is that using SCRAM provides - mutual authentication and avoids transmitting the password to the server. With PLAIN, it is - less difficult for a MitM attacker to compromise original credentials. - - **With local users** - - When the PLAIN mechanism is used with internal users, the user information is stored in the - [user - collection](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/authorization_manager.cpp#L56) - on the database. See [authorization](#authorization) for more information. - - **With Native LDAP** - - When the PLAIN mechanism uses `Native LDAP`, the credential information is sent to and - received from LDAP when creating and authorizing a user. The mongo server sends user - credentials over the wire to the LDAP server and the LDAP server requests a password. The - mongo server sends the password in plain text and LDAP responds with whether the password is - correct. Here the communication with the driver and the mongod is the same, but the storage - mechanism for the credential information is different. - - **With Cyrus SASL / saslauthd** - - When using saslauthd, the mongo server communicates with a process called saslauthd running on - the same machine. The saslauthd process has ways of communicating with many other servers, - LDAP servers included. Saslauthd works in the same way as Native LDAP except that the - mongo process communicates using unix domain sockets. -- [**GSSAPI**](https://tools.ietf.org/html/rfc4752) - - GSSAPI is an authentication mechanism that supports [Kerberos](https://web.mit.edu/kerberos/) - authentication. GSSAPI is the communication method used to communicate with Kerberos servers and - with clients. When initializing this auth mechanism, the server tries to acquire its credential - information from the KDC by calling - [`tryAcquireServerCredential`](https://github.com/10gen/mongo-enterprise-modules/blob/r4.4.0/src/sasl/mongo_gssapi.h#L36). - If this is not approved, the server fasserts and the mechanism is not registered. 
On Windows, - SChannel provides a `GSSAPI` library for the server to use. On other platforms, the Cyrus SASL - library is used to make calls to the KDC (Kerberos key distribution center). +- [**SCRAM-SHA-1**](https://tools.ietf.org/html/rfc5802) + - See the section on `SCRAM-SHA-256` for details on `SCRAM`. `SCRAM-SHA-1` uses `SHA-1` for the + hashing algorithm. +- [**SCRAM-SHA-256**](https://tools.ietf.org/html/rfc7677) + - `SCRAM` stands for Salted Challenge Response Authentication Mechanism. `SCRAM-SHA-256` implements + the `SCRAM` protocol and uses `SHA-256` as a hashing algorithm to complement it. `SCRAM` + involves four steps, a client and server first, and a client and server final. During the client + first, the client sends the username for lookup. The server uses the username to retrieve the + relevant authentication information for the client. This generally includes the salt, StoredKey, + ServerKey, and iteration count. The client then computes a set of values (defined in [section + 3](https://tools.ietf.org/html/rfc5802#section-3) of the `SCRAM` RFC), most notably the client + proof and the server signature. It sends the client proof (used to authenticate the client) to + the server, and the server then responds by sending the server proof. The hashing function used + to hash the client password that is stored by the server is what differentiates `SCRAM-SHA-1` vs + `SCRAM-SHA-256`, `SHA-1` is used in `SCRAM-SHA-1`. `SCRAM-SHA-256` is the preferred mechanism + over `SCRAM-SHA-1`. Note also that `SCRAM-SHA-256` performs [RFC 4013 SASLprep Unicode + normalization](https://tools.ietf.org/html/rfc4013) on all provided passwords before hashing, + while for backward compatibility reasons, `SCRAM-SHA-1` does not. +- [**PLAIN**](https://tools.ietf.org/html/rfc4616) + - The `PLAIN` mechanism involves two steps for authentication. First, the client concatenates a + message using the authorization id, the authentication id (also the username), and the password + for a user and sends it to the server. The server validates that the information is correct and + authenticates the user. For storage, the server hashes one copy using SHA-1 and another using + SHA-256 so that the password is not stored in plaintext. Even when using the PLAIN mechanism, + the same secrets as used for the SCRAM methods are stored and used for validation. The chief + difference between using PLAIN and SCRAM-SHA-256 (or SCRAM-SHA-1) is that using SCRAM provides + mutual authentication and avoids transmitting the password to the server. With PLAIN, it is + less difficult for a MitM attacker to compromise original credentials. + - **With local users** + - When the PLAIN mechanism is used with internal users, the user information is stored in the + [user + collection](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/authorization_manager.cpp#L56) + on the database. See [authorization](#authorization) for more information. + - **With Native LDAP** + - When the PLAIN mechanism uses `Native LDAP`, the credential information is sent to and + received from LDAP when creating and authorizing a user. The mongo server sends user + credentials over the wire to the LDAP server and the LDAP server requests a password. The + mongo server sends the password in plain text and LDAP responds with whether the password is + correct. Here the communication with the driver and the mongod is the same, but the storage + mechanism for the credential information is different. 
+ - **With Cyrus SASL / saslauthd** + - When using saslauthd, the mongo server communicates with a process called saslauthd running on + the same machine. The saslauthd process has ways of communicating with many other servers, + LDAP servers included. Saslauthd works in the same way as Native LDAP except that the + mongo process communicates using unix domain sockets. +- [**GSSAPI**](https://tools.ietf.org/html/rfc4752) + - GSSAPI is an authentication mechanism that supports [Kerberos](https://web.mit.edu/kerberos/) + authentication. GSSAPI is the communication method used to communicate with Kerberos servers and + with clients. When initializing this auth mechanism, the server tries to acquire its credential + information from the KDC by calling + [`tryAcquireServerCredential`](https://github.com/10gen/mongo-enterprise-modules/blob/r4.4.0/src/sasl/mongo_gssapi.h#L36). + If this is not approved, the server fasserts and the mechanism is not registered. On Windows, + SChannel provides a `GSSAPI` library for the server to use. On other platforms, the Cyrus SASL + library is used to make calls to the KDC (Kerberos key distribution center). The specific properties that each SASL mechanism provides is outlined in this table below. -| | Mutual Auth | No Plain Text | -|---------------|-------------|---------------| -| SCRAM | X | X | -| PLAIN | | | -| GSS-API | X | X | +| | Mutual Auth | No Plain Text | +| ------- | ----------- | ------------- | +| SCRAM | X | X | +| PLAIN | | | +| GSS-API | X | X | ### X509 Authentication @@ -205,23 +205,24 @@ The specific properties that each SASL mechanism provides is outlined in this ta certificate key exchange. When the peer certificate validation happens during the SSL handshake, an [`SSLPeerInfo`](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/util/net/ssl_types.h#L113-L143) is created and attached to the transport layer SessionHandle. During `MONGODB-X509` auth, the server -first determines whether or not the client is a driver or a peer server. The server inspects the +first determines whether or not the client is a driver or a peer server. The server inspects the following criteria in this order to determine whether the connecting client is a peer server node: + 1. `net.tls.clusterAuthX509.attributes` is set on the server and the parsed certificate's subject name - contains all of the attributes and values specified in that option. + contains all of the attributes and values specified in that option. 2. `net.tls.clusterAuthX509.extensionValue` is set on the server and the parsed certificate contains - the OID 1.3.6.1.4.1.34601.2.1.2 with a value matching the one specified in that option. This OID - is reserved for the MongoDB cluster membership extension. + the OID 1.3.6.1.4.1.34601.2.1.2 with a value matching the one specified in that option. This OID + is reserved for the MongoDB cluster membership extension. 3. Neither of the above options are set on the server and the parsed certificate's subject name contains the same DC, O, and OU as the certificate the server presents to inbound connections (`tls.certificateKeyFile`). 4. `tlsClusterAuthX509Override.attributes` is set on the server and the parsed certificate's subject name - contains all of the attributes and values specified in that option. + contains all of the attributes and values specified in that option. 5. 
`tlsClusterAuthX509Override.extensionValue` is set on the server and the parsed certificate contains - the OID 1.3.6.1.4.1.34601.2.1.2 with a value matching the one specified in that option. -If all of these conditions fail, then the server grabs the client's username from the `SSLPeerInfo` -struct and verifies that the client name matches the username provided by the command object and exists -in the `$external` database. In that case, the client is authenticated as that user in `$external`. -Otherwise, authentication fails with ErrorCodes.UserNotFound. + the OID 1.3.6.1.4.1.34601.2.1.2 with a value matching the one specified in that option. + If all of these conditions fail, then the server grabs the client's username from the `SSLPeerInfo` + struct and verifies that the client name matches the username provided by the command object and exists + in the `$external` database. In that case, the client is authenticated as that user in `$external`. + Otherwise, authentication fails with ErrorCodes.UserNotFound. ### Cluster Authentication @@ -231,9 +232,10 @@ a server, they can use any of the authentication mechanisms described [below in section](#sasl). When a mongod or a mongos needs to authenticate to a mongodb server, it does not pass in distinguishing user credentials to authenticate (all servers authenticate to other servers as the `__system` user), so most of the options described below will not necessarily work. However, -two options are available for authentication - keyfile auth and X509 auth. +two options are available for authentication - keyfile auth and X509 auth. #### X509 Intracluster Auth and Member Certificate Rotation + `X509` auth is described in more detail above, but a precondition to using it is having TLS enabled. It is possible for customers to rotate their certificates or change the criteria that is used to determine X.509 cluster membership without any downtime. When the server uses the default criteria @@ -253,9 +255,10 @@ determine X.509 cluster membership without any downtime. When the server uses th An administrator can update the criteria the server uses to determine cluster membership alongside certificate rotation without downtime via the following procedure: + 1. Update server nodes' config files to contain the old certificate subject DN attributes or extension value in `setParameter.tlsClusterAuthX509Override` and the new certificate subject DN attributes - or extension value in `net.tls.clusterAuthX509.attributes` or `net.tls.clusterAuthX509.extensionValue`. + or extension value in `net.tls.clusterAuthX509.attributes` or `net.tls.clusterAuthX509.extensionValue`. 2. Perform a rolling restart of server nodes so that they all load in the override value and new config options. 3. Update server nodes' config files to contain the new certificates in `net.tls.clusterFile` @@ -268,6 +271,7 @@ certificate rotation without downtime via the following procedure: meeting the old criteria as peers. #### Keyfile Intracluster Auth + `keyfile` auth instructors servers to authenticate to each other using the `SCRAM-SHA-256` mechanism as the `local.__system` user who's password can be found in the named key file. 
A keyfile is a file stored on disk that servers load on startup, sending them when they behave as clients to another @@ -337,7 +341,7 @@ empty AuthorizedRoles set), and is thus "unauthorized", also known as "pre-auth" When a client connects to a database and authorization is enabled, authentication sends a request to get the authorization information of a specific user by calling addAndAuthorizeUser() on the -AuthorizationSession and passing in the `UserName` as an identifier. The `AuthorizationSession` calls +AuthorizationSession and passing in the `UserName` as an identifier. The `AuthorizationSession` calls functions defined in the [`AuthorizationManager`](https://github.com/mongodb/mongo/blob/r4.7.0/src/mongo/db/auth/authorization_manager.h) (described in the next paragraph) to both get the correct `User` object (defined below) from the @@ -357,11 +361,11 @@ The [AuthName](auth_name.h) template provides the generic implementation for `UserName` and `RoleName` implementations. Each of these objects is made up of three component pieces of information. -| Field | Accessor | Use | -| -- | -- | -- | -| `_name` | `getName()` | The symbolic name associated with the user or role, (e.g. 'Alice') | -| `_db` | `getDB()` | The authentication database associated with the named auth identifier (e.g. 'admin' or 'test') | -| `_tenant` | `getTenant()` | When used in multitenancy mode, this value retains a `TenantId` for authorization checking. | +| Field | Accessor | Use | +| --------- | ------------- | ---------------------------------------------------------------------------------------------- | +| `_name` | `getName()` | The symbolic name associated with the user or role, (e.g. 'Alice') | +| `_db` | `getDB()` | The authentication database associated with the named auth identifier (e.g. 'admin' or 'test') | +| `_tenant` | `getTenant()` | When used in multitenancy mode, this value retains a `TenantId` for authorization checking. | [`UserName`](user_name.h) and [`RoleName`](role_name.h) specializations are CRTP defined to include additional `getUser()` and `getRole()` accessors which proxy to `getName()`, @@ -369,8 +373,8 @@ and provide a set of `constexpr StringData` identifiers relating to their type. #### Serializations -* `getDisplayName()` and `toString()` create a new string of the form `name@db` for use in log messages. -* `getUnambiguousName()` creates a new string of the form `db.name` for use in generating `_id` fields for authzn documents and generating unique hashes for logical session identifiers. +- `getDisplayName()` and `toString()` create a new string of the form `name@db` for use in log messages. +- `getUnambiguousName()` creates a new string of the form `db.name` for use in generating `_id` fields for authzn documents and generating unique hashes for logical session identifiers. #### Multitenancy @@ -385,7 +389,7 @@ When a `TenantId` is associated with an `AuthName`, it will NOT be included in ` which is a cache value object from the ReadThroughCache (described in [Authorization Caching](#authorization-caching)). There can be multiple authenticated users for a single `Client` object. The most important elements of a `User` document are the username and the roles set that the -user has. While each `AuthorizationManagerExternalState` implementation may define its own +user has. 
While each `AuthorizationManagerExternalState` implementation may define its own storage mechanism for `User` object data, they all ultimately surface this data in a format compatible with the `Local` implementation, stored in the `admin.system.users` collection with a schema as follows: @@ -432,20 +436,20 @@ with a schema as follows: In order to define a set of privileges (see [role privileges](#role-privileges) below) granted to a given user, the user must be granted one or more `roles` on their user document, -or by their external authentication provider. Again, a user with no roles has no privileges. +or by their external authentication provider. Again, a user with no roles has no privileges. #### User Credentials The contents of the `credentials` field will depend on the configured authentication -mechanisms enabled for the user. For external authentication providers, -this will simply contain `$external: 1`. For `local` authentication providers, +mechanisms enabled for the user. For external authentication providers, +this will simply contain `$external: 1`. For `local` authentication providers, this will contain any necessary parameters for validating authentications such as the `SCRAM-SHA-256` example above. #### User Authentication Restrictions A user definition may optionally list any number of authentication restrictions. -Currently, only endpoint based restrictions are permitted. These require that a +Currently, only endpoint based restrictions are permitted. These require that a connecting client must come from a specific IP address range (given in [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing)) and/or connect to a specific server address. @@ -537,21 +541,21 @@ For users possessing a given set of roles, their effective privileges and #### Role Privileges Each role imparts privileges in the form of a set of `actions` permitted -against a given `resource`. The strings in the `actions` list correspond +against a given `resource`. The strings in the `actions` list correspond 1:1 with `ActionType` values as specified [here](https://github.com/mongodb/mongo/blob/92cc84b0171942375ccbd2312a052bc7e9f159dd/src/mongo/db/auth/action_type.h). Resources may be specified in any of the following nine formats: -| `resource` | Meaning | -| --- | --- | -| {} | Any `normal` collection | -| { db: 'test', collection: '' } | All `normal` collections on the named DB | -| { db: '', collection: 'system.views' } | The specific named collection on all DBs | -| { db: 'test', collection: 'system.view' } | The specific namespace (db+collection) as written | -| { cluster: true } | Used only by cluster-level actions such as `replsetConfigure`. 
| -| { system_bucket: '' } | Any collection with a prefix of `system.buckets.` in any db| -| { db: '', system_buckets: 'example' } | A collection named `system.buckets.example` in any db| -| { db: 'test', system_buckets: '' } | Any collection with a prefix of `system.buckets.` in `test` db| -| { db: 'test', system_buckets: 'example' } | A collected named `system.buckets.example` in `test` db| +| `resource` | Meaning | +| ----------------------------------------- | -------------------------------------------------------------- | +| {} | Any `normal` collection | +| { db: 'test', collection: '' } | All `normal` collections on the named DB | +| { db: '', collection: 'system.views' } | The specific named collection on all DBs | +| { db: 'test', collection: 'system.view' } | The specific namespace (db+collection) as written | +| { cluster: true } | Used only by cluster-level actions such as `replsetConfigure`. | +| { system_bucket: '' } | Any collection with a prefix of `system.buckets.` in any db | +| { db: '', system_buckets: 'example' } | A collection named `system.buckets.example` in any db | +| { db: 'test', system_buckets: '' } | Any collection with a prefix of `system.buckets.` in `test` db | +| { db: 'test', system_buckets: 'example' } | A collected named `system.buckets.example` in `test` db | #### Normal resources @@ -563,7 +567,7 @@ All other collections are considered `normal` collections. #### Role Authentication Restrictions Authentication restrictions defined on a role have the same meaning as -those defined directly on users. The effective set of `authenticationRestrictions` +those defined directly on users. The effective set of `authenticationRestrictions` imposed on a user is the union of all direct and indirect authentication restrictions. ### Privilege @@ -577,21 +581,21 @@ the full set of privileges across all resource and actionype conbinations for th #### ResourcePattern -A resource pattern is a combination of a [MatchType](action_type.idl) with a `NamespaceString` to possibly narrow the scope of that `MatchType`. Most MatchTypes refer to some storage resource, such as a specific collection or database, however `kMatchClusterResource` refers to an entire host, replica set, or cluster. +A resource pattern is a combination of a [MatchType](action_type.idl) with a `NamespaceString` to possibly narrow the scope of that `MatchType`. Most MatchTypes refer to some storage resource, such as a specific collection or database, however `kMatchClusterResource` refers to an entire host, replica set, or cluster. -| MatchType | As encoded in a privilege doc | Usage | -| -- | -- | -- | -| `kMatchNever` | _Unexpressable_ | A base type only used internally to indicate that the privilege specified by the ResourcePattern can not match any real resource | -| `kMatchClusterResource` | `{ cluster : true }` | Commonly used with host and cluster management actions such as `ActionType::addShard`, `ActionType::setParameter`, or `ActionType::shutdown`. | -| `kMatchAnyResource` | `{ anyResource: true }` | Matches all storage resources, even [non-normal namespaces](#normal-namespace) such as `db.system.views`. | -| `kMatchAnyNormalResource` | `{ db: '', collection: '' }` | Matches all [normal](#normal-namespace) storage resources. Used with [builtin role](builtin_roles.cpp) `readWriteAnyDatabase`. | -| `kMatchDatabaseName` | `{ db: 'dbname', collection: '' }` | Matches all [normal](#normal-namespace) storage resources for a specific named database. 
Used with [builtin role](builtin_roles.cpp) `readWrite`. | -| `kMatchCollectionName` | `{ db: '', collection: 'collname' }` | Matches all storage resources, normal or not, which have the exact collection suffix '`collname`'. For example, to provide read-only access to `*.system.js`. | -| `kMatchExactNamespace` | `{ db: 'dbname', collection: 'collname' }` | Matches the exact namespace '`dbname`.`collname`'. | -| `kMatchAnySystemBucketResource` | `{ db: '', system_buckets: '' }` | Matches the namespace pattern `*.system.buckets.*`. | -| `kMatchAnySystemBucketInDBResource` | `{ db: 'dbname', system_buckets: '' }` | Matches the namespace pattern `dbname.system.buckets.*`. | -| `kMatchAnySystemBucketInAnyDBResource` | `{ db: '', system_buckets: 'suffix' }` | Matches the namespace pattern `*.system.buckets.suffix`. | -| `kMatchExactSystemBucketResource` | `{ db: 'dbname', system_buckets: 'suffix' }` | Matches the exact namespace `dbname.system.buckets.suffix`. | +| MatchType | As encoded in a privilege doc | Usage | +| -------------------------------------- | -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `kMatchNever` | _Unexpressable_ | A base type only used internally to indicate that the privilege specified by the ResourcePattern can not match any real resource | +| `kMatchClusterResource` | `{ cluster : true }` | Commonly used with host and cluster management actions such as `ActionType::addShard`, `ActionType::setParameter`, or `ActionType::shutdown`. | +| `kMatchAnyResource` | `{ anyResource: true }` | Matches all storage resources, even [non-normal namespaces](#normal-namespace) such as `db.system.views`. | +| `kMatchAnyNormalResource` | `{ db: '', collection: '' }` | Matches all [normal](#normal-namespace) storage resources. Used with [builtin role](builtin_roles.cpp) `readWriteAnyDatabase`. | +| `kMatchDatabaseName` | `{ db: 'dbname', collection: '' }` | Matches all [normal](#normal-namespace) storage resources for a specific named database. Used with [builtin role](builtin_roles.cpp) `readWrite`. | +| `kMatchCollectionName` | `{ db: '', collection: 'collname' }` | Matches all storage resources, normal or not, which have the exact collection suffix '`collname`'. For example, to provide read-only access to `*.system.js`. | +| `kMatchExactNamespace` | `{ db: 'dbname', collection: 'collname' }` | Matches the exact namespace '`dbname`.`collname`'. | +| `kMatchAnySystemBucketResource` | `{ db: '', system_buckets: '' }` | Matches the namespace pattern `*.system.buckets.*`. | +| `kMatchAnySystemBucketInDBResource` | `{ db: 'dbname', system_buckets: '' }` | Matches the namespace pattern `dbname.system.buckets.*`. | +| `kMatchAnySystemBucketInAnyDBResource` | `{ db: '', system_buckets: 'suffix' }` | Matches the namespace pattern `*.system.buckets.suffix`. | +| `kMatchExactSystemBucketResource` | `{ db: 'dbname', system_buckets: 'suffix' }` | Matches the exact namespace `dbname.system.buckets.suffix`. | As `ResourcePattern`s are based on `NamespaceString`, they naturally include an optional `TenantId`, which scopes the pattern to a specific tenant in serverless. A user with a given `TenantId` can only @@ -614,16 +618,16 @@ with no `TenantId`. 
A "normal" resource is a `namespace` which does not match either of the following patterns: -| Namespace pattern | Examples | Usage | -| -- | -- | -- | -| `local.replset.*` | `local.replset.initialSyncId` | Namespaces used by Replication to manage per-host state. | -| `*.system.*` | `admin.system.version` `myDB.system.views` | Collections used by the database to support user collections. | +| Namespace pattern | Examples | Usage | +| ----------------- | ------------------------------------------ | ------------------------------------------------------------- | +| `local.replset.*` | `local.replset.initialSyncId` | Namespaces used by Replication to manage per-host state. | +| `*.system.*` | `admin.system.version` `myDB.system.views` | Collections used by the database to support user collections. | See also: [NamespaceString::isNormalCollection()](../namespace_string.h) #### ActionType -An [ActionType](action_type.idl) is a task which a client may be expected to perform. These are combined with [ResourcePattern](#resourcepattern)s to produce a [Privilege](#privilege). Note that not all `ActionType`s make sense with all `ResourcePattern`s (e.g. `ActionType::shutdown` applied to `ResourcePattern` `{ db: 'test', collection: 'my.awesome.collection' }`), however the system will generally not prohibit declaring these combinations. +An [ActionType](action_type.idl) is a task which a client may be expected to perform. These are combined with [ResourcePattern](#resourcepattern)s to produce a [Privilege](#privilege). Note that not all `ActionType`s make sense with all `ResourcePattern`s (e.g. `ActionType::shutdown` applied to `ResourcePattern` `{ db: 'test', collection: 'my.awesome.collection' }`), however the system will generally not prohibit declaring these combinations. ### User and Role Management @@ -631,11 +635,11 @@ An [ActionType](action_type.idl) is a task which a client may be expected to per abstraction for mutating the contents of the local authentication database in the `admin.system.users` and `admin.system.roles` collections. These commands are implemented primarily for config and standalone nodes in -[user\_management\_commands.cpp](https://github.com/mongodb/mongo/blob/92cc84b0171942375ccbd2312a052bc7e9f159dd/src/mongo/db/commands/user_management_commands.cpp), +[user_management_commands.cpp](https://github.com/mongodb/mongo/blob/92cc84b0171942375ccbd2312a052bc7e9f159dd/src/mongo/db/commands/user_management_commands.cpp), and as passthrough proxies for mongos in -[cluster\_user\_management\_commands.cpp](https://github.com/mongodb/mongo/blob/92cc84b0171942375ccbd2312a052bc7e9f159dd/src/mongo/s/commands/cluster_user_management_commands.cpp). +[cluster_user_management_commands.cpp](https://github.com/mongodb/mongo/blob/92cc84b0171942375ccbd2312a052bc7e9f159dd/src/mongo/s/commands/cluster_user_management_commands.cpp). All command payloads and responses are defined via IDL in -[user\_management\_commands.idl](https://github.com/mongodb/mongo/blob/92cc84b0171942375ccbd2312a052bc7e9f159dd/src/mongo/db/commands/user_management_commands.idl) +[user_management_commands.idl](https://github.com/mongodb/mongo/blob/92cc84b0171942375ccbd2312a052bc7e9f159dd/src/mongo/db/commands/user_management_commands.idl) #### UMC Transactions @@ -645,7 +649,7 @@ validating that the command's arguments refer to extant roles, actions, and other user-defined values. The `dropRole` and `dropAllRolesFromDatabase` commands can not be -expressed as a single CRUD op. 
Instead, they must issue all three of the following ops: +expressed as a single CRUD op. Instead, they must issue all three of the following ops: 1. `Update` the users collection to strip the role(s) from all users possessing it directly. 1. `Update` the roles collection to strip the role(s) from all other roles possessing it as a subordinate. @@ -826,6 +830,7 @@ and checks the current client's authorized users and authorized impersonated use Contracts](https://github.com/mongodb/mongo/blob/r4.9.0-rc0/src/mongo/db/auth/authorization_contract.h) were added in v5.0 to support API Version compatibility testing. Authorization contracts consist of three pieces: + 1. A list of privileges and checks a command makes against `AuthorizationSession` to check if a user is permitted to run the command. These privileges and checks are declared in an IDL file in the `access_check` section. The contract is compiled into the command definition and is available via @@ -843,18 +848,18 @@ exceptions are `getMore` and `explain` since they inherit their checks from othe Refer to the following links for definitions of the Classes referenced in this document: -| Class | File | Description | -| --- | --- | --- | -| `ActionType` | [mongo/db/auth/action\_type.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/action_type.h) | High level categories of actions which may be performed against a given resource (e.g. `find`, `insert`, `update`, etc...) | -| `AuthenticationSession` | [mongo/db/auth/authentication\_session.h](https://github.com/mongodb/mongo/blob/master/src/mongo/db/auth/authentication_session.h) | Session object to persist Authentication state | -| `AuthorizationContract` | [mongo/db/auth/authorization_contract.h](https://github.com/mongodb/mongo/blob/r4.9.0-rc0/src/mongo/db/auth/authorization_contract.h) | Contract generated by IDL| -| `AuthorizationManager` | [mongo/db/auth/authorization\_manager.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/authorization_manager.h) | Interface to external state providers | -| `AuthorizationSession` | [mongo/db/auth/authorization\_session.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/authorization_session.h) | Representation of currently authenticated and authorized users on the `Client` connection | -| `AuthzManagerExternalStateLocal` | [.../authz\_manager\_external\_state\_local.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/authz_manager_external_state_local.h) | `Local` implementation of user/role provider | -| `AuthzManagerExternalStateLDAP` | [.../authz\_manager\_external\_state\_ldap.h](https://github.com/10gen/mongo-enterprise-modules/blob/r4.4.0/src/ldap/authz_manager_external_state_ldap.h) | `LDAP` implementation of users/role provider | -| `Client` | [mongo/db/client.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/client.h) | An active client session, typically representing a remote driver or shell | -| `Privilege` | [mongo/db/auth/privilege.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/privilege.h) | A set of `ActionType`s permitted on a particular `resource' | -| `ResourcePattern` | [mongo/db/auth/resource\_pattern.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/resource_pattern.h) | A reference to a namespace, db, collection, or cluster to apply a set of `ActionType` privileges to | -| `RoleName` | [mongo/db/auth/role\_name.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/role_name.h) | A typed tuple containing a 
named role on a particular database | -| `User` | [mongo/db/auth/user.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/user.h) | A representation of a authorization user, including all direct and subordinte roles and their privileges and authentication restrictions | -| `UserName` | [mongo/db/auth/user\_name.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/user_name.h) | A typed tuple containing a named user on a particular database | +| Class | File | Description | +| -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- | +| `ActionType` | [mongo/db/auth/action_type.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/action_type.h) | High level categories of actions which may be performed against a given resource (e.g. `find`, `insert`, `update`, etc...) | +| `AuthenticationSession` | [mongo/db/auth/authentication_session.h](https://github.com/mongodb/mongo/blob/master/src/mongo/db/auth/authentication_session.h) | Session object to persist Authentication state | +| `AuthorizationContract` | [mongo/db/auth/authorization_contract.h](https://github.com/mongodb/mongo/blob/r4.9.0-rc0/src/mongo/db/auth/authorization_contract.h) | Contract generated by IDL | +| `AuthorizationManager` | [mongo/db/auth/authorization_manager.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/authorization_manager.h) | Interface to external state providers | +| `AuthorizationSession` | [mongo/db/auth/authorization_session.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/authorization_session.h) | Representation of currently authenticated and authorized users on the `Client` connection | +| `AuthzManagerExternalStateLocal` | [.../authz_manager_external_state_local.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/authz_manager_external_state_local.h) | `Local` implementation of user/role provider | +| `AuthzManagerExternalStateLDAP` | [.../authz_manager_external_state_ldap.h](https://github.com/10gen/mongo-enterprise-modules/blob/r4.4.0/src/ldap/authz_manager_external_state_ldap.h) | `LDAP` implementation of users/role provider | +| `Client` | [mongo/db/client.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/client.h) | An active client session, typically representing a remote driver or shell | +| `Privilege` | [mongo/db/auth/privilege.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/privilege.h) | A set of `ActionType`s permitted on a particular `resource' | +| `ResourcePattern` | [mongo/db/auth/resource_pattern.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/resource_pattern.h) | A reference to a namespace, db, collection, or cluster to apply a set of `ActionType` privileges to | +| `RoleName` | [mongo/db/auth/role_name.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/role_name.h) | A typed tuple containing a named role on a particular database | +| `User` | [mongo/db/auth/user.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/user.h) | A representation of a authorization user, including all direct and subordinte roles and their privileges and authentication restrictions | +| `UserName` | 
[mongo/db/auth/user_name.h](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/auth/user_name.h) | A typed tuple containing a named user on a particular database | diff --git a/src/mongo/db/catalog/README.md b/src/mongo/db/catalog/README.md index a20e00e30f6..0dc9c803405 100644 --- a/src/mongo/db/catalog/README.md +++ b/src/mongo/db/catalog/README.md @@ -1,4 +1,5 @@ # Execution Internals + The storage execution layer provides an interface for higher level MongoDB components, including query, replication and sharding, to all storage engines compatible with MongoDB. It maintains a catalog, in-memory and on-disk, of collections and indexes. It also implements an additional (to @@ -352,40 +353,43 @@ of the `$indexStats` operator. The following is a sample test output from ``` - ## Collection Catalog + The `CollectionCatalog` class holds in-memory state about all collections in all databases and is a cache of the [durable catalog](#durable-catalog) state. It provides the following functionality: - * Register new `Collection` objects, taking ownership of them. - * Lookup `Collection` objects by their `UUID` or `NamespaceString`. - * Iterate over `Collection` objects in a database in `UUID` order. - * Deregister individual dropped `Collection` objects, releasing ownership. - * Allow closing/reopening the catalog while still providing limited `UUID` to `NamespaceString` - lookup to support rollback to a point in time. - * Ensures `Collection` objects are in-sync with opened storage snapshots. + +- Register new `Collection` objects, taking ownership of them. +- Lookup `Collection` objects by their `UUID` or `NamespaceString`. +- Iterate over `Collection` objects in a database in `UUID` order. +- Deregister individual dropped `Collection` objects, releasing ownership. +- Allow closing/reopening the catalog while still providing limited `UUID` to `NamespaceString` + lookup to support rollback to a point in time. +- Ensures `Collection` objects are in-sync with opened storage snapshots. ### Synchronization -Catalog access is synchronized using [Multiversion concurrency control] where readers operate on -immutable catalog, collection and index instances. Writes use [copy-on-write][] to create newer -versions of the catalog, collection and index instances to be changed, contents are copied from the -previous latest version. Readers holding on to a catalog instance will thus not observe any writes -that happen after requesting an instance. If it is desired to observe writes while holding a catalog + +Catalog access is synchronized using [Multiversion concurrency control] where readers operate on +immutable catalog, collection and index instances. Writes use [copy-on-write][] to create newer +versions of the catalog, collection and index instances to be changed, contents are copied from the +previous latest version. Readers holding on to a catalog instance will thus not observe any writes +that happen after requesting an instance. If it is desired to observe writes while holding a catalog instance then the reader must refresh it. Catalog writes are handled with the `CollectionCatalog::write(callback)` interface. It provides the necessary [copy-on-write][] abstractions. A writable catalog instance is created by making a shallow copy of the existing catalog. The actual write is implemented in the supplied callback which is allowed to throw. Execution of the write callbacks are serialized and may run on a different -thread than the thread calling `CollectionCatalog::write`. 
Users should take care of not performing +thread than the thread calling `CollectionCatalog::write`. Users should take care of not performing any blocking operations in these callbacks as it would block all other DDL writes in the system. To avoid a bottleneck in the case the catalog contains a large number of collections (being slow to -copy), immutable data structures are used, concurrent writes are also batched together. Any thread -that enters `CollectionCatalog::write` while a catalog instance is being copied or while executing -write callbacks is enqueued. When the copy finishes, all enqueued write jobs are run on that catalog +copy), immutable data structures are used, concurrent writes are also batched together. Any thread +that enters `CollectionCatalog::write` while a catalog instance is being copied or while executing +write callbacks is enqueued. When the copy finishes, all enqueued write jobs are run on that catalog instance by the copying thread. ### Collection objects + Objects of the `Collection` class provide access to a collection's properties between [DDL](#glossary) operations that modify these properties. Modifications are synchronized using [copy-on-write][]. Reads access immutable `Collection` instances. Writes, such as rename @@ -394,27 +398,29 @@ the new `Collection` instance in the catalog. It is possible for operations that points in time to use different `Collection` objects. Notable properties of `Collection` objects are: - * catalog ID - to look up or change information from the DurableCatalog. - * UUID - Identifier that remains for the lifetime of the underlying MongoDb collection, even across - DDL operations such as renames, and is consistent between different nodes and shards in a - cluster. - * NamespaceString - The current name associated with the collection. - * Collation and validation properties. - * Decorations that are either `Collection` instance specific or shared between all `Collection` - objects over the lifetime of the collection. + +- catalog ID - to look up or change information from the DurableCatalog. +- UUID - Identifier that remains for the lifetime of the underlying MongoDb collection, even across + DDL operations such as renames, and is consistent between different nodes and shards in a + cluster. +- NamespaceString - The current name associated with the collection. +- Collation and validation properties. +- Decorations that are either `Collection` instance specific or shared between all `Collection` + objects over the lifetime of the collection. In addition `Collection` objects have shared ownership of: - * An [`IndexCatalog`](#index-catalog) - an in-memory structure representing the `md.indexes` data - from the durable catalog. - * A `RecordStore` - an interface to access and manipulate the documents in the collection as stored - by the storage engine. + +- An [`IndexCatalog`](#index-catalog) - an in-memory structure representing the `md.indexes` data + from the durable catalog. +- A `RecordStore` - an interface to access and manipulate the documents in the collection as stored + by the storage engine. A writable `Collection` may only be requested in an active [WriteUnitOfWork](#WriteUnitOfWork). The -new `Collection` instance is installed in the catalog when the storage transaction commits as the -first `onCommit` [Changes](#Changes) that run. This means that it is not allowed to perform any -modification to catalog, collection or index instances in `onCommit` handlers. 
Such modifications -would break the immutability property of these instances for readers. If the storage transaction -rolls back then the writable `Collection` object is simply discarded and no change is ever made to +new `Collection` instance is installed in the catalog when the storage transaction commits as the +first `onCommit` [Changes](#Changes) that run. This means that it is not allowed to perform any +modification to catalog, collection or index instances in `onCommit` handlers. Such modifications +would break the immutability property of these instances for readers. If the storage transaction +rolls back then the writable `Collection` object is simply discarded and no change is ever made to the catalog. A writable `Collection` is a clone of the existing `Collection`, members are either deep or @@ -432,18 +438,20 @@ versioned query information per Collection instance. Additionally, there are between `Collection` instances across DDL operations. ### Collection lifetime + The `Collection` object is brought to existence in two ways: + 1. Any DDL operation is run. Non-create operations such as `collMod` clone the existing `Collection` object. 2. Using an existing durable catalog entry to instantiate an existing collection. This happens when we: - 1. Load the `CollectionCatalog` during startup or after rollback. - 2. When we need to instantiate a collection at an earlier point-in-time because the `Collection` - is not present in the `CollectionCatalog`, or the `Collection` is there, but incompatible with - the snapshot. See [here](#catalog-changes-versioning-and-the-minimum-valid-snapshot) how a - `Collection` is determined to be incompatible. - 3. When we read at latest concurrently with a DDL operation that is also performing multikey - changes. + 1. Load the `CollectionCatalog` during startup or after rollback. + 2. When we need to instantiate a collection at an earlier point-in-time because the `Collection` + is not present in the `CollectionCatalog`, or the `Collection` is there, but incompatible with + the snapshot. See [here](#catalog-changes-versioning-and-the-minimum-valid-snapshot) how a + `Collection` is determined to be incompatible. + 3. When we read at latest concurrently with a DDL operation that is also performing multikey + changes. For (1) and (2.1) the `Collection` objects are stored as shared pointers in the `CollectionCatalog` and available to all operations running in the database. These `Collection` objects are released @@ -462,16 +470,19 @@ same operation. concurrent multikey changes. Users of `Collection` instances have a few responsibilities to keep the object valid. + 1. Hold a collection-level lock. 2. Use an AutoGetCollection helper. 3. Explicitly hold a reference to the `CollectionCatalog`. ### Index Catalog + Each `Collection` object owns an `IndexCatalog` object, which in turn has shared ownership of `IndexCatalogEntry` objects that each again own an `IndexDescriptor` containing an in-memory presentation of the data stored in the [durable catalog](#durable-catalog). ## Catalog Changes, versioning and the Minimum Valid Snapshot + Every catalog change has a corresponding write with a commit time. When registered `OpObserver` objects observe catalog changes, they set the minimum valid snapshot of the `Collection` to the commit timestamp. 
The `CollectionCatalog` uses this timestamp to determine whether the `Collection` @@ -489,20 +500,22 @@ can guarantee that `Collection` objects are fully in-sync with the storage snaps With lock-free reads there may be ongoing concurrent DDL operations. In order to have a `CollectionCatalog` that's consistent with the snapshot, the following is performed when setting up a lock-free read: -* Get the latest version of the `CollectionCatalog`. -* Open a snapshot. -* Get the latest version of the `CollectionCatalog` and check if it matches the one obtained - earlier. If not, we need to retry this. Otherwise we'd have a `CollectionCatalog` that's - inconsistent with the opened snapshot. + +- Get the latest version of the `CollectionCatalog`. +- Open a snapshot. +- Get the latest version of the `CollectionCatalog` and check if it matches the one obtained + earlier. If not, we need to retry this. Otherwise we'd have a `CollectionCatalog` that's + inconsistent with the opened snapshot. ## Collection Catalog and Multi-document Transactions -* When we start the transaction we open a storage snapshot and stash a CollectionCatalog instance - similar to a regular lock-free read (but holding the RSTL as opposed to lock-free reads). -* User reads within this transaction lock the namespace and ensures we have a Collection instance - consistent with the snapshot (same as above). -* User writes do an additional step after locking to check if the collection instance obtained is - the latest instance in the CollectionCatalog, if it is not we treat this as a WriteConflict so the - transaction is retried. + +- When we start the transaction we open a storage snapshot and stash a CollectionCatalog instance + similar to a regular lock-free read (but holding the RSTL as opposed to lock-free reads). +- User reads within this transaction lock the namespace and ensures we have a Collection instance + consistent with the snapshot (same as above). +- User writes do an additional step after locking to check if the collection instance obtained is + the latest instance in the CollectionCatalog, if it is not we treat this as a WriteConflict so the + transaction is retried. The `CollectionCatalog` contains a mapping of `Namespace` and `UUID` to the `catalogId` for timestamps back to the oldest timestamp. These are used for efficient lookups into the durable @@ -550,34 +563,34 @@ and prevents the reaper from dropping the collection and index tables during the _Code spelunking starting points:_ -* [_The KVDropPendingIdentReaper - class_](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/kv/kv_drop_pending_ident_reaper.h) - * Handles the second phase of collection/index drop. Runs when notified. -* [_The TimestampMonitor and TimestampListener - classes_](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/storage_engine_impl.h#L178-L313) - * The TimestampMonitor starts a periodic job to notify the reaper of the latest timestamp that is - okay to reap. -* [_Code that signals the reaper with a - timestamp_](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/storage_engine_impl.cpp#L932-L949) +- [_The KVDropPendingIdentReaper + class_](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/kv/kv_drop_pending_ident_reaper.h) + - Handles the second phase of collection/index drop. Runs when notified. 
+- [_The TimestampMonitor and TimestampListener + classes_](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/storage_engine_impl.h#L178-L313) + - The TimestampMonitor starts a periodic job to notify the reaper of the latest timestamp that is + okay to reap. +- [_Code that signals the reaper with a + timestamp_](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/storage_engine_impl.cpp#L932-L949) # Storage Transactions Through the pluggable [storage engine API](https://github.com/mongodb/mongo/blob/master/src/mongo/db/storage/README.md), MongoDB executes reads and writes on its storage engine -with [snapshot isolation](#glossary). The structure used to achieve this is the [RecoveryUnit +with [snapshot isolation](#glossary). The structure used to achieve this is the [RecoveryUnit class](../storage/recovery_unit.h). ## RecoveryUnit Each pluggable storage engine for MongoDB must implement `RecoveryUnit` as one of the base classes -for the storage engine API. Typically, storage engines satisfy the `RecoveryUnit` requirements with +for the storage engine API. Typically, storage engines satisfy the `RecoveryUnit` requirements with some form of [snapshot isolation](#glossary) with [transactions](#glossary). Such transactions are called storage transactions elsewhere in this document, to differentiate them from the higher-level -_multi-document transactions_ accessible to users of MongoDB. The RecoveryUnit controls what -[snapshot](#glossary) a storage engine transaction uses for its reads. In MongoDB, a snapshot is defined by a +_multi-document transactions_ accessible to users of MongoDB. The RecoveryUnit controls what +[snapshot](#glossary) a storage engine transaction uses for its reads. In MongoDB, a snapshot is defined by a _timestamp_. A snapshot consists of all data committed with a timestamp less than or equal to the -snapshot's timestamp. No uncommitted data is visible in a snapshot, and data changes in storage +snapshot's timestamp. No uncommitted data is visible in a snapshot, and data changes in storage transactions that commit after a snapshot is created, regardless of their timestamps, are also not -visible. Generally, one uses a `RecoveryUnit` to perform transactional reads and writes by first +visible. Generally, one uses a `RecoveryUnit` to perform transactional reads and writes by first configuring the `RecoveryUnit` with the desired [ReadSource](https://github.com/mongodb/mongo/blob/b2c1fa4f121fdb6cdffa924b802271d68c3367a3/src/mongo/db/storage/recovery_unit.h#L391-L421) and then performing the reads and writes using operations on `RecordStore` or `SortedDataInterface`, @@ -586,32 +599,32 @@ and finally calling `commit()` on the `WriteUnitOfWork` (if performing writes). ## WriteUnitOfWork A `WriteUnitOfWork` is the mechanism to control how writes are transactionally performed on the -storage engine. All the writes (and reads) performed within its scope are part of the same storage -transaction. After all writes have been staged, one must call `commit()` in order to atomically -commit the transaction to the storage engine. It is illegal to perform writes outside the scope of -a WriteUnitOfWork since there would be no way to commit them. If the `WriteUnitOfWork` falls out of +storage engine. All the writes (and reads) performed within its scope are part of the same storage +transaction. After all writes have been staged, one must call `commit()` in order to atomically +commit the transaction to the storage engine. 
It is illegal to perform writes outside the scope of +a WriteUnitOfWork since there would be no way to commit them. If the `WriteUnitOfWork` falls out of scope before `commit()` is called, the storage transaction is rolled back and all the staged writes -are lost. Reads can be performed outside of a `WriteUnitOfWork` block; storage transactions outside +are lost. Reads can be performed outside of a `WriteUnitOfWork` block; storage transactions outside of a `WriteUnitOfWork` are always rolled back, since there are no writes to commit. ## Lazy initialization of storage transactions Note that storage transactions on WiredTiger are not started at the beginning of a `WriteUnitOfWork` -block. Instead, the transaction is started implicitly with the first read or write operation. To +block. Instead, the transaction is started implicitly with the first read or write operation. To explicitly start a transaction, one can use `RecoveryUnit::preallocateSnapshot()`. ## Changes -One can register a `Change` on a `RecoveryUnit` while in a `WriteUnitOfWork`. This allows extra -actions to be performed based on whether a `WriteUnitOfWork` commits or rolls back. These actions +One can register a `Change` on a `RecoveryUnit` while in a `WriteUnitOfWork`. This allows extra +actions to be performed based on whether a `WriteUnitOfWork` commits or rolls back. These actions will typically update in-memory state to match what was written in the storage transaction, in a -transactional way. Note that `Change`s are not executed until the destruction of the -`WriteUnitOfWork`, which can be long after the storage engine committed. Two-phase locking ensures +transactional way. Note that `Change`s are not executed until the destruction of the +`WriteUnitOfWork`, which can be long after the storage engine committed. Two-phase locking ensures that all locks are held while a Change's `commit()` or `rollback()` function runs. ## StorageUnavailableException -`StorageUnavailableException` indicates that a storage transaction rolled back due to +`StorageUnavailableException` indicates that a storage transaction rolled back due to resource contention in the storage engine. This exception is the base of exceptions related to concurrency (`WriteConflict`) and to those related to cache pressure (`TemporarilyUnavailable` and `TransactionTooLargeForCache`). @@ -675,13 +688,11 @@ TransactionTooLargeForCacheException is always converted to a WriteConflictExcep faster, to avoid stalling replication longer than necessary. Prior to 6.3, or when TransactionTooLargeForCacheException is disabled, multi-document -transactions always return a WriteConflictException, which may result in drivers retrying an +transactions always return a WriteConflictException, which may result in drivers retrying an operation indefinitely. For non-multi-document operations, there is a limited number of retries on TemporarilyUnavailableException, but it might still be beneficial to not retry operations which are unlikely to complete and are disruptive for concurrent operations. 
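For a concrete picture of the staging, commit, and retry behavior described above, the following C++ sketch shows the common pattern: stage writes inside a `WriteUnitOfWork`, call `commit()` explicitly, and re-run the whole unit when the storage engine reports a conflict. The `writeConflictRetry` helper name and the header paths are assumptions (such a helper exists in the server, but its exact location and signature have moved between versions), so treat this as illustrative rather than canonical.

```cpp
// Hedged sketch of the staging/commit/retry pattern described above. The
// writeConflictRetry helper and header paths are assumed names; WriteUnitOfWork,
// Change, WriteConflict, and StorageUnavailableException are the concepts from
// the text.
#include "mongo/db/concurrency/exception_util.h"  // assumed home of writeConflictRetry
#include "mongo/db/namespace_string.h"
#include "mongo/db/operation_context.h"
#include "mongo/db/storage/write_unit_of_work.h"

void sketchWrite(mongo::OperationContext* opCtx, const mongo::NamespaceString& nss) {
    // On a WriteConflict the callback is rolled back and re-run; other
    // StorageUnavailableException subtypes get a bounded number of retries.
    mongo::writeConflictRetry(opCtx, "sketchWrite", nss, [&] {
        mongo::WriteUnitOfWork wuow(opCtx);  // storage txn starts lazily on first read/write

        // ... stage writes through RecordStore / SortedDataInterface here ...
        // ... optionally register a Change on the RecoveryUnit so in-memory
        //     state is updated only if the storage transaction commits ...

        wuow.commit();  // leaving scope without commit() rolls back all staged writes
    });
}
```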
- - # Read Operations External reads via the find, count, distint, aggregation and mapReduce cmds do not take collection @@ -700,7 +711,7 @@ See [WiredTigerCursor](https://github.com/mongodb/mongo/blob/r4.4.0-rc13/src/mongo/db/storage/wiredtiger/wiredtiger_cursor.cpp#L48), [WiredTigerRecoveryUnit::getSession()](https://github.com/mongodb/mongo/blob/r4.4.0-rc13/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp#L303-L305), [~GlobalLock](https://github.com/mongodb/mongo/blob/r4.4.0-rc13/src/mongo/db/concurrency/d_concurrency.h#L228-L239), -[PlanYieldPolicy::_yieldAllLocks()](https://github.com/mongodb/mongo/blob/r4.4.0-rc13/src/mongo/db/query/plan_yield_policy.cpp#L182), +[PlanYieldPolicy::\_yieldAllLocks()](https://github.com/mongodb/mongo/blob/r4.4.0-rc13/src/mongo/db/query/plan_yield_policy.cpp#L182), [RecoveryUnit::abandonSnapshot()](https://github.com/mongodb/mongo/blob/r4.4.0-rc13/src/mongo/db/storage/recovery_unit.h#L217). ## Collection Reads @@ -750,8 +761,9 @@ the state from changing. When setting up the read, `AutoGetCollectionForRead` will trigger the instantiation of a `Collection` object when either: -* Reading at an earlier time than the minimum valid snapshot of the matching `Collection` from the `CollectionCatalog`. -* No matching `Collection` is found in the `CollectionCatalog`. + +- Reading at an earlier time than the minimum valid snapshot of the matching `Collection` from the `CollectionCatalog`. +- No matching `Collection` is found in the `CollectionCatalog`. In versions earlier than v7.0 this would error with `SnapshotUnavailable`. @@ -779,9 +791,9 @@ access to any collection. _Code spelunking starting points:_ -* [_AutoGetCollectionForReadLockFree preserves an immutable CollectionCatalog_](https://github.com/mongodb/mongo/blob/dcf844f384803441b5393664e500008fc6902346/src/mongo/db/db_raii.cpp#L141) -* [_AutoGetCollectionForReadLockFree returns early if already running lock-free_](https://github.com/mongodb/mongo/blob/dcf844f384803441b5393664e500008fc6902346/src/mongo/db/db_raii.cpp#L108-L112) -* [_The lock-free operation flag on the OperationContext_](https://github.com/mongodb/mongo/blob/dcf844f384803441b5393664e500008fc6902346/src/mongo/db/operation_context.h#L298-L300) +- [_AutoGetCollectionForReadLockFree preserves an immutable CollectionCatalog_](https://github.com/mongodb/mongo/blob/dcf844f384803441b5393664e500008fc6902346/src/mongo/db/db_raii.cpp#L141) +- [_AutoGetCollectionForReadLockFree returns early if already running lock-free_](https://github.com/mongodb/mongo/blob/dcf844f384803441b5393664e500008fc6902346/src/mongo/db/db_raii.cpp#L108-L112) +- [_The lock-free operation flag on the OperationContext_](https://github.com/mongodb/mongo/blob/dcf844f384803441b5393664e500008fc6902346/src/mongo/db/operation_context.h#L298-L300) ## Secondary Reads @@ -845,8 +857,6 @@ See and [WiredTigerRecordStore::insertRecords](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp#L1494). - - # Concurrency Control Theoretically, one could design a database that used only mutexes to maintain database consistency @@ -855,7 +865,7 @@ performance and would a be strain on the operating system. Therefore, databases complex method of coordinating operations. This design consists of Resources (lockable entities), some of which may be organized in a Hierarchy, and Locks (requests for access to a resource). A Lock Manager is responsible for keeping track of Resources and Locks, and for managing each Resource's -Lock Queue. 
The Lock Manager identifies Resources with a ResourceId. +Lock Queue. The Lock Manager identifies Resources with a ResourceId. ## Resource Hierarchy @@ -872,7 +882,7 @@ to be locked, one must first lock the one Global resource, and then lock the Dat is the parent of the Collection. Finally, the Collection resource is locked. In addition to these ResourceTypes, there also exists ResourceMutex, which is independent of this -hierarchy. One can use ResourceMutex instead of a regular mutex if one desires the features of the +hierarchy. One can use ResourceMutex instead of a regular mutex if one desires the features of the lock manager, such as fair queuing and the ability to have multiple simultaneous lock holders. ## Lock Modes @@ -882,12 +892,13 @@ Rather than the binary "locked-or-not" modes of a mutex, a MongoDB lock can have _modes_. Modes have different _compatibilities_ with other locks for the same resource. Locks with compatible modes can be simultaneously granted to the same resource, while locks with modes that are incompatible with any currently granted lock on a resource must wait in the waiting queue for that -resource until the conflicting granted locks are unlocked. The different types of modes are: +resource until the conflicting granted locks are unlocked. The different types of modes are: + 1. X (exclusive): Used to perform writes and reads on the resource. 2. S (shared): Used to perform only reads on the resource (thus, it is okay to Share with other compatible locks). 3. IX (intent-exclusive): Used to indicate that an X lock is taken at a level in the hierarchy below - this resource. This lock mode is used to block X or S locks on this resource. + this resource. This lock mode is used to block X or S locks on this resource. 4. IS (intent-shared): Used to indicate that an S lock is taken at a level in the hierarchy below this resource. This lock mode is used to block X locks on this resource. @@ -922,7 +933,7 @@ More information on the RSTL is contained in the [Replication Architecture Guide ### Global Lock -The resource known as the Global Lock is of ResourceType Global. It is currently used to +The resource known as the Global Lock is of ResourceType Global. It is currently used to synchronize shutdown, so that all operations are finished with the storage engine before closing it. Certain types of global storage engine operations, such as recoverToStableTimestamp(), also require this lock to be held in exclusive mode. @@ -938,7 +949,7 @@ or writes (IX) at the database or lower level. ### Database Lock Any resource of ResourceType Database protects certain database-wide operations such as database -drop. These operations are being phased out, in the hopes that we can eliminate this ResourceType +drop. These operations are being phased out, in the hopes that we can eliminate this ResourceType completely. ### Collection Lock @@ -946,12 +957,12 @@ completely. Any resource of ResourceType Collection protects certain collection-wide operations, and in some cases also protects the in-memory catalog structure consistency in the face of concurrent readers and writers of the catalog. Acquiring this resource with an intent lock is an indication that the -operation is doing explicit reads (IS) or writes (IX) at the document level. There is no Document +operation is doing explicit reads (IS) or writes (IX) at the document level. There is no Document ResourceType, as locking at this level is done in the storage engine itself for efficiency reasons. 
### Document Level Concurrency Control -Each storage engine is responsible for locking at the document level. The WiredTiger storage engine +Each storage engine is responsible for locking at the document level. The WiredTiger storage engine uses MVCC [multiversion concurrency control][] along with optimistic locking in order to provide concurrency guarantees. @@ -959,15 +970,14 @@ concurrency guarantees. The lock manager automatically provides _two-phase locking_ for a given storage transaction. Two-phase locking consists of an Expanding phase where locks are acquired but not released, and a -subsequent Shrinking phase where locks are released but not acquired. By adhering to this protocol, +subsequent Shrinking phase where locks are released but not acquired. By adhering to this protocol, a transaction will be guaranteed to be serializable with other concurrent transactions. The WriteUnitOfWork class manages two-phase locking in MongoDB. This results in the somewhat unexpected behavior of the RAII locking types acquiring locks on resources upon their construction but not unlocking the lock upon their destruction when going out of scope. Instead, the responsibility of -unlocking the locks is transferred to the WriteUnitOfWork destructor. Note this is only true for +unlocking the locks is transferred to the WriteUnitOfWork destructor. Note this is only true for transactions that do writes, and therefore only for code that uses WriteUnitOfWork. - # Indexes An index is a storage engine data structure that provides efficient lookup on fields in a @@ -975,7 +985,7 @@ collection's data set. Indexes map document fields, keys, to documents such that scan is not required when querying on a specific field. All user collections have a unique index on the `_id` field, which is required. The oplog and some -system collections do not have an _id index. +system collections do not have an \_id index. Also see [MongoDB Manual - Indexes](https://docs.mongodb.com/manual/indexes/). @@ -987,11 +997,12 @@ A unique index maintains a constraint such that duplicate values are not allowed field(s). To convert a regular index to unique, one has to follow the two-step process: - * The index has to be first set to `prepareUnique` state using `collMod` command with the index - option `prepareUnique: true`. In this state, the index will start rejecting writes introducing - duplicate keys. - * The `collMod` command with the index option `unique: true` will then check for the uniqueness - constraint and finally update the index spec in the catalog under a collection `MODE_X` lock. + +- The index has to be first set to `prepareUnique` state using `collMod` command with the index + option `prepareUnique: true`. In this state, the index will start rejecting writes introducing + duplicate keys. +- The `collMod` command with the index option `unique: true` will then check for the uniqueness + constraint and finally update the index spec in the catalog under a collection `MODE_X` lock. If the index already has duplicate keys, the conversion in step two will fail and return all violating documents' ids grouped by the keys. Step two can be retried to finish the conversion after @@ -1014,7 +1025,7 @@ index in the durable catalog entry for the collection. Since this catalog entry across the entire collection, allowing any writer to modify the catalog entry would result in excessive WriteConflictExceptions for other writers. 
-To solve this problem, the multikey state is tracked in memory, and only persisted when it changes +To solve this problem, the multikey state is tracked in memory, and only persisted when it changes to `true`. Once `true`, an index is always multikey. See @@ -1029,20 +1040,20 @@ index must correctly map keys to all documents. At a high level, omitting details that will be elaborated upon in further sections, index builds have the following procedure: -* While holding a collection X lock, write a new index entry to the array of indexes included as - part of a durable catalog entry. This entry has a `ready: false` component. See [Durable - Catalog](#durable-catalog). -* Downgrade to a collection IX lock. -* Scan all documents on the collection to be indexed - * Generate [KeyString](#keystring) keys for the indexed fields for each document - * Periodically yield locks and storage engine snapshots - * Insert the generated keys into the [external sorter](#the-external-sorter) -* Read the sorted keys from the external sorter and [bulk + +- While holding a collection X lock, write a new index entry to the array of indexes included as + part of a durable catalog entry. This entry has a `ready: false` component. See [Durable + Catalog](#durable-catalog). +- Downgrade to a collection IX lock. +- Scan all documents on the collection to be indexed + - Generate [KeyString](#keystring) keys for the indexed fields for each document + - Periodically yield locks and storage engine snapshots + - Insert the generated keys into the [external sorter](#the-external-sorter) +- Read the sorted keys from the external sorter and [bulk load](http://source.wiredtiger.com/3.2.1/tune_bulk_load.html) into the storage engine index. Bulk-loading requires keys to be inserted in sorted order, but builds a B-tree structure that is more efficiently filled than with random insertion. -* While holding a collection X lock, make a final `ready: true` write to the durable catalog. - +- While holding a collection X lock, make a final `ready: true` write to the durable catalog. ## Hybrid Index Builds @@ -1065,11 +1076,12 @@ a deletion of the key `1` and an insertion of the key `2`. Once the collection scan and bulk-load phases of the index build are complete, these intercepted keys are applied directly to the index in three phases: -* While holding a collection IX lock to allow concurrent reads and writes - * Because writes are still accepted, new keys may appear at the end of the _side-writes_ table. - They will be applied in subsequent steps. -* While holding a collection S lock to block concurrent writes, but not reads -* While holding a collection X lock to block all reads and writes + +- While holding a collection IX lock to allow concurrent reads and writes + - Because writes are still accepted, new keys may appear at the end of the _side-writes_ table. + They will be applied in subsequent steps. +- While holding a collection S lock to block concurrent writes, but not reads +- While holding a collection X lock to block all reads and writes See [IndexBuildInterceptor::sideWrite](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/index/index_build_interceptor.cpp#L403) @@ -1139,13 +1151,13 @@ Manual](https://docs.mongodb.com/master/core/index-creation/#index-builds-in-rep Server 7.1 introduces the following improvements: -* Index builds abort immediately after detecting errors other than duplicate key -violations. Before 7.1, index builds aborted the index build close to -completion, potentially long after detection. 
-* A secondary member can abort a two-phase index build. Before 7.1, a secondary was forced -to crash instead. See the [Voting for Abort](#voting-for-abort) section. -* Index builds are cancelled if there isn't enough storage space available. See the - [Disk Space](#disk-space) section. +- Index builds abort immediately after detecting errors other than duplicate key + violations. Before 7.1, index builds aborted the index build close to + completion, potentially long after detection. +- A secondary member can abort a two-phase index build. Before 7.1, a secondary was forced + to crash instead. See the [Voting for Abort](#voting-for-abort) section. +- Index builds are cancelled if there isn't enough storage space available. See the + [Disk Space](#disk-space) section. ### Commit Quorum @@ -1183,7 +1195,7 @@ The `commitQuorum` for a running index build may be changed by the user via the server command. See -[IndexBuildsCoordinator::_waitForNextIndexBuildActionAndCommit](https://github.com/mongodb/mongo/blob/r4.4.0-rc9/src/mongo/db/index_builds_coordinator_mongod.cpp#L632). +[IndexBuildsCoordinator::\_waitForNextIndexBuildActionAndCommit](https://github.com/mongodb/mongo/blob/r4.4.0-rc9/src/mongo/db/index_builds_coordinator_mongod.cpp#L632). ### Voting for Abort @@ -1211,15 +1223,16 @@ which defaults to 500MB. On clean shutdown, index builds save their progress in internal idents that will be used for resuming the index builds when the server starts up. The persisted information includes: -* [Phase of the index build](https://github.com/mongodb/mongo/blob/0d45dd9d7ba9d3a1557217a998ad31c68a897d47/src/mongo/db/resumable_index_builds.idl#L43) when it was interrupted for shutdown: - * initialized - * collection scan - * bulk load - * drain writes -* Information relevant to the phase for reconstructing the internal state of the index build at - startup. This may include: - * The internal state of the external sorter. - * Idents for side writes, duplicate keys, and skipped records. + +- [Phase of the index build](https://github.com/mongodb/mongo/blob/0d45dd9d7ba9d3a1557217a998ad31c68a897d47/src/mongo/db/resumable_index_builds.idl#L43) when it was interrupted for shutdown: + - initialized + - collection scan + - bulk load + - drain writes +- Information relevant to the phase for reconstructing the internal state of the index build at + startup. This may include: + - The internal state of the external sorter. + - Idents for side writes, duplicate keys, and skipped records. During [startup recovery](#startup-recovery), the persisted information is used to reconstruct the in-memory state for the index build and resume from the phase that we left off in. If we fail to @@ -1228,10 +1241,11 @@ resume the index build for whatever reason, the index build will restart from th Not all incomplete index builds are resumable upon restart. The current criteria for index build resumability can be found in [IndexBuildsCoordinator::isIndexBuildResumable()](https://github.com/mongodb/mongo/blob/0d45dd9d7ba9d3a1557217a998ad31c68a897d47/src/mongo/db/index_builds_coordinator.cpp#L375). Generally, index builds are resumable under the following conditions: -* Storage engine is configured to be persistent with encryption disabled. -* The index build is running on a voting member of the replica set with the default [commit quorum](#commit-quorum) - `"votingMembers"`. -* Majority read concern is enabled. + +- Storage engine is configured to be persistent with encryption disabled. 
+- The index build is running on a voting member of the replica set with the default [commit quorum](#commit-quorum) + `"votingMembers"`. +- Majority read concern is enabled. The [Recover To A Timestamp (RTT) rollback algorithm](https://github.com/mongodb/mongo/blob/04b12743cbdcfea11b339e6ad21fc24dec8f6539/src/mongo/db/repl/README.md#rollback) supports resuming index builds interrupted at any phase. On entering rollback, the resumable @@ -1244,9 +1258,9 @@ collection scan phase. Index builds wait for the majority commit point to advanc the collection scan. The majority wait happens after installing the [side table for intercepting new writes](#temporary-side-table-for-new-writes). -See [MultiIndexBlock::_constructStateObject()](https://github.com/mongodb/mongo/blob/0d45dd9d7ba9d3a1557217a998ad31c68a897d47/src/mongo/db/catalog/multi_index_block.cpp#L900) +See [MultiIndexBlock::\_constructStateObject()](https://github.com/mongodb/mongo/blob/0d45dd9d7ba9d3a1557217a998ad31c68a897d47/src/mongo/db/catalog/multi_index_block.cpp#L900) for where we persist the relevant information necessary to resume the index build at shutdown -and [StorageEngineImpl::_handleInternalIdents()](https://github.com/mongodb/mongo/blob/0d45dd9d7ba9d3a1557217a998ad31c68a897d47/src/mongo/db/storage/storage_engine_impl.cpp#L329) +and [StorageEngineImpl::\_handleInternalIdents()](https://github.com/mongodb/mongo/blob/0d45dd9d7ba9d3a1557217a998ad31c68a897d47/src/mongo/db/storage/storage_engine_impl.cpp#L329) for where we search for and parse the resume information on startup. ## Single-Phase Index Builds @@ -1275,9 +1289,10 @@ thousands or millions of key-value pairs requires dozens of such comparisons. To make these comparisons fast, there exists a 1:1 mapping between `BSONObj` and `KeyString`, where `KeyString` is [binary comparable](#glossary). So, for a transformation function `t` converting `BSONObj` to `KeyString` and two `BSONObj` values `x` and `y`, the following holds: -* `x < y` ⇔ `memcmp(t(x),t(y)) < 0` -* `x > y` ⇔ `memcmp(t(x),t(y)) > 0` -* `x = y` ⇔ `memcmp(t(x),t(y)) = 0` + +- `x < y` ⇔ `memcmp(t(x),t(y)) < 0` +- `x > y` ⇔ `memcmp(t(x),t(y)) > 0` +- `x = y` ⇔ `memcmp(t(x),t(y)) = 0` ## Ordering @@ -1302,13 +1317,13 @@ secondary oplog application and [initial sync][] where the uniqueness constraint temporarily. Indexes store key value pairs where the key is the `KeyString`. Current WiredTiger secondary unique indexes may have a mix of the old and new representations described below. -| Index type | (Key, Value) | Data Format Version | -| ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- | ------------------------------ | -| `_id` index | (`KeyString` without `RecordId`, `RecordId` and optionally `TypeBits`) | index V1: 6
index V2: 8 | -| non-unique index | (`KeyString` with `RecordId`, optionally `TypeBits`) | index V1: 6
index V2: 8 | -| unique secondary index created before 4.2 | (`KeyString` without `RecordId`, `RecordId` and optionally `TypeBits`) | index V1: 6
index V2: 8 | +| Index type | (Key, Value) | Data Format Version | +| --------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------ | +| `_id` index | (`KeyString` without `RecordId`, `RecordId` and optionally `TypeBits`) | index V1: 6
index V2: 8 | +| non-unique index | (`KeyString` with `RecordId`, optionally `TypeBits`) | index V1: 6
index V2: 8 | +| unique secondary index created before 4.2 | (`KeyString` without `RecordId`, `RecordId` and optionally `TypeBits`) | index V1: 6
index V2: 8 | | unique secondary index created in 4.2 OR after upgrading to 4.2 | New keys: (`KeyString` with `RecordId`, optionally `TypeBits`)
Old keys:(`KeyString` without `RecordId`, `RecordId` and optionally `TypeBits`) | index V1: 11
index V2: 12 | -| unique secondary index created in 6.0 or later | (`KeyString` with `RecordId`, optionally `TypeBits`) | index V1: 13
index V2: 14 | +| unique secondary index created in 6.0 or later | (`KeyString` with `RecordId`, optionally `TypeBits`) | index V1: 13
index V2: 14 | The reason for the change in index format is that the secondary key uniqueness property can be temporarily violated during oplog application (because operations may be applied out of order). @@ -1325,16 +1340,17 @@ validation to check if there are keys in the old format in unique secondary inde ## Building KeyString values and passing them around There are three kinds of builders for constructing `KeyString` values: -* `key_string::Builder`: starts building using a small allocation on the stack, and - dynamically switches to allocating memory from the heap. This is generally preferable if the value - is only needed in the scope where it was created. -* `key_string::HeapBuilder`: always builds using dynamic memory allocation. This has advantage that - calling the `release` method can transfer ownership of the memory without copying. -* `key_string::PooledBuilder`: This class allow building many `KeyString` values tightly packed into - larger blocks. The advantage is fewer, larger memory allocations and no wasted space due to - internal fragmentation. This is a good approach when a large number of values is needed, such as - for index building. However, memory for a block is only released after _no_ references to that - block remain. + +- `key_string::Builder`: starts building using a small allocation on the stack, and + dynamically switches to allocating memory from the heap. This is generally preferable if the value + is only needed in the scope where it was created. +- `key_string::HeapBuilder`: always builds using dynamic memory allocation. This has advantage that + calling the `release` method can transfer ownership of the memory without copying. +- `key_string::PooledBuilder`: This class allow building many `KeyString` values tightly packed into + larger blocks. The advantage is fewer, larger memory allocations and no wasted space due to + internal fragmentation. This is a good approach when a large number of values is needed, such as + for index building. However, memory for a block is only released after _no_ references to that + block remain. The `key_string::Value` class holds a reference to a `SharedBufferFragment` with the `KeyString` and its `TypeBits` if any and can be used for passing around values. @@ -1361,7 +1377,7 @@ regulate when to write a chunk of sorted data out to disk in a temporary file. _Code spelunking starting points:_ -* [_The External Sorter Classes_](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/sorter/sorter.h) +- [_The External Sorter Classes_](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/sorter/sorter.h) # The TTLMonitor @@ -1370,18 +1386,20 @@ The TTLMonitor runs as a background job on each mongod. On a mongod primary, the The TTLMonitor exhibits different behavior pending on whether batched deletes are enabled. When enabled (the default), the TTLMonitor batches TTL deletions and also removes expired documents more fairly among TTL indexes. When disabled, the TTLMonitor falls back to legacy, doc-by-doc deletions and deletes all expired documents from a single TTL index before moving to the next one. The legacy behavior can lead to the TTLMonitor getting "stuck" deleting large ranges of documents on a single TTL index, starving other indexes of deletes at regular intervals. 
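The difference between the legacy and batched behavior is essentially a scheduling change: rather than draining one TTL index completely, each subpass visits the indexes round-robin with a per-index batch limit. The sketch below is a simplified, standalone illustration of that loop under an assumed per-index document target; the index names and types are made up, and the hard-coded limit is only a stand-in for the real server parameters described in the next section.

```cpp
// Illustrative only: a simplified round-robin subpass over TTL indexes, showing
// why batched deletion avoids starving any one index. Not the TTLMonitor
// implementation; the real targets and bookkeeping live in ttl.idl/ttl.cpp.
#include <algorithm>
#include <cstdio>
#include <deque>
#include <string>
#include <vector>

struct TTLIndex {
    std::string name;  // hypothetical index name, for the example only
    int expiredDocs;   // documents currently eligible for deletion
};

// Delete up to perIndexTarget documents from one index; returns true if the
// index still has expired documents and should be revisited in this subpass.
bool deleteBatch(TTLIndex& idx, int perIndexTarget) {
    int removed = std::min(idx.expiredDocs, perIndexTarget);
    idx.expiredDocs -= removed;
    std::printf("deleted %d from %s (remaining %d)\n", removed, idx.name.c_str(), idx.expiredDocs);
    return idx.expiredDocs > 0;
}

// One subpass: visit every TTL index round-robin until nothing is left.
// A real subpass would also stop at a time target and start a new subpass.
void runSubpass(std::vector<TTLIndex>& indexes, int perIndexTarget) {
    std::deque<TTLIndex*> queue;
    for (auto& idx : indexes) {
        if (idx.expiredDocs > 0) queue.push_back(&idx);
    }
    while (!queue.empty()) {
        TTLIndex* idx = queue.front();
        queue.pop_front();
        if (deleteBatch(*idx, perIndexTarget)) {
            queue.push_back(idx);  // revisit later in the same subpass
        }
    }
}

int main() {
    std::vector<TTLIndex> indexes = {{"logs.ttl_1", 2500}, {"sessions.ttl_1", 300}};
    runSubpass(indexes, /*perIndexTarget=*/1000);
    return 0;
}
```

With a per-index target in place, the small index in the example is fully drained on its first visit even though the larger index still has work outstanding, instead of waiting behind it as in the legacy doc-by-doc behavior.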
### Fair TTL Deletion + If ['ttlMonitorBatchDeletes'](https://github.com/mongodb/mongo/blob/d88a892d5b18035bd0f5393a42690e705c2007d7/src/mongo/db/ttl.idl#L48) is specified, the TTLMonitor will batch deletes and provides fair TTL deletion as follows: -* The TTL pass consists of one or more subpasses. -* Each subpass refreshes its view of TTL indexes in the system. It removes documents on each TTL index in a round-robin fashion until there are no more expired documents or ['ttlMonitorSubPassTargetSecs'](https://github.com/mongodb/mongo/blob/d88a892d5b18035bd0f5393a42690e705c2007d7/src/mongo/db/ttl.idl#L58) is reached. - * The delete on each TTL index removes up to ['ttlIndexDeleteTargetDocs'](https://github.com/mongodb/mongo/blob/d88a892d5b18035bd0f5393a42690e705c2007d7/src/mongo/db/ttl.idl#L84) or runs up to ['ttlIndexDeleteTargetTimeMS'](https://github.com/mongodb/mongo/blob/d88a892d5b18035bd0f5393a42690e705c2007d7/src/mongo/db/ttl.idl#L72), whichever target is met first. The same TTL index can be queued up to be revisited in the same subpass if there are outstanding deletions. - * A TTL index is not visited any longer in a subpass once all documents are deleted. -* If there are outstanding deletions by the end of the subpass for any TTL index, a new subpass starts immediately within the same pass. + +- The TTL pass consists of one or more subpasses. +- Each subpass refreshes its view of TTL indexes in the system. It removes documents on each TTL index in a round-robin fashion until there are no more expired documents or ['ttlMonitorSubPassTargetSecs'](https://github.com/mongodb/mongo/blob/d88a892d5b18035bd0f5393a42690e705c2007d7/src/mongo/db/ttl.idl#L58) is reached. + - The delete on each TTL index removes up to ['ttlIndexDeleteTargetDocs'](https://github.com/mongodb/mongo/blob/d88a892d5b18035bd0f5393a42690e705c2007d7/src/mongo/db/ttl.idl#L84) or runs up to ['ttlIndexDeleteTargetTimeMS'](https://github.com/mongodb/mongo/blob/d88a892d5b18035bd0f5393a42690e705c2007d7/src/mongo/db/ttl.idl#L72), whichever target is met first. The same TTL index can be queued up to be revisited in the same subpass if there are outstanding deletions. + - A TTL index is not visited any longer in a subpass once all documents are deleted. +- If there are outstanding deletions by the end of the subpass for any TTL index, a new subpass starts immediately within the same pass. _Code spelunking starting points:_ -* [_The TTLMonitor Class_](https://github.com/mongodb/mongo/blob/d88a892d5b18035bd0f5393a42690e705c2007d7/src/mongo/db/ttl.h) -* [_The TTLCollectionCache Class_](https://github.com/mongodb/mongo/blob/d88a892d5b18035bd0f5393a42690e705c2007d7/src/mongo/db/ttl_collection_cache.h) -* [_ttl.idl_](https://github.com/mongodb/mongo/blob/d88a892d5b18035bd0f5393a42690e705c2007d7/src/mongo/db/ttl.idl) +- [_The TTLMonitor Class_](https://github.com/mongodb/mongo/blob/d88a892d5b18035bd0f5393a42690e705c2007d7/src/mongo/db/ttl.h) +- [_The TTLCollectionCache Class_](https://github.com/mongodb/mongo/blob/d88a892d5b18035bd0f5393a42690e705c2007d7/src/mongo/db/ttl_collection_cache.h) +- [_ttl.idl_](https://github.com/mongodb/mongo/blob/d88a892d5b18035bd0f5393a42690e705c2007d7/src/mongo/db/ttl.idl) # Repair @@ -1392,30 +1410,30 @@ power outages. MongoDB provides a command-line `--repair` utility that attempts to recover as much data as possible from an installation that fails to start up due to data corruption. 
-- [Types of Corruption](#types-of-corruption) -- [Repair Procedure](#repair-procedure) +- [Types of Corruption](#types-of-corruption) +- [Repair Procedure](#repair-procedure) ## Types of Corruption MongoDB repair attempts to address the following forms of corruption: -* Corrupt WiredTiger data files - * Includes all collections, `_mdb_catalog`, and `sizeStorer` -* Missing WiredTiger data files - * Includes all collections, `_mdb_catalog`, and `sizeStorer` -* Index inconsistencies - * Validate [repair mode](#repair-mode) attempts to fix index inconsistencies to avoid a full index - rebuild. - * Indexes are rebuilt on collections after they have been salvaged or if they fail validation and - validate repair mode is unable to fix all errors. -* Unsalvageable collection data files -* Corrupt metadata - * `WiredTiger.wt`, `WiredTiger.turtle`, and WT journal files -* “Orphaned” data files - * Collection files missing from the `WiredTiger.wt` metadata - * Collection files missing from the `_mdb_catalog` table - * We cannot support restoring orphaned files that are missing from both metadata sources -* Missing `featureCompatibilityVersion` document +- Corrupt WiredTiger data files + - Includes all collections, `_mdb_catalog`, and `sizeStorer` +- Missing WiredTiger data files + - Includes all collections, `_mdb_catalog`, and `sizeStorer` +- Index inconsistencies + - Validate [repair mode](#repair-mode) attempts to fix index inconsistencies to avoid a full index + rebuild. + - Indexes are rebuilt on collections after they have been salvaged or if they fail validation and + validate repair mode is unable to fix all errors. +- Unsalvageable collection data files +- Corrupt metadata + - `WiredTiger.wt`, `WiredTiger.turtle`, and WT journal files +- “Orphaned” data files + - Collection files missing from the `WiredTiger.wt` metadata + - Collection files missing from the `_mdb_catalog` table + - We cannot support restoring orphaned files that are missing from both metadata sources +- Missing `featureCompatibilityVersion` document ## Repair Procedure @@ -1427,11 +1445,11 @@ MongoDB repair attempts to address the following forms of corruption: 2. Initialize the StorageEngine and [salvage the `_mdb_catalog` table, if needed](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/storage_engine_impl.cpp#L95). 3. Recover orphaned collections. - * If an [ident](#glossary) is known to WiredTiger but is not present in the `_mdb_catalog`, + - If an [ident](#glossary) is known to WiredTiger but is not present in the `_mdb_catalog`, [create a new collection](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/storage_engine_impl.cpp#L145-L189) with the prefix `local.orphan.` that references this ident. - * If an ident is present in the `_mdb_catalog` but not known to WiredTiger, [attempt to recover + - If an ident is present in the `_mdb_catalog` but not known to WiredTiger, [attempt to recover the ident](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/storage_engine_impl.cpp#L197-L229). This [procedure for orphan @@ -1444,28 +1462,28 @@ MongoDB repair attempts to address the following forms of corruption: 4. [Verify collection data files](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp#L1195-L1226), and salvage if necessary. 
- * If call to WiredTiger - [verify()](https://source.wiredtiger.com/develop/struct_w_t___s_e_s_s_i_o_n.html#a0334da4c85fe8af4197c9a7de27467d3) - fails, call - [salvage()](https://source.wiredtiger.com/develop/struct_w_t___s_e_s_s_i_o_n.html#ab3399430e474f7005bd5ea20e6ec7a8e), - which recovers as much data from a WT data file as possible. - * If a salvage is unsuccessful, rename the data file with a `.corrupt` suffix. - * If a data file is missing or a salvage was unsuccessful, [drop the original table from the + - If call to WiredTiger + [verify()](https://source.wiredtiger.com/develop/struct_w_t___s_e_s_s_i_o_n.html#a0334da4c85fe8af4197c9a7de27467d3) + fails, call + [salvage()](https://source.wiredtiger.com/develop/struct_w_t___s_e_s_s_i_o_n.html#ab3399430e474f7005bd5ea20e6ec7a8e), + which recovers as much data from a WT data file as possible. + - If a salvage is unsuccessful, rename the data file with a `.corrupt` suffix. + - If a data file is missing or a salvage was unsuccessful, [drop the original table from the metadata, and create a new, empty table](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp#L1262-L1274) under the original name. This allows MongoDB to continue to start up despite present corruption. - * After any salvage operation, [all indexes are + - After any salvage operation, [all indexes are rebuilt](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/repair_database.cpp#L134-L149) for that collection. 5. Validate collection and index consistency - * [Collection validation](#collection-validation) checks for consistency between the collection + - [Collection validation](#collection-validation) checks for consistency between the collection and indexes. Validate repair mode attempts to fix any inconsistencies it finds. 6. Rebuild indexes - * If a collection's data has been salvaged or any index inconsistencies are not repairable by + - If a collection's data has been salvaged or any index inconsistencies are not repairable by validate repair mode, [all indexes are rebuilt](https://github.com/mongodb/mongo/blob/4406491b2b137984c2583db98068b7d18ea32171/src/mongo/db/repair.cpp#L273-L275). - * While a unique index is being rebuilt, if any documents are found to have duplicate keys, then + - While a unique index is being rebuilt, if any documents are found to have duplicate keys, then those documents are inserted into a lost and found collection with the format `local.lost_and_found.`. 7. [Invalidate the replica set @@ -1475,16 +1493,18 @@ MongoDB repair attempts to address the following forms of corruption: and threatening the consistency of its replica set. Additionally: -* When repair starts, it creates a temporary file, `_repair_incomplete` that is only removed when - repair completes. The server [will not start up - normally](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/storage_engine_init.cpp#L82-L86) - as long as this file is present. -* Repair [will restore a - missing](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/repair_database_and_check_version.cpp#L434) - `featureCompatibilityVersion` document in the `admin.system.version` to the lower FCV version - available. + +- When repair starts, it creates a temporary file, `_repair_incomplete` that is only removed when + repair completes. The server [will not start up + normally](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/storage_engine_init.cpp#L82-L86) + as long as this file is present. 
+- Repair [will restore a + missing](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/repair_database_and_check_version.cpp#L434) + `featureCompatibilityVersion` document in the `admin.system.version` to the lower FCV version + available. # Startup Recovery + There are three components to startup recovery. The first step, of course, is starting WiredTiger. WiredTiger will replay its log, if any, from a crash. While the WT log also contains entries that are specific to WT, most of its entries are to re-insert items into MongoDB's oplog @@ -1503,43 +1523,37 @@ their own WT table. [The appendix](#Collection-and-Index-to-Table-relationship) relationship between creating/dropping a collection and the underlying creation/deletion of a WT table which justifies the following logic. When reconciling, every WT table that is not "pointed to" by a MongoDB record store or index [gets -dropped](https://github.com/mongodb/mongo/blob/6c9adc9a2d518fa046c7739e043a568f9bee6931/src/mongo/db/storage/storage_engine_impl.cpp#L663-L676 -"Github"). A MongoDB record store that points to a WT table that doesn't exist is considered [a +dropped](https://github.com/mongodb/mongo/blob/6c9adc9a2d518fa046c7739e043a568f9bee6931/src/mongo/db/storage/storage_engine_impl.cpp#L663-L676 "Github"). A MongoDB record store that points to a WT table that doesn't exist is considered [a fatal -error](https://github.com/mongodb/mongo/blob/6c9adc9a2d518fa046c7739e043a568f9bee6931/src/mongo/db/storage/storage_engine_impl.cpp#L679-L693 -"Github"). An index that doesn't point to a WT table is [ignored and logged](https://github.com/mongodb/mongo/blob/6c9adc9a2d518fa046c7739e043a568f9bee6931/src/mongo/db/storage/storage_engine_impl.cpp#L734-L746 -"Github") because there are cetain cases where the catalog entry may reference an index ident which +error](https://github.com/mongodb/mongo/blob/6c9adc9a2d518fa046c7739e043a568f9bee6931/src/mongo/db/storage/storage_engine_impl.cpp#L679-L693 "Github"). An index that doesn't point to a WT table is [ignored and logged](https://github.com/mongodb/mongo/blob/6c9adc9a2d518fa046c7739e043a568f9bee6931/src/mongo/db/storage/storage_engine_impl.cpp#L734-L746 "Github") because there are cetain cases where the catalog entry may reference an index ident which is no longer present, such as when an unclean shutdown occurs before a checkpoint is taken during startup recovery. -The second step of recovering the catalog is [reconciling unfinished index builds](https://github.com/mongodb/mongo/blob/6c9adc9a2d518fa046c7739e043a568f9bee6931/src/mongo/db/storage/storage_engine_impl.cpp#L695-L699 -"Github"), that could have different outcomes: -* An [index build with a UUID](https://github.com/mongodb/mongo/blob/6c9adc9a2d518fa046c7739e043a568f9bee6931/src/mongo/db/storage/storage_engine_impl.cpp#L748-L751 "Github") -is an unfinished two-phase build and must be restarted, unless we are -[resuming it](#resumable-index-builds). This resume information is stored in an internal ident -written at (clean) shutdown. If we fail to resume the index build, we will clean up the internal -ident and restart the index build in the background. -* An [unfinished index build on standalone](https://github.com/mongodb/mongo/blob/6c9adc9a2d518fa046c7739e043a568f9bee6931/src/mongo/db/storage/storage_engine_impl.cpp#L792-L794 "Github") -will be discarded (no oplog entry was ever written saying the index exists). 
- +The second step of recovering the catalog is [reconciling unfinished index builds](https://github.com/mongodb/mongo/blob/6c9adc9a2d518fa046c7739e043a568f9bee6931/src/mongo/db/storage/storage_engine_impl.cpp#L695-L699 "Github"), that could have different outcomes: +- An [index build with a UUID](https://github.com/mongodb/mongo/blob/6c9adc9a2d518fa046c7739e043a568f9bee6931/src/mongo/db/storage/storage_engine_impl.cpp#L748-L751 "Github") + is an unfinished two-phase build and must be restarted, unless we are + [resuming it](#resumable-index-builds). This resume information is stored in an internal ident + written at (clean) shutdown. If we fail to resume the index build, we will clean up the internal + ident and restart the index build in the background. +- An [unfinished index build on standalone](https://github.com/mongodb/mongo/blob/6c9adc9a2d518fa046c7739e043a568f9bee6931/src/mongo/db/storage/storage_engine_impl.cpp#L792-L794 "Github") + will be discarded (no oplog entry was ever written saying the index exists). After storage completes its recovery, control is passed to [replication -recovery](https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/README.md#startup-recovery -"Github"). While storage recovery is responsible for recovering the oplog to meet durability +recovery](https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/README.md#startup-recovery "Github"). While storage recovery is responsible for recovering the oplog to meet durability guarantees and getting the two catalogs in sync, replication recovery takes responsibility for getting collection data in sync with the oplog. Replication starts replaying oplog from the `recovery_timestamp + 1`. When WiredTiger takes a checkpoint, it uses the [`stable_timestamp`](https://github.com/mongodb/mongo/blob/87de9a0cb1/src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp#L2011 "Github") (effectively a `read_timestamp`) for what data should be persisted in the -checkpoint. Every "data write" (collection/index contents, _mdb_catalog contents) corresponding to an oplog entry with a +checkpoint. Every "data write" (collection/index contents, \_mdb_catalog contents) corresponding to an oplog entry with a timestamp <= the `stable_timestamp` will be included in this checkpoint. None of the data writes later than the `stable_timestamp` are included in the checkpoint. When the checkpoint is completed, the `stable_timestamp` is known as the checkpoint's [`checkpoint_timestamp`](https://github.com/mongodb/mongo/blob/834a3c49d9ea9bfe2361650475158fc0dbb374cd/src/third_party/wiredtiger/src/meta/meta_ckpt.c#L921 "Github"). When WiredTiger starts up on a checkpoint, that checkpoint's timestamp is known as the -[`recovery_timestamp`](https://github.com/mongodb/mongo/blob/87de9a0cb1/src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp#L684 -"Github"). +[`recovery_timestamp`](https://github.com/mongodb/mongo/blob/87de9a0cb1/src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp#L684 "Github"). ## Recovery To A Stable Timestamp + Also known as rollback-to-stable, this is an operation that retains only modifications that are considered stable. In other words, we are rolling back to the latest checkpoint. @@ -1564,6 +1578,7 @@ See [here](https://github.com/mongodb/mongo/blob/5bd1d0880a7519e54678684b3d243f5 for more information on what happens in the replication layer during rollback-to-stable. # File-System Backups + Backups represent a full copy of the data files at a point-in-time. 
These copies of the data files can be used to recover data from a consistent state at an earlier time. This technique is commonly used after a disaster ensued in the database. @@ -1578,6 +1593,7 @@ backups in the case of data loss events. [Documentation for further reading.](https://docs.mongodb.com/manual/core/backups/) # Queryable Backup (Read-Only) + This is a feature provided by Ops Manager in which Ops Manager quickly and securely makes a given snapshot accessible over a MongoDB connection string. @@ -1612,13 +1628,14 @@ the data files. To avoid taking unnecessary checkpoints on an idle server, WiredTiger will only take checkpoints for the following scenarios: -* When the [stable timestamp](../repl/README.md#replication-timestamp-glossary) is greater than or - equal to the [initial data timestamp](../repl/README.md#replication-timestamp-glossary), we take a - stable checkpoint, which is a durable view of the data at a particular timestamp. This is for - steady-state replication. -* The [initial data timestamp](../repl/README.md#replication-timestamp-glossary) is not set, so we - must take a full checkpoint. This is when there is no consistent view of the data, such as during - initial sync. + +- When the [stable timestamp](../repl/README.md#replication-timestamp-glossary) is greater than or + equal to the [initial data timestamp](../repl/README.md#replication-timestamp-glossary), we take a + stable checkpoint, which is a durable view of the data at a particular timestamp. This is for + steady-state replication. +- The [initial data timestamp](../repl/README.md#replication-timestamp-glossary) is not set, so we + must take a full checkpoint. This is when there is no consistent view of the data, such as during + initial sync. Not only does checkpointing provide us with durability for the database, but it also enables us to take [backups of the data](#file-system-backups). @@ -1659,24 +1676,26 @@ apply (or not apply) T10 through T20. _Code spelunking starting points:_ -* [_The JournalFlusher class_](https://github.com/mongodb/mongo/blob/767494374cf12d76fc74911d1d0fcc2bbce0cd6b/src/mongo/db/storage/control/journal_flusher.h) - * Perioidically and upon request flushes the journal to disk. -* [_Code that ultimately calls flush journal on WiredTiger_](https://github.com/mongodb/mongo/blob/767494374cf12d76fc74911d1d0fcc2bbce0cd6b/src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp#L241-L362) - * Skips flushing if ephemeral mode engine; may do a journal flush or take a checkpoint depending - on server settings. -* [_Control of whether journaling is enabled_](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.h#L451) - * 'durable' confusingly means journaling is enabled. -* [_Whether WT journals a collection_](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp#L560-L580) +- [_The JournalFlusher class_](https://github.com/mongodb/mongo/blob/767494374cf12d76fc74911d1d0fcc2bbce0cd6b/src/mongo/db/storage/control/journal_flusher.h) + - Perioidically and upon request flushes the journal to disk. +- [_Code that ultimately calls flush journal on WiredTiger_](https://github.com/mongodb/mongo/blob/767494374cf12d76fc74911d1d0fcc2bbce0cd6b/src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp#L241-L362) + - Skips flushing if ephemeral mode engine; may do a journal flush or take a checkpoint depending + on server settings. 
+- [_Control of whether journaling is enabled_](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.h#L451) + - 'durable' confusingly means journaling is enabled. +- [_Whether WT journals a collection_](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp#L560-L580) # Global Lock Admission Control + There are 2 separate ticketing mechanisms placed in front of the global lock acquisition. Both aim to limit the number of concurrent operations from overwhelming the system. Before an operation can acquire the global lock, it must acquire a ticket from one, or both, of the ticketing mechanisms. When both ticket mechanisms are necessary, the acquisition order is as follows: + 1. Flow Control - Required only for global lock requests in MODE_IX 2. Execution Control - Required for all global lock requests - Flow Control is in place to prevent a majority of secondaries from falling behind in replication, whereas Execution Control aims to limit the number of concurrent storage engine transactions on a single node. ## Admission Priority + Associated with every operation is an admission priority, stored as a part of the [AdmissionContext](https://github.com/mongodb/mongo/blob/r6.3.0-rc0/src/mongo/util/concurrency/admission_context.h#L40). By default, operations are 'normal' priority. In the Flow Control ticketing system, operations of 'immediate' priority bypass ticket acquisition regardless of ticket availability. Tickets that are not 'immediate' priority must throttle when there are no tickets available in both Flow Control and Execution Control. @@ -1684,13 +1703,15 @@ In the Flow Control ticketing system, operations of 'immediate' priority bypass Flow Control is only concerned whether an operation is 'immediate' priority and does not differentiate between 'normal' and 'low' priorities. The current version of Execution Control relies on admission priority to administer tickets when the server is under load. **AdmissionContext::Priority** -* `kImmediate` - Reserved for operations critical to availability (e.g replication workers), or observability (e.g. FTDC), and any operation releasing resources (e.g. committing or aborting prepared transactions). -* `kNormal` - An operation that should be throttled when the server is under load. If an operation is throttled, it will not affect availability or observability. Most operations, both user and internal, should use this priority unless they qualify as 'kLow' or 'kImmediate' priority. -* `kLow` - Reserved for background tasks that have no other operations dependent on them. The operation will be throttled under load and make significantly less progress compared to operations of higher priorities in the Execution Control. + +- `kImmediate` - Reserved for operations critical to availability (e.g replication workers), or observability (e.g. FTDC), and any operation releasing resources (e.g. committing or aborting prepared transactions). +- `kNormal` - An operation that should be throttled when the server is under load. If an operation is throttled, it will not affect availability or observability. Most operations, both user and internal, should use this priority unless they qualify as 'kLow' or 'kImmediate' priority. +- `kLow` - Reserved for background tasks that have no other operations dependent on them. The operation will be throttled under load and make significantly less progress compared to operations of higher priorities in the Execution Control. 
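As a rough sketch, the policy described above reduces to two checks: Flow Control only asks whether an operation is 'immediate', while Execution Control prefers normal priority waiters over low priority waiters when tickets are scarce. The code below is purely illustrative and is not the `AdmissionContext` or ticket holder implementation.

```cpp
// Illustrative only: how the three admission priorities are interpreted by the
// two ticketing mechanisms described above, encoded as simple checks.
#include <cstdio>

enum class Priority { kLow, kNormal, kImmediate };

// Flow Control only distinguishes 'immediate' from everything else.
bool bypassesFlowControlTickets(Priority p) {
    return p == Priority::kImmediate;
}

// Execution Control: when operations are queued, a higher priority waiter is
// admitted ahead of a lower priority one. (The real PriorityTicketHolder also
// periodically admits low priority waiters so they cannot starve.)
bool admittedAheadUnderLoad(Priority a, Priority b) {
    return static_cast<int>(a) > static_cast<int>(b);
}

int main() {
    std::printf("kImmediate bypasses Flow Control: %d\n",
                bypassesFlowControlTickets(Priority::kImmediate));  // 1
    std::printf("kLow bypasses Flow Control: %d\n",
                bypassesFlowControlTickets(Priority::kLow));  // 0
    std::printf("kNormal admitted ahead of kLow: %d\n",
                admittedAheadUnderLoad(Priority::kNormal, Priority::kLow));  // 1
    return 0;
}
```

The anti-starvation behavior hinted at in the comment is covered under Execution Control below.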
[See AdmissionContext::Priority for more details](https://github.com/mongodb/mongo/blob/r7.0.0-rc0/src/mongo/util/concurrency/admission_context.h#L45-L67). ### How to Set Admission Priority + The preferred method for setting an operation's priority is through the RAII type [ScopedAdmissionPriorityForLock](https://github.com/mongodb/mongo/blob/r7.0.0-rc0/src/mongo/db/concurrency/locker.h#L747). ``` @@ -1700,6 +1721,7 @@ ScopedAdmissionPriorityForLock priority(shard_role_details::getLocker(opCtx), Ad Since the GlobalLock may be acquired and released multiple times throughout an operation's lifetime, it's important to limit the scope of reprioritization to prevent unintentional side-effects. However, if there is a special circumstance where the RAII cannot possibly be used, the priority can be set directly through [Locker::setAdmissionPriority()](https://github.com/mongodb/mongo/blob/r7.0.0-rc0/src/mongo/db/concurrency/locker.h#L525). ### Developer Guidelines for Declaring Low Admission Priority + Developers must evaluate the consequences of each low priority operation from falling too far behind, and should try to implement safeguards to avoid any undesirable behaviors for excessive delays in low priority operations. Whenever possible, an operation should dynamically choose when to be deprioritized or re-prioritized. More @@ -1711,16 +1733,19 @@ priority. However, it's important they don't fall too far behind TTL inserts - o unbounded collection growth. To remedy this issue, TTL deletes on a collection [are reprioritized](https://github.com/mongodb/mongo/blob/d1a0e34e1e67d4a2b23104af2512d14290b25e5f/src/mongo/db/ttl.idl#L96) to normal priority if they can't catch up after n-subpasses. Examples of Deprioritized Operations: -* [TTL deletes](https://github.com/mongodb/mongo/blob/0ceb784512f81f77f0bc55001f83ca77d1aa1d84/src/mongo/db/ttl.cpp#L488) -* [Persisting sampled queries for analyze shard key](https://github.com/mongodb/mongo/blob/0ef2c68f58ea20c2dde99e5ce3ea10b79e18453d/src/mongo/db/commands/write_commands.cpp#L295) -* [Unbounded Index Scans](https://github.com/mongodb/mongo/blob/0ef2c68f58ea20c2dde99e5ce3ea10b79e18453d/src/mongo/db/query/planner_access.cpp#L1913) -* [Unbounded Collection Scans](https://github.com/mongodb/mongo/blob/0ef2c68f58ea20c2dde99e5ce3ea10b79e18453d/src/mongo/db/query/planner_analysis.cpp#L1254) -* Index Builds [(1)](https://github.com/mongodb/mongo/blob/0ef2c68f58ea20c2dde99e5ce3ea10b79e18453d/src/mongo/db/index_builds_coordinator.cpp#L3064), [(2)](https://github.com/mongodb/mongo/blob/0ef2c68f58ea20c2dde99e5ce3ea10b79e18453d/src/mongo/db/index_builds_coordinator.cpp#L3105) + +- [TTL deletes](https://github.com/mongodb/mongo/blob/0ceb784512f81f77f0bc55001f83ca77d1aa1d84/src/mongo/db/ttl.cpp#L488) +- [Persisting sampled queries for analyze shard key](https://github.com/mongodb/mongo/blob/0ef2c68f58ea20c2dde99e5ce3ea10b79e18453d/src/mongo/db/commands/write_commands.cpp#L295) +- [Unbounded Index Scans](https://github.com/mongodb/mongo/blob/0ef2c68f58ea20c2dde99e5ce3ea10b79e18453d/src/mongo/db/query/planner_access.cpp#L1913) +- [Unbounded Collection Scans](https://github.com/mongodb/mongo/blob/0ef2c68f58ea20c2dde99e5ce3ea10b79e18453d/src/mongo/db/query/planner_analysis.cpp#L1254) +- Index Builds [(1)](https://github.com/mongodb/mongo/blob/0ef2c68f58ea20c2dde99e5ce3ea10b79e18453d/src/mongo/db/index_builds_coordinator.cpp#L3064), [(2)](https://github.com/mongodb/mongo/blob/0ef2c68f58ea20c2dde99e5ce3ea10b79e18453d/src/mongo/db/index_builds_coordinator.cpp#L3105) ## 
Execution Control + A ticketing mechanism that limits the number of concurrent storage engine transactions in a single mongod to reduce contention on storage engine resources. ### Ticket Management + There are 2 separate pools of available tickets: one pool for global lock read requests (MODE_S/MODE_IS), and one pool of tickets for global lock write requests (MODE_IX). As of v7.0, the size of each ticket pool is managed dynamically by the server to maximize throughput. Details of the algorithm can be found [here](https://github.com/mongodb/mongo/blob/master/src/mongo/db/storage/execution_control/README.md). This dynamic management can be disabled by specifying the size of each pool manually via server parameters `storageEngineConcurrentReadTransactions` (read ticket pool) and `storageEngineConcurrentWriteTransactions` (write ticket pool). @@ -1728,17 +1753,20 @@ As of v7.0, the size of each ticket pool is managed dynamically by the server to Each pool of tickets is maintained in a [TicketHolder](https://github.com/mongodb/mongo/blob/r6.3.0-rc0/src/mongo/util/concurrency/ticketholder.h#L52). Tickets distributed from a given TicketHolder will always be returned to the same TicketHolder (a write ticket will always be returned to the TicketHolder with the write ticket pool). ### Deprioritization + When resources are limited, its important to prioritize which operations are admitted to run first. The [PriorityTicketHolder](https://github.com/mongodb/mongo/blob/r6.3.0-rc0/src/mongo/util/concurrency/priority_ticketholder.h) enables deprioritization of low priority operations and is used by default on [linux machines](https://jira.mongodb.org/browse/SERVER-72616). If the server is not under load (there are tickets available for the global lock request mode), then tickets are handed out immediately, regardless of admission priority. Otherwise, operations wait until a ticket is available. Operations waiting for a ticket are assigned to a TicketQueue according to their priority. There are two queues, one manages low priority operations, the other normal priority operations. When a ticket is released to the PriorityTicketHolder, the default behavior for the PriorityTicketHolder is as follows: + 1. Attempt a ticket transfer through the normal priority TicketQueue. If unsuccessful (e.g there are no normal priority operations waiting for a ticket), continue to (2) 2. Attempt a ticket transfer through the the low priority TicketQueue 3. If no transfer can be made, return the ticket to the general ticket pool #### Preventing Low Priority Operations from Falling too Far Behind + If a server is consistently under load, and ticket transfers were always made through the normal priority TicketQueue first, then operations assigned to the low priority TicketQueue could starve. To remedy this, `lowPriorityAdmissionBypassThreshold` limits the number of consecutive ticket transfers to the normal priority TicketQueue before a ticket transfer is issued through the low priority TicketQueue. ## Flow Control @@ -1776,6 +1804,7 @@ should allow in order to address the majority committed lag. The Flow Control mechanism determines how many flow control tickets to replenish every period based on: + 1. The current majority committed replication lag with respect to the configured target maximum replication lag 1. How many operations the secondary sustaining the commit point has applied in the last period @@ -1824,7 +1853,6 @@ client or system operations unless they are part of an operation that is explici Flow Control. 
Writes that occur as part of replica set elections in particular are excluded. See SERVER-39868 for more details. - # Collection Validation Collection validation is used to check both the validity and integrity of the data, which in turn @@ -1832,121 +1860,125 @@ informs us whether there’s any data corruption present in the collection at th There are two forms of validation, foreground and background. -* Foreground validation requires exclusive access to the collection which prevents CRUD operations -from running. The benefit of this is that we're not validating a potentially stale snapshot and that -allows us to perform corrective operations such as fixing the collection's fast count. +- Foreground validation requires exclusive access to the collection which prevents CRUD operations + from running. The benefit of this is that we're not validating a potentially stale snapshot and that + allows us to perform corrective operations such as fixing the collection's fast count. -* Background validation runs lock-free on the collection and reads using a timestamp in -order to have a consistent view across the collection and its indexes. This mode allows CRUD -operations to be performed without being blocked. +- Background validation runs lock-free on the collection and reads using a timestamp in + order to have a consistent view across the collection and its indexes. This mode allows CRUD + operations to be performed without being blocked. Additionally, users can specify that they'd like to perform a `full` validation. -* Storage engines run custom validation hooks on the - [RecordStore](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/record_store.h#L445-L451) - and - [SortedDataInterface](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/sorted_data_interface.h#L130-L135) - as part of the storage interface. -* These hooks enable storage engines to perform internal data structure checks that MongoDB would - otherwise not be able to perform. -* More comprehensive and time-consuming checks will run to detect more types of non-conformant BSON - documents with duplicate field names, invalid UTF-8 characters, and non-decompressible BSON - Columns. -* Full validations are not compatible with background validation. + +- Storage engines run custom validation hooks on the + [RecordStore](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/record_store.h#L445-L451) + and + [SortedDataInterface](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/sorted_data_interface.h#L130-L135) + as part of the storage interface. +- These hooks enable storage engines to perform internal data structure checks that MongoDB would + otherwise not be able to perform. +- More comprehensive and time-consuming checks will run to detect more types of non-conformant BSON + documents with duplicate field names, invalid UTF-8 characters, and non-decompressible BSON + Columns. +- Full validations are not compatible with background validation. [Public docs on how to run validation and interpret the results.](https://docs.mongodb.com/manual/reference/command/validate/) ## Types of Validation -* Verifies the collection's durable catalog entry and in-memory state match. -* Indexes are marked as [multikey](#multikey-indexes) correctly. -* Index [multikey](#multikey-indexes) paths cover all of the records in the `RecordStore`. -* Indexes are not missing [multikey](#multikey-indexes) metadata information. -* Index entries are in increasing order if the sort order is ascending. 
-* Index entries are in decreasing order if the sort order is descending. -* Unique indexes do not have duplicate keys. -* Documents in the collection are valid and conformant `BSON`. -* Fast count matches the number of records in the `RecordStore`. - + For foreground validation only. -* The number of _id index entries always matches the number of records in the `RecordStore`. -* The number of index entries for each index is not greater than the number of records in the record - store. - + Not checked for indexed arrays and wildcard indexes. -* The number of index entries for each index is not less than the number of records in the record - store. - + Not checked for sparse and partial indexes. -* Time-series bucket collections are valid. + +- Verifies the collection's durable catalog entry and in-memory state match. +- Indexes are marked as [multikey](#multikey-indexes) correctly. +- Index [multikey](#multikey-indexes) paths cover all of the records in the `RecordStore`. +- Indexes are not missing [multikey](#multikey-indexes) metadata information. +- Index entries are in increasing order if the sort order is ascending. +- Index entries are in decreasing order if the sort order is descending. +- Unique indexes do not have duplicate keys. +- Documents in the collection are valid and conformant `BSON`. +- Fast count matches the number of records in the `RecordStore`. + - For foreground validation only. +- The number of \_id index entries always matches the number of records in the `RecordStore`. +- The number of index entries for each index is not greater than the number of records in the record + store. + - Not checked for indexed arrays and wildcard indexes. +- The number of index entries for each index is not less than the number of records in the record + store. + - Not checked for sparse and partial indexes. +- Time-series bucket collections are valid. ## Validation Procedure -* Instantiates the objects used throughout the validation procedure. - + [ValidateState](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/validate_state.h) - maintains the state for the collection being validated, such as locking, cursor management - for the collection and each index, data throttling (for background validation), and general - information about the collection. - + [IndexConsistency](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/index_consistency.h) - descendents keep track of the number of keys detected in the record store and indexes. Detects when there - are index inconsistencies and maintains the information about the inconsistencies for - reporting. - + [ValidateAdaptor](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/validate_adaptor.h) - used to traverse the record store and indexes. Validates that the records seen are valid - `BSON` conformant to most [BSON specifications](https://bsonspec.org/spec.html). In `full` - and `checkBSONConformance` validation modes, all `BSON` checks, including the time-consuming - ones, will be enabled. -* If a `full` validation was requested, we run the storage engines validation hooks at this point to - allow a more thorough check to be performed. 
-* Validates the [collection’s in-memory](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/collection.h) - state with the [durable catalog](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/durable_catalog.h#L242-L243) - entry information to ensure there are [no mismatches](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/collection_validation.cpp#L363-L425) - between the two. -* [Initializes all the cursors](https://github.com/mongodb/mongo/blob/07765dda62d4709cddc9506ea378c0d711791b57/src/mongo/db/catalog/validate_state.cpp#L144-L205) - on the `RecordStore` and `SortedDataInterface` of each index in the `ValidateState` object. - + We choose a read timestamp (`ReadSource`) based on the validation mode: `kNoTimestamp` - for foreground validation and `kProvided` for background validation. -* Traverses the `RecordStore` using the `ValidateAdaptor` object. - + [Validates each record and adds the document's index key set to the IndexConsistency objects](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/validate_adaptor.cpp#L61-L140) - for consistency checks at later stages. - + In an effort to reduce the memory footprint of validation, the `IndexConsistency` objects - [hashes](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/index_consistency.cpp#L307-L309) - the keys (or paths) passed in to one of many buckets. - + Document keys (or paths) will - [increment](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/index_consistency.cpp#L204-L214) - the respective bucket. - + Index keys (paths) will - [decrement](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/index_consistency.cpp#L239-L248) - the respective bucket. - + Checks that the `RecordId` is in [increasing order](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/validate_adaptor.cpp#L305-L308). - + [Adjusts the fast count](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/validate_adaptor.cpp#L348-L353) - stored in the `RecordStore` (when performing a foreground validation only). -* Traverses the index entries for each index in the collection. - + [Validates the index key order to ensure that index entries are in increasing or decreasing order](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/validate_adaptor.cpp#L144-L188). - + Adds the index key to the `IndexConsistency` objects for consistency checks at later stages. -* After the traversals are finished, the `IndexConsistency` objects are checked to detect any - inconsistencies between the collection and indexes. - + If a bucket has a `value of 0`, then there are no inconsistencies for the keys that hashed - there. - + If a bucket has a `value greater than 0`, then we are missing index entries. - + If a bucket has a `value less than 0`, then we have extra index entries. -* Upon detection of any index inconsistencies, the [second phase of validation](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/collection_validation.cpp#L186-L240) - is executed. If no index inconsistencies were detected, we’re finished and we report back to the - user. - + The second phase of validation re-runs the first phase and expands its memory footprint by - recording the detailed information of the keys that were inconsistent during the first phase - of validation (keys that hashed to buckets where the value was not 0 in the end). 
- + This is used to [pinpoint exactly where the index inconsistencies were detected](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/index_consistency.cpp#L109-L202) - and to report them. + +- Instantiates the objects used throughout the validation procedure. + - [ValidateState](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/validate_state.h) + maintains the state for the collection being validated, such as locking, cursor management + for the collection and each index, data throttling (for background validation), and general + information about the collection. + - [IndexConsistency](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/index_consistency.h) + descendents keep track of the number of keys detected in the record store and indexes. Detects when there + are index inconsistencies and maintains the information about the inconsistencies for + reporting. + - [ValidateAdaptor](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/validate_adaptor.h) + used to traverse the record store and indexes. Validates that the records seen are valid + `BSON` conformant to most [BSON specifications](https://bsonspec.org/spec.html). In `full` + and `checkBSONConformance` validation modes, all `BSON` checks, including the time-consuming + ones, will be enabled. +- If a `full` validation was requested, we run the storage engines validation hooks at this point to + allow a more thorough check to be performed. +- Validates the [collection’s in-memory](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/collection.h) + state with the [durable catalog](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/storage/durable_catalog.h#L242-L243) + entry information to ensure there are [no mismatches](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/collection_validation.cpp#L363-L425) + between the two. +- [Initializes all the cursors](https://github.com/mongodb/mongo/blob/07765dda62d4709cddc9506ea378c0d711791b57/src/mongo/db/catalog/validate_state.cpp#L144-L205) + on the `RecordStore` and `SortedDataInterface` of each index in the `ValidateState` object. + - We choose a read timestamp (`ReadSource`) based on the validation mode: `kNoTimestamp` + for foreground validation and `kProvided` for background validation. +- Traverses the `RecordStore` using the `ValidateAdaptor` object. + - [Validates each record and adds the document's index key set to the IndexConsistency objects](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/validate_adaptor.cpp#L61-L140) + for consistency checks at later stages. + - In an effort to reduce the memory footprint of validation, the `IndexConsistency` objects + [hashes](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/index_consistency.cpp#L307-L309) + the keys (or paths) passed in to one of many buckets. + - Document keys (or paths) will + [increment](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/index_consistency.cpp#L204-L214) + the respective bucket. + - Index keys (paths) will + [decrement](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/index_consistency.cpp#L239-L248) + the respective bucket. + - Checks that the `RecordId` is in [increasing order](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/validate_adaptor.cpp#L305-L308). 
+ - [Adjusts the fast count](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/validate_adaptor.cpp#L348-L353) + stored in the `RecordStore` (when performing a foreground validation only). +- Traverses the index entries for each index in the collection. + - [Validates the index key order to ensure that index entries are in increasing or decreasing order](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/validate_adaptor.cpp#L144-L188). + - Adds the index key to the `IndexConsistency` objects for consistency checks at later stages. +- After the traversals are finished, the `IndexConsistency` objects are checked to detect any + inconsistencies between the collection and indexes. + - If a bucket has a `value of 0`, then there are no inconsistencies for the keys that hashed + there. + - If a bucket has a `value greater than 0`, then we are missing index entries. + - If a bucket has a `value less than 0`, then we have extra index entries. +- Upon detection of any index inconsistencies, the [second phase of validation](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/collection_validation.cpp#L186-L240) + is executed. If no index inconsistencies were detected, we’re finished and we report back to the + user. + - The second phase of validation re-runs the first phase and expands its memory footprint by + recording the detailed information of the keys that were inconsistent during the first phase + of validation (keys that hashed to buckets where the value was not 0 in the end). + - This is used to [pinpoint exactly where the index inconsistencies were detected](https://github.com/mongodb/mongo/blob/r4.5.0/src/mongo/db/catalog/index_consistency.cpp#L109-L202) + and to report them. ## Repair Mode Validate accepts a RepairMode flag that instructs it to attempt to fix certain index inconsistencies. Repair mode can fix inconsistencies by applying the following remediations: -* Missing index entries - * Missing keys are inserted into the index -* Extra index entries - * Extra keys are removed from the index -* Multikey documents are found for an index that is not marked multikey - * The index is marked as multikey -* Multikey documents are found that are not covered by an index's multikey paths - * The index's multikey paths are updated -* Corrupt documents - * Documents with invalid BSON are removed + +- Missing index entries + - Missing keys are inserted into the index +- Extra index entries + - Extra keys are removed from the index +- Multikey documents are found for an index that is not marked multikey + - The index is marked as multikey +- Multikey documents are found that are not covered by an index's multikey paths + - The index's multikey paths are updated +- Corrupt documents + - Documents with invalid BSON are removed Repair mode is used by startup repair to avoid rebuilding indexes. Repair mode may also be used on standalone nodes by passing `{ repair: true }` to the validate command. @@ -1954,12 +1986,15 @@ standalone nodes by passing `{ repair: true }` to the validate command. See [RepairMode](https://github.com/mongodb/mongo/blob/4406491b2b137984c2583db98068b7d18ea32171/src/mongo/db/catalog/collection_validation.h#L71). # Fast Truncation on Internal Collections + Logical deletes aren't always performant enough to keep up with inserts. 
To solve this, several internal collections use `CollectionTruncateMarkers` for fast, unreplicated and untimestamped [truncation](http://source.wiredtiger.com/1.4.2/classwiredtiger_1_1_session.html#a80a9ee8697a61a1ad13d893d67d981bb) of expired data, in lieu of logical document deletions. ## CollectionTruncateMarkers + CollectionTruncateMarkers are an in-memory tracking mechanism to support ranged truncates on a collection. A collection is broken up into a number of truncate markers. Each truncate marker tracks a range in the collection. Newer entries not captured by a truncate marker are tracked by an in-progress "partial marker". + ``` CollectionTruncateMarkers _______________________________________ @@ -1982,62 +2017,75 @@ Min RecordId <------------------------------------------------<--- Max RecordId Marks the end of the marker's range Most recent record at the time of marker creation ``` + A new truncate marker is created when either: + 1. An insert causes the in-progress "partial marker" segment to contain more than the minimum bytes needed for a truncate marker. - * The record inserted serves as the 'last record' of the newly created marker. + - The record inserted serves as the 'last record' of the newly created marker. 2. Partial marker expiration is supported, and an explicit call is made to transform the "partial marker" into a complete truncate marker. - * Partial marker expiration is supported for change stream collections and ensures that expired documents in a partial marker will eventually be truncated - even if writes to the namespace cease and the partial marker never meets the minimum bytes requirement. + - Partial marker expiration is supported for change stream collections and ensures that expired documents in a partial marker will eventually be truncated - even if writes to the namespace cease and the partial marker never meets the minimum bytes requirement. ### Requirements & Properties + CollectionTruncateMarkers support collections that meet the following requirements: -* Insert and truncate only. No updates or individual document deletes. -* [Clustered](https://github.com/10gen/mongo/blob/r7.1.0-rc3/src/mongo/db/catalog/README.md#clustered-collections) with no secondary indexes. -* RecordId's in Timestamp order. -* Deletion of content follows RecordId ordering. - * This is a general property of clustered capped collections. + +- Insert and truncate only. No updates or individual document deletes. +- [Clustered](https://github.com/10gen/mongo/blob/r7.1.0-rc3/src/mongo/db/catalog/README.md#clustered-collections) with no secondary indexes. +- RecordId's in Timestamp order. +- Deletion of content follows RecordId ordering. + - This is a general property of clustered capped collections. Collections who use CollectionTruncateMarkers share the following properties: -* Fast counts aren't expected to be accurate. - * Truncates don't track the count and size of documents truncated in exchange for performance gains. - * Markers are a best effort way to keep track of the size metrics and when to truncate expired data. -* Collections aren't expected to be consistent between replica set members. - * Truncates are unreplicated, and nodes may truncate ranges at different times. -* No snapshot read concern support (ex: [SERVER-78296](https://jira.mongodb.org/browse/SERVER-78296)). - * Deleting with untimestamped, unreplicated range truncation means point-in-time reads may see inconsistent data. + +- Fast counts aren't expected to be accurate. 
+ - Truncates don't track the count and size of documents truncated in exchange for performance gains. + - Markers are a best effort way to keep track of the size metrics and when to truncate expired data. +- Collections aren't expected to be consistent between replica set members. + - Truncates are unreplicated, and nodes may truncate ranges at different times. +- No snapshot read concern support (ex: [SERVER-78296](https://jira.mongodb.org/browse/SERVER-78296)). + - Deleting with untimestamped, unreplicated range truncation means point-in-time reads may see inconsistent data. Each collection utilizing CollectionTruncateMarkers must implement its [own policy](https://github.com/mongodb/mongo/blob/r7.1.0-rc3/src/mongo/db/storage/collection_truncate_markers.h#L277) to determine when there are excess markers and it is time for truncation. ### In-Memory Initialization + At or shortly after startup, an initial set of CollectionTruncateMarkers are created for each collection. The collection is either scanned or sampled to generate initial markers. Initial truncate markers are best effort, and may hold incorrect estimates about the number of documents and bytes within each marker. Eventually, once the initial truncate markers expire, per truncate marker metrics will converge closer to the correct values. ### Collections that use CollectionTruncateMarkers -* [The oplog](#oplog-truncation) - `OplogTruncateMarkers` -* [Change stream change collections](#change-collection-truncation) - `ChangeCollectionTruncateMarkers` -* [Change stream pre images collections](#pre-images-collection-truncation) - `PreImagesTruncateMarkersPerNsUUID` + +- [The oplog](#oplog-truncation) - `OplogTruncateMarkers` +- [Change stream change collections](#change-collection-truncation) - `ChangeCollectionTruncateMarkers` +- [Change stream pre images collections](#pre-images-collection-truncation) - `PreImagesTruncateMarkersPerNsUUID` ### Change Stream Collection Truncation + Change stream collections which use CollectionTruncateMarkers -* change collection: `_config.system.change_collection`, exclusive to serverless environments. -* pre-images: `_config.system.preimages` in serverless, `config.system.preimages` in dedicated environments. + +- change collection: `_config.system.change_collection`, exclusive to serverless environments. +- pre-images: `_config.system.preimages` in serverless, `config.system.preimages` in dedicated environments. Both change stream collections have a periodic remover thread ([ChangeStreamExpiredPreImagesRemover](https://github.com/10gen/mongo/blob/r7.1.0-rc3/src/mongo/db/pipeline/change_stream_expired_pre_image_remover.cpp#L71), [ChangeCollectionExpiredDocumentsRemover](https://github.com/10gen/mongo/blob/r7.1.0-rc3/src/mongo/db/change_collection_expired_documents_remover.cpp)). Each remover thread: + 1. Creates the tenant's initial CollectionTruncateMarkers for the tenant if they do not yet exist - * Lazy initialization of the initial truncate markers is imperative so writes aren't blocked on startup + - Lazy initialization of the initial truncate markers is imperative so writes aren't blocked on startup 2. Iterates through each truncate marker. If a marker is expired, issues a truncate of all records older than the marker's last record, and removes the marker from the set. #### Cleanup After Unclean Shutdown + After an unclean shutdown, all expired pre-images are truncated at startup. 
WiredTiger truncate cannot guarantee a consistent view of previously truncated data on unreplicated, untimestamped ranges after a crash. Unlike the oplog, the change stream collections aren't logged, don't persist any special timestamps, and it's possible that previously truncated documents can resurface after shutdown. #### Change Collection Truncation + Change collections are per tenant - and there is one `ChangeCollectionTruncateMarkers` per tenant. The `ChangeStreamChangeCollectionManager` maps the UUID of a tenant's change collection to its corresponding 'ChangeCollectionTruncateMarkers'. Each tenant has a set 'expireAfterSeconds' parameter. An entry is expired if its 'wall time' is more than 'expireAfterSeconds' older than the node's current wall time. A truncate marker is expired if its last record is expired. #### Pre Images Collection Truncation + Each tenant has 1 pre-images collection. Each pre-images collection contains pre-images across all the tenant's pre-image enabled collections. -A pre-images collection is clustered by [ChangeStreamPreImageId](https://github.com/10gen/mongo/blob/r7.1.0-rc3/src/mongo/db/pipeline/change_stream_preimage.idl#L69), which implicitly orders pre-images first by their `'nsUUID'` (the UUID of the collection the pre-image is from), their `'ts'` (the timestamp associated with the pre-images oplog entry), and then by their `'applyOpsIndex'` (the index into the applyOps oplog entry which generated the pre-image, 0 if the pre-image isn't from an applyOps oplog entry). +A pre-images collection is clustered by [ChangeStreamPreImageId](https://github.com/10gen/mongo/blob/r7.1.0-rc3/src/mongo/db/pipeline/change_stream_preimage.idl#L69), which implicitly orders pre-images first by their `'nsUUID'` (the UUID of the collection the pre-image is from), their `'ts'` (the timestamp associated with the pre-images oplog entry), and then by their `'applyOpsIndex'` (the index into the applyOps oplog entry which generated the pre-image, 0 if the pre-image isn't from an applyOps oplog entry). There is a set of CollectionTruncateMarkers for each 'nsUUD' within a tenant's pre-images collection, `PreImagesTruncateMarkersPerNsUUID`. @@ -2048,16 +2096,17 @@ In a dedicated environment, a pre-image is expired if either (1) 'expireAfterSec For each tenant, `ChangeStreamExpiredPreImagesRemover` iterates over each set of `PreImagesTruncateMarkersPerNsUUID`, and issues a ranged truncate from the truncate marker's last record to the the minimum RecordId for the nsUUID when there is an expired truncate marker. ### Code spelunking starting points: -* [The CollectionTruncateMarkers class](https://github.com/mongodb/mongo/blob/r7.1.0-rc3/src/mongo/db/storage/collection_truncate_markers.h#L78) - * The main api for CollectionTruncateMarkers. -* [The OplogTruncateMarkers class](https://github.com/10gen/mongo/blob/r7.1.0-rc3/src/mongo/db/storage/wiredtiger/wiredtiger_record_store_oplog_truncate_markers.h) - * Oplog specific truncate markers. -* [The ChangeCollectionTruncateMarkers class](https://github.com/10gen/mongo/blob/r7.1.0-rc3/src/mongo/db/change_collection_truncate_markers.h#L47) - * Change stream change collection specific truncate markers. -* [The PreImagesTruncateMarkersPerNsUUID class](https://github.com/10gen/mongo/blob/r7.1.0-rc3/src/mongo/db/change_stream_pre_images_truncate_markers_per_nsUUID.h#L62) - * Truncate markers for a given nsUUID captured within a pre-images collection. 
-* [The PreImagesTruncateManager class](https://github.com/10gen/mongo/blob/r7.1.0-rc3/src/mongo/db/change_stream_pre_images_truncate_manager.h#L70) - * Manages pre image truncate markers for each tenant. + +- [The CollectionTruncateMarkers class](https://github.com/mongodb/mongo/blob/r7.1.0-rc3/src/mongo/db/storage/collection_truncate_markers.h#L78) + - The main api for CollectionTruncateMarkers. +- [The OplogTruncateMarkers class](https://github.com/10gen/mongo/blob/r7.1.0-rc3/src/mongo/db/storage/wiredtiger/wiredtiger_record_store_oplog_truncate_markers.h) + - Oplog specific truncate markers. +- [The ChangeCollectionTruncateMarkers class](https://github.com/10gen/mongo/blob/r7.1.0-rc3/src/mongo/db/change_collection_truncate_markers.h#L47) + - Change stream change collection specific truncate markers. +- [The PreImagesTruncateMarkersPerNsUUID class](https://github.com/10gen/mongo/blob/r7.1.0-rc3/src/mongo/db/change_stream_pre_images_truncate_markers_per_nsUUID.h#L62) + - Truncate markers for a given nsUUID captured within a pre-images collection. +- [The PreImagesTruncateManager class](https://github.com/10gen/mongo/blob/r7.1.0-rc3/src/mongo/db/change_stream_pre_images_truncate_manager.h#L70) + - Manages pre image truncate markers for each tenant. # Oplog Collection @@ -2192,21 +2241,21 @@ pipeline stage or restarting the process. Metrics are not collected for all operations. The following limitations apply: -* Only operations from user connections collect metrics. For example, internal connections from - other replica set members do not collect metrics. -* Metrics are only collected for a specific set of commands. Those commands override the function - `Command::collectsResourceConsumptionMetrics()`. -* Metrics for write operations are only collected on primary nodes. - * This includes TTL index deletions. -* All attempted write operations collect metrics. This includes writes that fail or retry internally - due to write conflicts. -* Read operations are attributed to the replication state of a node. Read metrics are broken down - into whether they occurred in the primary or secondary replication states. -* Index builds collect metrics. Because index builds survive replication state transitions, they - only record aggregated metrics if the node is currently primary when the index build completes. -* Metrics are not collected on `mongos` and are not supported or tested in sharded environments. -* Storage engines other than WiredTiger do not implement metrics collection. -* Metrics are not adjusted after replication rollback. +- Only operations from user connections collect metrics. For example, internal connections from + other replica set members do not collect metrics. +- Metrics are only collected for a specific set of commands. Those commands override the function + `Command::collectsResourceConsumptionMetrics()`. +- Metrics for write operations are only collected on primary nodes. + - This includes TTL index deletions. +- All attempted write operations collect metrics. This includes writes that fail or retry internally + due to write conflicts. +- Read operations are attributed to the replication state of a node. Read metrics are broken down + into whether they occurred in the primary or secondary replication states. +- Index builds collect metrics. Because index builds survive replication state transitions, they + only record aggregated metrics if the node is currently primary when the index build completes. 
+- Metrics are not collected on `mongos` and are not supported or tested in sharded environments. +- Storage engines other than WiredTiger do not implement metrics collection. +- Metrics are not adjusted after replication rollback. ## Document and Index Entry Units @@ -2229,6 +2278,7 @@ tunable with the server parameters `documentUnitSizeBytes` and `indexEntryUnitSi For writes, the code also calculates a special combined document and index unit. The code attempts to associate index writes with an associated document write, and takes those bytes collectively to calculate units. For each set of bytes written, a unit is calculated as the following: + ``` units = ceil (set bytes / unit size in bytes) ``` @@ -2242,7 +2292,6 @@ following document bytes. The `totalUnitWriteSizeBytes` server parameter affects the unit calculation size for the above calculation. - ## CPU Time Operations that collect metrics will also collect the amount of active CPU time spent on the command @@ -2313,22 +2362,23 @@ Clustered collections store documents ordered by their cluster key on the Record key must currently be `{_id: 1}` and unique. Clustered collections may be created with the `clusteredIndex` collection creation option. The `clusteredIndex` option accepts the following formats: -* A document that specifies the clustered index configuration. - ``` - {clusteredIndex: {key: {_id: 1}, unique: true}} - ``` -* A legacy boolean parameter for backwards compatibility with 5.0 time-series collections. - ``` - {clusteredIndex: true} - ``` + +- A document that specifies the clustered index configuration. + ``` + {clusteredIndex: {key: {_id: 1}, unique: true}} + ``` +- A legacy boolean parameter for backwards compatibility with 5.0 time-series collections. + ``` + {clusteredIndex: true} + ``` Like a secondary TTL index, clustered collections can delete old data when created with the `expireAfterSeconds` collection creation option. Unlike regular collections, clustered collections do not require a separate index from cluster key -values to `RecordId`s, so they lack an index on _id. While a regular collection must access two +values to `RecordId`s, so they lack an index on \_id. While a regular collection must access two different tables to read or write to a document, a clustered collection requires a single table -access. Queries over the _id key use bounded collection scans when no other index is available. +access. Queries over the \_id key use bounded collection scans when no other index is available. ## Time Series Collections @@ -2337,7 +2387,7 @@ A time-series collection is a view of an internal clustered collection named values are ObjectId's. The TTL monitor will only delete data from a time-series bucket collection when a bucket's minimum -time, _id, is past the expiration plus the bucket maximum time span (default 1 hour). This +time, \_id, is past the expiration plus the bucket maximum time span (default 1 hour). This procedure avoids deleting buckets with data that is not older than the expiration time. For more information on time-series collections, see the [timeseries/README][]. @@ -2373,6 +2423,7 @@ right-to-left over up to 4 bytes, using the lower 7 bits of a byte, the high bit continuation bit. # Glossary + **binary comparable**: Two values are binary comparable if the lexicographical order over their byte representation, from lower memory addresses to higher addresses, is the same as the defined ordering for that type. 
For example, ASCII strings are binary comparable, but double precision floating point @@ -2395,7 +2446,7 @@ advancing past oplog holes. Tracks in-memory oplog holes. **oplogTruncateAfterPoint**: The timestamp after which oplog entries will be truncated during startup recovery after an unclean shutdown. Tracks persisted oplog holes. -**snapshot**: A snapshot consists of a consistent view of data in the database. In MongoDB, a +**snapshot**: A snapshot consists of a consistent view of data in the database. In MongoDB, a snapshot consists of all data committed with a timestamp less than or equal to the snapshot's timestamp. @@ -2404,7 +2455,7 @@ of the database, and that all writes in a transaction had no conflicts with othe if the transaction commits. **storage transaction**: A concept provided by a pluggable storage engine through which changes to -data in the database can be performed. In order to satisfy the MongoDB pluggable storage engine +data in the database can be performed. In order to satisfy the MongoDB pluggable storage engine requirements for atomicity, consistency, isolation, and durability, storage engines typically use some form of transaction. In contrast, a multi-document transaction in MongoDB is a user-facing feature providing similar guarantees across many nodes in a sharded cluster; a storage transaction @@ -2421,54 +2472,55 @@ only provides guarantees within one node. Creating a collection (record store) or index requires two WT operations that cannot be made atomic/transactional. A WT table must be created with -[WT_SESSION::create](https://source.wiredtiger.com/develop/struct_w_t___s_e_s_s_i_o_n.html#a358ca4141d59c345f401c58501276bbb -"WiredTiger Docs") and an insert/update must be made in the \_mdb\_catalog table (MongoDB's +[WT_SESSION::create](https://source.wiredtiger.com/develop/struct_w_t___s_e_s_s_i_o_n.html#a358ca4141d59c345f401c58501276bbb "WiredTiger Docs") and an insert/update must be made in the \_mdb_catalog table (MongoDB's catalog). MongoDB orders these as such: + 1. Create the WT table -1. Update \_mdb\_catalog to reference the table +1. Update \_mdb_catalog to reference the table Note that if the process crashes in between those steps, the collection/index creation never succeeded. Upon a restart, the WT table is dangling and can be safely deleted. Dropping a collection/index follows the same pattern, but in reverse. -1. Delete the table from the \_mdb\_catalog + +1. Delete the table from the \_mdb_catalog 1. [Drop the WT table](https://source.wiredtiger.com/develop/struct_w_t___s_e_s_s_i_o_n.html#adf785ef53c16d9dcc77e22cc04c87b70 "WiredTiger Docs") -In this case, if a crash happens between these steps and the change to the \_mdb\_catalog was made -durable (in modern versions, only possible via a checkpoint; the \_mdb\_catalog is not logged), the +In this case, if a crash happens between these steps and the change to the \_mdb_catalog was made +durable (in modern versions, only possible via a checkpoint; the \_mdb_catalog is not logged), the WT table is once again dangling on restart. Note that in the absense of a history, this state is indistinguishable from the creation case, establishing a strong invariant. ## Cherry-picked WT log Details -- The WT log is a write ahead log. Before a [transaction commit](https://source.wiredtiger.com/develop/struct_w_t___s_e_s_s_i_o_n.html#a712226eca5ade5bd123026c624468fa2 "WiredTiger Docs") returns to the application, logged writes -must have their log entry bytes written into WiredTiger's log buffer. 
Depending on `sync` setting, -those bytes may or may not be on disk. -- MongoDB only chooses to log writes to a subset of WT's tables (e.g: the oplog). -- MongoDB does not `sync` the log on transaction commit. But rather uses the [log - flush](https://source.wiredtiger.com/develop/struct_w_t___s_e_s_s_i_o_n.html#a1843292630960309129dcfe00e1a3817 - "WiredTiger Docs") API. This optimization is two-fold. Writes that do not require to be - persisted do not need to wait for durability on disk. Second, this pattern allows for batching - of writes to go to disk for improved throughput. -- WiredTiger's log is similar to MongoDB's oplog in that multiple writers can concurrently copy - their bytes representing a log record into WiredTiger's log buffer similar to how multiple - MongoDB writes can concurrently generate oplog entries. -- MongoDB's optime generator for the oplog is analogous to WT's LSN (log sequence number) - generator. Both are a small critical section to ensure concurrent writes don't get the same - timestamp key/memory address to write an oplog entry value/log bytes into. -- While MongoDB's oplog writes are logical (the key is a timestamp), WT's are obviously more -physical (the key is a memory->disk location). WiredTiger is writing to a memory buffer. Thus before a -transaction commit can go to the log buffer to "request a slot", it must know how many bytes it's -going to write. Compare this to a multi-statement transaction replicating as a single applyOps -versus each statement generating an individual oplog entry for each write that's part of the -transaction. -- MongoDB testing sometimes uses a [WT debugging - option](https://github.com/mongodb/mongo/blob/a7bd84dc5ad15694864526612bceb3877672d8a9/src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp#L601 - "Github") that will write "no-op" log entries for other operations performed on a - transaction. Such as setting a timestamp or writing to a table that is not configured to be - written to WT's log (e.g: a typical user collection and index). + +- The WT log is a write ahead log. Before a [transaction commit](https://source.wiredtiger.com/develop/struct_w_t___s_e_s_s_i_o_n.html#a712226eca5ade5bd123026c624468fa2 "WiredTiger Docs") returns to the application, logged writes + must have their log entry bytes written into WiredTiger's log buffer. Depending on `sync` setting, + those bytes may or may not be on disk. +- MongoDB only chooses to log writes to a subset of WT's tables (e.g: the oplog). +- MongoDB does not `sync` the log on transaction commit. But rather uses the [log + flush](https://source.wiredtiger.com/develop/struct_w_t___s_e_s_s_i_o_n.html#a1843292630960309129dcfe00e1a3817 "WiredTiger Docs") API. This optimization is two-fold. Writes that do not require to be + persisted do not need to wait for durability on disk. Second, this pattern allows for batching + of writes to go to disk for improved throughput. +- WiredTiger's log is similar to MongoDB's oplog in that multiple writers can concurrently copy + their bytes representing a log record into WiredTiger's log buffer similar to how multiple + MongoDB writes can concurrently generate oplog entries. +- MongoDB's optime generator for the oplog is analogous to WT's LSN (log sequence number) + generator. Both are a small critical section to ensure concurrent writes don't get the same + timestamp key/memory address to write an oplog entry value/log bytes into. 
+- While MongoDB's oplog writes are logical (the key is a timestamp), WT's are obviously more + physical (the key is a memory->disk location). WiredTiger is writing to a memory buffer. Thus before a + transaction commit can go to the log buffer to "request a slot", it must know how many bytes it's + going to write. Compare this to a multi-statement transaction replicating as a single applyOps + versus each statement generating an individual oplog entry for each write that's part of the + transaction. +- MongoDB testing sometimes uses a [WT debugging + option](https://github.com/mongodb/mongo/blob/a7bd84dc5ad15694864526612bceb3877672d8a9/src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp#L601 "Github") that will write "no-op" log entries for other operations performed on a + transaction. Such as setting a timestamp or writing to a table that is not configured to be + written to WT's log (e.g: a typical user collection and index). The most important WT log entry for MongoDB is one that represents an insert into the oplog. + ``` { "lsn" : [1,57984], "hdr_flags" : "compressed", @@ -2495,31 +2547,33 @@ oplog. ] } ``` -- `lsn` is a log sequence number. The WiredTiger log files are named with numbers as a - suffix, e.g: `WiredTigerLog.0000000001`. In this example, the LSN's first value `1` maps to log - file `0000000001`. The second value `57984` is the byte offset in the file. -- `hdr_flags` stands for header flags. Think HTTP headers. MongoDB configures WiredTiger to use - snappy compression on its journal entries. Small journal entries (< 128 bytes?) won't be - compressed. -- `rec_len` is the number of bytes for the record -- `type` is...the type of journal entry. The type will be `commit` for application's committing a - transaction. Other types are typically for internal WT operations. Examples include `file_sync`, - `checkpoint` and `system`. -- `txnid` is WT's transaction id associated with the log record. -- `ops` is a list of operations that are part of the transaction. A transaction that inserts two - documents and removes a third will see three entries. Two `row_put` operations followed by a - `row_remove`. -- `ops.fileid` refers to the WT table that the operation is performed against. The fileid mapping - is held in the `WiredTiger.wt` file (a table within itself). This value is faked for WT's - logging debug mode for tables which MongoDB is not logging. -- `ops.key` and `ops.value` are the binary representations of the inserted document (`value` is omitted - for removal). -- `ops.key-hex` and `ops.value-bson` are specific to the pretty printing tool used. + +- `lsn` is a log sequence number. The WiredTiger log files are named with numbers as a + suffix, e.g: `WiredTigerLog.0000000001`. In this example, the LSN's first value `1` maps to log + file `0000000001`. The second value `57984` is the byte offset in the file. +- `hdr_flags` stands for header flags. Think HTTP headers. MongoDB configures WiredTiger to use + snappy compression on its journal entries. Small journal entries (< 128 bytes?) won't be + compressed. +- `rec_len` is the number of bytes for the record +- `type` is...the type of journal entry. The type will be `commit` for application's committing a + transaction. Other types are typically for internal WT operations. Examples include `file_sync`, + `checkpoint` and `system`. +- `txnid` is WT's transaction id associated with the log record. +- `ops` is a list of operations that are part of the transaction. 
A transaction that inserts two + documents and removes a third will see three entries. Two `row_put` operations followed by a + `row_remove`. +- `ops.fileid` refers to the WT table that the operation is performed against. The fileid mapping + is held in the `WiredTiger.wt` file (a table within itself). This value is faked for WT's + logging debug mode for tables which MongoDB is not logging. +- `ops.key` and `ops.value` are the binary representations of the inserted document (`value` is omitted + for removal). +- `ops.key-hex` and `ops.value-bson` are specific to the pretty printing tool used. [copy-on-write]: https://en.wikipedia.org/wiki/Copy-on-write [Multiversion concurrency control]: https://en.wikipedia.org/wiki/Multiversion_concurrency_control ## Table of MongoDB <-> WiredTiger <-> Log version numbers + | MongoDB | WiredTiger | Log | | ---------------------- | ---------- | --- | | 3.0.15 | 2.5.3 | 1 | diff --git a/src/mongo/db/exec/sbe/README.md b/src/mongo/db/exec/sbe/README.md index f1d8375b720..2bec03a504f 100644 --- a/src/mongo/db/exec/sbe/README.md +++ b/src/mongo/db/exec/sbe/README.md @@ -20,12 +20,12 @@ shared, nor do they have identity (i.e. variables with the same numeric value ar different entities). Some SBE values are [modeled off of BSONTypes](https://github.com/mongodb/mongo/blob/f2b093acd48aee3c63d1a0e80a101eeb9925834a/src/mongo/bson/bsontypes.h#L63-L114) while others represent internal C++ types such as -[collators](https://github.com/mongodb/mongo/blob/d19ea3f3ff51925e3b45c593217f8901373e4336/src/mongo/db/exec/sbe/values/value.h#L216-L217). +[collators](https://github.com/mongodb/mongo/blob/d19ea3f3ff51925e3b45c593217f8901373e4336/src/mongo/db/exec/sbe/values/value.h#L216-L217). One type that deserves a special mention is `Nothing`, which indicates the absence of a value. It is often used in SBE to indicate that a result cannot be computed instead of raising an exception (similar to the [Maybe -Monad](https://en.wikipedia.org/wiki/Monad_(functional_programming)#An_example:_Maybe) in many +Monad]() in many functional programming languages). Values are identified by a [1 byte @@ -46,39 +46,39 @@ class](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd EExpressions form a tree and their goal is to produce values during evaluation. It's worth noting that EExpressions aren't tied to expressions in the Mongo Query Language, rather, they are meant to be building blocks that can be combined to express arbitrary query language semantics. Below is an -overview of the different EExpression types: +overview of the different EExpression types: -- [EConstant](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L251-L279): - As the name suggests, this expression type stores a single, immutable SBE value. An `EConstant` - manages the value's lifetime (that is, it releases the value's memory on destruction if - necessary). -- [EPrimUnary and - EPrimBinary](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L324-L414): - These expressions represent basic logical, arithmetic, and comparison operations that take one and - two arguments, respectively. -- [EIf](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L440-L461): - Represents an 'if then else' expression. 
-- [EFunction](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L416-L438): - Represents a named, built-in function supported natively by the engine. At the time of writing, there are over [150 such - functions](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.cpp#L564-L567). - Note that function parameters are evaluated first and then are passed as arguments to the - function. -- [EFail](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L511-L534): - Represents an exception and produces a query fatal error if reached at query runtime. It supports numeric error codes and error strings. -- [ENumericConvert](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L536-L566): - Represents the conversion of an arbitrary value to a target numeric type. -- [EVariable](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L281-L319) - Provides the ability to reference a variable defined elsewhere. -- [ELocalBind](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L463-L485) - Provides the ability to define multiple variables in a local scope. They are particularly useful - when we want to reference some intermediate value multiple times. -- [ELocalLambda](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L487-L507) - Represents an anonymous function which takes a single input parameter. Many `EFunctions` accept - these as parameters. A good example of this is the [`traverseF` - function](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/vm/vm.cpp#L1329-L1357): - it accepts 2 parameters: an input and an `ELocalLambda`. If the input is an array, the - `ELocalLambda` is applied to each element in the array, otherwise, it is applied to the input on - its own. +- [EConstant](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L251-L279): + As the name suggests, this expression type stores a single, immutable SBE value. An `EConstant` + manages the value's lifetime (that is, it releases the value's memory on destruction if + necessary). +- [EPrimUnary and + EPrimBinary](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L324-L414): + These expressions represent basic logical, arithmetic, and comparison operations that take one and + two arguments, respectively. +- [EIf](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L440-L461): + Represents an 'if then else' expression. +- [EFunction](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L416-L438): + Represents a named, built-in function supported natively by the engine. At the time of writing, there are over [150 such + functions](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.cpp#L564-L567). 
+ Note that function parameters are evaluated first and then are passed as arguments to the + function. +- [EFail](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L511-L534): + Represents an exception and produces a query fatal error if reached at query runtime. It supports numeric error codes and error strings. +- [ENumericConvert](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L536-L566): + Represents the conversion of an arbitrary value to a target numeric type. +- [EVariable](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L281-L319) + Provides the ability to reference a variable defined elsewhere. +- [ELocalBind](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L463-L485) + Provides the ability to define multiple variables in a local scope. They are particularly useful + when we want to reference some intermediate value multiple times. +- [ELocalLambda](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L487-L507) + Represents an anonymous function which takes a single input parameter. Many `EFunctions` accept + these as parameters. A good example of this is the [`traverseF` + function](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/vm/vm.cpp#L1329-L1357): + it accepts 2 parameters: an input and an `ELocalLambda`. If the input is an array, the + `ELocalLambda` is applied to each element in the array, otherwise, it is applied to the input on + its own. EExpressions cannot be executed directly. Rather, [they are compiled](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/expressions/expression.h#L81-L84) @@ -102,6 +102,7 @@ For more details about the VM, including how `EExpression` compilation and `Byte in detail, please reference [the Virtual Machine section below](#virtual-machine). ## Slots + To make use of SBE values (either those produced by executing `ByteCode`, or those maintained elsewhere), we need a mechanism to reference them throughout query execution. This is where slots come into play: A slot is a mechanism for reading and writing values at query runtime. Each slot is @@ -109,10 +110,11 @@ come into play: A slot is a mechanism for reading and writing values at query ru SlotId](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/values/slot.h#L41-L48). 
Put another way, slots conceptually represent values that we care about during query execution, including: -- Records and RecordIds retrieved from a collection -- The result of evaluating an expression -- The individual components of a sort key (where each component is bound to its own slot) -- The result of executing some computation expressed in the input query + +- Records and RecordIds retrieved from a collection +- The result of evaluating an expression +- The individual components of a sort key (where each component is bound to its own slot) +- The result of executing some computation expressed in the input query SlotIds by themselves don't provide a means to access or set values, rather, [slots are associated with @@ -120,21 +122,21 @@ SlotAccessors](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5 which provide the API to read the values bound to slots as well as to write new values into slots. There are several types of SlotAccessors, but the most common are the following: -- The - [`OwnedValueAccessor`](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/values/slot.h#L113-L212) - allows for ownership of values. That is, this accessor is responsible for constructing/destructing - values (in the case of deep values, this involves allocating/releasing memory). Note that an - `OwnedValueAccessor` _can_ own values, but is not required to do so. -- The - [`ViewOfValueAccessor`](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/values/slot.h#L81-L111) - provides a way to read values that are owned elsewhere. +- The + [`OwnedValueAccessor`](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/values/slot.h#L113-L212) + allows for ownership of values. That is, this accessor is responsible for constructing/destructing + values (in the case of deep values, this involves allocating/releasing memory). Note that an + `OwnedValueAccessor` _can_ own values, but is not required to do so. +- The + [`ViewOfValueAccessor`](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/values/slot.h#L81-L111) + provides a way to read values that are owned elsewhere. While a value bound to a slot can only be managed by a single `OwnedValueAccessor`, any number of `ViewOfValueAccessors` can be initialized to read the value associated with that slot. A good example of the distinction between these two types of SlotAccessors is a query plan which performs a blocking sort over a collection scan. Suppose we are scanning a restaurants collection -and we wish to find the highest rated restaurants. Such a query execution plan might look like: +and we wish to find the highest rated restaurants. 
Such a query execution plan might look like: ``` sort [output = s1] [sortBy = s2] @@ -161,31 +163,34 @@ stages (as opposed to a push-based model where stages offer data to parent stage A single `PlanStage` may have any number of children and performs some action, implements some algorithm, or maintains some execution state, such as: -- Computing values bound to slots -- Managing the lifetime of values in slots -- Executing compiled `ByteCode` -- Buffering values into memory + +- Computing values bound to slots +- Managing the lifetime of values in slots +- Executing compiled `ByteCode` +- Buffering values into memory SBE PlanStages also follow an iterator model and perform query execution through the following steps: -- First, a caller prepares a PlanStage tree for execution by calling `prepare()`. -- Once the tree is prepared, the caller then calls `open()` to initialize the tree with any state - needed for query execution. Note that this may include performing actual execution work, as is - done by stages such as `HashAggStage` and `SortStage`. -- With the PlanStage tree initialized, query execution can now proceed through iterative calls to - `getNext()`. Note that the result set can be obtained in between calls to `getNext()` by reading - values from slots. -- Finally, `close()` is called to indicate that query execution is complete and release resources. -The following subsections describe [the PlanStage API](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/stages.h#L557-L651) introduced above in greater detail: +- First, a caller prepares a PlanStage tree for execution by calling `prepare()`. +- Once the tree is prepared, the caller then calls `open()` to initialize the tree with any state + needed for query execution. Note that this may include performing actual execution work, as is + done by stages such as `HashAggStage` and `SortStage`. +- With the PlanStage tree initialized, query execution can now proceed through iterative calls to + `getNext()`. Note that the result set can be obtained in between calls to `getNext()` by reading + values from slots. +- Finally, `close()` is called to indicate that query execution is complete and release resources. -### `virtual void prepare(CompileCtx& ctx) = 0;` +The following subsections describe [the PlanStage API](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/stages.h#L557-L651) introduced above in greater detail: + +### `virtual void prepare(CompileCtx& ctx) = 0;` This method prepares a `PlanStage` (and, recursively, its children) for execution by: - - Performing slot resolution, that is, obtaining `SlotAccessors` for all slots that this stage - references and verifying that all slot accesses are valid. Typically, this is done by asking - child stages for a `SlotAccessor*` via `getAccessor()`. - - Compiling `EExpressions` into executable `ByteCode`. Note that `EExpressions` can reference slots - through the `ctx` parameter. + +- Performing slot resolution, that is, obtaining `SlotAccessors` for all slots that this stage + references and verifying that all slot accesses are valid. Typically, this is done by asking + child stages for a `SlotAccessor*` via `getAccessor()`. +- Compiling `EExpressions` into executable `ByteCode`. Note that `EExpressions` can reference slots + through the `ctx` parameter. 
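+
+To make the two responsibilities of `prepare()` concrete, below is a minimal, hypothetical
+sketch of a filter-like stage. The types (`SlotAccessor`, `ByteCode`, `CompileCtx`,
+`EExpression`) are simplified stand-ins rather than the real SBE classes, and the real
+`FilterStage` differs in detail; the point is only the order of operations: prepare the
+child, resolve slots through it, then compile the expression.
+
+```cpp
+#include <memory>
+#include <stdexcept>
+
+// Illustrative stand-ins only -- these are not the real SBE types.
+using SlotId = long long;
+struct SlotAccessor {};  // reads/writes the value bound to one slot
+struct ByteCode {};      // executable form of a compiled EExpression
+struct CompileCtx {};    // tracks which slots are visible during compilation
+struct EExpression {
+    std::unique_ptr<ByteCode> compile(CompileCtx&) {
+        return std::make_unique<ByteCode>();
+    }
+};
+
+class PlanStage {
+public:
+    virtual ~PlanStage() = default;
+    virtual void prepare(CompileCtx& ctx) = 0;
+    virtual SlotAccessor* getAccessor(CompileCtx& ctx, SlotId slot) = 0;
+};
+
+// A filter-like stage that evaluates a predicate over a slot produced by its child.
+class FilterLikeStage : public PlanStage {
+public:
+    FilterLikeStage(std::unique_ptr<PlanStage> child,
+                    std::unique_ptr<EExpression> predicate,
+                    SlotId inputSlot)
+        : _child(std::move(child)),
+          _predicate(std::move(predicate)),
+          _inputSlot(inputSlot) {}
+
+    void prepare(CompileCtx& ctx) override {
+        _child->prepare(ctx);                                   // prepare children first
+        _inputAccessor = _child->getAccessor(ctx, _inputSlot);  // slot resolution
+        if (!_inputAccessor) {
+            throw std::runtime_error("slot is not produced by the child subtree");
+        }
+        _compiledPredicate = _predicate->compile(ctx);          // EExpression -> ByteCode
+    }
+
+    SlotAccessor* getAccessor(CompileCtx& ctx, SlotId slot) override {
+        // A filter does not introduce new slots; delegate to the child.
+        return _child->getAccessor(ctx, slot);
+    }
+
+private:
+    std::unique_ptr<PlanStage> _child;
+    std::unique_ptr<EExpression> _predicate;
+    SlotId _inputSlot;
+    SlotAccessor* _inputAccessor = nullptr;
+    std::unique_ptr<ByteCode> _compiledPredicate;
+};
+```
+
+This mirrors the ordering described above: children are prepared and slots resolved before
+any expression is compiled, so that compilation can see every slot the expression references.
+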
### `virtual value::SlotAccessor* getAccessor(CompileCtx& ctx, value::SlotId slot) = 0;` @@ -203,18 +208,20 @@ slots in said subtree invalid. For more details on slot resolution, consult [the section](#slot-resolution). ### `virtual void open(bool reOpen) = 0;` + ### `virtual void close() = 0;` These two methods mirror one another. While `open()` acquires necessary resources before `getNext()` can be called (that is, before `PlanStage` execution can begin), `close()` releases any resources acquired during `open()`. Acquiring resources for query execution can include actions such as: -- Opening storage engine cursors. -- Allocating memory. -- Populating a buffer with results by exhausting child stages. This is often done by blocking stages - which require processing their input to produce results. For example, the - [`SortStage`](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/sort.cpp#L340-L349) - needs to sort all of the values produced by its children (either in memory or on disk) before it - can produce results in sorted order. + +- Opening storage engine cursors. +- Allocating memory. +- Populating a buffer with results by exhausting child stages. This is often done by blocking stages + which require processing their input to produce results. For example, the + [`SortStage`](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/sort.cpp#L340-L349) + needs to sort all of the values produced by its children (either in memory or on disk) before it + can produce results in sorted order. It is only legal to call `close()` on PlanStages that have called `open()`, and to call `open()` on PlanStages that are closed. In some cases (such as in @@ -222,7 +229,7 @@ PlanStages that are closed. In some cases (such as in a parent stage may `open()` and `close()` a child stage repeatedly. However, doing so may be expensive and ultimately redundant. This is where the `reOpen` parameter of `open()` comes in: when set to `true`, it provides the opportunity to execute an optimized a sequence of `close()` and -`open()` calls. +`open()` calls. A good example of this is the [HashAgg stage](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/hash_agg.cpp#L426): @@ -239,8 +246,8 @@ on child stages as needed and update the values held in slots that belong to it. `ADVANCED` to indicate that `getNext()` can still be called and `IS_EOF` to indicate that no more calls to `getNext()` can be made (that is, this `PlanStage` has completed execution). Importantly, `PlanStage::getNext()` does _not_ return any results directly. Rather, it updates the state of -slots, which can then be read to obtain the results of the query. - +slots, which can then be read to obtain the results of the query. + At the time of writing, there are 36 PlanStages. As such, only a handful of common stages' `getNext()` implementations are described below: @@ -285,7 +292,7 @@ subsequent `getNext()` calls until `IS_EOF` is returned. This stage supports [Ri Joins](https://github.com/mongodb/mongo/blob/dbbabbdc0f3ef6cbb47500b40ae235c1258b741a/src/mongo/db/exec/sbe/stages/loop_join.h#L47). 
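+
+The `getNext()` control flow just described can be pictured with the following simplified
+sketch of the inner-join variant. The helper types and names are hypothetical, not the real
+`LoopJoinStage` code; it only illustrates re-opening the inner side once per outer row and
+draining it before the outer side advances again.
+
+```cpp
+#include <functional>
+
+// Hypothetical stand-ins -- not the real SBE stage interfaces.
+enum class PlanState { ADVANCED, IS_EOF };
+
+struct ChildStage {
+    std::function<void(bool reOpen)> open;
+    std::function<PlanState()> getNext;
+};
+
+// Returns ADVANCED once per (outer row, inner row) pairing. Slots bound by the
+// outer child are assumed to be visible to the inner child (correlation).
+PlanState loopJoinGetNext(ChildStage& outer, ChildStage& inner, bool& outerAdvanced) {
+    for (;;) {
+        if (!outerAdvanced) {
+            if (outer.getNext() == PlanState::IS_EOF) {
+                return PlanState::IS_EOF;  // outer side exhausted: the join is done
+            }
+            inner.open(/*reOpen*/ true);   // restart the inner side for this outer row
+            outerAdvanced = true;
+        }
+        if (inner.getNext() == PlanState::ADVANCED) {
+            return PlanState::ADVANCED;    // caller reads the joined slots now
+        }
+        outerAdvanced = false;             // inner side drained; advance the outer side
+    }
+}
+```
+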
Note that slots from the outer stage can be made visible to [inner stage via -LoopJoinStage::_outerCorrelated](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/loop_join.cpp#L105-L107), +LoopJoinStage::\_outerCorrelated](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933bd522d3/src/mongo/db/exec/sbe/stages/loop_join.cpp#L105-L107), which adds said slots to the `CompileCtx` during `prepare()`. Conceptually, this is similar to the rules around scoped variables in for loops in many programming languages: @@ -297,9 +304,10 @@ for (let [outerSlot1, outerSlot2] of outerSlots) { } } ``` - In the example above, the declaration of `res1` is invalid because values on the inner side are not - visible outside of the inner loop, while the declaration of `res2` is valid because values on the - outer side are visible to the inner side. + +In the example above, the declaration of `res1` is invalid because values on the inner side are not +visible outside of the inner loop, while the declaration of `res2` is valid because values on the +outer side are visible to the inner side. Also note that in the example above, the logical result of `LoopJoinStage` is a pairing of the tuple of slots visible from the outer side along with the tuple of slots from the inner side. @@ -313,7 +321,7 @@ Documents: { "_id" : 0, "name" : "Mihai Andrei", "major" : "Computer Science", "year": 2019} { "_id" : 1, "name" : "Jane Doe", "major" : "Computer Science", "year": 2020} -Indexes: +Indexes: {"major": 1} ``` @@ -324,6 +332,7 @@ db.alumni.find({"major" : "Computer Science", "year": 2020}); ``` The query plan chosen by the classic optimizer, represented as a `QuerySolution` tree, to answer this query is as follows: + ``` { "stage" : "FETCH", @@ -357,9 +366,11 @@ The query plan chosen by the classic optimizer, represented as a `QuerySolution` } } ``` + In particular, it is an `IXSCAN` over the `{"major": 1}` index, followed by a `FETCH` and a filter of `year = 2020`. The SBE plan (generated by the [SBE stage builder](#sbe-stage-builder) with the [plan cache](#sbe-plan-cache) disabled) for this query plan is as follows: + ``` *** SBE runtime environment slots *** $$RESULT=s7 env: { s1 = Nothing (nothing), s6 = {"major" : 1} } @@ -380,15 +391,15 @@ the numbers in brackets correspond to the `QuerySolutionNode` that each SBE `Pla `FETCH`). We can represent the state of query execution in SBE by a table that shows the values bound to slots -at a point in time: +at a point in time: -|Slot |Name |Value|Owned by| -|-----|-----|-----|--------------| -s2 | Seek RID slot | `Nothing` | `ixseek` -s5 | Index key slot | `Nothing` | `ixseek` -s7 | Record slot | `Nothing` | `seek` -s8 | RecordId slot | `Nothing` | `seek` -s9 | Slot for the field 'year' | `Nothing` | `seek` +| Slot | Name | Value | Owned by | +| ---- | ------------------------- | --------- | -------- | +| s2 | Seek RID slot | `Nothing` | `ixseek` | +| s5 | Index key slot | `Nothing` | `ixseek` | +| s7 | Record slot | `Nothing` | `seek` | +| s8 | RecordId slot | `Nothing` | `seek` | +| s9 | Slot for the field 'year' | `Nothing` | `seek` | Initially, all slots hold a value of `Nothing`. Note also that some slots have been omitted for brevity, namely, s3, s4, and s6 (which correspond to a `SnapshotId`, an index identifier and an @@ -402,13 +413,13 @@ calling `getNext()` on the inner `seek` stage. 
Following the specified index bounds, `ixseek`
will seek to the `{"": "Computer Science"}` index key and fill out slots `s2` and `s5`. At this
point, our slots are bound to the following values:

-|Slot |Name |Value|Owned by|
-|-----|-----|-----|--------------|
-s2 | Seek RID slot | `` | `ixseek`
-s5 | Index key slot | `{"": "Computer Science"}` | `ixseek`
-s7 | Record slot | `Nothing` | `seek`
-s8 | RecordId slot | `Nothing` | `seek`
-s9 | Slot for the field 'year' | `Nothing` | `seek`
+| Slot | Name                      | Value                      | Owned by |
+| ---- | ------------------------- | -------------------------- | -------- |
+| s2   | Seek RID slot             | ``                         | `ixseek` |
+| s5   | Index key slot            | `{"": "Computer Science"}` | `ixseek` |
+| s7   | Record slot               | `Nothing`                  | `seek`   |
+| s8   | RecordId slot             | `Nothing`                  | `seek`   |
+| s9   | Slot for the field 'year' | `Nothing`                  | `seek`   |

After `ixseek` returns `ADVANCED`, `nlj` will call `getNext` on the child `limit` stage, which will
return `IS_EOF` after one call to `getNext()` on `seek` (in this way, a `limit 1 + seek` plan
@@ -416,13 +427,13 @@ executes a logical `FETCH`). The ScanStage will seek its cursor to the RecordId
that RID is not the same as `_id`), bind values to slots for the RecordId, Record, and the value
for the field 'year', and finally return `ADVANCED`. Our slots now look like so:

-|Slot |Name |Value|Owned by|
-|-----|-----|-----|--------------|
-s2 | Seek RID slot | `` | `ixseek`
-s5 | Index key slot | `{"": "Computer Science"}` | `ixseek`
-s7 | Record slot | `{ "_id" : 0, "name" : "Mihai Andrei",`
`"major" : "Computer Science", "year": 2019}` | `seek` -s8 | RecordId slot | `` | `seek` -s9 | Slot for the field 'year' | `2019` | `seek` +| Slot | Name | Value | Owned by | +| ---- | ------------------------- | ------------------------------------------------------------------------------------------ | -------- | +| s2 | Seek RID slot | `` | `ixseek` | +| s5 | Index key slot | `{"": "Computer Science"}` | `ixseek` | +| s7 | Record slot | `{ "_id" : 0, "name" : "Mihai Andrei",`
`"major" : "Computer Science", "year": 2019}` | `seek` | +| s8 | RecordId slot | `` | `seek` | +| s9 | Slot for the field 'year' | `2019` | `seek` | Note that although `s8` and `s2` hold the same value (the RecordId for `_id: 0`), they represent different entities. `s2` holds the starting point for our `seek` stage (provided by `ixseek`), @@ -432,7 +443,7 @@ read, which is also surfaced externally as the query result (provided that our ` `seek` will return control to `nlj`, which returns control to `filter`. We now have a value for `s9` and can execute our filter expression. When executing the `ByteCode` for our filter expression with `s9` bound to 2019, the result is `false` because 2019 is not equal to 2020. As such, the `filter` -stage must call `getNext()` on `nlj` once more. +stage must call `getNext()` on `nlj` once more. The slot tables which result from the next call to `FilterStage::getNext()` are left as an exercise to the reader. @@ -446,34 +457,41 @@ README](https://github.com/mongodb/mongo/blob/06a931ffadd7ce62c32288d03e5a38933b for more details. ## Incomplete Sections Below (TODO) -## Runtime Planners + +## Runtime Planners Outline: + ### `MultiPlanner` + ### `CachedSolutionPlanner` + ### `SubPlanner` ## Virtual Machine Outline: -- Compilation of EExpressions - - Frames/Labels -- ByteCode Execution - - Dispatch of instructions - - Parameter resolution - - Management of values + +- Compilation of EExpressions + - Frames/Labels +- ByteCode Execution + - Dispatch of instructions + - Parameter resolution + - Management of values ## Slot Resolution Outline: -- Binding reflectors -- Other SlotAccessor types (`SwitchAccessor`, `MaterializedRowAccessor`) + +- Binding reflectors +- Other SlotAccessor types (`SwitchAccessor`, `MaterializedRowAccessor`) ## Yielding Outline: -- What is yielding and why we yield -- `doSaveState()/doRestoreState()` -- Index Key Consistency/Corruption checks -## Block Processing \ No newline at end of file +- What is yielding and why we yield +- `doSaveState()/doRestoreState()` +- Index Key Consistency/Corruption checks + +## Block Processing diff --git a/src/mongo/db/ftdc/README.md b/src/mongo/db/ftdc/README.md index 64ca3b63f67..fa91c732ea8 100644 --- a/src/mongo/db/ftdc/README.md +++ b/src/mongo/db/ftdc/README.md @@ -2,7 +2,7 @@ ## Table of Contents -- [High Level Overview](#high-level-overview) +- [High Level Overview](#high-level-overview) ## High Level Overview @@ -23,9 +23,9 @@ process information for the controller. These sets of collector objects are stor object, allowing all the data to be collected through one call to collect on the `FTDCCollectorCollection`. There are two sets of `FTDCCollectorCollection` objects on the controller: -[_periodicCollectors](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/ftdc/controller.h#L200-L201) +[\_periodicCollectors](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/ftdc/controller.h#L200-L201) that collects data at a specified time interval, and -[_rotateCollectors](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/ftdc/controller.h#L207-L208) +[\_rotateCollectors](https://github.com/mongodb/mongo/blob/r4.4.0/src/mongo/db/ftdc/controller.h#L207-L208) that collects one set of data every time a file is created. 
At specified time intervals, the FTDC Controller calls collect on the `_periodicCollectors` diff --git a/src/mongo/db/process_health/README.md b/src/mongo/db/process_health/README.md index c5e6f06e1ca..c20ebc76c29 100644 --- a/src/mongo/db/process_health/README.md +++ b/src/mongo/db/process_health/README.md @@ -2,72 +2,79 @@ This module is capable to run server health checks and crash an unhealthy server. -*Note:* in 4.4 release only the mongos proxy server is supported +_Note:_ in 4.4 release only the mongos proxy server is supported ## Health Observers -*Health Observers* are designed for every particular check to run. Each observer can be configured to be on/off and critical or not to be able to crash the serer on error. Each observer has a configurable interval of how often it will run the checks. +_Health Observers_ are designed for every particular check to run. Each observer can be configured to be on/off and critical or not to be able to crash the serer on error. Each observer has a configurable interval of how often it will run the checks. ## Health Observers Parameters -- healthMonitoringIntensities: main configuration for each observer. Can be set at startup and changed at runtime. Valid values: - - off: this observer if off - - critical: if the observer detects a failure, the process will crash - - non-critical: if the observer detects a failure, the error will be logged and the process will not crash +- healthMonitoringIntensities: main configuration for each observer. Can be set at startup and changed at runtime. Valid values: + - off: this observer if off + - critical: if the observer detects a failure, the process will crash + - non-critical: if the observer detects a failure, the error will be logged and the process will not crash - Example as startup parameter: - ``` - mongos --setParameter "healthMonitoringIntensities={ \"values\" : [{ \"type\" : \"ldap\", \"intensity\" : \"critical\" } ]}" - ``` + Example as startup parameter: - Example as runtime change command: - ``` - db.adminCommand({ "setParameter": 1, - healthMonitoringIntensities: {values: - [{type: "ldap", intensity: "critical"}] } }); - ``` + ``` + mongos --setParameter "healthMonitoringIntensities={ \"values\" : [{ \"type\" : \"ldap\", \"intensity\" : \"critical\" } ]}" + ``` -- healthMonitoringIntervals: how often this health observer will run, in milliseconds. + Example as runtime change command: - Example as startup parameter: - ``` - mongos --setParameter "healthMonitoringIntervals={ \"values\" : [ { \"type\" : \"ldap\", \"interval\" : 30000 } ] }" - ``` - here LDAP health observer is configured to run every 30 seconds. + ``` + db.adminCommand({ "setParameter": 1, + healthMonitoringIntensities: {values: + [{type: "ldap", intensity: "critical"}] } }); + ``` - Example as runtime change command: - ``` - db.adminCommand({"setParameter": 1, "healthMonitoringIntervals":{"values": [{"type":"ldap", "interval": 30000}]} }); - ``` +- healthMonitoringIntervals: how often this health observer will run, in milliseconds. + + Example as startup parameter: + + ``` + mongos --setParameter "healthMonitoringIntervals={ \"values\" : [ { \"type\" : \"ldap\", \"interval\" : 30000 } ] }" + ``` + + here LDAP health observer is configured to run every 30 seconds. 
+ + Example as runtime change command: + + ``` + db.adminCommand({"setParameter": 1, "healthMonitoringIntervals":{"values": [{"type":"ldap", "interval": 30000}]} }); + ``` ## LDAP Health Observer LDAP Health Observer checks all configured LDAP servers that at least one of them is up and running. At every run, it creates new connection to every configured LDAP server and runs a simple query. The LDAP health observer is using the same parameters as described in the **LDAP Authorization** section of the manual. -To enable this observer, use the *healthMonitoringIntensities* and *healthMonitoringIntervals* parameters as described above. The recommended value for the LDAP monitoring interval is 30 seconds. - +To enable this observer, use the _healthMonitoringIntensities_ and _healthMonitoringIntervals_ parameters as described above. The recommended value for the LDAP monitoring interval is 30 seconds. ## Active Fault -When a failure is detected, and the observer is configured as *critical*, the server will wait for the configured interval before crashing. The interval from the failure detection and crash is configured with *activeFaultDurationSecs* parameter: +When a failure is detected, and the observer is configured as _critical_, the server will wait for the configured interval before crashing. The interval from the failure detection and crash is configured with _activeFaultDurationSecs_ parameter: -- activeFaultDurationSecs: how long to wait from the failure detection to crash, in seconds. This can be configured at startup and changed at runtime. +- activeFaultDurationSecs: how long to wait from the failure detection to crash, in seconds. This can be configured at startup and changed at runtime. - Example: - ``` - db.adminCommand({"setParameter": 1, activeFaultDurationSecs: 300}); - ``` + Example: + + ``` + db.adminCommand({"setParameter": 1, activeFaultDurationSecs: 300}); + ``` ## Progress Monitor -*Progress Monitor* detects that every health check is not stuck, without returning either success or failure. If a health check starts and does not complete the server will crash. This behavior could be configured with: +_Progress Monitor_ detects that every health check is not stuck, without returning either success or failure. If a health check starts and does not complete the server will crash. This behavior could be configured with: -- progressMonitor: configure the progress monitor. Values: - - *interval*: how often to run the liveness check, in milliseconds - - *deadline*: timeout before crashing the server if a health check is not making progress, in seconds +- progressMonitor: configure the progress monitor. Values: - Example: - ``` - mongos --setParameter "progressMonitor={ \"interval\" : 1000, \"deadline\" : 300 }" - ``` + - _interval_: how often to run the liveness check, in milliseconds + - _deadline_: timeout before crashing the server if a health check is not making progress, in seconds + + Example: + + ``` + mongos --setParameter "progressMonitor={ \"interval\" : 1000, \"deadline\" : 300 }" + ``` diff --git a/src/mongo/db/query/README.md b/src/mongo/db/query/README.md index 8c3ad590f9e..28b94b56161 100644 --- a/src/mongo/db/query/README.md +++ b/src/mongo/db/query/README.md @@ -1,6 +1,6 @@ # Query System Internals -*Disclaimer*: This is a work in progress. It is not complete and we will +_Disclaimer_: This is a work in progress. It is not complete and we will do our best to complete it in a timely manner. # Overview @@ -13,29 +13,29 @@ distinct, and mapReduce. 
Here we will divide it into the following phases and topics: - * **Command Parsing & Validation:** Which arguments to the command are - recognized and do they have the right types? - * **Query Language Parsing & Validation:** More complex parsing of - elements like query predicates and aggregation pipelines, which are - skipped in the first section due to complexity of parsing rules. - * **Query Optimization** - * **Normalization and Rewrites:** Before we try to look at data - access paths, we perform some simplification, normalization and - "canonicalization" of the query. - * **Index Tagging:** Figure out which indexes could potentially be - helpful for which query predicates. - * **Plan Enumeration:** Given the set of associated indexes and - predicates, enumerate all possible combinations of assignments - for the whole query tree and output a draft query plan for each. - * **Plan Compilation:** For each of the draft query plans, finalize - the details. Pick index bounds, add any necessary sorts, fetches, - or projections - * **Plan Selection:** Compete the candidate plans against each other - and select the winner. - * [**Plan Caching:**](#plan-caching) Attempt to skip the expensive steps above by - caching the previous winning solution. - * **Query Execution:** Iterate the winning plan and return results to the - client. +- **Command Parsing & Validation:** Which arguments to the command are + recognized and do they have the right types? +- **Query Language Parsing & Validation:** More complex parsing of + elements like query predicates and aggregation pipelines, which are + skipped in the first section due to complexity of parsing rules. +- **Query Optimization** + - **Normalization and Rewrites:** Before we try to look at data + access paths, we perform some simplification, normalization and + "canonicalization" of the query. + - **Index Tagging:** Figure out which indexes could potentially be + helpful for which query predicates. + - **Plan Enumeration:** Given the set of associated indexes and + predicates, enumerate all possible combinations of assignments + for the whole query tree and output a draft query plan for each. + - **Plan Compilation:** For each of the draft query plans, finalize + the details. Pick index bounds, add any necessary sorts, fetches, + or projections + - **Plan Selection:** Compete the candidate plans against each other + and select the winner. + - [**Plan Caching:**](#plan-caching) Attempt to skip the expensive steps above by + caching the previous winning solution. +- **Query Execution:** Iterate the winning plan and return results to the + client. In this documentation we focus on the process for a single node or replica set where all the data is expected to be found locally. We plan @@ -46,14 +46,15 @@ directory later. The following commands are generally maintained by the query team, with the majority of our focus given to the first two. -* find -* aggregate -* count -* distinct -* mapReduce -* update -* delete -* findAndModify + +- find +- aggregate +- count +- distinct +- mapReduce +- update +- delete +- findAndModify The code path for each of these starts in a Command, named something like MapReduceCommand or FindCmd. You can generally find these in @@ -105,7 +106,7 @@ This file (specified in a YAML format) is used to generate C++ code. Our build system will run a python tool to parse this YAML and spit out C++ code which is then compiled and linked. 
This code is left in a file ending with '\_gen.h' or '\_gen.cpp', for example -'count\_command\_gen.cpp'. You'll notice that things like whether it is +'count_command_gen.cpp'. You'll notice that things like whether it is optional, the type of the field, and any defaults are included here, so we don't have to write any code to handle that. @@ -173,12 +174,12 @@ further to a point where we understand which stages are involved. This is actually a special case, and we use a class called the `LiteParsedPipeline` for this and other similar purposes. - The `LiteParsedPipeline` class is constructed via a semi-parse which +The `LiteParsedPipeline` class is constructed via a semi-parse which only goes so far as to tease apart which stages are involved. It is a very simple model of an aggregation pipeline, and is supposed to be cheaper to construct than doing a full parse. As a general rule of thumb, we try to keep expensive things from happening until after we've -verified the user has the required privileges to do those things. +verified the user has the required privileges to do those things. This simple model can be used for requests we want to inspect before proceeding and building a full model of the user's query or request. As @@ -215,6 +216,7 @@ Once we have parsed the command and checked authorization, we move on to parsing parts of the query. Once again, we will focus on the find and aggregate commands. ## Find command parsing + The find command is parsed entirely by the IDL. The IDL parser first creates a FindCommandRequest. As mentioned above, the IDL parser does all of the required type checking and stores all options for the query. The FindCommandRequest is then turned into a CanonicalQuery. The CanonicalQuery @@ -231,12 +233,14 @@ both done here. ## Aggregate Command Parsing ### LiteParsedPipeline + In the process of parsing an aggregation we create two versions of the pipeline: a LiteParsedPipeline (that contains LiteParsedDocumentSource objects) and the Pipeline (that contains -DocumentSource objects) that is eventually used for execution. See the above section on +DocumentSource objects) that is eventually used for execution. See the above section on authorization checking for more details. ### DocumentSource + Before talking about the aggregate command as a whole, we will first briefly discuss the concept of a DocumentSource. A DocumentSource represents one stage in the an aggregation pipeline. For each stage in the pipeline, we create another DocumentSource. A DocumentSource @@ -248,6 +252,7 @@ validation of its internal fields and arguments and then generates the DocumentS added to the final pipeline. ### Pipeline + The pipeline parser uses the individual document source parsers to parse the entire pipeline argument of the aggregate command. The parsing process is fairly simple -- for each object in the user specified pipeline lookup the document source parser for the stage name, and then parse the @@ -255,6 +260,7 @@ object using that parser. The final pipeline is composed of the DocumentSources individual parsers. ### Aggregation Command + When an aggregation is run, the first thing that happens is the request is parsed into a LiteParsedPipeline. As mentioned above, the LiteParsedPipeline is used to check options and permissions on namespaces. More checks are done in addition to those performed by the @@ -264,22 +270,23 @@ above. Note that we use the original BSON for parsing the pipeline and DocumentS to continuing from the LiteParsedPipeline. 
This could be improved in the future. ## Other command parsing + As mentioned above, there are several other commands maintained by the query team. We will quickly give a summary of how each is parsed, but not get into the same level of detail. -* count : Parsed by IDL and then turned into a CountStage which can be executed in a similar way to - a find command. -* distinct : The distinct specific arguments are parsed by IDL into DistinctCommandRequest. Generic - command arguments and 'query' field are parsed by custom code into ParsedDistinctCommand, then - being used to construct CanonicalDistinct and eventually turned into executable stage. -* mapReduce : Parsed by IDL and then turned into an equivalent aggregation command. -* update : Parsed by IDL. An update command can contain both query (find) and pipeline syntax - (for updates) which each get delegated to their own parsers. -* delete : Parsed by IDL. The filter portion of the of the delete command is delegated to the find - parser. -* findAndModify : Parsed by IDL. The findAndModify command can contain find and update syntax. The - query portion is delegated to the query parser and if this is an update (rather than a delete) it - uses the same parser as the update command. +- count : Parsed by IDL and then turned into a CountStage which can be executed in a similar way to + a find command. +- distinct : The distinct specific arguments are parsed by IDL into DistinctCommandRequest. Generic + command arguments and 'query' field are parsed by custom code into ParsedDistinctCommand, then + being used to construct CanonicalDistinct and eventually turned into executable stage. +- mapReduce : Parsed by IDL and then turned into an equivalent aggregation command. +- update : Parsed by IDL. An update command can contain both query (find) and pipeline syntax + (for updates) which each get delegated to their own parsers. +- delete : Parsed by IDL. The filter portion of the of the delete command is delegated to the find + parser. +- findAndModify : Parsed by IDL. The findAndModify command can contain find and update syntax. The + query portion is delegated to the query parser and if this is an update (rather than a delete) it + uses the same parser as the update command. # Plan caching @@ -428,7 +435,7 @@ queries that project a known limited subset of fields, run in SBE and aren't eli indexes. For other heuristics limiting use of CSI see [`querySatisfiesCsiPlanningHeuristics()`](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/query/query_planner.cpp#L250). -Scanning of CSI is implemented by the +Scanning of CSI is implemented by the [`columnscan`](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/exec/sbe/stages/column_scan.h) SBE stage. Unlike `ixscan` the plans that use `columnscan` don't include a separate fetch stage as the columnstore indexes are optimistically assumed to be covering for the scenarios when they are @@ -447,7 +454,7 @@ which filters can be pushed down see ### Tests -*JS Tests:* Most CSI related tests can be found in `jstests/core/columnstore` folder or by searching +_JS Tests:_ Most CSI related tests can be found in `jstests/core/columnstore` folder or by searching for tests that create an index with the "columnstore" tag. 
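As a rough sketch of what these tests exercise (assuming a build where columnstore indexes are
enabled, since the feature is gated), they typically create a columnstore index and then run a
query that projects a small subset of fields so a `columnscan` plan can be considered; the
collection name and predicate below are illustrative only:

```js
// Create a columnstore index over all paths, then run a query shaped so the planner
// may choose a columnscan plan (simple predicate, limited projection).
db.example.createIndex({"$**": "columnstore"});
db.example.aggregate([{$match: {a: {$gt: 5}}}, {$project: {_id: 0, a: 1, b: 1}}]);
```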
There are also [`core_column_store_indexes`](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/buildscripts/resmokeconfig/suites/core_column_store_indexes.yml) and [`aggregation_column_store_index_passthrough`](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/buildscripts/resmokeconfig/suites/aggregation_column_store_index_passthrough.yml) @@ -470,6 +477,7 @@ does not sort by value but rather by path with `RecordId` postfix. A `ColumnStor entries per path in a collection document: Example input documents: + ``` { _id: new ObjectId("..."), @@ -489,7 +497,9 @@ Example input documents: viewed: true } ``` + High-level view of the column store data format: + ``` (_id\01, {vals: [ ObjectId("...") ]}) (_id\02, {vals: [ ObjectId("...") ]}) @@ -531,24 +541,24 @@ that the storage engine will receive. _Code spelunking entry points:_ -* The [IndexAccessMethod](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/index_access_method.h) -is invoked by the [IndexCatalogImpl](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/catalog/index_catalog_impl.cpp#L1714-L1715). +- The [IndexAccessMethod](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/index_access_method.h) + is invoked by the [IndexCatalogImpl](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/catalog/index_catalog_impl.cpp#L1714-L1715). -* The [ColumnStoreAccessMethod](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/columns_access_method.h#L39), -note the [write paths](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/columns_access_method.cpp#L269-L286) -that use the ColumnKeyGenerator. +- The [ColumnStoreAccessMethod](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/columns_access_method.h#L39), + note the [write paths](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/columns_access_method.cpp#L269-L286) + that use the ColumnKeyGenerator. -* The [ColumnKeyGenerator](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/column_key_generator.h#L146) -produces many [UnencodedCellView](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/column_key_generator.h#L111) -via the [ColumnShredder](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/column_key_generator.cpp#L163-L176) -with the [ColumnProjectionTree & ColumnProjectionNode](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/column_key_generator.h#L46-L101) -classes defining the desired path projections. +- The [ColumnKeyGenerator](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/column_key_generator.h#L146) + produces many [UnencodedCellView](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/column_key_generator.h#L111) + via the [ColumnShredder](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/column_key_generator.cpp#L163-L176) + with the [ColumnProjectionTree & ColumnProjectionNode](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/column_key_generator.h#L46-L101) + classes defining the desired path projections. 
-* The [column_cell.h/cpp](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/column_cell.h#L44-L50) -helpers are leveraged throughout -[ColumnStoreAccessMethod write methods](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/columns_access_method.cpp#L281) -to encode ColumnKeyGenerator UnencodedCellView cells into final buffers for storage write. +- The [column_cell.h/cpp](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/column_cell.h#L44-L50) + helpers are leveraged throughout + [ColumnStoreAccessMethod write methods](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/columns_access_method.cpp#L281) + to encode ColumnKeyGenerator UnencodedCellView cells into final buffers for storage write. -* The ColumnStoreAccessMethod -[invokes the WiredTigerColumnStore](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/columns_access_method.cpp#L318) -with the final encoded path-cell (key-value) entries for storage. +- The ColumnStoreAccessMethod + [invokes the WiredTigerColumnStore](https://github.com/mongodb/mongo/blob/r6.3.0-alpha/src/mongo/db/index/columns_access_method.cpp#L318) + with the final encoded path-cell (key-value) entries for storage. diff --git a/src/mongo/db/query/optimizer/README.md b/src/mongo/db/query/optimizer/README.md index 22c7ccb4898..b618c1dcd42 100644 --- a/src/mongo/db/query/optimizer/README.md +++ b/src/mongo/db/query/optimizer/README.md @@ -1,52 +1,62 @@ # About + This directory contains all the logic related to query optimization in the new common query framework. It contains the models for representing a query and the logic for implementing optimization via a cascades framework. # Testing + Developers working on the new optimizer may wish to run a subset of the tests which is exclusively focused on this codebase. This section details the relevant -tests. +tests. + ## Unit Tests + The following C++ unit tests exercise relevant parts of the codebase: -- algebra_test (src/mongo/db/query/optimizer/algebra/) -- db_pipeline_test (src/mongo/db/pipeline/) -- - This test suite includes many unrelated test cases, but - 'abt/abt_translation_test.cpp' and 'abt/abt_optimization_test.cpp' are the relevant ones. -- optimizer_test (src/mongo/db/query/optimizer/) -- sbe_abt_test (src/mongo/db/exec/sbe/abt/) +- algebra_test (src/mongo/db/query/optimizer/algebra/) +- db_pipeline_test (src/mongo/db/pipeline/) +- - This test suite includes many unrelated test cases, but + 'abt/abt_translation_test.cpp' and 'abt/abt_optimization_test.cpp' are the relevant ones. +- optimizer_test (src/mongo/db/query/optimizer/) +- sbe_abt_test (src/mongo/db/exec/sbe/abt/) These can be compiled with targets like 'build/install/bin/algebra_test', although the exact name will depend on your 'installDir' which you have configured with SCons. It may look more like 'build/opt/install/bin/algebra_test'. 
If you want to build and run at once, you can use the '+' shortcut to ninja, like so: + ``` ninja +algebra_test +db_pipeline_test +optimizer_test +sbe_abt_test ``` ## JS Integration Tests + In addition to the above unit tests, the following JS suites are helpful in exercising this codebase: -- **cqf**: [buildscripts/resmokeconfig/suites/cqf.yml](/buildscripts/resmokeconfig/suites/cqf.yml) -- **cqf_disabled_pipeline_opt**: + +- **cqf**: [buildscripts/resmokeconfig/suites/cqf.yml](/buildscripts/resmokeconfig/suites/cqf.yml) +- **cqf_disabled_pipeline_opt**: [buildscripts/resmokeconfig/suites/cqf_disabled_pipeline_opt.yml](/buildscripts/resmokeconfig/suites/cqf_disabled_pipeline_opt.yml) -- **cqf_parallel**: [buildscripts/resmokeconfig/suites/cqf_parallel.yml](/buildscripts/resmokeconfig/suites/cqf_parallel.yml) -- **query_golden_cqf**: [buildscripts/resmokeconfig/suites/query_golden_cqf.yml](/buildscripts/resmokeconfig/suites/query_golden_cqf.yml) +- **cqf_parallel**: [buildscripts/resmokeconfig/suites/cqf_parallel.yml](/buildscripts/resmokeconfig/suites/cqf_parallel.yml) +- **query_golden_cqf**: [buildscripts/resmokeconfig/suites/query_golden_cqf.yml](/buildscripts/resmokeconfig/suites/query_golden_cqf.yml) Desriptions of these suites can be found in [buildscripts/resmokeconfig/evg_task_doc/evg_task_doc.yml](/buildscripts/resmokeconfig/evg_task_doc/evg_task_doc.yml). You may run these like so, adjusting the `-j` flag for the appropriate level of parallel execution for your machine. + ``` ./buildscripts/resmoke.py run -j4 \ --suites=cqf,cqf_disabled_pipeline_opt,cqf_parallel,query_golden_cqf ``` ## Local Testing Recommendation + Something like this command may be helpful for local testing: + ``` ninja install-devcore build/install/bin/algebra_test \ build/install/bin/db_pipeline_test build/install/bin/optimizer_test \ @@ -57,15 +67,17 @@ build/install/bin/sbe_abt_test \ && ./build/install/bin/sbe_abt_test \ && ./buildscripts/resmoke.py run --suites=cqf,cqf_parallel,cqf_disabled_pipeline_opt,query_golden_cqf -j4 ``` + **Note:** You may need to adjust the path to the unit test binary targets if your SCons install directory is something more like `build/opt/install/bin`. ## Evergreen Testing Recommendation + In addition to the above suites, there is a patch-only variant which enables the CQF feature flag on a selection of existing suites. The variant, "Query (all feature flags and CQF enabled)", runs all the tasks from the recommended all-feature-flags variants. When testing on evergreen, you may want a combination of passthrough tests from the CQF variant and CQF-targeted tests (like the -integration suites mentioned above and unit tests) on interesting variants such as ASAN. +integration suites mentioned above and unit tests) on interesting variants such as ASAN. You can define local evergreen aliases to make scheduling these tasks easier and faster than selecting them individually on each evergreen patch. A discussion of evergreen aliases can be diff --git a/src/mongo/db/query/query_shape/README.md b/src/mongo/db/query/query_shape/README.md index b3c02d28e69..1723cb51641 100644 --- a/src/mongo/db/query/query_shape/README.md +++ b/src/mongo/db/query/query_shape/README.md @@ -1,19 +1,24 @@ # Query Shape + A query shape is a transformed version of a command with literal values replaced by a "canonical" BSON Type placeholder. Hence, different instances of a command would be considered to have the same query shape if they are identical once their literal values are abstracted. 
For example, these two queries would have the same shape: + ```js db.example.findOne({x: 24}); db.example.findOne({x: 53}); ``` + While these queries would each have a distinct shape: + ```js db.example.findOne({x: 53, y: 1}); db.example.findOne({x: 53}); db.example.findOne({x: "string"}); ``` + While different literal _values_ result in the same shape (matching `x` for 23 vs 53), different BSON _types_ of the literal are considered distinct shapes (matching `x` for 53 vs "string"). @@ -28,33 +33,35 @@ You can see which components are considered part of the query shape or not for e type in their respective "shape component" classes, whose purpose is to determine which components are relevant and should be included for determining the shape for specific type of command. The structure is as follows: -- [`CmdSpecificShapeComponents`](query_shape.h#L65) - - [`LetShapeComponent`](cmd_with_let_shape.h#L48) - - [`AggCmdShapeComponents`](agg_cmd_shape.h#L82) - - [`FindCmdShapeComponents`](find_cmd_shape.h#L48) + +- [`CmdSpecificShapeComponents`](query_shape.h#L65) + - [`LetShapeComponent`](cmd_with_let_shape.h#L48) + - [`AggCmdShapeComponents`](agg_cmd_shape.h#L82) + - [`FindCmdShapeComponents`](find_cmd_shape.h#L48) See more information for the different shapes in their respective classes, structured as follows: -- [`Shape`](query_shape.h) - - [`CmdWithLetShape`](cmd_with_let_shape.h) - - [`AggCmdShape`](agg_cmd_shape.h) - - [`FindCmdShape`](find_cmd_shape.h) + +- [`Shape`](query_shape.h) + - [`CmdWithLetShape`](cmd_with_let_shape.h) + - [`AggCmdShape`](agg_cmd_shape.h) + - [`FindCmdShape`](find_cmd_shape.h) ## Serialization Options + `SerializationOptions` describes the way we serialize literal values. There are 3 different serialization options: -- `kUnchanged`: literals are serialized unmodified - - `{x: 5, y: "hello"}` -> `{x: 5, y: "hello"}` -- `kToDebugTypeString`: human readable format, type string of the literal is serialized - - `{x: 5, y: "hello"}` -> `{x: "?number", y: "?string"}` -- `kToRepresentativeParseableValue`: literal serialized to one canonical value for given type, which - must be parseable - - `{x: 5, y: "hello"}` -> `{x: 1, y: "?"}` - - An example of a query which is serialized differently due to the parseable requirement is `{x: - {$regex: "^p.*"}}`. If we serialized the pattern as if it were a normal string we would end up - with `{x: {$regex: "?"}}` however `"?"` is not a valid regex pattern, so this would fail - parsing. Instead we will serialize it this way to maintain parseability, `{x: {$regex: - "\\?"}}`, since `"\\?"` is valid regex. + +- `kUnchanged`: literals are serialized unmodified + - `{x: 5, y: "hello"}` -> `{x: 5, y: "hello"}` +- `kToDebugTypeString`: human readable format, type string of the literal is serialized + - `{x: 5, y: "hello"}` -> `{x: "?number", y: "?string"}` +- `kToRepresentativeParseableValue`: literal serialized to one canonical value for given type, which + must be parseable - `{x: 5, y: "hello"}` -> `{x: 1, y: "?"}` - An example of a query which is serialized differently due to the parseable requirement is `{x: +{$regex: "^p.*"}}`. If we serialized the pattern as if it were a normal string we would end up + with `{x: {$regex: "?"}}` however `"?"` is not a valid regex pattern, so this would fail + parsing. Instead we will serialize it this way to maintain parseability, `{x: {$regex: +"\\?"}}`, since `"\\?"` is valid regex. See [serialization_options.h](serialization_options.h) for more details. 
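To tie the pieces together, here is a hedged end-to-end illustration of a find command whose shape
covers more than just the filter. The rendering below uses the debug type-string policy and is a
simplified approximation, not the server's exact output; see `FindCmdShapeComponents` and
`SerializationOptions` for the authoritative structure.

```js
// The projection and sort participate in the find command's shape along with the filter.
db.example.find({x: 5, y: "hello"}, {_id: 0, y: 1}).sort({x: 1});
// A debug-typed rendering of its shape might look roughly like:
// {
//   filter: {x: {$eq: "?number"}, y: {$eq: "?string"}},
//   projection: {_id: false, y: true},
//   sort: {x: 1}
// }
```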
diff --git a/src/mongo/db/query/query_stats/README.md b/src/mongo/db/query/query_stats/README.md index 6f2667fbfd9..f8f34e67617 100644 --- a/src/mongo/db/query/query_stats/README.md +++ b/src/mongo/db/query/query_stats/README.md @@ -1,4 +1,5 @@ # Query Stats + This directory is the home of the infrastructure related to recording runtime query statistics for the database. It is not to be confused with `src/mongo/db/query/stats/` which is the home of the logic for computing and maintaining statistics about a collection or index's data distribution - for @@ -11,18 +12,22 @@ query stats key and will be collected on any mongod or mongos process for which including primaries and secondaries. ## QueryStatsStore + At the center of everything here is the [`QueryStatsStore`](query_stats.h#93-97), which is a partitioned hash table that maps the hash of a [Query Stats Key](#glossary) (also known as the _Query Stats Store Key_) to some metrics about how often each one occurs. ### Computing the Query Stats Store Key + A query stats store key contains various dimensions that distinctify a specific query. One main attribute to the query stats store key, is the query shape (`query_shape::Shape`). For example, if the client does this: + ```js db.example.findOne({x: 24}); db.example.findOne({x: 53}); ``` + then the `QueryStatsStore` should contain an entry for a single query shape which would record 2 executions and some related statistics (see [`QueryStatsEntry`](query_stats_entry.h) for details). @@ -31,11 +36,13 @@ For more information on query shape, see the [query_shape](../query_shape/README The query stats store has _more_ dimensions (i.e. more granularity) to group incoming queries than just the query shape. For example, these queries would all three have the same shape but the first would have a different query stats store entry from the other two: + ```js db.example.find({x: 55}); db.example.find({x: 55}).batchSize(2); db.example.find({x: 55}).batchSize(3); ``` + There are two distinct query stats store entries here - both the examples which include the batch size will be treated separately from the example which does not specify a batch size. @@ -46,6 +53,7 @@ we accumulate statistics. As one example, you can find the `FindCmdQueryStatsStoreKeyComponents` (including `batchSize` shown in this example). ### Query Stats Store Cache Size + The size of the`QueryStatsStore` can be set by the server parameter [`internalQueryStatsCacheSize`](#server-parameters), and the partitions will be created based off that. See [`queryStatsStoreManagerRegisterer`](query_stats.cpp#L138-L154) for more details about how @@ -55,8 +63,9 @@ will be evicted to drop below the max size. Eviction will be tracked in the new metrics](#server-status-metrics) for queryStats. ## Metric Collection + At a high level, when a query is run and collection of query stats is enabled, during planning we -call [`registerRequest`]((query_stats.h#L195-L198)) in which the query stats store key will be +call [`registerRequest`](<(query_stats.h#L195-L198)>) in which the query stats store key will be generated based on the query's shape and the various other dimensions. The key will always be serialized and stored on the `opDebug`, and also on the cursor in the case that there are `getMore`s, so that we can continue to aggregate the operation's metrics. 
Once the query execution is fully complete, @@ -65,6 +74,7 @@ the key from the store if it exists and update it, or create a new one and add i more details in the [comments](query_stats.h#L158-L216). ### Rate Limiting + Whether or not query stats will be recorded for a specific query execution depends on a Rate Limiter, which limits the number of recordings per second based on the server parameter [internalQueryStatsRateLimit](#server-parameters). The goal of the rate limiter is to minimize @@ -74,32 +84,42 @@ be updated in the query stats store. Our rate limiter uses the sliding window al [here](rate_limiting.h#82-87). ## Metric Retrieval + To retrieve the stats gathered in the `QueryStatsStore`, there is a new aggregation stage, `$queryStats`. This stage must be the first in a pipeline and it must be run against the admin database. The structure of the command is as follows (note `aggregate: 1` reflecting there is no collection): + ```js db.adminCommand({ aggregate: 1, - pipeline: [{ - $queryStats: { - tranformIdentifiers: { - algorithm: "hmac-sha-256", - hmacKey: BinData(8, "87c4082f169d3fef0eef34dc8e23458cbb457c3sf3n2") /* bindata + pipeline: [ + { + $queryStats: { + tranformIdentifiers: { + algorithm: "hmac-sha-256", + hmacKey: BinData( + 8, + "87c4082f169d3fef0eef34dc8e23458cbb457c3sf3n2", + ) /* bindata subtype 8 - a new type for sensitive data */, - } - } - }] -}) + }, + }, + }, + ], +}); ``` + `transformIdentifiers` is optional. If not present, we will generate the regular Query Stats Key. If present: -- `algorithm` is required and the only currently supported option is "hmac-sha-256". -- `hmacKey` is required -- We will generate the [One-way Tokenized](#glossary) Query Stats Key by applying the "hmac-sha-256" - to the names of any field, collection, or database. Application Name field is not transformed. + +- `algorithm` is required and the only currently supported option is "hmac-sha-256". +- `hmacKey` is required +- We will generate the [One-way Tokenized](#glossary) Query Stats Key by applying the "hmac-sha-256" + to the names of any field, collection, or database. Application Name field is not transformed. The query stats store will output one document for each query stats key, which is structured in the following way: + ```js { key: {/* Query Stats Key */}, @@ -115,63 +135,73 @@ following way: } } ``` -- `key`: Query Stats Key. -- `asOf`: UTC time when $queryStats read this entry from the store. This will not return the same - UTC time for each result. The data structure used for the store is partitioned, and each partition - will be read at a snapshot individually. You may see up to the number of partitions in unique - timestamps returned by one $queryStats cursor. -- `metrics`: the metrics collected; these may be flawed due to: - - Server restarts, which will reset metrics. - - LRU eviction, which will reset metrics. - - Rate limiting, which will skew metrics. -- `metrics.execCount`: Number of recorded observations of this query. -- `metrics.firstSeenTimestamp`: UTC time taken at query completion (including getMores) for the - first recording of this query stats store entry. -- `metrics.lastSeenTimestamp`: UTC time taken at query completion (including getMores) for the - latest recording of this query stats store entry. -- `metrics.docsReturned`: Various broken down metrics for the number of documents returned by - observation of this query. -- `metrics.firstResponseExecMicros`: Estimated time spent computing and returning the first batch. 
-- `metrics.totalExecMicros`: Estimated time spent computing and returning all batches, which is the - same as the above for single-batch queries. -- `metrics.lastExecutionMicros`: Estimated time spent processing the latest query (akin to - "totalExecMicros", not "firstResponseExecMicros"). + +- `key`: Query Stats Key. +- `asOf`: UTC time when $queryStats read this entry from the store. This will not return the same + UTC time for each result. The data structure used for the store is partitioned, and each partition + will be read at a snapshot individually. You may see up to the number of partitions in unique + timestamps returned by one $queryStats cursor. +- `metrics`: the metrics collected; these may be flawed due to: + - Server restarts, which will reset metrics. + - LRU eviction, which will reset metrics. + - Rate limiting, which will skew metrics. +- `metrics.execCount`: Number of recorded observations of this query. +- `metrics.firstSeenTimestamp`: UTC time taken at query completion (including getMores) for the + first recording of this query stats store entry. +- `metrics.lastSeenTimestamp`: UTC time taken at query completion (including getMores) for the + latest recording of this query stats store entry. +- `metrics.docsReturned`: Various broken down metrics for the number of documents returned by + observation of this query. +- `metrics.firstResponseExecMicros`: Estimated time spent computing and returning the first batch. +- `metrics.totalExecMicros`: Estimated time spent computing and returning all batches, which is the + same as the above for single-batch queries. +- `metrics.lastExecutionMicros`: Estimated time spent processing the latest query (akin to + "totalExecMicros", not "firstResponseExecMicros"). #### Permissions + `$queryStats` is restricted by two privilege actions: -- `queryStatsRead` privilege allows running `$queryStats` without passing the `transformIdentifiers` - options. -- `queryStatsReadTransformed` allows running `$queryStats` with `transformIdentifiers` set. These -two privileges are included in the clusterMonitor role in Atlas. + +- `queryStatsRead` privilege allows running `$queryStats` without passing the `transformIdentifiers` + options. +- `queryStatsReadTransformed` allows running `$queryStats` with `transformIdentifiers` set. These + two privileges are included in the clusterMonitor role in Atlas. ### Server Parameters -- `internalQueryStatsCacheSize`: - * Max query stats store size, specified as a string like "4MB" or "1%". Defaults to 1% of the - machine's total memory. - * Query stats store is a LRU cache structure with partitions, so we may be under the cap due to - implementation. -- `internalQueryStatsRateLimit`: - * The rate limit is an integer which imposes a maximum number of recordings per second. Default is - 0 which has the effect of disabling query stats collection. Setting the parameter to -1 means - there will be no rate limit. +- `internalQueryStatsCacheSize`: -- `logComponentVerbosity.queryStats`: - * Controls the logging behavior for query stats. See [Logging](#logging) for details. + - Max query stats store size, specified as a string like "4MB" or "1%". Defaults to 1% of the + machine's total memory. + - Query stats store is a LRU cache structure with partitions, so we may be under the cap due to + implementation. + +- `internalQueryStatsRateLimit`: + + - The rate limit is an integer which imposes a maximum number of recordings per second. Default is + 0 which has the effect of disabling query stats collection. 
Setting the parameter to -1 means + there will be no rate limit. + +- `logComponentVerbosity.queryStats`: + - Controls the logging behavior for query stats. See [Logging](#logging) for details. ### Logging + Setting `logComponentVerbosity.queryStats` will do the following for each level: -* Level 0 (default): Nothing will be logged. -* Level 1 or higher: Invocations of $queryStats will be logged if and only if the algorithm is - "hmac-sha-256". The specification of the $queryStats stage is logged, with any provided hmac key - redacted. -* Level 2 or higher: Nothing extra, reserved for future use. -* Level 3 or higher: All results of any "hmac-sha-256" $queryStats invocation are logged. Each - result will be its own entry and there will be one final entry that says "we finished". -* Levels 4 and 5 do nothing extra. + +- Level 0 (default): Nothing will be logged. +- Level 1 or higher: Invocations of $queryStats will be logged if and only if the algorithm is + "hmac-sha-256". The specification of the $queryStats stage is logged, with any provided hmac key + redacted. +- Level 2 or higher: Nothing extra, reserved for future use. +- Level 3 or higher: All results of any "hmac-sha-256" $queryStats invocation are logged. Each + result will be its own entry and there will be one final entry that says "we finished". +- Levels 4 and 5 do nothing extra. ### Server Status Metrics + The following will be added to the `serverStatus.metrics`: + ```js queryStats: { numEvicted: NumberLong(0), @@ -183,6 +213,7 @@ queryStats: { ``` # Glossary + **Query Execution**: This term implies the overall execution of what a client would consider one query, but which may or may not involve one or more getMore commands to iterate a cursor. For example, a find command and two getMore commands on the returned cursor is one query execution. An diff --git a/src/mongo/db/query/search/README.md b/src/mongo/db/query/search/README.md index 9c098882f39..50de76c5369 100644 --- a/src/mongo/db/query/search/README.md +++ b/src/mongo/db/query/search/README.md @@ -1,17 +1,21 @@ # Search + This document is a work-in-progress and just provides a high-level overview of the search implementation. [Atlas Search](https://www.mongodb.com/docs/atlas/atlas-search/) provides integrated full-text search by running queries with the $search and $searchMeta aggregation stages. You can read about the $vectorSearch aggregation stage in [vector_search](https://github.com/mongodb/mongo/blob/master/src/mongo/db/query/vector_search/README.md). ## Lucene + Diving into the mechanics of search requires a brief rundown of [Apache Lucene](https://lucene.apache.org/) because it is the bedrock of MongoDB's search capabilities. MongoDB employees can read more about Lucene and mongot at [go/mongot](http://go/mongot). Apache Lucene is an open-source text search library, written in Java. Lucene allows users to store data in three primary ways: -* inverted index: maps each term (in a set of documents) to the documents in which the term appears, in which terms are the unique words/phrases and documents are the pieces of content being indexed. Inverted indexes offer great performance for matching search terms with documents. -* storedFields: stores all field values for one document together in a row-stride fashion. In retrieval, all field values are returned at once per document, so that loading the relevant information about a document is very fast. This is very useful for search features that are improved by row-oriented data access, like search highlighting. 
Search highlighting marks up the search terms and displays them within the best/most relevant sections of a document. -* DocValues: column-oriented fields with a document-to-value mapping built at index time. As it facilitates column based data access, it's faster for aggregating field values for counts and facets. + +- inverted index: maps each term (in a set of documents) to the documents in which the term appears, in which terms are the unique words/phrases and documents are the pieces of content being indexed. Inverted indexes offer great performance for matching search terms with documents. +- storedFields: stores all field values for one document together in a row-stride fashion. In retrieval, all field values are returned at once per document, so that loading the relevant information about a document is very fast. This is very useful for search features that are improved by row-oriented data access, like search highlighting. Search highlighting marks up the search terms and displays them within the best/most relevant sections of a document. +- DocValues: column-oriented fields with a document-to-value mapping built at index time. As it facilitates column based data access, it's faster for aggregating field values for counts and facets. ## `mongot` + `mongot` is a MongoDB-specific process written as a wrapper around Lucene and run on Atlas. Using Lucene, `mongot` indexes MongoDB databases to provide our customers with full text search capabilities. In the current “coupled” search architecture, one `mongot` runs alongside each `mongod` or `mongos`. Each `mongod`/`mongos` and `mongot` pair are on the same physical box/server and communicate via localhost. @@ -19,11 +23,13 @@ In the current “coupled” search architecture, one `mongot` runs alongside ea `mongot` replicates the data from its collocated `mongod` node using change streams and builds Lucene indexes on that replicated data. `mongot` is guaranteed to be eventually consistent with mongod. Check out [mongot_cursor](https://github.com/mongodb/mongo/blob/master/src/mongo/db/query/search/mongot_cursor.h) for the core shared code that establishes and executes communication between `mongod` and `mongot`. ## Search Indexes + In order to run search queries, the user has to create a search index. Search index commands similarly use `mongod`/`mongos` server communication protocols to communicate with a remote search index server, but with an Envoy instance that handles forwarding the command requests to Atlas servers and then eventually to the relevant Lucene/`mongot` instances. `mongot` and Envoy instances are co-located with every `mongod` server instance, and Envoy instances are co-located with `mongos` servers as well. The precise structure of the search index architecture will likely evolve in future as improvements are made to that system. Search indexes can be: -* Only on specified fields ("static") -* All fields (“dynamic”) + +- Only on specified fields ("static") +- All fields (“dynamic”) `mongot` stores the indexed data exclusively, unless the customer has opted into storing entire documents (more expensive). @@ -34,6 +40,7 @@ The four commands have security authorization action types corresponding with th Note: Indexes can also be managed through the Atlas UI. ## $search and $searchMeta stages + There are two text search stages in the aggregation framework (and $search is not available for find commands). 
[$search](https://www.mongodb.com/docs/atlas/atlas-search/query-syntax/#-search) returns the results of full-text search, and [$searchMeta](https://www.mongodb.com/docs/atlas/atlas-search/query-syntax/#-searchmeta) returns metadata about search results. When used for an aggregation, either search stage must be the first stage in the pipeline. For example: ``` @@ -44,16 +51,18 @@ db.coll.aggregate([ ]); ``` -$search and $searchMeta are parsed as [DocumentSourceSearch](https://github.com/mongodb/mongo/blob/master/src/mongo/db/pipeline/search/document_source_search.h) and [DocumentSourceSearchMeta](https://github.com/mongodb/mongo/blob/master/src/mongo/db/pipeline/search/document_source_search_meta.h), respectively. When using the classic engine, however, DocumentSourceSearch is [desugared](https://github.com/mongodb/mongo/blob/04f19bb61aba10577658947095020f00ac1403c4/src/mongo/db/pipeline/search/document_source_search.cpp#L118) into a sequence that uses the [$_internalSearchMongotRemote stage](https://github.com/mongodb/mongo/blob/master/src/mongo/db/pipeline/search/document_source_internal_search_mongot_remote.h) and, if the `returnStoredSource` option is false, the [$_internalSearchIdLookup stage](https://github.com/mongodb/mongo/blob/master/src/mongo/db/pipeline/search/document_source_internal_search_id_lookup.h). In SBE, both $search and $searchMeta are lowered directly from the original document sources. +$search and $searchMeta are parsed as [DocumentSourceSearch](https://github.com/mongodb/mongo/blob/master/src/mongo/db/pipeline/search/document_source_search.h) and [DocumentSourceSearchMeta](https://github.com/mongodb/mongo/blob/master/src/mongo/db/pipeline/search/document_source_search_meta.h), respectively. When using the classic engine, however, DocumentSourceSearch is [desugared](https://github.com/mongodb/mongo/blob/04f19bb61aba10577658947095020f00ac1403c4/src/mongo/db/pipeline/search/document_source_search.cpp#L118) into a sequence that uses the [$\_internalSearchMongotRemote stage](https://github.com/mongodb/mongo/blob/master/src/mongo/db/pipeline/search/document_source_internal_search_mongot_remote.h) and, if the `returnStoredSource` option is false, the [$\_internalSearchIdLookup stage](https://github.com/mongodb/mongo/blob/master/src/mongo/db/pipeline/search/document_source_internal_search_id_lookup.h). In SBE, both $search and $searchMeta are lowered directly from the original document sources. -For example, the stage `{$search: {query: “chocolate”, path: “flavor”}, returnStoredSource: false}` will desugar into the two stages: `{$_internalSearchMongotRemote: {query: “chocolate”, path: “flavor”}, returnStoredSource: false}` and `{$_internalSearchIdLookup: {}}`. +For example, the stage `{$search: {query: “chocolate”, path: “flavor”}, returnStoredSource: false}` will desugar into the two stages: `{$_internalSearchMongotRemote: {query: “chocolate”, path: “flavor”}, returnStoredSource: false}` and `{$_internalSearchIdLookup: {}}`. -### $_internalSearchMongotRemote -$_internalSearchMongotRemote is the foundational stage for all search queries, e.g., $search and $searchMeta. 
This stage opens a cursor on `mongot` ([here](https://github.com/mongodb/mongo/blob/e530c98e7d44878ed8164ee9167c28afc97067a7/src/mongo/db/pipeline/search/document_source_internal_search_mongot_remote.cpp#L269)) and retrieves results one-at-a-time from the cursor ([here](https://github.com/mongodb/mongo/blob/e530c98e7d44878ed8164ee9167c28afc97067a7/src/mongo/db/pipeline/search/document_source_internal_search_mongot_remote.cpp#L163)). +### $\_internalSearchMongotRemote + +$\_internalSearchMongotRemote is the foundational stage for all search queries, e.g., $search and $searchMeta. This stage opens a cursor on `mongot` ([here](https://github.com/mongodb/mongo/blob/e530c98e7d44878ed8164ee9167c28afc97067a7/src/mongo/db/pipeline/search/document_source_internal_search_mongot_remote.cpp#L269)) and retrieves results one-at-a-time from the cursor ([here](https://github.com/mongodb/mongo/blob/e530c98e7d44878ed8164ee9167c28afc97067a7/src/mongo/db/pipeline/search/document_source_internal_search_mongot_remote.cpp#L163)). Within this stage, the underlying [TaskExecutorCursor](https://github.com/mongodb/mongo/blob/e530c98e7d44878ed8164ee9167c28afc97067a7/src/mongo/executor/task_executor_cursor.h) acts as a black box to handle dispatching commands to `mongot` only as necessary. The cursor retrieves a batch of results from `mongot`, iterates through that batch per each `getNext` call, then schedules a `getMore` request to `mongot` whenever the previous batch is exhausted. -Each batch returned from mongot includes a batch of BSON documents and metadata about the query results. Each document contains an _id and a relevancy score. The relevancy score indicates how well the document’s indexed values matched the user query. Metadata is a user-specified group of fields with information about the result set as a whole, mostly including counts of various groups (or facets). +Each batch returned from mongot includes a batch of BSON documents and metadata about the query results. Each document contains an \_id and a relevancy score. The relevancy score indicates how well the document’s indexed values matched the user query. Metadata is a user-specified group of fields with information about the result set as a whole, mostly including counts of various groups (or facets). -### $_internalSearchIdLookup -The $_internalSearchIdLookup stage is responsible for recreating the entire document to give to the rest of the agg pipeline (in the above example, $match and $project) and for checking to make sure the data returned is up to date with the data on `mongod`, since `mongot`’s indexed data is eventually consistent with `mongod`. For example, if `mongot` returned the _id to a document that had been deleted, $_internalSearchIdLookup is responsible for catching; it won’t find a document matching that _id and then filters out that document. The stage will also perform shard filtering, where it ensures there are no duplicates from separate shards, and it will retrieve the most up-to-date field values. However, this stage doesn’t account for documents that had been inserted to the collection but not yet propagated to `mongot` via the $changeStream; that’s why search queries are eventually consistent but don’t guarantee strong consistency. 
\ No newline at end of file +### $\_internalSearchIdLookup + +The $\_internalSearchIdLookup stage is responsible for recreating the entire document to give to the rest of the agg pipeline (in the above example, $match and $project) and for checking to make sure the data returned is up to date with the data on `mongod`, since `mongot`’s indexed data is eventually consistent with `mongod`. For example, if `mongot` returned the \_id to a document that had been deleted, $\_internalSearchIdLookup is responsible for catching; it won’t find a document matching that \_id and then filters out that document. The stage will also perform shard filtering, where it ensures there are no duplicates from separate shards, and it will retrieve the most up-to-date field values. However, this stage doesn’t account for documents that had been inserted to the collection but not yet propagated to `mongot` via the $changeStream; that’s why search queries are eventually consistent but don’t guarantee strong consistency. diff --git a/src/mongo/db/query/timeseries/README.md b/src/mongo/db/query/timeseries/README.md index 9d501fae996..72ec1b1794f 100644 --- a/src/mongo/db/query/timeseries/README.md +++ b/src/mongo/db/query/timeseries/README.md @@ -63,9 +63,9 @@ The `timeField` will be used in the `control` object in the buckets collection. will be `control.min.t`, and `control.max.