SERVER-109671 Document mongobridge (#43402)

GitOrigin-RevId: 1e4f97c95e8fb8418dc8921f5e9302f9001f1fe0
# Network Fault Injection Framework (mongobridge)

## Overview

[Mongobridge](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L1) is a network fault injection tool for tests. It lets test authors deliberately simulate network problems in the traffic sent to any node in a cluster, such as refused connections, delayed messages, or lost messages. It acts as a transparent proxy between a MongoDB process and its clients. This lets tests inject faults in a controlled way and observe how the distributed system behaves.
## How It Works

When `ReplSetTest` or `ShardingTest` is configured to use mongobridge, it [starts a mongobridge process](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/jstests/libs/replsettest.js#L2962) for each node. Each bridge [creates a ProxiedConnection](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L323-L324) between its node and every client that connects to it, including the other nodes in the cluster. All traffic to the node flows through the bridge. This lets mongobridge [intercept each message and apply any configured action](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/src/mongo/tools/mongobridge_tool/bridge.cpp#L395-L430), such as a delay or a discard, before forwarding the message to the node. For the test author, injecting a fault then comes down to calling a bridge command on a node's connection.
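The sketch below makes the per-message decision concrete with a deliberately simplified model in plain JavaScript. It runs under Node.js, not the mongo shell. `BridgeModel`, its fields, and the action names are hypothetical and are not mongobridge's real implementation, which is C++ in `bridge.cpp`. The model keys settings by source host. As an assumption, it tracks the reject state and the delay independently. Discard is omitted for brevity.

```javascript
// Simplified, hypothetical model of a bridge's per-message decision.
// mongobridge itself is written in C++ (bridge.cpp); this is not its code.
class BridgeModel {
    constructor() {
        // Settings keyed by source host (e.g. "localhost:20001").
        // A host with no entry gets the default: forward with no delay.
        this.settings = new Map();
    }

    // Returns the settings for `host`, creating default settings if needed.
    settingsFor(host) {
        if (!this.settings.has(host)) {
            this.settings.set(host, {state: "forward", delayMs: 0});
        }
        return this.settings.get(host);
    }

    acceptConnectionsFrom(host) {
        this.settingsFor(host).state = "forward";
    }

    rejectConnectionsFrom(host) {
        this.settingsFor(host).state = "reject";
    }

    delayMessagesFrom(host, delayMs) {
        // Assumption of this model: the delay is tracked separately from the
        // accept/reject state, so each can be changed independently.
        this.settingsFor(host).delayMs = delayMs;
    }

    // Decides what happens to a single message that arrives from `sourceHost`.
    handle(sourceHost) {
        const s = this.settings.get(sourceHost) ?? {state: "forward", delayMs: 0};
        if (s.state === "reject") {
            // The connection is closed instead of forwarding the message.
            return {action: "close"};
        }
        return {action: "forward", delayMs: s.delayMs};
    }
}
```

In this model, a message from a host that was never configured is forwarded immediately, which matches the documented default of normal forwarding.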
## Quick Start

To use mongobridge in your tests:

1. **Enable mongobridge** in your test setup:

   ```javascript
   let st = new ShardingTest({
       shards: {rs0: {nodes: 2}},
       mongos: 1,
       config: 1,
       useBridge: true, // Enable mongobridge
   });
   ```

   - **Test commands must be enabled**: mongobridge's `*From` commands require `enableTestCommands: true`, which is the default in test environments.

2. **Inject network faults** using bridge commands:

   ```javascript
   // Delay messages from the secondary to the primary by 5 seconds
   st.rs0.getPrimary().delayMessagesFrom(st.rs0.getSecondary(), 5000);

   // Reject all connections from the secondary to the primary
   st.rs0.getPrimary().rejectConnectionsFrom(st.rs0.getSecondary());

   // Restore normal behavior: accept connections again and remove the delay
   st.rs0.getPrimary().acceptConnectionsFrom(st.rs0.getSecondary());
   st.rs0.getPrimary().delayMessagesFrom(st.rs0.getSecondary(), 0);
   ```

3. Operations that depend on communication between the affected nodes will fail or time out as expected.
## What to Keep in Mind

Injecting a network fault between nodes has downstream effects, for example on heartbeats, sync source selection, and SDAM (Server Discovery and Monitoring). After a fault has been injected, the cluster may not be in the state you expect for subsequent commands. Keep mongobridge tests short and targeted so that flakiness caused by these faults does not affect the rest of your testing.
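One way to keep a fault contained is to lift it in a `finally` block. That way, a failing assertion does not leave the bridge misconfigured for the rest of the test. The helper below, `withNetworkFault`, is hypothetical and is not part of the mongo shell. It uses only plain JavaScript, so it should work in the shell as well as in Node.js. It only guarantees that the fault is lifted. It does not undo state the fault already changed, such as which node is primary.

```javascript
// Hypothetical helper, not part of the mongo shell: injects a fault, runs the
// test body, and always lifts the fault afterwards, even if the body throws.
// If `injectFault` itself throws, neither the body nor `removeFault` runs.
function withNetworkFault(injectFault, removeFault, testBody) {
    injectFault();
    try {
        return testBody();
    } finally {
        removeFault();
    }
}

// Example use in a jstest (`primary` and `secondary` are bridged connections):
// withNetworkFault(
//     () => primary.delayMessagesFrom(secondary, 5000),
//     () => primary.delayMessagesFrom(secondary, 0),
//     () => {
//         // assertions that expect slow communication from `secondary`
//     },
// );
```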
## Command Reference

Mongobridge supports four commands for network fault injection. Each command is called on the connection to the node whose incoming traffic it should affect. `bridges` is the node, or array of nodes, whose messages the command applies to.

### `acceptConnectionsFrom(bridges)`

**Purpose**: Allows normal communication from the specified sources

**Usage**:

```javascript
node.acceptConnectionsFrom(otherNode);
node.acceptConnectionsFrom([node1, node2, node3]); // Multiple nodes
```

**Effect**: Restores normal message forwarding (the default state)
### `rejectConnectionsFrom(bridges)`

**Purpose**: Closes connections from the specified sources instead of forwarding their messages

**Usage**:

```javascript
node.rejectConnectionsFrom(otherNode);
```

**Effect**: New connections are rejected, and existing connections are closed the next time a request is sent over them

**Use case**: Simulating complete network partitions
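Each bridge filters only the traffic arriving at its own node. So calling `rejectConnectionsFrom` on one node does not stop that node from sending messages to the others. A two-way partition therefore needs the command on both sides. The sketch below does this with two hypothetical helpers, `partition` and `heal`, which are not part of the mongo shell:

```javascript
// Hypothetical helpers, not part of the mongo shell. `groupA` and `groupB` are
// arrays of bridged node connections. Each bridge filters only traffic arriving
// at its own node, so a two-way partition configures the bridges on both sides.
function partition(groupA, groupB) {
    for (const node of groupA) {
        node.rejectConnectionsFrom(groupB);
    }
    for (const node of groupB) {
        node.rejectConnectionsFrom(groupA);
    }
}

// Undoes partition() by accepting connections in both directions again.
function heal(groupA, groupB) {
    for (const node of groupA) {
        node.acceptConnectionsFrom(groupB);
    }
    for (const node of groupB) {
        node.acceptConnectionsFrom(groupA);
    }
}
```

For example, `partition([primary], secondaries)` would isolate the primary in both directions, and a later `heal([primary], secondaries)` would reconnect it.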
### `delayMessagesFrom(bridges, delayMs)`

**Purpose**: Delays message forwarding by the specified number of milliseconds

**Usage**:

```javascript
node.delayMessagesFrom(otherNode, 5000); // 5 second delay
node.delayMessagesFrom(otherNode, 0); // Remove delay
```

**Parameters**:

- `delayMs`: Delay in milliseconds (0 to disable)

**Use case**: Simulating slow networks or testing timeout behavior
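For timeout testing, the injected delay has to exceed the timeout on the waiting side. The write concern example later in this document relies on this: a 10,000 ms delay against a 5,000 ms `wtimeout`. The sketch below models that race in plain JavaScript for Node.js. `sendThroughDelayedBridge` and `callWithTimeout` are hypothetical names. A real bridge delays messages inside its C++ proxy rather than with `setTimeout`. Leaving a clear margin between the two values is a reasonable precaution against scheduling jitter making the outcome flaky.

```javascript
// Hypothetical model in plain JavaScript (Node.js): a delayed bridge raced
// against a caller-side timeout. Names are illustrative, not mongobridge APIs.
function sleep(ms) {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

// The bridge holds the message for `delayMs` before handing it to the node.
async function sendThroughDelayedBridge(handler, message, delayMs) {
    await sleep(delayMs);
    return handler(message);
}

// Resolves with the result of `promise`, or rejects if `timeoutMs` elapses first.
async function callWithTimeout(promise, timeoutMs) {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs} ms`)), timeoutMs);
    });
    try {
        return await Promise.race([promise, timeout]);
    } finally {
        clearTimeout(timer);
    }
}
```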
### `discardMessagesFrom(bridges, lossProbability)`

**Purpose**: Randomly discards messages with the specified probability

**Usage**:

```javascript
node.discardMessagesFrom(otherNode, 0.5); // Drop roughly 50% of messages
node.discardMessagesFrom(otherNode, 1.0); // Drop all messages
node.discardMessagesFrom(otherNode, 0.0); // Drop no messages
```

**Parameters**:

- `lossProbability`: Number between 0.0 (no loss) and 1.0 (total loss)

**Use case**: Simulating unreliable networks or packet loss
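The sketch below shows a per-message discard decision, assuming each message is discarded independently using a uniform random draw. The `shouldDiscard` helper is hypothetical, not mongobridge's actual code, and its range check is an assumption of this sketch. It also shows why 0.0 and 1.0 behave as "no loss" and "total loss". The random source is injectable so the decision can be tested deterministically.

```javascript
// Hypothetical sketch of a per-message discard decision (not mongobridge's code).
// `random` defaults to Math.random, which returns a value in [0, 1).
function shouldDiscard(lossProbability, random = Math.random) {
    if (typeof lossProbability !== "number" || !(lossProbability >= 0 && lossProbability <= 1)) {
        throw new RangeError(`lossProbability must be in [0.0, 1.0], got ${lossProbability}`);
    }
    // For a uniform draw in [0, 1), `random() < p` is true with probability p:
    // p = 0 never discards, and p = 1 always discards.
    return random() < lossProbability;
}
```

Because each message is an independent trial, a probability of 0.5 drops about half of the messages over many messages, but any particular run can drop more or fewer.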
## Examples

### Basic Network Partition Test

```javascript
assert.eq(jsTest.options().enableTestCommands, true);

// Set up a replica set with mongobridge
let rst = new ReplSetTest({
    nodes: 3,
    useBridge: true,
    settings: {electionTimeoutMillis: 2000, heartbeatIntervalMillis: 400},
});
rst.startSet();
rst.initiate();

// Partition the primary from the secondaries
let primary = rst.getPrimary();
let secondaries = rst.getSecondaries();
primary.rejectConnectionsFrom(secondaries);

// Wait until a different node becomes primary
assert.soon(() => {
    return rst.getPrimary() !== primary;
});

// Restore network
primary.acceptConnectionsFrom(secondaries);

rst.stopSet();
```
### Write Concern Timeout Test

```javascript
assert.eq(jsTest.options().enableTestCommands, true);

let st = new ShardingTest({
    shards: {rs0: {nodes: 2}},
    useBridge: true,
});

// Delay replication to cause write concern timeout
st.rs0.getPrimary().delayMessagesFrom(st.rs0.getSecondary(), 10000);

// This write should fail due to timeout
assert.commandFailed(
    st.s0.getCollection("test.coll").insert(
        {x: 1},
        {
            writeConcern: {w: 2, wtimeout: 5000},
        },
    ),
);

// Restore normal replication
st.rs0.getPrimary().delayMessagesFrom(st.rs0.getSecondary(), 0);

st.stop();
```
### Simulating Packet Loss

```javascript
// Assumes `rst` is a ReplSetTest started with useBridge: true
let primary = rst.getPrimary();
let secondary = rst.getSecondary();

// Set up an unreliable network with 30% message loss
primary.discardMessagesFrom(secondary, 0.3);

// Operations may succeed or fail unpredictably; useful for testing retry logic and resilience

// Restore normal behavior
primary.discardMessagesFrom(secondary, 0.0);
```
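When lost messages make individual operations fail intermittently, a test of retry behavior can wrap the operation in a bounded retry loop. The `retryOperation` helper below is a hypothetical sketch in plain JavaScript, not an existing shell helper. It only helps when a lost message surfaces as an error, for example a timeout, rather than as an operation that never returns. It retries immediately without backoff, which keeps the example short but may be too aggressive for real tests.

```javascript
// Hypothetical helper: calls `operation` up to `maxAttempts` times and returns
// the first successful result; rethrows the last error if every attempt fails.
function retryOperation(operation, maxAttempts) {
    if (!Number.isInteger(maxAttempts) || maxAttempts < 1) {
        throw new RangeError("maxAttempts must be a positive integer");
    }
    let lastError;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            // The attempt number is passed along in case the caller wants to log it.
            return operation(attempt);
        } catch (e) {
            lastError = e;
        }
    }
    throw lastError;
}
```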
## Limitations

- **OP_QUERY exhaust**: Not supported for legacy exhaust queries (OP_MSG exhaust cursors are supported)
- **Direct connections**: Faults apply only to traffic that goes through the bridge proxy; connections made directly to a node bypass it
- **TLS support**: Mongobridge is not supported if the cluster is using TLS
## See Also

- [mongobridge.js test example](https://github.com/mongodb/mongo/blob/e810af1916caaedb1cde8d1e1b74bb50b2461daf/jstests/noPassthrough/mongobridge/mongobridge.js#L1)