mongo/docs/cpp_style.md

1105 lines
43 KiB
Markdown

# MongoDB Server C++ Style Guide
This document describes common conventions used in the MongoDB server codebase.
The document is about C++, but there are a few places where JavaScript style is
discussed as well.
A firmly established style guide can make source files unsurprising as they are
more easily navigable and regular in shape.
Style rules can eliminate wasted time on minor issues in code reviews. An author
should endeavor to be style-compliant before sending a pull request for review.
This should accelerate code reviews and establish consistent expectations on code.
The guide is carefully considered by very experienced C++ engineers. C++ code
can be complex, and there are subtle correctness and maintainability risks that
can arise from certain antipatterns addressed by the guide. Style adherence
enables code authors and their reviewers to productively write safer code
without having to first rediscover those problems for themselves.
## Feedback (MongoDB internal)
This is maintained by the Server Programmability team.
- Use `#server-programmability` on Slack for discussion and clarifications.
Contributors outside of MongoDB can use Jira instead.
- For change proposals, please feel free to add entries to the
MongoDB C++ Style Guide Proposals document pinned to that channel.
- Jira and PRs are fine for small fixes unrelated to C++ style, such as
typos, formatting, phrasing, and comments.
## Style
## Names of Identifiers
There's some truth in the old joke that naming is the hardest problem in
programming. It's impossible to write catch-all rules for naming, but we can set
guidelines with the intention of avoiding friction in reviews and having some
expectation of general consistency across our codebase.
- Types use `TitleCase`. First letter of each word is uppercase. Following
letters are lowercase.
- Functions and variables use `camelCase`. First letter of each word after the
first is uppercase. The first letter of each word, except the first, is
uppercase.
- Namespaces use `snake_case`. No uppercase letters, and words separated by underscores.
(See "[Namespaces](#namespaces)" section below).
- Spelling: Take care to avoid misspellings in names.
This is more than aesthetic. It is easier on readers.
Misspelled names can harm confidence in code quality.
Misspelled names might be skipped by code searches.
Our convention is to use US English spelling.
- Identifier names should be short but clear. Long sentence-like names
become a laborious comparison exercise for readers, and can form a "wall of
text" that can bury significant C++ keywords and operators. Local variable names
can be particularly brief without causing confusion, provided that the enclosing
functions remain compact and focused.
- Repetition and redundancy in names should be avoided. A function name doesn't
need to restate the types of its arguments, for example. The arguments can
usually speak for themselves, but explicit disambiguation may be desirable in
some cases.
- Word abbreviations should be used carefully. When used, they should be applied
very consistently and documented well. This keeps users from having to guess
which words are abbreviated and which are not.
- Private members are usually named with a leading underscore (e.g. `_detail`).
This applies to data members more consistently than to functions. Identifiers
with a leading underscore followed by an uppercase letter are reserved by
C++, and must not be used. Therefore, the leading `_` should not be used with
private types and typedefs. Double underscores `__` must be avoided as well.
See [article](https://devblogs.microsoft.com/oldnewthing/20230109-00/?p=107685).
### Constants
Constants are either ordinary variables `varName` or with a `k` as a prefix
word, like `kVarName`. You'll see both in the codebase and either is acceptable.
You may also find some older code using `MACRO_STYLE` for constants.
That should not be used in new code outside of macros.
### Test Access
Some entities are defined in an API purely to facilitate test access and
testability. We conventionally tack a `_forTest` suffix (or a `ForTest` suffix
for types) onto its name as an indicator that it should not be used by non-test
code.
## Class Definitions
While class and struct are largely equivalent in C++, this codebase uses a
convention where structs are used for simple collections of data
(possibly with methods), while classes are used for new abstractions. As a rule,
all data in a struct should be public and all data in a class should be private.
If you are unsure which to use, consider whether there are any invariants that
need to be upheld, either within or between members. If there are not, then a
struct may be appropriate.
If a type is a struct or struct-like class, then consider omitting all
constructors and letting it be a [C++ aggregate](https://en.cppreference.com/w/cpp/language/aggregate_initialization), which allows some flexibility
in initialization syntax.
If a type has invariant-preserving constructors, special behaviors, and internal
private details, it's not a `struct`. It's subjective, but structs should be a
mostly straightforward aggregation of data members.
Consider a somewhat canonical example of a `Date`, consisting of `year`,
`month`, `dayOfMonth`. The valid range of a `dayOfMonth` depends on `year` and
`month`, so this type either has an invariant, or it has to be allowed to be in
an invalid state. If the invariants of this type are enforced by the type's
constructors and setters, then it should be a `class`.
It's possible to leave such a `Date` type as a `struct` and enforce these
invariants from the outside through careful discipline among its users. This is
what C APIs have to do. We should prefer using data encapsulation and
`class` for such complex objects.
### Order of Class Members
Within a class or struct definition, try to stick to this ordering by default. A
consistent convention makes it easier for a reader to quickly understand and
navigate a class declaration.
Group public API at the top, and details at the bottom.
- `public`
- `protected`
- `private`
Within each of these visibility sections, there's a preferred order of declarations.
- Attributes of the class come first:
- Types and type aliases, including declarations and enums
- Static constants and static data members
- Static functions
- Then declarations that are relevant to each instance of the class:
- Constructors
- Destructor
- Copy and assignment operators
- Member functions
- Data members
As always, technical concerns override style, and this order sometimes cannot be
exactly followed for technical reasons, but it should be the predominant
weakly-binding preference when laying out a class in the absence of motivation
to diverge from it.
Private data members have a leading underscore followed by a camel case name like `_fooBarBaz`.
Protected members may or may not have a leading underscore, depending on how
logically internal they are. This convention doesn't apply to types.
### Naming of Class Members
```c++
class Foo {
public:
// This is just for demonstration purposes. Classes/structs should rarely
// have a mix of public and private data members.
int publicMember;
protected:
// We've never had a convention about protected members. Both are
// widespread, so either is okay. It depends on how "private" the variable
// is to the derived classes.
int x;
int _y;
private:
int _privateMember;
};
```
### User-facing Names That Include Units (not strictly a C++ issue)
This section applies to names that users can see, like BSON field names or
server parameters, but not necessarily to C++ identifiers.
In things like `serverStatus`, include the units in the field name if there is
any chance of ambiguity. For example, `writtenMB` or `timeMs`.
- For bytes: use `MB` and show in megabytes unless you know it will be tiny.
Note you can use a float so `0.1MB` is fine to show.
- Durations:
- Use milliseconds by default.
Prefer the suffix `Millis`, but be aware that `Ms` is also used.
- Use `Secs` and a floating point number for times that are
expected to be very long.
- For microseconds, use `Micros` as the suffix (e.g., `timeMicros`).
## Documentation
- API docs should appear directly above the thing being documented and use `/**` or `///` style comments.
- If it fits, a comment can be to the right of a variable with `///< doc`.
(See [Doxygen syntax](https://www.doxygen.nl/manual/docblocks.html#memberdoc)).
The `<` is important, as it tells tooling such as clangd to bind backwards to the preceding
decl rather than the following one.
- We don't run Doxygen or recommend other Doxygen markup, this style of comment
delimiter distinguishes API docs from other comments.
- Use complete, grammatical sentences for API docs. Reviewers should pay attention
to the clarity of documentation as it would appear to a reasonably-experienced
server engineer who may not be a domain expert on the code.
- Avoid overly conversational tone, unnecessary personal references (like "I",
or "Pat"), slang, or jargon. Comments should strive for professionalism, but
without rigid formality.
- Comment syntax
```c++
stdx::thread _thread; ///< Empty until init is called.
/** Single line doc. */
void easyFunction(int x, int y);
/**
* Multi line doc. Spans multiple lines.
* The top and bottom lines of this comment block are blank.
*/
void complexFunction(int x, int y) {
// Interior implementation details use line comments like this.
return someFunc(x + y);
}
```
- Give the right amount of information. Make some attempt to give the gist of
complex processes. Avoid being unnecessarily vague to avoid explanation
that would be helpful to the consumer of the API. Conversely, try to avoid
going too much into implementation details in doc-comments (or at least
clearly state when doing so using words like "currently") unless those details
are part of the API that consumers should rely on.
- Comments should be descriptive rather than imperative, e.g.
"Frobnicates the widget", not "Frobnicate the widget". The subject of the
initial sentence is assumed to be the thing being documented and should
generally be omitted, e.g. don't say "This function frobnicates the widget".
```c++
/** Calculates the sum. (GOOD: descriptive verb) */
/** Calculate the sum. (BAD: imperative verb) */
```
There's no need to be very formal about their formatting or use elaborate
Doxygen/Javadoc etc tags. A smattering of text-like markdown is good. Some IDE
features or other tooling might pick up on it, but it shouldn't interfere with the
primary use case of viewing the comments as text while browsing a header file.
Reader attention is a precious resource, so try to write concise comments, and
obvious things need not get a comment. Comments should be adding information.
Do not restate the name and signature, unless there is a subtle detail that
should be highlighted.
Assume the reader knows the language. Special member functions like the copy
constructor do not need comments saying what they are. `operator==` should only
get a comment if there is something interesting about it like omitting a member,
or being order-sensitive.
Most classes and functions should default to having at least a 1-liner comment,
but sometimes context and good naming can make even that a redundant formality
to be omitted. While this is a subjective decision, remember that later readers
will need more hints than the original implementers.
```c++
/**
* If the current command does not override Foo, then it comes from a system-wide default
* value set by the "foo" server parameter. (GOOD: nonobvious).
*/
Foo getFoo() const;
/** Gets the bar (BAD: obvious, no info). */
const Bar& getBar() const;
```
### TODOs
To cite a ticket as a TODO in the code, use this format, with a short reason for
the link. A Jira bot will create reminders when the cited target ticket is
resolved. The target of the TODO cannot be the current ticket. Suppose
SERVER-12345 was a ticket to fix the frobber, and we're documenting some
workaround code:
```c++
// TODO(SERVER-12345): Remove this code when the frobber works again.
```
In comments, a function may be referred to using just its name `foo`, or by `foo()`,
or `foo(int,int)`, depending on context and whether the other forms are ambiguous.
## C++ Code
Much of the guide has been about cosmetics like layout and formatting, comments, and naming
conventions. This section presents more substantial technical issues.
### Minimal Syntax
If a keyword or operator is a "noise" word with no technical benefit, omit it.
The philosophy here is that it's better to write the code as plainly as
possible. Code should not look like it's doing something special when it isn't.
Some examples of "noise" syntax:
- Redundantly marking members and bases as `public`, `protected` or `private`,
etc when they already are.
- Marking a function decl to be `extern` (they're already extern).
- Using `virtual` on a function that's already `override` or `final` (see
"[Overriding Virtuals](#overriding-virtuals)").
### Constructors
Constructors that can be called with single arguments should be `explicit`,
unless implicit conversion is desired, in which case use `explicit(false)` to
explicitly show that intent.
Non-unary constructors should NOT be `explicit` unless it is important to
disable bare braced initialization. If a constructor takes a variable number of arguments
such that it is possibly unary, make it `explicit`.
### `= default`
Prefer `= default;` when needed over defining an empty or trivial function body `{}`.
But where possible, it is usually better to omit the declarations for lifetime methods
entirely and let the compiler declare them implicitly.
Consider that for some classes it may be useful to declare a function normally
in a `.h` file and provide `= default;` as the implementation in a `.cpp` file.
### Noexcept
The `noexcept` feature is easy to overuse. Do not use it solely as "documentation"
since it affects runtime behavior. It's a large topic, covered in the [Exception
Architecture](https://github.com/mongodb/mongo/blob/master/docs/exception_architecture.md#using-noexcept)
document.
### Overriding Virtuals
Use `override` wherever it can be used. Tighten this to `final` when necessary,
and where further overrides would introduce opportunities to break base class
guarantees.
Each declaration should have at most one `virtual`, `override`, or `final`.
Like many style rules, there are rare technical situations to bend this rule. In
this case it can be used to force compilation errors on unintentional hiding.
If a class is known to be a leaf in a hierarchy of polymorphic types, annotating
the class with `final` can be a useful optimization to enable its `virtual`
functions to be devirtualized in some contexts.
### Rules For `.h` Files
- Use `#pragma once` as an include guard, as the first line after the copyright notice.
- No unnamed namespaces in headers at all.
(See the "Namespaces" section below).
- Use `inline` or `extern` on namespace-scope variables in headers, so that each
translation unit does not get its own copy. Note that `inline` variables
provide some init order guarantees which may add a small startup cost, so
define them as `constexpr` or `constinit` if possible.
- Keep complex code out of headers. If a function is not performance sensitive, and it
is longer than a few lines, put it in the corresponding .cpp file. This practice
should help to reduce the number of include statements needed in headers,
which is good for modularity and for compilation speed. That said, simple
getters and setters should generally be inline.
### Rules For `.cpp` Files
Entities with "external linkage" are usable from outside the .cpp file where
they are defined. It's the default linkage for functions, variables, and types
defined at namespace scope, making this unintentional exporting a common error
in C++.
Export with intent. Avoid defining anything with external linkage unless it's
declared in the header. We don't want to have surprising link-time name
collisions or other multi-definition problems as the codebase evolves.
When code has no more callers, it can be readily identified as dead code if it has
internal linkage.
Use either unnamed namespaces or `static` to make definitions with "internal
linkage". These are private to the .cpp file in which they appear.
(See "[Linkage](https://en.cppreference.com/w/cpp/language/storage_duration#Linkage)").
### API Conventions
#### Integer Ranks
We don't typically use the `long` or `long long` integer ranks, except in the
BSON API or when interfacing with third_party or system APIs. In particular, we
should never use plain `long` directly unless required by some outside API since
it is 32 bits on some of our supported platforms. We use `int`, `size_t`, and
the explicit width typedefs `int32_t`, `uint32_t`, `int64_t`, `uint64_t`, etc.
Prefer `size_t` for string/array/container/sequence sizes and indexes, since
that's what C++ does.
#### `const`
- Our code uses "west const" (`const X x;`) rather than "east const" (`X const x;`).
- `const` is not required on local variables.
- Making `const` data members of a movable class can lead to problems with
move and assign operations, and is usually not necessary. On the other hand,
it can be useful for types that are never moved or copied. In particular, for
types that are accessed concurrently it is useful to mark members that are
not modified after construction as `const` because they cannot participate in
data races.
- Don't use `volatile` qualifications. It's an oft-misunderstood feature and
only appropriate in very precise technical scenarios.
### Strings
- We do not use `std::string_view`. Use `StringData` from `base/string_data.h` instead.
For interoperability with functions that accept or return `std::string_view`
(e.g. `std::string`), use the pair of conversion functions
`toStdStringViewForInterop` and `toStringDataForInterop`.
- Working with `char*` strings can be notoriously error-prone. Convert such data to
`StringData` or `std::string` for safety, or use utilities in `util/str.h` for
this sort of thing.
### Performing String Formatting
There are at least two kinds of generic string formatting available. We have
stream-oriented formatting with `StringBuilder` and its wrapper `str::stream()`
(using a stripped-down `std::ostream`-like API), and newer `libfmt` formatting
(using Python-like syntax). We do not use `std::format`. `sprintf`-style
formatting is very rarely used.
```c++
#include <fmt/format.h>
takesString(fmt::format("x={}, y={}\n", xValue, yValue));
```
```c++
#include "mongo/util/str.h"
takesString(str::stream() << "x=" << xValue << ", y=" << yValue << "\n");
```
### Output Parameters
Use pointers or mutable references as "in/out" or "output" parameters,
but prefer returning values to using pure output parameters.
Mutable references used to be banned, but this is no longer the case, and
they are now encouraged for many cases, especially if the callee will not
require the reference to be valid after returning. That said, some types,
such as `OperationContext` are conventionally passed by pointer.
It is best to stick to established conventions for such types to avoid
needing a lot of additional `&opCtx` and `*opCtx` noise at call sites
between functions using different conventions.
```c++
void appendData(const std::string& tag, std::vector<MyType>& out) {
out.push_back(_getData(tag));
}
```
### Namespaces
- Namespace names use `snake_case`. No uppercase letters, and words separated by underscores.
- Contents of `namespace` scopes are not indented.
- Close namespaces with a comment. `clang-format` automatically adds these comments.
```c++
namespace foo {
int fooVar;
namespace bar {
int barVar;
} // namespace bar
} // namespace foo
```
- Do not use "using directives" (i.e. `using namespace foo;`) for arbitrary
namespaces as a naming shortcut. Some namespaces are designed to be used this
way in restricted contexts, but still never at namespace-scope in header
files. These carefully curated namespaces contain only a few definitions.
Examples of these limited exceptional namespaces would include:
- The `std::literals`, `fmt::literals`, and similar namespaces that hold
user-defined literal operators. Using directives are necessary for importing
user-defined literals.
- The `std::placeholders` namespace containing `_1`, `_2`, for use with the
`std::bind` API (which we have banned anyway).
As an alternative, a namespace _alias_ may help to declutter local scopes.
```c++
namespace bc = timeseries::bucket_catalog;
namespace bfs = boost::filesystem;
```
- No unnamed namespaces in headers at all.
They can produce subtle correctness risks, particularly in the form of
[ODR (One Definition Rule)](https://en.cppreference.com/w/cpp/language/definition#One_Definition_Rule)
violations.
- In .cpp files, use unnamed namespaces to strip definitions of their linkage.
Headers should generally only be declaring entitiees with external linkage.
- Most server code should be in the `mongo` namespace, and we have several
sub-namespaces nested within that, often used to help organize code by team, by
project, or by large feature.
- Defining a new nested namespace as an API point is cheap, but can be a little
fiddly for users if we have too many of them, so they should be substantial and
relatively coarse-grained (a handful per team).
- Use a component-unique namespace, eg `future_details` or `duration_detail`, to
give names to pseudo-"private" details in headers. It's important to include
the component name here. Using `mongo::detail` or `mongo::internal` doesn't
mitigate the problem of name collisions between components.
- As a matter of namespace etiquette and modularity, avoid using anything in a
component's `detail` or `internal` -suffixed namespaces from outside the
component. If you need to use such a private name, that should ideally involve
a conversation with the code owners about promoting it out of the detail
namespace.
- Combine immediately-nested namespace blocks where possible:
```c++
namespace mongo::foo::bar {
int barVar;
} // namespace mongo::foo::bar
```
### Control flow
- Place exceptional path first.
- Return early.
- Avoid `else` after a returning `if` statement.
```c++
Status ifElseSpaghetti() {
Status err;
if (err = doStuff1(); err.isOK()) {
if (err = doStuff2(); err.isOK()) {
if (err = doStuff3(); err.isOK()) {
if (err = doStuff4(); err.isOK()) {
// Expected path obscure and indented
} else {
}
} else {
}
} else {
}
} else {
}
return err;
}
Status withEarlyReturns() {
if (auto err = doStuff1(); !err.isOK())
return err;
if (auto err = doStuff2(); !err.isOK())
return err;
if (auto err = doStuff3(); !err.isOK())
return err;
if (auto err = doStuff4(); !err.isOK())
return err;
// Expected path obvious and prominent.
return Status::OK();
}
```
#### Range-Based `for` Loops
[Range-based for loops](https://en.cppreference.com/w/cpp/language/range-for) can have subtle issues.
The usual practice is to use a forwarding reference (`auto&&`) as the item variable. Applying this
pattern as a default practice prevents subtle copies and conversions of the range elements.
```c++
for (auto&& item : someRange)
```
For ranges that have pair or tuple elements, particularly maps, it's common to
use structured bindings to give names to the parts of the item:
```c++
for (auto&& [key, value]: someMap)
```
It's worth a note of caution about the dangers of the range expression in a
range-based for loop, as this is a common and subtle source of bugs.
The range expression is bound to an implicit range variable, and its lifetime
will be extended if it's a temporary, as usual with C++ initializers.
But other temporaries created in the initializer expression will die after the
initializer. They are not extended to the lifetime of the for loop.
```c++
// ok: temporary is bound to implicit range variable.
for (auto&& item: makeVector())
// BUG: the result of obj() is destroyed.
for (auto&& item: obj().view())
```
The rules here change in C++23, such that all temporaries in the range initializer are extended.
The fix is a theoretically a breaking change for some code. But the risk tradeoff
overwhelmingly favored making this change anyway.
> [!WARNING]
> The compilers we are using have not all implemented this feature yet, even on the v5 toolchain. So
> we still need to be extremely careful with range expressions that rely on
> intermediate temporaries.
It would be helpful to read the [CppReference](https://en.cppreference.com/w/cpp/language/range-for#Temporary_range_initializer) on this topic.
Some good [bug examples](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2644r0.pdf)
are listed in the single-page ISO C++ proposal to fix the problem.
### Assertions
This is a large topic. See the [Exception Architecture](https://github.com/mongodb/mongo/blob/master/docs/exception_architecture.md) architecture guide.
### Logging and Output
We use a custom logging system, documented in the
[Logging](https://github.com/mongodb/mongo/blob/master/docs/logging.md)
architecture guide. Direct output to `stdout` or `stderr` streams is only done
by special server code.
### Numeric Constants
Large, round numeric constants should be written in a user-friendly way.
- If a number is derived from a simple numeric expression, expressing it as an
expression can help a reader verify and maintain it. For example, prefer
`50 * 1024 * 1024` to `52'428'800`.
- Use digit separators `'` for large numeric constants. 3-digit groups for
decimal. Conventionally, use 4-digit or 8-digit groups for hexadecimal or
binary.
- Use a bit-shifted form for power-of-two exponentiation. eg, `1<<13` to express 2<sup>13</sup>.
Make sure the "1" is wide enough for the shift if it's large (e.g. `uint64_t{1} << 52`).
A `* 1024` sequence is also acceptable, as it's a recognizable idiom for kiB and MiB expressions.
- Do not assume suffixes like `ULL` will produce specifically typed quantities like `uint64`.
Use a numeric literal and the compiler will give it a wide-enough type.
Where the exact type matters, use an explicitly typed expression.
```c++
const int tenMillion = 10'000'000;
const int miBiByte = 1 << 20;
const uint64 exBiByte = 1ull << 60; // Arithmetic expressions may need a particular type.
const uint32 crc32Polynomial = 0xEDB8'8320;
const uint32 asciiMask = 0b0111'1111;
arrayBuilder.append(uint64_t{1234}); // Force argument type.
```
### Casting
- Do not use C-style cast syntax (parentheses around the preceding type) ever.
See [this CGL rule](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#es49-if-you-must-use-a-cast-use-a-named-cast)
and [this Google rule](https://google.github.io/styleguide/cppguide.html#Casting) for discussion.
- Use `static_cast` as needed. Use `const_cast` when necessary.
- Be aware that `dynamic_cast`, unlike other casts, is done at runtime. You
should always check for `dynamic_cast<T*>` returning null pointer.
- `reinterpret_cast` should be used sparingly. It is typically done for
low-level layout conversions and accessing objects in ways that may break the
protections of the type system and exhibit undefined behavior if misapplied.
- When down-casting from a base type where the program logic guarantees that
the runtime type is correct, consider using `checked_cast` from
`mongo/base/checked_cast.h`. It is equivalent to `static_cast` in release builds,
but adds an invariant to debug builds that ensures the cast is valid.
### RAII and Smart Pointers
- Embrace RAII (Resource Acquisition Is Initialization). This means that resources
should generally be managed by objects that automatically release them when
going out of scope.
- By default, the assumption in our codebase is that raw pointers are
views/borrows and never owning. Document exceptions to that rule, and try to
avoid having owning raw pointers as part of your API.
- Make heavy use of smart pointers such as `std::unique_ptr` and `std::shared_ptr`.
For some types we use `boost::intrusive_ptr` instead.
- Generally, bare calls to `new`/`delete` and `malloc`/`free` outside of the
implementation of an RAII type should be red flags and draw extra scrutiny in
review. Prefer factory functions like `std::make_unique` and
`std::make_shared`.
- Use `ScopeGuard` or `ON_BLOCK_EXIT` to protect other resources that must be
released (e.g. `fopen`/`fclose` pairs), or perform some other action when
leaving scope. It is often a good idea to put "undo X" logic right after the
"do X" logic rather than at the bottom of the function to ensure that the
logic stays correct if someone adds an early return or throws. Or, write an
object to do this for you via its constructor and destructor.
### The `WithLock` Convention
It is common practice in our codebase for a larger "business logic" class to
have an obvious primary mutex member. These tend to have some private functions
that require that this mutex be held. These functions often take a
`WithLock` as the first parameter to document the contract and provide some
checking of the callers. The parameter should usually be unnamed. This is a
technical check that forces callers to present a lock-holding resource handle
(e.g. `unique_lock`) to call the function. See
[with_lock.h](../src/mongo/util/concurrency/with_lock.h).
## Files (Physical Design)
### Components
A component is a grouping of classes, entities, and functions that is built as a
single packaged unit. There are 1 or more components in a library. A component
should represent a grouping of functionality and interrelated classes and
functions that work together.
A component normally consists of a `.h`, a `.cpp`, and a `_test.cpp` file.
Source filenames use lowercase words separated by underscores (i.e. snake_case).
In uncommon cases, there are other files in the component for technical or
internal organizational reasons. These might be a `foo_internal.h` auxiliary
header, or a `foo_test_part4.cpp` test fragment, but these extra files are not
meant to serve as its main interface or present its main idea. They're helper
details and they should have the component name as a prefix of their file names.
A component will commonly be dominated by a single dominant class, and for
discoverability, it should therefore use that class name, in snake_case, as its
filename. That said, we have no rule limiting the number of declarations in a
file, and it is useful to define related classes together in a single component.
### Using `#include`
- To make a declaration available, we require inclusion of a header file that
provides it. There should not be any implicit reliance on transitive includes,
even if the code compiles. As an exception to this general rule, `foo.cpp` and
`foo_test.cpp` do not need to duplicate the includes from `foo.h`.
- Do not make forward declarations to avoid an inclusion. It may be tempting to
do this as an optimization, but we don't do it, as there are correctness and
modularity risks.
- Do not include headers that are not needed. Do not blindly copy large blocks
of include statements.
- An "umbrella" interface header may provide several related transitive
includes, but these umbrella headers should be documented as such, and they
should be provided by the library maintainer. Use IWYU (include what you use)
pragma comments to prevent tools and editors from incorrectly auto-suggesting
the private headers.
In the public header (e.g. `unittest/unittest.h`):
```c++
#include "mongo/unittest/assert.h" // IWYU pragma: export
```
In the private headers (e.g. `unittest/assert.h`):
```c++
// IWYU pragma: private, include "mongo/unittest/unittest.h"
// IWYU pragma: friend "mongo/unittest/.*"
```
- A header should also be "self-contained", and include everything it needs. It
must not rely on other headers having been included above it by its users.
- Use "double quotes" to include headers under `mongo/`, and \<angle brackets\>
for headers under `third_party/`, or for system libraries.
- Always use the forward relative path from `mongo/src/`. "Forward" means to not
refer to the parent directory `../`.
- Don't use `third_party/` as part of include paths. Use `<>` and omit it.
```c++
#include <boost/optional.hpp> // Yes
#include "third_party/boost/optional.hpp" // No: omit "third_party/" and use <>
#include "boost/optional.hpp" // No: use <>
#include "mongo/db/namespace_details.h" // Yes
#include "../db/namespace_details.h" // No: ".." is disallowed
```
### Ordering and Grouping of C++ `#include` Directives
We have a standard order for the include directives at the top of a C++ file.
It is automatically applied by our configuration of clang-format.
The purpose of this ordering is to keep the list organized to aid in visual
scanning, and to catch headers that are missing includes.
The include directives are organized into several blocks.
Within each block, the include directives are sorted alphabetically.
Follow each block with a blank line.
- Main header
For the `.cpp` and `_test.cpp` files of a component, include the component's
`.h` file if applicable as the first include. This is a safety practice that
helps us ensure that a `.h` file doesn't rely on any preceding inclusions.
- First-party headers
All include directives using `""` and starting with `mongo/`.
E.g. `"mongo/db/db.h"`.
- C++ stdlib headers
Include directives using `<>`, with no `/` or `.` in path.
E.g. `<vector>`, `<cmath>`.
- Unnamespaced headers
Include directives using `<>`, with no `/` in path.
Typically these are system C headers ending in `.h`
E.g. `<unistd.h>`.
- Remaining third-party headers
Include directives using `<>`, with `/` in path.
E.g. `<boost/optional/optional.hpp>`, `<sys/types.h>`.
To summarize, a typical .cpp file "classy.cpp" might have up to 5 sorted blocks of
include directives:
```c++
/** (Copyright notice would appear at the top, then...) */
#include "mongo/db/classy.h"
#include "mongo/db/db.h"
#include "mongo/db/namespace_details.h"
#include "mongo/util/concurrency/qlock.h"
#include <cstdio>
#include <string>
#include <unistd.h>
#include <boost/thread/thread.hpp>
```
Any headers that are conditionally included under the control of `#if`
directives (if technically possible) will appear after these blocks.
Clang-format will not reorder includes across anything other than a blank line
or other includes. In the rare case where some header must be included before
or after all other headers, you can use a comment line to separate it from
other includes like:
```cpp
#include <last/normal/header.h>
// This header must be after all others:
#include <a/weird/header.h>
```
If you see a comment line in old code that is unintentionally preventing proper
header ordering, you are encouraged to clean that up when adding or removing
includes.
### For `js` Files (JavaScript only)
- Disable formatting for [template literals](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals)
```js
// clang-format off
newCode = `load("${overridesFile}"); (${jsCode})();`;
// clang-format on
```
### Copyright Notices
- All new C++ files added to the MongoDB code base that will be upstreamed for
public consumption (such as anything upstreamed to `mongodb/mongo`) should
use the following copyright notice and SSPL license language, substituting
the current year for `YYYY` as appropriate:
```c++
/**
* Copyright (C) YYYY-present MongoDB, Inc.
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the Server Side Public License, version 1,
* as published by MongoDB, Inc.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* Server Side Public License for more details.
*
* You should have received a copy of the Server Side Public License
* along with this program. If not, see
* <http://www.mongodb.com/licensing/server-side-public-license>.
*
* As a special exception, the copyright holders give permission to link the
* code of portions of this program with the OpenSSL library under certain
* conditions as described in each individual source file and distribute
* linked combinations including the program with the OpenSSL library. You
* must comply with the Server Side Public License in all respects for
* all of the code used other than as permitted herein. If you modify file(s)
* with this exception, you may extend this exception to your version of the
* file(s), but you are not obligated to do so. If you do not wish to do so,
* delete this exception statement from your version. If you delete this
* exception statement from all source files in the program, then also delete
* it in the license file.
*/
```
- Enterprise source code is not SSPL, and must bear a shorter copyright notice:
```c++
/**
* Copyright (C) YYYY-present MongoDB, Inc.
*/
```
## Basic Formatting Conventions in C++ Code
There are several matters of file formatting expected in source files, and we
enforce these when we can. If you use our recommended
[config](https://github.com/mongodb/mongo/blob/master/.vscode_defaults/linux-virtual-workstation.code-workspace)
for VSCode, much of this will be handled automatically for you.
### Whitespace
- Use spaces, no TAB characters.
- 4 spaces per indentation.
- Limit lines to 100 columns.
- Use Posix text format for source files.
All lines (including the final line) end with a LF (ASCII "line feed" aka `\n`) character.
We don't use the Windows CRLF (`\r\n`) line endings in source files.
In VS Code, `files.eol` should be set to "\n", and `files.insertFinalNewline`
set to true to help with this. A Git config option on Windows can convert
line endings automatically (`core.autocrlf`).
### Braces
Our braces style is that the opening brace appears at the end of the line. We
do not open a new line just for the opening brace that is part of a control flow
structure (`if`, `while`, etc).
Braces are optional for sufficiently simple statements.
```c++
if (condition)
doStuff();
if (condition) {
doStuff();
}
while (condition)
doStuff();
while (condition) {
doStuff();
}
do {
doStuff();
} while (condition);
```
### ESLint (JavaScript only)
All JS files must be linted by ESLint before they are formatted by clang-format.
We use [ESLint](http://eslint.org/) to lint JS code. ESLint is a JS
linting tool that uses the config file located at `.eslintrc.yml`, in the root
of the mongo repository, to control the linting of the JS code.
[Plugins](http://eslint.org/docs/user-guide/integrations) are available for most
editors that will automatically run ESLint on file save. It is recommended to
use one of these plugins.
Use the wrapper script `buildscripts/eslint.py` to check that the JS code is
linted correctly as well as to fix linting errors in the code. This wrapper
selects the appropriate version of eslint to be used.
```sh
python buildscripts/eslint.py lint # lint js code
python buildscripts/eslint.py fix # auto-fix js code
```
### Clang-Format
All code changes must be formatted by
[clang-format](http://clang.llvm.org/docs/ClangFormat.html) before they are
checked in. Use `bazel run format` to reformat C++ and JS code.
Clang-format is a C/C++ & JS code formatting tool that uses the config files
located at `src/mongo/.clang-format` and `jstests/.clang-format` to control the
format of the code. The version and configuration of clang-format is selected by
`bazel run format`.
Plugins are available for most editors that will automatically run clang-format
on file save.
Clang-format is essential, but we should not let it create unreadable code.
There are some ways to keep it from producing a mess:
- It will not join a line that ends in a (potentially empty) `//` comment.
- It also recognizes comma-terminated lists as significant hints.
- As a last resort, it honors `clang-format off` and `clang-format on` in comments.
This should only be used where it is really important, since it may result in indentation
drift with the surrounding code as we upgrade clang-format or change settings.
```c++
void clangFormatExamples() {
// Trailing comma prevents joining braces with data.
std::array arr{
123, 234, 456, 567, 678,
};
std::vector<std::vector<int>> vvi{
{
123,
345,
},
{
456,
},
};
// Just one leading EOL comment '//' prevents joining all lines.
b //
.append(x, 123)
.append(y, 234)
.append(z, 345);
}
// Example tabular data that would be harmed by reformatting.
// clang-format off
#define EXPAND_TABLE(X) \
/* (id, val , shortName , logName , parent) */ \
X(kDefault, = 0 , "default" , "-" , kNumLogComponents) \
X(kAccessControl, , "accessControl", "ACCESS" , kDefault) \
X(kAssert, , "assert" , "ASSERT" , kDefault) \
X(kCommand, , "command" , "COMMAND" , kDefault) \
X(kControl, , "control" , "CONTROL" , kDefault) \
X(kExecutor, , "executor" , "EXECUTOR", kDefault) \
X(kGeo, , "geo" , "GEO" , kDefault)
// clang-format on
```
---
# Additional Learning Resources
- [Learn C++](http://learncpp.com), free C++ tutorial.
- CppCon "Back to Basics" track playlist.
[link](https://www.youtube.com/playlist?list=PLHTh1InhhwT4TJaHBVWzvBOYhp27UO7mI)
- "A Tour of C++", Stroustrup.
ISBN: 9780133549003
- "Large-Scale C++: Process and Architecture, Volume 1", Lakos.
ISBN 9780133927665
- All of Herb Sutter's "Exceptional" series of books.
- All of Alexandrescu books
- All of Scott Meyer's "Effective" books (getting very old but still great)
# References
- [MongoDB C++ Style Guide Proposals](https://docs.google.com/document/d/1nvmEnjw-5DNFIoXPa7WzM1PbOOl1fN19jl1sz9cpzAg)
Roadmap and suggestion box for this document.
- [Server Code Style](https://github.com/mongodb/mongo/wiki/Server-Code-Style) on mongo github wiki to be replaced by this document.
- [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html) We used to default
to this for all things not explicitly covered by our own guide, but that is no longer the case.
- [C++ Core Guidelines](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines) Interesting reading.
Diverges significantly at times from our style.
- [cppreference.com](https://cppreference.com) The best C++ reference site
- [C++ SUPER FAQ](https://isocpp.org/faq)
- [Compiler Explorer](https://goldbolt.org) Great for demonstrating C++ ideas on multiple compilers.
- [VSCode workspace file](https://github.com/mongodb/mongo/blob/master/.vscode_defaults/linux-virtual-workstation.code-workspace)
A default configuration for server engineers who use VSCode. It's configured
to handle editor configuration and formatting issues in accordance with this
guide.