MongoDB Server C++ Style Guide
This document describes common conventions used in the MongoDB server codebase. The document is about C++, but there are a few places where JavaScript style is discussed as well.
A firmly established style guide can make source files unsurprising as they are more easily navigable and regular in shape.
Style rules can eliminate wasted time on minor issues in code reviews. An author should endeavor to be style-compliant before sending a pull request for review. This should accelerate code reviews and establish consistent expectations on code.
The guide is carefully considered by very experienced C++ engineers. C++ code can be complex, and there are subtle correctness and maintainability risks that can arise from certain antipatterns addressed by the guide. Style adherence enables code authors and their reviewers to productively write safer code without having to first rediscover those problems for themselves.
Feedback (MongoDB internal)
This is maintained by the Server Programmability team.
- Use #server-programmability on Slack for discussion and clarifications. Contributors outside of MongoDB can use Jira instead.
- For change proposals, please feel free to add entries to the MongoDB C++ Style Guide Proposals document pinned to that channel.
- Jira and PRs are fine for small fixes unrelated to C++ style, such as typos, formatting, phrasing, and comments.
Style
Names of Identifiers
There's some truth in the old joke that naming is the hardest problem in programming. It's impossible to write catch-all rules for naming, but we can set guidelines with the intention of avoiding friction in reviews and having some expectation of general consistency across our codebase.
- Types use TitleCase. The first letter of each word is uppercase, and the following letters are lowercase.
- Functions and variables use camelCase. The first letter of each word, except the first, is uppercase.
- Namespaces use snake_case. No uppercase letters, and words separated by underscores. (See "Namespaces" section below).
- Spelling: Take care to avoid misspellings in names. This is more than aesthetic. It is easier on readers. Misspelled names can harm confidence in code quality. Misspelled names might be skipped by code searches. Our convention is to use US English spelling.
- Identifier names should be short but clear. Long sentence-like names become a laborious comparison exercise for readers, and can form a "wall of text" that can bury significant C++ keywords and operators. Local variable names can be particularly brief without causing confusion, provided that the enclosing functions remain compact and focused.
- Repetition and redundancy in names should be avoided. A function name doesn't need to restate the types of its arguments, for example. The arguments can usually speak for themselves, but explicit disambiguation may be desirable in some cases.
- Word abbreviations should be used carefully. When used, they should be applied very consistently and documented well. This keeps users from having to guess which words are abbreviated and which are not.
- Private members are usually named with a leading underscore (e.g. _detail). This applies to data members more consistently than to functions. Identifiers with a leading underscore followed by an uppercase letter are reserved by C++, and must not be used. Therefore, the leading _ should not be used with private types and typedefs. Double underscores (__) must be avoided as well. See article.
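The naming rules above can be seen together in one small sketch. The namespace, class, and member names here are hypothetical and chosen only for illustration; they do not refer to anything in the server codebase.

```cpp
#include <string>
#include <utility>

namespace style_demo {  // Namespace: snake_case. (Hypothetical name.)

// Type: TitleCase.
class ConnectionPool {
public:
    explicit ConnectionPool(std::string hostName) : _hostName(std::move(hostName)) {}

    // Functions and parameters: camelCase. The name is "resize", not
    // "resizeToInt", because the argument type already speaks for itself.
    void resize(int newMaxSize) {
        _maxSize = newMaxSize;
    }

    int maxSize() const {
        return _maxSize;
    }

    const std::string& hostName() const {
        return _hostName;
    }

private:
    // Private data members: leading underscore, then camelCase.
    std::string _hostName;
    int _maxSize = 0;
};

}  // namespace style_demo
```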
Constants
Constants are named either like ordinary variables (varName) or with a k
prefix (kVarName). You'll see both in the codebase and either is acceptable.
You may also find some older code using MACRO_STYLE for constants.
That should not be used in new code outside of macros.
Test Access
Some entities are defined in an API purely to facilitate test access and
testability. We conventionally tack a _forTest suffix (or a ForTest suffix
for types) onto the names of such entities as an indicator that they should not
be used by non-test code.
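A minimal sketch of the suffix in use, assuming a hypothetical RequestQueue class whose internal state a unit test wants to inspect:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class, for illustration only.
class RequestQueue {
public:
    void push(int request) {
        _pending.push_back(request);
    }

    // Exists only so that tests can inspect internal state. The suffix warns
    // non-test code away from calling it.
    std::size_t pendingCount_forTest() const {
        return _pending.size();
    }

private:
    std::vector<int> _pending;
};
```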
Class Definitions
While class and struct are largely equivalent in C++, this codebase uses a convention where structs are used for simple collections of data (possibly with methods), while classes are used for new abstractions. As a rule, all data in a struct should be public and all data in a class should be private. If you are unsure which to use, consider whether there are any invariants that need to be upheld, either within or between members. If there are not, then a struct may be appropriate.
If a type is a struct or struct-like class, then consider omitting all constructors and letting it be a C++ aggregate, which allows some flexibility in initialization syntax.
If a type has invariant-preserving constructors, special behaviors, and internal
private details, it's not a struct. It's subjective, but structs should be a
mostly straightforward aggregation of data members.
Consider a somewhat canonical example of a Date, consisting of year,
month, dayOfMonth. The valid range of a dayOfMonth depends on year and
month, so this type either has an invariant, or it has to be allowed to be in
an invalid state. If the invariants of this type are enforced by the type's
constructors and setters, then it should be a class.
It's possible to leave such a Date type as a struct and enforce these
invariants from the outside through careful discipline among its users. This is
what C APIs have to do. We should prefer using data encapsulation and
class for such complex objects.
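The distinction might look like the following sketch. Point and the exact shape of Date are assumptions made for illustration, and assert() stands in for whatever error handling a real server type would use to reject bad input.

```cpp
#include <cassert>

// No invariants within or between members, so a plain aggregate struct fits.
struct Point {
    int x = 0;
    int y = 0;
};

// The valid range of dayOfMonth depends on year and month, so the data is
// private and the constructor enforces the invariant.
class Date {
public:
    static bool isValid(int year, int month, int dayOfMonth) {
        if (month < 1 || month > 12 || dayOfMonth < 1)
            return false;
        return dayOfMonth <= _daysInMonth(year, month);
    }

    Date(int year, int month, int dayOfMonth)
        : _year(year), _month(month), _dayOfMonth(dayOfMonth) {
        assert(isValid(year, month, dayOfMonth));
    }

    int year() const {
        return _year;
    }

    int month() const {
        return _month;
    }

    int dayOfMonth() const {
        return _dayOfMonth;
    }

private:
    static int _daysInMonth(int year, int month) {
        constexpr int kDays[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
        const bool isLeapYear = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
        return (month == 2 && isLeapYear) ? 29 : kDays[month - 1];
    }

    int _year;
    int _month;
    int _dayOfMonth;
};
```

Because Point declares no constructors, it remains an aggregate and can be initialized as `Point p{3, 4};`.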
Order of Class Members
Within a class or struct definition, try to stick to this ordering by default. A consistent convention makes it easier for a reader to quickly understand and navigate a class declaration.
Group public API at the top, and details at the bottom.
- public
- protected
- private
Within each of these visibility sections, there's a preferred order of declarations.
- Attributes of the class come first:
  - Types and type aliases, including declarations and enums
  - Static constants and static data members
  - Static functions
- Then declarations that are relevant to each instance of the class:
  - Constructors
  - Destructor
  - Copy and assignment operators
  - Member functions
  - Data members
As always, technical concerns override style, and this order sometimes cannot be
exactly followed for technical reasons, but it should be the predominant
weakly-binding preference when laying out a class in the absence of motivation
to diverge from it.
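As a rough illustration of the default ordering, here is a sketch of a hypothetical Session class. The destructor and copy operations are left implicit, so they do not appear.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical class, laid out in the preferred order.
class Session {
public:
    // Types and type aliases.
    using Id = int;

    // Static constants and static data members.
    static constexpr std::size_t kMaxNameLength = 64;

    // Static functions.
    static bool isValidName(const std::string& name) {
        return !name.empty() && name.size() <= kMaxNameLength;
    }

    // Constructors.
    Session(Id id, std::string name) : _id(id), _name(std::move(name)) {}

    // Member functions.
    Id id() const {
        return _id;
    }

    const std::string& name() const {
        return _name;
    }

private:
    // Data members.
    Id _id;
    std::string _name;
};
```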
Private data members have a leading underscore followed by a camel case name like _fooBarBaz.
Protected members may or may not have a leading underscore, depending on how
logically internal they are. This convention doesn't apply to types.
Naming of Class Members
class Foo {
public:
    // This is just for demonstration purposes. Classes/structs should rarely
    // have a mix of public and private data members.
    int publicMember;

protected:
    // We've never had a convention about protected members. Both are
    // widespread, so either is okay. It depends on how "private" the variable
    // is to the derived classes.
    int x;
    int _y;

private:
    int _privateMember;
};
User-facing Names That Include Units (not strictly a C++ issue)
This section applies to names that users can see, like BSON field names or server parameters, but not necessarily to C++ identifiers.
In things like serverStatus, include the units in the field name if there is
any chance of ambiguity. For example, writtenMB or timeMs.
- For bytes: use MB and show in megabytes unless you know it will be tiny. Note you can use a float, so 0.1MB is fine to show.
- Durations:
  - Use milliseconds by default. Prefer the suffix Millis, but be aware that Ms is also used.
  - Use Secs and a floating point number for times that are expected to be very long.
  - For microseconds, use Micros as the suffix (e.g., timeMicros).
Documentation
- API docs should appear directly above the thing being documented and use /** or /// style comments.
- If it fits, a comment can be to the right of a variable with ///< doc. (See Doxygen syntax). The < is important, as it tells tooling such as clangd to bind backwards to the preceding declaration rather than the following one.
- We don't run Doxygen or recommend other Doxygen markup; this style of comment delimiter simply distinguishes API docs from other comments.
- Use complete, grammatical sentences for API docs. Reviewers should pay attention to the clarity of documentation as it would appear to a reasonably-experienced server engineer who may not be a domain expert on the code.
- Avoid overly conversational tone, unnecessary personal references (like "I", or "Pat"), slang, or jargon. Comments should strive for professionalism, but without rigid formality.
- Comment syntax:
stdx::thread _thread; ///< Empty until init is called.
/** Single line doc. */
void easyFunction(int x, int y);
/**
 * Multi line doc. Spans multiple lines.
 * The top and bottom lines of this comment block are blank.
 */
void complexFunction(int x, int y) {
    // Interior implementation details use line comments like this.
    return someFunc(x + y);
}
- Give the right amount of information. Make some attempt to give the gist of complex processes. Avoid being so vague that you leave out explanation that would help the consumer of the API. Conversely, try to avoid going too much into implementation details in doc-comments (or at least clearly state when doing so using words like "currently") unless those details are part of the API that consumers should rely on.
- Comments should be descriptive rather than imperative, e.g. "Frobnicates the widget", not "Frobnicate the widget". The subject of the initial sentence is assumed to be the thing being documented and should generally be omitted, e.g. don't say "This function frobnicates the widget".
/** Calculates the sum. (GOOD: descriptive verb) */
/** Calculate the sum. (BAD: imperative verb) */
There's no need to be very formal about their formatting or use elaborate Doxygen/Javadoc etc tags. A smattering of text-like markdown is good. Some IDE features or other tooling might pick up on it, but it shouldn't interfere with the primary use case of viewing the comments as text while browsing a header file.
Reader attention is a precious resource, so try to write concise comments, and obvious things need not get a comment. Comments should be adding information. Do not restate the name and signature, unless there is a subtle detail that should be highlighted.
Assume the reader knows the language. Special member functions like the copy
constructor do not need comments saying what they are. operator== should only
get a comment if there is something interesting about it like omitting a member,
or being order-sensitive.
Most classes and functions should default to having at least a 1-liner comment, but sometimes context and good naming can make even that a redundant formality to be omitted. While this is a subjective decision, remember that later readers will need more hints than the original implementers.
/**
* If the current command does not override Foo, then it comes from a system-wide default
* value set by the "foo" server parameter. (GOOD: nonobvious).
*/
Foo getFoo() const;
/** Gets the bar (BAD: obvious, no info). */
const Bar& getBar() const;
TODOs
To cite a ticket as a TODO in the code, use this format, with a short reason for the link. A Jira bot will create reminders when the cited target ticket is resolved. The target of the TODO cannot be the current ticket. Suppose SERVER-12345 was a ticket to fix the frobber, and we're documenting some workaround code:
// TODO(SERVER-12345): Remove this code when the frobber works again.
In comments, a function may be referred to using just its name foo, or by foo(),
or foo(int,int), depending on context and whether the other forms are ambiguous.
C++ Code
Much of the guide has been about cosmetics like layout and formatting, comments, and naming conventions. This section presents more substantial technical issues.
Minimal Syntax
If a keyword or operator is a "noise" word with no technical benefit, omit it. The philosophy here is that it's better to write the code as plainly as possible. Code should not look like it's doing something special when it isn't.
Some examples of "noise" syntax:
- Redundantly marking members and bases as public, protected, or private when they already are.
- Marking a function declaration extern (functions are already extern by default).
- Using virtual on a function that's already override or final (see "Overriding Virtuals").
Constructors
Constructors that can be called with single arguments should be explicit,
unless implicit conversion is desired, in which case use explicit(false) to
explicitly show that intent.
Non-unary constructors should NOT be explicit unless it is important to
disable bare braced initialization. If a constructor takes a variable number of arguments
such that it is possibly unary, make it explicit.
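The three constructor cases can be sketched as follows. Meters, Label, and Range are hypothetical types invented for this example, and explicit(false) requires C++20.

```cpp
#include <string>
#include <type_traits>

class Meters {
public:
    // Callable with one argument, so it is explicit: an int should not
    // silently become a Meters.
    explicit Meters(int value) : _value(value) {}

    int value() const {
        return _value;
    }

private:
    int _value;
};

class Label {
public:
    // Implicit conversion from a C string is wanted here, and explicit(false)
    // states that intent.
    explicit(false) Label(const char* text) : _text(text) {}

    const std::string& text() const {
        return _text;
    }

private:
    std::string _text;
};

class Range {
public:
    // Non-unary, so not explicit. Bare braced initialization keeps working.
    Range(int lo, int hi) : _lo(lo), _hi(hi) {}

    int size() const {
        return _hi - _lo;
    }

private:
    int _lo;
    int _hi;
};

static_assert(!std::is_convertible_v<int, Meters>);
static_assert(std::is_convertible_v<const char*, Label>);

inline int labelLength(const Label& label) {
    return static_cast<int>(label.text().size());
}

inline Range unitRange() {
    return {0, 1};  // Allowed because the Range constructor is not explicit.
}
```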
= default
Prefer = default; when needed over defining an empty or trivial function body {}.
But where possible, it is usually better to omit the declarations for lifetime methods
entirely and let the compiler declare them implicitly.
Consider that for some classes it may be useful to declare a function normally
in a .h file and provide = default; as the implementation in a .cpp file.
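One situation where the .h/.cpp split is useful is a class holding a std::unique_ptr to a type that is incomplete in the header. The sketch below shows both halves in one block; Widget, its Impl, and the file names in the comments are hypothetical.

```cpp
#include <memory>

// --- widget.h (hypothetical) ---
class Widget {
public:
    Widget();
    ~Widget();  // Declared normally; Impl is still incomplete here.

    int value() const;

private:
    struct Impl;
    std::unique_ptr<Impl> _impl;
};

// --- widget.cpp (hypothetical) ---
struct Widget::Impl {
    int value = 42;
};

Widget::Widget() : _impl(std::make_unique<Impl>()) {}

// Defaulted where Impl is complete, so unique_ptr can destroy it.
Widget::~Widget() = default;

int Widget::value() const {
    return _impl->value;
}
```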
Noexcept
The noexcept feature is easy to overuse. Do not use it solely as "documentation"
since it affects runtime behavior. It's a large topic, covered in the Exception
Architecture
document.
Overriding Virtuals
Use override wherever it can be used. Tighten this to final when necessary,
and where further overrides would introduce opportunities to break base class
guarantees.
Each declaration should have at most one virtual, override, or final.
As with many style rules, there are rare technical situations in which to bend this one. For example, combining virtual with final on a newly introduced function can be used to force compilation errors on unintentional hiding.
If a class is known to be a leaf in a hierarchy of polymorphic types, annotating
the class with final can be a useful optimization to enable its virtual
functions to be devirtualized in some contexts.
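A small sketch of these rules, using a hypothetical Shape hierarchy. Each declaration carries exactly one of virtual, override, or final.

```cpp
#include <string>

class Shape {
public:
    virtual ~Shape() = default;

    virtual std::string name() const {
        return "shape";
    }

    virtual int sides() const = 0;
};

class Polygon : public Shape {
public:
    explicit Polygon(int sides) : _sides(sides) {}

    std::string name() const override {
        return "polygon";
    }

    // Tightened to final: derived classes must not change the side count.
    int sides() const final {
        return _sides;
    }

private:
    int _sides;
};

// Known leaf of the hierarchy, so the class itself is final.
class Square final : public Polygon {
public:
    Square() : Polygon(4) {}

    std::string name() const override {
        return "square";
    }
};
```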
Rules For .h Files
- Use #pragma once as an include guard, as the first line after the copyright notice.
- No unnamed namespaces in headers at all. (See the "Namespaces" section below).
- Use inline or extern on namespace-scope variables in headers, so that each translation unit does not get its own copy. Note that inline variables provide some init order guarantees which may add a small startup cost, so define them as constexpr or constinit if possible.
- Keep complex code out of headers. If a function is not performance sensitive, and it is longer than a few lines, put it in the corresponding .cpp file. This practice should help to reduce the number of include statements needed in headers, which is good for modularity and for compilation speed. That said, simple getters and setters should generally be inline.
Rules For .cpp Files
Entities with "external linkage" are usable from outside the .cpp file where they are defined. It's the default linkage for functions, variables, and types defined at namespace scope, which makes unintentional exporting a common error in C++.
Export with intent. Avoid defining anything with external linkage unless it's declared in the header. We don't want to have surprising link-time name collisions or other multi-definition problems as the codebase evolves. When code has no more callers, it can be readily identified as dead code if it has internal linkage.
Use either unnamed namespaces or static to make definitions with "internal
linkage". These are private to the .cpp file in which they appear.
(See "Linkage").
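As a sketch of what this looks like inside a .cpp file, assume a hypothetical component whose header declares only normalizePercent. The helper stays in an unnamed namespace so that it cannot collide with anything at link time.

```cpp
namespace percent_demo {  // Hypothetical namespace, for illustration only.
namespace {

// Internal linkage: private to this .cpp file.
int clampToRange(int value, int lo, int hi) {
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}

}  // namespace

// External linkage: this is the function the header would declare.
int normalizePercent(int value) {
    return clampToRange(value, 0, 100);
}

}  // namespace percent_demo
```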
API Conventions
Integer Ranks
We don't typically use the long or long long integer ranks, except in the
BSON API or when interfacing with third_party or system APIs. In particular, we
should never use plain long directly unless required by some outside API since
it is 32 bits on some of our supported platforms. We use int, size_t, and
the explicit width typedefs int32_t, uint32_t, int64_t, uint64_t, etc.
Prefer size_t for string/array/container/sequence sizes and indexes, since
that's what C++ does.
const
- Our code uses "west const" (const X x;) rather than "east const" (X const x;).
- const is not required on local variables.
- Making data members of a movable class const can lead to problems with move and assign operations, and is usually not necessary. On the other hand, it can be useful for types that are never moved or copied. In particular, for types that are accessed concurrently it is useful to mark members that are not modified after construction as const because they cannot participate in data races.
- Don't use volatile qualifications. It's an oft-misunderstood feature and only appropriate in very precise technical scenarios.
Strings
- We do not use std::string_view. Use StringData from base/string_data.h instead. For interoperability with functions that accept or return std::string_view (e.g. std::string), use the pair of conversion functions toStdStringViewForInterop and toStringDataForInterop.
- Working with char* strings can be notoriously error-prone. Convert such data to StringData or std::string for safety, or use utilities in util/str.h for this sort of thing.
Performing String Formatting
There are at least two kinds of generic string formatting available. We have
stream-oriented formatting with StringBuilder and its wrapper str::stream()
(using a stripped-down std::ostream-like API), and newer libfmt formatting
(using Python-like syntax). We do not use std::format. sprintf-style
formatting is very rarely used.
#include <fmt/format.h>
takesString(fmt::format("x={}, y={}\n", xValue, yValue));
#include "mongo/util/str.h"
takesString(str::stream() << "x=" << xValue << ", y=" << yValue << "\n");
Output Parameters
Use pointers or mutable references as "in/out" or "output" parameters,
but prefer returning values to using pure output parameters.
Mutable references used to be banned, but this is no longer the case, and
they are now encouraged for many cases, especially if the callee will not
require the reference to be valid after returning. That said, some types,
such as OperationContext are conventionally passed by pointer.
It is best to stick to established conventions for such types to avoid
needing a lot of additional &opCtx and *opCtx noise at call sites
between functions using different conventions.
void appendData(const std::string& tag, std::vector<MyType>& out) {
    out.push_back(_getData(tag));
}
Namespaces
- Namespace names use snake_case. No uppercase letters, and words separated by underscores.
- Contents of namespace scopes are not indented.
- Close namespaces with a comment. clang-format automatically adds these comments.

      namespace foo {
      int fooVar;
      namespace bar {
      int barVar;
      } // namespace bar
      } // namespace foo
- Do not use "using directives" (i.e. using namespace foo;) for arbitrary namespaces as a naming shortcut. Some namespaces are designed to be used this way in restricted contexts, but still never at namespace-scope in header files. These carefully curated namespaces contain only a few definitions. Examples of these limited exceptional namespaces would include:
  - The std::literals, fmt::literals, and similar namespaces that hold user-defined literal operators. Using directives are necessary for importing user-defined literals.
  - The std::placeholders namespace containing _1, _2, for use with the std::bind API (which we have banned anyway).

  As an alternative, a namespace alias may help to declutter local scopes.

      namespace bc = timeseries::bucket_catalog;
      namespace bfs = boost::filesystem;
- No unnamed namespaces in headers at all. They can produce subtle correctness risks, particularly in the form of ODR (One Definition Rule) violations.
- In .cpp files, use unnamed namespaces to strip definitions of their external linkage. Headers should generally only be declaring entities with external linkage.
- Most server code should be in the mongo namespace, and we have several sub-namespaces nested within that, often used to help organize code by team, by project, or by large feature.
- Defining a new nested namespace as an API point is cheap, but can be a little fiddly for users if we have too many of them, so they should be substantial and relatively coarse-grained (a handful per team).
- Use a component-unique namespace, e.g. future_details or duration_detail, to give names to pseudo-"private" details in headers. It's important to include the component name here. Using mongo::detail or mongo::internal doesn't mitigate the problem of name collisions between components.
- As a matter of namespace etiquette and modularity, avoid using anything in a component's detail or internal-suffixed namespaces from outside the component. If you need to use such a private name, that should ideally involve a conversation with the code owners about promoting it out of the detail namespace.
- Combine immediately-nested namespace blocks where possible:
namespace mongo::foo::bar {
int barVar;
} // namespace mongo::foo::bar
Control flow
- Place exceptional path first.
- Return early.
- Avoid else after a returning if statement.
Status ifElseSpaghetti() {
    Status err;
    if (err = doStuff1(); err.isOK()) {
        if (err = doStuff2(); err.isOK()) {
            if (err = doStuff3(); err.isOK()) {
                if (err = doStuff4(); err.isOK()) {
                    // Expected path obscure and indented
                } else {
                }
            } else {
            }
        } else {
        }
    } else {
    }
    return err;
}

Status withEarlyReturns() {
    if (auto err = doStuff1(); !err.isOK())
        return err;
    if (auto err = doStuff2(); !err.isOK())
        return err;
    if (auto err = doStuff3(); !err.isOK())
        return err;
    if (auto err = doStuff4(); !err.isOK())
        return err;
    // Expected path obvious and prominent.
    return Status::OK();
}
Range-Based for Loops
Range-based for loops can have subtle issues.
The usual practice is to use a forwarding reference (auto&&) as the item variable. Applying this
pattern as a default practice prevents subtle copies and conversions of the range elements.
for (auto&& item : someRange)
For ranges that have pair or tuple elements, particularly maps, it's common to use structured bindings to give names to the parts of the item:
for (auto&& [key, value]: someMap)
It's worth a note of caution about the dangers of the range expression in a range-based for loop, as this is a common and subtle source of bugs.
The range expression is bound to an implicit range variable, and its lifetime will be extended if it's a temporary, as usual with C++ initializers.
But other temporaries created in the initializer expression will die after the initializer. They are not extended to the lifetime of the for loop.
// ok: temporary is bound to implicit range variable.
for (auto&& item: makeVector())
// BUG: the result of obj() is destroyed.
for (auto&& item: obj().view())
The rules here change in C++23, such that all temporaries in the range initializer are extended. The fix is theoretically a breaking change for some code, but the risk tradeoff overwhelmingly favored making this change anyway.
[!WARNING] The compilers we are using have not all implemented this feature yet, even on the v5 toolchain. So we still need to be extremely careful with range expressions that rely on intermediate temporaries.
It would be helpful to read the CppReference on this topic. Some good bug examples are listed in the single-page ISO C++ proposal to fix the problem.
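Until the C++23 behavior can be relied on, one conservative way to avoid the problem is to give the intermediate temporary a name before the loop. The sketch below assumes a hypothetical Holder type whose view() returns a reference into the object.

```cpp
#include <vector>

// Hypothetical type whose view() refers to storage owned by the object.
class Holder {
public:
    const std::vector<int>& view() const {
        return _items;
    }

private:
    std::vector<int> _items{1, 2, 3};
};

inline Holder makeHolder() {
    return Holder{};
}

inline int sumItems() {
    int sum = 0;
    // Writing `for (auto&& item : makeHolder().view())` would be the bug
    // described above: the Holder temporary dies before the loop body runs.
    auto holder = makeHolder();  // Named, so it outlives the loop.
    for (auto&& item : holder.view())
        sum += item;
    return sum;
}
```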
Assertions
This is a large topic. See the Exception Architecture architecture guide.
Logging and Output
We use a custom logging system, documented in the
Logging
architecture guide. Direct output to stdout or stderr streams is only done
by special server code.
Numeric Constants
Large, round numeric constants should be written in a user-friendly way.
- If a number is derived from a simple numeric expression, expressing it as an expression can help a reader verify and maintain it. For example, prefer 50 * 1024 * 1024 to 52'428'800.
- Use digit separators (') for large numeric constants. Use 3-digit groups for decimal. Conventionally, use 4-digit or 8-digit groups for hexadecimal or binary.
- Use a bit-shifted form for power-of-two exponentiation, e.g. 1 << 13 to express 2^13. Make sure the "1" is wide enough for the shift if it's large (e.g. uint64_t{1} << 52). A * 1024 sequence is also acceptable, as it's a recognizable idiom for KiB and MiB expressions.
- Do not assume suffixes like ULL will produce specifically typed quantities like uint64_t. Use a numeric literal and the compiler will give it a wide-enough type. Where the exact type matters, use an explicitly typed expression.
const int tenMillion = 10'000'000;
const int miBiByte = 1 << 20;
const uint64_t exBiByte = 1ull << 60;  // Arithmetic expressions may need a particular type.
const uint32_t crc32Polynomial = 0xEDB8'8320;
const uint32_t asciiMask = 0b0111'1111;
arrayBuilder.append(uint64_t{1234});  // Force argument type.
Casting
- Do not use C-style cast syntax (parentheses around the preceding type) ever. See this CGL rule and this Google rule for discussion.
- Use static_cast as needed. Use const_cast when necessary.
- Be aware that dynamic_cast, unlike other casts, is done at runtime. You should always check whether dynamic_cast<T*> returned a null pointer.
- reinterpret_cast should be used sparingly. It is typically done for low-level layout conversions and accessing objects in ways that may break the protections of the type system and exhibit undefined behavior if misapplied.
- When down-casting from a base type where the program logic guarantees that the runtime type is correct, consider using checked_cast from mongo/base/checked_cast.h. It is equivalent to static_cast in release builds, but adds an invariant to debug builds that ensures the cast is valid.
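A brief sketch of the first few rules, using only standard C++ and a hypothetical Animal hierarchy. It does not attempt to show checked_cast, whose interface lives in the header named above.

```cpp
// Hypothetical polymorphic hierarchy, for illustration only.
class Animal {
public:
    virtual ~Animal() = default;
};

class Dog : public Animal {
public:
    int tricks() const {
        return 3;
    }
};

class Cat : public Animal {};

// Returns the number of tricks, or -1 if the animal is not a Dog.
inline int tricksOf(const Animal* animal) {
    // dynamic_cast happens at runtime and yields nullptr on a type mismatch,
    // so the result is checked before use.
    if (auto dog = dynamic_cast<const Dog*>(animal))
        return dog->tricks();
    return -1;
}

inline int truncateToInt(double value) {
    return static_cast<int>(value);  // Not the C-style (int)value.
}
```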
RAII and Smart Pointers
- Embrace RAII (Resource Acquisition Is Initialization). This means that resources should generally be managed by objects that automatically release them when going out of scope.
- By default, the assumption in our codebase is that raw pointers are views/borrows and never owning. Document exceptions to that rule, and try to avoid having owning raw pointers as part of your API.
- Make heavy use of smart pointers such as std::unique_ptr and std::shared_ptr. For some types we use boost::intrusive_ptr instead.
- Generally, bare calls to new/delete and malloc/free outside of the implementation of an RAII type should be red flags and draw extra scrutiny in review. Prefer factory functions like std::make_unique and std::make_shared.
- Use ScopeGuard or ON_BLOCK_EXIT to protect other resources that must be released (e.g. fopen/fclose pairs), or perform some other action when leaving scope. It is often a good idea to put "undo X" logic right after the "do X" logic rather than at the bottom of the function to ensure that the logic stays correct if someone adds an early return or throws. Or, write an object to do this for you via its constructor and destructor.
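The last point, writing an object that performs the cleanup in its destructor, can be sketched with a few lines of standard C++. SimpleScopeGuard is a hypothetical stand-in written for this example; it is not the server's ScopeGuard and makes no claim about that type's interface.

```cpp
#include <utility>

// Hypothetical minimal guard: runs a callable when it goes out of scope.
template <typename F>
class SimpleScopeGuard {
public:
    explicit SimpleScopeGuard(F onExit) : _onExit(std::move(onExit)) {}

    SimpleScopeGuard(const SimpleScopeGuard&) = delete;
    SimpleScopeGuard& operator=(const SimpleScopeGuard&) = delete;

    ~SimpleScopeGuard() {
        _onExit();
    }

private:
    F _onExit;
};

// The "undo" logic sits right after the "do" logic, so the early return
// below still triggers the cleanup.
inline int runWithCleanup(bool failEarly, int& cleanupCount) {
    SimpleScopeGuard guard([&] { ++cleanupCount; });
    if (failEarly)
        return -1;
    return 0;
}
```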
The WithLock Convention
It is common practice in our codebase for a larger "business logic" class to
have an obvious primary mutex member. These tend to have some private functions
that require that this mutex be held. These functions often take a
WithLock as the first parameter to document the contract and provide some
checking of the callers. The parameter should usually be unnamed. This is a
technical check that forces callers to present a lock-holding resource handle
(e.g. unique_lock) to call the function. See
with_lock.h.
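The general shape of the idea can be sketched with the standard library alone. LockToken below is a hypothetical stand-in invented for this example; the real WithLock is defined in with_lock.h and its interface may differ.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical token that can only be produced from a lock-holding handle.
class LockToken {
public:
    // Implicit on purpose, so callers can pass their lock handle directly.
    explicit(false) LockToken(const std::lock_guard<std::mutex>&) {}

    explicit(false) LockToken(const std::unique_lock<std::mutex>& lock) {
        assert(lock.owns_lock());
    }
};

class Counter {
public:
    int increment() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _incrementLocked(lk);
    }

private:
    // The unnamed token parameter documents that _mutex must be held, and
    // forces callers to present a lock to make the call.
    int _incrementLocked(LockToken) {
        return ++_count;
    }

    std::mutex _mutex;
    int _count = 0;
};
```

A call such as `_incrementLocked()` with no lock in hand does not compile, which is the point of the convention.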
Files (Physical Design)
Components
A component is a grouping of classes, entities, and functions that is built as a single packaged unit. There are 1 or more components in a library. A component should represent a grouping of functionality and interrelated classes and functions that work together.
A component normally consists of a .h, a .cpp, and a _test.cpp file.
Source filenames use lowercase words separated by underscores (i.e. snake_case).
In uncommon cases, there are other files in the component for technical or
internal organizational reasons. These might be a foo_internal.h auxiliary
header, or a foo_test_part4.cpp test fragment, but these extra files are not
meant to serve as its main interface or present its main idea. They're helper
details and they should have the component name as a prefix of their file names.
A component will commonly be dominated by a single class, and for discoverability, it should therefore use that class name, in snake_case, as its filename. That said, we have no rule limiting the number of declarations in a file, and it is useful to define related classes together in a single component.
Using #include
- To make a declaration available, we require inclusion of a header file that provides it. There should not be any implicit reliance on transitive includes, even if the code compiles. As an exception to this general rule, foo.cpp and foo_test.cpp do not need to duplicate the includes from foo.h.
- Do not make forward declarations to avoid an inclusion. It may be tempting to do this as an optimization, but we don't do it, as there are correctness and modularity risks.
- Do not include headers that are not needed. Do not blindly copy large blocks of include statements.
- An "umbrella" interface header may provide several related transitive includes, but these umbrella headers should be documented as such, and they should be provided by the library maintainer. Use IWYU (include what you use) pragma comments to prevent tools and editors from incorrectly auto-suggesting the private headers.

  In the public header (e.g. unittest/unittest.h):

      #include "mongo/unittest/assert.h"  // IWYU pragma: export

  In the private headers (e.g. unittest/assert.h):

      // IWYU pragma: private, include "mongo/unittest/unittest.h"
      // IWYU pragma: friend "mongo/unittest/.*"
- A header should also be "self-contained", and include everything it needs. It must not rely on other headers having been included above it by its users.
- Use "double quotes" to include headers under mongo/, and <angle brackets> for headers under third_party/, or for system libraries.
- Always use the forward relative path from mongo/src/. "Forward" means to not refer to the parent directory ../.
- Don't use third_party/ as part of include paths. Use <> and omit it.

      #include <boost/optional.hpp>              // Yes
      #include "third_party/boost/optional.hpp"  // No: omit "third_party/" and use <>
      #include "boost/optional.hpp"              // No: use <>
      #include "mongo/db/namespace_details.h"    // Yes
      #include "../db/namespace_details.h"       // No: ".." is disallowed
Ordering and Grouping of C++ #include Directives
We have a standard order for the include directives at the top of a C++ file. It is automatically applied by our configuration of clang-format. The purpose of this ordering is to keep the list organized to aid in visual scanning, and to catch headers that are missing includes.
The include directives are organized into several blocks. Within each block, the include directives are sorted alphabetically. Follow each block with a blank line.
- Main header

  For the .cpp and _test.cpp files of a component, include the component's .h file, if applicable, as the first include. This is a safety practice that helps us ensure that a .h file doesn't rely on any preceding inclusions.
- First-party headers

  All include directives using "" and starting with mongo/. E.g. "mongo/db/db.h".
- C++ stdlib headers

  Include directives using <>, with no / or . in path. E.g. <vector>, <cmath>.
- Unnamespaced headers

  Include directives using <>, with no / in path. Typically these are system C headers ending in .h. E.g. <unistd.h>.
- Remaining third-party headers

  Include directives using <>, with / in path. E.g. <boost/optional/optional.hpp>, <sys/types.h>.
To summarize, a typical .cpp file "classy.cpp" might have up to 5 sorted blocks of include directives:
/** (Copyright notice would appear at the top, then...) */

#include "mongo/db/classy.h"

#include "mongo/db/db.h"
#include "mongo/db/namespace_details.h"
#include "mongo/util/concurrency/qlock.h"

#include <cstdio>
#include <string>

#include <unistd.h>

#include <boost/thread/thread.hpp>
Any headers that are conditionally included under the control of #if
directives (if technically possible) will appear after these blocks.
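The block rules reduce to a few character tests on the include path. The sketch below is illustrative only and is not how clang-format is configured: `IncludeBlock` and `classifyInclude` are hypothetical names, and the main header block is left out because it depends on the name of the including file.

```cpp
#include <string_view>

// Hypothetical enum: enumerators are declared in the order the blocks appear.
enum class IncludeBlock {
    kFirstParty,    // "mongo/db/db.h"
    kStdlib,        // <vector>
    kUnnamespaced,  // <unistd.h>
    kThirdParty,    // <boost/optional/optional.hpp>, <sys/types.h>
};

// Hypothetical classifier. `path` is the text between the delimiters and
// `quoted` is true for #include "..." and false for #include <...>.
IncludeBlock classifyInclude(std::string_view path, bool quoted) {
    if (quoted)
        return IncludeBlock::kFirstParty;
    if (path.find('/') != std::string_view::npos)
        return IncludeBlock::kThirdParty;
    if (path.find('.') != std::string_view::npos)
        return IncludeBlock::kUnnamespaced;
    return IncludeBlock::kStdlib;
}
```

Sorting directives first by `classifyInclude` and then alphabetically by path would reproduce the block order shown in the example, apart from the main header.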
Clang-format will not reorder includes across anything other than a blank line or other includes. In the rare case where some header must be included before or after all other headers, you can use a comment line to separate it from other includes like:
#include <last/normal/header.h>
// This header must be after all others:
#include <a/weird/header.h>
If you see a comment line in old code that is unintentionally preventing proper header ordering, you are encouraged to clean that up when adding or removing includes.
For js Files (JavaScript only)
- Disable formatting for template literals
// clang-format off
newCode = `load("${overridesFile}"); (${jsCode})();`;
// clang-format on
Copyright Notices
- All new C++ files added to the MongoDB code base that will be upstreamed for public consumption (such as anything upstreamed to mongodb/mongo) should use the following copyright notice and SSPL license language, substituting the current year for YYYY as appropriate:
/**
* Copyright (C) YYYY-present MongoDB, Inc.
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the Server Side Public License, version 1,
* as published by MongoDB, Inc.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* Server Side Public License for more details.
*
* You should have received a copy of the Server Side Public License
* along with this program. If not, see
* <http://www.mongodb.com/licensing/server-side-public-license>.
*
* As a special exception, the copyright holders give permission to link the
* code of portions of this program with the OpenSSL library under certain
* conditions as described in each individual source file and distribute
* linked combinations including the program with the OpenSSL library. You
* must comply with the Server Side Public License in all respects for
* all of the code used other than as permitted herein. If you modify file(s)
* with this exception, you may extend this exception to your version of the
* file(s), but you are not obligated to do so. If you do not wish to do so,
* delete this exception statement from your version. If you delete this
* exception statement from all source files in the program, then also delete
* it in the license file.
*/
- Enterprise source code is not SSPL, and must bear a shorter copyright notice:
/**
* Copyright (C) YYYY-present MongoDB, Inc.
*/
Basic Formatting Conventions in C++ Code
There are several matters of file formatting expected in source files, and we enforce these when we can. If you use our recommended config for VSCode, much of this will be handled automatically for you.
Whitespace
- Use spaces, no TAB characters.
- 4 spaces per indentation.
- Limit lines to 100 columns.
- Use POSIX text format for source files. All lines (including the final line) end with an LF (ASCII "line feed", aka \n) character. We don't use the Windows CRLF (\r\n) line endings in source files.
  In VS Code, files.eol should be set to "\n", and files.insertFinalNewline set to true to help with this. A Git config option on Windows can convert line endings automatically (core.autocrlf).
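As an illustration of what these whitespace rules mean for the bytes of a file, here is a minimal sketch of a checker. It is not an existing MongoDB tool: `whitespaceProblems` is a hypothetical name, and it counts columns in bytes, which is only an approximation for non-ASCII text.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical checker: returns one message per whitespace rule violation.
std::vector<std::string> whitespaceProblems(std::string_view text) {
    std::vector<std::string> problems;
    // POSIX text format: the final line must also end with LF.
    if (!text.empty() && text.back() != '\n')
        problems.push_back("final line does not end with LF");
    std::size_t lineNo = 1;
    while (!text.empty()) {
        const std::size_t eol = text.find('\n');
        const std::string_view line = text.substr(0, eol);
        const std::string where = "line " + std::to_string(lineNo) + ": ";
        if (line.find('\t') != std::string_view::npos)
            problems.push_back(where + "TAB character");
        if (line.find('\r') != std::string_view::npos)
            problems.push_back(where + "CR character (CRLF line ending?)");
        if (line.size() > 100)
            problems.push_back(where + "longer than 100 columns");
        if (eol == std::string_view::npos)
            break;
        text.remove_prefix(eol + 1);
        ++lineNo;
    }
    return problems;
}
```

A compliant file yields an empty vector. In practice the editor settings and clang-format handle most of these rules, so a check like this is mainly useful for understanding them.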
Braces
Our braces style is that the opening brace appears at the end of the line. We
do not open a new line just for the opening brace that is part of a control flow
structure (if, while, etc).
Braces are optional for sufficiently simple statements.
if (condition)
    doStuff();
if (condition) {
    doStuff();
}
while (condition)
    doStuff();
while (condition) {
    doStuff();
}
do {
    doStuff();
} while (condition);
ESLint (JavaScript only)
All JS files must be linted by ESLint before they are formatted by clang-format.
ESLint is a JS linting tool. Its behavior is controlled by the config file
.eslintrc.yml, located in the root of the mongo repository.
Plugins are available for most editors that will automatically run ESLint on file save. It is recommended to use one of these plugins.
Use the wrapper script buildscripts/eslint.py to check that the JS code is
linted correctly as well as to fix linting errors in the code. This wrapper
selects the appropriate version of eslint to be used.
python buildscripts/eslint.py lint # lint js code
python buildscripts/eslint.py fix # auto-fix js code
Clang-Format
All code changes must be formatted by
clang-format before they are
checked in. Use bazel run format to reformat C++ and JS code.
Clang-format is a C/C++ & JS code formatting tool that uses the config files
located at src/mongo/.clang-format and jstests/.clang-format to control the
format of the code. The version and configuration of clang-format are selected by
bazel run format.
Plugins are available for most editors that will automatically run clang-format on file save.
Clang-format is essential, but we should not let it create unreadable code. There are some ways to keep it from producing a mess:
- It will not join a line that ends in a (potentially empty) // comment.
- It also recognizes comma-terminated lists as significant hints.
- As a last resort, it honors clang-format off and clang-format on in comments. This should only be used where it is really important, since it may result in indentation drift with the surrounding code as we upgrade clang-format or change settings.
void clangFormatExamples() {
    // Trailing comma prevents joining braces with data.
    std::array arr{
        123, 234, 456, 567, 678,
    };
    std::vector<std::vector<int>> vvi{
        {
            123,
            345,
        },
        {
            456,
        },
    };
    // Just one leading EOL comment '//' prevents joining all lines.
    b  //
        .append(x, 123)
        .append(y, 234)
        .append(z, 345);
}
// Example tabular data that would be harmed by reformatting.
// clang-format off
#define EXPAND_TABLE(X) \
/* (id, val , shortName , logName , parent) */ \
X(kDefault, = 0 , "default" , "-" , kNumLogComponents) \
X(kAccessControl, , "accessControl", "ACCESS" , kDefault) \
X(kAssert, , "assert" , "ASSERT" , kDefault) \
X(kCommand, , "command" , "COMMAND" , kDefault) \
X(kControl, , "control" , "CONTROL" , kDefault) \
X(kExecutor, , "executor" , "EXECUTOR", kDefault) \
X(kGeo, , "geo" , "GEO" , kDefault)
// clang-format on
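The guide does not show how a table like EXPAND_TABLE is consumed. Tables of this shape are X-macros: each expansion site defines X to pick out the columns it needs. The following is a reduced sketch under that assumption. `EXPAND_COLOR_TABLE`, `Color`, and `shortName` are hypothetical names and not part of the server codebase.

```cpp
#include <string_view>

// Hypothetical table, kept aligned by hand and shielded from clang-format.
// clang-format off
#define EXPAND_COLOR_TABLE(X)            \
    /* (id    , shortName, rgb     ) */  \
    X(kRed  , "red"    , 0xFF0000)       \
    X(kGreen, "green"  , 0x00FF00)       \
    X(kBlue , "blue"   , 0x0000FF)
// clang-format on

// First expansion: one enumerator per table row.
enum class Color {
#define X(id, name, rgb) id,
    EXPAND_COLOR_TABLE(X)
#undef X
};

// Second expansion: map each enumerator to the name in its row.
constexpr std::string_view shortName(Color c) {
    switch (c) {
#define X(id, name, rgb) \
    case Color::id:      \
        return name;
        EXPAND_COLOR_TABLE(X)
#undef X
    }
    return "unknown";
}
```

Because every row feeds several expansions, the column alignment is what keeps the table reviewable, which is the reason for suppressing clang-format around it.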
Additional Learning Resources
- Learn C++, free C++ tutorial.
- CppCon "Back to Basics" track playlist.
- "A Tour of C++", Stroustrup. ISBN: 9780133549003
- "Large-Scale C++: Process and Architecture, Volume 1", Lakos. ISBN: 9780133927665
- All of Herb Sutter's "Exceptional" series of books.
- All of Alexandrescu's books.
- All of Scott Meyers's "Effective" books (getting very old but still great).
References
- MongoDB C++ Style Guide Proposals: Roadmap and suggestion box for this document.
- Server Code Style: On the mongo GitHub wiki, to be replaced by this document.
- Google C++ Style Guide: We used to default to this for all things not explicitly covered by our own guide, but that is no longer the case.
- C++ Core Guidelines: Interesting reading. Diverges significantly at times from our style.
- cppreference.com: The best C++ reference site.
- Compiler Explorer: Great for demonstrating C++ ideas on multiple compilers.
- VSCode workspace file: A default configuration for server engineers who use VSCode. It's configured to handle editor configuration and formatting issues in accordance with this guide.