IDL (Interface Definition Language)
Interface Definition Language (IDL) is a custom Domain Specific Language (DSL) originally designed to generate code to meet MongoDB's needs for handling BSON. Support for server parameters and configuration options was added later. It uses YAML 1.1 to define a custom IDL and generates C++ code. The IDL compiler is primarily written in Python 3 (originally Python 2) in the [buildscripts/idl/] directory. C++ support code for the generated code lives in the [src/mongo/idl] directory, and the config option parsing support code is in [src/mongo/util/options_parser]. IDL is inspired by other interface definition languages such as XDR, ASN.1, MIDL, and Google's Protocol Buffers.
Key Features
- Generate C++ classes that represent BSON documents of a specific schema, and parse/serialize
  between BSON documents and the class (aka struct in IDL) using BSONObj and BSONObjBuilder
- Generate C++ classes that represent MongoDB BSON commands of a specific schema, and
  parse/serialize between BSON documents
  - Commands are a subset of struct but understand the unique requirements of commands. Also, can
    parse OpMsg's document sequences.
- Parse and serialize Enums as strings or integers
- Declare, parse and serialize server parameters (aka setParameters)
- Declare, parse and serialize configuration options
- C++ Exception based design. IDL throws C++ exceptions on errors, does not use
  Status/StatusWith
Overview
IDL is like a giant script that prints C++ code. For each invocation of the IDL compiler idlc.py, there are two files generated, a header and a source file. For instance,
python buildscripts/idl/idlc.py src/mongo/idl/unittest.idl
generates two files when invoked:
build/opt/mongo/idl/unittest_gen.h
build/opt/mongo/idl/unittest_gen.cpp
The generated files always have a suffix of either _gen.h or _gen.cpp. These files are human
readable and the output tries to match MongoDB's C++ style.
Important: At the top of each file is a warning about modifying the generated file by hand. Any modifications to the files are lost when the build is rerun since the build regenerates the files. The command used to regenerate the file is also listed at the top, in case you want to generate a file without running the build system.
Getting started
IDL is widespread across the code base. Existing IDL files are good examples of how to use IDL. A
good reference is src/mongo/idl/unittest.idl which tests all IDL features.
IDL automates the tedious work of writing BSON parsers. Before IDL, a developer would need to write code to read the document and then add test cases to validate that the parser worked as designed. IDL eliminates the need for hand-written parsers and the test burden they incur.
Example Document:
{
"intField": 42,
"stringField": "question"
}
To represent this in IDL, write the following file:
src\mongo\example\example.idl:
global:
    cpp_namespace: "mongo"
imports:
    - "mongo/db/basic_types.idl"
structs:
    example:
        description: An example struct
        fields:
            intField:
                type: int
            stringField:
                type: string
The next step is to actually generate code from the YAML description. To do that, add the following
to a BUILD.bazel file:
src\mongo\example\BUILD.bazel:
mongo_idl_library(
name='example',
src=[
'example.idl',
],
deps=[
'//src/mongo/idl:idl_parser',
],
)
Bazel knows how to invoke the IDL compiler and generate files in the build directory with the C++
code. The generated code can also be produced on its own by passing --build_tag_filters=gen_source
to Bazel, which is useful for code navigation.
The generated IDL code looks something like the simplified code below.
build\<variant_director>\mongo\example\example_gen.h:
/**
 * An example struct
 */
class Example {
public:
    Example(std::int32_t intField, std::string stringField,
            boost::optional<SerializationContext> serializationContext = boost::none);
    void serialize(BSONObjBuilder* builder) const;
    BSONObj toBSON() const;
    static Example parse(const IDLParserContext& ctxt, const BSONObj& bsonObject);
    std::int32_t getIntField() const;
    void setIntField(std::int32_t value);
    StringData getStringField() const;
    void setStringField(StringData value);
private:
    std::int32_t _intField;
    std::string _stringField;
};
IDL generates five sets of key methods:
- constructor - a C++ constructor with only the required fields as arguments
- parse - a static function that parses a BSON document to the C++ class
- serialize/toBSON - a method that serializes the C++ class to BSON
- get* - methods to get the value of a field after parsing
- set* - methods to set a field in the C++ class before serialization
To use this class in a C++ file, write the following code:
src\mongo\example\example.cpp:
#include "mongo/example/example_gen.h"
bool is42(const BSONObj& doc) {
    // Parse the raw BSON into the generated class, then read the typed field from it.
    Example example = Example::parse(IDLParserContext("root"), doc);
    return example.getIntField() == 42;
}
If there are any problems parsing, the generated parser throws an exception. More details on the various features of IDL are described in the sections below.
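Because parse failures surface as exceptions, call sites that want to tolerate bad input need a try/catch. The following is a self-contained sketch of that pattern, not the generated code: ToyDoc and ToyExample are hypothetical stand-ins for BSONObj and the generated Example class, and std::runtime_error is assumed in place of whatever exception type the real parser throws.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical stand-in for a BSON document: field name -> int32 or string.
using ToyValue = std::variant<std::int32_t, std::string>;
using ToyDoc = std::map<std::string, ToyValue>;

// Hypothetical stand-in for the generated Example class.
class ToyExample {
public:
    // Throws if a required field is missing or has the wrong type.
    static ToyExample parse(const ToyDoc& doc) {
        ToyExample out;
        out._intField = require<std::int32_t>(doc, "intField");
        out._stringField = require<std::string>(doc, "stringField");
        return out;
    }

    std::int32_t getIntField() const { return _intField; }
    const std::string& getStringField() const { return _stringField; }

private:
    // Looks up a field and checks its type before returning a copy of the value.
    template <typename T>
    static T require(const ToyDoc& doc, const std::string& name) {
        auto it = doc.find(name);
        if (it == doc.end())
            throw std::runtime_error("missing required field: " + name);
        if (!std::holds_alternative<T>(it->second))
            throw std::runtime_error("wrong type for field: " + name);
        return std::get<T>(it->second);
    }

    std::int32_t _intField = 0;
    std::string _stringField;
};

// Returns true only when the document parses and intField equals 42.
bool is42OrFalse(const ToyDoc& doc) {
    try {
        return ToyExample::parse(doc).getIntField() == 42;
    } catch (const std::runtime_error&) {
        return false;  // a malformed document is treated as "not 42"
    }
}
```

Whether swallowing the error like this is appropriate depends on the caller; in many places letting the exception propagate is the better choice.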
Getting Started with Commands
Commands are a subset of structs. All commands are structs but not all structs are commands.
Commands are part of the MongoDB RPC protocol. As such, commands have special rules like the first
field of the command must be its name. IDL supports the unique needs of commands with additional
fields on the commands object.
The special features/requirements of commands:
- First element must match the name of the command, and the parsing rules of this element
  can be customized via the namespace field.
- In OP_MSG, $db must be present or defaults to admin
- Commands may have a struct as a reply
- Commands may be a part of API Version 1
- Any structs marked with is_generic_cmd_list: "arg" that are in imported IDL files will automatically be chained to all commands. The IDL compiler imports generic_argument.idl by default, so any generic argument struct defined in that file will be chained to all commands by default.
- Command replies ignore the generic arguments fields like $clusterTime, ok, etc during parsing. The list of these fields is in generic_argument.idl.
Example Command:
{
"hasEncryptedFields": "testCollection",
"encryptionType": "queryableEncryption",
"comment": "Example command",
"$db": "testDB"
}
which has a reply
{
"answer": "yes",
"ok": 1
}
To represent this in IDL, write the following file:
src\mongo\example\example_command.idl:
global:
    cpp_namespace: "mongo"
imports:
    - "mongo/db/basic_types.idl"
structs:
    hasEncryptedFieldReply:
        is_command_reply: true
        fields:
            answer:
                type: string
commands:
    hasEncryptedFields:
        description: An example command
        namespace: concatenate_with_db
        fields:
            encryptionType:
                type: string
To see how to integrate a command IDL file in Bazel, see the example above for structs.
The IDL file
An IDL file consists of a series of top-level sections (i.e. YAML maps).
- global - Global settings that affect code generation
- imports - List of other IDL files that contain enums, types and structs this file refers to
- enums - List of enums to generate code for
- types - List of types which instruct IDL how to deserialize/serialize primitives
- structs - List of BSON documents to deserialize/serialize to C++ classes
- commands - List of BSON commands used by MongoDB RPC to deserialize/serialize to C++ classes
- server_parameters - See docs/server_parameters.md
- configs - TODO SERVER-79135
- feature_flags - TODO SERVER-79135
Global
- cpp_namespace - string - The C++ namespace for all generated classes and enums to belong to. Must start with mongo.
- cpp_includes - sequence - A list of C++ headers to include in the generated .h file. You should not list generated IDL headers here as includes for them are automatically generated from imports.
- configs - map - A section that defines global settings for configuration options
  - source - sequence - a subset of [yaml, cli, ini]
    - cli - configuration option handled by command line
    - yaml - configuration option handled by yaml config file
    - ini - configuration option handled by deprecated ini file format. Do not use for new flags.
  - section - string - Name of displayed section in --help
  - initializer - map
    - register - string - Name of generated function to add configuration options.
      If not provided, an anonymous MONGO_MODULE_STARTUP_OPTIONS_REGISTER initializer will be declared which will automatically register the config settings named in this file at startup time. This initializer will be named "idl" followed by a string of hex digits. Currently this string is the SHA1 hash of the header's filename, but this should not be used in dependency rules since it may change at a later time.
      If provided, all registration logic will be implemented in a public function of the form Status registerName(optionenvironment::OptionSection* options_ptr). It is up to additional code to decide how and when this registration function is called.
    - store - string - Name of generated function to store configuration options.
      This behaves like register, but uses a MONGO_STARTUP_OPTIONS_STORE initializer in the not-provided case, and declares Status storeName(const optionenvironment::Environment& params) in the provided case.
An example for a typical global section is:
global:
    cpp_namespace: "mongo"
    cpp_includes:
        - "mongo/idl/idl_test_types.h"
mongo is the C++ namespace for the generated code. One header is listed because the IDL types
depend on it in this imaginary example.
Imports
The imports section is a list of other IDL files to include. If your IDL file references enums,
types, or structs defined elsewhere, the imports section must list the IDL file that contains each
definition; otherwise IDL throws an error.
Note: The IDL compiler does not generate code for imported things, it generates code for the file
listed on the command line. For instance, if your IDL file imports a struct named ImportedStruct,
the generated code calls its ImportedStruct::parse function but does not generate the
ImportedStruct::parse definition or declaration.
The imports are transitive. The IDL compiler will recursively import all IDL files imported by
other IDL files. IDL will also implicitly de-duplicate imports and only process each file once. The
de-duplication is similar to how #pragma once works in C++.
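The transitive, process-once behavior described above can be modeled as a depth-first walk with a visited set. This is only an illustrative sketch and not how idlc.py is implemented; ImportGraph, collectImports and resolveImports are hypothetical names, and the graph is an in-memory map rather than files on disk.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical import graph: IDL file name -> files it imports.
using ImportGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first walk that records each file the first time it is seen.
void collectImports(const ImportGraph& graph,
                    const std::string& file,
                    std::set<std::string>& seen,
                    std::vector<std::string>& order) {
    if (!seen.insert(file).second)
        return;  // already processed once, similar in spirit to #pragma once
    order.push_back(file);
    auto it = graph.find(file);
    if (it == graph.end())
        return;  // file has no imports of its own
    for (const auto& imported : it->second)
        collectImports(graph, imported, seen, order);
}

// Returns the root file followed by everything it imports, directly or
// indirectly, with each file listed exactly once.
std::vector<std::string> resolveImports(const ImportGraph& graph, const std::string& root) {
    std::set<std::string> seen;
    std::vector<std::string> order;
    collectImports(graph, root, seen, order);
    return order;
}
```

The visited set is also what keeps the walk from looping forever if two files import each other.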
IDL generates a C++ include for the generated headers of each IDL file in the generated code.
An example for a typical imports section is:
imports:
- "mongo/db/basic_types.idl"
Note: src/mongo/db/basic_types.idl is a foundational file for IDL. This file defines the standard types of IDL. Without this file, IDL does not know how to read and write a string or integer for instance.
Enums
The enums section is a YAML map that allows both integer and string enumerations. Both map to
C++ enums, but differ in whether they parse integers or strings in a BSON document.
String Enums
Used to map a string value to a C++ enum value. In this case, the values of the enums themselves are
not important. Use string enums when strings are persisted, not integers. For string enums, the
values map is a map of enum value names to strings.
StringEnum:
    description: "An example string enum"
    type: string
    values:
        s0: "zero"
        s1: "one"
        s2: "two"
This generates an enum and functions to parse and serialize the enum:
enum class StringEnumEnum : std::int32_t {
    s0,
    s1,
    s2,
};
StringEnumEnum StringEnum_parse(const IDLParserContext& ctxt, StringData value);
StringData StringEnum_serializer(StringEnumEnum value);
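To make the string-to-enum mapping concrete, here is a minimal self-contained sketch of what a parse/serialize pair for this enum could look like. It is not the code IDL emits: parseStringEnum and serializeStringEnum are hypothetical names, std::string_view is used in place of StringData, and std::invalid_argument is assumed in place of the exception the generated parser throws.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>

// Mirrors the enum shown above.
enum class StringEnumEnum : std::int32_t { s0, s1, s2 };

// Maps the persisted string to the enum value; unknown strings are an error.
StringEnumEnum parseStringEnum(std::string_view value) {
    if (value == "zero")
        return StringEnumEnum::s0;
    if (value == "one")
        return StringEnumEnum::s1;
    if (value == "two")
        return StringEnumEnum::s2;
    throw std::invalid_argument("unknown StringEnum value: " + std::string(value));
}

// Maps the enum value back to the string that is written to the document.
std::string_view serializeStringEnum(StringEnumEnum value) {
    switch (value) {
        case StringEnumEnum::s0:
            return "zero";
        case StringEnumEnum::s1:
            return "one";
        case StringEnumEnum::s2:
            return "two";
    }
    throw std::logic_error("unhandled StringEnumEnum value");
}
```

Note that the numeric values of the enumerators never appear in the document, which is why they are unimportant for string enums.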
Integer Enums
Used to map an integer value to a C++ enum value. In this case, unlike string enums, the values of
the enums themselves are important. Use integer enums when integers are persisted. For integer enums,
the values map is a map of enum value names to integers.
IntEnum:
    description: "An example int enum"
    type: int
    values:
        s0: 0
        s1: 2
        s2: 4
This generates an enum and functions to parse and serialize the enum:
enum class IntEnum : std::int32_t {
    kS0 = 0,
    kS1 = 2,
    kS2 = 4,
};
IntEnum IntEnum_parse(const IDLParserContext& ctxt, std::int32_t value);
std::int32_t IntEnum_serializer(IntEnum value);
Reference
Each enum can have the following pieces:
- description - string - A comment to add to the generated C++
- type - string - can be either string or int
- values - map - a map of enum value name -> enum value
Like struct.fields[], enum.values[] may be given as a simple mapping name: value, and indeed
most are, but they may also map names to a dictionary of information:
IntEnum:
    description: "An example int enum"
    type: int
    values:
        s0:
            description: Nothing, nada, zip.
            value: 0
        s1:
            description: 2 to the first power
            value: 2
        s2:
            description: 2 squared!
            value: 4
This is not needed in a lot of cases, but in some places it provides good documentation (which will be surfaced in the generated files as well) for future readers.
There's also a third, very rarely used property for enum values called extra_data. You can see an
example of this in src/mongo/db/auth/action_type.idl where the
ResourcePattern enum correlates itself to permitted ActionType enums allowed in Serverless. This
data gets used in
src/db/auth/authorization_session_impl.cpp.
Types
A type declares all the information that IDL needs to know to read and write a C++ type from/to
BSON. Types are typically string values but can be anything such as documents. They are the main
extensibility point into IDL for C++ code. They allow users to incrementally adopt IDL in their
parsing. This means that not all structs have to be defined in IDL for IDL to be useful. Finally,
types allow users to customize IDL parsing for their own unique needs.
A field in a struct or command can be defined as a type but a field can also be an array, enum,
struct or variant. Declaring a field as something other than a type is preferred over using types
since it allows more type information to be represented in IDL rather than in C++. See type in the
field reference for more information.
Type supports builtin BSON types like int32, int64, and string. These are types built into
BSONElement/BSONObjBuilder. It also supports custom types to give the code full control of
parsing and serialization. Note: IDL has no builtin types. The
src/mongo/db/basic_types.idl file declares all common BSON types and must
be manually imported into every file. This separation makes unit testing easier and allows IDL to be
extendable by separating most type concerns from the python code.
The declaration of a type does not generate any code. The code for a type is generated once it is instantiated in a struct or command.
Type Overview
Kinds of types:
- Types builtin to BSONElement/BSONObjBuilder. The src/mongo/db/basic_types.idl file declares all common BSON types
- Types with custom deserialization/serialization behavior but rely on IDL to validate BSON types. For example, IDL checks if a type is string before calling deserialize.
- Types that want full control of deserialization/serialization. Specify any as the bson_serialization_type.
Basic Types
Here is a basic type definition for the string type.
string:
    bson_serialization_type: string
    description: "A BSON UTF-8 string"
    cpp_type: "std::string"
    deserializer: "mongo::BSONElement::str"
    is_view: false
The five key things to note in this example:
- bson_serialization_type - a list of BSON types the generated code should check a value against before calling the deserializer. In this case, IDL generated code checks if the BSON type is string.
- cpp_type - The C++ type to store the deserialized value as. This is the type of the member variable in the generated C++ class when this type is instantiated in a struct.
- deserializer - a method to call to deserialize the type. Typically this is a function that takes BSONElement as a parameter. The IDL generator has custom rules for BSONElement.
- serializer - omitted in this example because BSONObjBuilder has builtin support for std::string
- is_view - indicates whether the type is a view or not. If the type is a view, then it's possible that objects of the type will not own all of their members. If the type is not a view, then objects of the type are guaranteed to own all of their members. This field is optional and defaults to true. To reduce the size of the C++ representation of structs including this type, you can specify this field as false if the type is not a view type.
Custom Types
Here is a more interesting example for mongo::NamespaceString. A NamespaceString is a BSON string
but has custom serialization rules.
namespacestring:
    bson_serialization_type: string
    description: "A MongoDB NamespaceString"
    cpp_type: "mongo::NamespaceString"
    serializer: ::mongo::NamespaceStringUtil::serialize
    deserializer: ::mongo::NamespaceStringUtil::deserialize
    deserialize_with_tenant: true
    is_view: false
The key thing to note is that this example specifies both a deserializer and a serializer. They are
both prefixed with :: which tells IDL these are global static functions, not members of the C++
type mongo::NamespaceString. This also impacts what is passed to the function. Global static (or
free) serializer functions get the instance as the first arg, while member methods do not (because
they have access to this).
Any Types
any types are the escape hatch of the IDL type system. Use any types when custom types are not
flexible enough. This is often used to deal with pre-IDL fields/structs. IDL any types are
responsible for their own BSON type checking. They are also responsible for serializing the field
name itself in the BSON. IDL provides any type serializers with the field name but any types are
responsible for actually writing it to the BSONObjBuilder.
IDLAnyType:
    bson_serialization_type: any
    description: "Holds a BSONElement of any type."
    cpp_type: "mongo::IDLAnyType"
    serializer: mongo::IDLAnyType::serializeToBSON
    deserializer: mongo::IDLAnyType::parseFromBSON
    is_view: true
Type Reference
- description - string - A comment to add to the generated C++
- bson_serialization_type - string or sequence - a list of BSON types the generated code should check a value against before calling the deserializer. Can also be any. buildscripts/idl/idl/bson.py lists the supported types.
- bindata_subtype - string - if bson_serialization_type is bindata, this is the required bindata subtype. buildscripts/idl/idl/bson.py lists the supported bindata subtypes.
- cpp_type - The C++ type to store the deserialized value as. This is the type of the member variable in the generated C++ class when a struct/command uses this type.
  - std::string - When using std::string, the getters/setters use mongo::StringData instead
  - std::vector<_> - When using std::vector<_>, the getters/setters use mongo::ConstDataRange instead
- deserializer - string - a method name to call to deserialize the type. Typically this is a function that takes BSONElement as a parameter. The IDL generator has custom rules for BSONElement.
  - By default, IDL assumes it is an instance method of cpp_type.
  - If prefixed with ::, assumes the function is a global static function
  - By default, the deserializer's function signature is <function_name>(<cpp_type>).
  - For object types, the deserializer's function signature is <function_name>(const BSONObj& obj)
  - For any types, the deserializer's function signature is <function_name>(BSONElement element).
- serializer - string - a method name to call to serialize the type.
  - By default, IDL assumes it is an instance method of cpp_type.
  - If prefixed with ::, assumes the function is a global static function
  - By default, the serializer's function signature is <type_append> <function_name>(const <cpp_type>&) where type_append is a type BSONObjBuilder understands.
  - For object types, the serializer's function signature is <function_name>(const BSONObj& obj)
  - For any types that are not in an array, the serializer's function signature is <function_name>(StringData fieldName, BSONObjBuilder* builder).
  - For any types that are in an array, the serializer's function signature is <function_name>(BSONArrayBuilder* builder).
- deserialize_with_tenant - bool - if set, adds TenantId as the first parameter to deserializer
- internal_only - bool - undocumented, DO NOT USE
- default - string - default value for a type. A field in a struct inherits this value if a field does not set a default. See struct's default rules for more information.
- is_view - indicates whether the type is a view or not. If the type is a view, then it's possible that objects of the type will not own all of their members. If the type is not a view, then objects of the type are guaranteed to own all of their members.
Structs
Structs are the main IDL feature. They are used to serialize and deserialize a BSON document to C++.
A struct consists of a description, a set of optional flags and a sequence of fields. Commands are a
separate feature of IDL designed to handle the unique needs of commands (such as the first field of
a command being its name, and common fields across commands). See commands below.
The generated C++ parsers for structs are strict by default. This means that they throw an error on
fields that they do not know about. Use strict: false to change this behavior. Mark persisted structs
with strict: false for future backwards compatibility needs.
A struct consists of one or more fields. All fields are required by default. The generated parser
errors if a field is missing from the BSON document. On serialization, if a field has not been set,
the serializer calls invariant.
Fields can optionally be marked as optional and stored as boost::optional. These fields are
optional in the BSON document and the parser does not throw an error if they are missing. Also, they
are not required to be set before serialization. Non-optional fields can also have a default value.
If a field has a default, then it does not need to be present in the BSON document or set in a setter.
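The rules above (strict parsing, required fields, optional fields and defaults) can be illustrated with a small hand-written model. This is a sketch under simplifying assumptions, not the parser IDL generates: IntDoc, ToyStruct and parseToyStruct are hypothetical, every value is an integer, std::optional stands in for boost::optional, and strictness is a runtime flag here rather than a property of the struct definition.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a BSON document whose values are all integers.
using IntDoc = std::map<std::string, std::int64_t>;

struct ToyStruct {
    std::int64_t requiredField = 0;             // must be present in the document
    std::optional<std::int64_t> optionalField;  // may be absent
    std::int64_t defaultedField = 42;           // keeps 42 when absent
};

// Parses doc into a ToyStruct. When strict is true, unknown fields are an
// error; when false, they are skipped.
ToyStruct parseToyStruct(const IntDoc& doc, bool strict = true) {
    ToyStruct out;
    bool sawRequired = false;
    for (const auto& [name, value] : doc) {
        if (name == "requiredField") {
            out.requiredField = value;
            sawRequired = true;
        } else if (name == "optionalField") {
            out.optionalField = value;
        } else if (name == "defaultedField") {
            out.defaultedField = value;
        } else if (strict) {
            throw std::runtime_error("unknown field: " + name);
        }
    }
    if (!sawRequired)
        throw std::runtime_error("missing required field: requiredField");
    return out;
}
```

Parsing a document that carries a field from a newer version fails in strict mode and succeeds otherwise, which is the reason persisted structs are advised to use strict: false.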
exampleStruct:
    description: An example struct
    fields:
        requiredField: int
        optionalField:
            description: Provide it if you want to.
            type: bool
            optional: true
        defaultedField:
            description: >-
                Most callers should rely on 42
                as it is the answer to the question
                of life the universe and everything.
            type: long
            validator:
                gt: 0
                lt: 50
            default: 42
This generates a C++ class with methods to parse and serialize the struct. Note: This code has been simplified from the full code IDL generates.
class ExampleStruct {
public:
    static constexpr auto kRequiredFieldFieldName = "requiredField"_sd;
    static constexpr auto kOptionalFieldFieldName = "optionalField"_sd;
    static constexpr auto kDefaultedFieldFieldName = "defaultedField"_sd;
    ExampleStruct(boost::optional<SerializationContext> serializationContext = boost::none);
    ExampleStruct(std::int32_t requiredField, boost::optional<SerializationContext> serializationContext = boost::none);
    void serialize(BSONObjBuilder* builder) const;
    BSONObj toBSON() const;
    static ExampleStruct parse(const IDLParserContext& ctxt, const BSONObj& bsonObject);
    std::int32_t getRequiredField() const { return _requiredField; }
    void setRequiredField(std::int32_t value) { _requiredField = std::move(value); }
    boost::optional<bool> getOptionalField() const { return _optionalField; }
    void setOptionalField(boost::optional<bool> value) { _optionalField = std::move(value); }
    std::int64_t getDefaultedField() const { return _defaultedField; }
    // Setters for validated fields run the validator before storing the value.
    void setDefaultedField(std::int64_t value) { validateDefaultedField(value); _defaultedField = std::move(value); }
    const mongo::SerializationContext& getSerializationContext() const { return _serializationContext; }
    void setSerializationContext(mongo::SerializationContext value) { _serializationContext = std::move(value); }
protected:
    void parseProtected(const IDLParserContext& ctxt, const BSONObj& bsonObject);
    BSONObj _anchorObj;
private:
    void validateDefaultedField(std::int64_t value);
    std::int32_t _requiredField;
    boost::optional<bool> _optionalField;
    std::int64_t _defaultedField{42};
    mongo::SerializationContext _serializationContext;
};
The IDL serializers take mongo::SerializationContext, which is a class passed to the functions
that serialize mongo::NamespaceString and mongo::DatabaseName. For more details see
src/mongo/util/serialization_context.h.
BSON Lifetime
By default, IDL parsers do not hold a reference to the BSONObj they parse. In the typical case of
parsing a command from the network, this is fine since the network buffer outlives the generated
parser. But in other cases, you may want to anchor the BSONObj to the IDL generated parser. To do
this, call parseOwned instead of parse.
BSONObj behaves as either a view type (like StringData) or an owned type (like std::string). In
the first case, a view type, it is a const char* pointer to a block of memory. It does not control
the lifetime of the memory. In the second case, the owned case, it is a block of memory with the
first 8 bytes being a pair of [uint32, uint32]. The first member is a reference count used by
boost::intrusive_ptr and the second is the length of the BSON document. The rest of the BSON
document is adjacent to the second uint32. In this second case, every copy of the BSONObj
increments the reference count and when the reference count drops to zero, the BSONObj deletes the
memory block.
An unowned BSONObj can be converted to an owned one with the method getOwned(). This performs a
memory copy. This method is a no-op if the BSONObj is already owned.
It can be advantageous to use parseOwned instead of parse since your IDL struct can then use object
for fields instead of object_owned, which creates copies. The parseOwned method only affects the
lifetime of view types like object. IDL deep-copies all other types like string and binary
today.
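The view-versus-owned distinction can be modeled in a few lines of standard C++. This is a conceptual sketch only: ToyObj is a hypothetical class, std::shared_ptr with a separate control block stands in for the intrusive reference count stored in the buffer itself, and the "document" is just a string of bytes.

```cpp
#include <memory>
#include <string>
#include <string_view>

// Hypothetical model of an object that is either a view over someone else's
// bytes or a shared owner of its own copy.
class ToyObj {
public:
    // Unowned: only points at the caller's bytes and does not extend their lifetime.
    explicit ToyObj(std::string_view view) : _view(view) {}

    bool isOwned() const { return _buffer != nullptr; }

    // Returns an owned object. Copies the bytes if this object is a view;
    // otherwise just shares the existing buffer (no byte copy).
    ToyObj getOwned() const {
        if (isOwned())
            return *this;
        ToyObj owned(_view);
        owned._buffer = std::make_shared<const std::string>(_view);
        owned._view = *owned._buffer;  // re-point the view at the owned copy
        return owned;
    }

    std::string_view data() const { return _view; }

private:
    std::shared_ptr<const std::string> _buffer;  // null when unowned
    std::string_view _view;
};
```

An unowned ToyObj becomes dangling as soon as the underlying bytes go away, which is the situation parseOwned is meant to avoid for the real parser.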
Chained Structs (aka struct reuse by composition)
Chained structs are IDL's mechanism for reuse by composition. Chained structs allow re-use of common struct definitions across IDL structs and commands.
For instance, the write commands insert, delete, and update all take
bypassDocumentValidation as an optional field. These write commands share
WriteCommandRequestBase as a chained struct that defines bypassDocumentValidation. By using
chained structs, IDL structs share the definition of the fields. This allows users to write a set of
field definitions once and reuse them across structs.
When IDL generates the classes, the chained structs are available as getters/setters on the generated class. This allows code that works with them to treat the shared IDL struct as a shared C++ class. Code can be written once to work with the shared struct without having to resort to C++ templates. The fields of a chained struct are not stored in the parent class; they remain in the child chained struct. Also, chaining does not affect the code generation of the chained structs themselves, only of the type that declares it wants to include them.
If inline_chained_structs is true, then the members of the chained struct are also available on
the struct including them. This means that instead of having to call
obj.getChainedStruct().getCommonField(), users can call obj.getCommonField(). Field storage
is not affected as this option is only syntactic sugar.
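The composition described above can be sketched with two plain classes. The names ToyWriteBase, ToyInsert and bypassesValidation are hypothetical and the code is hand-written for illustration; it only models the shape of the accessors (a getter for the chained struct, plus forwarding accessors when inlining is enabled), with std::optional in place of boost::optional.

```cpp
#include <optional>
#include <utility>

// Hypothetical shared struct holding the common field.
class ToyWriteBase {
public:
    std::optional<bool> getBypassDocumentValidation() const { return _bypass; }
    void setBypassDocumentValidation(std::optional<bool> value) { _bypass = value; }

private:
    std::optional<bool> _bypass;
};

// Hypothetical command that chains ToyWriteBase.
class ToyInsert {
public:
    // Accessors for the chained struct itself.
    const ToyWriteBase& getWriteBase() const { return _writeBase; }
    void setWriteBase(ToyWriteBase value) { _writeBase = std::move(value); }

    // Forwarding accessors modeling the inlined form: the value still lives
    // in the chained struct, these are only shortcuts.
    std::optional<bool> getBypassDocumentValidation() const {
        return _writeBase.getBypassDocumentValidation();
    }
    void setBypassDocumentValidation(std::optional<bool> value) {
        _writeBase.setBypassDocumentValidation(value);
    }

private:
    ToyWriteBase _writeBase;
};

// Written once against the shared struct; usable with any command that chains it.
bool bypassesValidation(const ToyWriteBase& base) {
    return base.getBypassDocumentValidation().value_or(false);
}
```

A helper such as bypassesValidation takes the shared class directly, so no template is needed to support several commands.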
Struct Reference
- description - string - A comment to add to the generated C++
- fields - sequence - see fields attributes reference below
- strict - bool - defaults to true, a strict parser errors if an unknown field is encountered by the generated parser. Persisted structs should set this to false to allow them to encounter documents from future versions of MongoDB without throwing an error.
- chained_structs - mapping - a list of structs to include in this struct. IDL adds the chained structs as member variables in the generated C++ class. IDL also adds a getter for each chained struct.
- inline_chained_structs - bool - if true, exposes chained struct getters as members of this struct in generated code.
- immutable - bool - if true, does not generate mutable getters for structs
- generate_comparison_operators - bool - if true, generates support for C++ operators: ==, !=, <, >, <=, >=
- non_const_getter - bool - if true, generates mutable getters for non-struct fields
- cpp_validator_func - string - name of a C++ function to call after a BSON document has been deserialized. Function has signature of void <function_name>(<struct_name>* obj). Method is expected to throw a C++ exception (i.e. uassert) if validation fails.
- is_command_reply - bool - if true, marks the struct as a command reply. A struct marked as is_command_reply generates a parser that ignores known generic or common fields across all replies when parsing replies (i.e. ok, errmsg, etc)
- is_generic_cmd_list - string - choice [arg, reply], if set, generates functions bool hasField(StringData) and bool shouldForwardToShards(StringData) for each field in the struct. If set to arg, the struct will automatically be chained to every command.
- query_shape_component - bool - true indicates that special serialization code will be generated to serialize as a query shape
- unsafe_dangerous_disable_extra_field_duplicate_checks - bool - undocumented, DO NOT USE
Struct Fields Attribute Reference
- description - string - A comment to add to the generated C++
- cpp_name - string - Optional name to use for member variable and getters/setters. Defaults to camelCase of field name.
- type - string or mapping - supports a single type, array<type>, or variant.
  - string name of a type must be an enum, type, or struct that is defined in an IDL file or imported
  - string can also be array<type> where type must be an enum, type, struct, or variant. The C++ type will be std::vector<type> in this case
  - Mappings or Variants - IDL supports a variant that chooses among a set of IDL types. You can
    have a variant of strings and structs.
    - Variant string support differentiates the type to choose based on the BSON type.
    - Variant struct support differentiates the type to choose based on the first field of the
      struct. The first field must be unique in each struct across the structs. When parsing a
      BSON object as a variant of multiple structs, the parser assumes that the first field
      declared in the IDL struct is always the first field in its BSON representation.
      See bulkWrite for an example.
- ignore - bool - true means the field generates no code but is ignored by the generated deserializer. Used to deprecate fields that no longer have an effect but allow strict parsers to ignore them.
- optional - bool - true means the field is optional. Generated C++ type is boost::optional<type>.
- default - string - the default value of type. Types with default values are not required to be found in the original document or set before serialization
- supports_doc_sequence - bool - true indicates the field can be found in an OpMsg's document sequence. Must use the generated <struct>::parse(OpMsgRequest) parser to use this
- comparison_order - sequence - comparison order for fields
- validator - see validator reference
- non_const_getter - bool - true indicates it generates a mutable getter
- unstable - bool - deprecated, prefer stability=unstable instead
- stability - string - choice [unstable, stable] - if unstable, parsing the field throws an error if strict api checking is enabled
- always_serialize - bool - whether to always serialize optional fields even if none
- forward_to_shards - bool - used by generic arg code to generate shouldForwardToShards, no effect on BSON deserialization/serialization
- forward_from_shards - bool - used by generic arg code to generate shouldForwardFromShards, no effect on BSON deserialization/serialization
- query_shape - choice of [anonymize, literal, parameter, custom] - see [src/mongo/db/query/query_shape.h]
Field Validator Reference
Validators generate functions that ensure a value is valid when it is parsed or set through a setter. The checks are generated with the corresponding C++ comparison operators.
- gt - string - Validates field is greater than string
- lt - string - Validates field is less than string
- gte - string - Validates field is greater than or equal to string
- lte - string - Validates field is less than or equal to string
- callback - string - A static function to call of the shape Status <function_name>(const <cpp_type> value). For non-simple types, value is passed by const-reference.
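As an illustration of how a gt/lt validator guards a setter, here is a minimal self-contained sketch for a field declared with gt: 0 and lt: 50. It is hand-written, not generated output: ValidatedField is a hypothetical class and std::out_of_range is assumed in place of the exception type the generated code throws.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical holder for a single validated field with a default of 42.
class ValidatedField {
public:
    std::int64_t get() const { return _value; }

    // The new value is validated first; the stored value is left untouched
    // if validation throws.
    void set(std::int64_t value) {
        validate(value);
        _value = value;
    }

private:
    // Models validator {gt: 0, lt: 50}: both bounds are exclusive,
    // so the accepted range is 1 through 49.
    static void validate(std::int64_t value) {
        if (!(value > 0))
            throw std::out_of_range("value must be > 0, got " + std::to_string(value));
        if (!(value < 50))
            throw std::out_of_range("value must be < 50, got " + std::to_string(value));
    }

    std::int64_t _value = 42;
};
```

Using gte or lte instead would make the corresponding bound inclusive.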
Commands
Commands are a customized version of structs designed for MongoDB RPC. All commands are structs but
not all structs are commands. IDL supports the unique needs of commands with additional fields on
the command object when compared to struct.
The special features:
- First element must match the name of the command, and the parsing rules of this element
  can be customized via the namespace field.
- In OP_MSG, $db must be present or defaults to admin
- Commands may have a struct as a reply
- Commands may be a part of API Version 1
- Any structs marked with is_generic_cmd_list: "arg" that are in imported IDL files will automatically be chained to all commands. The IDL compiler imports generic_argument.idl by default, so any generic argument struct defined in that file will be chained to all commands by default.
- Command replies ignore the generic arguments fields like $clusterTime, ok, etc during parsing. The list of these fields is in generic_argument.idl.
The namespace field describes the kind of parameter the command takes as its first element.
- concatenate_with_db - takes a collection name. Generates a method const NamespaceString getNamespace(). Examples: insert, update, delete
- concatenate_with_db_or_uuid - takes a collection name or UUID. Generates a method const NamespaceStringOrUUID& getNamespaceOrUUID(). Examples: find, count
- ignored - ignores the first argument entirely. Examples: hello, setParameter, ping
- type - takes a struct as the first argument. Examples: getLog, clearLog, renameCollection
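The four namespace choices can be pictured with a small Python model. This is a sketch under stated assumptions, not the generated C++ parser: `interpret_command_field` is a hypothetical helper, the "<db>.<collection>" string stands in for NamespaceString, and a (db, uuid) tuple stands in for NamespaceStringOrUUID.

```python
import uuid


def interpret_command_field(namespace, db_name, first_value):
    """Toy model of how each namespace choice treats the first element."""
    if namespace == "concatenate_with_db":
        # Collection name, joined with $db into "<db>.<collection>".
        return f"{db_name}.{first_value}"
    if namespace == "concatenate_with_db_or_uuid":
        # Either a collection name or a UUID identifying the collection.
        if isinstance(first_value, uuid.UUID):
            return (db_name, first_value)
        return f"{db_name}.{first_value}"
    if namespace == "ignored":
        # The value is discarded, e.g. the 1 in {ping: 1}.
        return None
    if namespace == "type":
        # The value is handed to the parser of the declared type unchanged.
        return first_value
    raise ValueError(f"unknown namespace choice: {namespace}")


print(interpret_command_field("concatenate_with_db", "test", "coll"))
```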
Commands can also specify the replies they return. Replies are regular structs with
is_command_reply set to true.
Commands Reference
- description - see structs
- chained_structs - see structs
- fields - see structs
- cpp_name - see structs
- strict - see structs
- generate_comparison_operators - see structs
- inline_chained_structs - see structs
- immutable - see structs
- non_const_getter - see structs
- namespace - string - choice of [concatenate_with_db, concatenate_with_db_or_uuid, ignored, type]. Instructs how the value of the command field should be parsed.
  - concatenate_with_db - Indicates the command field is a string and should be treated as a collection name. Typically used by commands that deal with collections. Automatically concatenated with $db by the IDL parser. Adds a method const NamespaceString getNamespace() to the generated class.
  - concatenate_with_db_or_uuid - Indicates the command field is a string or uuid, and should be treated as a collection name. Typically used by commands that deal with collections. Automatically concatenated with $db by the IDL parser. Adds a method const NamespaceStringOrUUID& getNamespaceOrUUID() to the generated class.
  - ignored - Ignores the value of the command field. Used by commands that ignore their command argument entirely.
  - type - Indicates the command takes a custom type for the first field. The type field must be set.
- type - string - name of the IDL type or struct to parse the command field as
- command_name - string - The IDL generated parser expects the command to be named after the YAML map. This can be overridden with command_name. Commands should be camelCase.
- command_alias - string - allows commands to have multiple names. DO NOT USE. Some older commands have both lowercase and camelCase names.
- reply_type - string - IDL struct that this command replies with. The reply struct must have is_command_reply set.
- api_version - string - Typically set to the empty string "". Only set to a non-empty string if the command is part of the stable API. Generates a class named <command_name>CommandNameCmdVersion1Gen derived from TypedCommand that commands should be derived from.
- is_deprecated - bool - indicates the command is deprecated
- allow_global_collection_name - bool - if true, the command can accept both collection names and non-collection names. Used by the aggregate command.
- access_check - mapping - see access check reference
Access Check Reference
A list of privileges the command checks. Only applicable for commands that are a part of API Version 1. Checked at runtime when test commands are enabled.
- none - bool - No privileges required
- simple - mapping - single check or privilege
- complex - sequence - list of check and/or privilege
Check or Privilege
- check - string - checks a part of the access control system like is_authenticated. See src/mongo/db/auth/access_checks.idl for a complete list.
- privilege - mapping
  - resource_pattern - string - a resource pattern to check for a given set of privileges. See MatchType enum in src/mongo/db/auth/action_type.idl for complete list.
  - action_type - sequence - list of action types the command may check. See ActionType enum in src/mongo/db/auth/action_type.idl for complete list.
  - agg_stage - string - aggregation only. Name of aggregation stage. Used to appease the idl compatibility checker.
IDL Compiler Overview
The IDL compiler is organized as a traditional compiler written in Python 3 (originally Python 2)
and is located in buildscripts/idl. It has 3 passes and two different tree representations that
are handed from one pass to the next. Having multiple passes reduces the
complexity of each pass by separating tasks across different files.
Here is an example of how IDL processes a file example.idl.
sequenceDiagram
title: IDL Flow for example.idl
participant Compiler
participant Parser
participant Binder
participant Generator
Compiler->>Parser: parser.parse("example.idl")
Parser->>Compiler: syntax.IDLSpec
Compiler->>Binder: binder.bind()
Binder->>Compiler: ast.IDLBoundSpec
Compiler->>Generator: generator.generate_code
Generator-->Generator: generator._generate_header
Generator-->Generator: generator._generate_source
Generator->>Compiler: Ok
compiler.py orchestrates the 3 passes by calling each
one in sequence. For instance, it calls the parser and passes the syntax tree it returns to the
binder. It also fixes up the include files for the generated code.
Trees
- Concrete Syntax - a tree representation of the YAML file. Each item in the tree is a 1-1 match for an item in the YAML file. Also stores the symbol table. May contain errors, such as references to missing types.
- AST - (Abstract Syntax Tree) - a simplified version of the syntax tree. The AST does not map 1-1 to the YAML file as some "internal types" and hidden fields are injected into the tree. The AST has no errors.
The two trees (syntax and ast) share just one type, common.SourceLocation, between them. While this
means there is some duplication between the trees, it makes the code more readable. If types were
shared between passes but with some fields only read or written in some passes, it would make reasoning
about the code more difficult.
Passes
- Parser - Parses the YAML file and produces a concrete syntax tree. Does some error checking, such as checking for duplicate symbols.
- Binder - Translates the concrete syntax tree to the AST. Does most error checking. Injects hidden fields and other things into the AST. The AST is clean of errors after binder.py.
- Generator - Writes the header file _gen.h and the source file _gen.cpp. The generator does no error checks (it does have a few asserts though) as error checking is the responsibility of earlier passes.
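The division of labor between the three passes can be sketched in a few lines of Python. This is a toy model, not the code in buildscripts/idl: `SyntaxStruct`, `AstStruct`, `KNOWN_TYPES`, and `compile_idl` are hypothetical names, the input is a list of tuples standing in for a loaded YAML file, and stopping after the first pass that reports errors is an assumption about the orchestration.

```python
from dataclasses import dataclass

# Toy type table: IDL type name -> C++ type. Hypothetical, for illustration only.
KNOWN_TYPES = {"string": "std::string", "int": "std::int32_t", "bool": "bool"}


@dataclass
class SyntaxStruct:
    """Parser output: mirrors the input 1-1, type names are still unresolved."""
    name: str
    fields: list  # list of (field_name, idl_type_name) pairs


@dataclass
class AstStruct:
    """Binder output: every field carries a resolved C++ type."""
    name: str
    fields: list  # list of (field_name, cpp_type) pairs


def parse(raw_structs, errors):
    """Pass 1: build the syntax tree and reject duplicate struct names."""
    seen, tree = set(), []
    for name, fields in raw_structs:
        if name in seen:
            errors.append(f"duplicate struct '{name}'")
            continue
        seen.add(name)
        tree.append(SyntaxStruct(name, list(fields)))
    return tree


def bind(tree, errors):
    """Pass 2: resolve type names; unknown types are reported as errors."""
    bound = []
    for struct in tree:
        resolved = []
        for field_name, type_name in struct.fields:
            if type_name not in KNOWN_TYPES:
                errors.append(f"unknown type '{type_name}' in '{struct.name}'")
                continue
            resolved.append((field_name, KNOWN_TYPES[type_name]))
        bound.append(AstStruct(struct.name, resolved))
    return bound


def generate(ast):
    """Pass 3: emit text only; no validation happens here."""
    lines = []
    for struct in ast:
        lines.append(f"class {struct.name} {{")
        lines.extend(f"    {cpp_type} _{name};" for name, cpp_type in struct.fields)
        lines.append("};")
    return "\n".join(lines)


def compile_idl(raw_structs):
    """Run the passes in order, stopping at the first pass that reports errors."""
    errors = []
    tree = parse(raw_structs, errors)
    if errors:
        return None, errors
    ast = bind(tree, errors)
    if errors:
        return None, errors
    return generate(ast), []


header, errors = compile_idl([("Example", [("count", "int"), ("name", "string")])])
print(header)
```

Because the binder rejects unresolved types, `generate` can assume every field already has a C++ type, which is why the last pass needs no error handling of its own.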
Error Handling and Recovery
The IDL compiler does not throw exceptions. The generated C++ code does throw exceptions though. The
compiler adds all errors to the errors.ParserContext in
errors.py. This allows the compiler to capture more than one
error from the user's IDL file and report them all to the user. All error codes start with ID and are
of the format IDNNNN where N is a digit. The python unit tests assert these error codes in
negative tests by using the string constants ERROR_ID_... defined for each error.
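The collect-then-report approach can be sketched as follows. This is a simplified stand-in, not the real errors.ParserContext: the class `ErrorCollector`, the function `check_fields`, and the two ID9xxx codes are hypothetical and chosen so they are not mistaken for real error codes.

```python
# Hypothetical codes for illustration; the real constants live in errors.py.
ERROR_ID_EXAMPLE_MISSING_TYPE = "ID9001"
ERROR_ID_EXAMPLE_DUPLICATE_FIELD = "ID9002"


class ErrorCollector:
    """Accumulates errors instead of raising on the first one."""

    def __init__(self):
        self.errors = []

    def add_error(self, error_id, line, message):
        self.errors.append((error_id, line, message))

    def has_errors(self):
        return bool(self.errors)


def check_fields(fields, ctxt):
    """Check (line, name, type_name) triples, recording every problem found."""
    seen = set()
    for line, name, type_name in fields:
        if name in seen:
            ctxt.add_error(ERROR_ID_EXAMPLE_DUPLICATE_FIELD, line,
                           f"duplicate field '{name}'")
        seen.add(name)
        if type_name is None:
            ctxt.add_error(ERROR_ID_EXAMPLE_MISSING_TYPE, line,
                           f"field '{name}' has no type")


ctxt = ErrorCollector()
check_fields([(3, "a", "int"), (4, "a", None), (5, "b", None)], ctxt)
for error_id, line, message in ctxt.errors:
    print(f"{error_id} line {line}: {message}")
```

A negative test in this style would assert on the collected codes through the named constants rather than on the message text, so messages can be reworded without breaking tests.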
Testing
IDL has two sets of tests:
- Python unit tests for the parser and binder - see tests
- C++ Unit Tests for the code generator and generated code - see unittest.idl and idl_test.cpp respectively
Extending IDL
Since IDL is a python script, it is quick to iterate on since it does not need to be compiled. When making changes to IDL, it is recommended to call the IDL compiler directly instead of through the build system. If the IDL scripts are changed, this often triggers all the IDL generated files to be regenerated and then recompiled. It can be faster to just invoke the scripts manually and then invoke the compiler by hand also. Every IDL file has the python invocation to generate it printed at the top of the file.
When extending IDL, add tests to the python unit tests and C++ unit tests. With few exceptions, the unit tests exercise all features and combinations IDL can handle.
Implementation Details
BSONObj Anchor
The parsing method a struct is initialized with indicates what type of ownership the constructed
object has on the BSONObj parameter. An internal BSONObj anchor ensures that the lifetime of
the BSONObj matches the lifetime of the object in the cases that the BSONObj parameter is
owned or shared.
View Types
If the struct is a view, then it's possible that objects of the type will not own all of their
members. If the struct is not a view, then objects of the type are guaranteed to own all of their
members. This is determined by recursively checking the fields of a struct. This info is used
during generation to determine whether or not a struct will need a BSONObj anchor.
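The recursive check can be pictured with the Python sketch below. It is an illustration only: `is_view` is a hypothetical helper, structs are reduced to lists of field type names, and which leaf types count as views is passed in by the caller rather than taken from the real binder.

```python
def is_view(type_name, structs, view_leaf_types, _visiting=frozenset()):
    """Return True if `type_name` is, or transitively contains, a view type.

    structs: dict mapping struct name -> list of field type names.
    view_leaf_types: set of non-struct type names assumed not to own their data.
    """
    if type_name in view_leaf_types:
        return True
    if type_name not in structs or type_name in _visiting:
        # Owning leaf type, or a struct already being checked (cycle guard).
        return False
    visiting = _visiting | {type_name}
    return any(
        is_view(field_type, structs, view_leaf_types, visiting)
        for field_type in structs[type_name]
    )


structs = {
    "Inner": ["object"],
    "Outer": ["int", "Inner"],
    "Plain": ["int", "string"],
}
print(is_view("Outer", structs, {"object"}))
print(is_view("Plain", structs, {"object"}))
```

In this model a struct needs an anchor exactly when `is_view` returns True for it, since some nested member may point into the original BSONObj.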
Best Practices
IDL has been in use since 2017. Here are a few best practices that have emerged in that time:
- strict or non-strict parsers - Structs that are persisted to disk should set strict: false. It's better for upgrade/downgrade. Commands should set strict: true or omit it as strict: true is the default.
  1. For persistence: For upgrade/downgrade, if a persisted document with a strict parser has a field added in new version N+1 and then the user downgrades to old version N, the strict parser will throw an exception and reject the document. If this document was part of the storage catalog for instance, the server would fail to start.
  2. For commands: By using strict parsers, it gives the server the ability to add fields without the risk of clients accidentally sending fields with the same name that had been ignored.
- Extending existing structs/commands - all new fields in a struct/command must be marked optional to support backwards compatibility. For new structs/commands, there should be some required fields. It does not matter if the struct is not persisted; non-optional fields break backwards compatibility. If optional fields are not the right fit, a new field can be given a default value instead. Fields with default values are not required fields.
- Lifetime - Use object_owned instead of object. If your IDL uses object, it does not own the BSONObj that is returned from its getter. This means that once the BSONObj that was passed to parse() goes out of scope, the object will point to freed memory. Use object_owned if this is not desired. object_owned incurs extra memory allocations though.
  - An alternative is to use either the parseSharingOwnership or parseOwned methods. These methods will ensure the IDL generated class has an anchor to the BSONObj. See comments in the generated class. It is not advisable though to use these methods during normal command request processing. The network buffer that holds the inbound request is available during the lifetime of the request even though IDL does not anchor the network buffer.
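The upgrade/downgrade argument for strict: false in the first best practice can be made concrete with a small Python model. It is a sketch of the behavior described above, not the generated C++ parser: `parse_document` is a hypothetical function, plain dicts stand in for BSON documents, and ValueError stands in for the C++ exception.

```python
def parse_document(doc, known_fields, strict=True):
    """Keep known fields; reject (strict) or skip (non-strict) unknown ones."""
    parsed = {}
    for name, value in doc.items():
        if name in known_fields:
            parsed[name] = value
        elif strict:
            raise ValueError(f"unknown field '{name}'")
    return parsed


# Version N knows only "name"; version N+1 persisted an extra field.
fields_in_version_n = {"name"}
persisted_by_n_plus_1 = {"name": "coll", "newOption": True}

# After a downgrade, a non-strict parser still loads the document.
print(parse_document(persisted_by_n_plus_1, fields_in_version_n, strict=False))

# A strict parser rejects the same document.
try:
    parse_document(persisted_by_n_plus_1, fields_in_version_n, strict=True)
except ValueError as exc:
    print(f"rejected: {exc}")
```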
Developer Workflow
Adding new functionality to IDL should be accompanied by adding tests to idl_test.cpp, adding
tests to buildscripts/idl/tests, and adding the necessary schema to idl_schema.json.