Under the hood

This article describes the basics of how Megalo code is compiled and stored in a saved gametype. You don't need to know any of this in order to write Megalo scripts; this should be considered advanced knowledge.

Basic structure of a script

Megalo scripts are organized into triggers, which map roughly to blocks of code in a script. The code itself consists of conditions and actions which are executed in sequence when the game engine executes the trigger as a whole. Triggers have additional data allowing them to specify their overall type (e.g. what kind of for-loop, if any, they are, and what Forge label they act on) and what event they handle.

In order to nest a block of code, the game compiles the inner block as its own trigger separate from the outer block; the inner block is marked as a "subroutine" block, which distinguishes it from a top-level trigger so that the game won't blindly execute it when scanning through the trigger list. Then, the outer block has a "call" action compiled in: this action refers to another trigger by index and directs the game to execute that trigger's contents.

ReachVariantTool abuses this implementation of nesting in order to offer user-defined functions: we compile the function's body as an independent trigger that has been marked as an inner trigger, and we use the "call" action to... well, call it. You could think of a user-defined function as an inner block that's nested under multiple outer blocks, existing in multiple places at once.

Storage

Gametype script code uses a struct-of-arrays approach for storage. Rather than physically locating conditions and actions inside of the data for their containing trigger, Halo: Reach instead stores a list of every condition in the script, a list of every action in the script, and a list of every trigger in the script.

Triggers contain a data structure that is, internally, called an "action scope." This data structure contains the indices of the trigger's first condition and first action, as well as the number of conditions and actions inside of the trigger: the trigger defines its contents as slices of the condition and action arrays. This approach on its own would not allow conditions and actions to be interleaved; Bungie accomplished that by having each condition specify the index (within the containing trigger, not the overall list) of the action it precedes.

(This is all part of the overall design for game variant data: nothing inside of a game variant is heap-allocated, and there don't appear to be any internal pointers. All lists are stored in fixed-size memory regions. This extends even to the string table: the max uncompressed size is 0x4C00 bytes, and so there's a region that large within the in-memory game variant struct.)

A potential optimization

It would hypothetically be possible to "overlap" triggers. If trigger A's actions (ignoring any interleaved conditions) are exactly identical to a subset of trigger B's actions (again ignoring any interleaved conditions), then a game variant would really only need to store trigger B's actions, with trigger A using a slice of those. (Sharing conditions would be harder, since conditions store additional data values that help to define the structure of their containing trigger (see next section).)

Unfortunately, implementing this optimization within ReachVariantTool, specifically, would be highly impractical. ReachVariantTool stores triggers in memory without using the "slices of global lists" approach used by the file format itself. In memory, each loaded trigger stores a std::vector<Opcode*>, and when saving, we traverse each trigger to build the "all conditions" and "all actions" arrays and serialize triggers as slices of each. In other words: for the compiler to apply this optimization, it'd have to be carefully designed to predict the future, in a way that feels like a breeding ground for potential bugs.

In order for this optimization to be viable within ReachVariantTool, the program would have to be redesigned to store a Megalo script in memory the same way it's stored in a game variant file: an array of all conditions; an array of all actions; and triggers (and action scopes generally) defining slices of each array. Triggers could at least be given accessors that adapt the data, so it can still be accessed as a flat, sequential, interleaved list of opcodes per-trigger.

Execution

In order to execute a trigger, the game loops over every action inside the trigger. For each action, we loop over all conditions that we haven't looped over before, and that come before the current action. If those conditions, in sum, fail, then we stop trigger execution; otherwise, we execute the current action. Conditions, then, act as early-returns from the current block. There are no "if" blocks in Megalo; rather, there are bare blocks which contain conditions.

The qualifier "in sum" here refers to the fact that conditions can be linked with "and" relationships or, alternatively, with "or" relationships. Conditions each have an "or group" value, an arbitrary integer; consecutive conditions with the same "or group" value are considered "or"-linked. Conditions fail in sum if any "and"-linked condition fails, or if all of a set of "or"-linked conditions fail.

MegaloActionScope::Execute(const MegaloScriptCode& script) const {
   size_t ci = 0; // condition index
   for(size_t ai = 0; ai < this->action_count; ++ai) {

      int32_t current_or_group = -1;

      // True if all conditions in the current "or group" have matched. 
      // An "or group" is a group of conditions that have been OR'd 
      // together.
      bool group_still_valid = true;

      for(; ci < this->condition_count; ++ci) {
         const auto& condition = script.conditions[this->condition_start + ci];

         if (condition.exec_before > ai)
            //
            // This condition applies to actions later in the trigger.
            //
            break;

         if (condition.or_group != current_or_group) {
            if (!group_still_valid)
               //
               // If we reach the end of an "or group" and none of the 
               // conditions have matched, then stop the trigger.
               //
               return;
            current_or_group  = condition.or_group;
            group_still_valid = false;
         }
         group_still_valid |= condition.Execute();
      }
      if (!group_still_valid)
         return;

      const auto& action = script.actions[this->action_start + ai];
      action.Execute();
   }
}

Because conditions act as an early-return mechanism for their containing block, they don't always require a separate block of their own; a condition's influence runs to the end of its containing block. If you wish to have some sequence of actions where only the middle few actions are conditional, however, then you must split that sequence of actions into a nested trigger, which will limit those conditions' influence.

Inlined triggers

The mid-July 2023 update to Halo: The Master Chief Collection backported a number of scripting features from Halo 4 to Halo: Reach. One of these is the begin action, which allows one to nest a block of script code without having to create an entire new trigger for it (thereby avoiding the limit on how many ordinary triggers a game variant can contain).

The implementation is simple. The action's sole parameter is an action scope — the data structure that triggers use to identify what conditions and actions they contain. In other words, the action scope for the nested block is inlined directly into the action used to execute that block.

References

There are five kinds of values to which one can define a reference: integers, objects, players, teams, and timers. These can be split into two categories: numeric references (though MegaloEdit uses the term "custom") and handle references, the latter so named because Halo: Reach's engine generally uses handles rather than pointers for working with objects, players, and teams.

Explicit handle references are single values — enums, which indicate specific, globally-accessible handles; for example, current_object, local_player, and all global and temporary handle variables in Megalo.

enum class explicit_object_handle : int8_t;
enum class explicit_player_handle : int8_t;
enum class explicit_team_handle   : int8_t;

union explicit_handle_union {
   int8_t untyped = -1; // "none"

   explicit_object_handle object;
   explicit_player_handle player;
   explicit_team_handle   team;
};

Handle reference operands in Megalo, then, resemble the following, using object handles as an example:

enum class object_reference_kind : uint8_t;

class object_reference {
   public:
      object_reference_kind  kind; // like a tagged union's tag
      explicit_handle_union  top_level;
      std::optional<uint8_t> member_index;
};

If the operand refers directly to an explicit object handle, then the kind enum will indicate this, and the top_level field will specify which handle is being referred to, and the member_index field will be absent. If the operand refers to an object that is the member of some other handle, then the kind enum will indicate that; the top_level field will specify the handle, and the member_index field will specify which object handle member is being referred to.

Complicating matters slightly is the fact that properties are also represented in this system. If the operand is a reference to a player's biped, then this is indicated via the kind; the member_of and target_index fields hold the same values that they would for a player_reference referring to that same player.

RVT-syntax reference top_level member_index
no_object -1
global.object[T] T
global.object[T].object[M] T M
current_player.object[M] 24[note 1] M
temporaries.object[T] 21 + T
current_player.biped 24[note 1]
current_player.player[M].biped 24[note 1] M
[note 1:] In this case, 24 is a value in the explicit_player_handle enum, which corresponds to current_player.

Numeric references are slightly different. First, I'll illustrate this with timers:

enum class timer_reference_kind : uint8_t;

class timer_reference {
   public:
      timer_reference_kind  kind; // like a tagged union's tag
      explicit_handle_union member_of;
      uint8_t               target_index;
};

If the operand is a reference to a built-in timer, like the game's round timer, then this is indicated via the kind; neither of the other fields are present in the serialized data. If the operand is a reference to a globally-scoped timer, then this is indicated via the kind, the member_of is absent from the serialized data, and the target_index specifies which timer is being used. If the timer is a member of some other object, then all three fields are used.

RVT-syntax reference member_of target_index
game.round_timer
global.timer[T] T
global.object[M].object[T] M T
current_player.timer[T] 24[note 1] T
[note 1:] In this case, 24 is a value in the explicit_player_handle enum, which corresponds to current_player.

Integer references are essentially the same, except that they can also refer to an integer constant — a signed 16-bit value:

enum class integer_reference_kind : uint8_t;

class integer_reference {
   public:
      integer_reference_kind kind; // like a tagged union's tag
      union {
         int16_t constant;
         struct {
            explicit_handle_union member_of;
            uint8_t               target_index;
         } variable;
      };
};

As with handles, integer property references on handle references (e.g. read-access to a player or team's score) are indicated by the kind enum; member_of and target_index hold the same values that their corresponding fields on handle references, top_level and member_index, would use to refer to the handle on which the property is being accessed. (Perhaps, then, it might even be a good idea to separate out the latter two fields into a struct, and add that as a union member in integer_reference?)

Temporary variables

The mid-July 2023 update to Halo: The Master Chief Collection backported a number of scripting features from Halo 4 to Halo: Reach. One of these is temporary variables. Their implementation differs between numeric and handle references, naturally. Temporary integer variables are identified by a kind value and a target_index. Temporary handle variables are identified as new explicit handle references — that is, new values in the explicit_object_handle, explicit_player_handle, and explicit_team_handle enumerations shown above.

Temporary variables don't need to be declared within the gametype metadata; the game always has room for them; they always exist. When you declare a temporary variable in MegaloEdit, the official editor for Halo: Reach gametypes, it handles this by just compiling in an assignment to the initial value you specify. It automatically allocates temporary variables based on block scoping, similarly to allocation to aliases in ReachVariantTool. One important difference between the editors is that MegaloEdit is capable of "spilling over" into unused global variables if a script runs out of temporary variables. MegaloEdit can do this because all non-temporary variables must be declared before all script code, so even with a single-pass parser, it always knows which variables are used and which are not. ReachVariantTool, by contrast, also uses a single-pass parser but allows variable declarations (and implicit instantiation: using an undeclared variable automatically declares it) anywhere in the script, and so when ReachVariantTool encounters an allocation-to-alias, it can't know for sure that any given global variable will stay unused through the rest of the script, and so can't safely spill extra temporaries to globals.

(And in case it needs to be said: when MegaloEdit spills to globals, it still uses block-scoping. Once a spilled temporary variable goes out of scope, the global used to back it is considered free for use by any temporaries that spill later on.)

ReachVariantTool implementation

ReachVariantTool does not make any distinction between numeric and handle references, because at the time I didn't realize there was any such distinction to make. All five reference types are handled uniformly. The components of a reference operand (called a "variable" operand in RVT) are classified based solely on their types, and not on their meanings. This is an unfortunate design flaw arising from an incomplete understanding of Megalo, and it makes ReachVariantTool's code for references harder to follow and reason about.

The kind field is in ReachVariantTool called the operand's "scope" value. The top_level field on handle references and the member_of field on numeric references are treated uniformly and called the "which" value. The member_index field on handle references and the target_index field on target references are treated uniformly and called the "index."

Scope-values are implemented as VariableScopeIndicatorValue instances stored in arrays (one for each reference type); instances effectively serve as annotations for the values of the ..._reference_kind enums. The enum values are annotated with:

Field name Purpose
flags Flags, to track whether this "scope" represents read-only values like player bipeds or the values of Custom Game options, and whether the "scope" is used for references to a variable. (The latter flag is used by the compiler for things like not allowing you to write a variable declaration wherein the declared variable's initial value is another variable.)
base The explicit handle reference's type, if any.
index_type The index's type, if any; or an indicator that this is the "scope" value for an integer constant (represented as enum index_type::generic).
index_bitcount The bitcount for the index. The index's bitcount can be inferred when the reference is to a Megalo number variable; however, when the reference is to some indexed data object, such as a Megalo-defined scoreboard stat or Custom Game option, we use this field to indicate the bitcount needed for the index. We also use this field to indicate the bitcount of integer constants.
indexed_list_accessor

For references to indexed data objects, like Megalo-defined scoreboard stats or Custom Game options, this is a functor that can be used to get a type-erased pointer to the data object in memory, i.e. a pointer to the scoreboard stat definition or Custom Game option definition.

ReachVariantTool allows users to reorder most indexed data options: Custom Game options, scoreboard stats, and similar are defined and edited through the GUI and can be rearranged. The naive approach to this would be to have all script operands store raw indices, and iterate through the entire script to update every operand whenever the user reorders something. The approach ReachVariantTool takes is to have these indexed data objects all inherit from a common base class that tracks its own index within its containing list (i.e. the n-th scoreboard stat knows that it is the n-th scoreboard stat), and have script operands store pointers (typed to the base class) to the indexed data objects. When we go to save the gametype to a file, the operand can simply ask the indexed data object for its index and then serialize that index.

Which-values are implemented as VariableScopeWhichValue instances stored in arrays, with the instances having names and flags.

Indexes are implemented as simple integers. When the reference is to an indexed data object, they are paired with smart pointers that manage refcounts on the indexed data objects (another feature of the base class described above, used to allow end users to delete indexed data objects if they aren't in use by the script).

ReachVariantTool's compiler

ReachVariantTool uses a mostly-single-pass compiler for its Megalo dialect. Parsing and compilation are done as a single step, and the resulting grammar is context-dependent rather than context-free. There is not any formal grammar; rather, the grammar is entirely a side-effect of how the parser is implemented.

(This is not a terribly good way to design or implement a scripting language, but in my defense, I had never come across Robert Nystrom's excellent book Crafting Interpreters before.)

The compiler reads your script source code and instantiates objects which represent the basic elements of a script: blocks (including user-defined functions), comparisons, statements, aliases, and user-defined enums. The script is represented as a tree structure wherein blocks are branch nodes, comparisons act as annotations to blocks, and statements and aliases are leaf nodes contained within a block. Additionally, during parsing, the compiler retains its own set of unowned pointers to keep track of what aliases, user-defined functions, and user-defined enums are in scope. (User-defined enums are tracked only by the compiler and do not exist within the block hierarchy; however, each enum has a pointer to the block in which it was defined, which allows enums to be block-scoped: we know when to delete them.)

Statements are compiled into Megalo opcodes and operands as they are read. Blocks are not compiled immediately as they're read; rather, top-level blocks are compiled once their end is fully parsed. This is necessary in order to properly handle if-blocks.

Variable references are parsed as they are seen. ReachVariantTool only stores enough data to reliably identify a valid variable reference — so, fields for a top-level variable, nested variable, property, and accessor; as well as fields for integer constants. Invalid variable references (e.g. references nested too deeply) can't be retained at all, which also means that strict typing cannot be enforced for them. For example, ReachVariantTool cannot emit additional errors if you try to compare global.object[0].number[1].player[2] to no_team.

The parser has a limited backtracking capability: handlers for keywords and other language constructs can create a backup of its current position, and then restore that backup, essentially allowing the compiler to rewind to places it's already been. This is used for things like overload resolution (i.e. game.send_incident): because parsing and compiling are done as a single step, the only way to tell which overload the user is invoking is trial-and-error. Function calls are compiled by finding an opcode that has a matching name and has been designated as a function, and attempting to compile each of the passed-in arguments into operands for that opcode.