Form versions 43 and 44

Ever since Skyrim Special Edition's release — and the release of several improperly ported Skyrim Classic mods — the community has paid close attention to form versions, the version numbers embedded into every record in an ESP file. It's widely understood that version 43 is specific to Classic and heralds Doom and Death, while version 44 is specific to Special and is Right and Pure. There are xEdit scripts for scanning your load order in search of any version 43 content, and many authors field complaints from users about such content. Yet scattered findings and observations around the community paint a more complicated picture, and suggest that version 43 content isn't inherently dangerous.

This document aims to bring these observations together and present a clearer picture of form versioning. My goal was to combat misinformation which contradicts what is known through reverse-engineering, while also incorporating observations made in other scenes within the Bethesda RPG modding community by people with other specialties. During the process of writing this document, I managed to personally verify some community observations via disassembly, while also clarifying the scope and scale of other phenomena I had noticed in the past. It's my hope that this document can be as educational for others as the process of writing it was for me.

The short version

What are the basic differences between form versions 43 and 44?

The main differences between form versions 43 and 44 are:

This means that data for version 43 is identical to data for version 44 when all of the following are true:

And indeed, if you check the official game files (Dragonborn.esm and such), you'll see both version 43 and version 44 forms inside, along with forms using even older version numbers. Though the publicly available Creation Kit appears to upgrade the version numbers on all forms when re-saving content, it seems that during development, Bethesda didn't bother to change version numbers consistently when working with data whose format didn't change.

We'll go over Bethesda's efforts to maintain backward-compatibility, and the few mistakes they made in doing so, in a later section.

What are the differences between forms in official content and forms in mods?

There are none. Some mod authors have claimed that Skyrim Special Edition uses special behavior for the official files only, so that those files can have data in version 43 but mods cannot. This is not true. Forms in official content are loaded the same way as forms in modded content.

There are a few reverse-engineers in the community who have analyzed Skyrim's compiled code in order to determine how it works — and how to tamper with it, to make mods that can't be made with the Creation Kit and Papyrus scripting alone. I happen to be one of them. I'm currently working on making my own tools for editing mods, and as part of that process, I've been reverse-engineering how the game loads forms. My focus is on Skyrim Classic, but I've occasionally cross-referenced against Skyrim Special itself and against the findings of the reverse-engineers who specialize in Skyrim Special.

The basic process that the game has for loading forms is as follows:

There are a few extra details, for things like the Creation Kit's collaboration features (which are relevant to the game as well, because the Creation Kit is basically just a heavily hacked copy of the game engine). There are also some special cases for things like game settings (GMST) and NavMeshInfoMaps (NAVI), which either aren't "real" forms or are otherwise handled in unusual ways. The vast majority of forms follow the two basic steps above, though.

Now, let's dive into some light technical details.

I mentioned that forms load their data using a "virtual function." The game doesn't use the exact same code to load every form, because different form types hold different data, often arranged in very different ways. Instead, every form type has a list of functions attached to it (programmers call this a virtual function table, or "vtable"), and the game refers to these functions by number.

The seventh function in a form's virtual function list is used to load a form's data from an ESP, ESM, or ESL file. Every form type has its own such function, and the program is organized so that the code for loading a form never has to check what kind of form it's working with. It knows that whatever form it has will come with a function list, and it just needs to grab the seventh function in that list. When reverse-engineers like me look at Skyrim's code, we can actually find these lists, study the functions inside them, and work out what each function is supposed to do.
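For the programmers in the audience, here's a minimal C++ sketch of that idea. Every name in it (Form, Weapon, Spell, LoadFromFile, Describe, LoadAll) is a hypothetical stand-in rather than Skyrim's real code, and the "file data" is reduced to a plain string. The point is only that the loading loop calls the same virtual slot on every form without ever checking what type the form is.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical base class standing in for TESForm. For each class with
// virtual functions, the compiler builds a table of function addresses
// (a "vtable"), and every object carries a hidden pointer to its table.
class Form {
public:
    virtual ~Form() = default;
    // Stand-in for the "load from file" slot; `record` replaces real file data.
    virtual void LoadFromFile(const std::string& record) = 0;
    // Extra hypothetical helper so the result can be inspected.
    virtual std::string Describe() const = 0;
};

class Weapon : public Form {
public:
    void LoadFromFile(const std::string& record) override { name_ = "weapon:" + record; }
    std::string Describe() const override { return name_; }
private:
    std::string name_;
};

class Spell : public Form {
public:
    void LoadFromFile(const std::string& record) override { name_ = "spell:" + record; }
    std::string Describe() const override { return name_; }
private:
    std::string name_;
};

// The loader never checks what kind of form it holds. Each call goes through
// the object's own vtable, so the matching LoadFromFile runs automatically.
void LoadAll(std::vector<std::unique_ptr<Form>>& forms,
             const std::vector<std::string>& records) {
    assert(forms.size() == records.size());
    for (std::size_t i = 0; i < forms.size(); ++i) {
        forms[i]->LoadFromFile(records[i]);
    }
}
```

Ordinary C++ code never asks for "the seventh entry" by number; the compiler assigns the slots. That's why reverse-engineers have to recover the slot numbers from the compiled tables.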

Skyrim's reverse-engineering community is fairly good about documenting our findings, so those lists are publicly visible in places like CommonLibSSE's source code. As of this writing, the TESForm class, which serves as the base for all form types, has the same number of virtual functions in Skyrim Special as in Skyrim Classic, and every function that has been identified in both games is an exact match. There isn't some special set of functions used only for official content.

This means that if a given piece of data can safely use version 43 in official files, then it can safely use version 43 in mods. You might see this in cases where someone uses SSEEdit to create a Skyrim Special mod from scratch, with overrides of forms that were version 43 in the base game or bundled DLCs.

Bethesda made some mistakes, so the Creation Kit won't get everything right

The correct way to port a Skyrim Classic mod to Skyrim Special Edition is to re-save that mod in the Skyrim Special Edition Creation Kit. However, not all of the mod's data will be properly preserved: there are specific cases where Bethesda made mistakes while trying to implement backward-compatibility with Skyrim Classic, resulting in unintentional changes to the file format; and because these file format changes are unintentional, the Creation Kit doesn't handle them entirely correctly.

One of the known issues is with critical hit data on weapons. Every weapon can optionally specify a spell that should be applied to an enemy when the weapon is used to land a critical hit on that enemy. (This behavior can optionally be limited to fatal critical hits as well.) However, Bethesda failed to maintain backward compatibility for the critical-hit data structure (WEAP/CRDT in the file format), and so in practice the specified effect is lost when Skyrim Classic weapon data is loaded by Skyrim Special Edition or its Creation Kit. To understand why, let's look at how this data is stored.

We're going to get a bit technical here, so feel free to skip a few headings if that's not your scene. Ctrl + F for "What goes wrong when we re-save a weapon?"

What are structs?

In programming terms, a "data structure" is a collection of individual values that have been grouped together, generally because they're part of some whole. In C++, the programming language that Skyrim is written in, we use the term struct for smaller, simpler data structures. Each value in a struct is often called a "field" or a "member," and each field has a type and a name.

The data structure for a weapon's critical hit data looks like this:

struct CriticalData {
   uint16_t damage;
   float    multiplier;
   uint8_t  flags;
   FormPtr  effect;
};

Each field is listed as its type, and then its name, with the name being used to actually work with the field when writing program code. The fields inside of the critical-hit data, then, are:

damage
The amount of critical hit damage that the weapon deals. This is a 16-bit unsigned integer — that is, a two-byte value that only holds non-negative whole numbers.
multiplier
A multiplier used to influence how likely an actor is to land critical hits using this weapon. This is a floating-point number — that is, four bytes used to encode a decimal number, such as 0.3 or -50.0.
flags
A set of on-or-off settings which apply to the weapon. These settings are grouped into an 8-bit number, with one bit per setting. There are eight bits in a byte, so this is essentially a one-byte number.
effect
The spell to apply to any actor that takes a critical hit from this weapon. We'll talk about how this is stored later.

Each of those fields has a unique type suited to a different purpose. There are integer types with different sizes, because if you only plan on storing a small number, then you don't need to take up all the space needed for a large number; there are also decimal number types, which are slower for a CPU to work with (not on any human-noticeable timescale, but if you overuse them, it'll add up). A program has to know what a value's type is — how the value is stored — so that the program knows how to actually read and modify the value.
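Here's a small C++ illustration of that last point (not Skyrim code). The same four bytes are read back once as a 32-bit unsigned integer and once as a float. The bytes never change; only the type the program uses to interpret them does. The example bytes assume a little-endian machine, such as an ordinary x86 PC.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "this sketch assumes a four-byte float");

// Interpret four raw bytes as an unsigned 32-bit integer.
std::uint32_t BytesAsUint32(const unsigned char (&bytes)[4]) {
    std::uint32_t value;
    std::memcpy(&value, bytes, sizeof value);  // copy the bytes into a typed value
    return value;
}

// Interpret the very same four bytes as a 32-bit float.
float BytesAsFloat(const unsigned char (&bytes)[4]) {
    float value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

On a little-endian machine, the bytes `00 00 80 3F` read as the integer 1065353216 but as the float 1.0. If a program guessed the wrong type, it would get a perfectly valid but completely wrong number.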

So now that we know what a struct is, and how the critical hit data is stored, we can begin to talk about what goes wrong with it.

How do forms refer to each other in a Bethesda RPG?

An experienced Skyrim mod author or mod user will know that every piece of data in the game is a "form," and every form has a unique four-byte numeric ID. If you're familiar with programming, you'll also know what a "pointer" is; if you aren't, let me catch you up.

When a computer program is running, every piece of data in that program has a memory address. This is basically a unique number that describes the location of that data, kind of like how a street address is a unique number that describes the location of a house. When you have two pieces of data in a computer's memory, and one of them needs to refer to the other, it will usually do that by storing the other piece of data's memory address. Your program needs to know the memory address of a piece of data in order to work with it, so if you hold onto something's address, you can access it at any time. We call this held-onto address a "pointer." It "points" to some other object.
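Here's what that looks like in C++. The `Form` struct is a hypothetical, heavily simplified stand-in that holds only an ID and a pointer to one other form, and the IDs are arbitrary example values. Following the pointer reaches the original object rather than a copy, so a later change to the target is visible through it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified form: an ID plus a reference to
// another form, held as a pointer (a memory address).
struct Form {
    std::uint32_t id = 0;
    Form* target = nullptr;  // nullptr means "points at nothing"
};

// Follow the pointer and report the ID of whatever it points at,
// or 0 if it points at nothing.
std::uint32_t TargetId(const Form& form) {
    return form.target ? form.target->id : 0;
}
```

In use, `Form spell{0x00005678}; Form weapon{0x00001234, &spell};` gives a weapon whose `TargetId` is 0x00005678. If the spell's ID is later changed, the weapon sees the new value, because it holds the spell's address rather than a copy of the spell.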

When you have two forms in memory, then, the most efficient way for one form to refer to the other is by storing a pointer to the other form. However, this is only possible when both forms actually are in memory. If something isn't in memory, then it's not going to have a memory address — it's not going to have a location in memory. That leads to an interesting question: how do forms refer to each other while they're being loaded? They use form IDs inside of a file, and they use pointers inside of memory, but what do they use when we're actively loading them from a file into memory?

In programming, we can create what's called a union type. If a value has a union type, that means that it can be one of several different things, depending on what the situation calls for. Unions are very similar to structs, so let's take a look at a basic example:

union ExampleUnion {
   uint64_t integer;
   float    decimal;
};

If we create a variable whose type is ExampleUnion, then that variable can hold either of the following values, but not both:

integer
An eight-byte non-negative integer. This can store any whole number between 0 and 18,446,744,073,709,551,615.
decimal
A four-byte floating-point number. This can store decimal numbers as massive as roughly 3.4e38 (a thirty-nine-digit number), or, at full precision, numbers as close to zero as roughly 1.2e-38. However, the further you get from zero, the less precise the storage gets. Floats are also slower to work with if you only need a whole number.

A variable whose type is ExampleUnion can store an integer or it can store a decimal number, but it cannot store both. Moreover, it needs to have room for all of the things it can store. If the largest thing it can store is eight bytes long, then the union needs to have eight bytes to work with, even if right now it's just storing something that's four bytes. That means that our ExampleUnion above takes up eight bytes, because it's supposed to be able to hold an eight-byte integer.
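We can ask the compiler to confirm that. This sketch reuses the ExampleUnion from above and checks at compile time that it occupies eight bytes, the size of its largest member. The two helper functions are made up for illustration, and each one only reads back the member it most recently wrote. In C++, reading a different member than the one last stored is undefined behavior.

```cpp
#include <cassert>
#include <cstdint>

union ExampleUnion {
    std::uint64_t integer;
    float         decimal;
};

// The union must be large enough for its largest member.
static_assert(sizeof(std::uint64_t) == 8, "uint64_t is eight bytes");
static_assert(sizeof(float) == 4, "this sketch assumes a four-byte float");
static_assert(sizeof(ExampleUnion) == 8, "a union is as big as its biggest member");

// Store a whole number, then read it back through the same member.
std::uint64_t StoreAndReadInteger(std::uint64_t n) {
    ExampleUnion u;
    u.integer = n;
    return u.integer;
}

// Overwrite the same storage with a decimal; the integer is now gone.
float StoreAndReadDecimal(float f) {
    ExampleUnion u;
    u.integer = 42;
    u.decimal = f;  // the union now holds a float, not an integer
    return u.decimal;
}
```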

Bethesda has a really clever union type in their code, which I called FormPtr above. It's a union of a pointer and a form ID. The basic idea is that Skyrim loads an ESP file (or ESM or ESL) in two steps. In the first step, it loads all of the forms into memory, but in the places where those forms would usually have a pointer, it instead stores the form IDs that it pulls from the file. Then, in the second step, once all of the forms are in memory and it's possible to create pointers to them, Skyrim goes back over all of the forms it just loaded, swapping out those form IDs for pointers.
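Here's a minimal C++ sketch of that two-step process. To be clear, this is an illustration under assumptions, not Bethesda's code. The FormPtr here is a simplified stand-in for the union just described. The Form struct, the LoadForms and ResolveReferences functions, and the use of a std::unordered_map as the lookup table are all hypothetical. In step one, each reference field holds a raw form ID; in step two, that ID is swapped for a pointer to the loaded form.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

struct Form;

// Simplified stand-in for FormPtr: a form ID while loading, a pointer after.
union FormPtr {
    std::uint32_t id;
    Form*         ptr;
};

struct Form {
    std::uint32_t id = 0;
    FormPtr       ref{};  // one reference to another form
};

// Step 1: create every form, storing reference fields as raw form IDs.
// Each pair is (this form's ID, the ID it refers to; 0 means "none").
std::vector<std::unique_ptr<Form>> LoadForms(
        const std::vector<std::pair<std::uint32_t, std::uint32_t>>& records) {
    std::vector<std::unique_ptr<Form>> forms;
    for (const auto& record : records) {
        auto form = std::make_unique<Form>();
        form->id = record.first;
        form->ref.id = record.second;  // just an ID for now
        forms.push_back(std::move(form));
    }
    return forms;
}

// Step 2: with everything in memory, swap each stored ID for a pointer.
void ResolveReferences(std::vector<std::unique_ptr<Form>>& forms) {
    std::unordered_map<std::uint32_t, Form*> byId;
    for (auto& form : forms) byId[form->id] = form.get();
    for (auto& form : forms) {
        const std::uint32_t refId = form->ref.id;
        auto it = byId.find(refId);
        form->ref.ptr = (it != byId.end()) ? it->second : nullptr;
    }
}
```

The map is just a convenient lookup for the sketch. What matters is the order of operations: nothing tries to build a pointer until every form it might point to exists in memory.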

Okay, so what's the catch?

On a 32-bit system, a pointer is four bytes — the same number of bytes as a form ID. On a 64-bit system, however, a pointer is eight bytes. This means that every FormPtr needs to be twice as long in Skyrim Special Edition. Now, normally, that wouldn't be an issue. After all, we're just talking about how things are arranged in Skyrim's memory, and that doesn't necessarily have to be how things are arranged in an ESP file.
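The size change falls straight out of the union rule from earlier. In this sketch, FormPtr and Form are again hypothetical stand-ins. The compiler checks that the union is exactly as large as a pointer: four bytes in a 32-bit build and eight in a 64-bit build. The form ID inside it stays at four bytes either way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Form;  // the pointed-to type's contents don't affect pointer size

// Hypothetical stand-in for the union of a form ID and a pointer.
union FormPtr {
    std::uint32_t id;   // four bytes in every build
    Form*         ptr;  // four bytes in a 32-bit build, eight in a 64-bit one
};

static_assert(sizeof(std::uint32_t) == 4, "form IDs are four bytes");
static_assert(sizeof(FormPtr) == sizeof(Form*),
              "the pointer is the union's largest member");

// Bytes a FormPtr occupies beyond the form ID it carries:
// 0 in a 32-bit build, 4 in a 64-bit build.
constexpr std::size_t ExtraBytesBeyondFormId() {
    return sizeof(FormPtr) - sizeof(std::uint32_t);
}
```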

The thing is, Skyrim Classic often loads structs by copying them directly out of a file and into memory... which means that these structs' values have to be arranged the same way in a file as they are in memory. Skyrim Special usually accounts for this by loading the structs one field at a time, but there's at least one specific place where Bethesda actually forgot to do that: critical hit data for weapons. Skyrim Special and its Creation Kit also blindly copy that data into and out of files. This means that the way critical hit data for weapons has to be arranged will actually change depending on whether it's being loaded by Skyrim Classic or Skyrim Special.
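Loading "one field at a time" can be sketched in C++ like this. The function names, the constants, and the simplified FormPtr are hypothetical illustrations, not code from either game. The field offsets (0, 4, 8, and 12, in a 16-byte block) come from the Skyrim Classic layout diagram later in this section. The sketch also assumes a little-endian host, as Skyrim's PCs are. Each value is read from where Skyrim Classic wrote it and stored wherever this build's struct happens to keep it, so the result is correct in a 32-bit or 64-bit build alike.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Form;

// Hypothetical stand-in for the ID/pointer union.
union FormPtr {
    std::uint32_t id;
    Form*         ptr;
};

// In-memory layout: whatever the compiler picks for this build.
struct CriticalData {
    std::uint16_t damage;
    float         multiplier;
    std::uint8_t  flags;
    FormPtr       effect;
};

// Where each field sits in Skyrim Classic's 16-byte layout.
constexpr std::size_t kClassicSize      = 16;
constexpr std::size_t kDamageOffset     = 0;
constexpr std::size_t kMultiplierOffset = 4;
constexpr std::size_t kFlagsOffset      = 8;
constexpr std::size_t kEffectOffset     = 12;

// Copy sizeof(T) bytes starting at `offset` into a value of type T.
template <typename T>
T ReadAt(const unsigned char* bytes, std::size_t offset) {
    T value;
    std::memcpy(&value, bytes + offset, sizeof value);
    return value;
}

// Field-by-field load: the file layout and the memory layout no longer
// need to match, because each value is moved individually.
CriticalData LoadFieldByField(const unsigned char* bytes, std::size_t size) {
    assert(size >= kClassicSize);
    CriticalData out{};
    out.damage     = ReadAt<std::uint16_t>(bytes, kDamageOffset);
    out.multiplier = ReadAt<float>(bytes, kMultiplierOffset);
    out.flags      = ReadAt<std::uint8_t>(bytes, kFlagsOffset);
    out.effect.id  = ReadAt<std::uint32_t>(bytes, kEffectOffset);
    return out;
}
```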

The most obvious consequence is that the form ID for the critical hit effect now sits in an eight-byte slot in Skyrim Special, so it needs four bytes of padding after it, because the FormPtr union is now four bytes larger. However, that form ID is at the end of the data structure, so this alone isn't severe; Skyrim Classic data lacks that trailing padding, but the game seems to cope with that just fine. There's another wrinkle, though: data structure alignment.

Let's look back at our struct:

struct CriticalData {
   uint16_t damage;
   float    multiplier;
   uint8_t  flags;
   FormPtr  effect;
};

You might expect each of those fields to appear in the data one after another. However, that's actually not as efficient as it would seem. Skyrim can actually work with this struct more efficiently if it's willing to waste a little space.

Remember what I said about memory addresses? Well, everything in a computer's memory has a memory address: if it's in memory, then it has a location in memory. That applies to forms, but it also applies to structs — and to the individual fields in a struct. Fields in a struct also have an offset, or the "distance" from their location to the struct's location. In our struct above, the damage field has an offset of 0: it's right at the start of the struct, so there's no distance between it and the start of the struct. Now, the damage field is two bytes long, so you'd probably expect the next field, the multiplier field, to have an offset of 2.

But there's a catch.

Modern CPUs can work with an individual value much more quickly if that value is aligned. For the simple types we're dealing with here, that means the value's address, and therefore its offset within a struct, must be a multiple of its size. A two-byte value must have an offset that is a multiple of two; a four-byte value needs an offset that's a multiple of four; and an eight-byte value needs an offset that is a multiple of eight. If a field doesn't have the right offset, then it won't be aligned, which means it'll be slower for a CPU to work with.
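The alignment rule is simple enough to compute by hand. This C++ sketch is a simplified model, not what a compiler literally runs. It places each field at the next offset that is a multiple of its alignment (taken here to equal its size, as described above). It then rounds the total size up to the largest alignment, which keeps arrays of the struct aligned. Feeding it the critical-hit field sizes with a four-byte FormPtr, and then with an eight-byte one, reproduces the Skyrim Classic and Skyrim Special layouts shown below.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Round `offset` up to the next multiple of `alignment`.
constexpr std::size_t AlignUp(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

struct Layout {
    std::vector<std::size_t> offsets;  // where each field starts
    std::size_t size = 0;              // total size, including trailing padding
};

// Simplified model of struct layout, assuming each field's alignment
// equals its size (true for the primitive types used here).
Layout ComputeLayout(const std::vector<std::size_t>& fieldSizes) {
    Layout layout;
    std::size_t offset = 0;
    std::size_t maxAlign = 1;
    for (std::size_t size : fieldSizes) {
        assert(size > 0);
        offset = AlignUp(offset, size);  // skip padding bytes if needed
        layout.offsets.push_back(offset);
        offset += size;
        maxAlign = std::max(maxAlign, size);
    }
    layout.size = AlignUp(offset, maxAlign);  // trailing padding, if any
    return layout;
}
```

With field sizes `{2, 4, 1, 4}`, the offsets come out as 0, 4, 8, 12 and the size as 16. With `{2, 4, 1, 8}`, the last field moves to 16 and the size grows to 24. Neither struct needs trailing padding, because both already end on a multiple of their largest alignment.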

Modern compilers will make sure that every field in a struct is aligned, by inserting filler between fields whenever that filler is necessary. Let's look at our struct, but this time, I've indicated where the filler bytes are for Skyrim Classic, and I've also annotated each field with its offset:

struct CriticalData {
   uint16_t damage;     //  0 // two-byte value
   uint8_t  pad2;       //  2 // padding byte, to align the next field
   uint8_t  pad3;       //  3 // padding byte, to align the next field
   float    multiplier; //  4 // four-byte value
   uint8_t  flags;      //  8
   uint8_t  pad9;       //  9 // padding byte, to align the next field
   uint8_t  pad10;      // 10 // padding byte, to align the next field
   uint8_t  pad11;      // 11 // padding byte, to align the next field
   FormPtr  effect;     // 12 // four-byte value
   // End.              // 16
};

In Skyrim Classic, the FormPtr — the union of a pointer and a form ID — is four bytes, because pointers and form IDs are both four bytes, and the union needs room for the largest of all the values it's allowed to hold. Because the union is four bytes long, it needs to have an offset that's a multiple of four, and indeed, we can see that the padding bytes above are used to give it the right offset.

In Skyrim Special, pointers are eight bytes long, which means that a FormPtr also needs to be eight bytes long, and it needs to have an offset that's a multiple of eight. That means that we need to include four more padding bytes. Here's what the CriticalData struct looks like in Skyrim Special:

struct CriticalData {
   uint16_t damage;     //  0 // two-byte value
   uint8_t  pad2;       //  2 // padding byte, to align the next field
   uint8_t  pad3;       //  3 // padding byte, to align the next field
   float    multiplier; //  4 // four-byte value
   uint8_t  flags;      //  8
   uint8_t  pad9;       //  9 // padding byte, to align the next field
   uint8_t  pad10;      // 10 // padding byte, to align the next field
   uint8_t  pad11;      // 11 // padding byte, to align the next field
   uint8_t  pad12;      // 12 // padding byte, to align the next field
   uint8_t  pad13;      // 13 // padding byte, to align the next field
   uint8_t  pad14;      // 14 // padding byte, to align the next field
   uint8_t  pad15;      // 15 // padding byte, to align the next field
   FormPtr  effect;     // 16 // eight-byte value
   // End.              // 24
};

The effect field has moved from offset 12 to offset 16, and the struct as a whole has grown from 16 bytes to 24.
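If you'd rather have a compiler confirm those diagrams, this C++ sketch declares both layouts and checks them with `offsetof`. It makes two assumptions. First, it must be built as a 64-bit program, so that the pointer in the hypothetical FormPtr64 union is eight bytes. Second, the Skyrim Classic layout is imitated with a plain `uint32_t` in place of the 32-bit FormPtr, which has the same size and alignment as a four-byte pointer. Both hold on typical x86-64 compilers, though the C++ standard itself doesn't promise these exact numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Form;

// Skyrim Classic: FormPtr is four bytes, imitated here with a uint32_t.
struct CriticalDataClassic {
    std::uint16_t damage;
    float         multiplier;
    std::uint8_t  flags;
    std::uint32_t effect;
};

// Skyrim Special: FormPtr holds an eight-byte pointer in a 64-bit build.
union FormPtr64 {
    std::uint32_t id;
    Form*         ptr;
};

struct CriticalDataSpecial {
    std::uint16_t damage;
    float         multiplier;
    std::uint8_t  flags;
    FormPtr64     effect;
};

static_assert(sizeof(void*) == 8, "build this sketch as a 64-bit program");

// Classic: effect at offset 12, 16 bytes in total.
static_assert(offsetof(CriticalDataClassic, effect) == 12, "classic effect offset");
static_assert(sizeof(CriticalDataClassic) == 16, "classic size");

// Special: effect pushed to offset 16, 24 bytes in total.
static_assert(offsetof(CriticalDataSpecial, effect) == 16, "special effect offset");
static_assert(sizeof(CriticalDataSpecial) == 24, "special size");
```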

What goes wrong when we re-save a weapon?

The problems begin when the Skyrim Special Edition Creation Kit first reads our weapon and its critical data. Bethesda didn't intentionally change the file format; rather, they forgot to make Skyrim Special Edition and its Creation Kit load the data field-by-field in order to maintain compatibility with Skyrim Classic. The change is accidental, so it's not handled at all.

So what happens is, the Creation Kit misreads our data. It expects the form ID for the critical effect to start at offset 16, four bytes later than where it actually is, so it mistakes the form ID we provided (at offsets 12 through 15) for irrelevant padding. Our data ends before offset 16, so the Creation Kit treats our weapon as if we hadn't specified a form ID at all, and the weapon no longer has a critical hit effect.
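Here's a rough simulation of that misread in C++, assuming a 64-bit build. It models the Creation Kit's behavior as a plain `memcpy` of the 16-byte Skyrim Classic data into a zero-filled, 24-byte Skyrim Special struct. That model, and every name in the sketch (BlindCopy, PaddingAt12, FormPtr64), is a hypothetical illustration of the behavior described above, not the Creation Kit's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Form;

union FormPtr64 {
    std::uint32_t id;
    Form*         ptr;
};

// Skyrim Special layout: effect at offset 16, 24 bytes in total.
struct CriticalDataSpecial {
    std::uint16_t damage;
    float         multiplier;
    std::uint8_t  flags;
    FormPtr64     effect;
};
static_assert(offsetof(CriticalDataSpecial, effect) == 16,
              "assumes a typical 64-bit build");

// "Blind copy": lay the record's raw bytes over the struct as-is.
// Any bytes the record doesn't supply stay zero.
void BlindCopy(const unsigned char* bytes, std::size_t size,
               CriticalDataSpecial& out) {
    std::memset(&out, 0, sizeof out);
    std::memcpy(&out, bytes, std::min(size, sizeof out));
}

// Read the four bytes at offsets 12-15, which are padding in this layout.
std::uint32_t PaddingAt12(const CriticalDataSpecial& data) {
    std::uint32_t value;
    std::memcpy(&value, reinterpret_cast<const unsigned char*>(&data) + 12,
                sizeof value);
    return value;
}
```

In this model, the damage, multiplier, and flags survive, because they sit at the same offsets in both layouts. Only the effect, whose offset moved, is lost: its form ID ends up in padding, and the real effect field reads as zero.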

If you re-save the file in the Creation Kit, you'll end up with valid data, in the sense that the data will be adjusted to conform to the version 44 format; but that adjustment occurs after the data has already been misinterpreted by the Creation Kit. It will dutifully write form ID zero, the "none" ID, to our weapon, and if (if!) the form ID we intended is preserved at all, it will be lost in the padding. The final data is in the right format, but it's not what the mod author intended.

Now, CK Fixes by Nukem does fix the specific issue with weapons' critical hit data; it patches the Creation Kit to do a proper conversion, and I strongly encourage installing it for that reason and others. It's important to be aware of the limitations of the tools we use — and of the fixes available for them — and one of those limitations is that the developer-intended workflow for porting mods will only work when dealing with changes and issues that the developer was actually aware of.

Conclusions

Ultimately, version 43 data is not radioactive in the way that much of the community believes. There are many situations where it is definitively safe to have version 43 data in files intended for use in Skyrim Special Edition, and even when version 43 data is incorrect and therefore corrupt, that data is not necessarily guaranteed to obliterate someone's game or save files (though obviously, I strongly encourage mod authors to avoid shipping corrupt data to their users, and I will never say that it's safe to supply corrupt data to the game).

It's common for mod authors and users alike to use scripts that are intended to scan a load order and identify any mods that contain any version 43 data; it's fortunately somewhat less common, but still too common, for users to react to this by immediately contacting the authors of these mods and telling them that they absolutely must resave the mod in the Creation Kit. This is excessive.

If I must give concrete recommendations, they would be...

For authors

For users