The stolen bytes: Visual Studio, virtual methods and data alignment

This article describes a design choice in the C++ ABI of the Visual Studio compiler that I believe should be considered a bug. I propose a trivial workaround at the end.

TL;DR — if the topmost polymorphic class in a hierarchy has members with alignment requirement N where N > sizeof(void *), the Visual Studio compiler may add up to N bytes of useless padding to your objects.

Update: be sure to read the explanation by Jan Gray, who designed the relevant part of the MS C++ ABI some 22 years ago, in the comments section below.

My colleague Benlitz first hit the problem when trying to squeeze memory out of some of our game’s most often instantiated classes. I think it is best illustrated with the following minimal example:

class Foo
{
    virtual void Hello() {}

    float f;     /* 4 bytes */
};
class Bar
{
    virtual void Hello() {}

    float f;     /* 4 bytes */
    double d;    /* 8 bytes */
};

This is the size of Foo and Bar on various 32-bit platforms:

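\begin{tabular}{|l|c|c|c|}
\hline
Platform & \texttt{sizeof(Foo)} & \texttt{sizeof(Bar)} & Madness? \\
\hline
Linux x86 (gcc) & 8 & 16 & no \\
Linux ARMv7 (gcc) & 8 & 16 & no \\
Win32 (gcc) & 8 & 16 & no \\
Win32 (Visual Studio 2010) & 8 & 24 & yes \\
Xbox 360 (Visual Studio 2010) & 8 & 24 & yes \\
PlayStation 3 (gcc) & 8 & 16 & no \\
PlayStation 3 (SNC) & 8 & 16 & no \\
Mac OS X x86 (gcc) & 8 & 16 & no \\
\hline
\end{tabular}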

There is no trick. This is by design. The Visual Studio compiler is literally stealing 8 bytes from us!
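To reproduce the table on your own toolchain, something like the following minimal sketch should do. The two classes are the ones from the example above; `PrintSizes` is a hypothetical helper name of mine, and the numbers it prints depend entirely on the compiler and target you build with.

```cpp
#include <cstddef>
#include <cstdio>

class Foo
{
    virtual void Hello() {}

    float f;     /* 4 bytes */
};

class Bar
{
    virtual void Hello() {}

    float f;     /* 4 bytes */
    double d;    /* 8 bytes */
};

// Hypothetical helper: reports the sizes chosen by the compiler that
// builds this file. Nothing here is specific to one ABI.
void PrintSizes()
{
    std::printf("sizeof(void *) = %zu\n", sizeof(void *));
    std::printf("sizeof(Foo)    = %zu\n", sizeof(Foo));
    std::printf("sizeof(Bar)    = %zu\n", sizeof(Bar));
}
```

Call `PrintSizes()` from `main` and compare the output with the table. On a 64-bit target the pointer is 8 bytes, so the figures will differ from the 32-bit ones listed above.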

What the fuck is happening?

This is the memory layout of Foo on all observed platforms:

\begin{tabular}{|r|llll|llll|}
\hline
byte & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\
\hline
field & \multicolumn{4}{|c|}{\textit{vfptr}}
      & \multicolumn{4}{|c|}{\texttt{float f;}} \\
\hline
\end{tabular}

The vfptr field is a special pointer to the vtable. The vtable is probably the most widespread compiler-specific way to implement virtual methods. Since all the platforms studied here are 32-bit, this pointer requires 4 bytes. A float requires 4 bytes, too. The total size of the class is therefore 8 bytes.

This is the memory layout of Bar on, e.g., Linux using GCC:

\begin{tabular}{|r|llll|llll|llllllll|}
\hline
byte & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7
     & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 \\
\hline
field & \multicolumn{4}{|c|}{\textit{vfptr}}
      & \multicolumn{4}{|c|}{\texttt{float f;}}
      & \multicolumn{8}{|c|}{\texttt{double d;}} \\
\hline
\end{tabular}

The double type has an alignment requirement of 8 bytes, which makes it fit perfectly at byte offset 8.

And finally, this is the memory layout of Bar on Win32 using Visual Studio 2010:

\begin{tabular}{|r|llll|llll|llll|llll|llllllll|}
\hline
byte & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7
     & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15
     & 16 & 17 & 18 & 19 & 20 & 21 & 22 & 23 \\
\hline
field & \multicolumn{4}{|c|}{\textit{vfptr}}
      & \multicolumn{4}{|c|}{\textit{padding}}
      & \multicolumn{4}{|c|}{\texttt{float f;}}
      & \multicolumn{4}{|c|}{\textit{padding}}
      & \multicolumn{8}{|c|}{\texttt{double d;}} \\
\hline
\end{tabular}

This is madness! The requirement for the class to be 8-byte aligned causes the first element of the class to be 8-byte aligned, too! I demand a rational explanation for this design choice.
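If you want to see where the members actually land with your own compiler, here is a small sketch that measures offsets with plain pointer arithmetic. `BarProbe` is a hypothetical copy of Bar with public members (so their addresses can be taken), and `OffsetIn` and `PrintBarLayout` are helper names I made up for the occasion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Same members as Bar, but public so that their addresses can be taken.
struct BarProbe
{
    virtual void Hello() {}

    float f;     /* 4 bytes */
    double d;    /* 8 bytes */
};

// Hypothetical helper: byte offset of a member inside its enclosing object.
template <typename Object, typename Member>
std::size_t OffsetIn(const Object &object, const Member &member)
{
    const char *base = reinterpret_cast<const char *>(&object);
    const char *addr = reinterpret_cast<const char *>(&member);
    // Sanity check: the member must lie entirely inside the object.
    assert(addr >= base && addr + sizeof(Member) <= base + sizeof(Object));
    return static_cast<std::size_t>(addr - base);
}

// Prints the offsets picked by the compiler that builds this file.
void PrintBarLayout()
{
    BarProbe bar;
    std::printf("f at offset %zu\n", OffsetIn(bar, bar.f));
    std::printf("d at offset %zu\n", OffsetIn(bar, bar.d));
    std::printf("sizeof(BarProbe) = %zu\n", sizeof(BarProbe));
}
```

According to the layouts above, a 32-bit GCC build should report f at offset 4 and d at offset 8, while a 32-bit Visual Studio build should report 8 and 16.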

The problem is that the compiler decides to add the vtable pointer after it has aligned the class data, resulting in excessive realignment.
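The difference between the two orderings can be illustrated with a toy model. This is only a sketch under my own assumptions, not the actual algorithm of either compiler: each member is reduced to a (size, alignment) pair, and the names `Member`, `SizeVfptrFirst` and `SizeVfptrLast` are hypothetical. The first function reserves the vfptr and then places the members; the second places the members first and then shifts them to make room for the vfptr while preserving their alignment.

```cpp
#include <cstddef>

// One data member, described only by its size and alignment requirement.
struct Member
{
    std::size_t size;
    std::size_t align;
};

constexpr std::size_t RoundUp(std::size_t value, std::size_t align)
{
    return (value + align - 1) / align * align;
}

// Toy model A: reserve the vfptr first, then put each member at the next
// suitably aligned offset.
constexpr std::size_t SizeVfptrFirst(std::size_t ptr_size,
                                     const Member *members, std::size_t count)
{
    std::size_t offset = ptr_size;
    std::size_t max_align = ptr_size;
    for (std::size_t i = 0; i < count; ++i)
    {
        offset = RoundUp(offset, members[i].align) + members[i].size;
        if (members[i].align > max_align)
            max_align = members[i].align;
    }
    return RoundUp(offset, max_align);
}

// Toy model B: lay out the members as if there were no vfptr, then shift
// the whole block by the vfptr size rounded up to the largest alignment.
constexpr std::size_t SizeVfptrLast(std::size_t ptr_size,
                                    const Member *members, std::size_t count)
{
    std::size_t offset = 0;
    std::size_t max_align = ptr_size;
    for (std::size_t i = 0; i < count; ++i)
    {
        offset = RoundUp(offset, members[i].align) + members[i].size;
        if (members[i].align > max_align)
            max_align = members[i].align;
    }
    const std::size_t shift = RoundUp(ptr_size, max_align);
    return RoundUp(shift + offset, max_align);
}

// Members of Bar, Bar-with-two-floats and Quux as (size, alignment) pairs.
constexpr Member kBar[] = { { 4, 4 }, { 8, 8 } };
constexpr Member kTwoFloats[] = { { 4, 4 }, { 4, 4 }, { 8, 8 } };
constexpr Member kQuux[] = { { 4, 4 }, { 16, 16 } };

// With a 4-byte vfptr the model gives the two sizes from the table.
static_assert(SizeVfptrFirst(4, kBar, 2) == 16, "vfptr placed first");
static_assert(SizeVfptrLast(4, kBar, 2) == 24, "vfptr inserted last");
```

With these assumptions the model yields 16 and 24 bytes for Bar on a 32-bit target, which matches the sizes measured above. With an 8-byte vfptr both functions give 24 for Bar.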

Compilers affected

The Visual Studio compilers for Win32, x64 and Xbox 360 all appear to create spurious padding in classes.

Though this article focuses on 32-bit platforms for the sake of simplicity, 64-bit Windows is affected, too.

The problem becomes even worse with larger alignment requirements, for instance with SSE3 or AltiVec types that require 16-byte storage alignment such as _FP128:

class Quux
{
    virtual void Hello() {}

    float f;     /* 4 bytes */
    _FP128 dd;   /* 16 bytes */
};

This is the GCC memory layout on both 32-bit and 64-bit platforms:

\begin{tabular}{|r|c|c|c|c|c|c|c|c|}
\hline
byte & 0--3 & 4--7 & 8--11 & 12--15
     & 16--19 & 20--23 & 24--27 & 28--31 \\
\hline
\hline
field (32-bit) & \textit{vfptr}
                & \texttt{float f;}
                & \multicolumn{2}{|c|}{\textit{padding}}
                & \multicolumn{4}{|c|}{\texttt{\_FP128 dd;}} \\
\hline
field (64-bit) & \multicolumn{2}{|c|}{\textit{vfptr}}
               & \texttt{float f;}
               & \textit{padding}
               & \multicolumn{4}{|c|}{\texttt{\_FP128 dd;}} \\
\hline
\end{tabular}

The padding there is perfectly normal and expected, because of the alignment requirements for dd.

But this is how Visual Studio decides to lay it out:

\begin{tabular}{|r|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline
byte & 0--3 & 4--7 & 8--11 & 12--15
     & 16--19 & 20--23 & 24--27 & 28--31
     & 32--35 & 36--39 & 40--43 & 44--47 \\
\hline
\hline
field (32-bit) & \textit{vfptr}
               & \multicolumn{3}{|c|}{\textit{padding}}
               & \texttt{float f;}
               & \multicolumn{3}{|c|}{\textit{padding}}
               & \multicolumn{4}{|c|}{\texttt{\_FP128 dd;}} \\
\hline
field (64-bit) & \multicolumn{2}{|c|}{\textit{vfptr}}
               & \multicolumn{2}{|c|}{\textit{padding}}
               & \texttt{float f;}
               & \multicolumn{3}{|c|}{\textit{padding}}
               & \multicolumn{4}{|c|}{\texttt{\_FP128 dd;}} \\
\hline
\end{tabular}

That is 16 lost bytes, both on 32-bit and 64-bit versions of Windows.
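The 16-byte case can also be checked without a platform-specific type. The sketch below swaps in a hypothetical stand-in for _FP128 built with the standard `alignas` specifier; `Wide128` and `PrintQuuxSize` are names I invented, and whether the result is 32 or 48 bytes depends on the compiler you use.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for _FP128: 16 bytes of storage, 16-byte alignment.
struct alignas(16) Wide128
{
    unsigned char bytes[16];
};

class Quux
{
    virtual void Hello() {}

    float f;      /* 4 bytes */
    Wide128 dd;   /* 16 bytes, 16-byte aligned */
};

static_assert(sizeof(Wide128) == 16, "the stand-in should be exactly 16 bytes");
static_assert(alignof(Quux) >= 16, "Quux inherits the alignment of dd");

// Prints the size and alignment chosen by the compiler that builds this file.
void PrintQuuxSize()
{
    std::printf("sizeof(Quux)  = %zu\n", sizeof(Quux));
    std::printf("alignof(Quux) = %zu\n", alignof(Quux));
}
```

Going by the layouts above, GCC should print 32 and Visual Studio 48, for both 32-bit and 64-bit builds.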

Workaround

There is fortunately a workaround if you want to get rid of the useless padding. It is so trivial that it actually makes me angry that the problem exists in the first place.

This will get you your bytes back:

class EmptyBase
{
protected:
    virtual ~EmptyBase() {}
};

class Bar : public EmptyBase
{
    virtual void Hello() {}

    float f;     /* 4 bytes */
    double d;    /* 8 bytes */
};

And this is the size of Bar on the same 32-bit platforms:

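\begin{tabular}{|l|c|}
\hline
Platform & \texttt{sizeof(Bar)} \\
\hline
Linux x86 (gcc) & 16 \\
Linux ARMv7 (gcc) & 16 \\
Win32 (gcc) & 16 \\
Win32 (Visual Studio 2010) & 16 \\
Xbox 360 (Visual Studio 2010) & 16 \\
PlayStation 3 (gcc) & 16 \\
PlayStation 3 (SNC) & 16 \\
Mac OS X x86 (gcc) & 16 \\
\hline
\end{tabular}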

Phew. Sanity restored.

\begin{tabular}{|r|cccc|cccc|cccccccc|}
\hline
byte & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7
     & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 \\
\hline
\texttt{EmptyBase} fields & \multicolumn{4}{|c|}{\textit{vfptr}}
                          & \multicolumn{12}{|c|}{} \\
\hline
\texttt{Bar} fields & \multicolumn{4}{c}{\texttt{EmptyBase}}
                    & \multicolumn{4}{|c|}{\texttt{float f;}}
                    & \multicolumn{8}{|c|}{\texttt{double d;}} \\
\hline
\end{tabular}

The compiler is a lot less confused now: it no longer has to create space for a vfptr in Bar since it is technically already part of EmptyBase.
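A quick way to see whether the workaround buys anything on a given compiler is to build both variants side by side. In this sketch `BarPlain` and `BarFixed` are hypothetical names for the two versions of Bar, and the `static_assert` encodes my assumption that the workaround never makes the class larger on the ABIs discussed here.

```cpp
#include <cstddef>
#include <cstdio>

// Original version: Bar introduces the vfptr itself.
class BarPlain
{
    virtual void Hello() {}

    float f;     /* 4 bytes */
    double d;    /* 8 bytes */
};

class EmptyBase
{
protected:
    virtual ~EmptyBase() {}
};

// Workaround version: the vfptr is inherited from EmptyBase.
class BarFixed : public EmptyBase
{
    virtual void Hello() {}

    float f;     /* 4 bytes */
    double d;    /* 8 bytes */
};

// Assumption: inheriting the vfptr should never cost extra space.
static_assert(sizeof(BarFixed) <= sizeof(BarPlain),
              "the workaround made the class larger");

// Prints how many bytes the workaround saves with the current compiler.
void PrintWorkaroundSavings()
{
    std::printf("sizeof(BarPlain) = %zu\n", sizeof(BarPlain));
    std::printf("sizeof(BarFixed) = %zu\n", sizeof(BarFixed));
    std::printf("bytes saved      = %zu\n", sizeof(BarPlain) - sizeof(BarFixed));
}
```

On compilers that were not affected in the first place, the two sizes should simply be equal and the function reports zero bytes saved.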

Conclusion

Lessons learned:

  • The pointer to the vtable isn’t just like any other pointer.
  • Various C++ ABIs have different stances on padding and alignment.
  • Inheriting from an empty abstract class can make your objects smaller on Windows and Xbox 360!
  • Design decisions can haunt you for decades!

The workaround is so simple that it sounds like a good idea to always use it, preemptively.

Comments

1. capisce -- 2012-10-21 12:22

In what way is 64-bit Windows affected? I'd assume that sizeof(Bar) would be 24 bytes with both gcc and visual studio 64-bit, since the vtable pointer presumably will be 8 bytes then, and 4 bytes of padding will be added for the float?

I just verified that sizeof(Bar) will be 24 bytes with gcc targeting x86-64. Do you mean that sizeof(Bar) is greater than 24 bytes with visual studio 64-bit?

2. capisce -- 2012-10-21 12:25

Ok, I guess you mean 64-bit is affected when it comes to SSE3 or AltiVec types with 16-byte alignment.

3. sam -- 2012-10-21 12:26

@capisce yes, that is what I meant. I will add a short example to make it clear.

4. anonymous -- 2012-10-21 13:30

And what happens when you use #pragma pack?

5. sampsa.lehtonen@iki.fi -- 2012-10-21 13:36

"Apparently the compiler is a lot less confused now that the vtable pointer is technically no longer part of Bar ."

Deriving from another virtual class does not make the vtable pointer go away - of course the derived class has its own vtable!

6. sam -- 2012-10-21 13:40

@anonymous: when using #pragma pack the issue disappears since there are no alignment constraints any more. But this causes a whole lot of other problems with potentially illegal unaligned reads. Packing class members is definitely not acceptable when using AltiVec, for instance.

Note that manually tailoring the class so that all members lie at properly aligned offsets within the object is not a solution either. There is no guarantee that the object will be allocated at a properly aligned address, especially on the stack.

7. sam -- 2012-10-21 13:43

@sampsa.lehtonen: check the sentence; I am saying that the vtable pointer is technically no longer part of Bar because it is part of EmptyBase, not because I somehow believe it went away.

8. anonymous -- 2012-10-21 17:26

I can't believe anyone cares about saving such a tiny amount of memory in 2012.

9. anonymous -- 2012-10-21 18:00

Make that 'I can't believe console manufacturers can get away with putting such a shitty amount of RAM in consoles.'. Even a few bytes per object can add up if you have lots of objects, and do note it scales with the amount of variables on the class.

10. anonymous2 -- 2012-10-21 18:11

@anonymous Maybe you can't believe that because you have no idea what you're writing about? If some class is instantiated a few million times, 8 bytes of savings per instance can quickly add up to something substantial, in particular on video game consoles or mobile devices. Less padding might in turn lead to further savings by the memory allocator and may also improve CPU cache access patterns.

11. RuTT -- 2012-10-21 18:24

@anonymous/17:26

Can't tell if your comment is negative or positive, but thanks to such guys who care about saving such a tiny amount of memory, we have incredible games running on ~7 years-old, totally obsolete hardware.

Great things start from tiny beginnings.

12. sam -- 2012-10-21 18:27

@anonymous: modern desktop CPUs have a typical L1 data cache size of about 64 KiB. The latest iPad and iPhone have an L1 data cache of 32 KiB. The PlayStation 3 SPUs have a local memory of 256 KiB.

If that doesn’t teach you why 8 or 16 bytes per object can matter, study more.

13. ponce -- 2012-10-21 19:08

@sam:

"Note that manually tailoring the class so that all members lie at properly aligned offsets within the object is not a solution either. There is no guarantee that the object will be allocated at a properly aligned address, especially on the stack."

I'm not sure I understand where the workaround could fail.

  • #pragma pack (or equivalent) and manual padding can be used to ensure a memory layout.
  • I think the alignment of the object base address (vs field relative address) is a separate problem. The base address needs to be aligned anyway, doesn't it?

That said, it is indeed unfortunate that MSVC does something like that.

14. ponce -- 2012-10-21 19:15

Er... never mind, you already have a work-around.

15. @jangray -- 2012-10-21 19:32

I wrote the object layout code in the MS C++ compiler, ~22 years ago. Mea culpa.

Here is a "rational explanation" that addresses when you might need to worry about this.

If I recall correctly, the layout of Bar reveals an unfortunate layout phase ordering choice: a vfptr is inserted into a class layout (if required and not inheritable from a base class) only after the new data members have been laid out, respecting their natural alignment, and so as to still preserve natural alignment. This explains class Bar: after data member layout, we have f@0 and d@8 (naturally aligned); then inserting a vfptr (and keeping d 8B aligned) we have vfptr@0, f@8, d@16.

So you may rarely get this extra alignment padding if the class 1) introduces (not inherits) a vfptr or vbptr; 2) has a naturally aligned data member longer than a vfptr that 3) is preceded by shorter data member(s) that otherwise could have fit between vfptr and (2), keeping (2) naturally aligned. So Bar above has extra padding, but Bar2 below does not:

class Bar2 { virtual void Hello(); float f; float f2; double d; };

It should be the same size (24B) across all cited 32b and 64b implementations.

Unfortunate, but that's what was shipped all those long years ago (16b/32b). ABIs being what they are this cannot be changed as the default layout. I’m not sure what transpired with the more recent x64 ABI except that 64b vfptrs mean this is only of concern if using 128b+ data members (and _FP128 is itself an embedded _CRT_ALIGN’d struct, yes?).

The suggested workaround is one way to mitigate the extra padding. I applaud your attention to memory footprint optimization. I would not encourage its indiscriminate use, however, nor see it adopted as a cargo cult "I don't know what it does but it is supposed to make objects smaller".

cl /d1reportAllClassLayout can be instructive.

See also http://www.openrce.org/articles/files/jangrayhood.pdf.

Thank you.

(Not employed by nor speaking for Microsoft, etc.)

16. sam -- 2012-10-21 22:39

@jangray Thanks a lot for taking some of your time to share this part of history with us!

Your explanation is consistent with our observations. So, the fact that the vfptr is inserted last means in practice that the first member of the class is always aligned at an N-byte boundary (where N is the maximum alignment requirement of the class members) even though it might not require that alignment.

Your Bar2 class is a good example where that padding is not problematic, since some padding would have been necessary at some point anyway:

\begin{tabular}{|r|llll|llll|llll|llll|llllllll|}
\hline
byte & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7
     & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15
     & 16 & 17 & 18 & 19 & 20 & 21 & 22 & 23 \\
\hline
\hline
field (gcc) & \multicolumn{4}{|c|}{\textit{vfptr}}
            & \multicolumn{4}{|c|}{\texttt{float f;}}
            & \multicolumn{4}{|c|}{\texttt{float f2;}}
            & \multicolumn{4}{|c|}{\textit{padding}}
            & \multicolumn{8}{|c|}{\texttt{double d;}} \\
\hline
field (MSVC) & \multicolumn{4}{|c|}{\textit{vfptr}}
             & \multicolumn{4}{|c|}{\textit{padding}}
             & \multicolumn{4}{|c|}{\texttt{float f;}}
             & \multicolumn{4}{|c|}{\texttt{float f2;}}
             & \multicolumn{8}{|c|}{\texttt{double d;}} \\
\hline
\end{tabular}

It’s unfortunate that the x64 and Xbox 360 ABIs weren’t “fixed” when the opportunity was given… but well, it’s not a big deal when one knows how to avoid it.

17. anonymous -- 2012-10-23 08:33

What if i told u "Inheriting from empty abstract class could make your objects small" - simply brilliant

18. Referencing quote -- 2013-12-14 15:35

“What if I told you inheriting from an empty abstract class could make your objects smaller”

— Morpheus-sam, 2012.
