Jul 20

Table-driven methods with no relocations

The other day, someone in the qt-interest mailing list posted the link to an article entitled: How to make fewer errors at the stage of code writing, the third part in a series of articles about writing better code. In this particular article, the author ran a static code-checker tool on the Qt source code and found some mistakes.

One of his suggestions to avoid mistakes was to use table-driven methods as a way to avoid mistakes, like the following:

struct {
  QSysInfo::WinVersion; m_ver;
  const char *m_str;
} Table_WinVersionToString[] = {
  { WV_Me,   "WinMe" },
  { WV_95,   "Win95" },
  { WV_98,   "Win98" },
  { WV_NT,   "WinNT" },
  { WV_2000, "Win2000" },
  { WV_2003, "Win2003" },
  { WV_XP,   "WinXP" },
  { WV_VISTA,"WinVista" }

Unfortunately, I see two problems with that solution. The first is that it does not prevent the mistake that the author found anymore than the original solution did (the mistake was to test QSysInfo::WV_2000 twice, trying to return “Win2003″ for the second test). A better solution, making this fool-proof, would not use a table, but an array. However, this would require that the values in the QSysInfo::WinVersion were sequential. I’d instead write the code as:

static const char *DOS_based_names[] =
    { 0, "Win32s", "Win95", "Win98", "WinMe" };
static const char *NT_based_names[] =
    { 0, "WinNT", "Win2000", "WinXP", "Win2003", "WinVista", "Win7" };
if (ver & QSysInfo::WV_DOS_based)
    ret = QLatin1String(DOS_based_names[ver]);
else if (ver & QSysInfo::WV_NT_based)
    ret = QLatin1String(NT_based_names[ver >> 4]);

This code above also uses a table-driven method and is more secure against typing mistakes. It could use some extra checks to see if the versions IDs are in the table, if that’s possible to happen by construction. However, it doesn’t solve the second problem: relocations.

Whenever you have an array like the ones above (mine and the article’s author’s) which contain pointers, the array must be initialised in memory before the code is run. Since on libraries on Unix operating systems are Position-independent code, the actual pointer addresses aren’t known at library link time. They are only known once the library is loaded into memory, which means the dynamic linker must “patch up” the memory — that’s a relocation. For more information into what they are, I suggest reading my previous two blogs on relocation and position-independence: Moving code around and Moving code around more easily.

What I want to do here is to avoid having relocations at all. Having no relocations in my table means it can be stored in a read-only, sharable section of memory. With that, we have a faster load time of the library and we use less memory per application loading the library. So how can we have a table containing strings without relocations?

Back in 2006, when I wrote QtDBus, I faced a similar problem for the QDBusError class: I needed a way to map the enum containing the error codes to the error names. Inspired by the output from moc, the solution I found was to write a small Perl script that generated one very long string containing NULs (“\0″) and an array of indices into that table.

So instead of having a table like:

static const char *DOS_based_names[] =
    { 0, "Win32s", "Win95", "Win98", "WinMe" };

We’d instead have:

static const char DOS_based_names_string[] =
static const int DOS_based_names_indices[] = {
       0,    1,    8,   14,   20,   -1
ret = QLatin1String(DOS_based_names_string + DOS_based_names_indices[ver]);

The code above generates no relocations in the shared library and both the string and the integers can be safely stored in the .rodata section. Depending on the architecture, the amount of extra code can be trivial or not (see the end of the post for some assembly listings).

I found this technique so useful I applied it in many places in KDE. I even added a script to kdesdk called generate_string_table.pl that generate string tables. In fact, I generated the table above by running:

perl generate_string_table.pl DOS_based_names <<<'

The script works fine for arrays, but the principle is the same for an associative table. Instead of creating an array of ints, you’d create an array of structs mapping the original type to the offset in the string. Writing such a script is left as an exercise for the reader.

Assembly listings

ArchOld (array of pointers) New (array of offsets)
  ; ver = %eax; output = %eax; PIC register = %ecx
  movl    DOS_based_names@GOTOFF(%ecx,%eax,4), %eax
  ; ver = %eax; output = %edx; PIC register = %ecx
  leal    DOS_based_names_string@GOTOFF(%ecx), %edx
  addl    DOS_based_names_indices@GOTOFF(%ecx,%eax,4), %edx
  ; ver = %edi; output = %rax; PC-relative addressing
  leaq    DOS_based_names(%rip), %rax
  movslq  %edi, %rdi
  movq    (%rax,%rdi,8), %rax
  ; ver = %edi; output = %rax; PC-relative addressing
  leaq    DOS_based_names_indices(%rip), %rax
  movslq  %edi, %rdi
  movslq  (%rax,%rdi,4), %rdx
  leaq    DOS_based_names_string(%rip), %rax
  leaq    (%rdx,%rax), %rax
  ; in = r0, output = r0; PC-relative addressing
  ldr     r1, .L2
  add     r3, pc, r1
  ldr     r0, [r3, r0, asl #2]
  .word   DOS_based_names-(.LPIC0+8)
  ; in = r0, output = r0; PC-relative addressing
  ldr     r2, .L5
  add     r3, pc, r2
  add     r1, r3, r0, asl #2
  ldr     r0, [r1, #28]
  add     r0, r3, r0
  .word   DOS_based_names_string-(.LPIC1+8)
  ; in = $4; output = $2; PIC register = $28
  lw      $2,%got(DOS_based_names)($28)
  sll     $4,$4,2
  addiu   $2,$2,%lo(DOS_based_names)
  addu    $4,$4,$2
  lw      $2,0($4)
  ; in = $4; output = $2; PIC register = $28
  lw      $2,%got(DOS_based_names_indices)($28)
  sll     $4,$4,2
  addiu   $2,$2,%lo(DOS_based_names_indices)
  addu    $4,$4,$2
  lw      $2,%got(DOS_based_names_string)($28)
  lw      $3,0($4)
  addiu   $2,$2,%lo(DOS_based_names_string)
  addu    $2,$2,$3
  ; ver = in0; output = r8; PIC register = r1
  addl r14 = @gprel(DOS_based_names#), r1
  sxt4 in0 = in0
  shladd r14 = in0, 3, r14
  ld8 r8 = [r14]
  ; ver = in0; output = r8; PIC register = r1
  addl r14 = @gprel(DOS_based_names_indices#), r1
  sxt4 in0 = in0
  addl r8 = @gprel(DOS_based_names_string#), r1
  shladd r14 = in0, 2, r14
  ld4 r14 = [r14]
  sxt4 r14 = r14
  add r8 = r8, r14

If we change the int types to long in the 64-bit platforms, we get:

ArchOld (array of pointers) New (array of offsets)
  ; ver = %edi; output = %rax; PC-relative addressing
  leaq    DOS_based_names(%rip), %rax
  movq    (%rax,%rdi,8), %rax
  ; ver = %edi; output = %rax; PC-relative addressing
  leaq    DOS_based_names_indices(%rip), %rdx
  leaq    DOS_based_names_string(%rip), %rax
  addq    (%rdx,%rdi,8), %rax
  ; ver = in0; output = r8; PIC register = r1
  addl r14 = @ltoffx(DOS_based_names#), r1
  shladd r14 = in0, 3, r14
  ld8 r8 = [r14]
  ; ver = in0; output = r8; PIC register = r1
  addl r14 = @ltoffx(DOS_based_names_indices#), r1
  addl r8 = @ltoffx(DOS_based_names_string#), r1
  shladd r14 = in0, 3, r14
  ld8 r14 = [r14]
  add r8 = r8, r14


  1. avatar

    Hi, Thiago. There is a syntax error in line 2. Seems like it is still bound to mistakes (hoho ;-D)

  2. avatar

    What if DOS_based_names[] would be rather some static const char * const (emph on last const, so no longer a variable, but an anonymous symbol(?)), how much would this be different then?
    Is the usual compiler not smart enough to put all the strings next to each other, without the need to do a relocation for each array entry?

  3. avatar
    Thiago Macieira

    @Jan-Arve: which line 2?

  4. avatar
    Thiago Macieira

    @Friedrich: it depends on how it’s accessed. If you access it as an array, then the contents of the array must exist somewhere and then the relocations come into place again.

    The strings are next to each other, but each value in the array is a pointer to a specific position, so each one is a relocation. And even if only the first one were, the array would still be in .data (or .data.rel.ro).

  5. avatar

    Thiago: I guess Jan-Arve means an extra semi-colon on line 2 in the first code block.

    QLocale uses the same technique for storing locale data for all supported locales (see QTDIR/src/corelib/tools/qlocale_data_p.h) to ensure there are no relocations. (Thiago, you were the one suggested to do that, I am pointing that out to readers :)

    Btw, with Qt5 it is also possible to put QString objects in .rodata section, could be a great example for you to add to store those “dos names” strings as qstrings.

  6. avatar
    Thiago Macieira

    Yeah, an array of QStringLiteral results would be nice, but I doubt we’ll be able to do that…

  7. avatar
    Sune Vuorela

    Isn’t this what moc is generating for metaEnums ?

  8. avatar
    Frerich Raabe

    Jan-Arve: The syntax error in QSysInfo::WinVersion; m_ver; is also in the article from which Thiago quoted; I guess a little

    // sic!

    comment wouldn’t have hurt.

  9. avatar
    Frerich Raabe

    I also used the same technique for avoiding relocations; I think I learned them from the famous How To Write Shared Libraries paper by Ulrich Drepper.

    However, especially when I’m accessing the array in multiple places, it’s annoying to clutter the code with

    ret = QLatin1String(DOS_based_names_string + DOS_based_names_indices[ver]);

    Instead, I use a tiny wrapper class which wraps the two arrays with a nice operator[] implementation, like this:

    class DOS_based_names_array
        inline const char *operator[]( size_t idx ) const {
            assert( idx >= 0 );
            assert( idx <= sizeof( DOS_based_names_indices ) / sizeof( *DOS_based_names_indices ) );
            return DOS_based_names_string + DOS_based_names_indices[idx];
    DOS_based_names_array DOS_based_names;

    That way, I can access the array like

    printf( "%s\n", DOS_based_names[2] );

    Maybe it would even be possible to define the string/index arrays inside the class without too much of a performance impact.

  10. avatar
    The User

    Hehe, whenever I see assembly at Planet KDE I know who the author is. ;)

  11. avatar

    And it’s the same moc does with QMetaObject signal/slot/property names and now it’s clear why. Great informative blog. Thanks :)

  12. avatar
    Thiago Macieira

    @Sune: Yes, in fact, the entire output of moc is one very long string with NULs and an integer table. However, you may notice on the QMetaObject, QMetaEnum, etc. API that they are based around char arrays, not QStrings.

  13. avatar

    Your code may be more robust against mistakes (or rather a certain kind of mistakes) but it hardly is more readable. The first example immediately makes it clear that it does some kind of mapping. The second requires one to spend considerably more time figuring out what it does. This may be a matter of 2s vs 5s, but when browsing through a lot of code this sums up.

    Example 1 also presents the mapping in a far more structured way which makes wrong mappings obvious whereas your example rather hides them.

  14. avatar
    Thiago Macieira

    Reading generated code is really not recommended. This is supposed to be a generated string table because it’s more efficient than the one you can write by hand.

  15. avatar
    Olivier Goffart

    So I tried to generate the string table at compile time using C++0x.
    I only managed once i read the IndexList trick in your next article.

    So here it is: the lookup table that is not generated from a perl script, but using some template magic:

    The generated x86_64 assembly is exactly the same as your array of offset version.

    The next step would be to generating perfect hashes at compile time.

Comments have been disabled.

Page optimized by WP Minify WordPress Plugin