The other day, someone in the qt-interest mailing list posted the link to an article entitled: How to make fewer errors at the stage of code writing, the third part in a series of articles about writing better code. In this particular article, the author ran a static code-checker tool on the Qt source code and found some mistakes.
One of his suggestions to avoid mistakes was to use table-driven methods as a way to avoid mistakes, like the following:
struct { QSysInfo::WinVersion; m_ver; const char *m_str; } Table_WinVersionToString[] = { { WV_Me, "WinMe" }, { WV_95, "Win95" }, { WV_98, "Win98" }, { WV_NT, "WinNT" }, { WV_2000, "Win2000" }, { WV_2003, "Win2003" }, { WV_XP, "WinXP" }, { WV_VISTA,"WinVista" } };
Unfortunately, I see two problems with that solution. The first is that it does not prevent the mistake that the author found anymore than the original solution did (the mistake was to test QSysInfo::WV_2000 twice, trying to return “Win2003″ for the second test). A better solution, making this fool-proof, would not use a table, but an array. However, this would require that the values in the QSysInfo::WinVersion were sequential. I’d instead write the code as:
static const char *DOS_based_names[] = { 0, "Win32s", "Win95", "Win98", "WinMe" }; static const char *NT_based_names[] = { 0, "WinNT", "Win2000", "WinXP", "Win2003", "WinVista", "Win7" }; if (ver & QSysInfo::WV_DOS_based) ret = QLatin1String(DOS_based_names[ver]); else if (ver & QSysInfo::WV_NT_based) ret = QLatin1String(NT_based_names[ver >> 4]);
This code above also uses a table-driven method and is more secure against typing mistakes. It could use some extra checks to see if the versions IDs are in the table, if that’s possible to happen by construction. However, it doesn’t solve the second problem: relocations.
Whenever you have an array like the ones above (mine and the article’s author’s) which contain pointers, the array must be initialised in memory before the code is run. Since on libraries on Unix operating systems are Position-independent code, the actual pointer addresses aren’t known at library link time. They are only known once the library is loaded into memory, which means the dynamic linker must “patch up” the memory — that’s a relocation. For more information into what they are, I suggest reading my previous two blogs on relocation and position-independence: Moving code around and Moving code around more easily.
What I want to do here is to avoid having relocations at all. Having no relocations in my table means it can be stored in a read-only, sharable section of memory. With that, we have a faster load time of the library and we use less memory per application loading the library. So how can we have a table containing strings without relocations?
Back in 2006, when I wrote QtDBus, I faced a similar problem for the QDBusError class: I needed a way to map the enum containing the error codes to the error names. Inspired by the output from moc, the solution I found was to write a small Perl script that generated one very long string containing NULs (“\0″) and an array of indices into that table.
So instead of having a table like:
static const char *DOS_based_names[] = { 0, "Win32s", "Win95", "Win98", "WinMe" };
We’d instead have:
static const char DOS_based_names_string[] = "\0" "Win32s\0" "Win95\0" "Win98\0" "WinMe\0" "\0"; static const int DOS_based_names_indices[] = { 0, 1, 8, 14, 20, -1 }; ret = QLatin1String(DOS_based_names_string + DOS_based_names_indices[ver]);
The code above generates no relocations in the shared library and both the string and the integers can be safely stored in the .rodata section. Depending on the architecture, the amount of extra code can be trivial or not (see the end of the post for some assembly listings).
I found this technique so useful I applied it in many places in KDE. I even added a script to kdesdk called generate_string_table.pl that generate string tables. In fact, I generated the table above by running:
perl generate_string_table.pl DOS_based_names <<<' Win32s Win95 Win98 WinMe'
The script works fine for arrays, but the principle is the same for an associative table. Instead of creating an array of ints, you’d create an array of structs mapping the original type to the offset in the string. Writing such a script is left as an exercise for the reader.
Assembly listings
Arch | Old (array of pointers) | New (array of offsets) |
---|---|---|
x86 | ; ver = %eax; output = %eax; PIC register = %ecx movl DOS_based_names@GOTOFF(%ecx,%eax,4), %eax | ; ver = %eax; output = %edx; PIC register = %ecx leal DOS_based_names_string@GOTOFF(%ecx), %edx addl DOS_based_names_indices@GOTOFF(%ecx,%eax,4), %edx |
x86‑64 | ; ver = %edi; output = %rax; PC-relative addressing leaq DOS_based_names(%rip), %rax movslq %edi, %rdi movq (%rax,%rdi,8), %rax | ; ver = %edi; output = %rax; PC-relative addressing leaq DOS_based_names_indices(%rip), %rax movslq %edi, %rdi movslq (%rax,%rdi,4), %rdx leaq DOS_based_names_string(%rip), %rax leaq (%rdx,%rax), %rax |
ARM | ; in = r0, output = r0; PC-relative addressing ldr r1, .L2 .LPIC0: add r3, pc, r1 ldr r0, [r3, r0, asl #2] .L2: .word DOS_based_names-(.LPIC0+8) | ; in = r0, output = r0; PC-relative addressing ldr r2, .L5 .LPIC1: add r3, pc, r2 add r1, r3, r0, asl #2 ldr r0, [r1, #28] add r0, r3, r0 .L5: .word DOS_based_names_string-(.LPIC1+8) |
MIPS | ; in = $4; output = $2; PIC register = $28 lw $2,%got(DOS_based_names)($28) sll $4,$4,2 addiu $2,$2,%lo(DOS_based_names) addu $4,$4,$2 lw $2,0($4) | ; in = $4; output = $2; PIC register = $28 lw $2,%got(DOS_based_names_indices)($28) sll $4,$4,2 addiu $2,$2,%lo(DOS_based_names_indices) addu $4,$4,$2 lw $2,%got(DOS_based_names_string)($28) lw $3,0($4) addiu $2,$2,%lo(DOS_based_names_string) addu $2,$2,$3 |
IA‑64 | ; ver = in0; output = r8; PIC register = r1 addl r14 = @gprel(DOS_based_names#), r1 sxt4 in0 = in0 ;; shladd r14 = in0, 3, r14 ;; ld8 r8 = [r14] | ; ver = in0; output = r8; PIC register = r1 addl r14 = @gprel(DOS_based_names_indices#), r1 sxt4 in0 = in0 addl r8 = @gprel(DOS_based_names_string#), r1 ;; shladd r14 = in0, 2, r14 ;; ld4 r14 = [r14] ;; sxt4 r14 = r14 ;; add r8 = r8, r14 |
If we change the int types to long in the 64-bit platforms, we get:
Arch | Old (array of pointers) | New (array of offsets) |
---|---|---|
x86‑64 | ; ver = %edi; output = %rax; PC-relative addressing leaq DOS_based_names(%rip), %rax movq (%rax,%rdi,8), %rax | ; ver = %edi; output = %rax; PC-relative addressing leaq DOS_based_names_indices(%rip), %rdx leaq DOS_based_names_string(%rip), %rax addq (%rdx,%rdi,8), %rax |
IA‑64 | ; ver = in0; output = r8; PIC register = r1 addl r14 = @ltoffx(DOS_based_names#), r1 ;; shladd r14 = in0, 3, r14 ;; ld8 r8 = [r14] | ; ver = in0; output = r8; PIC register = r1 addl r14 = @ltoffx(DOS_based_names_indices#), r1 addl r8 = @ltoffx(DOS_based_names_string#), r1 ;; shladd r14 = in0, 3, r14 ;; ld8 r14 = [r14] ;; add r8 = r8, r14 |
15 comments
Jan-Arve
July 20, 2011 at 21:03 UTC (UTC 0)
Hi, Thiago. There is a syntax error in line 2. Seems like it is still bound to mistakes (hoho ;-D)
Friedrich
July 20, 2011 at 21:40 UTC (UTC 0)
What if DOS_based_names[] would be rather some
static const char * const
(emph on last const, so no longer a variable, but an anonymous symbol(?)), how much would this be different then?Is the usual compiler not smart enough to put all the strings next to each other, without the need to do a relocation for each array entry?
Thiago Macieira
July 20, 2011 at 21:54 UTC (UTC 0)
@Jan-Arve: which line 2?
Thiago Macieira
July 20, 2011 at 21:56 UTC (UTC 0)
@Friedrich: it depends on how it’s accessed. If you access it as an array, then the contents of the array must exist somewhere and then the relocations come into place again.
The strings are next to each other, but each value in the array is a pointer to a specific position, so each one is a relocation. And even if only the first one were, the array would still be in .data (or .data.rel.ro).
ddenis
July 20, 2011 at 23:08 UTC (UTC 0)
Thiago: I guess Jan-Arve means an extra semi-colon on line 2 in the first code block.
QLocale uses the same technique for storing locale data for all supported locales (see QTDIR/src/corelib/tools/qlocale_data_p.h) to ensure there are no relocations. (Thiago, you were the one suggested to do that, I am pointing that out to readers
Btw, with Qt5 it is also possible to put QString objects in .rodata section, could be a great example for you to add to store those “dos names” strings as qstrings.
Thiago Macieira
July 21, 2011 at 02:15 UTC (UTC 0)
Yeah, an array of QStringLiteral results would be nice, but I doubt we’ll be able to do that…
Sune Vuorela
July 21, 2011 at 02:35 UTC (UTC 0)
Isn’t this what moc is generating for metaEnums ?
Frerich Raabe
July 21, 2011 at 06:40 UTC (UTC 0)
Jan-Arve: The syntax error in
QSysInfo::WinVersion; m_ver;
is also in the article from which Thiago quoted; I guess a littlecomment wouldn’t have hurt.
Frerich Raabe
July 21, 2011 at 06:55 UTC (UTC 0)
I also used the same technique for avoiding relocations; I think I learned them from the famous How To Write Shared Libraries paper by Ulrich Drepper.
However, especially when I’m accessing the array in multiple places, it’s annoying to clutter the code with
Instead, I use a tiny wrapper class which wraps the two arrays with a nice
operator[]
implementation, like this:That way, I can access the array like
Maybe it would even be possible to define the string/index arrays inside the class without too much of a performance impact.
The User
July 21, 2011 at 10:21 UTC (UTC 0)
Hehe, whenever I see assembly at Planet KDE I know who the author is.
sebsauer
July 21, 2011 at 10:24 UTC (UTC 0)
And it’s the same moc does with QMetaObject signal/slot/property names and now it’s clear why. Great informative blog. Thanks
Thiago Macieira
July 21, 2011 at 11:45 UTC (UTC 0)
@Sune: Yes, in fact, the entire output of moc is one very long string with NULs and an integer table. However, you may notice on the QMetaObject, QMetaEnum, etc. API that they are based around char arrays, not QStrings.
mb
July 22, 2011 at 21:57 UTC (UTC 0)
Your code may be more robust against mistakes (or rather a certain kind of mistakes) but it hardly is more readable. The first example immediately makes it clear that it does some kind of mapping. The second requires one to spend considerably more time figuring out what it does. This may be a matter of 2s vs 5s, but when browsing through a lot of code this sums up.
Example 1 also presents the mapping in a far more structured way which makes wrong mappings obvious whereas your example rather hides them.
Thiago Macieira
July 23, 2011 at 01:05 UTC (UTC 0)
Reading generated code is really not recommended. This is supposed to be a generated string table because it’s more efficient than the one you can write by hand.
Olivier Goffart
July 23, 2011 at 22:50 UTC (UTC 0)
So I tried to generate the string table at compile time using C++0x.
I only managed once i read the IndexList trick in your next article.
So here it is: the lookup table that is not generated from a perl script, but using some template magic:
http://paste.kde.org/100651/
The generated x86_64 assembly is exactly the same as your array of offset version.
The next step would be to generating perfect hashes at compile time.