Jan 10

Architectures and ABIs detailed

Yesterday I wrote about instruction set and ABI manuals. Today I’d like to go into detail about the ABIs I listed there. I did this mostly as a summary for myself: it’s tiresome to search for the information in the manuals, especially since some of them are PDFs without links. For example, I can never remember the order of the registers used for parameter passing on x86-64. So what you’ll find here is a listing of what I found interesting for when I might need to read or write assembly code.

As a bonus for you, dear reader, I added a few words about each platform.

First, a summary with numbers.

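| | x86 | x86-64 | IA-64 | AArch32 | AArch64 | MIPS | POWER | SPARC |
|---|---|---|---|---|---|---|---|---|
| Endian | little | little | both | both | both | both | big | big |
| Instruction width (bits) | 8 to 120 | 8 to 120 | 41 or 82 | 16 or 32 | 32 | 16 or 32 | 32 | 32 |
| # general-purpose registers | 8 | 16 | 128 (a) | 16 (b) | 32 | 32 | 32 | 32 (a) |
| GPR width in bits | 32 | 64 | 64+1 (c) | 32 | 64 | 32 / 64 | 32 / 64 | 32 / 64 |
| # special GPRs, architecturally + ABI | 1 + 0 | 1 + 0 | 1 + 3 | 1 + 1 | 1 + 2 | 1 + 3 | 1 + 1 | 2 + 8 |
| # GPRs used in parameter passing | 0 | 4 or 6 (d) | 8 | 4 | 8 | 4 or 8 (e) | 8 | 6 |
| # scratch GPRs (f) | 3 | 7 or 9 (d) | 24+96 | 5 | 18 | 18 or 17 (e) | 11 | 7+9+7 |
| # saved GPRs (f) | 4 | 8 or 6 (d) | 7+96 | 8 | 10 | 9 or 10 (e) | 20 | 15 |
| # floating-point registers | 8+8 | 8+16 | 128 | 16 or 32 | 32 | 32 | 32 | 32 |
| FPR width in bits | 80 / 128 (g) | 80 / 128 (g) | 82 | 64 | 128 | 32 or 64 | 64 | 64 |
| # FPRs used in parameter passing | 0+0 | 4 or 8 (d) | 8 | 8 | 8 | 2 or 8 (e) | 8 | 0 |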

Notes:

  (a) 128 registers on Itanium can be accessed, but the processor has a minimum of 144 and can have more; for SPARC, 32 registers can be accessed, but the processor has anywhere between 64 and 528
  (b) in Thumb mode, some instructions can only access 8 of the 16 registers
  (c) the extra bit, called the Not-A-Thing (NAT) bit, is only used in some special circumstances
  (d) the first number applies to Windows, the second number applies to Unix systems
  (e) the first number applies to o32 and o64; the second number applies to n32 and n64
  (f) “scratch” registers are those that a function may overwrite and need not save, also known as “caller-saved”, including the registers used for parameter passing; “saved” registers are those that a function must save before using; the concept does not apply directly to the rotating registers found on the Itanium and on SPARC (see below)
  (g) the 387 registers are 80 bits wide and the SSE registers are at least 128 bits wide; they have been extended to 256 bits with the AVX extensions

Details

i386 or x86 or IA-32

The x86 architecture is the oldest under consideration and its age shows. The 32-bit architecture debuted with the Intel 80386 (whence the name “i386”) in 1985. It built on the 16-bit Intel 8086 assembly by, among other things, widening the registers to 32 bits. This architecture is still in use today and even modern processors like my Intel® Core™ i7-2620M (Sandy Bridge) boot into 8086 real mode. I have some applications running on my Linux system that are still i386 (like Skype).

The name x86 comes from the fact that the 80386 (family 3) was followed by the 80486 (family 4), the Pentium (family 5) and the Pentium Pro (P6 architecture, family 6). Some Linux distributions compile their packages for the later families, so you’ll find .i586.rpm and .i686.rpm too. The name IA-32 means “Intel Architecture, 32-bit” and was created to distinguish it from IA-64.

The instructions on x86 have variable lengths, anywhere from 1 to 15 bytes, usually averaging between 3 and 5 bytes, which makes the code density around 4 instructions per 16 bytes. Because an instruction can start at any byte, jump and call targets can use all 32 bits of the address space. For performance and ABI reasons, jump targets and functions are usually aligned to 16 bytes (the ABI requires the low 3 bits to be clear for C++ member functions).
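As a back-of-the-envelope check of the density figure, here is a small sketch in Python. The function names are mine and purely illustrative; the only inputs are the average instruction lengths and the 16-byte alignment mentioned above.

```python
def instructions_per_block(avg_len_bytes, block_bytes=16):
    """Average number of instructions that fit in one block of code."""
    return block_bytes / avg_len_bytes


def align_up(address, alignment=16):
    """Round an address up to the next multiple of a power-of-two alignment."""
    return (address + alignment - 1) & ~(alignment - 1)


# Density for the typical average lengths quoted in the text.
for avg in (3, 4, 5):
    print(avg, "bytes/insn ->", round(instructions_per_block(avg), 1))

# Where a compiler would place the next 16-byte aligned function.
print(hex(align_up(0x8048123)))
```

An average of 4 bytes gives exactly 4 instructions per 16 bytes; the 3- and 5-byte averages bracket that figure on either side.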

The traditional calling convention passes no parameters in registers: the caller pushes the parameter values from right to left as 32-bit slots onto the stack and pops them again after the call returns. The stack is memory, so using it carries some penalties. For that reason, most compilers offer alternative calling conventions, which allow passing some values in registers, pushing from left to right, and/or having the stack popped by the callee. You can find them under the names “regparm” (GCC), “stdcall”, “syscall” and “pascal”. On Windows, the Win32 API is actually “stdcall”, whereas on Linux you’ll seldom find a public API using anything other than the default convention. You can find more details about them in the Wikipedia article about x86 calling conventions.
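To make the right-to-left push order concrete, here is a minimal Python sketch of what the callee finds on the stack under the default convention. The function name and the dummy return address are my own inventions for illustration; this is a model, not something taken from an ABI document.

```python
def cdecl_stack_layout(args, return_address=0xDEADBEEF):
    """Return the 32-bit stack slots seen by the callee, lowest address first.

    The caller pushes the arguments right to left, then the call
    instruction pushes the return address on top.
    """
    stack = []                        # index 0 is the current top of stack
    for value in reversed(args):      # right-to-left push order
        stack.insert(0, value & 0xFFFFFFFF)
    stack.insert(0, return_address)   # pushed last, by the call itself
    return stack


for slot, value in enumerate(cdecl_stack_layout([1, 2, 3])):
    print(f"[esp+{slot * 4}] = {value:#x}")
```

Pushing right to left means the first argument always ends up closest to the return address, which is what lets variadic functions find it without knowing how many arguments follow.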

The base i386 processor has 8 general-purpose registers and 8 stacked 80-bit wide floating-point registers. All the floating-point registers are scratch and can hold IEEE 754 single-, double- or extended-precision values, while the general-purpose registers are distributed as follows (on Linux at least, though I think it applies to Windows too):

| Role | Registers |
|---|---|
| Special register | ESP |
| Registers used for return values | EAX, EDX |
| Scratch registers | EAX, ECX, EDX |
| Saved registers | EBX, ESI, EDI, EBP |
| Floating-point registers used for return values | ST(0), ST(1) |
| Scratch floating-point registers | all (ST(0) to ST(7)) |

The ESP register is special for architectural reasons: instructions that manipulate the stack work on it exclusively. That includes the procedure call and return mechanism, which stores the return address on the stack. All the other registers can be used in almost any instruction, even though certain instructions prefer one register over another. Only a few special instructions refer exclusively to a particular register (ECX in looping instructions and ESI and EDI in streaming instructions).

The EBP register is most often used as the “frame pointer” register: its value is the memory address where the previous function’s frame pointer was saved. It is used to load and store the incoming and local values at fixed offsets. When writing assembly, it’s also important to remember that the EBX register is often used as the PIC register and cannot be used freely. If you need it for a special instruction, you’ll need to save it and restore it afterwards (for example by pushing it onto the stack or by xchg’ing it with another register).

The x86 architecture gained 8 MMX technology registers with the Pentium MMX, which are aliased to the floating-point registers and are thus all scratch. Later, with the Pentium III, 8 SSE registers, 128 bits wide, were added, and then extended to 256 bits with the Sandy Bridge family. They are also all scratch and they can hold IEEE 754 single- or double-precision floating-point values. They can also be used in a variety of scalar, integer or floating-point SIMD instructions.

x86-64

When the x86 architecture gained 64-bit support, not only were the registers widened to 64 bits, the register set itself was expanded to 16 general-purpose registers and 16 SSE-technology registers. The floating-point registers, and the MMX-technology registers aliased to them, are unchanged, though, as they are considered legacy. Unlike the i386 before it, the 64-bit expansion did away with compatibility with the 16- and 32-bit assembler instructions. Programs running in 64-bit mode (the “long mode”) run with a slightly different list of instructions. (Note that 16-bit assembly is technically source compatible with 32-bit assembly, but it’s not binary compatible.)

As with x86, instructions have a variable length in bytes, and the ABI and performance requirements are the same, so jump targets and functions are often aligned to 16 bytes.

As this architecture was created after SSE registers were introduced, the SSE registers are part of the calling convention. In fact, the SSE and SSE2 instructions are the preferred way of manipulating single- and double-precision floating-point values. The ABI for this architecture was specified by AMD when it launched the first 64-bit processor and by Microsoft for its Windows operating system.

| | Unix | Windows |
|---|---|---|
| Special register | RSP | RSP |
| Function return address | top of stack | top of stack |
| GPRs used for return values | RAX, RDX | RAX, RDX |
| GPRs used in parameter passing | RDI, RSI, RDX, RCX, R8, R9 | RCX, RDX, R8, R9 |
| Scratch GPRs | RAX, RCX, RDX, RSI, RDI, R8-R11 | RAX, RCX, RDX, R8-R11 |
| Saved GPRs | RBX, RBP, R12-R15 | RBX, RBP, RSI, RDI, R12-R15 |
| 387 registers used for return values (long double) | ST(0), ST(1) | ST(0), ST(1) |
| Scratch 387 registers | all (ST(0) to ST(7)) | all (ST(0) to ST(7)) |
| Floating-point registers used for return values | XMM0, XMM1 | XMM0, XMM1 |
| Floating-point parameter registers | XMM0-XMM7 | XMM0-XMM3 |
| SSE scratch registers | all (XMM0-XMM15, YMM0-YMM15) | XMM0-XMM5 (XMM6-XMM15 are saved) |
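Since the register order is exactly what I keep forgetting, here is a small Python lookup built from the table above. It is only a sketch: the names `INT_PARAM_REGS` and `assign_int_args` are mine, and it deliberately handles integer and pointer arguments only, ignoring floating-point values and structs.

```python
# Integer parameter registers, in order, copied from the table above.
INT_PARAM_REGS = {
    "unix": ["RDI", "RSI", "RDX", "RCX", "R8", "R9"],
    "windows": ["RCX", "RDX", "R8", "R9"],
}


def assign_int_args(abi, arg_names):
    """Map each integer/pointer argument to a register, or to the stack."""
    regs = INT_PARAM_REGS[abi]
    placement = {}
    for position, name in enumerate(arg_names):
        placement[name] = regs[position] if position < len(regs) else "stack"
    return placement


print(assign_int_args("unix", ["a", "b", "c", "d", "e", "f", "g"]))
print(assign_int_args("windows", ["a", "b", "c", "d", "e"]))
```

Note that the fourth argument lands in RCX on Unix, while RCX carries the first argument on Windows, which is an easy thing to trip over when porting hand-written assembly.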

As on 32-bit x86, the RSP register is architecturally special and is manipulated by the push, pop, call, ret and similar instructions. The RBP register is also used as a frame pointer. The x86-64 architecture allows RIP-relative addressing, which was introduced so that a PIC register wouldn’t be necessary. Yet some compilers still use RBX for that purpose under some conditions, so it’s best to apply the same saving mechanisms as before.

On Windows, this architecture runs in LLP64 mode: long longs and pointers are 64 bits wide, but longs and ints are 32-bit. On Linux, this architecture can run in both LP64 mode (longs and pointers are 64 bits wide) and in ILP32 mode (ints, longs and pointers are 32-bit). The ILP32 mode, called “x32”, makes use of the 8 additional GPRs and 8 additional SSE registers along with this calling convention as an effort to renew the 32-bit x86 world.

Itanium (IA-64)

The Intel Itanium architecture was the result of a joint project between Hewlett-Packard and Intel in the late 1990s and was released in 2001. It was designed to take the best of the expertise of the time and produce a new, future-proof architecture for years to come. It was intended to replace the old 32-bit x86 architecture, which is why it got the name IA-64.

Itanium uses a concept called Very Long Instruction Word (VLIW): each instruction is 41 bits long, with a few taking 82 bits in order to encode 61- to 64-bit immediates. Every three instructions are grouped in a “bundle” occupying 128 bits (16 bytes), so all jump and function targets are aligned to 16 bytes. Another concept used in the architecture is Explicitly Parallel Instruction Computing (EPIC), where the compiler must tell the processor which instructions can be executed in parallel and which ones must wait for others. This is encoded in the assembly as “stop bits”, which are coded into the 5 remaining bits of the 128-bit bundle. Not all combinations of instructions and stop bits are possible, so Itanium code often has many “nop” instructions and is very big. Code density is 3 instructions per 16 bytes, counting the “nop” instructions.
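The bundle arithmetic (3 × 41 + 5 = 128) can be sketched in Python as below. I am assuming here that the 5 extra bits sit at the low end of the bundle with the three slots following in order, which is how I read the Intel manuals; the function names and the field name `template` are mine.

```python
SLOT_BITS = 41       # width of one instruction slot
TEMPLATE_BITS = 5    # the 5 remaining bits of the bundle
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into a 128-bit integer."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instruction slots")
    bundle = template & TEMPLATE_MASK
    for index, insn in enumerate(slots):
        bundle |= (insn & SLOT_MASK) << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & TEMPLATE_MASK
    slots = [
        (bundle >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
        for index in range(3)
    ]
    return template, slots
```

With this layout, a slot that the compiler cannot fill still occupies its 41 bits, which is why the “nop” padding costs real code size.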

The Itanium architecture is still the record-holder in terms of raw number of accessible registers. Application programmers have access to 128 general-purpose registers, 128 floating-point registers, 128 architecturally-specific registers, 64 1-bit predicate registers and 8 branch registers. That’s 4 times as many GPRs and FPRs as any other architecture I listed, plus the other special registers. They are divided thus:

General-purpose registers

| Role | Registers |
|---|---|
| Architecturally-special GPR | r0 (reads are always 0, writes are discarded) |
| ABI-defined special GPRs | r1 (gp), r12 (sp), r13 (tp) |
| GPRs used in return values | r8-r11 |
| GPRs used in integer parameter passing | r32-r39 (in0 to in7), specially r8 |
| Non-rotating scratch GPRs | r2, r3, r8-r11, r14-r31 |
| Non-rotating saved GPRs | r4-r7 |
| Rotating GPRs | r32-r127 (in0-in95, loc0-loc95, out0-out95) |

Floating-point registers

| Role | Registers |
|---|---|
| Architecturally-special FPRs | f0 (0.0) and f1 (+1.0) |
| Registers used for return values | f8-f15 |
| Registers used in parameter passing | f8-f15 |
| Non-rotating scratch FPRs | f6-f15 |
| Non-rotating saved FPRs | f2-f5, f16-f31 |
| Rotating FPRs | f32-f127 |

Predicate registers

| Role | Registers |
|---|---|
| Architecturally-special PR | p0 (always 1) |
| Non-rotating scratch PRs | p6-p15 |
| Non-rotating saved PRs | p1-p5 |
| Rotating PRs | p16-p63 |

Branch registers

| Role | Registers |
|---|---|
| Function return address | b0 (rp) |
| Scratch registers | b0 (rp), b6, b7 |
| Saved registers | b1-b5 |

On function entry, the r8 register contains the address of a memory region for the return value if the struct or union being returned is larger than 32 bytes (i.e., doesn’t fit r8-r11).

The three architecturally-special registers (r0, f0 and f1) always have the same value when read: integer 0, floating-point 0.0 and floating-point 1.0 respectively. This allows the assembly to do away with some instructions by making them aliases of others: for example, there is no instruction to load small immediate values into a GPR. Instead, the instruction is replaced by an addition instruction where one of the operands is r0. The same applies to the floating-point multiplication and addition instructions: the Itanium only has a 4-operand fused multiply-add, so pure additions are done by multiplying one of the sources by f1 and pure multiplications are done by using f0 as the other source.
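The aliasing trick is easy to show in a few lines of Python. This is just a model of the idea: the function names are mine, and the `fma` here is an ordinary multiply followed by an add rather than a truly fused operation.

```python
R0 = 0      # integer register that always reads as 0
F0 = 0.0    # floating-point register that always reads as 0.0
F1 = 1.0    # floating-point register that always reads as +1.0


def fma(a, b, c):
    """Stand-in for the 4-operand multiply-add: result = a * b + c."""
    return a * b + c


def fadd(a, b):
    """Pure addition: multiply one source by f1, then add the other."""
    return fma(a, F1, b)


def fmul(a, b):
    """Pure multiplication: use f0 as the value being added."""
    return fma(a, b, F0)


def mov_imm(imm):
    """Loading a small immediate is an addition with r0 as one operand."""
    return R0 + imm
```

Three hard-wired constants therefore save the instruction set from needing separate add, multiply and load-immediate encodings.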

The 96 upper GPRs, FPRs and the 48 upper PRs are rotating: that means that some instructions can cause the register names to rotate. The three types of registers can be used in rotating loops, where several iterations of the loop are running in parallel with different registers. When not used in rotating fashion, all those registers can be used as scratch.

In addition to loops, the 96 upper GPRs can be rotated on function calls and returns. For that reason, each function can consider it has up to 96 saved registers because those registers simply cannot be seen by functions it calls. They are saved by the Register Stack Engine, asynchronously and at processor-specified times. The architecture allows each function to select how many rotating registers it wants to use and how many of those are to be available to functions called (those are the outgoing registers), though when writing assembly, one specifies how many registers are incoming, local and outgoing, so the named registers are available in the function body. The ABI limits the number of outgoing registers to only 8 rotating registers.

A leaf function or one with tail-call optimisation may opt to keep the rotating registers unchanged. It has available 24 non-rotating scratch GPRs, 10 non-rotating FPRs and 96 rotating ones, 10 non-rotating PRs and 48 rotating, and 3 scratch branch registers (one of them containing the return address), plus the incoming registers. That’s more than enough for most leaf functions, without even using the stack. A tail-call optimisation, however, requires that the called function take no more arguments than this function took, as expanding the number of outgoing registers would destroy another register that must be saved (ar.pfs).

An interesting feature of the GPRs is that they are actually 65 bits wide: the extra bit is called the “Not a Thing” (NAT) bit, which indicates whether the other 64 contain a valid value or not. The Itanium has some instructions that allow a “speculative load”: the instruction will try to load the value from memory so long as it doesn’t cause a page fault. If the value could not be loaded, the NAT bit is set and software must later check it, once it determines that it really needs that value. Using the value contained in a GPR while the NAT bit is set, other than to copy it to another register or to save it with a special spill instruction, causes an exception.
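The behaviour described above can be modelled roughly like this in Python. All the names here (`GPR`, `speculative_load`, `NaTConsumption` and so on) are hypothetical, and a dictionary lookup stands in for “the page is mapped”; real hardware obviously does much more.

```python
class NaTConsumption(Exception):
    """Raised when a register with its NAT bit set is actually used."""


class GPR:
    """A 64-bit value plus the extra NAT bit."""

    def __init__(self, value=0, nat=False):
        self.value = value & (2**64 - 1)
        self.nat = nat


def speculative_load(memory, address):
    """Load that defers a fault by setting NAT instead of raising."""
    if address in memory:
        return GPR(memory[address])
    return GPR(0, nat=True)


def copy(reg):
    """Copying is allowed and carries the NAT bit along."""
    return GPR(reg.value, reg.nat)


def use(reg):
    """Any other use of a NAT register causes an exception."""
    if reg.nat:
        raise NaTConsumption("register does not hold a valid value")
    return reg.value
```

The point of the design is that the load can be hoisted above the branch that decides whether the value is needed; the fault only materialises if the code path that uses the value is actually taken.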

The floating-point registers are 82 bits wide, allowing each to hold intermediate values of higher precision than IEEE 754 extended-precision. The “application registers” are 128 special 64-bit registers, each with a special meaning. Some of those registers are read-only, some are used by certain instructions and are thus scratch, and most have a special purpose. In particular, the ar.pfs register must be saved across function calls.

Itanium is defined for LP64 and ILP32 mode for Unix and LLP64 mode for Windows. The ILP32 mode is supported by a special instruction for dealing with pointers: once loaded from 32-bit storage, the pointer is “pointer-extended” to 64-bit before it can be used.

The ABI for Itanium was specified by Intel in the document I linked to in the previous blog post. Interestingly, Intel specified almost everything relating to the Itanium, including a full C++ ABI. This became known as the Itanium C++ ABI and is what GCC uses on all platforms, not just Itanium.

ARM 32-bit mode (AArch32)

Instructions in the ARM architecture, when running in “ARM mode”, are all 32 bits wide. For that reason, all jump targets and function addresses are aligned to 4 bytes and the low 2 bits are always unused. However, when ARM code is “interworking” with Thumb code, those two bits are special, which means that function addresses on ARM require the use of all bits. This has implications for the C++ ABI: since all bits are used in function addresses, the bit indicating whether a pointer-to-member-function is virtual is moved to the adjustment field.

32-bit ARM has 16 registers, one of which is the program counter. All registers are 32 bits wide and can be used in all instructions alike, including the PC, which makes it possible to branch with arithmetic instructions (for example, “add pc,pc,r0”). The PC register is special, though, and not all operations on it are supported. Moreover, reading from it yields the address of the current instruction plus 8. “nop” instructions are not common in ARM assembly, so the code density is 4 instructions per 16 bytes. However, due to the limited range of immediates, ARM code is often littered with nearby constants that must be loaded, not executed.
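The “plus 8” rule is the kind of thing that bites when computing jump tables by hand, so here is a tiny Python model of it. The function names are mine; the only fact used is the one stated above, that in ARM mode a read of the PC yields the address of the current instruction plus 8.

```python
ARM_PC_READ_OFFSET = 8   # ARM mode: PC reads as current instruction + 8


def read_pc(insn_address):
    """Value an ARM-mode instruction at insn_address sees in the PC."""
    return (insn_address + ARM_PC_READ_OFFSET) & 0xFFFFFFFF


def add_pc_pc_r0(insn_address, r0):
    """Branch target of `add pc, pc, r0` located at insn_address."""
    return (read_pc(insn_address) + r0) & 0xFFFFFFFF


# With r0 = 0 the branch lands two instructions further on, so the
# instruction right after the add is skipped.
print(hex(add_pc_pc_r0(0x8000, 0)))
print(hex(add_pc_pc_r0(0x8000, 4)))
```

In practice this means the first entry of a jump table driven by such an instruction starts 8 bytes after it, and the slot in between is usually filled with a nop or a default branch.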

The ARMv6 architecture mandates at least 16 floating-point 64-bit wide registers, and ARMv7 allows optionally for 16 more of them to exist. The registers are divided so:

General-purpose registers

| Role | Registers |
|---|---|
| Architecturally-special GPR | r15 (pc) |
| ABI-defined special GPR | r13 (sp) |
| GPRs used for returning values | r0-r3 |
| GPRs used in parameter passing | r0-r3 |
| Function return address | r14 (lr) |
| Scratch GPRs | r0-r3, r12 (ip), r14 (lr) |
| Saved GPRs | r4-r11 |

Floating-point registers

| Role | Registers |
|---|---|
| FPRs used for returning values | d0-d7 |
| FPRs used in parameter passing | d0-d7 |
| Scratch FPRs | d0-d7, d16-d31 |
| Saved FPRs | d8-d15 |

The table above assumes that one is using the floating-point hardware registers to pass parameters, in what is called in the ARM world “hard float”. According to the ARM Architecture Procedure Call Standard, this is optional: if not enabled, the floating-point parameters are converted to their 32- or 64-bit representations and passed in the GPRs.

The floating-point registers can be accessed in 64-bit mode to hold one IEEE 754 double-precision value or as two 32-bit registers holding IEEE 754 single-precision values. Extended-precision is not supported by hardware: in the ARM ABI, the “long double” type is an alias to “double”. Each of the original sixteen 64-bit FPRs can be accessed as two 32-bit FPRs when one prefixes them with “s” instead of “d”: s(2N) corresponds to the lower half of dN, while s(2N+1) corresponds to the upper half. Any pair of sequential 64-bit FPRs starting on an even-numbered register can also be accessed as one quad-word (128-bit) register when prefixed with “q”, for a total of sixteen such registers.
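The register aliasing can be illustrated with a short Python sketch. The helper names are mine, and the `struct` round trip is only a way of getting at the 64-bit pattern of a double to show which half each “s” register would see.

```python
import struct


def s_halves_of_d(value):
    """Return the bit patterns seen in (s(2N), s(2N+1)) for a double in dN."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    lower = bits & 0xFFFFFFFF   # visible as s(2N)
    upper = bits >> 32          # visible as s(2N+1)
    return lower, upper


def d_index_of_s(s_index):
    """Which dN register a given sN register aliases."""
    return s_index // 2


def d_pair_of_q(q_index):
    """Which two d registers make up a given qN register."""
    return 2 * q_index, 2 * q_index + 1


lower, upper = s_halves_of_d(1.0)
print(f"d0 = 1.0 -> s0 = {lower:#010x}, s1 = {upper:#010x}")
```

So writing a double to d0 clobbers both s0 and s1, which matters when a function mixes single- and double-precision values in hand-written code.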

The r13 (sp) register was chosen by the ABI more-or-less arbitrarily, as any other register could be used to store the current address of the top of the stack. However, this register becomes architecturally-specific when Thumb mode is in use.

Thumb sub-mode

ARM CPUs can also run a sub-mode called Thumb, in which most instructions are 16 bits wide. Older ARM processors can only run 16-bit Thumb instructions, while newer ones support additional 32-bit Thumb instructions. Thumb instructions are therefore aligned to 2-byte rather than 4-byte boundaries. When ARM and Thumb code interwork, the lowest bit of jump and call addresses indicates the instruction mode: 0 indicates ARM code while 1 indicates Thumb code. However, when the PC register is accessed in Thumb mode, the lowest two bits are forced to zero, so a read of the PC always yields a 4-byte aligned value. Unlike in ARM mode, reads of the PC in Thumb mode give the address of the current instruction plus 4 (rounded down).
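The interworking rule for the lowest address bit is simple enough to write down as a Python sketch (the function name is mine):

```python
def decode_interworking_target(address):
    """Split a jump/call address into (instruction mode, real address)."""
    mode = "thumb" if address & 1 else "arm"
    real_address = (address & ~1) & 0xFFFFFFFF   # clear the mode bit
    return mode, real_address


print(decode_interworking_target(0x8001))
print(decode_interworking_target(0x8000))
```

This is why a function pointer to a Thumb function looks odd (literally) in a debugger: the code itself starts at the even address one byte below.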

The 16-bit instructions are a reduced set, with restricted access to registers. The r13 (sp) register is hardcoded in some stack operations, which makes it architecturally-specific. The ABI itself does not change, but 16-bit instructions can only encode the lower 8 of the 16 registers, which means that 16-bit Thumb functions are limited to 4 scratch and 4 saved registers.

ARM 64-bit (AArch64)

The ARM 64-bit architecture has expanded the ARM 32-bit register set from 16 to 32 general-purpose registers and 32 floating-point registers. The program counter register is no longer architecturally visible and the stack pointer register is architecturally special.

There are still no ARMv8 chips available and I have not seen any compiler generating code for it. The information below comes from the public ARM manuals I could find. The instruction width appears to be 32-bit, like in AArch32 ARM mode (A32). The ABI was specified by ARM too.

General-purpose registers

| Role | Registers |
|---|---|
| Architecturally-special GPR | SP and ZR |
| GPRs used for returning values | r0-r7 |
| GPRs used in parameter passing | r0-r7, specially r8 |
| Function return address | r30 (lr) |
| Scratch GPRs | r0-r18, r30 (lr) |
| Saved GPRs | r19-r29 |

Floating-point registers

| Role | Registers |
|---|---|
| FPRs used for returning values | v0-v7 |
| FPRs used in parameter passing | v0-v7 |
| Scratch FPRs | v0-v7, v16-v31 |
| Saved FPRs | v8-v15 (lower 64-bit values only) |

On function entry, the r8 register contains the address of a memory region for the return value if the type being returned would not be stored in registers if it were the first parameter in a function call.

The SP and ZR registers are actually the same register; the difference is in how each instruction deals with it. Some instructions are specified to work on SP and will read and write values to it. Some others treat that register as zero when it is a source, or discard the output when it is the destination. When used in those conditions, the assembly lists it as “ZR”, or, like Itanium, uses different mnemonics to indicate the absence of a source.

The floating-point registers can be accessed as 32-, 64- or 128-bit wide registers, holding single-precision, double-precision and quad-precision values respectively. However, the hardware only supports floating-point math in single- and double-precision. The ABI asks that only the 64 lower bits of the registers v8 to v15 be saved. That means if a function stores data in the high bits, it must save them on its own before calling another.

In assembly code, one will not see the registers named with the prefixes “r” or “v”. Instead, the GPRs are prefixed with “w” to mean 32-bit access or “x” for 64-bit, while the FPRs are prefixed “s” (32-bit), “d” (64-bit) or “q” (128-bit). The “v” prefix is seen on SIMD operations.

The ARM 64-bit architecture is defined for LP64 mode, but it might be possible to run it in ILP32 mode to make use of the extra registers and improved calling convention.

MIPS

MIPS processors exist with both 32- and 64-bit registers, respectively called MIPS32 and MIPS64. Unlike the x86 and ARM architectures where the assembly language differs considerably between 32- and 64-bit, the MIPS64 instruction set is a complete superset of MIPS32 and differences in programs are only due to the ABI. The MIPS64 processors do not have a special architectural mode to run MIPS32 code. MIPS64 processors were mostly used with the IRIX operating system, which has been discontinued (SGI now sells Linux machines running on Itanium). MIPS32 processors are quite common in the embedded world, finding their use on many WiFi routers and Set-Top-Boxes.

MIPS instructions are 32-bit in width, but some processors support an extension called microMIPS16 and can run 16-bit wide instruction code. I have not investigated if the instruction streams can be mixed or a technique similar to the ARM-Thumb interworking is necessary. GCC does not seem able to generate microMIPS16 instructions.

MIPS processors have 32 general-purpose registers, which are 64-bit wide on MIPS64 and 32-bit on MIPS32. Additionally, an FPU and a second co-processor are optional. The FPU, if present, has 32 registers that are 32-bit wide and support (at least) single precision, double precision and 32-bit integers, with double-precision values stored in a pair of registers. 64-bit support in the FPU is possible, but optional.

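| | o32 and o64 ABI | n32 ABI | n64 ABI |
|---|---|---|---|
| Architecturally-special GPR | $0 (zero) | $0 (zero) | $0 (zero) |
| ABI-specified special GPRs | $26 (kt0), $27 (kt1), $29 (sp) | $26 (kt0), $27 (kt1), $29 (sp) | $26 (kt0), $27 (kt1), $29 (sp) |
| GPRs used for returning values | $2, $3 | $2, $3 | $2, $3 |
| GPRs used in parameter passing | $4 to $7 | $4 to $11 | $4 to $11 |
| Function return address | $31 (ra) | $31 (ra) | $31 (ra) |
| Scratch GPRs | $1 (at), $2 to $15, $24, $25, $28 (gp) | $1 (at), $2 to $15, $24, $25 | $1 (at), $2 to $15, $24, $25 |
| Saved GPRs | $16 to $23, $30 | $16 to $23, $28 (gp), $30 | $16 to $23, $28 (gp), $30 |
| FPRs used for returning values | $f0 to $f3 | $f0 and $f2 (not $f1) | $f0 and $f2 (not $f1) |
| FPRs used in parameter passing | $f12 to $f15 | $f12 to $f19 | $f12 to $f19 |
| Scratch FPRs | $f0 to $f19 | $f0 to $f19, $f20, $f22, $f24, $f26, $f28, $f30 | $f0 to $f23 |
| Saved FPRs | $f20 to $f31 | $f21, $f23, $f25, $f27, $f29, $f31 | $f24 to $f31 |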

Some of the registers deserve special mention:

  • $1 is used by the assembler for some operations where an assembly mnemonic does not fit; it is known as “at” (assembler temporary)
  • $25 contains the address of the function on entry on PIC code; since the compiler usually doesn’t know whether the target function is PIC or not, it will most likely load its address on $25
  • $26 and $27 are reserved to hold values from the kernel and should not be modified
  • $28 (gp): unlike on the previous architectures, on o32 and o64 the global pointer (the PIC register) is not kept in a saved register
  • $f20 to $f31: since the early MIPS double-precision operations operate on a register pair, the registers must be saved in pairs too (o32 only)

In particular, the o32 ABI always uses the floating-point registers in pairs: the odd-numbered registers are never used alone. The o64, n32 and n64 ABIs allow using them independently and assume they can hold a double-precision value.

An interesting feature of the MIPS assembly, which it shares with SPARC, is that the first instruction after a taken branch is still executed.

POWER and PowerPC

The Power Architecture defines processors with both 32- and 64-bit registers. Unlike the x86 and ARM architectures where the assembly language differs considerably between 32- and 64-bit, the 64-bit instruction set is a complete superset of the 32-bit one and differences in programs are only due to the ABI.

The Power Architecture specifies two profiles for its processors: the Server Platform, which is mandatorily 64-bit, and the Embedded Platform. The architecture has 32 general-purpose registers, one of which is special. If the FPU is present, it provides 32 registers capable of holding 64-bit double-precision floating-point values. There is also an optional vector unit extension, known as Altivec.

The Power Architecture ABI document I found specifies a lot of optional functionality, including soft-float. I am listing here a common-sense profile that uses the floating-point registers for parameter passing.

General-purpose registers

| Role | Registers |
|---|---|
| Architecturally-special GPR | r0 |
| ABI-defined special GPR | r1 (sp) |
| GPRs used for return values | r3-r6 |
| GPRs used for parameter passing | r3-r10 |
| Function return address | LR |
| Scratch GPRs | r0, r3-r12 |
| Saved GPRs | r2 (tp), r13-r31 |

Floating-point registers

| Role | Registers |
|---|---|
| FPR used for return values | f1 |
| FPRs used in parameter passing | f1-f8 |
| Scratch FPRs | f0-f13 |
| Saved FPRs | f14-f31 |

Vector registers

| Role | Registers |
|---|---|
| VR used for return values | v2 |
| VRs used in parameter passing | v2-v13 |
| Scratch VRs | v0-v19 |
| Saved VRs | v20-v31 |

Notes:

  • lr: a special register that holds the return address after a function call; it is a scratch register, so a function that makes calls of its own must save it first
  • r0: it is a valid register that can hold values, but some instructions cannot access it; instead, they always read a zero value and discard the result
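As far as I understand the instruction set, the add-immediate instruction (addi) is one of those where a register field of 0 means the literal value zero rather than r0. A minimal Python model of that behaviour, with a function name of my own and 32-bit wrap-around assumed for simplicity:

```python
def addi(regs, rd, ra, imm):
    """Model of `addi rD, rA, imm`: rA == 0 means the value 0, not r0."""
    base = 0 if ra == 0 else regs[ra]
    regs[rd] = (base + imm) & 0xFFFFFFFF


regs = [0] * 32
regs[0] = 100
regs[4] = 100

addi(regs, 3, 0, 5)   # r0 holds 100, but the instruction reads zero
print(regs[3])

addi(regs, 3, 4, 5)   # any other register is read normally
print(regs[3])
```

This is how loading a small constant gets encoded without a dedicated instruction, much like the r0 trick on Itanium, except that here r0 can still hold a real value for the instructions that do access it.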

The register names above are prefixed with a letter for convenience. The POWER assembly unfortunately uses no prefixes for registers, addresses or absolute values: when you see a number like “2” in the disassembly, you need to understand the instruction in question to determine if that refers to r2, f2 or a value of 2.

An interesting feature of the assembly is the Enforce In-Order Execution of I/O instruction, whose mnemonic reads “EIEIO”.

SPARC

The SPARC architecture began as 32-bit but was extended to 64-bit with the SPARCv9 in 1993. The SPARC processors are most commonly known as UltraSPARC today. 32-bit processors are not sold anymore; however, ILP32 applications still exist and run unmodified on current processors, as the difference is only in the ABI.

The SPARC architecture has 32 general-purpose registers, 64 bits wide on SPARCv9 and above. Like Itanium, SPARC has a rotating register window: 24 GPRs are rotating and 8 are fixed. Unlike Itanium, the rotation window has a fixed size. The save instruction rotates it by 16, so that the registers %r8 to %r15 become %r24 to %r31 and 16 clean registers appear at %r8 to %r23, while the restore instruction does the opposite and restores the 16 registers that had been saved. So, unlike Itanium, the outgoing parameters are in lower-numbered registers and the incoming ones are in higher-numbered ones.

The general-purpose register set is divided in four groups of 8 registers because of the window. The lowest 8 registers are fixed and are named %g0 to %g7; the next 8 are shared with a function being called, so they are the outgoing registers and named %o0 to %o7; the next 8 are only visible in the current function and are therefore named %l0 to %l7; finally, the upper 8 registers are shared with this function’s caller and are named %i0 to %i7.
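To see how the window makes a caller’s outgoing registers become the callee’s incoming ones, here is a small Python model. It is only a sketch under my own assumptions: the class name, the number of windows and the direction in which the window counter moves are all arbitrary, and the overflow and underflow traps of real hardware are ignored.

```python
class SparcWindows:
    """Toy model of the SPARC register window (no overflow handling)."""

    def __init__(self, nwindows=8):
        self.nwindows = nwindows
        self.globals = [0] * 8                  # %g0-%g7, never rotated
        self.windowed = [0] * (nwindows * 16)   # 16 new registers per window
        self.cwp = 0                            # current window number

    def _index(self, r):
        size = len(self.windowed)
        if 8 <= r <= 15:     # %o0-%o7: shared with the next window's ins
            return ((self.cwp + 1) * 16 + (r - 8)) % size
        if 16 <= r <= 23:    # %l0-%l7: private to this window
            return (self.cwp * 16 + 8 + (r - 16)) % size
        return (self.cwp * 16 + (r - 24)) % size   # %i0-%i7

    def read(self, r):
        if r == 0:
            return 0         # %g0 always reads as zero
        if r < 8:
            return self.globals[r]
        return self.windowed[self._index(r)]

    def write(self, r, value):
        if r == 0:
            return           # writes to %g0 are discarded
        if r < 8:
            self.globals[r] = value
        else:
            self.windowed[self._index(r)] = value

    def save(self):
        self.cwp = (self.cwp + 1) % self.nwindows

    def restore(self):
        self.cwp = (self.cwp - 1) % self.nwindows
```

For example, a value written to %o0 (register 8) before `save()` can be read back from %i0 (register 24) afterwards, and a value the callee leaves in %i0 is visible in the caller’s %o0 once `restore()` has run.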

Because of the register rotation, the definition of “scratch” and “saved” does not apply directly: the registers a function must preserve for its caller are not the same registers that its callee will preserve. The following table shows the registers from the point of view of the function after it has rotated the register window.

General-purpose registers

| Role | Registers |
|---|---|
| Architecturally-special GPR | %g0, %i7 (partially) |
| ABI-defined special GPRs | %g2-%g7, %o6 (%sp), %i6 (%fp) |
| GPRs used in return values | %i0 |
| GPRs used in passing parameters | %i0-%i5 |
| Function return address | %i7 |
| GPRs preserved by a callee | %l0-%l7, %i0-%i7 |
| GPRs destroyed by a callee | %g1, %o0-%o5, %o7 |

Floating-point registers

| Role | Registers |
|---|---|
| FPRs used in passing parameters | None, they are passed in the outgoing registers or stack |
| FPRs used in returning values | %f0, %f1 |
| Scratch FPRs | All |

The registers %g2 to %g4 are reserved by the ABI for the application, for uses to be decided by the application and compiler, while registers %g5 to %g7 are to be considered read-only by the application and compiler.

When a function is called, it has 6 rotating registers available, which contain its incoming values and can be considered scratch. Additionally, %g1 is also scratch. For that reason, leaf functions or functions with tail-call optimisation do not have to rotate the register window if they are satisfied with those 7 registers.

The rotating window also makes the %sp register become the %fp upon rotation, allowing for easy save of that value. Unlike other architectures, it’s the %fp register that must be preserved for the caller, so a leaf function can use the %sp register as a scratch if it needs to (provided it has rotated the register window).

Like MIPS, SPARC’s branch instructions also execute the instruction immediately following the branch, in what’s called the “delay slot”. Disassemblers often indent that instruction further for clarity. However, unlike other architectures, the SPARC call instruction does not save the return address, but its own address. To return, a function must jump to %i7 + 8.
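The “+ 8” follows from the fixed 32-bit instruction width: the return has to skip both the call itself and the instruction in its delay slot. As a Python sketch (function names are mine):

```python
INSN_BYTES = 4   # every SPARC instruction is 32 bits wide


def call(call_address):
    """Value the call leaves for the callee to find in %i7: its own address."""
    return call_address


def return_target(i7):
    """Skip the call instruction and the instruction in its delay slot."""
    return i7 + 2 * INSN_BYTES


print(hex(return_target(call(0x10000))))
```

Returning to %i7 + 4 instead would execute the delay-slot instruction a second time, which is a classic mistake in hand-written SPARC assembly.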

The FPU contains 32 registers that are 64 bits wide, numbered %f0, %f2, %f4, … %f62. Each pair of those registers starting at a register whose number is a multiple of four (%f0, %f4, %f8, … %f60) can be accessed as one quad-precision register. Single-precision access is done through the sequentially-numbered registers %f0, %f1, %f2, … %f31, where each pair aliases one 64-bit register. Note that SPARC is a big-endian platform, so the upper half of a larger register is found in the lower-numbered register.

Jan 09

Assembly developer’s library

Every now and then, when coding in C++, I find myself needing to know some assembly to understand what’s going on. Sometimes, it’s because I am actually writing assembly code, such as when I was writing the new atomic classes for Qt. More often, it’s because I need to read the assembly generated by the compiler to figure out if it’s optimal or if it’s doing something weird.

So I often found myself downloading the same manuals over and over. I decided to put together a small library of manuals I use often and those I seldom use, but might want to some day. This is the list.

Architecture | Instruction set manual | ABI description (calling convention)
i386 (IA-32) | Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2 | traditional / many / varied (Wikipedia article)
x86-64 64-bit a.k.a. x64 | Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2 | System V ABI for x86-64 (LP64) and Windows x64 calling convention (LLP64)
x86-64 32-bit (ILP32) a.k.a. x32 | Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2 | x32 ABI
Itanium (IA-64) | Intel® Itanium® Architecture Developer’s Manual, Vol. 3 | Itanium® Software Conventions and Runtime Architecture Guide (both ILP32 and LP64)
ARM 32-bit (AArch32) | ARM assembler reference | ARM Architecture Procedure Call Standard (AAPCS)
ARM 64-bit (AArch64) | ARMv8 Instruction Set (registration required) | ARM 64-bit Architecture Procedure Call Standard (AAPCS64)
MIPS32 | The MIPS32® Instruction Set and The microMIPS32™ Instruction Set (registration required) | o32 ABI, n32 and n64
MIPS64 | The MIPS64® Instruction Set (registration required) | o64 ABI, n32 and n64
POWER Architecture (includes PowerPC) | Power Instruction Set Architecture Version 2.06 | Power Architecture 32-bit ABI Supplement 1.0 Unified (applies to 64-bit too I think)
SPARC | SPARC architecture V9 | SPARC psABI 3.0

Of course there are more architectures. Those are just the ones that are (somewhat) relevant to me as a Qt developer. Also, please note that I have not looked in detail into the POWER and SPARC manuals, other than a cursory glance to ensure that they contained the relevant information. For example, one site I found says that Linux 32-bit on PowerPC uses an ABI different from the one defined by power.org.

I have not listed quick reference guides, many of which exist. I don’t use them because I often need details of the instructions.

What do you usually use when you code in assembly?

Additional IA-32 and x86-64 resources

The IA-32 and x86-64 architectures contain a very large number of extensions that were added in different generations of the processors, both by Intel and AMD. The manuals above include the main extensions: MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AVX, plus the AES, FMA, F16C, RDRAND ones. They do not include extensions specific to AMD processors, like 3dNow! (effectively deprecated), SSE4a, FMA4 or XOP.

Other useful manuals:

Additionally, most of the non-general purpose instructions have intrinsic functions associated with them, so you can write really low-level code in C or C++ without actually having to write assembly. Unfortunately, I haven’t found a good, downloadable, and up-to-date intrinsics reference manual. The Intel(R) C++ Intrinsics Reference is the closest I’ve found, but it stops at SSE4 intrinsics, not including the AVX or AES ones.

If you know of one, please leave me a comment.

Glossary

ABI
Application Binary Interface
ILP32
int, long, pointer are 32-bit, virtually all 32-bit platforms (all of the ABIs listed here)
LP64
long and pointer are 64-bit, the 64-bit environment of all Unix platforms here
LLP64
long long and pointer are 64-bit, but long is 32-bit; the 64-bit environment of Windows
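
If you want to check which of these data models a given machine uses, the widths can be queried at run time. The following is a minimal sketch in Python using the standard ctypes module; the helper names data_model and host_data_model are made up for this example.

```python
import ctypes


def data_model(int_bits, long_bits, pointer_bits):
    """Name the data model given the widths of int, long and pointer."""
    widths = (int_bits, long_bits, pointer_bits)
    if widths == (32, 32, 32):
        return "ILP32"
    if widths == (32, 64, 64):
        return "LP64"
    if widths == (32, 32, 64):
        return "LLP64"
    return "other"


def host_data_model():
    """Classify the platform this Python interpreter was built for."""
    def bits(ctype):
        return 8 * ctypes.sizeof(ctype)

    return data_model(bits(ctypes.c_int), bits(ctypes.c_long), bits(ctypes.c_void_p))


print(host_data_model())
```

On a 64-bit Unix system this is expected to report LP64, and on 64-bit Windows LLP64, matching the glossary entries above.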

Dec 22

Winter is coming

Yeah, I know the title of this blog is mostly a cliché these days, but I feel like I am entitled to it. I am, after all, 75% done with the fifth book in the Song of Ice and Fire series, A Dance with Dragons. It’s also December 22, which is the first day of Winter in the Northern Hemisphere. But, as I told my friend Espen, I am spending most of this Winter in Summer, a feat accomplished by changing hemispheres and earning me a punch in the arm.

More to the point, Winter is also coming for Qt 5.0: we are approaching feature freeze. The exact date, I can’t tell you, because we don’t know yet. That’s actually the subject of this blog: we need to find a date and I need your help to get there.

There are two competing directions for setting the feature freeze date. The first, which pushes it later, is the need to complete the features that can only be done in 5.0: features that, if they fail to enter 5.0, cannot be added to a 5.x release and would need to wait for 6.0. The other is the direction set by Lars, our Chief Maintainer, that we want to release 5.0 by mid-2012, before the Summer break. We know, from past experience, that the quality control a Qt release undergoes is no trivial task, so the more time we give ourselves, the better the result will be.

Therefore, working backwards from a (gold) release date in late June, we need a feature freeze and alpha release sometime in January. In fact, given that this is a major release, with binary compatibility breaks, rearchitecturing of the core GUI functionality, replacing all the ports with QPA, we should have already frozen three months ago.

What we need to do now is to determine what needs to be in Qt 5.0. What features need to be complete by the time we freeze? And who’s working on them? For that reason, we have created a wiki page to list such features. Update the wiki page if you know of a project that must go into 5.0, but please include the state of completeness of that item. If you don’t know who’s working on it or if no one is, make that explicit. You can also leave your comments in the blog or post to the development mailing list.

Note that not all the features listed in the page may get into 5.0. This is just the first step in setting the feature freeze date: judge what’s needed and how ready it is, then make a decision on when we can freeze so we most likely can get the wanted features in. Features that are not being worked on or are late may not go in, or exceptions may be granted.

Finally, if you have some spare brain cycles during the holiday season, help in completing the features listed in the wiki is most welcome. Help us make a great Qt 5.0!

Dec 15

Qt 4.8 released

Sinan is writing on the Qt Labs blog that Qt 4.8.0 is released. I won’t duplicate the entire contents of his post, so I’ll just give you the information you’re looking for: the download link.

You can download it from the Download Page:

Happy hacking!

Oct 21

Qt-Project.org is live

As you may have noticed in Lars’s blog, the new Qt Project website and organisation is live! Yeah! It’s the product of many people’s work over the course of a year and a half, changing the way over 200 engineers work in their daily lives.

The change is just in time for the Qt Developer Days event in Munich, which starts next Monday with a Qt Contributors’ Unconference Day. I’ll be there and we’ll be discussing how to get started. It also comes just after Research In Motion opened up its Native BlackBerry SDK with support for Qt.

Here are some resources you may want to get started with:

  • Mailing lists: Subscribe at http://lists.qt-project.org/mailman/listinfo to the mailing lists that might be of interest to you. There’s no description available for them yet, but you can guess what they are from their names.
  • Bug tracking: If you don’t have an account yet, create one by signing up at the Qt Bugreports website, which is still in a .nokia.com domain but should hopefully change soon.
  • Code review: sign up first for the Bugreports account above, then head to the Codereview website and log in with the same credentials. There, set your real name and add one or more email addresses you’re known by.

The Codereview website is where the reviews and approvals will all happen, and the Development mailing list is where all discussions will happen. If you plan on being involved, you should be on both.

To contribute a code change to Qt, you’ll need to provide an SSH public key to authenticate yourself and you’ll need to agree to the terms of the Qt Contribution Agreement, now on version 1.1. If you choose not to do that, you won’t be able to contribute code, but you can of course contribute in many other ways, including reviewing and offering advice on how to improve other people’s code.

You may want to add this to your ~/.ssh/config:


Host codereview.qt-project.org
    Port 29418
    User insert-username-here
    IdentityFile insert-path-to-ssh-key-here

And this is the SSH key and fingerprint for the website:

Fingerprint: 11:24:25:51:5d:ab:4f:b1:15:49:10:3a:68:6d:ec:0f
[codereview.qt-project.org]:29418,[87.238.53.162]:29418 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQCvXdApmCFiAyXDiYU5+z6762Qv8+vrmM3+9YrxDKByyphaxblLJC9txPv3D/w7rzSyiMMHL/5ssCemwz+6QBqnemFl4B+FNv81fpZFsqCg5afrTi62WFllGWIQAiYb2JZmkmSAbxm+sAxLE1ritp+Syxz8Gb8WR27G/3TSHerdBQ==

Oct 18

QUrl in Qt 5: woes of hostname validity

A couple of days ago I posted on Google+ a comment when I was frustrated trying to update the QUrl hostname-parsing code. Turns out that rewriting the parser wasn’t that difficult for QUrl, but dealing with hostnames is very much so. The old code in QUrl simply deals with it directly, even what’s supposed to be IPv4 and IPv6 addresses.

Trying to validate them according to the Augmented Backus–Naur Form grammar found in Appendix A of RFC 3986 is extremely difficult. What’s more, the grammar is very strict and doesn’t allow for common forms of v4-compat and v4-mapped IPv6 addresses (that is, “::192.88.99.1” and “::ffff:10.0.0.1”).

So I took it upon myself to rewrite the IP-parsing and reconstructing routines, which were previously in src/network/kernel/qhostaddress.cpp. I wrote a lot of unit tests for them and tried to match the behaviour of inet_aton(3) for IPv4 and of inet_pton(3) for IPv6. That means accepting some non-standard, old behaviour like an IP address of “127.1” or “2130706433” for “127.0.0.1”.
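
To make that lenient behaviour concrete, here is a minimal sketch in Python of the traditional inet_aton(3) rules: one to four dot-separated numbers, each in decimal, octal (leading 0) or hexadecimal (leading 0x), with the last number filling all the remaining bytes. This is not the code that went into Qt; the function name parse_ipv4_lenient is made up for illustration, and corner cases such as a bare “0x” are simply rejected here.

```python
def parse_ipv4_lenient(text):
    """Parse IPv4 text with inet_aton-style rules; return dotted quad or None."""
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        return None
    numbers = []
    for part in parts:
        lowered = part.lower()
        if lowered.startswith("0x"):
            digits, base = lowered[2:], 16
        elif len(lowered) > 1 and lowered.startswith("0"):
            digits, base = lowered[1:], 8
        else:
            digits, base = lowered, 10
        allowed = "0123456789abcdef"[:base]
        if not digits or any(ch not in allowed for ch in digits):
            return None
        numbers.append(int(digits, base))
    *leading, last = numbers
    # Leading parts are single bytes; the last part fills whatever is left.
    if any(n > 0xFF for n in leading) or last >= 256 ** (4 - len(leading)):
        return None
    value = last
    for index, n in enumerate(leading):
        value |= n << (8 * (3 - index))
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


for sample in ("127.1", "2130706433", "0xc0.0250.01", "256.1.1.1"):
    print(sample, "->", parse_ipv4_lenient(sample))
```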

Why am I accepting that? Well, when I talked to Adam Barth on IRC, he pointed me to his result list of comparing broken URLs found in the wild and how different browsers handled them. QUrl in Qt 4 probably fails or has broken behaviour for most, if not all, of the entries in the “LayoutTests/fast/url/host.html” section.

So when rewriting the parser, the first thing I noticed is that the hostnames may come in percent-encoded form, so we need to decode them to find a proper address that may fit the rules. For example:

Input | Decoded as | Type
%31%39%32.%30%2e%32%2E1 | 192.0.2.1 | IPv4Address
%5b%66f02%3a%3a1] | [ff02::1] | IPv6Address
%65%78%61%6d%70%6c%65%2e%63%6f%6d | example.com | reg-name

Table 1. Percent-encoded hostnames
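
The decoding in Table 1 can be reproduced with any percent-decoder. As a quick sketch, this uses Python’s urllib.parse.unquote rather than anything from Qt; only the three inputs from the table are used.

```python
from urllib.parse import unquote

samples = [
    "%31%39%32.%30%2e%32%2E1",
    "%5b%66f02%3a%3a1]",
    "%65%78%61%6d%70%6c%65%2e%63%6f%6d",
]

# Percent-decode each hostname before deciding what kind of host it is.
decoded = [unquote(sample) for sample in samples]
for raw, host in zip(samples, decoded):
    print(raw, "->", host)
```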

The next thing I noted from the tests is this particular URL: "http://０Ｘｃ０．０２５０．０１/" (I used a fixed-width font here so you can see the difference). This particular URL is using characters found in the Fullwidth Latin Letters range of Unicode (from U+FF00 to U+FFEF). They are exactly the same letters and numbers as found in the regular range, but they occupy one full width, like the ideographs in the CJK block. The regular codepoints used in mostly Latin text, like this blog, are considered halfwidth in Unicode parlance.

What’s interesting about that URL is that when you apply the ToASCII transformation of the IDNA process, its step called Nameprep (described in RFC 3491) transforms the fullwidth forms into their halfwidth counterparts. So the URL above, after going through the ToASCII process, becomes simply “http://0xc0.0250.01” and, despite containing non-digits, the new IPv4 address parser accepts it as “192.168.0.1”. So let’s add to our table:

Input | Decoded as | Type
０Ｘｃ０．０２５０．０１ | 192.168.0.1 | IPv4Address
Ｅｘａｍｐｌｅ．ｃｏｍ | example.com | reg-name

Table 2. Unicode latin fullwidth hostnames
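
The fullwidth-to-halfwidth folding can be tried out with Unicode NFKC normalisation plus case folding, which are two of the steps Nameprep performs. The sketch below is an approximation in Python, not a full Nameprep implementation (it skips the prohibited-character and bidi checks), and fold_fullwidth is a name made up for this example. The strings are written with \u escapes so they survive fonts that do not distinguish the forms.

```python
import unicodedata


def fold_fullwidth(host):
    """Approximate Nameprep: NFKC normalisation followed by case folding."""
    return unicodedata.normalize("NFKC", host).casefold()


# Fullwidth "0Xc0.0250.01" and "Example.com", code point by code point.
fullwidth_ip = "\uff10\uff38\uff43\uff10\uff0e\uff10\uff12\uff15\uff10\uff0e\uff10\uff11"
fullwidth_name = "\uff25\uff58\uff41\uff4d\uff50\uff4c\uff45\uff0e\uff43\uff4f\uff4d"

print(fold_fullwidth(fullwidth_ip))
print(fold_fullwidth(fullwidth_name))
```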

(note that IPv6Address is not on the table, it will be important later)

In other words, a hostname can be encoded in either the percent-encoded form or in Unicode and still be a regular IPv4 address. To make matters worse, it can be encoded in both!

The RFC describing URIs and URLs (RFC 3986) has a companion describing IRIs (Internationalised Resource Identifiers): RFC 3987. The IRI spec requires that a Unicode codepoint be equivalent to its percent-encoded UTF-8 form. That is, the letter “é” (U+00E9 LATIN SMALL LETTER E WITH ACUTE) is equivalent to “%C3%A9”. If that is so, then the Unicode fullwidth forms can be encoded in UTF-8 percent encoded too. If we encode the hostnames found on table 2 above, we get:

Input | Decoded as | Type
%ef%bc%90 %ef%bc%b8 %ef%bd%83 %ef%bc%90 %ef%bc%8e %ef%bc%90 %ef%bc%92 %ef%bc%95 %ef%bc%90 %ef%bc%8e %ef%bc%90 %ef%bc%91 | 192.168.0.1 | IPv4Address
%ef%bc%a5 %ef%bd%98 %ef%bd%81 %ef%bd%8d %ef%bd%90 %ef%bd%8c %ef%bd%85 %ef%bc%8e %ef%bd%83 %ef%bd%8f %ef%bd%8d | example.com | reg-name

Table 3. Hostnames in Unicode fullwidth latin letters and percent-encoded
(spaces are for legibility purposes only)
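
Both layers can be peeled in sequence: percent-decode the UTF-8 bytes first, then fold the fullwidth forms. The sketch below uses Python’s standard urllib.parse and unicodedata modules as stand-ins for the corresponding QUrl steps; NFKC plus case folding only approximates Nameprep.

```python
import unicodedata
from urllib.parse import quote, unquote

# A code point and its percent-encoded UTF-8 form are equivalent in an IRI.
e_acute = quote("\u00e9")

# First row of Table 3, without the spaces added for legibility.
row = ("%ef%bc%90%ef%bc%b8%ef%bd%83%ef%bc%90%ef%bc%8e%ef%bc%90"
       "%ef%bc%92%ef%bc%95%ef%bc%90%ef%bc%8e%ef%bc%90%ef%bc%91")

# Step 1: percent-decode to fullwidth text. Step 2: fold to halfwidth.
fullwidth = unquote(row)
host = unicodedata.normalize("NFKC", fullwidth).casefold()

print(e_acute)
print(host)
```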

Could it get any uglier? I thought it could. If there are fullwidth characters that transform to regular numbers and letters, is there one that transforms to the percent sign? Well, turns out that there is: “％”. If we apply NFKC to that, we obtain a regular ‘%’. And if you pay close attention to Adam Barth’s list, you see it being used (lines 39-47): “http://%ef%bc%85%ef%bc%90%ef%bc%90.com/” and “http://%ef%bc%85%ef%bc%94%ef%bc%91.com/” (the fullwidth percent is “%ef%bc%85”).

At this point, I was about to pull my hair out (thankfully, I had a haircut last week, so I can’t get a grip on my hair). Which operation should I do first: decoding the percent-encoding or applying Nameprep (ToASCII)? Moreover, what’s stopping me from writing “%ef%bc%85” (the percent-encoded representation of the fullwidth percent) in its fullwidth form (“％ＥＦ％ＢＣ％８５”)? And then encoding that in percent-encoding (“%ef%bc%85 %ef%bc%a5 %ef%bc%a6 %ef%bc%85 %ef%bc%a2 %ef%bc%a3 %ef%bc%85 %ef%bc%98 %ef%bc%95”)? And then repeating the process ad nauseam?

If you’re still with me, we’ve just found a problem: this is infinite recursion. We have to put a stop to it.
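
The nesting is easy to demonstrate. The sketch below alternates percent-decoding and NFKC normalisation until the string stops changing, and records every intermediate stage. It is an illustration of the problem in Python, not something QUrl does; unwrap_rounds and its limit parameter are made up for the example.

```python
import unicodedata
from urllib.parse import quote, unquote


def unwrap_rounds(host, limit=10):
    """Alternate percent-decoding and NFKC until stable; return every stage."""
    stages = [host]
    for _ in range(limit):
        following = unicodedata.normalize("NFKC", unquote(stages[-1]))
        if following == stages[-1]:
            break
        stages.append(following)
    return stages


# The string "%EF%BC%85" written in fullwidth characters, then percent-encoded.
fullwidth = "\uff05\uff25\uff26\uff05\uff22\uff23\uff05\uff18\uff15"
stages = unwrap_rounds(quote(fullwidth))
for stage in stages:
    print(stage)
```

Each extra layer of wrapping costs one more round, and nothing in the encoding itself bounds how many layers an input can carry.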

Then I remembered another detail: there are also fullwidth characters for slash (“／”), question mark (“？”) and hash (“＃”), all of which are special in URL encoding. Those characters, especially the slash, were the source of a security problem a year or two ago, in which you could hide it in a specially-crafted domain name: for example, in “www.bank.com.xn--6g7c.com”, a blind ToUnicode operation results in “www.bank.com./.com”. Since this attack appeared, QUrl enforces strict STD 3 compliance. After the Nameprep operation, QUrl will apply these steps from RFC 3490 Section 4.1:

(a) Verify the absence of non-LDH ASCII code points; that is, the
absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.

(b) Verify the absence of leading and trailing hyphen-minus; that
is, the absence of U+002D at the beginning and end of the
sequence.

With this, the “%” and the “[” characters are rejected and the hostname is considered invalid. For that reason, a hostname containing “%” or its UTF-8 percent-encoding is never valid and we put a stop to the iteration.
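
The two checks quoted above are short enough to sketch. The Python below applies them to each dot-separated label and assumes the labels are already ASCII, as they would be after ToASCII. It is an illustration rather than QUrl’s code; the function names and the allow_underscore switch are made up for the example, and empty labels are simply treated as invalid.

```python
import string

# Letters, digits and hyphen: the only ASCII characters rule (a) leaves over.
LDH = set(string.ascii_letters + string.digits + "-")


def std3_label_ok(label, allow_underscore=False):
    """Check one ASCII label against rules (a) and (b)."""
    allowed = LDH | {"_"} if allow_underscore else LDH
    if not label or any(ch not in allowed for ch in label):
        return False  # rule (a): a non-LDH code point is present
    # rule (b): no leading or trailing hyphen-minus
    return not (label.startswith("-") or label.endswith("-"))


def std3_hostname_ok(host, allow_underscore=False):
    """Check every dot-separated label of a hostname."""
    return all(std3_label_ok(label, allow_underscore) for label in host.split("."))


for host in ("example.com", "%41.com", "[ff02::1]", "-bad.com"):
    print(host, std3_hostname_ok(host))
```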

That also means the hostname field of QUrl will continue to reject anything that doesn’t conform to the rules above (an exception was made for accepting the underscore character), even if the ABNF for URIs would otherwise accept them. In particular, none of the “sub-delims” or non-URL characters are permitted, either in decoded or percent-encoded forms. That’s why “ed2k://” URLs are not allowed: they use the pipe character (“|”) in the hostname and that fails to comply with STD 3.

I’m almost done with QUrl. Yesterday, after completing the code, I started to run the Unit tests, which are down to 87 failures (from 269 the first time I ran, after fixing the crashes). I should finish with 90% of the failures by tonight, by a mix of fixing the code that isn’t correct and fixing tests that are now wrong.

Oct 12

QUrl in Qt 5: validity

In the previous blog about QUrl in Qt 5, Kenneth suggested I talk to Adam Barth about the issues with URLs in WebKit. So I went to the #webkit channel on Freenode, pinged him and we discussed a bit. He pointed me to a lot of unit tests that WebKit runs related to URL parsing and interpreting, including some of his own results on how different browsers (WebKit-based or not) handle some of the garbage we find out there.

Turns out he has a very extreme view on the subject of URLs and URIs. His position was that there is no standard for what a URL truly is and the RFCs trying to define it (RFC 3986 and RFC 3987) are to be ignored. My position — which matters to you because I’m doing QUrl for Qt 5 — is that the RFC standards are valid and specify how to handle those URIs. Everything else is undefined behaviour and could be rightfully rejected, but we won’t because there’s just too much out there.

RFC 3986 defines an ABNF grammar in Appendix A for parsing of URIs. Qt 4’s QUrl followed this grammar strictly. If you look at the source code, you’ll find matches for exactly the same terms as defined by the grammar. I’m not sure if this is fast, however: could the parser be faster if we had coded it differently? We’ll soon find out, as I’m about to rewrite it.

As a turn of events, however, QUrl started not from the URI but instead from a broader definition of URI-reference which can be either the URI as we’re used to or a relative-ref. The latter is what you usually find in fields where URLs are expected, like the HREF attribute for the A element or the SRC attribute for the IMG element in HTML. That meant that the QUrl::isValid() function was mostly useless, as most inputs were considered valid. What people expected to be invalid did match the relative-ref part of the grammar and the data ended up in the URL’s path component.

So despite being strictly-conforming, the parser was actually quite liberal. Coupled with the QUrl::TolerantMode parsing, which corrected mistakes in the percent-encoding, this meant QUrl almost never rejected a URL. The only things it started to reject were bad hostnames, because I considered them a security issue (homograph attacks). QUrl started to apply strict STD 3 conformance and rejected anything malformed there.

For Qt 5, I will relax the parser even further and I’ll accept some of the really strange inputs that I found in WebKit’s unit tests. QUrl in Qt 5 will accept strictly-conforming URLs as expected and will only produce standards-compliant URIs and URLs. The new parser I’ll write is actually closer to what people expect a URL to be. Take this example from the QUrl documentation:

Instead of following the grammar to parse it, I’ll just delimit at the expected boundaries and then try to correct the components as extracted. I mean, I’ll try — we’ll see if I manage or if I need to scrap this method. Hopefully, this will be a faster algorithm.
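
As a rough picture of what “delimiting at the expected boundaries” could look like, here is a sketch in Python that cuts a string at “#”, “?”, “:” and “//” without validating anything. It is not the parser that ends up in Qt 5; the function name split_url and the sample URL are made up for illustration, and correcting each extracted component would be a separate step.

```python
def split_url(url):
    """Split a URL at its delimiters without validating any component."""
    rest, _, fragment = url.partition("#")
    rest, _, query = rest.partition("?")
    scheme, colon, after = rest.partition(":")
    if not colon or "/" in scheme:
        # No scheme delimiter before the first slash: a relative reference.
        scheme, after = "", rest
    authority = ""
    if after.startswith("//"):
        authority, slash, path = after[2:].partition("/")
        path = slash + path
    else:
        path = after
    return {
        "scheme": scheme,
        "authority": authority,
        "path": path,
        "query": query,
        "fragment": fragment,
    }


print(split_url("http://user@example.com:8080/a/b?x=1#top"))
print(split_url("images/logo.png"))
```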

What does this mean to you? If you were passing QUrl some strict-conforming URIs and URLs, nothing will happen. In fact, it should be 1:1 and give you exactly the same as you gave it. If you had URLs that decoded some percent-encoded characters or UTF-8 sequences without causing it to become ambiguous, QUrl will also still accept your input.

If you had really broken URLs which QUrl accepted and corrected in Qt 4, there’s a good chance that QUrl in Qt 5 will continue to accept and interpret the same way. That’s because the set of unit tests for QUrl is quite extensive and I’ll do my best to keep compatibility.

Finally, if you had really really broken URLs, especially those with broken hostnames, I haven’t decided yet. I will accept some more URLs but, as I said, I consider them undefined behaviour. They may be accepted or they may be rejected; what’s more, the behaviour might change in new versions of Qt.

If your application breaks because of parsing of URLs, please report the bug. I will pay attention to each report. If we can prove that QUrl is failing to comply with the RFC, then the bug is proven and we’ll need to fix Qt. If your input fails to comply, I’ll need convincing arguments why QUrl should accept and correct your input.

PS: ed2k URIs will never be accepted.

Oct 10

Making a Qt Developer Days presentation

First of all, my apologies for not continuing the blog on QUrl yet. For a couple of reasons, I have not been able to continue the work there yet. I will get to it soon.

One of those reasons is the subject of this blog: my Qt Developer Days presentation. This year, the organisers decided to make one double session for all things related to the Qt Project and roadmap, presented by Lars Knoll, Marius Storm-Olsen and myself. With the Munich event only 2 weeks away, the past week and this week have been dedicated mostly to finishing it.

Qt Developer Days is a very high-level event with very good presentations and presenters. Usually the presenters are the trolls: the people actually working on Qt come to present the work they’ve been doing and the technologies they’ve developed and are maintaining. In the past, these people invariably were working for Trolltech or Nokia, but there have always been speakers from other companies like KDAB or ICS. This year, reflecting the opening up of the Qt Project, there will be an unprecedented number of companies and other affiliations represented. My friends at Cutehacks, Woboq, and INdT will be coming.

One of those presenters is me. This will be my fifth year attending and the fourth presenting. Going to Developer Days is a blast, as I get the opportunity to talk not only to other Qt developers, whom I see only a few times a year, but also to customers and users of Qt. Most of the time, people come very open-minded and are eager to learn, so they will pay a lot of attention to you. More than that, they try to squeeze every bit of information from you. In 2009 and 2010, I spent my time outside of my presentations basically going around and talking to people, answering their questions or introducing them to others who would have the information they needed.

Dev Days isn’t all joy though: the hardest part is making the presentation. Every year it’s like that: I look forward to Dev Days for about 11 months, then get stressed in the month leading to Munich, which is when I have to make the presentations. For 2008, I had two technical presentations; for 2009 I had one technical and one Qt in Use (Business) presentation. Last year, I had it easy: one co-presentation with my (now) colleague Rusty Lynch and a session discussing the Open Governance, which required only 5 slides. This year, it’s going to be different: it’s a double session, so we have to talk for 2×55 minutes. And coordinating three presenters, living on two different continents, is not easy.

The first thing we did was to figure out what we wanted to talk about, which is when we wrote our abstract. Next, we tried to identify our audience: the people who go to Qt Developer Days are very often commercial customers, who in their majority are writing desktop applications for in-house use. There’s also a good portion of embedded systems developers. And due to Nokia and MeeGo, a growing number of mobile developers. Most importantly for our purpose, the average Dev Days attendee is very interested in upcoming technologies, but not so much in actually contributing. Since we have a double session, we’ve thought of dividing according to such an audience: the average attendee for the first part, pitching a little about contributing, then dedicating the second part to the contribution process and the structure of the Qt project.

That is our current thinking, but with more than two weeks until the actual presentation, we can still change. (Yes, I know the deadline for the presentation was last Friday…)

Personally, the way I do presentations is to create a “wireframe”: I open LibreOffice in the Outline mode and start typing the slide titles, given the abstract and theme of the presentation. I try to apply a couple of techniques I’ve learnt during the years: one is to have the “red thread” (and no, it has nothing to do with multithreading). The idea is to have one common theme that is always present in every slide, building up to compose the bigger picture. Another is called MECE: Mutually Exclusive, Collectively Exhaustive. It means each slide does not overlap with the others in terms of ideas (or worse, contradict), but the grouping of slides forms a whole, with nothing forgotten. Finally, there’s the Pyramid Principle, in which you structure your ideas hierarchically: you construct your argument by having supporting ideas, each with supporting ideas, then you simply “traverse the graph”.

With your presentation in mind, then you have to put yourself in the audience’s place and mindset. The audience comes to your presentation with a preconception of what you’re going to talk about, which might be right or wrong. In the first few slides, you have to answer the question “What’s in it for me:” why should the audience pay attention to the presentation. If you fail to explain that in the first few slides, when people are still quite attentive, the attention might wander away.

There are techniques to regain the audience’s attention, though. Be very attentive to your voice: do not speak in a monotone. Every now and then, break the cycle, by either blanking the projector or by doing something in which the audience needs to pay attention to you, not the slides. In fact, it’s a good idea not to have much text in your slides because of that: you didn’t come to read the slides to the audience and not having something to drone on, you’re forced to think. If you can, add images, graphs or code to your presentation, but don’t overdo it.

Most importantly, engage with the audience. Make the audience feel that they’re part of the session. So ask questions which cannot be answered by “yes” or “no” — questions starting with “what” or “how” are usually good ones — and give some time to think. And also ask a question before closing a topic and moving on to the next one, to make sure you leave no stragglers behind. If you need to stop, go back, and repeat something, so be it.

Questions during the presentation are usually a matter of taste of the presenter. I prefer to have the questions asked whenever they are most relevant, so I instruct the audience to interrupt me and ask when they feel they need to. I recommend doing that, but paying attention that you don’t spend too much time addressing one person’s needs and lose 99%+ of your audience in the process. Be sure to thank the person for questions (“I’m glad you asked”). If you’ve made a good use of the Pyramid Principle, you’ll probably get questions you’re about to answer anyway.

And at the end, make sure you have left enough time for questions and that your audience knows where to find more information, by repeating links and email addresses that you might have mentioned during the body of the presentation. Usually, time for questions is not a problem in Dev Days, as a 75-minute slot is quite long and talking for 55 minutes is very hard. But it’s good to plan ahead anyway. And in case there are no questions at the end, you should have one or two up your sleeve that you can ask and answer, or have a ringer do it for you.

I strongly recommend rehearsing with some colleagues a week or two before the event, so you catch opportunities for improvement, where to ask questions, interact, etc. On the actual day of the presentation, make sure your laptop works with the projector, the slides can be read from the back of the room and arrive a few minutes early, to put yourself in the right mind. And then relax and let the presentation speak for itself.

Sep 30

The future of moc

Two days ago, on his blog on some Qt Creator news, Christian wrote:

We’re currently prototyping what would happen if we replaced our own C++ code model with clang’s.

The first comment in the blog was praising that research, but that got me thinking: what do we gain from it? Qt Creator already has a full C++ code model, so what’s the advantage of replacing that with clang’s?

The first thing I thought was reduced maintenance: by using clang’s code, we don’t have to maintain ours. Is that all? What else can we gain from having this code, or the experience in writing it?


Sep 20

C++11 support in Qt 5

Marc Mutz posted today a blog calling for immediate support of C++11 in C++-based projects. He also linked to Herb Sutter’s blog saying the standard was unanimously approved in the ISO voting. In his blog, Marc calls for Qt 5 and KDE 5 to require C++11 in the compilers.

I agree with the principle and I’d love to take up on what he’s asking for. But it’s unrealistic for Qt 5.0. Requiring C++11 will take at least one or two more years, when compiler support is a little more widespread. For KDE 5 (or KDE Frameworks), support may come sooner, as the need to support older compilers is smaller.

That said, last month I posted to the qt5-feedback mailing list a proposal to update Qt’s use of STL. In particular, I called for the decoupling of the language features from the use of the C++ standard library, putting the emphasis on the embedded systems and cross-compiler compatibility.

I also proposed completely dropping support for compilers that don’t support C++98, especially support for pre-C++98 STL. Of course, we have to be pragmatic, as even in 2011 some mainstream and widely-used C++ compilers fail to support certain advanced C++98 features like template-template parameters (not supported on Sun CC last I checked) or template friends (no support outside of GCC). And, of course, there are compiler bugs or shortcomings we sometimes have no choice but to work around.

STL cannot be used in Qt’s API in a way that it changes the binary compatibility guarantees, but it can be used with Qt as well as used internally. For example, it’s fine to have template code in QVector that supports std::vector and it’s fine to use std::thread (C++11) internally. My submission for the new atomic classes even uses std::atomic where it’s available.

When it comes to C++11 support, I proposed using it to provide access to new / advanced functionality, provided it still respected the limitations above (cannot affect binary compatibility). That is, if you have a C++11 compiler, you’ll get more features, but not having it will not impede your use of Qt. And you’ll be using the same Qt build that someone with an older compiler uses. At least, for the time being.

One feature I’d like to see supported only in C++11 is the new signal-slot connection syntax. C++11 allows us to properly write perfect-forwarding of an arbitrary number of parameters, whereas C++98 requires putting an upper limit on the parameter count and jumping through hoops for those few. Trust me, I’ve done the research and you can read the works of others who have tried to do perfect forwarding in C++98.

If we can find a couple more such interesting features, our users will start using C++11 and start requiring those features from their compiler vendors. In summary, I don’t think we can mandate C++11 just yet, but we can make it attractive enough that people will want it. So let’s not be afraid of C++11, let’s drive its adoption!
