This was originally written around 2013 and may need to be updated. (2021)
GCC
The gccintro package provides a good tutorial, “Introduction to GCC by Brian J. Gough”, which covers the GCC basics for compiling C programs.
GCC version
Check gcc version and defaults:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.8.1-9' --with-bu...
Thread model: posix
gcc version 4.8.1 (Debian 4.8.1-9)
Basic options
Basic GCC syntax from the top few lines of its manpage:
gcc [-c|-S|-E] [-std=standard]
[-g] [-pg] [-Olevel]
[-Wwarn...] [-pedantic]
[-Idir...] [-Ldir...]
[-Dmacro[=defn]...] [-Umacro]
[-foption...] [-mmachine-option...]
[-o outfile] [@file] infile...
The manpage for gcc is too long. Here are the parts I should remember.
-c
: preprocess=Yes compile=Yes assemble=Yes link=No
-S
: preprocess=Yes compile=Yes assemble=No link=No
-E
: preprocess=Yes compile=No assemble=No link=No
-std=standard
: specify standard conformance
  - C : -ansi is -std=c89, default is -std=gnu89
  - C++ : -ansi is -std=c++98, default is -std=gnu++98
-Wall
: enables most of the commonly useful warnings (not literally all of them).
-pedantic
: warnings required by strict ISO C and ISO C++ conformance
-g
: produce debug information for gdb(1)
-pg
: produce extra code for gprof(1)
-O0
: no optimization
-O1
: some optimization
-O2
: lots of optimization
-O3
: yet more optimization
-I<dir>
: search the directory for header files
-L<dir>
: search the directory for library files
-l<library>
: search the library when linking
-D<macro>[=<defn>]
: predefine the macro as <defn> (or as 1 if <defn> is omitted)
-U<macro>
: undefine the macro
-f<option>
: set the machine-independent flag
-m<machine-option>
: set the machine-dependent flag
-o<outfile>
: output in the file
@<file>
: read command-line options from <file>.
-v
: verbose output. (list defined symbols etc.)
-Q
: make the compiler print out each function name etc.
-Wp,<option>
: pass <option> as an option directly to the preprocessor.
-Wa,<option>
: pass <option> as an option directly to the assembler.
-Wl,<option>
: pass <option> as an option directly to the linker.
-fpic
: generate position-independent code (smaller code)
-fPIC
: generate position-independent code (larger code)
-fpie
: generate code for a position-independent executable (smaller code)
-fPIE
: generate code for a position-independent executable (larger code)
-pie
: produce a position-independent executable (linker option; use it together with -fpie or -fPIE at compile time)
Please note that gcc uses no space after the command switch and a single leading - even for long options.
The default for C as of GCC 4.8 is -std=gnu90, which is the GNU dialect of ISO C90 including some C99 features.
The default for C++ as of GCC 4.8 is -std=gnu++98, which is the GNU dialect of the 1998 ISO C++ standard plus amendments including some C++11 features.
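To see which dialect and which -D settings a given build actually uses, a small probe can help. The following is only a sketch: the macro LEVEL and the function names c_standard_name, build_level and print_build_info are made up for this illustration, while __STDC_VERSION__ is the standard predefined macro (it is not defined at all in C89/C90 mode).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* LEVEL is a hypothetical macro for this sketch; override it with -DLEVEL=3. */
#ifndef LEVEL
#define LEVEL 0
#endif

/* Name the ISO C revision the compiler claims, based on __STDC_VERSION__. */
const char *c_standard_name(void)
{
#if !defined(__STDC_VERSION__)
    return "C89/C90";               /* e.g. -ansi, -std=c89, -std=gnu90 */
#elif __STDC_VERSION__ < 199901L
    return "C90 with Amendment 1";  /* e.g. -std=iso9899:199409 */
#elif __STDC_VERSION__ < 201112L
    return "C99";                   /* e.g. -std=c99, -std=gnu99 */
#else
    return "C11 or later";
#endif
}

/* Value of LEVEL as seen by the compiler (0 unless -DLEVEL=... was given). */
int build_level(void)
{
    return LEVEL;
}

/* Print one line summarizing the build settings. */
void print_build_info(void)
{
    const char *name = c_standard_name();

    assert(name != NULL && strlen(name) > 0);
    printf("standard: %s, LEVEL=%d\n", name, build_level());
}
```

Calling print_build_info() from main and rebuilding with different -std= and -DLEVEL= options shows how the two kinds of options reach the source code.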
TIP: The meaning of inline in C is different between the default -std=gnu90 and the rest of the world (-std=gnu99|-std=c99|...). See An Inline Function is As Fast As a Macro.
Assembler code
GCC with the -S option produces assembler code output written in the AT&T assembler style as shown in the “Hello World!” example.
It is not so difficult to grok roughly what the GCC generated assembler code does. (Writing some code in the assembler from scratch requires serious knowledge.)
Some basic register names, command mnemonic names, and command mnemonic suffix conventions need to be noted.
- Command mnemonic names and quasi-C equivalents:
  - “mov op1, op2” : “op2 = op1”
  - “mov (op1), op2” : “op2 = *op1”
  - “lea (op1), op2” : “op2 = op1” (load effective address)
  - “add op1, op2” : “op2 += op1”
  - “sub op1, op2” : “op2 -= op1”
  - “test op1, op2” : set flags based on “op2 & op1”
  - “cmp op1, op2” : set flags based on “op2 - op1”
  - “call op1” : call the function at op1
    - Push the address of the next instruction (the return address) to the stack and jump to the op1 address.
  - “ret” : return to the caller
    - Pop the return address from the stack into %rip.
  - “jmp op” : jump unconditionally to op
  - “je op” : jump to op if equal
  - “jne op” : jump to op if not equal
  - “jg op” : jump to op if greater
  - “jge op” : jump to op if greater or equal
  - “jl op” : jump to op if less
  - “jle op” : jump to op if less or equal
  - “jz op” : jump to op if zero
  - “jnz op” : jump to op if non-zero
- Command mnemonic suffix indicating data width:
  - “b” : 8 bit (byte)
  - “w” : 16 bit (word)
  - “l” : 32 bit (long)
  - “q” : 64 bit (quadruple word)
- Command mnemonic arguments:
  - “$” : immediate value following
  - “Offs(Base,Index,Scale)” : the value stored at the address Base + Index * Scale + Offs (where Scale = 1, 2, 4, or 8).
- Register names and their data width:
  - 64 bit: %rax, %rbx, %rcx, %rdx, %rdi, %rsi, %rbp, %rsp, …
  - 32 bit: %eax, %ebx, %ecx, %edx, %edi, %esi, %ebp, %esp, …
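The operand order of “cmp op1, op2” followed by a conditional jump is easy to misread in the AT&T style. As a rough mental model only (the enum jump and the function jump_taken are invented for this sketch, and it covers just the signed comparisons listed above), the jump decision can be written in C as:

```c
#include <assert.h>

/* Conditional jumps that may follow "cmp op1, op2" (AT&T operand order). */
enum jump { JE, JNE, JG, JGE, JL, JLE };

/*
 * Return 1 if the jump would be taken after "cmp op1, op2", else 0.
 * The flags reflect "op2 - op1", so every condition reads as
 * "op2 <relation> op1" with signed operands.
 */
int jump_taken(enum jump j, long op1, long op2)
{
    switch (j) {
    case JE:  return op2 == op1;
    case JNE: return op2 != op1;
    case JG:  return op2 >  op1;
    case JGE: return op2 >= op1;
    case JL:  return op2 <  op1;
    case JLE: return op2 <= op1;
    }
    assert(!"unknown jump kind");
    return 0;
}
```

For example, jump_taken(JG, 1, 2) is 1 because op2 (2) is greater than op1 (1), even though 1 is written first in the instruction.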
TIP: “mov op1, op2” moves data “op1 -> op2” in the AT&T assembler style (GCC default); while “mov op1, op2” moves data “op1 <- op2” in the Intel assembler style (NASM default). These are in the opposite order.
Examples of assembly codes
AT&T | Intel | quasi-C |
---|---|---|
movq $0x12345678, %rax | mov rax, 12345678h | rax = 0x12345678 |
movq $0xff, %rax | mov rax, 0ffh | rax = 0xff |
movq -8(%rbp), %rax | mov rax, [rbp-8] | rax = *(rbp - 8) |
movq -0x10(%rbp, %rdx, 8), %rax | mov rax, [rbp+rdx*8-10h] | rax = *(rbp + rdx * 8 - 0x10) |
movq (%rcx), %rax | mov rax, [rcx] | rax = *(rcx) |
movq %rcx, %rax | mov rax, rcx | rax = rcx |
leaq 8(,%rcx,8), %rax | lea rax, [rcx*8+8] | rax = rcx * 8 + 8 |
leaq (%rbx,%rcx,4), %rax | lea rax, [rbx+rcx*4] | rax = rbx + rcx * 4 |
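The “Offs(Base,Index,Scale)” operand in the table is just address arithmetic. The sketch below spells it out in C; effective_address and load_quad are hypothetical helper names, and the conversion between integers and pointers is assumed to behave as it does on flat-memory targets such as x86-64 Linux.

```c
#include <assert.h>
#include <stdint.h>

/* Address computed by "Offs(Base,Index,Scale)": Base + Index * Scale + Offs. */
uintptr_t effective_address(uintptr_t base, uintptr_t index,
                            unsigned scale, intptr_t offs)
{
    /* The hardware only encodes these four scale factors. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uintptr_t)offs;
}

/* Rough C equivalent of "movq (%rdi,%rsi,8), %rax": load array[index]. */
int64_t load_quad(const int64_t *array, uintptr_t index)
{
    uintptr_t addr = effective_address((uintptr_t)array, index, 8, 0);

    return *(const int64_t *)addr;
}
```

With an int64_t array a, load_quad(a, 2) reads the same element as a[2], which is why array indexing often compiles to a single mov with a scaled index.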
Some basic 64-bit (= 8 bytes) integer ABI conventions under the x86-64 (amd64) Linux need to be noted.
- registers for function call return values
  - “%rax” : the 1st function return integer value (a.k.a. accumulator register)
  - “%rdx” : the 2nd function return integer value (a.k.a. data register)
  - “%xmm0” : the 1st function return double precision floating point value (128-bit SSE2 register)
  - “%xmm1” : the 2nd function return double precision floating point value (128-bit SSE2 register)
- registers for managing the stack
  - “%rsp” : the stack pointer, pointing to the top of the stack (a.k.a. stack pointer register)
  - “%rbp” : the frame pointer, pointing to the base of the stack frame (a.k.a. stack base pointer register)
- temporary registers
  - “%r10”, “%r11”, “%xmm8” - “%xmm15”
- callee-saved registers
  - “%rbx”, “%r12” - “%r15”
- registers to pass function arguments
  - “%rdi” : the 1st function argument integer passed (a.k.a. destination index register)
  - “%rsi” : the 2nd function argument integer passed (a.k.a. source index register)
  - “%rdx” : the 3rd function argument integer passed (a.k.a. data register)
  - “%rcx” : the 4th function argument integer passed (a.k.a. counter register)
  - “%r8” : the 5th function argument integer passed
  - “%r9” : the 6th function argument integer passed
  - “%xmm0” - “%xmm7” : the function argument floating point passed
- stack data usages (the stack grows from the high address to the low address)
  - stack “*(%rbp + 8*(n-5))” : the last (n-th) function argument passed
  - …
  - stack “*(%rbp + 16)” : the 7th function argument passed
  - stack “*(%rbp + 8)” : the function return address
  - stack “*(%rbp)” : the old %rbp value (%rbp: frame pointer)
  - stack “*(%rbp - 8)” : the 1st local variable
  - stack “*(%rbp - 16)” : the 2nd local variable
  - stack “*(%rbp - 24)” : the 3rd local variable
  - …
  - stack “*(%rsp)” : the top local variable (%rsp: stack pointer)
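A small C file is enough to watch these conventions in the output of gcc -S. The functions below are illustrative only (sum8, divide and struct pair are names chosen for this sketch). The register and stack locations in the comments are those listed above for x86-64 Linux, and the %rbp offsets assume unoptimized code that keeps the frame pointer.

```c
#include <assert.h>

/* Two integers: small enough to be returned in registers. */
struct pair {
    long quot;
    long rem;
};

/*
 * Eight integer arguments:
 *   a1..a6 arrive in %rdi, %rsi, %rdx, %rcx, %r8, %r9
 *   a7 arrives on the stack at 16(%rbp), a8 at 24(%rbp)
 * The sum is returned in %rax.
 */
long sum8(long a1, long a2, long a3, long a4,
          long a5, long a6, long a7, long a8)
{
    return a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8;
}

/* The two members come back in %rax (quot) and %rdx (rem). */
struct pair divide(long num, long den)
{
    struct pair p;

    assert(den != 0);
    p.quot = num / den;
    p.rem = num % den;
    return p;
}
```

Compiling this with gcc -S -O0 and searching for sum8 in the resulting .s file should show the six register arguments being spilled to the frame and the last two being read from positive offsets of %rbp.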
There are some memory alignment requirements of x86-64 under GCC/Linux.
- 8-byte aligned: long, double, pointer
- 16-byte aligned: SSE2 instructions, long double
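These alignments can be checked on the machine at hand. The sketch below relies on __alignof__, which is a GCC extension rather than ISO C90; the function names are made up for the example. On x86-64 Linux the four values are expected to be 8, 8, 8 and 16, matching the list above, but other targets may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* __alignof__ is a GCC extension; it yields the alignment of a type. */
size_t align_long(void)        { return __alignof__(long); }
size_t align_double(void)      { return __alignof__(double); }
size_t align_pointer(void)     { return __alignof__(void *); }
size_t align_long_double(void) { return __alignof__(long double); }

/* Alignments are always powers of two. */
int is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Print the alignment of each type on the current target. */
void print_alignments(void)
{
    assert(is_power_of_two(align_long()));
    assert(is_power_of_two(align_long_double()));
    printf("long: %lu, double: %lu, pointer: %lu, long double: %lu\n",
           (unsigned long)align_long(),
           (unsigned long)align_double(),
           (unsigned long)align_pointer(),
           (unsigned long)align_long_double());
}
```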
TIP: These register usages and function call conventions are architecture and OS specific. For example, i386 passes all function arguments in the stack by pushing them in the right-to-left order.
TIP: There is a strange situation with fdivp and fdivrp: see Debian Bug #372528: as/i386: reverses fdivp and fdivrp
String in C function
The tricky problem of how a string behaves in a C function becomes simple when you inspect the generated assembler code.
Here is a C code string-array.c which manipulates a string.
string-array.c with “char[]”
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char** argv) {
    char foo[] = "abcdefgh";
    printf("Before foo[] = '%s'\n\n", foo);
    foo[3] = '@';
    printf("After foo[] = '%s'\n\n", foo);
    return EXIT_SUCCESS;
}
This string-array.c compiles fine and runs without problem.
Compile and run of string-array.c
$ gcc -o string-array string-array.c
$ ./string-array
Before foo[] = 'abcdefgh'
After foo[] = 'abc@efgh'
Here is a similar looking buggy C code string-pointer.c.
string-pointer.c with “char *”
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char** argv) {
    char* bar = "abcdefgh";
    printf("Before bar* = '%s'\n\n", bar);
    bar[3] = '@';
    printf("After bar* = '%s'\n\n", bar);
    return EXIT_SUCCESS;
}
This string-pointer.c compiles fine but fails to run.
Compile and run of string-pointer.c
$ gcc -o string-pointer string-pointer.c
$ ./string-pointer
Segmentation fault
The reason becomes clear when we look at their assembler code, generated by compiling with the -S option.
Assembler code from string-array.c
$ gcc -S string-array.c
$ cat string-array.s
.file "string-array.c"
.section .rodata
.LC0:
.string "Before foo[] = '%s'\n\n"
.LC1:
.string "After foo[] = '%s'\n\n"
.text
.globl main
.type main, @function
main:
.LFB2:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $32, %rsp
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
movabsq $7523094288207667809, %rax
movq %rax, -16(%rbp)
movb $0, -8(%rbp)
leaq -16(%rbp), %rax
movq %rax, %rsi
movl $.LC0, %edi
movl $0, %eax
call printf
movb $64, -13(%rbp)
leaq -16(%rbp), %rax
movq %rax, %rsi
movl $.LC1, %edi
movl $0, %eax
call printf
movl $0, %eax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE2:
.size main, .-main
.ident "GCC: (Debian 4.8.1-9) 4.8.1"
.section .note.GNU-stack,"",@progbits
Here, upon execution of the main function, the stack space for storing foo[] is dynamically secured and the value of “abcdefgh” is stored into the stack space by the somewhat obfuscated assignment operation as below:
movabsq $7523094288207667809, %rax
movq %rax, -16(%rbp)
movb $0, -8(%rbp)
Here 7523094288207667809 = 0x6867666564636261, and the final movb stores the terminating NUL. This single 64-bit store has the same effect as the following pair of 32-bit stores:
movl $1684234849, -16(%rbp)
movl $1751606885, -12(%rbp)
- local data: low address = address pointed by %rbp with offset -16
  - 1684234849 = 0x64636261
  - = 'd' * 0x1000000 + 'c' * 0x10000 + 'b' * 0x100 + 'a'
- local data: high address = address pointed by %rbp with offset -12
  - 1751606885 = 0x68676665
  - = 'h' * 0x1000000 + 'g' * 0x10000 + 'f' * 0x100 + 'e'
Please note x86-64 (=amd64) is a little endian architecture (LSB first memory mapping), thus 'a' = 0x61 comes first in the stack.
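The arithmetic behind these constants can be reproduced with a few lines of C. This is only an illustrative sketch (le_pack is a hypothetical helper): it builds the integer that a little endian machine would read from the given bytes, using shifts so that the result does not depend on the host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pack up to 8 bytes into an integer the way a little endian CPU sees them:
 * bytes[0] becomes the least significant byte, bytes[1] the next, and so on.
 */
uint64_t le_pack(const char *bytes, size_t n)
{
    uint64_t value = 0;
    size_t i;

    assert(n <= 8);
    for (i = 0; i < n; i++)
        value |= (uint64_t)(unsigned char)bytes[i] << (8 * i);
    return value;
}
```

Here le_pack("abcd", 4) gives 1684234849, le_pack("efgh", 4) gives 1751606885, and le_pack("abcdefgh", 8) gives 7523094288207667809, the immediate values seen in the assembler code.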
Assembler code from string-pointer.c
$ gcc -S string-pointer.c
$ cat string-pointer.s
.file "string-pointer.c"
.section .rodata
.LC0:
.string "abcdefgh"
.LC1:
.string "Before bar* = '%s'\n\n"
.LC2:
.string "After bar* = '%s'\n\n"
.text
.globl main
.type main, @function
main:
.LFB2:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $32, %rsp
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
movq $.LC0, -8(%rbp)
movq -8(%rbp), %rax
movq %rax, %rsi
movl $.LC1, %edi
movl $0, %eax
call printf
movq -8(%rbp), %rax
addq $3, %rax
movb $64, (%rax)
movq -8(%rbp), %rax
movq %rax, %rsi
movl $.LC2, %edi
movl $0, %eax
call printf
movl $0, %eax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE2:
.size main, .-main
.ident "GCC: (Debian 4.8.1-9) 4.8.1"
.section .note.GNU-stack,"",@progbits
Here, the value of “abcdefgh” is stored in the section marked as .rodata, i.e., read-only. So the ./string-pointer command tries to overwrite this read-only data and causes a segmentation fault.
This execution time error can be turned into a compilation time error by adding “const” to the line defining the string.
string-const-pointer.c with “const char *” (fails to compile)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char** argv) {
    const char* bar = "abcdefgh";
    printf("Before bar* = '%s'\n\n", bar);
    bar[3] = '@';
    printf("After bar* = '%s'\n\n", bar);
    return EXIT_SUCCESS;
}
This string-const-pointer.c fails to compile.
Compile error of string-const-pointer.c
$ gcc -o string-const-pointer string-const-pointer.c
string-const-pointer.c: In function ‘main’:
string-const-pointer.c:8:5: error: assignment of read-only location ‘*(bar + 3u)’...
bar[3] = '@';
^
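When the string really needs to be modified, one way out is to keep the literal behind a const pointer and edit a writable copy instead. The following sketch shows that pattern; replace_at and patched_copy are hypothetical helper names, not part of the original examples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace one character in a writable, NUL-terminated buffer. */
int replace_at(char *buf, size_t index, char c)
{
    assert(buf != NULL);
    if (index >= strlen(buf))
        return -1;              /* index is outside the string */
    buf[index] = c;
    return 0;
}

/*
 * Copy a (possibly read-only) string into the caller's writable buffer,
 * then patch the copy. Returns 0 on success, -1 if it does not fit or
 * the index is out of range.
 */
int patched_copy(char *out, size_t out_size,
                 const char *literal, size_t index, char c)
{
    size_t need = strlen(literal) + 1;  /* include the terminating NUL */

    if (need > out_size)
        return -1;
    memcpy(out, literal, need);
    return replace_at(out, index, c);
}
```

With char out[16], the call patched_copy(out, sizeof out, "abcdefgh", 3, '@') leaves "abc@efgh" in out while the literal in .rodata stays untouched.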
Buffer overflow protection
Enabling the macro _FORTIFY_SOURCE with the -D option substitutes high-risk functions in the GNU libc library with checked variants to protect against the buffer overflow risk. This requires gcc to be run with -O1 or higher optimization. This works on all CPU architectures as long as the program is linked to the GNU libc library.
-D_FORTIFY_SOURCE=2 -O2
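The idea behind the substituted functions can be sketched in plain C. This is not the GNU libc implementation: checked_strcpy, CHECKED_STRCPY and UNKNOWN_SIZE are names invented here, and the sketch reports an error where the real checked functions abort the program. It does use the real GCC builtin __builtin_object_size, which yields (size_t)-1 when the compiler cannot determine the destination size; to my understanding this dependence on compile-time knowledge is why optimization has to be enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value returned by __builtin_object_size(ptr, 0) when the size is unknown. */
#define UNKNOWN_SIZE ((size_t)-1)

/*
 * Hypothetical stand-in for a fortified strcpy.
 * Returns 0 after copying, or -1 if src (with its NUL) cannot fit in dest.
 */
int checked_strcpy(char *dest, size_t dest_size, const char *src)
{
    size_t need;

    assert(dest != NULL && src != NULL);
    need = strlen(src) + 1;
    if (dest_size != UNKNOWN_SIZE && need > dest_size)
        return -1;              /* a real fortified function aborts here */
    memcpy(dest, src, need);
    return 0;
}

/* Let the compiler supply the destination size where it knows it. */
#define CHECKED_STRCPY(dest, src) \
    checked_strcpy((dest), __builtin_object_size((dest), 0), (src))
```

Note that when the size is unknown the copy proceeds unchecked, so this kind of protection only covers the cases the compiler can analyze.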
GCC’s Stack Smashing Protector (SSP), which protects against stack buffer overflows regardless of their cause, was developed by IBM and originally called ProPolice. This only works on some CPU architectures. SSP can be controlled by the following GCC flags:
- SSP on for all: -fstack-protector-all
- SSP on: -fstack-protector
- SSP off: -fno-stack-protector
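Stack protectors of this kind are usually described as placing a guard value (a “canary”) next to the protected buffer and checking it before the function returns. The sketch below only simulates that idea inside one ordinary array, so that it stays well-defined C. The names, the layout and the fixed canary bytes are all assumptions made for illustration; the real protector is generated by the compiler and does not use a constant known in advance.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_LEN    8   /* simulated local buffer */
#define CANARY_LEN 8   /* simulated guard value placed right after it */

/*
 * Copy n bytes into the simulated buffer without a bounds check and
 * report whether the guard survived: 1 = intact, 0 = "stack smashing".
 */
int copy_and_check_canary(const char *src, size_t n)
{
    static const unsigned char canary[CANARY_LEN] =
        { 0xde, 0xad, 0xbe, 0xef, 0x00, 0x0a, 0x0d, 0xff };
    unsigned char frame[BUF_LEN + CANARY_LEN];

    assert(n <= sizeof frame);  /* keep the simulation itself in bounds */
    memset(frame, 0, BUF_LEN);
    memcpy(frame + BUF_LEN, canary, CANARY_LEN);  /* "function prologue" */
    memcpy(frame, src, n);                        /* unchecked copy */
    /* "function epilogue": compare the guard with its original value */
    return memcmp(frame + BUF_LEN, canary, CANARY_LEN) == 0;
}
```

Copying 8 bytes leaves the guard intact, while copying the 11 bytes of "0123456789" (including its NUL) overwrites the first guard bytes and is detected only after the copy has finished.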
Let’s try these compiler options using an example bof.c code having the buffer overflow risk.
bof.c with the buffer overflow risk:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define DESTLEN 8

int main(int argc, char** argv)
{
    char dest[DESTLEN];

    if (argc == 2) {
        printf(">>> Before the possible buffer over flow >>>\n");
        strcpy(dest, argv[1]);
        printf("<<< After the possible buffer over flow <<<\n");
    } else {
        fprintf(stderr,"Usage: %s ARG\n", argv[0]);
        fprintf(stderr," Length(ARG) < %i bytes\n", DESTLEN);
        exit(EXIT_FAILURE);
    }
    return EXIT_SUCCESS;
}
Buffer overflow protection: None
$ gcc -fno-stack-protector -o bof-unsafe bof.c
$ ./bof-unsafe "0123456789"
>>> Before the possible buffer over flow >>>
<<< After the possible buffer over flow <<<
Buffer overflow protection: -D_FORTIFY_SOURCE=2
$ gcc -D_FORTIFY_SOURCE=2 -O2 -o bof-fortify bof.c
$ ./bof-fortify "0123456789"
*** buffer overflow detected ***: ./bof-fortify terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x2aaaaadcbd17]
/lib/x86_64-linux-gnu/libc.so.6(+0xfbcd0)[0x2aaaaadcacd0]
./bof-fortify[0x400578]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x2aaaaacf0995]
./bof-fortify[0x4005f5]
======= Memory map: ========
00400000-00401000 r-xp 00000000 fe:01 3146660 /path/to...
00600000-00601000 rw-p 00000000 fe:01 3146660 /path/to...
01231000-01252000 rw-p 00000000 00:00 0 [heap]
2aaaaaaab000-2aaaaaacc000 r-xp 00000000 fe:01 655581 /lib/x86...
2aaaaaacc000-2aaaaaad0000 rw-p 00000000 00:00 0
2aaaaaafa000-2aaaaaafc000 rw-p 00000000 00:00 0
2aaaaaccc000-2aaaaaccd000 r--p 00021000 fe:01 655581 /lib/x86...
2aaaaaccd000-2aaaaaccf000 rw-p 00022000 fe:01 655581 /lib/x86...
2aaaaaccf000-2aaaaae71000 r-xp 00000000 fe:01 656361 /lib/x86...
2aaaaae71000-2aaaab071000 ---p 001a2000 fe:01 656361 /lib/x86...
2aaaab071000-2aaaab075000 r--p 001a2000 fe:01 656361 /lib/x86...
2aaaab075000-2aaaab077000 rw-p 001a6000 fe:01 656361 /lib/x86...
2aaaab077000-2aaaab07b000 rw-p 00000000 00:00 0
2aaaab07b000-2aaaab090000 r-xp 00000000 fe:01 655396 /lib/x86...
2aaaab090000-2aaaab290000 ---p 00015000 fe:01 655396 /lib/x86...
2aaaab290000-2aaaab291000 rw-p 00015000 fe:01 655396 /lib/x86...
7fff5d517000-7fff5d538000 rw-p 00000000 00:00 0 [stack]
7fff5d5d7000-7fff5d5d9000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscal...
>>> Before the possible buffer over flow >>>
Aborted
Buffer overflow protection: -fstack-protector --param=ssp-buffer-size=4
$ gcc -fstack-protector --param=ssp-buffer-size=4 -o bof-safe bof.c
$ ./bof-safe "0123456789"
*** stack smashing detected ***: ./bof-safe terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x2aaaaadcbd17]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x0)[0x2aaaaadcbce0]
./bof-safe[0x400732]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x2aaaaacf0995]
./bof-safe[0x4005b9]
======= Memory map: ========
00400000-00401000 r-xp 00000000 fe:01 3146663 /path/to...
00600000-00601000 rw-p 00000000 fe:01 3146663 /path/to...
00d82000-00da3000 rw-p 00000000 00:00 0 [heap]
2aaaaaaab000-2aaaaaacc000 r-xp 00000000 fe:01 655581 /lib/x86...
2aaaaaacc000-2aaaaaad0000 rw-p 00000000 00:00 0
2aaaaaafa000-2aaaaaafc000 rw-p 00000000 00:00 0
2aaaaaccc000-2aaaaaccd000 r--p 00021000 fe:01 655581 /lib/x86...
2aaaaaccd000-2aaaaaccf000 rw-p 00022000 fe:01 655581 /lib/x86...
2aaaaaccf000-2aaaaae71000 r-xp 00000000 fe:01 656361 /lib/x86...
2aaaaae71000-2aaaab071000 ---p 001a2000 fe:01 656361 /lib/x86...
2aaaab071000-2aaaab075000 r--p 001a2000 fe:01 656361 /lib/x86...
2aaaab075000-2aaaab077000 rw-p 001a6000 fe:01 656361 /lib/x86...
2aaaab077000-2aaaab07b000 rw-p 00000000 00:00 0
2aaaab07b000-2aaaab090000 r-xp 00000000 fe:01 655396 /lib/x86...
2aaaab090000-2aaaab290000 ---p 00015000 fe:01 655396 /lib/x86...
2aaaab290000-2aaaab291000 rw-p 00015000 fe:01 655396 /lib/x86...
7fff36900000-7fff36921000 rw-p 00000000 00:00 0 [stack]
7fff369fe000-7fff36a00000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscal...
>>> Before the possible buffer over flow >>>
<<< After the possible buffer over flow <<<
Aborted
Buffer overflow protection: -fstack-protector-all
$ gcc -fstack-protector-all -o bof-safest bof.c
$ ./bof-safest "0123456789"
*** stack smashing detected ***: ./bof-safest terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x2aaaaadcbd17]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x0)[0x2aaaaadcbce0]
./bof-safest[0x400732]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x2aaaaacf0995]
./bof-safest[0x4005b9]
======= Memory map: ========
00400000-00401000 r-xp 00000000 fe:01 3146665 /path/to...
00600000-00601000 rw-p 00000000 fe:01 3146665 /path/to...
01c1b000-01c3c000 rw-p 00000000 00:00 0 [heap]
2aaaaaaab000-2aaaaaacc000 r-xp 00000000 fe:01 655581 /lib/x86...
2aaaaaacc000-2aaaaaad0000 rw-p 00000000 00:00 0
2aaaaaafa000-2aaaaaafc000 rw-p 00000000 00:00 0
2aaaaaccc000-2aaaaaccd000 r--p 00021000 fe:01 655581 /lib/x86...
2aaaaaccd000-2aaaaaccf000 rw-p 00022000 fe:01 655581 /lib/x86...
2aaaaaccf000-2aaaaae71000 r-xp 00000000 fe:01 656361 /lib/x86...
2aaaaae71000-2aaaab071000 ---p 001a2000 fe:01 656361 /lib/x86...
2aaaab071000-2aaaab075000 r--p 001a2000 fe:01 656361 /lib/x86...
2aaaab075000-2aaaab077000 rw-p 001a6000 fe:01 656361 /lib/x86...
2aaaab077000-2aaaab07b000 rw-p 00000000 00:00 0
2aaaab07b000-2aaaab090000 r-xp 00000000 fe:01 655396 /lib/x86...
2aaaab090000-2aaaab290000 ---p 00015000 fe:01 655396 /lib/x86...
2aaaab290000-2aaaab291000 rw-p 00015000 fe:01 655396 /lib/x86...
7ffffde13000-7ffffde34000 rw-p 00000000 00:00 0 [stack]
7ffffdffe000-7ffffe000000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscal...
>>> Before the possible buffer over flow >>>
<<< After the possible buffer over flow <<<
Aborted
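These options detect the overflow at run time; the overflow itself can also be avoided in the source. As a minimal sketch, assuming the same DESTLEN as in bof.c, the unchecked strcpy could be replaced by a bounded copy such as the hypothetical copy_arg below, which uses the standard snprintf to limit the write and to report truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DESTLEN 8

/*
 * Copy src into dest without writing more than destlen bytes.
 * Returns 0 if the whole string fit, -1 if it had to be truncated
 * (dest then holds a NUL-terminated prefix of src).
 */
int copy_arg(char *dest, size_t destlen, const char *src)
{
    int needed;

    assert(dest != NULL && src != NULL && destlen > 0);
    needed = snprintf(dest, destlen, "%s", src);
    if (needed < 0 || (size_t)needed >= destlen)
        return -1;
    return 0;
}
```

In bof.c this would let main print the usage message and exit when copy_arg(dest, sizeof dest, argv[1]) returns -1, instead of relying on the run-time abort.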
- http://en.wikipedia.org/wiki/Buffer_overflow_protection
- GCC extension for protecting applications from stack-smashing attacks (IBM Research, August 22, 2005)
- http://www.ipa.go.jp/security/awareness/vendor/programmingv2/contents/c904.html (Japanese: part of a secure programming course by IPA, funded by the Japanese government.)
- “Introduction to GCC by Brian J. Gough” in the gccintro package.