prologue
CSAPP的笔记。不完整,只记录大概听懂的和新学到的东西。
lecture 2
unsigned
和signed
作比较的时候,signed
会转为unsigned
,可能会导致结果错误
int x = -1;
unsigned int y = 0;
if(x > y) // 结果为真,-1的二进制是全1,转为unsigned,就大于0
lecture 3
int x = -3;
cout << (x >> 1); // (++x >> 1)
结果是-2
,是错误的,负数用移位做除法时应该加上一个偏移,加1
,这时候结果就对了
lecture 4 floating point
- 浮点数的表示,
IEEE
的标准,最高一位是符号位,之后是exp
指数,然后是frac
尾数。 exp
全0
为denormalized
,非全0
为normalized
,前者frac
有隐藏的前导0
,后者frac
有隐藏的前导1
。exp=原来的指数e+bias
,使得0
表示最小的指数,bias
与sizeof float/double
规模有关。- 越靠近
0
越密集,越往外越稀疏。 - 浮点数的默认舍入是
nearest even
,正好在中间的数舍入到最近的偶数上,其他的正常舍入 - 如果有很大的数和很小的数做运算,那么浮点数运算顺序可能影响结果,可能不符合结合律
lecture 5 machine level programing
- 汇编代码中以点开头的语句不是指令,是给调试器或链接器看的
intel
语法一般前面是dest
后面是src
,AT&T
语法一般前面是src
后面是dest
movq
指令中的q
是quad
,4字,8字节,一个字是2字节(Intel
历史8086
,16位2字节)
(%rdx, %rcx) 这个寻址就是 %rdx + %rcx 作为地址
(%rdx, %rcx, 4) 这个寻址就是 %rdx + 4 * %rcx 作为地址
0x80(, %rdx, 2) 这个寻址就是 2 * %rdx + 0x80 作为地址
- 编译器经常使用
lea
指令配合移位指令来做乘法,因为乘法指令比较慢
Linux-64传参
- 从左往右,第一个参数
RDI
,之后,RSI, RDX, RCX, R8, R9
,更多的用栈。 - 浮点数会使用其他特定的寄存器见lecture-8
lecture 6 machine level programing
registers
CF carry flag (for unsigned)
OF overflow flag (for signed)
两个操作数符号相同但是结果符号变了,就是溢出,两个操作数符号不一样就不会溢出
set
系列指令,根据上一条指令修改的标志寄存器来给指定的寄存器赋值0或1,比如
cmpq %rsi, %rdi compare指令会修改flags寄存器
setg %al setg指令根据flags寄存器来给al赋值
movzbl %al, %eax move zero extension byte to long, 把al的值赋给%eax,左边的扩展为0。这里就是把左边清零。
ret x64下,32位的结果,寄存器高32位都是0,但是2字节的就只影响2字节,这是AMD定的规矩
switch case语句
- 汇编会生成一个表(
jump table
使得 $O(1)$ 时间找到对应的case
)JTab[]
对应所有case
,如果case
中有负数,会加上一个bias使得最小的case是0 - 如果若干
case
间隔很大又很稀疏,那么汇编会变成if-else
, 多的话会使用BST数据结构,使得查找时间是 $log$ 级别
lecture 7 machine level programing
- ABI标准,application binary interface,针对各个操作系统
- r10, r11 caller saved
- rbx, r12, r13, r14, rbp callee saved, push to stack before altering, restore before returning
lecture 8 machine level programing
struct
元素在内存里都是8字节或4字节对其的,所以可能有对其空间浪费了,使用贪心的声明可以最小化这种浪费,即先声明大的- 浮点数寄存器传参,使用
XMM
寄存器,%xmm0, %xmm1...
,结果保存在%xmm0
- 所有的
XMM
寄存器都是caller saved - MMX(Multi Media eXtensions)多媒体扩展指令集
- SIMD(Single Instruction Multiple Data,单指令多数据)
- SSE is short for Streaming SIMD Extension
- Advanced Vector Extensions (Intel AVX) AVX2 AVX512 浮点性能翻倍,整数性能增加33%
- 引入新寄存器,XMM YMM AVX512也引入新寄存器
lecture 9 machine level programing
- x64 linux memeroy 43bit physical, 48bit virtual
- canary,64位
%fs:0x28
其最低字节是00,off-by-one也没事 - 32位的是从
%gs:0x14
这里取canary,这个都是TLS结构体中的 - canary最后一个字节是0,防止使用类似
printf
的函数把canary泄露出来,打印的时候遇到canary最低字节的00就停止了,后面的内容不会被打印出来 union
用法类似struct
,struct
会给所有成员分配空间,但是union
成员,分配的空间是最大成员的空间,所有人共用这个空间,修改一个成员的值,内存就变了,其他成员也变了。实际使用很少,一个使用场景是需要两个类型中的一个,有的需要类型A,有的需要类型B
lecture 10 program optimization
- memory aliasing、function call are optimization blocker
- 循环里面
b[i] += a[i]
类似的编译器不会优化,因为不确定有没有内存的重叠,b
修改有可能影响到a
,所以汇编代码就会循环读内存,加,写内存,引入局部变量后,就可以省略掉循环写内存的步骤。因为还是需要循环读内存所以优化并不明显。
lecture 11 the memory hierarchy
- hard disk drive, platters盘片, arm, spindle主轴, track磁道, sectors扇区 separated by gaps 现在的磁盘,靠外的磁道有更多的扇区,减少空间浪费
- 现在
IO
使用PCI express
,之前使用PCI
总线,PCI
是广播的,所有设备共用BUS
,所有设备都能看到BUS
上的数据。PCIE
是点对点的,由某种开关仲裁 - CPU访问磁盘时,发送三元组,(command, logical block number, memory address);磁盘控制器将逻辑block number转为物理扇区, 读取数据通过BUS送给内存,完成之后DMA产生中断通知CPU
- locality程序的局部性,局部性越好,程序性能越好
lecture 12 cache memory
- 计组
cache
部分,直接映射,全相联,组相联
如何提高程序的局部性
1. focus on the inner loops, where bulk of computations and memory accesses occur
2. try to maximize spatial locality by reading data objects with sequentially with stride 1
3. try to maximize temporal locality by using a data object as often as possible once it’s read from memory
lecture 13 linking
- BSS段, ELF文件在磁盘上,未初始化的全局变量是不占磁盘空间的,但是加载到内存里要占空间
- 编译器使用的是偏移,不知道真实的地址是多少
- 需要linker重定向的东西编译器使用0占位,并留下一个relocation entry
- call的时候linker填的是一个偏移,RIP/EIP加上这个偏移就是实际要call的地址,PC-relative addressing
- ELF file 结构看PPT
- 几种不同的symbol, symbol分强弱,函数名、初始化的全局变量是强的,未初始化的是弱的
- linker不允许重名的强symbol, 一个 strong 多个weak, 选strong, 多个weak随机选一个
- 函数内声明
static
的数据不在栈上,在data
段
why linkers? one reason is that
- 改变一个,只需要重新编译一个文件,然后重新链接即可,不用重新编译其他的
what do linkers do?
- symbol resolution
1.programs define and reference symbols (global variables and functions)
2.symbol definitions are stored in object file (by assembler) in symbol table
symbol table is an array of structs, each entry includes name, size and location of symbol
3.during symbol resolution step, the linker associates each symbol reference with exactly one symbol definition - relocation
1.merges separate code and data sections into single section
2.relocate symbols from their relative locations in the .o files to their final absolute memory locations
in the executable
3.updates all references to these symbols to reflect their new positions
Linux内存布局,简化的
kernel
user stack, 向下增长
memory-mapped region for shared libraries
run-time heap
data segment(.data .bss)
read-only code segment 0x400000 start
unused 0
- interposition 技术,类似钩子,可以用于调试
lecture 14 exceptional control flow
- kernel是永久驻留在内存中的部分
- 故意的异常就是陷阱,最常见的陷阱就是系统调用,控制权转交给内核
- linux system call, always check return value unless return void
程序员视角可以把进程状态看作有3种
- running, either executing or waiting to be executed and will eventually be scheduled
- stopped, suspended and will not be scheduled until further notice
- terminated, stopped permanently
创建进程
int fork(void)
- returns 0 to the child process, child’s pid to parent process
- child is almost identical to parent:
- child get an identical but separate copy of the parent’s virtual address space
- child get copies of the parent’s open file descriptors
pid = fork();
if(!pid){//child;}
//parent 谁先执行是不确定的,内核说了算
- 子进程完成之后变成僵尸进程,显示
defunct
,父进程可以使用wait
或waitpid
回收(reap)子进程 - 父进程不回收的话,
pid
为1的init
进程就会回收孤儿进程。 - 长时间运行的程序需要回收子进程,不然僵尸进程太多,占用太多内存,造成内存泄漏
- 常见用法,
fork
子进程,然后子进程去execve
一个程序
lecture 15 exceptional control flow: signals
init
进程是第一个启动的进程,pid=1
signals
- a signal is a small message that notifies a process that an event of some type has occurred in the system
- similar to exceptions and interrupts
- sent from the kernel (sometimes at the request of another process) to a process
- signal type is identified by small integer ID’s
- the only information in a signal is its ID and the fact that it arrived
ID | Name | Default Action | Corresponding Event |
---|---|---|---|
2 | SIGINT | Terminate | user typed ctrl+c |
9 | SIGKILL | Terminate | kill program (cannot override or ignore) |
11 | SIGSEGV | Terminate | segmentation violation |
14 | SIGALRM | Terminate | timer signal |
17 | SIGCHLD | Ignore | child stopped or terminate |
sending a signal
kernel sends a signal to a destination process by updating some state in the context (进程空间上下文) of the destination process
receiving a signal
- a destination process receives a signal when it is forced by the kernel to react in some way to the delivery of the signal
- some possible ways to react:
- ignore the signal
- terminate the process (with optional core dump)
- catch the signal by executing a user-level function called
signal handler
pending and blocked signals
- a signal is pending if sent but not yet received
- there can be at most 1 pending signal of any type
- signals are not queued, if a process has a pending signal of type k, the subsequent signals of type k that are sent to that process are discarded
- a process can block the receipt of certain signals
- blocked signals can be delivered, but will not be received until the signal is unblocked
- a pending signal is received at most once
pending/blocked bits
- kernel maintains pending and blocked bit vectors in the context of each process
- pending: represents the set of pending signals
- kernel sets bit k in pending when a signal of type k is delivered
- kernel clears bit k in pending when a signal of type k is received
- blocked: represents the set of blocked signals
- can be set and cleared by using the
sigprocmask
function - also referred to as the
signal mask
- can be set and cleared by using the
- pending: represents the set of pending signals
sending signals: process groups
- every process belongs to exactly one process group
getpgrp()
返回当前进程的pgidsetpgid()
change pgid of a process
sending signals with /bin/kill program
/bin/kill
program sends arbitrary signal to a process or process groupkill
这个程序并不一定总是杀死某个进程- examples:
/bin/kill -9 24818
send SIGKILL to process 24818/bin/kill -9 -24817
send SIGKILL to every process in process group 24817
sending signals from the keyboard
- typing ctrl-c (ctrl-z) causes the kernel to send a SIGINT (SIGTSTP) to every job in the foreground process group
- SIGTSTP, default action is to stop (suspend) each process
receiving signals
- suppose kernel is returning from an exception handler and is ready to pass control to process p
- kernel computes
pnb = pending & ~blocked
- the set of pending nonblocked signals for process p
if(pnb == 0)
- pass control to next instrution in the logical flow for process p
else
- choose least nonzero bit k in
pnb
and force process p to receive signal k - the receipt of the signal triggers some action by p
- repeat for all nonzero bit in
pnb
- pass control to next instrution in the logical flow for process p
- choose least nonzero bit k in
default action
- 每个信号都有默认的动作,这个默认的动作可以使用下面这个函数修改
handler_t *signal(int signum, handler_t *handler)
- 参数的
handler
可以是SIG_IGN
(ignore)或SIG_DFL
(恢复default)内部自带的,也可以是自己写的函数的地址
blocking & unblocking signals
- implicit blocking mechanism, kernel blocks any pending signals of type currrently being handled
- explicit blocking and unblocking mechanism,
sigprocmask()
function - supportting functions,
sigemptyset()
,sigfillset()
,sigaddset()
,sigdelset()
safe signal handling
- handlers are tricky because they are concurrent with main program and share the same global data structures, shared data structures can become corrupted. below are some guidelines.
- keep your handlers as simple as possible
- call only async-signal-safe functions in your handlers
- save and restore errno on entry and exit
- protect accesses to shared data structures by temporarily blocking all signals
- declare global variables as volatile, to prevent compiler from storing them in a register
- declare global flags as volatile sig_atomic_t
后面听不懂了。
lecture 16 system level IO
unix IO overview
- a linux file is a sequence of bytes
- all IO devices are represented as files
- even the kernel is represented as a fiel
- elegant mapping of files to devices allows kernel to export simple interface called Unix IO:
- opening and closing files,
open()
andclose()
- reading and writing a file,
read()
andwrite()
- changing the current file position (seek). file position indicates next offset into file to read or write
- opening and closing files,
file types
- each file has a type indicating its role in the system
- regular file, contains arbitrary data
- directory, index for a related group of files
- socket, for communicating with a process on another machine
- other file types beyond our scope
- named pipes
- symbolic links
- character and block devices
regular files
- contains arbitrary data
- applications often distinguish between text files and binary files, but the kernel doesn’t know the difference
- newline char, for linux and Mac OS,
\n(0xa)
, line feed, LF; windows and Internet protocols,\r\n(0xd 0xa)
, carriage return, CR, followed by line feed, LF
directories
- directories consists of an array of links, each link maps a filename to a file
- each directory contains at least two entries,
.
is a link to itself, and..
is a link to parent directory
opening & closing files
fd = open(path, format)
, format is predifined, bitwiseretval = close(fd)
, 关闭一个已经关掉的文件就会出错,返回值小于0
reading & writing files
nbytes = read(fd, buf, sizeof buf)
nbytes = write(fd, buf, sizeof buf)
这两个系统调用很费时
file descriptor table for each process
open file table, shared
IO redirection
- how does a shell implement IO redirection
- by calling
dup2(oldfd, newfd)
,dup2(4, 1)
标准输出的重定向,fd table中fd4会被复制到fd1中实现重定向
standard IO
- C library contains a collection of higher-level standard io functions
fopen;fclose;fread;fwrite;fgets;fputs;fscanf;fprintf
buffering in std io
- std io functions use bufferd io
- 如果多次调用
printf("h");printf("e");....fflush(stdout);
,最终只会有1次系统调用。 - 追踪系统调用可以用
strace
命令
- 如果多次调用
几个IO之间
- 最基本的IO用于底层,比如处理信号
- stdio适合文件和终端,不适合网络,网络需要自己写IO。老教授的合著者写了个RIO
lecture 17 virtual memory
page table
- a page table is an array of page table entries (PTEs) that maps virtual pages to physical pages
- each process has a page table
- 地址转换如OS课程所讲
lecture 18 virtual memory
这节课视频不是PPT录屏,看不清,听不懂,跳过
lecture 19 & 20 dynamic memory allocation
malloc package
malloc
申请的空间在32位系统中8字节对齐,64位系统中16字节对齐
教授更多讲的是如何管理内存,就是自己写malloc
和free
,并没有直接讲glibc malloc
。glibc malloc
可以看sploitfun写的,prologue提到的。
c pointer declarations
int (*(*f())[13])()
David O'Hallaron: if you ever use anything like that in your code, shame on you!
lecture 21 22 network programming
之前写的分享c/c++_socket
lecture 23 concurrent programming
approaches for writing concurrent servers
- process-based
- kernel automatically interleaves(交织) multiple logical flows
- each flow has its own private address space
- event-based
- programmer manually interleaves multiple logical flows
- all flows share the same address space
- uses technique called
I/O multiplexing
- thread-based
- kernel automatically interleaves multiple logical flows
- each flow shares the same address space
- hybrid of process-based and event-based
process-based concurrent echo server
int main(int argc, char **argv)
{
int listenfd, connfd;
socklen_t clientlen;
struct sockaddr_storage clientaddr;
Signal(SIGCHLD, sigchld_handler);
listenfd = Open_listenfd(argv[1]);
while(1)
{
clientlen = sizeof(struct sockaddr_storage);
connfd = Accept(listenfd, (SA *)&clientaddr, &clientlen);
if(Fork() == 0)
{
Close(listenfd); // child closes its listening socket
echo(connfd); //child services client
Close(connfd); //child closes connection with client
exit(0);
}
Close(connfd); // parent closes connected socket
}
}
void sigchld_handler(int sig)
{
while(waitpid(-1, 0, WNOHANG) > 0);
}
Issues with process-based servers
- listening server process must reap zombie children
- to avoid fatal memory leak
- parent process must close its copy of connfd
- kernel keeps reference count for each socket/open file
- after fork, refcnt(connfd) = 2
- connection will not be closed until refcnt(connfd) = 0
event-based server
- server maintains set of active connections
- array of connfd’s
- repeat:
- determine which descriptors (connfd’s or listenfd) have pending inputs
- e.g., using
select
orepoll
functions - arrival of pending input is an event
- e.g., using
- if listenfd has input, then
accept
connection- and add new connfd to array
- service all connfd’s with pending inputs
- determine which descriptors (connfd’s or listenfd) have pending inputs
thread-based server
- very similar to process-based, but using threads instead of processes
A process with multiple threads
- multiple threads can be associated with a process
- each thread has its own logical control flow
- each thread shares the same code, data and kernel context
- each thread has its own stack for local variabls, but not protected from other threads
- each thread has its own thread id(tid)
logical view of threads
Threads associated with process form a pool of peers, unlike process which form a tree hierarchy
threads vs. processes
how threads and processes are similar
- each has its own logical control flow
- each can run concurrently with others (possibly on different cores)
- each is context switched
how threads and processes are different
- threads share all code and data (except local stack), processes typically do not
- threads are somewhat less expensive than processes. process control (creating and reaping) twice as expensive as thread control.
- linux numbers: ~20K cycles to create and reap a process, ~10K cycles (or less) to create and reap a thread
posix(Portable Operating System Interface) threads (pthreads) interface
- pthreads: standard interface for ~60 functions that manipulate threads from C programs
- creating and reaping threads.
pthread_create(), pthread_join()
- determining thread ID,
pthread_self()
- terminating threads,
pthread_cancle(), pthread_exit(), exit(), RET
- synchronizing access to shared variables,
pthread_mutex_init(), pthread_mutex_lock(), pthread_mutex_unlock
- detached mode, default state is joinable, joinable thread can be reaped and killed by other threads, but detached thread runs independently of other threads and got reaped automatically by kernel when it terminates.
pthread_detach()
- creating and reaping threads.
pthread hello world program
void *thread(void* vargp) //thread routine
{
puts("hello world.\n");
return NULL;
}
int main()
{
pthread_t tid;
Pthread_create(&tid, NULL, thread, NULL);
//args: thread_ID, thread_attributes, thread_routine, thread arguments(void*p)
Pthread_join(tid, NULL);
//args: thread_ID, return value(void**p)
exit(0);
}
pros and cons of thread-based designs
- pros: easy to share data structures between threads
- pros: threads are more efficient than processes
- cons: unintentional sharing can introduce subtle and hard-to-reproduce errors. the ease with which data can be shared is both the greatest strength and the greatest weakness of threads.
summary: approaches to concurrency
- process based
- hard to share resources: easy to avoid unintended sharing
- high overhead(开销) in adding / removing clients
- event based
- tedious and low level
- total control over scheduling
- very low overhead
- can not create as fine grained a level of concurrency
- does not make use of multi-core
- thread based
- easy to share resources, perhaps too easy
- medium overhead
- not much control over scheduling policies
- difficult to debug, event orderings not repeatable
lecture 24 sychronization: basics
shared variables in threaded C programs
def: a variable x is shared if and only if multiple threads reference at least one instance of x.
requires answers to the following questions to know “shared”
- what is the memory model for threads
- how are instances of variables mapped to memory
- how many threads might reference each of these instances
threads memory model
- conceptual model:
- multiple threads run within the context of a single process
- each thread has its own separate thread context
- thread ID, stack, stack pointer, PC, condition codes, general purpose registers
- all threads share the remaining process context
- code, data, heap, and shared library segments of the process virtual address space
- open files and installed handlers
- operationally, this model is not strictly enforced:
- register values are truly separate and protected
- but any thread can read and write the stack of any other thread
mapping variable instances to memory
- global variables, virtual memory contains exactly one instance of any global variable
- local variables, each thread stack contains one instance of each local variable
- local static variables, virtual memory contains exactly one instance of any local static variable