CSAPP-note

历史记录

清除记录

猜你想搜

AcWing热点
App
登录/注册

作者：

Rick@XDU , 2021-12-24 22:08:28 , 所有人可见 , 阅读 240

prologue

CSAPP的笔记。不完整，只记录大概听懂的和新学到的东西。

lecture 2

unsigned和signed作比较的时候，signed会转为unsigned，可能会导致结果错误

int x = -1;
unsigned int y = 0;
if(x > y)  // 结果为真，-1的二进制是全1，转为unsigned，就大于0

lecture 3

int x = -3;
cout << (x >> 1); // (++x >> 1)

结果是-2，是错误的，负数用移位做除法时应该加上一个偏移，加1，这时候结果就对了

lecture 4 floating point

浮点数的表示，IEEE的标准，最高一位是符号位，之后是exp指数，然后是frac尾数。
exp全0为denormalized，非全0为normalized，前者frac有隐藏的前导0，后者frac有隐藏的前导1。
exp=原来的指数e+bias，使得0表示最小的指数，bias与sizeof float/double规模有关。
越靠近0越密集，越往外越稀疏。
浮点数的默认舍入是nearest even，正好在中间的数舍入到最近的偶数上，其他的正常舍入
如果有很大的数和很小的数做运算，那么浮点数运算顺序可能影响结果，可能不符合结合律

lecture 5 machine level programing

汇编代码中以点开头的语句不是指令，是给调试器或链接器看的
intel语法一般前面是dest 后面是src，AT&T语法一般前面是src 后面是dest
movq指令中的q是quad，4字，8字节，一个字是2字节(Intel历史8086，16位2字节)

(%rdx, %rcx)     这个寻址就是 %rdx + %rcx 作为地址
(%rdx, %rcx, 4)  这个寻址就是 %rdx + 4 * %rcx 作为地址
0x80(, %rdx, 2)  这个寻址就是 2 * %rdx + 0x80 作为地址

编译器经常使用lea指令配合移位指令来做乘法，因为乘法指令比较慢

Linux-64传参

从左往右，第一个参数RDI，之后，RSI, RDX, RCX, R8, R9，更多的用栈。
浮点数会使用其他特定的寄存器见lecture-8

lecture 6 machine level programing

registers

CF carry flag (for unsigned)
OF overflow flag (for signed)

两个操作数符号相同但是结果符号变了，就是溢出，两个操作数符号不一样就不会溢出

set系列指令，根据上一条指令修改的标志寄存器来给指定的寄存器赋值0或1，比如

cmpq   %rsi, %rdi   compare指令会修改flags寄存器
setg   %al          setg指令根据flags寄存器来给al赋值
movzbl %al, %eax    move zero extension byte to long, 把al的值赋给%eax，左边的扩展为0。这里就是把左边清零。
ret                 x64下，32位的结果，寄存器高32位都是0，但是2字节的就只影响2字节，这是AMD定的规矩

switch case语句

汇编会生成一个表(jump table使得 $O(1)$ 时间找到对应的case)JTab[]对应所有case，如果case中有负数，会加上一个bias使得最小的case是0
如果若干case间隔很大又很稀疏，那么汇编会变成if-else, 多的话会使用BST数据结构，使得查找时间是 $log$ 级别

lecture 7 machine level programing

ABI标准，application binary interface，针对各个操作系统
r10, r11 caller saved
rbx, r12, r13, r14, rbp callee saved, push to stack before altering, restore before returning

lecture 8 machine level programing

struct元素在内存里都是8字节或4字节对其的，所以可能有对其空间浪费了，使用贪心的声明可以最小化这种浪费，即先声明大的
浮点数寄存器传参，使用XMM寄存器，%xmm0, %xmm1...，结果保存在%xmm0
所有的XMM寄存器都是caller saved
MMX（Multi Media eXtensions）多媒体扩展指令集
SIMD（Single Instruction Multiple Data，单指令多数据）
SSE is short for Streaming SIMD Extension
Advanced Vector Extensions (Intel AVX) AVX2 AVX512 浮点性能翻倍，整数性能增加33%
引入新寄存器，XMM YMM AVX512也引入新寄存器

lecture 9 machine level programing

x64 linux memeroy 43bit physical, 48bit virtual
canary，64位 %fs:0x28 其最低字节是00，off-by-one也没事
32位的是从%gs:0x14这里取canary，这个都是TLS结构体中的
canary最后一个字节是0，防止使用类似printf的函数把canary泄露出来，打印的时候遇到canary最低字节的00就停止了，后面的内容不会被打印出来
union 用法类似struct，struct会给所有成员分配空间，但是union成员，分配的空间是最大成员的空间，所有人共用这个空间，修改一个成员的值，内存就变了，其他成员也变了。实际使用很少，一个使用场景是需要两个类型中的一个，有的需要类型A，有的需要类型B

lecture 10 program optimization

memory aliasing、function call are optimization blocker
循环里面b[i] += a[i]类似的编译器不会优化，因为不确定有没有内存的重叠，b修改有可能影响到a，所以汇编代码就会循环读内存，加，写内存，引入局部变量后，就可以省略掉循环写内存的步骤。因为还是需要循环读内存所以优化并不明显。

lecture 11 the memory hierarchy

hard disk drive, platters盘片, arm, spindle主轴, track磁道, sectors扇区 separated by gaps 现在的磁盘，靠外的磁道有更多的扇区，减少空间浪费
现在IO使用PCI express，之前使用PCI总线，PCI是广播的，所有设备共用BUS，所有设备都能看到BUS上的数据。PCIE是点对点的，由某种开关仲裁
CPU访问磁盘时，发送三元组，(command, logical block number, memory address)；磁盘控制器将逻辑block number转为物理扇区, 读取数据通过BUS送给内存，完成之后DMA产生中断通知CPU
locality程序的局部性，局部性越好，程序性能越好

lecture 12 cache memory

计组cache部分，直接映射，全相联，组相联

如何提高程序的局部性
1. focus on the inner loops, where bulk of computations and memory accesses occur
2. try to maximize spatial locality by reading data objects with sequentially with stride 1
3. try to maximize temporal locality by using a data object as often as possible once it’s read from memory

lecture 13 linking

BSS段， ELF文件在磁盘上，未初始化的全局变量是不占磁盘空间的，但是加载到内存里要占空间
编译器使用的是偏移，不知道真实的地址是多少
需要linker重定向的东西编译器使用0占位，并留下一个relocation entry
call的时候linker填的是一个偏移，RIP/EIP加上这个偏移就是实际要call的地址，PC-relative addressing
ELF file 结构看PPT
几种不同的symbol， symbol分强弱，函数名、初始化的全局变量是强的，未初始化的是弱的
linker不允许重名的强symbol，一个 strong 多个weak, 选strong, 多个weak随机选一个
函数内声明static的数据不在栈上，在data段

why linkers? one reason is that

改变一个，只需要重新编译一个文件，然后重新链接即可，不用重新编译其他的

what do linkers do?

symbol resolution
1.programs define and reference symbols (global variables and functions)
2.symbol definitions are stored in object file (by assembler) in symbol table
symbol table is an array of structs, each entry includes name, size and location of symbol
3.during symbol resolution step, the linker associates each symbol reference with exactly one symbol definition
relocation
1.merges separate code and data sections into single section
2.relocate symbols from their relative locations in the .o files to their final absolute memory locations
in the executable
3.updates all references to these symbols to reflect their new positions

Linux内存布局，简化的

kernel
user stack, 向下增长

memory-mapped region for shared libraries

run-time heap                           
data segment(.data .bss)          
read-only code segment                           0x400000 start
unused                                           0

interposition 技术，类似钩子，可以用于调试

lecture 14 exceptional control flow

kernel是永久驻留在内存中的部分
故意的异常就是陷阱，最常见的陷阱就是系统调用，控制权转交给内核
linux system call, always check return value unless return void

程序员视角可以把进程状态看作有3种

running, either executing or waiting to be executed and will eventually be scheduled
stopped, suspended and will not be scheduled until further notice
terminated, stopped permanently

创建进程

int fork(void)

returns 0 to the child process, child’s pid to parent process
child is almost identical to parent:
1. child get an identical but separate copy of the parent’s virtual address space
2. child get copies of the parent’s open file descriptors

pid = fork();
if(!pid){//child;}
//parent             谁先执行是不确定的，内核说了算

子进程完成之后变成僵尸进程，显示defunct，父进程可以使用wait或waitpid回收(reap)子进程
父进程不回收的话，pid为1的init进程就会回收孤儿进程。
长时间运行的程序需要回收子进程，不然僵尸进程太多，占用太多内存，造成内存泄漏
常见用法，fork子进程，然后子进程去execve一个程序

lecture 15 exceptional control flow: signals

init进程是第一个启动的进程，pid=1

signals

a signal is a small message that notifies a process that an event of some type has occurred in the system
similar to exceptions and interrupts
sent from the kernel (sometimes at the request of another process) to a process
signal type is identified by small integer ID’s
the only information in a signal is its ID and the fact that it arrived

ID	Name	Default Action	Corresponding Event
2	SIGINT	Terminate	user typed ctrl+c
9	SIGKILL	Terminate	kill program (cannot override or ignore)
11	SIGSEGV	Terminate	segmentation violation
14	SIGALRM	Terminate	timer signal
17	SIGCHLD	Ignore	child stopped or terminate

sending a signal

kernel sends a signal to a destination process by updating some state in the context (进程空间上下文) of the destination process

receiving a signal

a destination process receives a signal when it is forced by the kernel to react in some way to the delivery of the signal
some possible ways to react:
- ignore the signal
- terminate the process (with optional core dump)
- catch the signal by executing a user-level function called signal handler

pending and blocked signals

a signal is pending if sent but not yet received
- there can be at most 1 pending signal of any type
- signals are not queued, if a process has a pending signal of type k, the subsequent signals of type k that are sent to that process are discarded
a process can block the receipt of certain signals
- blocked signals can be delivered, but will not be received until the signal is unblocked
a pending signal is received at most once

pending/blocked bits

kernel maintains pending and blocked bit vectors in the context of each process
- pending: represents the set of pending signals
  - kernel sets bit k in pending when a signal of type k is delivered
  - kernel clears bit k in pending when a signal of type k is received
- blocked: represents the set of blocked signals
  - can be set and cleared by using the sigprocmask function
  - also referred to as the signal mask

sending signals: process groups

every process belongs to exactly one process group
getpgrp()返回当前进程的pgid
setpgid()change pgid of a process

sending signals with /bin/kill program

/bin/killprogram sends arbitrary signal to a process or process group
kill这个程序并不一定总是杀死某个进程
examples:
- /bin/kill -9 24818send SIGKILL to process 24818
- /bin/kill -9 -24817send SIGKILL to every process in process group 24817

sending signals from the keyboard

typing ctrl-c (ctrl-z) causes the kernel to send a SIGINT (SIGTSTP) to every job in the foreground process group
- SIGTSTP, default action is to stop (suspend) each process

receiving signals

suppose kernel is returning from an exception handler and is ready to pass control to process p
kernel computes pnb = pending & ~blocked
- the set of pending nonblocked signals for process p
if(pnb == 0)
- pass control to next instrution in the logical flow for process p
else
- choose least nonzero bit k in pnb and force process p to receive signal k
- the receipt of the signal triggers some action by p
- repeat for all nonzero bit in pnb
- pass control to next instrution in the logical flow for process p

default action

每个信号都有默认的动作，这个默认的动作可以使用下面这个函数修改

handler_t *signal(int signum, handler_t *handler)

参数的handler可以是SIG_IGN(ignore)或SIG_DFL(恢复default)内部自带的，也可以是自己写的函数的地址

blocking & unblocking signals

implicit blocking mechanism, kernel blocks any pending signals of type currrently being handled
explicit blocking and unblocking mechanism, sigprocmask()function
supportting functions, sigemptyset(), sigfillset(), sigaddset(), sigdelset()

safe signal handling

handlers are tricky because they are concurrent with main program and share the same global data structures, shared data structures can become corrupted. below are some guidelines.
keep your handlers as simple as possible
call only async-signal-safe functions in your handlers
save and restore errno on entry and exit
protect accesses to shared data structures by temporarily blocking all signals
declare global variables as volatile, to prevent compiler from storing them in a register
declare global flags as volatile sig_atomic_t

后面听不懂了。

lecture 16 system level IO

unix IO overview

a linux file is a sequence of bytes
all IO devices are represented as files
even the kernel is represented as a fiel
elegant mapping of files to devices allows kernel to export simple interface called Unix IO:
- opening and closing files, open() and close()
- reading and writing a file, read() and write()
- changing the current file position (seek). file position indicates next offset into file to read or write

file types

each file has a type indicating its role in the system
- regular file, contains arbitrary data
- directory, index for a related group of files
- socket, for communicating with a process on another machine
other file types beyond our scope
- named pipes
- symbolic links
- character and block devices

regular files

contains arbitrary data
applications often distinguish between text files and binary files, but the kernel doesn’t know the difference
newline char, for linux and Mac OS, \n(0xa), line feed, LF; windows and Internet protocols, \r\n(0xd 0xa), carriage return, CR, followed by line feed, LF

directories

directories consists of an array of links, each link maps a filename to a file
each directory contains at least two entries, . is a link to itself, and .. is a link to parent directory

opening & closing files

fd = open(path, format), format is predifined, bitwise
retval = close(fd), 关闭一个已经关掉的文件就会出错，返回值小于0

reading & writing files

nbytes = read(fd, buf, sizeof buf)
nbytes = write(fd, buf, sizeof buf)这两个系统调用很费时

file descriptor table for each process
open file table, shared

IO redirection

how does a shell implement IO redirection
by calling dup2(oldfd, newfd), dup2(4, 1)标准输出的重定向，fd table中fd4会被复制到fd1中实现重定向

standard IO

C library contains a collection of higher-level standard io functions
fopen;fclose;fread;fwrite;fgets;fputs;fscanf;fprintf

buffering in std io

std io functions use bufferd io
- 如果多次调用printf("h");printf("e");....fflush(stdout);，最终只会有1次系统调用。
- 追踪系统调用可以用strace命令

几个IO之间

最基本的IO用于底层，比如处理信号
stdio适合文件和终端，不适合网络，网络需要自己写IO。老教授的合著者写了个RIO

lecture 17 virtual memory

page table

a page table is an array of page table entries (PTEs) that maps virtual pages to physical pages
each process has a page table
地址转换如OS课程所讲

lecture 18 virtual memory

这节课视频不是PPT录屏，看不清，听不懂，跳过

lecture 19 & 20 dynamic memory allocation

malloc package

malloc申请的空间在32位系统中8字节对齐，64位系统中16字节对齐

教授更多讲的是如何管理内存，就是自己写malloc和free，并没有直接讲glibc malloc。glibc malloc可以看sploitfun写的，prologue提到的。

c pointer declarations

int (*(*f())[13])() 
David O'Hallaron: if you ever use anything like that in your code, shame on you!

lecture 21 22 network programming

之前写的分享c/c++_socket

lecture 23 concurrent programming

approaches for writing concurrent servers

process-based
- kernel automatically interleaves(交织) multiple logical flows
- each flow has its own private address space
event-based
- programmer manually interleaves multiple logical flows
- all flows share the same address space
- uses technique called I/O multiplexing
thread-based
- kernel automatically interleaves multiple logical flows
- each flow shares the same address space
- hybrid of process-based and event-based

process-based concurrent echo server

int main(int argc, char **argv)
{
    int listenfd, connfd;
    socklen_t clientlen;
    struct sockaddr_storage clientaddr;

    Signal(SIGCHLD, sigchld_handler);
    listenfd = Open_listenfd(argv[1]);
    while(1)
    {
        clientlen = sizeof(struct sockaddr_storage);
        connfd = Accept(listenfd, (SA *)&clientaddr, &clientlen);
        if(Fork() == 0)
        {
            Close(listenfd);  // child closes its listening socket
            echo(connfd);     //child services client
            Close(connfd);    //child closes connection with client
            exit(0);   
        }
        Close(connfd);     // parent closes connected socket
    }
}

void sigchld_handler(int sig)
{
    while(waitpid(-1, 0, WNOHANG) > 0);
}

Issues with process-based servers

listening server process must reap zombie children
- to avoid fatal memory leak
parent process must close its copy of connfd
- kernel keeps reference count for each socket/open file
- after fork, refcnt(connfd) = 2
- connection will not be closed until refcnt(connfd) = 0

event-based server

server maintains set of active connections
- array of connfd’s
repeat:
- determine which descriptors (connfd’s or listenfd) have pending inputs
  - e.g., using select or epoll functions
  - arrival of pending input is an event
- if listenfd has input, then accept connection
  - and add new connfd to array
- service all connfd’s with pending inputs

thread-based server

very similar to process-based, but using threads instead of processes

A process with multiple threads

multiple threads can be associated with a process
each thread has its own logical control flow
each thread shares the same code, data and kernel context
each thread has its own stack for local variabls, but not protected from other threads
each thread has its own thread id(tid)

logical view of threads

Threads associated with process form a pool of peers, unlike process which form a tree hierarchy

threads vs. processes

how threads and processes are similar

each has its own logical control flow
each can run concurrently with others (possibly on different cores)
each is context switched

how threads and processes are different

threads share all code and data (except local stack), processes typically do not
threads are somewhat less expensive than processes. process control (creating and reaping) twice as expensive as thread control.
- linux numbers: ~20K cycles to create and reap a process, ~10K cycles (or less) to create and reap a thread

posix(Portable Operating System Interface) threads (pthreads) interface

pthreads: standard interface for ~60 functions that manipulate threads from C programs
- creating and reaping threads. pthread_create(), pthread_join()
- determining thread ID, pthread_self()
- terminating threads, pthread_cancle(), pthread_exit(), exit(), RET
- synchronizing access to shared variables, pthread_mutex_init(), pthread_mutex_lock(), pthread_mutex_unlock
- detached mode, default state is joinable, joinable thread can be reaped and killed by other threads, but detached thread runs independently of other threads and got reaped automatically by kernel when it terminates. pthread_detach()

pthread hello world program

void *thread(void* vargp) //thread routine
{
    puts("hello world.\n");
    return NULL;
}

int main()
{
    pthread_t tid;
    Pthread_create(&tid, NULL, thread, NULL);
    //args: thread_ID, thread_attributes, thread_routine, thread arguments(void*p)
    Pthread_join(tid, NULL);
    //args: thread_ID, return value(void**p)
    exit(0);
}

pros and cons of thread-based designs

pros: easy to share data structures between threads
pros: threads are more efficient than processes
cons: unintentional sharing can introduce subtle and hard-to-reproduce errors. the ease with which data can be shared is both the greatest strength and the greatest weakness of threads.

summary: approaches to concurrency

process based
- hard to share resources: easy to avoid unintended sharing
- high overhead(开销) in adding / removing clients
event based
- tedious and low level
- total control over scheduling
- very low overhead
- can not create as fine grained a level of concurrency
- does not make use of multi-core
thread based
- easy to share resources, perhaps too easy
- medium overhead
- not much control over scheduling policies
- difficult to debug, event orderings not repeatable

lecture 24 sychronization: basics

shared variables in threaded C programs

def: a variable x is shared if and only if multiple threads reference at least one instance of x.

requires answers to the following questions to know “shared”

what is the memory model for threads
how are instances of variables mapped to memory
how many threads might reference each of these instances

threads memory model

conceptual model:
- multiple threads run within the context of a single process
- each thread has its own separate thread context
  - thread ID, stack, stack pointer, PC, condition codes, general purpose registers
- all threads share the remaining process context
  - code, data, heap, and shared library segments of the process virtual address space
  - open files and installed handlers
operationally, this model is not strictly enforced:
- register values are truly separate and protected
- but any thread can read and write the stack of any other thread

mapping variable instances to memory

global variables, virtual memory contains exactly one instance of any global variable
local variables, each thread stack contains one instance of each local variable
local static variables, virtual memory contains exactly one instance of any local static variable

0 评论

App 内打开