Basic Concepts

Data flows on CFG

In CFG, the application-specific Data(Abstraction) will flows through the Nodes(BBs/Statement), Edges(control flow), CFG(a program).

Most static analyses are may analysis:

  • May Analysis:

    Output maybe true. (over-approximation) – 近似值也可以

  • Must Analysis:

    Output must true. (Under-approximation) – 不想要近似值

Preliminaries of DFA

DFA(here): Data Flow Analysis

  • Each red point represents as a program states.

  • Each green filed for data-flow values.

  • domain: 不同的 DFA 有不同的 domain 值,domain 包含了所有该应用可能出现的值

    domain values 有可能是 variables, expressions …

Forwards and Backwards

  • Forward: yellow part.
  • Backward: red part.
  • Meet: BBs 间的运算方式

Understand the three data flow analyses

Reaching Definitions

  • Generates a definition and kills all the other definitions including variable v
  • Forwards direction
  • Domain: definitions
  • Meet: $\bigcup$
  • Initialization: For every BBs(including entry) initialized as NULL set.
  • Stop: any OUT sets wouldn’t change.

An example:

需要注意:

  1. Meet 操作是 union 还是 intersection
  2. IN[s] 的 s 是否有几个 BBs,看箭头数量即可
  3. IN 不变,OUT 也不变,存在 monotonicity(单调性),可以到达 fixed point.

Live Variables

  • Domain:Variables
  • OUT:后续已知 IN 的并集
  • $use_B$ :表示在 B 中使用了未被 redefine 的 vars
  • $def_B$:表示 vars 在 B 中被重定义了,$IN[B]$ 的对应 var 值为 dead value,需要被 kill 掉

注意:

  1. Initialization: for each BBs including exit initialed as IN[Block] = $\varnothing$

  2. Meet: $\bigcup$

  3. Direction: Backwards

An example:

Tips:

  • 最好把 IN 和 OUT 都表示出来,这样不容易混
  • Domain 是 variables,容易与 Reaching Definitions 的 definitions 混淆
  • 应用:寄存器值替换,如果一个值是 dead value,可以清空值所在的寄存器,节省空间

Available Expressions

  • Meet: $\bigcap$

    碎碎念: 如果是 $\bigcup$ 则结果为 may,如果是 $\bigcap$ 则结果为 must

  • Domain: expression

    Define 是每个 block 内的定义式,expression 有可能出现在不同的 block 内

  1. OUT[entry] = $\varnothing$

  2. OUT[other blocks] = $\bigcup$ –> 11111...111

    如果不全部初始化为 1,之后取交集会陷入全 0 的死循环

An example:

碎碎念: 注意 Available expression 的定义,转移过程中仅关注表达式的值是否还能被再次利用,如果不能(不能说明值已更新),则进行 expression 的替换

Differences and Similarities of the three data flow analyses

Fill the blank:

Answer:

  • Boundary: 对 Exit 和 Entry 的限制
  • Initialization: DFA 中对 BBs 的初始化,跟 Direction 和 Meet 相关

Interative Alg. and why it is able to terminate

My Answer:

Because when INs are not change, the OUTs will not change, after iterations, the INs and OUTs will reach a fixed point.

Right Answer: