Arya

Efficient Grammar Fuzzing(二)
[+] 摸鱼一整天,破产姐妹真好看!![+] 我永远喜欢max和Caroline~0x01 扩展一棵树接下来我们可...
扫描右侧二维码阅读全文
25
2019/01

Efficient Grammar Fuzzing(二)

[+] 摸鱼一整天,破产姐妹真好看!!
[+] 我永远喜欢max和Caroline~

0x01 扩展一棵树

接下来我们可以将这种方法应用于扩展一棵树的节点,因此,首先我们需要在树中搜索未展开的节点

定义函数possible_expansions()来计算有多少棵未展开的树

class GrammarFuzzer(GrammarFuzzer):
    def possible_expansions(self, node):
        (symbol, children) = node
        if children is None:
            return 1

        return sum(self.possible_expansions(c) for c in childre

打印输出

f = GrammarFuzzer(EXPR_GRAMMAR)
print(f.possible_expansions(derivation_tree))

结果为
2
接下来我们定义方法any_possible_expansions(),当树中还存在未展开的子节点时,则方法返回True

class GrammarFuzzer(GrammarFuzzer):
    def any_possible_expansions(self, node):
        (symbol, children) = node
        if children is None:
            return True

        return any(self.any_possible_expansions(c) for c in children)

打印输出结果

f = GrammarFuzzer(EXPR_GRAMMAR)
f.any_possible_expansions(derivation_tree)

True

0x02 控制扩展

从上面的代码中,我们可以不断地扩展树,那么,我们如何停止这些树无限的扩展呢?举个例子,在上面的语法中,<factor>可以进入递归循环,也可以扩展到<integer>为止,这样我们就可以避免进入无限的递归中,同样对于<integer>我们希望首选扩展<digit>,这样就比<digit><integer>增加的扩展树小很多。

那么问题来了,需要优先选择扩展项,那么必须知道每个扩展符号所耗费的代价,因此我们再定义两个函数,分别用函数symbol_cost()返回所有符号扩展的最小代价以及使用函数expansion_cost()返回扩展中所有扩展代价的总和,且当遍历过程中再次遇到非终端符号,将其代价赋值为∞,表示递归

class GrammarFuzzer(GrammarFuzzer):
    def symbol_cost(self, symbol, seen=set()):
        expansions = self.grammar[symbol]
        return min(self.expansion_cost(e, seen | {symbol}) for e in expansions)
#计算扩展中所有代价的总和
    def expansion_cost(self, expansion, seen=set()):
        symbols = nonterminals(expansion)
        if len(symbols) == 0:
            return 1  # no symbol

        if any(s in seen for s in symbols):
            return float('inf')#当再次遍历到同一个符号的时候

        # the value of a expansion is the sum of all expandable variables
        # inside + 1
        return sum(self.symbol_cost(s, seen) for s in symbols) + 1

其实代码看到这里时,就知道这里用的算法果然是动态规划,不过不得不说的是用python写动归的代码明显比用C写动归写起来方便一些,接下来举了两个例子
我们首先检测最简单的<digit>,它扩展的代价只有1,

f = GrammarFuzzer(EXPR_GRAMMAR)
assert f.symbol_cost("<digit>") == 1

运行准确无误

接下来我们检测<expr>,从之前的代码我们可以推出<expr>的最小代价是5(<expr> → <term> → <factor> → <integer> → <digit> → 1),即尽可能的避开所有的递归扩展

assert f.symbol_cost("<expr>") == 5

正确无误

接下来我们定义一个函数expand_node_by_cost()用来确定所有子项的最小成本,然后使用choose()函数从列表中选择一个子项,在默认情况下,该函数是扩展符号所需要的最小代价,如果多个子节点都有相同的代价,函数将在这些子节点之间随机选择

class GrammarFuzzer(GrammarFuzzer):
    def expand_node_by_cost(self, node, choose=min):
        (symbol, children) = node
        assert children is None

        # Fetch the possible expansions from grammar...
        expansions = self.grammar[symbol]

        possible_children_with_cost = [(self.expansion_to_children(expansion),
                                        self.expansion_cost(
                                            expansion, {symbol}),
                                        expansion)
                                       for expansion in expansions]

        costs = [cost for (child, cost, expansion)
                 in possible_children_with_cost]
        chosen_cost = choose(costs)#选择一个所需代价最小的子项
        children_with_chosen_cost = [child for (child, child_cost, _) in possible_children_with_cost
                                     if child_cost == chosen_cost]
        expansion_with_chosen_cost = [expansion for (_, child_cost, expansion) in possible_children_with_cost
                                      if child_cost == chosen_cost]

        index = self.choose_node_expansion(node, children_with_chosen_cost)

        chosen_children = children_with_chosen_cost[index]
        chosen_expansion = expansion_with_chosen_cost[index]
        chosen_children = self.process_chosen_children(
            chosen_children, chosen_expansion)

        # Return with a new list
        return (symbol, chosen_children)

我们接着定义一个函数expand_node_min_cost()min()(即子项的最小代价)作为参数传递给choose()函数

class GrammarFuzzer(GrammarFuzzer):
    def expand_node_min_cost(self, node):
        if self.log:
            print("Expanding", all_terminals(node), "at minimum cost")

        return self.expand_node_by_cost(node, min)

接下来我们可以调用函数expand_tree_once()以及函数expand_node_min_cost()输出最小代价的树(其实写到这里的时候,我有一种我一直在算最小生成树的错觉,滑稽)

class GrammarFuzzer(GrammarFuzzer):
    def expand_node(self, node):
        return self.expand_node_min_cost(node)
f = GrammarFuzzer(EXPR_GRAMMAR, log=True)
display_tree(derivation_tree)

画出图像
Image10-1.png
接下来我们随机选择一个子节点进行最小代价符号扩展

if f.any_possible_expansions(derivation_tree):
    derivation_tree = f.expand_tree_once(derivation_tree)
    display_tree(derivation_tree)

随机选择结果为<expr>
Image10-2.png
我们也可以重复执行上述语句,最终全部展开,当然不需要将这条语句写无数遍,能简单就解决的事情绝不复杂化,不是嘛

while f.any_possible_expansions(derivation_tree):
    derivation_tree = f.expand_tree_once(derivation_tree)    

完成对每个子节点的扩展
Expanding <integer> at minimum cost
Expanding <factor> at minimum cost
Expanding <factor> at minimum cost
Expanding <integer> at minimum cost
Expanding <integer> at minimum cost
Expanding <digit> at minimum cost
Expanding <digit> at minimum cost
Expanding <digit> at minimum cost
展开所有的子节点

display_tree(derivation_tree)

Image10-3.png

0x02 节点扩展

上面定义的函数expand_node_min_cost()在选择扩展的节点时,总是选择消耗代价最小的节点,然而有的时候我们希望有些节点可以尽可能的扩展到最大值,这与expand_node_min_cost()所实现的结果是相反的,于是我们再定义一个函数expand_node_max_cost()在选择节点时,总是在代价最高的节点中选择

class GrammarFuzzer(GrammarFuzzer):
    def expand_node_max_cost(self, node):
        if self.log:
            print("Expanding", all_terminals(node), "at maximum cost")

        return self.expand_node_by_cost(node, max)

为了演示一下expand_node_max_cost(),我们可以重新定义expand_node()方便调用expand_node_max_cost()

class GrammarFuzzer(GrammarFuzzer):
    def expand_node(self, node):
        return self.expand_node_max_cost(node)
        
derivation_tree = ("<start>",
                   [("<expr>",
                     [("<expr>", None),
                      (" + ", []),
                         ("<term>", None)]
                     )])

f = GrammarFuzzer(EXPR_GRAMMAR, log=True)display_tree(derivation_tree)

接下来可以不断调用函数按照最大代价扩展随机展开节点

if f.any_possible_expansions(derivation_tree):
    derivation_tree = f.expand_tree_once(derivation_tree)
    display_tree(derivation_tree)

Image10-4.png
因为最大扩展是没有尽头的,所以就别想着写一个while循环解决一切了(哈哈哈

0x03 三个扩展阶段
昨天的博客里提到了随机的扩展,也就是不加任何控制的符号扩展,今天说了最小的扩展以及最大扩展,那么我们也可以将这三种扩展同时放在一个函数expand_tree()中使用,工作方式如下:

  • Max cost expansion.通过最大代价来扩展树,直到到扩展到符合需要为止,此时我们可以设置min_nonterminals值为0.
  • Random expansion.随即扩展树,直到达到max_nonterminals的代价为止
  • Min cost expansion.扩展成为代价最小的树

我们通过让expand_node函数调用这三种不同的扩展方式,使这三种扩展方式运用在一次树扩展中。通过将expand_node函数设置为先调用函数expand_node_max_cost,然后调用函数expand_node_randomly,最后调用函数expand_node_min_cost,同时,在前两个阶段我们可以通过设置min_nonterminals以及max_nonterminals值以便于对扩展程度进行限制


class GrammarFuzzer(GrammarFuzzer):
    def log_tree(self, tree):
        """Output a tree if self.log is set; if self.display is also set, show the tree structure"""
        if self.log:
            print("Tree:", all_terminals(tree))
            if self.disp:
                display_tree(tree)
            # print(self.possible_expansions(tree), "possible expansion(s) left")

    def expand_tree_with_strategy(self, tree, expand_node_method, limit=None):
        """Expand tree using `expand_node_method` as node expansion function        until the number of possible expansions reaches `limit`."""
        self.expand_node = expand_node_method
        while ((limit is None
                or self.possible_expansions(tree) < limit)
               and self.any_possible_expansions(tree)):
            tree = self.expand_tree_once(tree)
            self.log_tree(tree)
        return tree

    def expand_tree(self, tree):
        """Expand `tree` in a three-phase strategy until all expansions are complete."""
        self.log_tree(tree)
        tree = self.expand_tree_with_strategy(
            tree, self.expand_node_max_cost, self.min_nonterminals)
        tree = self.expand_tree_with_strategy(
            tree, self.expand_node_randomly, self.max_nonterminals)
        tree = self.expand_tree_with_strategy(
            tree, self.expand_node_min_cost)

        assert self.possible_expansions(tree) == 0

        return tree

接下来可以对进行测试


derivation_tree = ("<start>",
                   [("<expr>",
                     [("<expr>", None),
                      (" + ", []),
                         ("<term>", None)]
                     )])

f = GrammarFuzzer(
    EXPR_GRAMMAR,
    min_nonterminals=3,
    max_nonterminals=5,
    log=True)derivation_tree = f.expand_tree(derivation_tree)

运行结果如下


Tree: <expr> + <term>
Expanding <expr> at maximum cost
Tree: <term> + <expr> + <term>
Expanding <term> randomly
Tree: <factor> / <term> + <expr> + <term>
Expanding <expr> randomly
Tree: <factor> / <term> + <term> + <expr> + <term>
Expanding <expr> at minimum cost
Tree: <factor> / <term> + <term> + <term> + <term>
Expanding <factor> at minimum cost
Tree: <integer> / <term> + <term> + <term> + <term>
Expanding <term> at minimum cost
Tree: <integer> / <term> + <term> + <term> + <factor>
Expanding <term> at minimum cost
Tree: <integer> / <term> + <term> + <factor> + <factor>
Expanding <term> at minimum cost
Tree: <integer> / <factor> + <term> + <factor> + <factor>
Expanding <term> at minimum cost
Tree: <integer> / <factor> + <factor> + <factor> + <factor>
Expanding <factor> at minimum cost
Tree: <integer> / <factor> + <factor> + <factor> + <integer>
Expanding <integer> at minimum cost
Tree: <integer> / <factor> + <factor> + <factor> + <digit>
Expanding <digit> at minimum cost
Tree: <integer> / <factor> + <factor> + <factor> + 1
Expanding <integer> at minimum cost
Tree: <digit> / <factor> + <factor> + <factor> + 1
Expanding <digit> at minimum cost
Tree: 5 / <factor> + <factor> + <factor> + 1
Expanding <factor> at minimum cost
Tree: 5 / <integer> + <factor> + <factor> + 1
Expanding <integer> at minimum cost
Tree: 5 / <digit> + <factor> + <factor> + 1
Expanding <digit> at minimum cost
Tree: 5 / 3 + <factor> + <factor> + 1
Expanding <factor> at minimum cost
Tree: 5 / 3 + <integer> + <factor> + 1
Expanding <integer> at minimum cost
Tree: 5 / 3 + <digit> + <factor> + 1
Expanding <digit> at minimum cost
Tree: 5 / 3 + 6 + <factor> + 1
Expanding <factor> at minimum cost
Tree: 5 / 3 + 6 + <integer> + 1
Expanding <integer> at minimum cost
Tree: 5 / 3 + 6 + <digit> + 1
Expanding <digit> at minimum cost
Tree: 5 / 3 + 6 + 2 + 1

这样我们就获得了同时使用了三种方式生成的树。

那么最后的最后,语法也学完了,树也有了,当然少不了将前面我们写过的fuzzing代码运用到派生树,通过对前面已有的fuzzer代码调用,随机生成不同的字符串,再以树的形式进行展开

class GrammarFuzzer(GrammarFuzzer):
    def fuzz_tree(self):
        # Create an initial derivation tree
        tree = self.init_tree()
        # print(tree)

        # Expand all nonterminals
        tree = self.expand_tree(tree)
        if self.log:
            print(repr(all_terminals(tree)))
        if self.disp:
            display_tree(tree)
        return tree

    def fuzz(self):
        self.derivation_tree = self.fuzz_tree()
        return all_terminals(self.derivation_tree)

我们可以调用fuzz()生成新的语法

f = GrammarFuzzer(EXPR_GRAMMAR)
f.fuzz()

'126.79 / 0 / 95.2 / 6'
派生树的图像如下

display_tree(f.derivation_tree)

Image10-5.png
我们也可以通过fuzzer()生成随机的url,再以树的形式展开

f = GrammarFuzzer(URL_GRAMMAR)
f.fuzz()

'https://user:password@cispa.saarland?def=abc&abc=73'

派生树如下

display_tree(f.derivation_tree)

Image10-6.png

那么此时我们可以计算调用函数fuzz()所需要的时间并与simple_grammar_fuzzer()所需时间进行比较

trials = 50xs = []ys = []f = GrammarFuzzer(EXPR_GRAMMAR, max_nonterminals=20)for i in range(trials):
    with Timer() as t:
        s = f.fuzz()
    xs.append(len(s))
    ys.append(t.elapsed_time())
    print(i, end=" ")
print()


average_time = sum(ys) / trials
print("Average time:", average_time)

输出在50次循环中的平均耗时
Average time: 0.036824420400953385
明显低于simple_grammar_fuzzer()所需时间

当然,及时我们现在的派生树测试很快,但是我们的输入也很少,调用fuzzer()也仅仅生成了一个输入测试值。

当然这个问题现在就不解决了,留到下一章吧,哈哈哈哈

学到的知识点
[+] 派生树对于输入结构表示的重要性
[+] 学会了基于对派生树的语法模糊化

Last modification:January 25th, 2019 at 10:08 pm
If you think my article is useful to you, please feel free to appreciate

Leave a Comment