Commit d68b5851 by Xu Zhou

Introduction.

parent 72fac147
@@ -75,22 +75,25 @@ Nowadays, software is applied widely in our life, which accompanied with the occ
Fuzzing is an effective method for finding bugs and security vulnerabilities in software. To test a program, a fuzzing tool usually first generates a large number of different test cases. It then runs the target with each test case as input and monitors whether the execution raises an exception. In this way, it detects bugs or security vulnerabilities.
Many large companies use fuzzing in their software testing, such as Microsoft [sage], Google [oss], and Amazon [Cloud9]. Fuzzing is popular in the research and security communities [AFL]. Thousands of security vulnerabilities have been discovered by fuzzing [AFL, CVE list]. According to the knowledge and information acquired from the target program, fuzzing can be divided into white-box, black-box, and grey-box fuzzing. A white-box fuzzer has full knowledge of the source code (e.g., internal logic and structure) and uses the control structure of the procedural design to derive test cases. Fuzzers that use symbolic execution [] to accurately direct the execution path of a program can also be classified as white-box fuzzers. In contrast, a black-box fuzzer has no knowledge of the target program, so it generates test cases randomly and quickly. A grey-box fuzzer has limited knowledge of the internal workings of the target program and tries to combine the efficiency and effectiveness of black-box and white-box fuzzers. Currently, grey-box fuzzing is practical and widely used in testing and vulnerability detection because it is lightweight, fast, and easy to use []. In this paper, we focus on improving grey-box fuzzing.
Greybox fuzzing usually has three stages: (1) it selects a seed from the seed list and mutates the seed to generate a group of test cases, (2) it feeds the program with each test case and executes it to collect feedback information (paths, branches, etc.), and (3) it uses the feedback information to select new seeds from the test cases (a test case that triggers a new branch is selected as a new seed). A typical greybox fuzzing tool is American Fuzzy Lop (AFL) []. In AFL, the information collected during each execution is the set of taken branches. AFL compresses the taken-branch information into a 64K bitmap to make access fast.
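The three stages above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not AFL's implementation: `toy_target`, `edge_indices`, `update_bitmap`, `mutate`, and `fuzz` are hypothetical names, the target is a toy function that reports the basic blocks it visits, and the multiplicative hash stands in for the random block IDs that AFL assigns at compile time. The edge index (current location XOR shifted previous location) follows the scheme described in AFL's technical notes.

```python
import random

MAP_SIZE = 1 << 16  # 64K entries, the bitmap size mentioned in the text


def toy_target(data: bytes) -> list[int]:
    """Hypothetical instrumented target: returns the IDs of visited blocks."""
    blocks = [1]
    if len(data) > 0 and data[0] == ord("F"):
        blocks.append(2)
        if len(data) > 1 and data[1] == ord("U"):
            blocks.append(3)
            if len(data) > 2 and data[2] == ord("Z"):
                blocks.append(4)
    blocks.append(5)
    return blocks


def edge_indices(blocks: list[int]) -> list[int]:
    """Compress each taken branch (pair of blocks) into a bitmap index."""
    indices = []
    prev = 0
    for block in blocks:
        loc = (block * 40503) % MAP_SIZE  # assumption: stand-in for a random block ID
        indices.append(loc ^ prev)
        prev = loc >> 1  # shift so that A->B and B->A map to different indices
    return indices


def update_bitmap(global_map: bytearray, indices: list[int]) -> bool:
    """Merge one execution into the global bitmap; True if a new branch was hit."""
    found_new = False
    for idx in indices:
        if global_map[idx] == 0:
            found_new = True
        global_map[idx] = min(global_map[idx] + 1, 255)
    return found_new


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Random mutation: overwrite one byte with a random value."""
    data = bytearray(seed) or bytearray(b"\x00")
    data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)


def fuzz(initial_seeds, rounds, per_seed=64, rng_seed=0):
    rng = random.Random(rng_seed)
    seeds = list(initial_seeds)
    global_map = bytearray(MAP_SIZE)
    for seed in seeds:
        update_bitmap(global_map, edge_indices(toy_target(seed)))
    for i in range(rounds):
        seed = seeds[i % len(seeds)]                   # stage 1: select a seed
        for _ in range(per_seed):
            case = mutate(seed, rng)                   # stage 1: mutate it
            indices = edge_indices(toy_target(case))   # stage 2: execute and collect feedback
            if update_bitmap(global_map, indices):     # stage 3: keep cases with new branches
                seeds.append(case)
    return seeds
```

Calling `fuzz([b"AAA"], rounds=20)` returns the seed list, which grows whenever a mutated input reaches a branch that no earlier input reached.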
Greybox fuzzers like AFL are simple and effective. However, greybox fuzzing is still compute-intensive and costs many CPU hours to fully test a program. For example, it usually takes several days or even several months to find the bugs in a program. Current research mainly focuses on improving the algorithms to make fuzzing faster [...]. Böhme et al. designed AFLFast, which assigns more mutation energy to interesting paths []. Gan et al. introduced CollAFL, which mitigates path collisions by providing more accurate coverage information []. Böhme et al. also implemented AFLGo, a directed grey-box fuzzing tool that steers fuzzing towards dangerous locations that tend to produce vulnerabilities. Zhang et al. leverage a hardware mechanism (Intel Processor Trace) to collect branch information and feed it back to the fuzzing process. All of these extensions gained higher coverage and found more bugs than AFL.
Unlike these works, we view the efficiency problem of greybox fuzzing from a different perspective. Instead of trying to improve a single fuzzing instance, we parallelize fuzzing instances to accelerate a single fuzzing job. In this way, we can trade resources and energy for time in software testing. Our method is based on two facts. First, parallel computing is ubiquitous nowadays. We can easily obtain massive, cheap computing resources (e.g., by using Amazon spot instances []). Second, time is very valuable in software testing and security. Consider the situation in which newly developed software is about to be released. It is worth spending more money to make the release on schedule. Besides, parallel fuzzing optimization is orthogonal to algorithmic improvements. Any improved fuzzing algorithm can be easily applied in a parallel fuzzing system.
Current parallel fuzzing approaches are either outdated or have drawbacks. Grid fuzzer leverages grid computing to parallelize fuzzing jobs []. It distributes fuzzing tasks statically. This method is not suitable for greybox fuzzing, because the feedback mechanism of greybox fuzzing means that the work cannot be determined statically. XXX presents a distributed fuzzing framework that manages the computing resources in a cluster and schedules them among many submitted fuzzing jobs. However, this work is not intended to accelerate a single fuzzing job [Qinghua]. XXX and XXX can parallelize AFL to fuzz a single program with distributed computing nodes. However, they only parallelize the random mutation part of AFL and fail to parallelize the deterministic mutation part.
In this paper, we intend to design a fully parallelized greybox fuzzer that takes advantage of distributed computing nodes to speed up the fuzzing of a program. The challenges are (1) how to automatically deploy the fuzzing environment to all the nodes in a distributed system, (2) how to balance the workloads assigned to different nodes, and (3) how to synchronize fuzzing status, e.g., seeds and shared data structures. We design and implement p-fuzz to solve these challenges. We use Docker [] to build a fuzzing environment and copy this environment to all fuzzing nodes in the distributed system. We apply a contending strategy to dynamically distribute workloads to different nodes, i.e., each node contends for a new seed with which to fuzz the program. We leverage a key-value database to share seeds and other data structures. A fuzzing node fetches a seed from the database and begins its fuzzing process: it first mutates the seed to generate test cases, then runs the software with each test case and monitors its execution. When it finds newly executed branches, it adds the corresponding test case to the database as a new seed.
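The contending strategy can be illustrated with a small Python sketch. It rests on several assumptions: threads stand in for distributed fuzzing nodes, the in-memory `SeedStore` class stands in for the shared key-value database, and all names (`SeedStore`, `claim_seed`, `report`, `finish_seed`, `toy_target`, `mutations`, `run_workers`) are hypothetical and not taken from p-fuzz. The point is only that each seed is claimed atomically by exactly one worker, and that a test case reaching a new branch is written back as a new seed for any worker to claim.

```python
import threading
from typing import Iterator, Optional

MAGIC = b"FUZ"  # hypothetical input prefix that unlocks deeper branches


def toy_target(data: bytes) -> frozenset:
    """Hypothetical target: returns the IDs of the branches it executes."""
    branches = {0}
    for i, expected in enumerate(MAGIC):
        if i >= len(data) or data[i] != expected:
            break
        branches.add(i + 1)
    return frozenset(branches)


def mutations(seed: bytes) -> Iterator[bytes]:
    """Deterministic mutation: overwrite each byte with each candidate byte."""
    for pos in range(len(seed)):
        for byte in MAGIC:
            yield seed[:pos] + bytes([byte]) + seed[pos + 1:]


class SeedStore:
    """In-memory stand-in for the shared key-value database (assumption)."""

    def __init__(self, initial_seeds):
        self._cond = threading.Condition()
        self._pending = list(initial_seeds)
        self._known = set(initial_seeds)
        self._coverage = set()
        self._active = 0  # workers currently fuzzing a claimed seed

    def claim_seed(self) -> Optional[bytes]:
        """Atomically hand one pending seed to the caller; None when all work is done."""
        with self._cond:
            while not self._pending and self._active > 0:
                self._cond.wait()  # a busy worker may still publish new seeds
            if not self._pending:
                return None
            self._active += 1
            return self._pending.pop(0)

    def report(self, case: bytes, branches: frozenset) -> bool:
        """Record coverage; publish the case as a new seed if it hit a new branch."""
        with self._cond:
            if branches <= self._coverage:
                return False
            self._coverage |= branches
            if case not in self._known:
                self._known.add(case)
                self._pending.append(case)
                self._cond.notify_all()
            return True

    def finish_seed(self) -> None:
        with self._cond:
            self._active -= 1
            self._cond.notify_all()

    def coverage(self) -> set:
        with self._cond:
            return set(self._coverage)

    def seeds(self) -> set:
        with self._cond:
            return set(self._known)

    def pending_count(self) -> int:
        with self._cond:
            return len(self._pending)


def worker(store: SeedStore) -> None:
    while True:
        seed = store.claim_seed()  # contend for the next seed
        if seed is None:
            return
        try:
            for case in mutations(seed):
                store.report(case, toy_target(case))
        finally:
            store.finish_seed()


def run_workers(initial_seeds, n_workers: int = 4) -> SeedStore:
    store = SeedStore(initial_seeds)
    threads = [threading.Thread(target=worker, args=(store,)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store
```

For example, `run_workers([b"AAA", b"BBB"])` returns a store whose `coverage()` contains every branch of the toy target. Because no seed is assigned to a node in advance, a node that finishes early simply claims the next pending seed, which is how the contending strategy balances the workload.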
What we do: We implement our design and test it using the LAVA benchmarks. The results show that ....
Furthermore, existing fuzzers have been effective mainly in discovering superficial vulnerabilities and fail to uncover vulnerabilities in deep paths without valid guidance. By collecting feedback information from target programs, grey-box fuzzers can mutate test cases with valid guidance. This is implemented by lightweight instrumentation or other mechanisms that obtain program execution feedback, such as code coverage, for the fuzzing process. American Fuzzy Lop (AFL) is a state-of-the-art grey-box fuzzer whose principles are speed, reliability, and ease of use. AFL instruments the compiled program to obtain edge coverage information. It adopts a deterministic strategy and a non-deterministic strategy to generate test cases by mutating input seeds. “Interesting” changes are recorded for further testing.
Several works have achieved significant results by extending AFL. Böhme et al. designed AFLFast, which assigns more mutation energy to interesting paths. Gan et al. introduced CollAFL, which mitigates path collisions by providing more accurate coverage information. Böhme et al. also implemented AFLGo, a directed grey-box fuzzing tool that steers fuzzing towards dangerous locations that tend to produce vulnerabilities. All of these extensions gained higher coverage and found more bugs than AFL. Zhang et al. leverage a hardware mechanism (Intel Processor Trace) to collect branch information and feed it back to the fuzzing process.
By collecting feedback information from target programs, grey-box fuzzers can mutate test cases with valid guidance. This is implemented by lightweight instrumentation or other mechanisms that obtain program execution feedback, such as code coverage, for the fuzzing process. American Fuzzy Lop (AFL) is a state-of-the-art grey-box fuzzer whose principles are speed, reliability, and ease of use. AFL instruments the compiled program to obtain edge coverage information. It adopts a deterministic strategy and a non-deterministic strategy to generate test cases by mutating input seeds. “Interesting” changes are recorded for further testing.
Nevertheless, there are some common challenges in fuzzing on a single computer.
@@ -98,7 +101,7 @@ Nevertheless, there are some common challenges in single-computer structure fuzz
\item The fuzzing process contains too many tedious mutations.
\item The fuzzing efficiency and effectiveness are relatively low because too much repeated work is executed.
\end{itemize}
To address these challenges, many researchers leverage parallel computing technology to speed up the fuzzing process. Test cases produced by mutation are allocated to each computer, which balances the system workload. Some researchers have worked on parallel or distributed fuzzing. Xie used grid computing for large-scale fuzzing in 2010, which reduced fuzzing time by almost two-thirds. It was implemented by dividing fuzzing jobs into tasks, storing them on a server, and scheduling remote clients to download them. Lian et al. proposed a dynamic resource-aware approach for parallel fuzzing. Some distributed fuzzing tools based on the parallel mode of AFL are implemented in the client-server model. The clients continuously synchronize their queues to the server, so that they benefit from each other's work.
Although parallel computing technology accelerates the fuzzing process, it faces several challenges and difficulties:
@@ -111,6 +114,11 @@ Despite the parallel computing technology accelerate the fuzzing process, it fac
Building on the research summarized above, we propose a parallel fuzzing platform, p-fuzz, which not only alleviates these problems but also speeds up the fuzzing process by leveraging abundant parallel resources.
\section{Background}
\subsection{Parallel computing technology}