working on Introduction.

Commit 72fac147 authored Feb 28, 2019 by Xu Zhou

Showing with 18 additions and 3 deletions

.DS_Store Conference-LaTeX-template_7-9-18/.DS_Store +0 -0

conference_041818.tex Conference-LaTeX-template_7-9-18/conference_041818.tex +18 -3

--- a/Conference-LaTeX-template_7-9-18/.DS_Store
+++ b/Conference-LaTeX-template_7-9-18/.DS_Store
--- a/Conference-LaTeX-template_7-9-18/conference_041818.tex
+++ b/Conference-LaTeX-template_7-9-18/conference_041818.tex
@@ -17,7 +17,7 @@
    T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}
 \begin{document}

-\title{p-fuzz: an efficient fuzzing tool with parallel computing mechanism \\
+\title{p-fuzz: an efficient grey-box fuzzing tool with a parallel computing mechanism \\
 {\footnotesize \textsuperscript{*}Note: Sub-titles are not captured in Xplore and
 should not be used}
 \thanks{Identify applicable funding agency here. If none, delete this.}
@@ -67,18 +67,33 @@ Fuzz is an effective technology in software testing and security vulnerability d
 \end{abstract}

 \begin{IEEEkeywords}
-component, formatting, style, styling, insert
+fuzz, parallel computing, AFL, vulnerability
 \end{IEEEkeywords}

 \section{Introduction}
 Nowadays, software is widely used in daily life, and its spread is accompanied by security vulnerabilities. Attackers exploit these bugs and errors in code to crash the target or steal sensitive data. Therefore, an effective approach to testing software is urgently needed.

-Software testing has two major technologies: symbolic execution and fuzzing. The symbolic execution abstracts the input values as symbols, which could lead symbolic engine to explore as many as possible execution paths at the same time. Then getting the results by solving constraints. However, the whole process of symbolic execution results in state space explosion which is still a bottleneck.  As opposed to symbolic execution, fuzzing provides invalid, unexpected or random data to the inputs of a program which significantly enhances the performance of software testing. Fuzzing tools can be classified into three types based on the knowledge and information acquired from the source code of target programs, they are white-box, black-box, and grey-box fuzzer. The white-box fuzzer has full knowledge of the source code (eg. internal logic and structure) and uses the control structure of the procedural design to derive test cases. In contrast, The black-box fuzzer doesn’t have any knowledge of source code but it generates test cases randomly and swiftly. The grey-box fuzzer try to combine the efficiency and effectiveness of black-box fuzzers and white-box fuzzers, which masters limited knowledge of the internal working of target programs. 
+Fuzzing is an effective method for finding bugs and security vulnerabilities in software. To test a program, a fuzzing tool usually first generates a large number of different test cases. It then runs the target with each test case as input and monitors whether the execution raises an exception. In this way, it detects bugs or security vulnerabilities. Many big companies use fuzzing in their software testing, such as Microsoft [sage], Google [oss], and Amazon [Cloud9]. Fuzzing is also popular in the research and security communities [AFL]. Thousands of security vulnerabilities have been discovered by fuzzing [AFL, CVE list]. According to the knowledge and information acquired from the target program, fuzzing can be divided into white-box fuzzing, black-box fuzzing, and grey-box fuzzing. A white-box fuzzer has full knowledge of the source code (e.g., internal logic and structure) and uses the control structure of the procedural design to derive test cases. Fuzzers that use symbolic execution [] to accurately direct the execution path of a program can also be classified as white-box fuzzers. In contrast, a black-box fuzzer has no knowledge of the target program, so it generates test cases randomly and swiftly. A grey-box fuzzer tries to combine the efficiency and effectiveness of black-box and white-box fuzzers by using limited knowledge of the internal workings of the target program. Currently, grey-box fuzzing is practical and widely used in testing and vulnerability detection because it is lightweight, fast, and easy to use []. In this paper, we focus on improving grey-box fuzzing.
+
+Grey-box fuzzing usually has three stages: (1) it selects a seed from the seed list and mutates the seed to generate a group of test cases, (2) it feeds the program each test case and runs it to collect execution information (paths, branches, etc.), and (3) it uses the execution information to select new seeds from the test cases (a test case that triggers a new branch is selected as a seed). A typical grey-box fuzzing tool is American Fuzzy Lop (AFL) []. In AFL, the information collected during each run is the set of executed branches. It compresses the branches into a 64 KB bitmap to make bitmap access fast.
+
+Grey-box fuzzing, as exemplified by AFL, is simple and effective. However, it is still compute-intensive and costs many CPU hours to fully test a program. For example, it usually takes several days or even several months to find a bug in a program. Current research mainly focuses on improving the algorithms to make fuzzing faster []. CollAFL solves the bitmap collision problem of AFL and makes seed selection sensitive to all new branches. ...
+
+Unlike these works, we view the efficiency problem of grey-box fuzzing from a different angle. Instead of trying to improve a single fuzzing instance, we parallelize fuzzing instances to accelerate a single fuzzing job. In this way, we can trade resources and energy for time in software testing. Our method is based on two facts. First, parallel computing is ubiquitous nowadays. Massive, cheap computing resources are easy to obtain (e.g., by using Amazon spot instances). Second, time is very valuable in software testing and security. Consider the situation in which newly developed software is about to be released. It is worth spending more to keep the release on schedule. Besides, parallel fuzzing optimization is orthogonal to algorithmic improvements. Any improved fuzzing algorithm can be easily applied in our parallel fuzzing system.
+
+Current parallel fuzzing approaches are either outdated or have drawbacks. Grid fuzzer leverages grid computing to parallelize fuzzing jobs []. It distributes fuzzing tasks statically. This method is not suitable for grey-box fuzzing, because the feedback mechanism of grey-box fuzzing prevents work from being distributed statically. XXX presents a distributed fuzzing framework that manages the computing resources in a cluster and schedules them across many submitted fuzzing jobs. However, this work is not intended to accelerate a single fuzzing job. XXX and XXX can parallelize AFL to fuzz a single program on distributed nodes. However, they only parallelize the random mutation stage of AFL and fail to parallelize the deterministic mutation stage.
+
+In this paper, we intend to design a fully parallelized grey-box fuzzer that takes advantage of distributed computing nodes to speed up the fuzzing of a program. The challenges are (1) how to distribute the workload to different nodes, (2) how to synchronize fuzzing status, e.g., seeds, shared data structures, etc., and (3) 
+
+What we do 
+
+

 Furthermore, existing fuzzers have been effective mainly in discovering superficial vulnerabilities and fail to uncover vulnerabilities in deep paths without valid guidance. By collecting feedback information from target programs, grey-box fuzzers can mutate test cases with valid guidance, which makes them competitive. This is implemented by lightweight instrumentation or other mechanisms that obtain program execution feedback, such as code coverage, for the fuzzing process. American Fuzzy Lop (AFL) is a state-of-the-art grey-box fuzzer whose principles are speed, reliability, and ease of use. AFL instruments the compiled program to get edge coverage information. It adopts deterministic and non-deterministic strategies to generate test cases by mutating input seeds. Any “interesting” change is recorded for further testing.
 Several works have achieved significant results by extending AFL. Böhme et al. designed AFLFast, which assigns more mutation energy to interesting paths. Gan et al. introduced CollAFL, which mitigates path collisions by providing more accurate coverage information. Böhme et al. also implemented a directed grey-box fuzzing tool, AFLGo, which steers fuzzing toward dangerous locations that tend to produce vulnerabilities. All of these extensions achieved higher coverage and found more bugs than AFL. Zhang et al. leverage a hardware mechanism (Intel Processor Trace) to collect branch information and feed it back to the fuzzing process.

 Nevertheless, fuzzing on a single machine faces some common challenges.
+
 \begin{itemize}
 \item The fuzzing process contains too many tedious mutations.
 \item The fuzzing efficiency and effectiveness are relatively low because too much repeated work is executed.