Commit b0564648 by songcongxi

Submitted version

parent 7b995993
......@@ -4,7 +4,7 @@
\usepackage{cite}
\usepackage{amsmath,amssymb,amsfonts}
\usepackage{colortbl}
\usepackage{graphicx}
\usepackage{textcomp}
\usepackage{xcolor}
......@@ -15,55 +15,50 @@
\usepackage{verbatim}
\def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em
T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}
\definecolor{mygray}{gray}{.9}
\begin{document}
\title{p-fuzz: an efficient greybox-fuzzing tool with parallel computing mechanism \\
{\footnotesize \textsuperscript{*}Note: Sub-titles are not captured in Xplore and
should not be used}
\thanks{Identify applicable funding agency here. If none, delete this.}
\title{P-fuzz: a parallel sustainable grey-box fuzzing tool \\
{\footnotesize \textsuperscript{*}}
\thanks{This work is partially supported by the National Key Research and Development Program of China (2016YFB0200401), by the Program for New Century Excellent Talents in University, by the National High-level Personnel for Defense Technology Program (2017-JCJQ-ZQ-013), and by the Hunan Province Science Foundation (2017RS3045).}
}
\author{\IEEEauthorblockN{1\textsuperscript{st} Given Name Surname}
\IEEEauthorblockA{\textit{dept. name of organization (of Aff.)} \\
\textit{name of organization (of Aff.)}\\
City, Country \\
email address}
\author{\IEEEauthorblockN{1\textsuperscript{st} Congxi Song}
\IEEEauthorblockA{\textit{National University of Defense Technology (NUDT)} \\
Changsha, China \\
congxi1994@sohu.com}
\and
\IEEEauthorblockN{2\textsuperscript{nd} Given Name Surname}
\IEEEauthorblockA{\textit{dept. name of organization (of Aff.)} \\
\textit{name of organization (of Aff.)}\\
City, Country \\
email address}
\IEEEauthorblockN{2\textsuperscript{nd} Xu Zhou}
\IEEEauthorblockA{\textit{National University of Defense Technology (NUDT)} \\
Changsha, China \\
zhouxu@nudt.edu.cn}
\and
\IEEEauthorblockN{3\textsuperscript{rd} Given Name Surname}
\IEEEauthorblockA{\textit{dept. name of organization (of Aff.)} \\
\textit{name of organization (of Aff.)}\\
City, Country \\
email address}
\IEEEauthorblockN{3\textsuperscript{rd} Qidi Yin}
\IEEEauthorblockA{\textit{National University of Defense Technology (NUDT)} \\
Changsha, China \\
robertjames@qq.com}
\and
\IEEEauthorblockN{4\textsuperscript{th} Given Name Surname}
\IEEEauthorblockA{\textit{dept. name of organization (of Aff.)} \\
\textit{name of organization (of Aff.)}\\
City, Country \\
email address}
\IEEEauthorblockN{4\textsuperscript{th} Hangwei Zhang}
\IEEEauthorblockA{\textit{National University of Defense Technology (NUDT)} \\
Changsha, China \\
2436492601@qq.com}
\and
\IEEEauthorblockN{5\textsuperscript{th} Given Name Surname}
\IEEEauthorblockA{\textit{dept. name of organization (of Aff.)} \\
\textit{name of organization (of Aff.)}\\
City, Country \\
email address}
\IEEEauthorblockN{5\textsuperscript{th} Xinglu He}
\IEEEauthorblockA{\textit{National University of Defense Technology (NUDT)} \\
Changsha, China \\
hexinglu1992@gmail.com}
\and
\IEEEauthorblockN{6\textsuperscript{th} Given Name Surname}
\IEEEauthorblockA{\textit{dept. name of organization (of Aff.)} \\
\textit{name of organization (of Aff.)}\\
City, Country \\
email address}
\IEEEauthorblockN{6\textsuperscript{th} Kai Lu}
\IEEEauthorblockA{\textit{National University of Defense Technology (NUDT)} \\
Changsha, China \\
kailu@nudt.edu.cn}
}
\maketitle
\begin{abstract}
Fuzz is an effective technology in software testing and security vulnerability detection. Unfortunately, fuzzing is an extremely compute-intensive job, which may cause thousands of computing hours to find a bug. Current novel works generally improve fuzzing efficiency by developping delicate algorithms. In this paper, we propose another direction of improvement in this filed, i.e., leveraging parallel computing to improve fuzzing effiency. In this way, we develop p-fuzz, a parallel fuzzing system that can utilize massive distributed computing resources to fuzz a single program. p-fuzz uses a no-sql database to share the fuzzing status such as seeds, covered paths, etc. All fuzzing nodes get jobs from the database, and update their fuzzing status to the database. We control the synchronization period to a coarse granularity so that the database will not be a bottleneck. P-fuzz is implemented based on AFL. We compare p-fuzz with AFL and XXX in our experiment. The result shows that we can easily gain a speedup of AFL by simply using 4 nodes, i.e., using 3X more resources.
Fuzzing is an effective technique for software testing and security vulnerability detection. Unfortunately, fuzzing is extremely compute-intensive and may take thousands of computing hours to find a bug. Recent work generally improves fuzzing efficiency by developing sophisticated algorithms. In this paper, we propose another direction of improvement in this field, i.e., leveraging parallel computing to improve fuzzing efficiency. To this end, we develop P-fuzz, a parallel fuzzing system that can utilize massive distributed computing resources to fuzz a single program. P-fuzz uses a database to share the fuzzing status, such as seeds and covered-branch information. All fuzzing nodes get jobs from the database and write their fuzzing status back to it. We keep the synchronization period coarse-grained so that the database does not become a bottleneck. P-fuzz is implemented on top of AFL. We compare P-fuzz with AFL and Roving in our experiments. The results show that, on average, P-fuzz speeds up AFL by about 2.59X and Roving by about 1.66X when using 4 nodes, i.e., 3X more resources.
\end{abstract}
\begin{IEEEkeywords}
......@@ -71,439 +66,706 @@ fuzz, parallel computing, AFL, vulnerability
\end{IEEEkeywords}
\section{Introduction}
Nowadays, software is applied widely in our life, which accompanied with the occurrence of security vulnerabilities. Attackers always utilize these bugs and errors in codes to make target crash or grab some sensitive data. Therefore, it is urgently needed for us to pay close attention to find an effective approach to test software.
Fuzzing is an effective method to find bugs or security vulnerabilities in software. To test a program, a fuzzing tool usually first generates a lot of different test cases. Then it uses each test case as input to run the target program and monitor whether the execution get exceptions. In this way, it detects bugs or security vulnerabilities. Many big companies use fuzzing in their software testing, such as Microsoft [sage], Google [oss], Amazon [Cloud9], etc. Fuzzing is popular in research and security community [AFL]. Thousands of security vulnerabilities are discovered by fuzzing [AFL, CVE list]. According to the knowledge and information acquired from the target programs, fuzzing can be divied into white-box fuzzing, black-box fuzzing, and greybox fuzzing. A white-box fuzzer has full knowledge of the source code (e.g., internal logic and data structures) and uses the control structure of the procedural design to derive test cases. Fuzzers that use symbolic execution [] to accurately direct the execution path of a program can also be classified into white-box fuzzing. In contrast, The black-box fuzzer doesn’t have any knowledge of target program, thus it generates test cases randomly and swiftly. The grey-box fuzzer tries to combine the efficiency and effectiveness of black-box fuzzers and white-box fuzzers, which masters limited knowledge of the internal working of target programs. Currently, grey-box fuzzing is practical and widely used in testing and vulnearability detection as it is lightweight, fast and easy-to-use []. In this paper, we focus on improving grey-box fuzzing.
Greybox-fuzzing usually have three stages: (1) it selects a seed from the seed list and mutates the seed to generate a group of test cases, (2) it feeds the program with each test case and executes it to collect the feedback information (paths, branches, etc.), and (3) it uses the feedback information to select new seeds from test cases (test case that triggers new branch is selected as new seed). A typical greybox-fuzzing tool is American Fuzzy Lop (AFL) []. In AFL, the information it collects during each execution is the taken branches. It compresses the taken-branch information into a 64K bitmap to make its accessing fast.
Greybox fuzzing like AFL is simple and effective. However, greybox-fuzzing is still compute-intensive and cost a lot of CPU hous to fully test a program. For example, it may take several days or even several months to find the bug of a program. Current researches mainly forcus on improving the algorithms of AFL to make it faster [...]. Böhme et al. designed AFL-fast which assigned more mutation energy to interesting paths []. Gan et al introduced CollAFL which mitigated the path collisions by providing more accurate coverage information []. Böhme also implemented a directed grey-box fuzzing tool AFLGO towards the dangerous locations which tend to produce vulnerabilities. Zhang et al. leverage hardware mechanism (Intel Processor Trace) to collect branch information, and feed this information back to the fuzzing process []. All of these extensions gained higher coverage and found more bugs than AFL.
Unlike all those works, we see this efficiency problem of greybox fuzzing in a different point of view. Instead of trying to improve a single fuzzing instrance, we try to parallelize fuzzing instrances to accelerate a single fuzzing job. In this way, we can trade resource and energe for time in software testing. Our method is based on two observations. First, parallel computing is ubiquitous nowdays. We can get massive cheap computing resource easily (e.g., by using the Amazon spot instance []). Second, time is much valuable in software testing and security. Consider the situation a newly developped software is about to be released. It is worth to spend more money than usual to make the release on schedule. Besides, parallel fuzzing optimzation is orthogonal with algorithm improve approaches. Any improved fuzzing algorithm can be easily applied in a parallel fuzzing system.
However, current parallel fuzzing approaches are either too old or have drawbacks. Grid fuzzer leverages grid computing to parallelize fuzz jobs []. It distributes fuzzing tasks statically. This method is not suitable for greybox fuzzing, as work cannot be statically determined beforehand due to the feedback mechanism in greybox fuzzing. XXX presents a distributed fuzzing framework which can manage the computing resources in a cluster and schedule the resources to many submitted fuzzing jobs. However, they are not intend to accelerate a single fuzzing job [Qinghua]. XXX and XXX can parallel AFL to fuzz a single software with distributed computing nodes. However, they only parallellize the random mutation part of AFL and fail to parallelize the certain mutation part [].
In this paper, we intend to design a fully parallelized greybox fuzzer and take the advantages of distributed computing to speedup the fuzzing process of a software. The challenges are (1) how to automatically deploy the fuzzing environment to all the nodes in a distributed system, (2) how to balance workloads assigned to different nodes, and (3) how to synchronize fuzzing status, e.g., seeds, shared data structures, etc. We design and implement p-fuzz to solve these challenges. We use docker [] to build a fuzzing environment and copy this environment to all fuzzing nodes in the distributed system. We apply a contending strategy to dynamically distribute workloads to different nodes, i.e., each node will contend for new seed to fuzz the program. We leverage a key-value database to share seeds and other data structures. A fuzzing node fetches seed from the database and begins its fuzzing process: it first mutates the seed to generate test cases, then it uses each test case to run the software and monitor its execution. When it finds new executed branches, it add the corresponding test case to the database as a new seed.
We implement our design and test it using the LAVA benchmarks. The result show that ....
Nowadays, software is widely used in daily life, and its use is accompanied by security vulnerabilities\cite{b1}. Attackers exploit these bugs and errors in code to attack systems or steal sensitive data. Therefore, an effective approach to testing software is urgently needed\cite{b19}.
%Through collecting the feedback information of target programs, grey-box fuzzers show the competitiveness of mutating test cases with valid guidance. It is implemented by lightweight instrumentation or other mechanisms to get program execution feedback, such as code coverage for the fuzzing process. American Fuzz Lop(AFL) is a state-of-the-art grey-box fuzzer whose principles are speed, reliability, and ease of use. AFL instruments the compiled program to get the edge coverage information. It adopts a deterministic strategy and non-deterministic strategy to generate test cases by mutating input seeds. The “interesting” change will be recorded for further detecting.
Fuzzing is an effective method for finding bugs and security vulnerabilities in software\cite{b2}. To test a program, a fuzzing tool usually first generates a large number of different test cases. It then runs the target program with each test case as input and monitors whether the execution raises an exception. In this way, it detects bugs or security vulnerabilities. Many large companies use fuzzing in their software testing, such as Microsoft\cite{b3}, Google\cite{b4}, and Amazon\cite{b5}. Fuzzing is popular in the research and security communities\cite{b6}, and thousands of security vulnerabilities have been discovered by fuzzing. According to the knowledge and information acquired from the target program, fuzzing can be divided into white-box, black-box, and grey-box fuzzing. A white-box fuzzer has full knowledge of the source code (e.g., internal logic and data structures) and uses the control structure of the procedural design to derive test cases. Fuzzers that use symbolic execution to accurately direct the execution path of a program can also be classified as white-box fuzzers. In contrast, a black-box fuzzer has no knowledge of the target program, so it generates test cases randomly and quickly. A grey-box fuzzer has limited knowledge of the internal workings of the target program and tries to combine the efficiency and effectiveness of black-box and white-box fuzzers. Currently, grey-box fuzzing is practical and widely used in testing and vulnerability detection because it is lightweight, fast, and easy to use\cite{b1}. In this paper, we focus on improving grey-box fuzzing.
%Nevertheless, there are some common challenges in single-computer structure fuzzing.
Grey-box fuzzing usually has three stages:
\begin{enumerate}
\item It selects a seed from the seed list and mutates the seed to generate a group of test cases.
\item It feeds the program with each test case and executes it to collect the feedback information (paths, branches, etc.).
\item It uses the feedback information to select new seeds from the test cases (a test case that triggers a new branch is selected as a new seed).
\end{enumerate}
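The three stages can be summarized by the following pseudocode sketch. It is only an illustration of the loop described above; the helper names \texttt{select}, \texttt{mutate}, and \texttt{run} are placeholders chosen here and are not functions of any particular fuzzer.
\begin{verbatim}
queue = initial seeds
seen  = {}        # covered branches
repeat:
  seed = select(queue)      # stage 1
  for t in mutate(seed):
    cov = run(target, t)    # stage 2
    if cov contains a branch
       not in seen:         # stage 3
      append t to queue
      seen = seen + cov
\end{verbatim}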
%\begin{itemize}
%\item The fuzzing process contains too many tedious mutations.
%\item The fuzzing efficiency and effectiveness are relatively low because of executing too much-repeated work.
%\end{itemize}
%In order to address the challenges, many researchers leverage parallel computing technology to speed up the fuzzing process. Test cases produce by mutation are allocated to each computer, which balances the system workload. Some researchers have proceeded works about parallel or distributed fuzzing. Xie using grid computing for large scale fuzzing in 2010, which reduce almost two-thirds of fuzzing time. It was implemented by dividing fuzzing jobs into tasks, storing them in a server and scheduling remote clients to download them. Lian et al. proposed a dynamic resource-aware approach for parallel fuzzing. Some distributed fuzzing tools based on the parallel function of AFL are implemented in the client-server model. The clients synchronize their queue to server continuously which benefit from each other's work.
A typical grey-box fuzzing tool is American Fuzzy Lop (AFL)\cite{b6}. The information AFL collects during each execution is the set of taken branches. It compresses the taken-branch information into a 64K bitmap so that it can be accessed quickly.
Grey-box fuzzing as in AFL is simple and effective. However, it is still compute-intensive and costs many CPU hours to fully test a program. For example, it may take several days or even several months to find a bug in a program. Current research mainly focuses on improving the algorithms of AFL to make it faster. Böhme et al. designed AFL-fast, which assigns more mutation energy to interesting paths\cite{b8}. Gan et al. introduced CollAFL, which mitigates path collisions by providing more accurate coverage information\cite{b9}. Böhme also implemented AFLGO, a directed grey-box fuzzing tool that steers fuzzing towards dangerous locations that tend to produce vulnerabilities\cite{b10}. Zhang et al. leveraged a hardware mechanism (Intel Processor Trace) to collect branch information and feed it back to the fuzzing process\cite{b1}. All of these extensions achieved higher coverage and found more bugs than AFL.
%Despite the parallel computing technology accelerate the fuzzing process, it faces several challenges and difficulties:
%\begin{itemize}
%\item The client computers are always fuzzing same test cases caused the concurrency and race.
%\item The transferring speed is limited while the quantity of common resources is large enough.
%\item Fixed synchronizing time entails the response latency to a new test case.
%\item All clients share the same seeds get from the server. The approach tires clients and entails the low resources utilization of test cases.
%\end{itemize}
%By summarizing these research above, we propose a parallel fuzzing platform —p-fuzz, which not only alleviates the problems but also speeds up the fuzzing process by leveraging abundant parallel resources.
Unlike all of these works, we look at the efficiency problem of grey-box fuzzing from a different point of view. Instead of trying to improve a single fuzzing instance, we parallelize fuzzing instances to accelerate a single fuzzing job. In this way, we can trade resources and energy for time in software testing. Our method is based on two observations. First, parallel computing is ubiquitous nowadays\cite{b20}\cite{b21}, and massive, cheap computing resources are easy to obtain (e.g., by using Amazon spot instances\cite{b11}). Second, time is valuable in software testing and security. When a newly developed piece of software is about to be released, it is worth spending more money than usual to keep the release on schedule. Besides, parallel fuzzing is orthogonal to algorithmic improvements: any improved fuzzing algorithm can be easily applied in a parallel fuzzing system.
However, current parallel fuzzing approaches are either outdated or have drawbacks. Grid fuzzer leverages grid computing to parallelize fuzzing jobs\cite{b12}, but it distributes fuzzing tasks statically. This method is not suitable for grey-box fuzzing, because the feedback mechanism means that the work cannot be determined beforehand. Liang et al. presented a distributed fuzzing framework that manages the computing resources in a cluster and schedules them among many submitted fuzzing jobs; however, it is not intended to accelerate a single fuzzing job\cite{b13}. Roving\cite{b14} and Distributed fuzzing for AFL\cite{b15} can parallelize AFL to fuzz a single program with distributed computing nodes. However, they only parallelize the non-deterministic mutation part of AFL and fail to parallelize the deterministic mutation part. Also, they synchronize the shared data at a fixed time interval rather than immediately when new data is generated.
In this paper, we design a fully parallelized grey-box fuzzer that takes advantage of parallel computing to speed up the fuzzing of a program. The challenges are as follows:
\begin{enumerate}
\item How to automatically deploy the fuzzing environment to all the nodes in a distributed system.
\item How to balance workloads assigned to different nodes.
\item How to synchronize fuzzing status, e.g., seeds, covered branches information, etc.
\end{enumerate}
We design and implement P-fuzz to address these challenges. We use Docker\cite{b16} to build a fuzzing environment and copy this environment to all fuzzing nodes in the distributed system. We apply a contending strategy to distribute workloads dynamically, i.e., each node contends for a new seed with which to fuzz the program. We leverage a key-value database to share seeds and other data structures such as the bitmap. A fuzzing node fetches a seed from the database and begins its fuzzing process: it first mutates the seed to generate test cases, then runs the software with each test case and monitors its execution. When it finds newly executed branches, it adds the corresponding test case to the database as a new seed. It also writes feedback information to the database so that other nodes benefit from it.
We implement our design and evaluate it on nine target programs and the LAVA-M benchmarks. The results show that P-fuzz outperforms AFL and Roving: it improves the map density by about 2.59X over AFL and by about 1.66X over Roving. It also triggers 49 crashes in the target programs.
\section{background}
\subsection{Parallel computing technology}
Parallel computing specifies a type of computation where many calculations are executed simultaneously. According to different granularities, parallel computing can be classified into bit-level, instruction-level, data-level, and task-level parallelism.
The task-level parallelism means a large job can be divided into several small tasks, and each node of parallel computers gets a piece of the task and execute it. In this paper, we talk about task-level parallelism.
The origin of parallel computing technology dates back to 1950s. John Cocke and Daniel Slotnick discuss the use of parallelism in numerical calculations. In the 1970s, The e-mail was invented and became the earliest and most successful example of a large-scale distributed application in ARPANET. By demanding and increase exponentially, distributed and parallel computing became its own branch of computer science in the 1980s.
In the 1990s, client-server architecture appeared and became popular. After 2000, With the big data era coming, grid computing and cloud compu{}ting provided various, massive and prompt services by their extraordinary computing ability. Nowadays, supercomputer plays an important role in computing. It handles a wide range of computationally intensive tasks in various fields.
To be specific. a distributed parallel application always falls into one of several basic architectures: client-server, three-tier, n-tier, or peer-to-peer and database-centric Architecture. Database-centric architecture specifies the software architecture in which databases is a core of the whole system. The architecture provides reliability, performance, and capacity, and scalability. We implement the p-fuzz platform with the help of database-centric architecture, and details of implementation are discussed in section \ref{alogrithm}.
\section{Background}\label{background}
\subsection{The details about AFL}
In this section, we will discuss fuzzing techniques and AFL in detail. These limitations motivate us to propose our p-fuzz.
American Fuzzy Lop(AFL) is an instrumentation-guided grey-box fuzzer. The fuzzer stays brute-force that makes the fuzzing process keep speed.
AFL is designed for such goals, i.e,
In this section, we discuss American Fuzzy Lop (AFL)\cite{b6} in detail. AFL is an instrumentation-guided grey-box fuzzer. It remains essentially brute-force, which keeps the fuzzing process fast. AFL is designed with the following goals:
\begin{itemize}
\item Speed: an appropriate instrumentation approach guides fuzzing while preserving its native speed.
\item Rock-solid reliability, it adapts to real-world targets.
\item Simplicity. AFL is simple and user-friendly.
\item Rock-solid reliability: it adapts to real-world targets.
\item Simplicity: AFL is simple and user-friendly.
\end{itemize}
There are two things we need to care about: the \textbf{seed} and \textbf{bitmap}.
\subsubsection{Seed}
Seed indicates the test cases which trigger the fuzzer to traverse new interesting paths. A queue is maintained to store the seeds. The high-quality corpus of candidate files will be selected as interesting seeds for further rounds.
Seeds are the test cases that cause the target program to traverse new branches. AFL maintains a queue to store the seeds. High-quality candidate files from this corpus are selected as interesting seeds for further fuzzing.
\subsubsection{Bitmap}
Before talking about bitmap, we need to define the “new branch”. If there are three basic blocks A, B, C in a program, the tuple (AB, BC) describes a branch transition. A new branch indicates a branch transition which doesn’t appear before.
In AFL, if a new branch is triggered by a test case, it is considered as a new path. All of the branches information is recorded in the bitmap. Bitmap describes the coverage of fuzzing, which stored in shared memory. the index of a bitmap is produced by previous basic block and current basic block. How to index a branch transition is shown as following. \emph{A} and \emph{B} represent previous basic block and current basic block respectively.
The bitmap records the coverage achieved by fuzzing and is stored in shared memory, whose size is 64K. A bit in the bitmap represents a branch, which connects two basic blocks. We use a tuple to express a branch: given basic blocks \emph{A} and \emph{B}, \emph{tuple(A, B)} denotes the branch from \emph{A} to \emph{B}. If a test case hits a new branch, the bitmap records this hit by changing the corresponding bit from 1 to 0. The bitmap is indexed as shown in Eq.~\eqref{eq1}, where \emph{A} and \emph{B} represent the previous and the current basic block, respectively. By simply reading the bitmap\cite{b1}, AFL knows whether a hit branch is new and decides whether to save or discard a test case.
\begin{equation}
(A \oplus B) \bmod \mathit{BITMAP\_SIZE}\label{eq1}
\end{equation}
However, AFL is inaccurate because of the path collision caused by an infinite space of bitmap. It prevents AFL from discovering potential paths that lead to new crashes.
However, under this mechanism the coverage information of AFL is inaccurate because of collisions caused by the finite size of the bitmap: different branches can map to the same index. This prevents AFL from discovering potential paths and finding more crashes.
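As an illustration of such a collision, consider the simplified index of Eq.~\eqref{eq1} with $\mathit{BITMAP\_SIZE} = 2^{16} = 65536$, which matches the 64K size mentioned above. The block identifiers below are hypothetical values chosen only for this example:
\begin{align*}
(5 \oplus 3) \bmod 65536 &= 6,\\
(4 \oplus 2) \bmod 65536 &= 6,\\
(3 \oplus 5) \bmod 65536 &= 6.
\end{align*}
The three distinct branches \emph{tuple(5, 3)}, \emph{tuple(4, 2)}, and \emph{tuple(3, 5)} share index 6, so once one of them has been recorded, the other two no longer appear to be new. The actual instrumentation in AFL additionally shifts the identifier of the previous block right by one bit before the XOR, which separates the symmetric pair \emph{tuple(A, B)} and \emph{tuple(B, A)} but does not remove collisions in general.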
Moreover, AFL runs both deterministic and non-deterministic mutations. The deterministic mutation strategies include bit flips, insertions, deletions, and arithmetic operations, which are designed to produce compact test cases and small diffs between the non-crashing and crashing inputs\cite{b24}. The non-deterministic mutation strategies include stacked combinations of the operations mentioned above and splicing.
Non-deterministic mutation strategies can make fuzzing achieve high coverage rapidly. Roving relies on the non-determinism of AFL to cover more paths faster\cite{b14}. However, when the target program takes input files in a complex format, random mutations tend to destroy the file format. Therefore, it is necessary to choose mutation strategies that are appropriate to each program.
\subsection{The discussion of parallel mechanism in fuzzing}
In fact, if you only put a single job on a multi-core system or a multi-machine computing group, you will be underutilizing the hardware. At this time, parallel the computing resources can make full use of hardware and bring profit to low-efficiency fuzzing process. We will discuss how to make parallelization in this section.
In fact, putting a testing job on a multi-core machine or a distributed system but running it on a single core underutilizes the hardware. Parallelizing the job makes full use of the hardware and benefits the otherwise inefficient fuzzing process. We discuss several ways to parallelize fuzzing in this section.
\subsubsection{A Naïve approach}
\begin{algorithm}
\begin{algorithmic} %每行显示行号
\caption{123}
\State $int i = 0$
\State $scanf("\%d",\&i)$
\If{$i > 0$}
\If{$i<100000 \&\& i>=1000$}
\If{$i>=5000 \&\& i<10000$}
\If{$i>=7000 \&\& i<8000$}
\If{$i>=7500 \&\& i<=7599$}
\If{$i>=7545 \&\& i<7571$}
\State $ printf("win")$
\EndIf
\EndIf
\EndIf
\EndIf
\EndIf
\EndIf
%\begin{algorithm}
%\begin{algorithmic} %每行显示行号
% \caption{123}
% \State $int i = 0$
% \State $scanf("\%d",\&i)$
% \If{$i > 0$}
% \If{$i<100000 \&\& i>=1000$}
% \If{$i>=5000 \&\& i<10000$}
% \If{$i>=7000 \&\& i<8000$}
% \If{$i>=7500 \&\& i<=7599$}
% \If{$i>=7545 \&\& i<7571$}
% \State $ printf("win")$
% \EndIf
% \EndIf
% \EndIf
% \EndIf
% \EndIf
% \EndIf
\end{algorithmic}
\end{algorithm}
A naïve approach to improve the efficiency of fuzzing is starting a group of fuzzers. However, the approach fails to produce a satisfying result and occupy resources,
we take a simple experiment to show this case. As is shown in algorithm.1 the simple program contains 6 levels branches.
If we only run a single AFL engine, it reaches the “win” location in 2 minutes and 34s. Compared with the single engine, we run two engines in two cores simultaneously, the time of reaching “win” location keeps at the same level.
Also, we respectively run an AFL engine and two AFL engines in two cores on a target program “uniq” in lava-m, and compare their bitmap density. In one hour, a single engine and two engines get same results. Both of them hit 214 new branches, which are shown in bitmaps.
%\end{algorithmic}
%\end{algorithm}
A naïve approach to improving fuzzing efficiency is to start a group of fuzzers together on a set of computing nodes. However, this approach occupies resources without producing a satisfying result, because the nodes do not share seeds or feedback information with one another. The nodes are independent and therefore do not form a parallel fuzzing system.
\subsubsection{Previous parallel fuzzing application}
\subsubsection{Previous parallel fuzzing applications}
There are two fuzzing tools extent the parallel function in AFL.
One is Roving, which is implemented by running multiple copies of AFL on multiple machines in a cluster, all of them fuzzing the same target. It benefits from the client-server structure which shares crashes, hangs, and queues of each client. Every 300 seconds, the client update the fuzzing environment by uploading and downloading changes. The whole framework is scheduled by the central server.
Two fuzzing tools extend the parallel function of AFL. One is Roving\cite{b14}, which runs multiple copies of AFL on multiple machines in a cluster, all fuzzing the same target. It uses a client-server structure that shares the crashes, hangs, and queues of each client. Every 300 seconds, each client updates its fuzzing environment by uploading and downloading changes. The whole framework is scheduled by the central server.
The other is distributed fuzzing. The main work of distributed fuzzing is similar to roving, and the difference of them is implementation. the sharing data is handled by PHP scripts in fuzzing server. All of the target projects are stored in a server. Clients download 1 or more projects(according to CPU cores) and AFL fuzzes program from the server.
New queue and hangs and crashes produced by a client will be synchronized to the server and downloaded by other clients in fixed time gap.
The other is distributed fuzzing\cite{b15}. Its main work is similar to Roving, and the two differ in implementation. The shared data is handled by PHP scripts on the fuzzing server. All of the target projects are stored on a server. Clients download one or more projects (according to the number of CPU cores) from the server and fuzz them with AFL. New queue entries, hangs, and crashes produced by a client are synchronized to the server and downloaded by other clients at a fixed time interval.
Although the two frameworks utilize the computing resources and parallel the fuzzing progresses, which makes each client benefits from each other's work, they have drawbacks as below.
Although the two frameworks utilize the computing resources and parallelize the fuzzing process, so that each client benefits from the others' work, they have the following drawbacks.
\begin{itemize}
\item As time goes by, the growing number of seeds, queues, crashes and hangs makes the synchronization slower and slower.
\item This kind of sharing mechanism makes all of the clients always fuzzing the same seeds.
\item The server accepts all data from clients updated, which will result in security problems.
\item All of the clients are always fuzzing the same set of seeds.
\item They only parallelize the non-deterministic mutation part of AFL and fail to parallelize the deterministic mutation part.
\item They synchronize the shared data in a fixed time period.
\item They do not share the feedback information.
\end{itemize}
\subsection{Concurrency and data race}
In parallelization computing, some uncontrolled accesses to shared data happen simultaneously, which results in race conditions. Data races are race conditions occur at memory access level, which are the most common causes of the concurrency errors and bugs.
\subsection{Data contending}
In parallel computing, uncontrolled accesses to shared data can happen simultaneously, which results in contending conditions. Data contending occurs at the memory access level and is the most common cause of concurrency errors and bugs\cite{b22}.
In this paper, we focus on the contending in parallel fuzzing. The key to handling this problem is to manage the shared objects appropriately.
We list some typical contending cases:
\subsubsection{Several client nodes access the same seed}
Accessing the same seed during the deterministic mutation phase produces massive repeated fuzzing, which wastes resources.
\subsubsection{Several client nodes update the bitmap in the database together}
The updated bitmaps differ in the locations that have changed. When they are merged into the bitmap in the database, a later update may overwrite valuable bits that other clients have updated before.
To solve such contending cases, we propose a contending strategy which is shown in section \ref{algorithm}.
A data race occurs when two actions are accessing the same memory, and at least one of the two accesses is a write, and the sequence of accesses is not assured by synchronization primitives.
To prevent memory access from data race, some methods such as lock, semaphore, and mutex are adopted. The database always adopts transactions to solve some level of data races after recovery from a crash to maintain the atomicity, consistency, isolation, and durability.
\section{methodology}\label{alogrithm}
To improve the fuzzing speed and make full use of computing resources, we design p-fuzz, which is a parallel fuzzing framework.
\subsection{The algorithm of balancing workloads}
\section{methodology}\label{algorithm}
To improve the fuzzing speed and make full use of computing resources, we design a parallel fuzzing framework P-fuzz.
\subsection{The mechanism of synchronizing fuzzing status and balancing workloads}
\begin{figure}[htbp]
\centering
\subfigure[The example of distributing workloads by previous works ]{\includegraphics[width=4cm,height=3.8cm]{figcarryboxa.png}}
\subfigure[The example of distributing workloads by p-fuzz]{\includegraphics[width=4cm,height=3.8cm]{figcarryboxb.png}}
\caption{Example of distributing workloads}
\subfigure[The example of distributing workloads by P-fuzz]{\includegraphics[width=4cm,height=3.8cm]{figcarryboxb.png}}
\caption{Examples of balancing workloads}
\label{figcarrybox}
\end{figure}
Hardware resources and fuzzing tasks are two entities of parallel fuzzing. And the most important work is to distribute fuzzing tasks to hardware resources appropriately.
previous studies show us two drawbacks in tackling this work:
Previous studies show us two drawbacks in tackling this work:
\begin{itemize}
\item Underutilizing the hardware resources, which burdens the single core with many fuzzing tasks.
\item Sharing all information including(seeds, queues, crashes and hangs) with each of client, which may results all computing cores do repeated work and doesn’t fully reflect the advantages of parallel. This case is depicted in Fig.\ref{figcarrybox}(a).
\item Underutilizing the hardware resources burdens the single core with many fuzzing tasks.
\item Sharing all information (including seeds, queues, crashes and hangs) with every client without distributing it, which may cause all computing nodes to do repeated work and fails to exploit the advantages of parallelism. This case is depicted in Fig.\ref{figcarrybox}(a).
\end{itemize}
To make full use of hardware resources and enhance fuzzing efficiency, we schedule the fuzzing tasks to balance workload with the help of Database-centric architecture as Fig.\ref{figcarrybox}(b). Database-centric architecture put a database as a core of the whole system, and other hardware resources act as clients to communicate with the database.
The p-fuzz framework is designed based on the database-centric architecture as shown in Fig.\ref{figoverview}.
We deploy a server with a database to communicate with other hardware resources. Also, we mark the sharing records with flags and time stamps in the database, to differentiate whether this record is occupied by a client.
Furthermore, we start services to monitor the server which can not only schedule the fuzzing tasks, but also solve the race problem from parallelization.
In this way, all of the hardware resources get different seeds and do different tasks in the scheduling of p-fuzz mechanism.
To make full use of hardware resources and enhance the fuzzing efficiency, we schedule the fuzzing tasks to balance the workload and let each node fuzz different seeds (Fig.\ref{figcarrybox}(b)). We leverage a database-centric\cite{b23} architecture to schedule the workloads and synchronize the fuzzing status. A database-centric architecture puts a database at the core of the whole system, and the other hardware resources act as clients that communicate with the database.
The overview of P-fuzz framework is shown in Fig.\ref{figoverview}.
We deploy a server with a database that communicates with the other hardware resources, which act as client nodes. We share the seeds and the bitmap in the database. We also mark the shared seeds with flags and timestamps in the database to indicate whether a seed has been occupied by a client.
Furthermore, we start a service to monitor the server, which not only schedules the fuzzing tasks but also solves the parallel contending problem.
In this way, all of the client nodes get different seeds and do different tasks in the scheduling mechanism of P-fuzz.
\begin{figure}[htbp]
\centerline{\includegraphics[width=8cm,height=6cm]{figoverview.png}}
\caption{the framework overview of p-fuzz}
\caption{the framework overview of P-fuzz}
\label{figoverview}
\end{figure}
\subsection{Immediate response to update}
Different from roving and distributed fuzzing which synchronizes the sharing data in a fixed time gap, p-fuzz updates the new seeds and bitmap data to the database when AFL produces them.
Unlike Roving and distributed fuzzing, which synchronize the shared data at a fixed time interval (such as 300 s in Roving), P-fuzz uploads new seeds and bitmap data to the database as soon as AFL produces them.
When a test case triggers a new interesting path in the fuzzing process, the test case will be uploaded as a record in the database.
Also, the bitmap stored in the database will be updated in time when a client find of a new path.
When a test case triggers a new interesting branch in the fuzzing process, the test case will be uploaded as a record in the database.
Also, the bitmap stored in the database is updated as soon as a client node hits a new path. This prompt action lets all client nodes in the system obtain the updated seeds and the feedback information immediately.
It’s a prompt action to make all clients in the system get the information immediately.
\subsection{Contending handling strategy}
To solve the contending in parallel fuzzing, we design the following three strategies.
\subsubsection{Flag: several client nodes access the same seed}
\subsection{Data race handling}
\subsubsection{Flag:two clients want to access the same seed simultaneously.}
As mentioned above, P-fuzz shares the seeds that each AFL instance produces by storing them in a database. It schedules different client nodes to access different seeds to enhance fuzzing efficiency. However, when several client nodes access a seed simultaneously, data contending happens, and fuzzing the same seed repeatedly produces similar results.
As above mentioned, we sharing seeds each AFL produced by storing them into a database. Different clients access to different seeds to enhance the fuzzing efficiency. However, when two clients access to a seed simultaneously, a data race happen. The same seed fuzzed repeatedly will produce a similar result.
To alleviate this case, we attach a flag attribute to each seed. The flag marks whether the seed is being fuzzed by a client: it is “0” when the seed is free and “1” when the seed has been occupied. A client node chooses seeds by checking the flag. If the flag of a seed is “1”, the client chooses another seed to fuzz.
To alleviate this case, we set a flag with seeds, which marks whether this seed is fuzzing by a client. If the flag is “1”, the client will choose other seeds to fuzz.
\subsubsection{Service: several client nodes update the bitmap in the database together}
\subsubsection{Service:some clients update bitmap in the database simultaneously}
We store the bitmap as a record in the database for sharing. A data contending happens as shown in Fig.\ref{figbitmap}. In the bitmaps, a bit of “1” or “0” indicates that the corresponding branch is uncovered or covered, respectively. Two client nodes update their new bitmaps (Fig.\ref{figbitmap}(b)(c)). If we do not control the merging process and simply apply client1's bitmap and then client2's bitmap, information gets lost, as in Fig.\ref{figbitmap}(d).
We store the bitmap as a record in the database for sharing. A data race happen as shown in Fig.\ref{figbitmap}. some bitmaps with “1” or “0” represent the path uncovered or covered. There are two clients updating their new bitmaps together(Fig.\ref{figbitmap}(b)(c)). If we do not control the updating process, information will get lost which is shown in Fig.\ref{figbitmap}(d).
To alleviate this case, we start a service on the server to manage the merge operation. The service builds a queue to store the bitmaps temporarily. When bitmaps from different clients arrive at the database, they are enqueued in order of arrival time. The database merges the enqueued bitmaps one by one with the “AND” (\&) operation so that the shared bitmap retains all the necessary information.
To alleviate this case, we start a service in the server to maintain the sequence of updating. The service builds a queue to store the bitmap temporarily. When bitmaps come up together, they are enqueued according to the time order. The database merges these bitmaps in the queue one by one so that the bitmap maintains all necessary information.
\begin{figure}[htbp]
\centering
\subfigure[the origin bitmap]{\includegraphics[width=2cm,height=1.7cm]{figbit1.png}}
\subfigure[the bitmap updated from client1]{\includegraphics[width=2cm,height=1.7cm]{figbit2.png}}
\subfigure[the bitmap updated from client2]{\includegraphics[width=2cm,height=1.7cm]{figbit3.png}}
\subfigure[the final bitmap]{\includegraphics[width=2cm,height=1.7cm]{figbit4.png}}
\caption{race of updating bitmap from different clients}
\subfigure[the final bitmap(a "0" bit is missing)]{\includegraphics[width=2cm,height=1.7cm]{figbit4.png}}
\caption{contending of updating bitmap from different clients}
\label{figbitmap}
\end{figure}
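The queued merge can be illustrated with a minimal sketch. The function name \texttt{merge\_bitmaps}, the 4-bit toy values, and the use of an in-memory queue instead of the database service are assumptions made for illustration only; they are not taken from the P-fuzz implementation or from the bitmaps drawn in Fig.\ref{figbitmap}.

```python
from collections import deque


def merge_bitmaps(shared: bytes, pending: deque) -> bytes:
    """AND every queued client bitmap into the shared bitmap, oldest first."""
    merged = bytearray(shared)
    while pending:
        _arrival_time, client_bitmap = pending.popleft()
        if len(client_bitmap) != len(merged):
            raise ValueError("bitmap sizes differ")
        for i, byte in enumerate(client_bitmap):
            # A bit cleared (covered) by any client stays cleared.
            merged[i] &= byte
    return bytes(merged)


# Toy 4-bit example (hypothetical values): 1 = uncovered, 0 = covered.
origin = bytes([0b1111])
queue = deque()
queue.append((1.0, bytes([0b1011])))  # client1 covers one branch
queue.append((2.0, bytes([0b1101])))  # client2 covers another branch
merged = merge_bitmaps(origin, queue)
print(format(merged[0], "04b"))
```

In this sketch, overwriting the shared bitmap with the last arrival would keep only client2's bits, whereas the AND merge keeps the branches covered by both clients.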
\subsubsection{Time stamp:a client quits fuzzing accidently but doesn’t finish a complete fuzzing round}
\subsubsection{Timestamp: a client quits fuzzing accidentally without finishing a complete fuzzing round}
As mentioned above, we set a flag to mark whether a seed is occupied by a client. However, in parallel computing, a client sometimes quits because of errors or other accidents. In that case, the flag is “1” but the fuzzing process of the corresponding seed is not finished.
To solve this problem, we put a time stamp when the flag is set to “1”. We also monitor if the fuzzing is overtime by the current time minus the time stamp. This mechanism assures exceptions won’t disturb the parallel fuzzing.
To solve this problem, we record a timestamp when the flag is set to “1”. We also use the timestamp to monitor whether the fuzzing has timed out. This mechanism ensures that exceptions do not disturb the parallel fuzzing.
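The flag and timestamp strategies can be sketched together as follows. This is a simplified in-memory model: the function \texttt{claim\_seed}, the field names, and the timeout value are hypothetical, since the paper does not specify them.

```python
import time

FUZZ_TIMEOUT_S = 3600.0  # hypothetical timeout; the paper gives no value


def claim_seed(seeds, now=None, timeout=FUZZ_TIMEOUT_S):
    """Return the key of a seed this client may fuzz, or None if all are busy."""
    now = time.time() if now is None else now
    for key, record in seeds.items():
        free = record["flag"] == 0
        # An occupied seed whose owner has been silent too long is reclaimed.
        stale = record["flag"] == 1 and now - record["timestamp"] > timeout
        if free or stale:
            record["flag"] = 1
            record["timestamp"] = now
            return key
    return None


seeds = {
    "a": {"flag": 1, "timestamp": 0.0},    # claimed long ago; its client quit
    "b": {"flag": 1, "timestamp": 950.0},  # claimed recently, still running
    "c": {"flag": 0, "timestamp": 0.0},    # free
}
first = claim_seed(seeds, now=1000.0, timeout=100.0)
second = claim_seed(seeds, now=1000.0, timeout=100.0)
third = claim_seed(seeds, now=1000.0, timeout=100.0)
```

In a real deployment the check and the update of the flag would have to be one atomic database operation; otherwise two clients could still claim the same seed between the read and the write.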
\subsection{The selection of mutation strategies }
According to the introduction in section \ref{background}, non-deterministic and deterministic mutation strategies do well in different targets. Therefore, P-fuzz adopts both of them to fuzz.
For most of the target programs, we set P-fuzz to do non-deterministic mutation to cover more branches and keep the efficiency of parallel fuzzing.
For target programs that are format-aware, we set P-fuzz to perform deterministic mutations first to keep the format of the files, and then non-deterministic mutations.
\section{implementation}\label{implementation}
\subsection{Workflow}
The workflow of p-fuzz is shown below:
The workflow of P-fuzz is shown below:
\begin{itemize}
\item Setting up the database in the server machine
\item Configuring the services
\item Starting afl in each clients
\item Getting the results
\item Setting up and configuring the database in the server
\item Configuring the services in the server
\item Building a fuzzing environment in Docker
\item Copying the environment to all client nodes
\item Starting AFL in each client node
\item Each client node updates new seeds and changed bitmap during fuzzing
\item Getting the results from the server
\end{itemize}
\subsection{Server}
The server machine is the core of the whole system. We deploy a MongoDB database on the server to store the sharing data.
\begin{figure}[htbp]
\centering
\subfigure[the seed collection in the database]{\includegraphics[width=8cm,height=1cm]{figdbseed.png}}
\subfigure[the bitmap collection in the database]{\includegraphics[width=8cm,height=0.8cm]{figdbbitmap.png}}
\subfigure[the seed collection in the database]{\includegraphics[width=4.4cm,height=3.5cm]{figseedcollection.png}}
\subfigure[the bitmap collection in the database]{\includegraphics[width=4.3cm,height=2.1cm]{figbitmapcollection.png}}
\caption{the two collections of database}
\label{figdatabase}
\end{figure}
\subsubsection{MongoDB}
MongoDB is an open-source document database, which with high performance, high availability and automatic scaling.
MongoDB\cite{b32} is an open-source NoSQL document database that provides high performance, high availability and automatic scaling.
As shown in Fig.\ref{figdatabase}(a)(b), We set two collections in the database. One is “seed”, the first row is the key to the collection which is a hash value of file name , the second row records the content of seed, the third row is flag to mark whether the seed is being fuzzed, and the last row is timestamp, which is used to mark when the fuzzing start.
As shown in Fig.\ref{figdatabase}(a)(b), we set two types of collections in the database. One is “seed”. To avoid storing seeds with the same content sent by different clients, the first attribute is a hash value of the seed content. The second attribute records the seed content. The third attribute is a flag that marks whether the seed is being fuzzed, and the last attribute is a timestamp that records when fuzzing of the seed started.
The other type of collection is “bitmap”. The whole database contains only one bitmap collection, because all nodes need to share this bitmap to acquire the coverage information of the whole system. The first attribute in this collection is the shared bitmap, and the second attribute is a timestamp that records the latest update time of the bitmap.
The other is “bitmap”, the first row is the key of the collection is a hash value of file name, the second row records the sharing bitmap.
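The seed collection described above can be sketched as plain records keyed by a content hash. The field names, the choice of SHA-1, and the dictionary standing in for the MongoDB collection are assumptions for illustration; the actual schema is the one shown in Fig.\ref{figdatabase}.

```python
import hashlib


def make_seed_record(content: bytes) -> dict:
    """Build a seed record with the four attributes described in the text."""
    return {
        "_id": hashlib.sha1(content).hexdigest(),  # hash of the seed content
        "content": content,                        # the seed itself
        "flag": 0,                                 # 0 = free, 1 = being fuzzed
        "timestamp": None,                         # set when fuzzing starts
    }


def upload_seed(collection: dict, content: bytes) -> bool:
    """Store a seed unless a seed with identical content already exists."""
    record = make_seed_record(content)
    if record["_id"] in collection:
        return False  # duplicate content, nothing stored
    collection[record["_id"]] = record
    return True


db = {}
stored_first = upload_seed(db, b"AAAA")
stored_again = upload_seed(db, b"AAAA")  # same content from another client
```

Keying the record by the content hash is what lets the server reject a seed that a second client uploads with identical content.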
\subsubsection{Service}
As shown in Fig.\ref{figservice}, we start a service in the server to maintain the sequence of updating. The service binding with server keeps running and records the time of each bitmap coming. When bitmaps from several clients are sent to the server together, the service put them into a queue according to the time order. The queue provides bitmaps to the database to merge them into latest state continuously.
As we discuss in section \ref{algorithm}, when several client nodes update the bitmap together, some bits in the bitmap may get lost. To solve this challenge, we start a service on the server to maintain the sequence of updates. The service is bound to the server. It keeps running to enqueue bitmaps from client nodes and merge them one by one. The merge operation applies “\emph{AND}” (\&) to the bitmaps in order of arrival time. This operation ensures that there is only one bitmap to share with the client nodes.
\begin{figure}[htbp]
\centerline{\includegraphics[width=8cm,height=3cm]{figservice.png}}
\caption{the workflow of service}
\label{figservice}
\end{figure}
\subsection{Client}
We choose several computers in a local area network as clients. We deploy p-fuzz client, which is a parallel fuzzing AFL version on each client, and put the same program to being fuzzed. At the start of fuzzing, each client downloads a seed from the central database. When the fuzzing engine finds some interesting paths, it updates these new seeds to the central database. Furthermore, the write and read in fuzzing is conducted by updating or downloading records to or from the central database. We achieve sharing data in parallel fuzzing by this mechanism
We choose several computers in a local area network as client nodes. To build a fuzzing environment, we utilize Docker. We deploy and configure the AFL engine with the help of Docker, then we duplicate the fuzzing environment to all other nodes in the distributed system.
\section{experiment}
\subsection{experiment setup}
\subsection{comparison in crashes and branches}
\subsection{comparison in speed}
At the start of fuzzing, each client downloads a seed from the central database. When the fuzzing engine finds interesting paths, it uploads the new seeds to the central database. Furthermore, the reads and writes in fuzzing are conducted by downloading records from or uploading records to the central database. This mechanism is how we share data in parallel fuzzing.
\section{discussion}
\subsection{limitation}
\subsection{future work}
\section{conclusion}
\begin{table}[htbp]
\begin{table*}[htbp]
\caption{Table Type Styles}
\begin{center}
\begin{tabular}{|c|c|c|c|}
\begin{center}\label{tabresult}
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline
\textbf{Target}&\multicolumn{4}{|c|}{AFL}
& \multicolumn{4}{|c|}{Roving}
& \multicolumn{4}{|c|}{P-fuzz} \\
\cline{2-13}
\textbf{ } & \textbf{\textit{density}}
& \textbf{\textit{paths}}
& \textbf{\textit{crashes}}
& \textbf{\textit{speed}}
& \textbf{\textit{density}}
& \textbf{\textit{paths}}
& \textbf{\textit{crashes}}
& \textbf{\textit{speed}}
& \textbf{\textit{density}}
& \textbf{\textit{paths}}
& \textbf{\textit{crashes}}
& \textbf{\textit{speed}}
\\
\hline
nm
& 0.58
& 389
& 0
& 368
& 6.57
& 2967
& 0
& 9928
& 6.76
& 5091
& 0
& 7351
\\
\hline
strings
& 0.16
& 64
& 0
& 1022
& 0.16
& 395
& 0
& 4521
& 0.16
& 143
& 0
& 5011
\\
\hline
objdump
& 8.48
& 1923
& 0
& 630
& 10.75
& 3777
& 0
& 12352
& 12.49
& 7283
& 0
& 5612
\\
\hline
size
& 3.54
& 605
& 0
& 2648
& 6.64
& 2907
& 0
& 11204
& 6.15
& 8723
& 0
& 8051
\\
\hline
readelf
& 7.27
& 1747
& 0
& 1219
& 12.1
& 6446
& 0
& 6252
& 9.24
& 1034
& 0
& 7006
\\
\hline
tiffinfo
& 0.04
& 9
& 0
& 4001
& 0.04
& 10
& 0
& 15903
& 4.81
& 754
& 0
& 8603
\\
\hline
bmp2tiff
& 0.61
& 480
& 25
& 228
& 0.6
& 1826
& 26
& 2836
& 3.56
& 201
& 38
& 6488
\\
\hline
tcpdump
& 3.61
& 775
& 0
& 1321
& 11.2
& 4072
& 0
& 5472
& 36.77
& 17926
& 0
& 4812
\\
\hline
nasm
& 8.34
& 2531
& 0
& 1337
& 10.28
& 2528
& 0
& 3362
& 10.56
& 9152
& 0
& 2914
\\
\hline
base64
& 0.58
& 389
& 0
& 368
& 0.58
& 788
& 0
& 8345
& 1.15
& 678
& 2
& 5921
\\
\hline
md5sum
& 0.83
& 156
& 0
& 206
& 1.01
& 2701
& 0
& 4366
& 0.86
& 412
& 0
& 688
\\
\hline
uniq
& 0.36
& 57
& 0
& 783
& 0.36
& 188
& 1
& 3624
& 0.37
& 143
& 1
& 3200
\\
\hline
who
& 2.46
& 178
& 0
& 1023
& 2.47
& 1008
& 1
& 6200
& 3.21
& 532
& 8
& 7722
\\
\hline
\end{tabular}
\label{tab1}
\end{center}
\end{table}
\end{table*}
\section{experiment}
\subsection{Experiment setup}
To run P-fuzz in a parallel environment, we conduct the experiments on eight desktops, each with an Intel Core i7 3.4GHz 8-core CPU and 8GB RAM, running Ubuntu 16.04. To compare our framework with another parallel fuzzing framework, we divide the eight desktops into two groups. We choose five targets from GNU Binutils\cite{b17} (nm, objdump, readelf, size and strings), four from the LAVA-M data set\cite{b18} (base64, md5sum, uniq, and who), two image processing tools (bmp2tiff and tiffinfo), tcpdump, and nasm as our target programs. Thus, we have 13 target programs on which to conduct experiments.
We compare P-fuzz with AFL and a previous parallel fuzzing framework Roving for two hours. To prove the effectiveness and efficiency of P-fuzz, we record four indicators for each experiment:
\begin{itemize}
\item \textbf{Bitmap density.} This is an important indicator of code coverage for grey-box fuzzers. It is the ratio of the number of changed bytes in the bitmap to the total size of the bitmap.
\item \textbf{Crashes.} This is the number of unique crashes found when executing the programs. Crashes result from unique test cases that cause the tested program to receive a fatal signal (e.g., SIGSEGV, SIGILL, SIGABRT).
\item \textbf{Speed.} We measure the execution speed of each fuzzer in exe/s to demonstrate the fuzzing overhead. This indicator is the number of test cases executed per second.
\item \textbf{Paths.} This indicator counts the seeds in the queue.
\end{itemize}
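As a rough formalization of the bitmap density indicator (the symbols $N_{\mathrm{changed}}$ and $N_{\mathrm{total}}$ are notation introduced here for illustration), it can be written as
\begin{equation}
\mathit{density} = \frac{N_{\mathrm{changed}}}{N_{\mathrm{total}}} \times 100\%
\end{equation}
where $N_{\mathrm{changed}}$ is the number of changed bytes in the bitmap and $N_{\mathrm{total}}$ is the total size of the bitmap in bytes. For example, assuming AFL's default bitmap size of 64\,KB ($N_{\mathrm{total}} = 65536$), a density of 0.36\% corresponds to roughly 236 changed bytes.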
Before fuzzing the target programs, we compile them with AFL’s compiler, \emph{afl-gcc}, which instruments the target source code and produces the target binary files.
\subsection{Results}
First, we run a rapid 2-hour experiment with P-fuzz, AFL on a single node, and Roving to measure the above indicators on nine target programs and the LAVA-M benchmarks. The results are listed in Table \ref{tabresult}.
As shown in the table, P-fuzz covers more bits of the bitmap than AFL and Roving on most of the target programs. The map density of P-fuzz is 2.59X higher than AFL and 1.66X higher than Roving on average. In particular, on “tcpdump” the map density reaches 36.77\%, which is more than three times the map density of Roving.
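Taking the “tcpdump” row of Table \ref{tabresult} as a worked example, the ratios of the map density of P-fuzz to those of Roving and AFL are
\begin{equation}
\frac{36.77}{11.2} \approx 3.28, \qquad \frac{36.77}{3.61} \approx 10.19
\end{equation}
respectively.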
It is worth mentioning that on the two image processing tools “tiffinfo” and “bmp2tiff”, P-fuzz also shows its ability to handle format-aware programs by utilizing the deterministic mutation strategies. The map densities of AFL on “tiffinfo” and “bmp2tiff” are limited to 0.04\% and 0.61\%, and those of Roving to 0.04\% and 0.6\%, while P-fuzz reaches 4.81\% and 3.56\% respectively. However, the three frameworks obtain similar map densities on the LAVA-M benchmarks. The reason is that LAVA-M is a designed data set, on which parallel fuzzing without an improvement in the algorithm struggles to find more paths.
Moreover, it is hard to find crashes in a rapid experiment of only two hours. Thanks to its high efficiency, P-fuzz speeds up the whole fuzzing process and finds more crashes than AFL and Roving in such a short time. On “who”, P-fuzz triggers eight crashes while Roving triggers only one. Also, P-fuzz finds 38 crashes on “bmp2tiff”, more than AFL’s 25 crashes and Roving’s 26 crashes.
Because it parallelizes the fuzzing, P-fuzz easily gains an almost 4X speed-up. However, its average speed is a little lower than that of Roving. The reason is that Roving uses non-deterministic mutations throughout the fuzzing process, while P-fuzz combines the two mutation strategies.
To demonstrate the high efficiency of P-fuzz, we select the test data of “objdump”, shown in Fig. \ref{figline}. The figure shows the increase of map density during the first 1000 seconds of the experiment. P-fuzz surpasses AFL and Roving within 5 seconds and keeps increasing.
\begin{figure}[htbp]
\centerline{\includegraphics[width=8cm,height=5.4cm]{figresult.png}}
\caption{Test on objdump by AFL, Roving and P-fuzz}
\label{figline}
\end{figure}
\subsection{Analysis}
As shown in Table \ref{tabresult}, P-fuzz outperforms the other two frameworks. We analyze the reasons for its strengths below.
\subsubsection{P-fuzz vs. AFL single node}
\textbf{The fuzzing efficiency of P-fuzz outperforms AFL.}
AFL on a single node is the baseline of the experiments. The results show that P-fuzz outperforms AFL by applying the parallel computing technique. In the 2-hour fuzzing, P-fuzz covers more branches and produces more test seeds than AFL.
\subsubsection{P-fuzz vs. Roving}
\textbf{Roving does not share the feedback information of grey-box fuzzing.}
Roving shares the test cases, queues, crashes and hangs among the client nodes by synchronizing them with the server. However, the feedback information is also significant to fuzzing. P-fuzz uploads the bitmap as feedback information, so that the paths found by the whole framework are shared with each client node.
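As a hedged illustration of this kind of bitmap sharing (the notation is introduced here for illustration and is not taken from the P-fuzz implementation), let $B_g$ denote the shared bitmap on the server and $B_k$ the local bitmap of client node $k$, each with $M$ entries. Assuming a merge that keeps every entry any node has hit, an upload from node $k$ can be modelled as
\[
B_g[i] \leftarrow B_g[i] \lor B_k[i], \qquad 0 \le i < M,
\]
where $\lor$ is a bitwise OR. The map density is then the fraction of entries that have been hit,
\[
\rho = \frac{\left|\{\, i : B_g[i] \neq 0 \,\}\right|}{M}.
\]
Under this model, a seed adds nothing for the whole framework if its trace sets no bit that is still zero in $B_g$.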
\textbf{The mechanism of Roving takes up too much memory.}
The sharing mechanism of Roving synchronizes all the test cases produced by the four client nodes to the server, whether or not a test case duplicates another. However, when fuzzing target programs that contain a large number of paths, the Roving server shuts down because it cannot handle so many files. In contrast, P-fuzz only uploads test cases as records to the database, which saves a large amount of storage space compared with Roving.
\textbf{The mutation strategy of Roving is monotonous.}
Roving adopts only non-deterministic mutation to make parallel fuzzing more random and rapid. The execution speed of Roving is in fact much higher than that of P-fuzz. However, the benefits of deterministic mutation are discarded, which means that some complex programs are handled poorly by Roving.
\section{Discussion}
Although P-fuzz enhances the efficiency of AFL and outperforms the parallel fuzzing framework Roving, it has some limitations.
\begin{enumerate}
\item Some seeds are redundant because they change the same bits in the bitmap. These seeds are generated before the local bitmap has been updated. We will further improve P-fuzz to cut down such useless seeds. The higher the quality of the seeds, the easier it is to find crashes and vulnerabilities.
\item Sometimes it is not worthwhile to spend a large amount of hardware resources for a small improvement in efficiency. Our experiments show that organizing a set of machines to fuzz in parallel enhances efficiency. However, running these hardware resources consumes a large amount of energy. We should find a balance between the overhead of hardware resources and efficiency.
\end{enumerate}
Because parallel fuzzing optimization and algorithm improvement are orthogonal, we can apply an improved AFL algorithm or other techniques such as concolic execution\cite{b31} in P-fuzz to find more vulnerabilities. This is the direction of our future work.
\section{Conclusion}
In this paper, we focus on improving the efficiency of grey-box fuzzing. We leverage the parallel computing technique, which differs from current works that develop fuzzing algorithms. We design an approach that lets P-fuzz balance the workload by giving different client nodes different seeds. P-fuzz also shares the seeds and bitmap data with each node to synchronize the fuzzing status. Furthermore, P-fuzz handles several data contention cases in parallel fuzzing, such as access to the same seeds, bitmap update collisions, and accidents on client nodes. P-fuzz selects different mutation strategies based on the target it fuzzes.
We implement the parallel fuzzing framework P-fuzz by applying the database-centric architecture, which consists of a database server and several client nodes. We deploy MongoDB on the server to store shared seeds and bitmap. We use Docker to build the fuzzing environment and copy it to all the client nodes.
Finally, we conduct experiments that compare P-fuzz with AFL on a single node and with the parallel fuzzing framework Roving on nine target programs and the LAVA-M benchmarks. The experimental results show that P-fuzz improves the fuzzing ability, especially the efficiency, of the grey-box fuzzer AFL.
\section*{Related work}
\subsection{Fuzzing tools}
Fuzzing tools can be classified into three types based on the knowledge and information they acquire from the source code of the target program: white-box, black-box, and grey-box fuzzers.
\subsubsection{White-box fuzzing}
The white-box fuzzer has full knowledge of the source code (e.g., internal logic and structure) and uses the control structure of the procedural design to derive test cases. Current white-box fuzzing tools, such as Sage\cite{b3}, Angr\cite{b25} and KLEE\cite{b26}, commonly utilize symbolic execution. Symbolic execution abstracts the input values as symbols, which allows a symbolic engine to explore as many execution paths as possible at the same time, and then obtains satisfying inputs by solving the path constraints. However, symbolic execution suffers from state space explosion, which is still a bottleneck\cite{b31}.
\subsubsection{Black-box fuzzing}
The black-box fuzzer has no knowledge of the source code, but it generates test cases randomly and swiftly. Typical fuzzers such as Radamsa\cite{b27}, zzuf\cite{b28} and Peachfuzz\cite{b29} have done remarkable work in this field. Peachfuzz\cite{b29} is able to fuzz format-aware programs when provided with description files.
\subsubsection{Grey-box fuzzing}
The grey-box fuzzer tries to combine the efficiency and effectiveness of black-box and white-box fuzzers, and has limited knowledge of the internal workings of the target program. By collecting feedback information from the target program, grey-box fuzzers can mutate test cases with valid guidance. This is implemented by lightweight instrumentation or other mechanisms that obtain program execution feedback, such as code coverage, for the fuzzing process. American Fuzzy Lop (AFL)\cite{b6} is a state-of-the-art grey-box fuzzer whose principles are speed, reliability, and ease of use. AFL instruments the compiled program to obtain edge coverage information. B{\"o}hme et al. designed AFLFast\cite{b8}, which assigns more mutation energy to interesting paths. Gan et al. introduced CollAFL\cite{b9}, which mitigates path collisions by providing more accurate coverage information. B{\"o}hme et al. also implemented a directed grey-box fuzzing tool, AFLGo\cite{b10}, which steers fuzzing towards dangerous locations that tend to produce vulnerabilities. All of these extensions gained higher coverage and found more bugs than AFL. Zhang et al.\cite{b1} leverage a hardware mechanism (Intel Processor Trace) to collect branch information and feed this information back to the fuzzing process.
\subsection{Other fuzzing tools based on parallel technique}
Some previous works leverage parallel computing technology to speed up the fuzzing process. The technology pools a group of computing resources to decompose the heavy fuzzing task.
To enhance the efficiency of symbolic execution, Cloud9\cite{b5} splits the search space into pieces so that the computing nodes share the workload. Liang et al.\cite{b13} also address the challenge of path explosion by distributing results across different computing nodes; this method is similar to our mechanism of distributing seeds.
In parallel coverage-based grey-box fuzzing, more attention is paid to distributing the fuzzing test cases.
Test cases produced by mutation are allocated to each computer, which balances the system workload. Xie\cite{b12} used grid computing for large-scale fuzzing in 2010, which reduced fuzzing time by almost two-thirds. It was implemented by dividing fuzzing jobs into tasks, storing them on a server and scheduling remote clients to download them. Lian et al. proposed a dynamic resource-aware approach for parallel fuzzing. ClusterFuzz\cite{b30} is a scalable fuzzing infrastructure that supports coverage-based grey-box fuzzing (e.g., libFuzzer and AFL) and black-box fuzzing.
It is used by Google to fuzz the Chrome browser and serves as the fuzzing backend for OSS-Fuzz.
\begin{thebibliography}{00}
\bibitem{b1}
G.~Zhang, X.~Zhou, Y.~Luo, X.~Wu, and E.~Min, ``Ptfuzz: Guided fuzzing with
processor trace feedback,'' \emph{IEEE Access}, vol.~6, pp. 37\,302--37\,313,
2018.
\bibitem{b2}
M.~Sutton, A.~Greene, and P.~Amini, \emph{Fuzzing: brute force vulnerability
discovery}.\hskip 1em plus 0.5em minus 0.4em\relax Pearson Education, 2007.
\bibitem{b3}
P.~Godefroid, M.~Y. Levin, and D.~Molnar, ``Sage: whitebox fuzzing for security
testing,'' \emph{Communications of the ACM}, vol.~55, no.~3, pp. 40--44,
2012.
\bibitem{b4}
K.~Serebryany, ``Oss-fuzz-google's continuous fuzzing service for open source
software,'' 2017.
\bibitem{b5}
L.~Ciortea, C.~Zamfir, S.~Bucur, V.~Chipounov, and G.~Candea, ``Cloud9: A
software testing service,'' \emph{ACM SIGOPS Operating Systems Review},
vol.~43, no.~4, pp. 5--10, 2010.
\bibitem{b6} M.~Zalewski, ``American fuzzy lop,'' [Online]. Available: http://lcamtuf.coredump.cx/afl/
\bibitem{b7}
``Cve list,'' [Online]. Available: http://cve.mitre.org/.
\bibitem{b8}
M.~B{\"o}hme, ``Aflfast. new,'' 2017.
\bibitem{b9}
S.~Gan, C.~Zhang, X.~Qin, X.~Tu, K.~Li, Z.~Pei, and Z.~Chen, ``Collafl: Path
sensitive fuzzing,'' in \emph{2018 IEEE Symposium on Security and Privacy
(SP)}.\hskip 1em plus 0.5em minus 0.4em\relax IEEE, 2018, pp. 679--696.
\bibitem{b10}
M.~B{\"o}hme, V.-T. Pham, M.-D. Nguyen, and A.~Roychoudhury, ``Directed greybox
fuzzing,'' in \emph{Proceedings of the 2017 ACM SIGSAC Conference on Computer
and Communications Security}.\hskip 1em plus 0.5em minus 0.4em\relax ACM,
2017, pp. 2329--2344.
\bibitem{b11}
``Amazon spot instance,'' [Online]. Available: https://aws.amazon.com/ec2/spot/.
\bibitem{b12}
X.~Yan, ``Using grid computing for large scale fuzzing,'' Ph.D. dissertation,
Lisbon: Universidade Nova de Lisboa, 2010.
\bibitem{b13}
H.~LIANG, Y.~Xiaoyu, D.~Yu, P.~ZHANG, and L.~Shuchang, ``Parallel smart fuzzing
test,'' \emph{Journal of Tsinghua University (Science and Technology)},
vol.~54, no.~1, pp. 14--19, 2015.
\bibitem{b14}
``Roving,'' [Online]. Available:https://github.com/richo/Roving.
\bibitem{b15}
``Distributed fuzzing for afl,'' [Online]. Available: https://github.com/richo/Roving/.
\bibitem{b16}
``Docker,'' [Online]. Available: https://www.docker.com/.
\bibitem{b17}
``GNU Binutils,'' [Online]. Available:http://www.gnu.org/software/binutils/
\bibitem{b18}
B.~Dolan-Gavitt, P.~Hulin, E.~Kirda, T.~Leek, A.~Mambretti, W.~Robertson,
F.~Ulrich, and R.~Whelan, ``Lava: Large-scale automated vulnerability
addition,'' in \emph{2016 IEEE Symposium on Security and Privacy (SP)}.\hskip
1em plus 0.5em minus 0.4em\relax IEEE, 2016, pp. 110--121.
\bibitem{b19}
K.~Lu, P.-F. Wang, G.~Li, and X.~Zhou, ``Untrusted hardware causes double-fetch
problems in the i/o memory,'' \emph{Journal of Computer Science and
Technology}, vol.~33, no.~3, pp. 587--602, 2018.
\bibitem{b20}
G.~V. Wilson, ``The history of the development of parallel computing,''
\emph{URL: http://ei. cs. vt. edu/history/Parallel. html}, 1994.
\bibitem{b21}
G.~S. Almasi and A.~Gottlieb, ``Highly parallel computing,'' 1988.
\bibitem{b22}
J.~Corbet, A.~Rubini, and G.~Kroah-Hartman, \emph{Linux Device Drivers: Where
the Kernel Meets the Hardware}.\hskip 1em plus 0.5em minus 0.4em\relax
O'Reilly Media, Inc., 2005.
\bibitem{b23}
``database-centric architecture,'' [Online]. Available:https://en.wikipedia.org/wiki/Database-centric\_architecture
\bibitem{b24}
``American Fuzzy Lop (AFL) Fuzzer-Technical Details,'' [Online]. Available: http://lcamtuf.coredump.cx/afl/technical\_details.txt
\bibitem{b25}
``Angr,'' [Online]. Available:https://angr.io/
\bibitem{b26}
C.~Cadar, D.~Dunbar, and D.~Engler, ``Klee: unassisted and automatic generation of high-coverage tests for complex systems programs,'' in \emph{Usenix Conference on Operating Systems Design \& Implementation}, 2009.
\bibitem{b27}
``radamsa,'' [Online].
Available: https://github.com/aoh/radamsa
\bibitem{b28}
``zzuf,'' [Online].
Available: http://caca.zoy.org/wiki/zzuf
\bibitem{b29}
``peach,'' [Online].
Available: https://www.peach.tech
\bibitem{b30}
``ClusterFuzz,'' [Online].
Available: https://google.github.io/clusterfuzz/
\bibitem{b31}
R.~Baldoni, E.~Coppa, D.~C. D'Elia, C.~Demetrescu, and I.~Finocchi, ``A survey
of symbolic execution techniques,'' \emph{ACM Computing Surveys}, vol.~51, no.~3, pp. 1--39, 2016.
\bibitem{b32}
``Mongodb'' [Online].
Available: https://www.mongodb.com/
\end{thebibliography}
\end{document}