1.INTRODUCTION
Recently, information security threats, including distributed denialofservice (DDoS) and malware, such as computer viruses, Trojan horses, spyware, and worms, have increased rapidly and caused serious damages to computer and communications systems. Several studies have analyzed the impact of information threats on system performance using rigorous mathematical models such as queueing models: Considering the SYN flooding attacks, Wang et al. (2007) take a queueing approach to analyze DDoS attacks against computer systems. In addition, Wang et al. (2010) deal with email systems under attacks against email accounts, such as password cracking, malicious emails and email bombs. They use a multiple queueing model to study the performance metrics of system security. Lee and Yang (2013) consider wireless sensor networks having unreliable network connections caused by external attacks and study the effective operation of the power consumption. Yu et al. (2014) apply the M/M/m queueing model to a resource allocation problem arising in a cloud environment vulnerable to DDoS attacks. Kondakci (2015) summarizes existing studies that have taken stochastic approaches to analyze the reliability of network security.
In this paper, based on a queueing analysis, we deal with the economic operation of a data backup scheme for computer systems under disastrous information threats. Damages from such information threats include data loss, which may be the most disastrous one to individuals and organizations. They suffer from computer viruses, for example, that wipes out all the files from their computers. One of the countermeasures to protect their data in a computer is to back them up to an external hardware. Among various countermeasures, such as antiviruses and firewalls, data backup is a basic method to ensure infor mation security against various threats.
Now, let us consider the following data backup scenario that frequently arises in computer systems dealing with files, data, and computer programs: A user stores her work in her computer, every time she completes her work. As soon as the amount of her completed work (the number of completed files, for example) is added up to a certain point, all her work is (all her files are) backed up in a batch to external hardware or a cloud server. This data backup process can be modeled by the twostage queueing model (Yang et al., 2016). In this model, the individual work corresponds to an individual service of the first stage and the batch work in backup is a batch service of the second stage. In addition to this model, we consider disasters in this paper to describe disastrous information threats. We assume that disasters occur stochastically to the system and wipe out all the work in the first stage as well as the second stage. The queueing model we consider in this paper, as a result, is a combination of the twostage queueing model and the disaster model.
In the literature, the external causes that flush out all the customers in queueing systems are called disasters and have been extensively researched (see, Lee et al., 2011; Lee and Yang, 2013). Disasters are also called queue flushing (Towsley and Tripathi, 1991), catastrophes (Chao, 1995; Kumar and Arivudainambi, 2000), mass exodus (Chen and Renshaw, 1997), and stochastic clearing systems (Artalejo and GomezCorral, 1998). In this paper, by disasters we mean all kinds of attacks that flush out all the customers in the system.
In order to analyze the proposed model mathematically, we use the continuoustime Markov chain. We make use of the matrixgeometric approach (Neuts, 1981), in particular, to obtain the steadystate probabilities. In addition, we investigate the costeffective operation, in which the optimal backup size is determined, with two types of costs taken into account: a recovery cost and a backup cost (see Section 4 for their definitions). Sample numerical examples are presented as well.
The contribution of this paper is threefold: First, we propose a mathematical model that describes the data backup process arising in computer systems under disastrous information threats. Second, we introduce a queueing model that is an interesting combination of the twostage queueing model and the disaster queueing model. Third, we provide a cost analysis for the economic operation of the data backup process, which would be useful for security management. To the best of our knowledge, this paper is the first attempt to study, based on queueing analysis, the data backup process under disastrous threats and analyze its economic operation.
The rest of this paper is organized as follows: We describe the queueing model in detail in Section 2 and mathematically analyze it in Section 3. Then we present the economic analysis together with numerical examples in Section 4. Finally, we conclude the paper in Section 5.
2.MODEL DESCRIPTION
The twostage queueing model that we analyze in this paper is described as follows: Jobs arrive at the system according to a Poisson process with rate λ. Jobs represent works, such as files or data requiring computer processing, which are assumed to be processed by the single server. In the first stage of this model, the server processes jobs individually on a firstcome firstserved (FCFS) basis. The processing time is assumed to be exponentially distributed with rate μ. After processing a job, the server passes the completed job to the second stage. Whenever the number of jobs in the second stage reaches a certain threshold, denoted by K, the server immediately moves to the second stage and processes all these K jobs simultaneously (that is, the server serves them in a batch of size K). Its processing time is assumed to be exponentially distributed with rate α. After finishing a batch service in the second stage, the server moves back to the first stage. In addition, we assume that disasters occur to the system according to a Poisson process with rate δ. Whenever disasters occur, all the jobs in the system, i.e., those in the first as well as in the second stage, are wiped out. Finally, it is assumed that the arrival processes of disasters as well as jobs, and the service times of jobs at the first as well as the second stages are independent of each other.
3.QUEUEING ANALYSIS
Let (n, m) denote the system state where n and m represent the numbers of jobs (including those in process) in the first and the second stages, respectively, for n = 0,1, 2, … and m = 0,1, …, K. Then, the stochastic process of the system state constitute a continuoustime Markov chain, defined by the transition rate matrix Q; arranging its states in the lexicographic order, we have(1)
The matrices in Q are all square matrices of order (K +1) and they are given as follows: First, the matrix A_{0} is given by A_{0} = λI, where I is an identity matrix of order (K +1). Next, the elements of A_{1} are given by

(A_{1})_{i, i} = − (λ + μ + δ) for i = 1, …, K,

(A_{1})_{K+1}, K_{+1} = − (λ + μ + δ),

(A_{1})_{K+1,1} = a
The elements of A_{2} are given by
The elements of B_{0} are given by

(B_{0})_{i, j} = δ for

i = 2, …, K,

(B_{0})_{K+1, 1} = δ + α,

(B_{0})_{1, 1} = − λ

(B_{0})_{i, j} = − (λ + δ) for

i = 2, …, K,

(B_{0})_{K+1, K+1} = (λ + δ + α) =
The elements of B_{1} are given by
Finally, the elements of C are given by
All the other elements of A_{1}, A_{2}, B_{0}, B_{1} and C that are not specified as given above, are set to be zero.
Now, let x_{(n, m)} denote the steadystate probability of the continuoustime Markov chain with its steadystate probability vector ${\text{x}}_{n}=\left({x}_{\left(n,\text{\hspace{0.17em}}0\right)},\text{\hspace{0.17em}}\cdots ,\text{\hspace{0.17em}}{x}_{\left(n,\text{\hspace{0.17em}}k\right)}\right)$ for n = 0,1, 2, …. Note that with δ > 0 , the system is ensured to be stable, so that the steadystate probabilities exist for any set of the system parameters. Then, making use of the matrixgeometric approach (Neuts, 1981), the steadystate probabilities can be obtained as follows: First, we have
The matrix R in (2) is a square matrix of order (K + 1) and satisfies the equation ${R}^{2}{A}_{2}+R{A}_{1}+{A}_{0}=0$. As a result, it can be computed iteratively as follows:
for k =1, 2, ….
Finally, x_{0} is obtained by solving the following equations simultaneously:
where e is a column vector with all of their (K +1) elements being 1.
Let N_{1}, N_{2}, and N denote the number of jobs in the first stage, in the second stage, and in the entire system in steady state, respectively. Then, the expected value of N_{1}, N_{2}, and N are obtained as follows:(3)(4)(5)
where
4.ECONOMIC ANALYSIS
We consider two types of costs in economic analysis: a recovery cost and a backup cost. The recovery cost represents the cost for recovering the jobs destroyed by disasters, and it is assumed to be charged to each of those destroyed jobs.
The backup cost, on the other hand, is assumed to consist of a fixed cost and a variable cost. The fixed backup cost represents the cost that is assumed to be incurred after each backup. The variable backup cost, in contrast, is assumed to be charged to each job that has been backed up. This variable backup cost represents the cost for purchasing an external hardware, renting a cloud facility, transmitting data to backup facility, and so on.
In this section, we consider the optimal value of K that minimizes the average total cost with both the recovery cost and the backup cost taken into account. To this end, we first define a Tcycle, in which a duration between two successive arrival points of disasters is denoted by T. Note that this Tcycle with E[T] = 1/δ is regenerative because the Markov chain starts over at the state (0, 0) upon every arrival of disasters. Note also that since the arrival process of disasters is Poisson, they see the time average of the number of jobs in the system (PASTA (Wolff, 1982)). Applying the renewal reward theorem (Wolff, 1989), we have the average number of jobs destroyed by disasters per unit time, denoted by L, as follows:(6)
Let c_{R} denote the recovery cost (charged to each job that is destroyed) and C_{R} (δ, K) denote the average recovery cost per unit time. Note that C_{R} (δ, K) is dependent, particularly on the disaster arrival rate δ and the batch size K, among others. Then, we have
In order to get the backup cost, we first let π = (π_{0}, π_{1}, …, π_{K}), where π_{j} for j = 0,1, …, K represents the steadystate probability of the number of jobs in the second stage. Then, from ${\pi}_{j}={\displaystyle {\sum}_{i=0}^{\infty}{x}_{\left(i,\text{\hspace{0.17em}}j\right)}}$ we have(8)
Next, we let U_{D} denote the expected number of (successful) backups per unit time. Then it is given by
(When a disaster arrives during the backup, recall that all the jobs in the middle of backup are assumed to be wiped out. Such a failed backup is not counted in calculation of U_{D} in (9).)
Now, we let c_{F} and c_{V} denote the fixed backup cost (incurred after each backup) and the variable backup cost (charged to each job that has been backed up), respectively. Also, we let C_{B}(δ, K) denote the average backup cost per unit time. Since exactly K jobs are backed up at every backup, we have
Next, we let C(δ, K) denote the average total cost with the given δ and K. Then, combining (7) and (10), we have(11)
Finally, the optimal value of K can be obtained from the following optimization problem:
Now, we investigate the characteristics of the cost minimization in (12) with sample numerical examples. To this end, we use the following parameters: λ = 0.99, μ =1, α = 5, 1, c_{R} = 1, c_{F} = and c_{V} =0.1. In addition, we consider five arrival rates of disasters, varying from δ = 0.001 to δ = 0.005.
Figure 1 gives the average total cost over the various backup sizes (K), in which the symbol “+” for each δ indicates the optimal backup size that minimizes the average total cost. From Figure 1, we have the following observations: First, it shows that the average total cost has a convex shape, such that an optimal backup size exists. Second, it shows that the optimal backup size increases, as disasters get less frequent. Third, it shows that as the disaster arrival rate gets higher, the economic batch size saves a considerable amount of the total cost (the convexity gets sharper). This saving decreases as disasters get less frequent.
Finally, let us consider a situation, where disaster arrival rate can be controlled as well. We consider such a situation, in which information security is reinforced, so that the disaster rate decreases from δ_{1} to δ_{2}, where δ_{1} > δ_{2}. Then the average cost saving per unit time due to this reinforcement is given by C(δ_{1}, K) − C(δ_{2}, K)
5.CONCLUSION
In this paper, we present a twostage queueing model with disasters that describes the data backup process arising in computer systems under disastrous information threats. Making use of the matrixgeometric approach (Neuts, 1981), we obtain its steadystate probabilities. From these results, we compute the average total cost with a recovery cost and a backup cost taken into account. With sample numerical results, we also investigate the optimal backup operation that minimizes the average total cost.
Recently, the damages caused by information security threats have increased rapidly with the explosive growth of data in the era of big data. Therefore, much attention has to be paid to the reinforcement of information security, which requires substantial investment in the workforce and equipment. We hope that the queueing model and its analysis presented in this paper would be beneficial to the information security investment and management for various computer systems that are vulnerable to disastrous information threats.