ISSN : 1598-7248 (Print)
ISSN : 2234-6473 (Online)
Industrial Engineering & Management Systems Vol.16 No.3 pp.420-426

Semiconductor Wafer Defect Classification Using Support Vector Machine with Weighted Dynamic Time Warping Kernel Function

Young-Seon Jeong*
Department of Industrial Engineering, Chonnam National University, Gwangju, Korea
Corresponding Author,
March 22, 2017 June 8, 2017 July 17, 2017


Semiconductor wafer maps provide vital information for monitoring and understanding quality issues in the underlying manufacturing process. After fabrication, each chip undergoes a series of quality checks to determine whether it is functional or defective. Because each defect pattern is unique, automatically characterizing the defect patterns in wafer maps can give process engineers significant insight for mitigating manufacturing defects and improving the effective yield rate. In this paper, we present a novel data mining and optimization-based supervised learning algorithm, called support vector machines with weighted dynamic time warping kernel (SVM-WDTWK), to classify defect patterns on semiconductor wafers. SVM-WDTWK provides a flexible and robust matching algorithm for time series classification, leading to an accurate match between non-aligned time series data. A numerical comparison shows that the proposed SVM-WDTWK algorithm outperforms several existing techniques for defect pattern classification on semiconductor wafer maps.



    The semiconductor industry is often characterized by several distinctive attributes, such as high product quality, short life cycles, reduced lead times, declining costs, market volatility, and increased device complexity (Arisha and Young, 2005). The processes involved in semiconductor manufacturing are complex, costly, and lengthy, comprising hundreds of sequential steps that build the functional circuitry of the chip (Huang, 2007; Wang, 2008). Maintaining process quality and effective process control is critical to improving effective yield rates. A wafer is the elementary unit in semiconductor manufacturing, and several hundred integrated circuits (ICs) are fabricated simultaneously on a single wafer (Fenner et al., 2005). After fabrication processes such as etching and deposition, each chip undergoes a series of functional quality checks and is classified as either functional or defective. The most important step in semiconductor process monitoring is collecting data on chip quality during the post-fabrication process. Semiconductor wafer maps, which graphically illustrate the locations of defective chips on a wafer, provide valuable information for this purpose. The information in a wafer map consists of binary codes, ‘1’ or ‘0’, which mark the locations of defective and functional chips on the wafer, respectively.

    Defective chips on the wafer map can occur in a random pattern or display systematic defect patterns such as edge ring, linear scratch, zone type, and mixed shapes (Wang et al., 2006; Hansen et al., 1997). Such defect patterns contain useful information for identifying the root causes of an out-of-control process (Cunningham and McKinnon, 1998). Factors such as uneven temperature exposure during thermal annealing or chemical aging can lead to various spatial clusters on the wafer map. Clusters can also result from crystalline nonuniformity, photo-mask misalignment, or particles due to electro-mechanical vibrations. Stepper and/or probe malfunctions and sawing imperfections can lead to repetitive patterns. Material shipping and handling can also leave a scratch on the wafer map (Cunningham and McKinnon, 1998; Hansen et al., 1997; Hansen and Thyregod, 1998; Taam and Hamada, 1993). Defect pattern recognition from wafer map information has traditionally been performed through visual inspection with the aid of a scanning electron microscope (SEM). This leads to a heavy reliance on the knowledge, domain expertise, and sound judgment of quality engineers. There is therefore a strong need to develop and use automatic defect detection and classification methods that exploit the enormous amount of data arising from the various steps of semiconductor manufacturing.

    Because the defect patterns on a wafer map contain important information for understanding the ongoing manufacturing process, several methods have been developed for the automatic classification of defect patterns (Chao and Tong, 2009; Chen and Liu, 2000; Jeong et al., 2008; Li and Huang, 2009; Liu et al., 2002; Wang et al., 2006; Wang, 2008; Jeong et al., 2012; Yuan and Kuo, 2008). In general, defects on semiconductor wafers tend to be clustered rather than uniformly distributed. Recently, Jeong et al. (2008) proposed a spatial correlogram-based classification methodology, which combines the K-nearest neighbor classifier with the dynamic time warping (DTW) distance measure for automatic defect pattern classification on semiconductor wafer maps. DTW defines the minimum distance between two time series by allowing a nonlinear mapping of one sequence onto the other.

    However, a drawback of conventional DTW is that all points in a sequence are matched with equal weight, so outliers can distort the minimum distance. To overcome this drawback, Jeong et al. (2011) developed a distance measure called weighted dynamic time warping (WDTW), which penalizes the distance between points on each sequence based on their phase difference. When defect patterns occur on wafer maps, defective chips are usually clustered at certain locations. Thus, when two spatial correlograms are compared for defect pattern classification, comparing their values at the same lag (or at neighboring lags) is more meaningful. In other words, when the distance between two points on different correlograms is calculated, it should be penalized according to the phase difference between the two points.

    Therefore, the main objective of this paper is to employ a technique in data mining and optimization to classify defect patterns on wafer maps. The proposed technique is based on the support vector machines with weighted dynamic time warping kernel (SVM-WDTWK), which provides a flexible and robust matching algorithm for time series classification. We evaluate and assess the performance of the proposed approach on a wafer dataset with four types of defect patterns.

    The rest of the paper is organized as follows. Section 2 introduces the basic concepts of the related methodology, namely the spatial correlogram and the dynamic time warping technique. Section 3 presents a novel classification technique, SVM-WDTWK, for automatic classification of defect patterns on wafer maps. The experimental results are presented in Section 4. Section 5 presents conclusions and some future research directions.


    2.1.Measure of Spatial Dependence

    Because wafer map data are binary, where ‘1’ indicates a defective chip and ‘0’ indicates a functional chip, the spatial dependence among chips can be measured using join-count statistics. A join is formed when two chips are neighbors of each other. The number of possible joins is given by

    $$ jc = \sum_{i<j} \delta_{ij}, \qquad \delta_{ij} = \begin{cases} 1, & (i, j) \in N \\ 0, & \text{elsewhere} \end{cases} \tag{1} $$

    where (i, j)∈N implies that two chips i and j are neighbors.

    Let $y_i$ be an indicator variable, where $y_i = 1$ means that chip i is defective and $y_i = 0$ means that it is functional. Using $y_i$, three types of joins can be counted as follows:

    $$ \begin{aligned} jc_{00} &= \sum_{i<j} \delta_{ij} (1 - y_i)(1 - y_j) \\ jc_{01} &= \sum_{i<j} \delta_{ij} (y_i - y_j)^2 \\ jc_{11} &= \sum_{i<j} \delta_{ij} \, y_i y_j \end{aligned} \tag{2} $$

    where $jc_{00}$ is the number of joins among neighbors that connect two functional chips, $jc_{01}$ is the number of joins among neighbors that connect a functional and a defective chip, and $jc_{11}$ is the number of joins among neighbors that connect two defective chips. By the definition of $jc_{00}$, $jc_{01}$, and $jc_{11}$,

    $$ jc = jc_{00} + jc_{01} + jc_{11} \tag{3} $$
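    As an illustration of Equations (1) to (3), the following minimal sketch counts the three types of joins on a small binary wafer map. It assumes that two chips are neighbors when they are horizontally or vertically adjacent (rook adjacency); the neighborhood N is not specified further in the text, and the function name `join_counts` and the 3-by-3 example map are hypothetical.

```python
def join_counts(wafer):
    """Return (jc, jc00, jc01, jc11) for a binary wafer map (1 = defective).

    Assumption: chips are neighbors when adjacent horizontally or
    vertically (rook adjacency).
    """
    rows, cols = len(wafer), len(wafer[0])
    jc = jc00 = jc01 = jc11 = 0
    for r in range(rows):
        for c in range(cols):
            # Look only right and down so that each join is counted once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    yi, yj = wafer[r][c], wafer[rr][cc]
                    jc += 1
                    jc00 += (1 - yi) * (1 - yj)  # both functional
                    jc01 += (yi - yj) ** 2       # one of each
                    jc11 += yi * yj              # both defective
    return jc, jc00, jc01, jc11


# Hypothetical 3-by-3 map with a small cluster in the upper-right corner.
wafer = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]
jc, jc00, jc01, jc11 = join_counts(wafer)
print(jc, jc00, jc01, jc11)
```

    On any binary map the three counts add up to the total number of joins, which is the identity stated in Equation (3).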

    Several existing techniques for identifying defect patterns on semiconductor wafer maps use spatial statistics (Cunningham and McKinnon, 1998; Hansen et al., 1997; Taam and Hamada, 1993). However, as pointed out by Hansen and Thyregod (1998), a single monitoring statistic is not sufficient to represent the wide variety of patterns that can appear across a wafer map. To overcome this drawback, Jeong et al. (2008) proposed a spatial correlogram, which is built from join-count statistics at multiple spatial lags, for the analysis of spatial defect patterns on semiconductor wafer maps. To create the spatial correlogram, they developed a generalized join-count-based statistic T(d) with dth-order neighbors as follows:

    $$ T(d) = p \, jc_{00}(d) + (1 - p) \, jc_{11}(d) \tag{4} $$

    where p is the defective rate. In addition, $jc_{00}(d)$ and $jc_{11}(d)$ are the numbers of dth-order neighbor joins among functional chips and among defective chips, respectively. The mean and variance of the statistic T(d) are given by

    $$ E[T(d)] = jc(d) \, p (1 - p), \qquad V[T(d)] = jc(d) \, p^2 (1 - p)^2 \tag{5} $$

    and the standardized statistic of T(d) can be approximated by a normal distribution as follows (Jeong et al., 2008):

    $$ Z_T(d) = \frac{T(d) - jc(d) \, p (1 - p)}{\sqrt{jc(d) \, p^2 (1 - p)^2}} \sim N(0, 1) \quad \text{as } jc(d) \to \infty \tag{6} $$

    where jc(d) = jc00(d) + jc11 (d) + jc01 (d).
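    The correlogram computation in Equations (4) to (6) can be sketched as follows. This is a rough illustration under two assumptions that the text does not state explicitly: two chips are treated as dth-order neighbors when their Manhattan distance equals d (which gives a maximum lag of 38 on a 20-by-20 map), and the defective rate p is estimated as the fraction of defective chips on the wafer itself. The function name `correlogram` and the 4-by-4 example map are hypothetical.

```python
import math


def correlogram(wafer, max_lag):
    """Return [Z_T(1), ..., Z_T(max_lag)] for a binary wafer map.

    Assumptions: dth-order neighbors are chip pairs at Manhattan
    distance d, and p is the defective fraction of this wafer.
    """
    rows, cols = len(wafer), len(wafer[0])
    chips = [(r, c, wafer[r][c]) for r in range(rows) for c in range(cols)]
    p = sum(y for _, _, y in chips) / len(chips)
    z_values = []
    for d in range(1, max_lag + 1):
        jc = jc00 = jc11 = 0
        for a in range(len(chips)):
            ra, ca, ya = chips[a]
            for b in range(a + 1, len(chips)):
                rb, cb, yb = chips[b]
                if abs(ra - rb) + abs(ca - cb) == d:
                    jc += 1
                    jc00 += (1 - ya) * (1 - yb)
                    jc11 += ya * yb
        t = p * jc00 + (1 - p) * jc11                # Equation (4)
        mean = jc * p * (1 - p)                      # Equation (5)
        var = jc * p ** 2 * (1 - p) ** 2
        # Guard against lags with no joins or a degenerate defective rate.
        z_values.append((t - mean) / math.sqrt(var) if var > 0 else 0.0)
    return z_values


# Hypothetical 4-by-4 map with a 2-by-2 cluster of defective chips.
wafer = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
z = correlogram(wafer, max_lag=6)
print([round(v, 3) for v in z])
```

    For this clustered map the statistic at lag 1 is clearly positive, reflecting the excess of like-with-like joins among adjacent chips.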

    2.2.Dynamic Time Warping

    Dynamic time warping (DTW), which has been popular in speech and signature recognition applications, finds an optimal match between two time series by allowing a nonlinear mapping of one sequence onto the other that minimizes the distance between the two sequences (Keogh and Ratanamahatana, 2005). DTW permits nonlinear alignments, whereas the Euclidean distance aligns points strictly one to one. Figure 1 illustrates the optimal warping path between two sequences determined by DTW.

    Suppose a sequence S of length m, $S = s_1, s_2, \ldots, s_i, \ldots, s_m$, and a sequence R of length n, $R = r_1, r_2, \ldots, r_j, \ldots, r_n$. We create an m-by-n path matrix whose (i, j)th element contains the distance between the two points $s_i$ and $r_j$, such as $d(s_i, r_j) = \| s_i - r_j \|_p$, where $\| \cdot \|_p$ denotes the $l_p$ norm. The best match between the two sequences is the path with the lowest cumulative distance aligning one sequence to the other. Therefore, the optimal warping path can be found by using the recursive formula given by

    $$ DTW_p(S, R) = \sqrt[p]{\gamma(i, j)} \tag{7} $$

    where $\gamma(i, j)$ is the cumulative distance described by

    $$ \gamma(i, j) = | s_i - r_j |^p + \min \{ \gamma(i-1, j-1), \; \gamma(i-1, j), \; \gamma(i, j-1) \} \tag{8} $$

    Thus, DTWp can be seen as the minimization of the lp distance under warping.
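    The recursion in Equations (7) and (8) can be written as a short dynamic program. The sketch below is a minimal, unconstrained version (no warping window), and the function name `dtw` and the example sequences are hypothetical.

```python
def dtw(s, r, p=2):
    """DTW_p distance between sequences s and r (Equations (7) and (8))."""
    m, n = len(s), len(r)
    inf = float("inf")
    # gamma[i][j] holds the cumulative distance of the best path that
    # aligns the first i points of s with the first j points of r.
    gamma = [[inf] * (n + 1) for _ in range(m + 1)]
    gamma[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(s[i - 1] - r[j - 1]) ** p
            gamma[i][j] = cost + min(
                gamma[i - 1][j - 1],  # advance in both sequences
                gamma[i - 1][j],      # advance in s only
                gamma[i][j - 1],      # advance in r only
            )
    return gamma[m][n] ** (1.0 / p)


# The second sequence repeats one value; warping absorbs the repetition.
print(dtw([1, 2, 3], [1, 2, 2, 3]))
print(dtw([0, 0], [1, 1]))
```

    The first call shows why DTW suits non-aligned data: the two sequences have different lengths, yet the warping path matches the repeated value at no cost.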

    In addition, as a new distance measure for time series classification, Jeong et al. (2011) proposed a penalty-based DTW, called weighted dynamic time warping (WDTW), which weights nearer neighbors more heavily depending on the phase difference between a reference point and a testing point. Because WDTW considers the relative importance of the phase difference between two points, it prevents a point in one sequence from being mapped to distant points in the other, and thereby prevents outliers from distorting the minimum distance.


    The goal of the support vector machine (SVM) classifier is to make the margin as large as possible while keeping the number of misclassified points as small as possible:

    $$ \begin{aligned} \text{Min.} \quad & \frac{1}{2} \| w \|^2 + C \sum_{i=1}^{n} \xi_i \\ \text{Subject to} \quad & y_i (w^T x_i + b) \ge 1 - \xi_i \\ & \xi_i \ge 0, \quad i = 1, 2, \ldots, n \end{aligned} \tag{9} $$

    where C (> 0) is the trade-off parameter between minimizing the training error and maximizing the margin, and the slack variables $\xi_i$ correspond to the size of the deviation of misclassified samples. By introducing Lagrange multipliers $\alpha_i$ and using the appropriate Karush-Kuhn-Tucker (KKT) conditions, the primal formulation of the optimization problem yields the following dual form:

    $$ \begin{aligned} \underset{\alpha_i}{\text{Max}} \quad & \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i^T x_j \\ \text{Subject to} \quad & \sum_{i=1}^{n} \alpha_i y_i = 0 \\ & w = \sum_{i=1}^{n} \alpha_i y_i x_i \\ & 0 \le \alpha_i \le C \end{aligned} \tag{10} $$

    For non-linear applications, SVM can replace the dot product of input vectors with an appropriate kernel function, $K(x_i, x_j)$. The key idea of kernel functions is to transform non-linear operations in the input space into linear operations in a higher-dimensional feature space. A kernel function can be considered a similarity measure in the input space (Scholkopf, 2000). For example, the most commonly used kernel function is the radial basis function (RBF) given by

    $$ K(u, v) = \exp \left( - \sum_{i=1}^{n} (u_i - v_i)^2 / \sigma^2 \right) \tag{11} $$

    The similarity of two samples under the RBF kernel can be interpreted in terms of their Euclidean distance. In other words, the standard SVM assumes that all samples have the same dimension and are aligned one to one. This property of standard kernels can be a critical drawback, especially for time series classification. To overcome this drawback, SVM with dynamic time warping kernel (SVM-DTWK) has recently been proposed for time series classification (Bahlmann et al., 2002; Lei and Sun, 2008; Shimodaira et al., 2001). In SVM-DTWK, the kernel function is modified as

    $$ K_{DTW}(u, v) = \exp \left( - D_p(u, v) / \sigma^2 \right) \tag{12} $$

    where $D_p(\cdot, \cdot)$ indicates the DTW distance with the $l_p$ norm between two sequences u and v. In this paper, the two sequences $u = (u_1, \ldots, u_{38})$ and $v = (v_1, \ldots, v_{38})$ are correlograms of length 38 generated from wafer maps. In the DTW kernel, time series data are “warped” nonlinearly to determine their similarity independent of any non-linear variations in the time dimension. However, standard DTW does not account for the relative importance of the phase difference between a reference point and a testing point. This may lead to misclassification, especially in applications where the shape similarity between two sequences is a major consideration for accurate recognition, so that neighboring points between two sequences are more important than distant ones. In other words, the relative significance of a match should depend on the phase difference between the points.

    Therefore, in this study, we present a support vector machine with weighted dynamic time warping kernel (SVM-WDTWK), which is based on the penalized DTW distance measure proposed by Jeong et al. (2011). The WDTW kernel can be expressed as

    $$ K_{WDTW}(u, v) = \exp \left( - WD_p(u, v) / \sigma^2 \right) \tag{13} $$

    where $WD_p(\cdot, \cdot)$ indicates the weighted DTW distance with the $l_p$ norm. In the WDTW distance, a different weight value is imposed depending on the phase difference $|i - j|$ between two points $u_i$ and $v_j$. Thus, the optimal distance between the two sequences is defined as the minimum over all possible paths as follows:

    $$ WD_p(u, v) = \sqrt[p]{\gamma_w(i, j)} \tag{14} $$

    where $\gamma_w(i, j)$ is the cumulative weighted distance described by

    $$ \gamma_w(i, j) = | w_{|i-j|} (u_i - v_j) |^p + \min \{ \gamma_w(i-1, j-1), \; \gamma_w(i-1, j), \; \gamma_w(i, j-1) \} \tag{15} $$

    where $w_{|i-j|}$ is a positive weight value between the two points $u_i$ and $v_j$.

    In addition, to systematically assign weights as a function of the phase difference between two points, we use a modified logistic weight function (MLWF) (Jeong et al., 2011), which is defined as

    $$ w(i) = \frac{w_{max}}{1 + \exp(-g \, (i - m_c))} \tag{16} $$

    where i = 1, ..., m, m is the length of a sequence, and $m_c$ is the midpoint of a sequence. $w_{max}$ is the desired upper bound for the weight parameter, and g is an empirical constant that controls the curvature (slope) of the function; that is, g controls the level of penalization for points with larger phase differences. For example, all weight values are the same when g = 0, the weight function follows a sigmoid pattern when g = 0.25, and the first half of the points receives one weight and the second half another when g = 3. Although several weight functions exist, a logistic weight function has shown better performance in diverse applications (Omitaomu, 2006). A logistic weight function assigns heavier weight to recent observations because recent data are more significant than past ones.
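    To make Equations (13) to (16) concrete, the following sketch combines the modified logistic weight function with the weighted DTW recursion and the resulting kernel value. It is an illustration rather than the author's implementation. It assumes that the weight for a pair of points is obtained by evaluating Equation (16) at the phase difference |i - j|, that $m_c$ is half the length of the longer sequence, and that no warping window is used. The function names `mlwf`, `wdtw`, and `wdtw_kernel` are hypothetical.

```python
import math


def mlwf(a, m, g, w_max=1.0):
    """Modified logistic weight (Equation (16)) for phase difference a.

    Assumption: the midpoint m_c is taken as m / 2.
    """
    m_c = m / 2.0
    return w_max / (1.0 + math.exp(-g * (a - m_c)))


def wdtw(u, v, g=0.25, p=2, w_max=1.0):
    """Weighted DTW distance WD_p(u, v) (Equations (14) and (15))."""
    m, n = len(u), len(v)
    length = max(m, n)
    # One weight per possible phase difference 0, 1, ..., length - 1.
    weights = [mlwf(a, length, g, w_max) for a in range(length)]
    inf = float("inf")
    gamma = [[inf] * (n + 1) for _ in range(m + 1)]
    gamma[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = weights[abs(i - j)]
            cost = abs(w * (u[i - 1] - v[j - 1])) ** p
            gamma[i][j] = cost + min(
                gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1]
            )
    return gamma[m][n] ** (1.0 / p)


def wdtw_kernel(u, v, sigma=1.0, g=0.25, p=2):
    """WDTW kernel value K_WDTW(u, v) (Equation (13))."""
    return math.exp(-wdtw(u, v, g=g, p=p) / sigma ** 2)


# Weights grow with the phase difference, so distant matches cost more.
print(mlwf(0, 38, 0.25), mlwf(19, 38, 0.25), mlwf(37, 38, 0.25))
print(wdtw([0, 0], [1, 1], g=0.0))
print(wdtw_kernel([0, 1, 2], [0, 1, 2]))
```

    With g = 0 every phase difference receives the same weight $w_{max}/2$, so the weighted distance reduces to a scaled version of standard DTW; larger values of g make matches between distant lags progressively more expensive.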

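    The dual formulation of SVM-WDTWK, obtained by introducing Lagrange multipliers $\beta_i$, is expressed by

    $$ \begin{aligned} \underset{\beta_i}{\text{Max}} \quad & \sum_{i=1}^{n} \beta_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \beta_i \beta_j y_i y_j \, K_{WDTW}(x_i, x_j) \\ \text{Subject to} \quad & \sum_{i=1}^{n} \beta_i y_i = 0 \\ & w = \sum_{i=1}^{n} \beta_i y_i \, \varphi(x_i) \\ & 0 \le \beta_i \le C \end{aligned} $$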

    Note that the same algorithms to solve standard SVM can be used to solve SVM-WDTWK as well.
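    One way to see why standard solvers carry over is that the dual problem depends on the training samples only through the kernel values $K(x_i, x_j)$. The sketch below builds that matrix of kernel values for an arbitrary distance function; such a matrix could then be passed to any SVM solver that accepts a precomputed kernel. The function name `gram_matrix` is hypothetical, and the sum of absolute differences used in the example is only a stand-in for the weighted DTW distance $WD_p$.

```python
import math


def gram_matrix(samples, distance, sigma=1.0):
    """Return the n-by-n matrix K[i][j] = exp(-distance(x_i, x_j) / sigma^2)."""
    n = len(samples)
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = math.exp(-distance(samples[i], samples[j]) / sigma ** 2)
            k[i][j] = value
            k[j][i] = value  # valid when the distance is symmetric
    return k


def stand_in_distance(u, v):
    """Placeholder distance (sum of absolute differences), not WDTW."""
    return sum(abs(a - b) for a, b in zip(u, v))


samples = [[0, 0], [1, 1], [0, 1]]
K = gram_matrix(samples, stand_in_distance)
for row in K:
    print([round(x, 4) for x in row])
```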


    For the experiments, we generated a total of 640 wafer maps of size 20 by 20 with four patterns: circle, cluster, repetition, and spot (160 wafer maps of each pattern). The generation procedure followed DeNicolao et al. (2003). In addition, we added eight levels of random noise: 0.05, 0.1, 0.15, …, 0.4. For example, Dataset {1} consisted of wafer maps with a noise level of 0.05, Dataset {2} of wafer maps with a noise level of 0.1, and so on. Figure 2 presents typical examples of the four classes of defect patterns and their corresponding spatial correlograms.

    Four-fold cross validation (CV) was used to compare the classification accuracy of the following procedures: one nearest neighbor classifier with Euclidean distance (1-NN-ED), one nearest neighbor classifier with DTW (1-NN-DTW), one nearest neighbor classifier with WDTW (1-NN-WDTW), SVM with Euclidean distance kernel (SVM-EDK), SVM with DTW kernel (SVM-DTWK), and SVM with WDTW kernel (SVM-WDTWK). All SVM parameters and the value of g in the weighting function were optimized using a validation dataset. Because there are more than two classes of defect patterns, this study used a multi-class SVM for wafer defect pattern recognition. Two approaches have been developed for applying SVM to multi-class classification problems: the one-against-all strategy, which classifies between each class and all the remaining classes, and the one-against-one strategy, which classifies between each pair of classes. See Chao and Tong (2009), Hsu and Lin (2002), and Li and Huang (2009) for details on applying SVMs to multi-class classification. In this study, we used the one-against-one strategy because it produced better (or comparable) performance in previous research (Hsu and Lin, 2002; Li and Huang, 2009). In addition, the parameter values for the weighting function MLWF were set as follows: wmin and wmax were set to 0 and 1, with m = 38 and mc = 19, because the maximum spatial lag of a 20 by 20-sized wafer map is 38.
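    The one-against-one strategy can be sketched as a simple voting scheme: with four pattern classes there are six pairwise classifiers, and the class that wins the most pairwise decisions is predicted. In the sketch below, the function name `one_against_one_predict` is hypothetical, and the toy `nearest_center` rule merely stands in for trained binary SVMs.

```python
from collections import Counter
from itertools import combinations


def one_against_one_predict(x, classes, pairwise_decide):
    """Predict the class of x by majority vote over all class pairs.

    pairwise_decide(a, b, x) must return either a or b; here it stands
    in for a binary SVM trained on classes a and b only.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(a, b, x)] += 1
    # Ties are broken in favor of the class that received a vote first.
    return votes.most_common(1)[0][0]


classes = ["circle", "cluster", "repetition", "spot"]

# Toy stand-in for the binary classifiers: each class has a hypothetical
# one-dimensional center, and the closer center wins the pairwise contest.
centers = {"circle": 0.0, "cluster": 1.0, "repetition": 2.0, "spot": 3.0}


def nearest_center(a, b, x):
    return a if abs(x - centers[a]) <= abs(x - centers[b]) else b


print(len(list(combinations(classes, 2))))
print(one_against_one_predict(2.2, classes, nearest_center))
```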

    Table 1 shows the accuracy of the six techniques, both for each fold of the four-fold CV datasets and on average. In this work, accuracy was calculated as follows:

    $$ \text{Accuracy} = \frac{(\text{total number of testing data}) - (\text{total number of wrongly classified data})}{(\text{total number of testing data})} $$

    The experimental results indicated that SVM-WDTWK outperformed the other methods, with an average accuracy of 93.1%. Comparing the NN classifiers with the SVM classifiers, the SVM classifiers yielded more consistent accuracy across folds. In addition, the DTW kernel-based SVM demonstrated better accuracy than the standard kernel-based SVM, showing that the DTW kernel is a promising method for defect pattern classification using spatial correlograms on semiconductor wafers. Because the standard deviation of each method was high, SVM-WDTWK could not be shown to be statistically preferable to SVM-EDK and SVM-DTWK, but the accuracy trend of the proposed method was promising in each dataset. Thus, the experimental results suggest that SVM-WDTWK is an effective algorithm for automatic defect classification on wafer maps using spatial correlograms.

    In addition, we illustrate why the proposed WDTW performs better than conventional DTW. Figure 3 presents a group of circle patterns and the corresponding spatial correlograms. In this figure, the X axis represents the spatial lag (d) and the Y axis indicates the corresponding statistic ZT(d), which was described in Section 2. As shown in Figure 3, the shapes of the spatial correlograms are similar but not exactly the same. Thus, to classify those wafers into the same class, comparing the statistic values at the same lag (or at neighboring lags) between two correlograms is more meaningful. In the proposed approach, the higher the d value, the more heavily points with a larger phase difference are penalized in determining the optimal weights.


    Defect patterns on semiconductor wafer maps have been used for monitoring process status in the semiconductor industry. However, defect detection in practice is still manual, error-prone, and based on subjective criteria, without any automated methods. There is a strong need to develop automated methodologies that can help process engineers quickly recognize process problems and track the root causes of an out-of-control process. We presented a novel classification technique called support vector machines with weighted dynamic time warping kernel (SVM-WDTWK) to classify defect patterns on wafers through the spatial correlogram of a binary wafer map. With the presented approach, a classification accuracy of more than 93% was achieved. Although the results are promising and superior to several existing methods, our defect pattern classification approach needs further investigation and improvement. We postulate that classification accuracy could be increased with novel features. Thus, a wafer bin map, which is more informative than the binary one, may be an improved feature for this problem and is being investigated.


    This research has been supported by the National Research Foundation of Korea (Grant No.: NRF-2015R1C1A1A01051487). The author would like to thank J. Choi and H. R. Jeon (Lab. of Data Mining) from Chonnam National University for their valuable support with the experiments.



    Figure 1. Sequence match by using DTW.


    Figure 2. Typical defect patterns of wafer map and their corresponding spatial correlograms.


    Figure 3. Circle patterns and the corresponding spatial correlograms.


    Table 1. Summary of performance comparison in terms of accuracy (Unit: percentage)


    1. Arisha A., Young P. (2005) Simulation in semiconductor manufacturing facilities, Proceedings of Fifth International Workshop on Advanced Manufacturing Technologies.
    2. Bahlmann C., Haasdonk B., Burkhardt H. (2002) On-line handwriting recognition with support vector machines: A kernel approach, Proceedings of the 8th International Workshop on Frontiers in Handwriting, pp.49-54.
    3. Chao L.C., Tong L.I. (2009) Wafer defect pattern recognition by multi-class support vector machines by using a novel defect cluster index, Expert Syst. Appl., Vol.36(6), pp.10158-10167.
    4. Chen F.L., Liu S.F. (2000) A neural-network approach to recognize defect spatial pattern in semiconductor fabrication, IEEE Trans. Semicond. Manuf., Vol.13(3), pp.366-373.
    5. Cunningham S.P., McKinnon S. (1998) Statistical methods for visual defect metrology, IEEE Trans. Semicond. Manuf., Vol.11(1), pp.48-53.
    6. DeNicolao G., Pasquinetti E., Miraglia G., Piccinini F. (2003) Unsupervised spatial pattern classification of electrical failures in semiconductor manufacturing, Proceedings of Artificial Neural Networks Pattern Recognition Workshop, pp.125-131.
    7. Fenner J.S., Jeong M.K., Lu J.C. (2005) Optimal automatic control of multistage production processes, IEEE Trans. Semicond. Manuf., Vol.18(1), pp.94-103.
    8. Hansen M.H., Nair V.N., Friedman D.J. (1997) Monitoring wafer map data from integrated circuit fabrication processes for spatially clustered defects, Technometrics, Vol.39(3), pp.241-253.
    9. Hansen C.K., Thyregod P. (1998) Use of wafer maps in integrated circuit manufacturing, Microelectron. Reliab., Vol.38(6-8), pp.1155-1164.
    10. Hsu C.W., Lin C.J. (2002) A comparison of methods for multiclass support vector machines, IEEE Trans. Neural Netw., Vol.13(2), pp.415-425.
    11. Huang C.J. (2007) Clustered defect detection of high quality chips using self-supervised multilayer perceptron, Expert Syst. Appl., Vol.33(4), pp.996-1003.
    12. Jeong Y.S., Jeong M.K., Omitaomu O.A. (2011) Weighted dynamic time warping for time series classification, Pattern Recognit., Vol.44(9), pp.2231-2240.
    13. Jeong Y.S., Kim S.J., Jeong M.K. (2008) Automatic identification of defect patterns in semiconductor wafer maps using spatial correlogram and dynamic time warping, IEEE Trans. Semicond. Manuf., Vol.21(4), pp.625-637.
    14. Jeong Y.S., Jayaraman R., Lee K. (2012) Defect patterns classification on semiconductor wafer using multiclass support vector machine with dynamic time warping kernel, Proceedings of the IIE Asian Conference 2012.
    15. Keogh E., Ratanamahatana C.A. (2005) Exact indexing of dynamic time warping, Knowl. Inf. Syst., Vol.7(3), pp.358-386.
    16. Lei H., Sun B. (2008) A study on the dynamic time warping in kernel machines, Proceedings of Third International IEEE Conference on Signal-Image Technologies and Internet-Based System, pp.839-845.
    17. Li T.S., Huang C.J. (2009) Defect spatial pattern recognition using a hybrid SOM-SVM approach in semiconductor manufacturing, Expert Syst. Appl., Vol.36(1), pp.374-385.
    18. Liu S.F., Chen F.L., Lu W.B. (2002) Wafer bin map recognition using a neural network approach, Int. J. Prod. Res., Vol.40, pp.2207-2223.
    19. Omitaomu O.A. (2006) On-line learning and wavelet-based feature extraction methodology for process monitoring using high-dimensional functional data, Ph.D. Dissertation, University of Tennessee.
    20. Scholkopf B. (2000) The kernel trick for distances, Technical Report, Microsoft Research.
    21. Shimodaira H., Noma K., Naka M., Sagayama S. (2001) Support vector machine with dynamic time-alignment kernel for speech recognition, Proceedings of Eurospeech, pp.1841-1844.
    22. Taam W., Hamada M. (1993) Detecting spatial effects from factorial experiment: An application from IC manufacturing, Technometrics, Vol.35(2), pp.149-160.
    23. Wang C.H. (2008) Recognition of semiconductor defect patterns using spatial filtering and spectral clustering, Expert Syst. Appl., Vol.34(3), pp.1914-1923.
    24. Wang C.H., Wang S.J., Lee W.D. (2006) Automatic identification of spatial defect patterns for semiconductor manufacturing, Int. J. Prod. Res., Vol.44(23), pp.5169-5185.
    25. Yuan T., Kuo W. (2008) Spatial defect pattern recognition on semiconductor wafers using model-based clustering and Bayesian inference, Eur. J. Oper. Res., Vol.190(1), pp.228-240.