Mental Tasks EEG Signal Classification Using Support Vector Machine

This paper presents results of electroencephalography (EEG) signal classification for four mental tasks: thinking forward, backward, left, and right. The EEG data were recorded with an Emotiv device with 14 channels and 2 references. The aim of this study is to identify the channels most sensitive to mental task classification. Prior to feature extraction, the EEG signals were decomposed using a three-level wavelet decomposition. Eighteen features were extracted from the processed data. Principal component analysis (PCA) was then used to reduce the 18 features to 3 principal components. The principal components were classified using a support vector machine (SVM). The results show an SVM classification accuracy of 75%.


INTRODUCTION
The human brain generates electrical signals, recorded as electroencephalography (EEG), that are related to the level of consciousness. The frequency of the EEG signal ranges from 0.5 to 100 Hz. The frequency and amplitude of the EEG signal are related to brain activity, such as concentration level, cognitive ability, and relaxation. Based on its frequency, the EEG signal can be categorized into five classes: delta, theta, alpha, beta, and gamma.
The delta wave oscillates at 0-4 cycles per second, with an amplitude that reaches 10 mV; it is the slowest of the waves. It is produced during dreamless sleep. The alpha wave oscillates at 8-13 cycles per second, with an amplitude that reaches 50 μV. It is produced when one is relaxing or daydreaming. At 13-40 cycles per second, the beta wave oscillates faster than the alpha wave. It is active when one is conscious.
A number of studies have analysed the application of EEG signal processing to medical diagnosis, biomedical engineering, and automation. For example, the research by Deon Garrett compared linear, nonlinear, and feature selection methods for EEG signal classification. The researchers measured EEG signals from six electrodes placed at C3, C4, P3, P4, O1, and O2, as defined by the 10-20 system of electrode placement. The subjects were asked to perform mental tasks such as relaxing, numbering, lettering, rotating a three-dimensional solid, and counting. The study compared several classifiers, reporting LDA 4.8%, ANN 52.8%, and SVM 52.3% (Phinyomark, Limsakul & Phukpattaranont 2009). Other studies that focus on the application of EEG signals to brain computer interfaces (BCI) are presented in (Subas 2010; Garrett et al 2003).
The present study focuses on the classification of EEG signals using wavelet decomposition, feature extraction, feature reduction using PCA, and classification using SVM.

The EEG signals were collected from an experiment with 4 mental tasks: thinking forward, backward, right, and left.
EEG, as a health monitoring facility, is vital diagnostic infrastructure. It must be well managed so that it is always functional, and designed so that its performance improves. Research on EEG facility development therefore needs to be conducted continuously (Soemitro & Suprayitno 2018). Developing proper EEG diagnostic equipment, especially for wheelchair applications, will optimize facility asset management.
Prior to feature extraction, wavelet decomposition is employed to process the EEG signal. Wavelet decomposition is applied before the Fourier transform to remove unwanted high-frequency bands. The wavelet transform is an effective method for obtaining localized information (Tkach, Huang & Kuiken 2010). Wavelet decomposition represents the signal by a set of wavelet transform coefficients.

Pre-Processing
EEG is generally a non-stationary signal: the values of the raw EEG data may vary over time. This paper employs the wavelet transform as a pre-processing step to decompose the EEG signal into a linear combination of time-scale units. The decomposition produces two parts: (1) the approximation coefficients (A) and (2) the detail coefficients (D). The approximation coefficients result from passing the EEG signal through a low-pass filter, while the detail coefficients result from passing it through a high-pass filter.
Wavelet analysis begins by decomposing the raw signal data into mutually orthogonal subspaces. The result of a three-level wavelet decomposition is shown in Figure 1.
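As a rough illustration of a three-level decomposition, the sketch below applies one low-pass/high-pass split repeatedly to the approximation coefficients. The Haar wavelet and the names `haar_step` and `wavedec` are assumptions made for this example only; the paper does not state which mother wavelet was used.

```python
import math

def haar_step(signal):
    """One Haar DWT level: the low-pass branch gives A, the high-pass branch gives D.

    A trailing unpaired sample (odd-length input) is dropped.
    """
    s = math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / s for i in pairs]
    return approx, detail

def wavedec(signal, levels=3):
    """Return a dict with keys D1..D<levels> and A<levels>."""
    coeffs = {}
    approx = list(signal)
    for level in range(1, levels + 1):
        # Only the approximation is split again at the next level.
        approx, detail = haar_step(approx)
        coeffs[f"D{level}"] = detail
    coeffs[f"A{levels}"] = approx
    return coeffs

# Toy signal (hypothetical values, not EEG data).
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
coeffs = wavedec(signal, levels=3)
```

Because the Haar filters are orthonormal, the coefficients together carry the same energy as the input, which gives a quick sanity check on any implementation.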

Feature extraction
This study employed 18 features. Brief descriptions are as follows:
• Average amplitude change (AAC): nearly equivalent to the waveform length (WL) feature, except that the waveform length is averaged (Daubechies 1992).
• Difference absolute standard deviation value (DASDV): the standard deviation of the waveform length (Tsai, Yeh & Lo 2008).
• The V-order (V): a non-linear detector that implicitly estimates brain contraction force. It is defined from a functional mathematical model of EEG signal generation [5,7].
• Hjorth 2 (mobility): The mobility parameter represents the mean frequency, or the proportion of standard deviation of the power spectrum (Rangayyan 2001).
• Hjorth 3 (complexity): The complexity parameter represents the change in frequency. It compares the signal's similarity to a pure sine wave, converging to 1 as the signal becomes more similar (Rangayyan 2001).
• Skewness (Sk): Skewness measures the asymmetry of the probability density function (pdf) of the signal. Like kurtosis, skewness has also been used in vibration analysis as a degradation feature of bearing condition (Park & Lee 1990).
• Autoregressive (AR) coefficients: an approach for modelling univariate time series (Tsai, Yeh & Lo 2008; Turnip 2016; Turnip, Soetraprawata & Kusumandari 2013).
• Correlation dimension: A larger correlation dimension corresponds to a larger degree of complexity and less self-similarity. The correlation dimension is derived from the correlation integral presented in (Park & Lee 1990):
\[ C(l) = \lim_{M \to \infty} \frac{1}{M^2} \sum_{i=1}^{M} \sum_{j=i+k}^{M} \Theta\left(l - \lVert X_i - X_j \rVert\right) \]
where X_i and X_j are the position vectors on the attractor of the phase-space vector, l is the distance under consideration, Θ(x) is the Heaviside step function, k is the summation offset, M is the number of reconstructed vectors from the original signal, and C(l) is the correlation integral from which the correlation dimension is estimated.
• Fractal dimension: Fractal dimension is used to measure the complexity of signals. Once the phase-space vector is obtained, the mean absolute length between the j-th and (j-1)-th phase-space vectors of X can be defined as in (Park & Lee 1990).
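Several of the time-domain features listed above are short enough to sketch directly. The definitions below follow the forms commonly given in the feature-extraction literature and are assumptions here, since the paper's own equations are not shown; all function names are hypothetical.

```python
import math

def _diff(x):
    """First difference x[i+1] - x[i]."""
    return [b - a for a, b in zip(x, x[1:])]

def _var(x):
    """Population variance."""
    mu = sum(x) / len(x)
    return sum((v - mu) ** 2 for v in x) / len(x)

def aac(x):
    """Average amplitude change: waveform length divided by N."""
    return sum(abs(d) for d in _diff(x)) / len(x)

def dasdv(x):
    """Difference absolute standard deviation value."""
    return math.sqrt(sum(d * d for d in _diff(x)) / (len(x) - 1))

def hjorth_mobility(x):
    """Square root of var(first difference) / var(signal)."""
    return math.sqrt(_var(_diff(x)) / _var(x))

def hjorth_complexity(x):
    """Mobility of the first difference relative to mobility of the signal."""
    return hjorth_mobility(_diff(x)) / hjorth_mobility(x)

def skewness(x):
    """Third standardized moment."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(_var(x))
    return sum((v - mu) ** 3 for v in x) / (len(x) * sigma ** 3)

# Synthetic 10 Hz sine sampled at 128 Hz for 10 s (not EEG data).
fs = 128
sine = [math.sin(2 * math.pi * 10 * n / fs) for n in range(10 * fs)]
```

On the synthetic sine, `hjorth_complexity(sine)` comes out close to 1 and `skewness(sine)` close to 0, in line with the descriptions in the list.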

Feature Reduction
Feature reduction is the process of extracting data using a linear transformation. It is used to determine the features that most influence the classification process.
In this paper, principal component analysis (PCA) is selected as the feature reduction method. PCA represents the d-dimensional data in a lower-dimensional space.
From the extracted feature data, the d-dimensional mean vector μ and the d × d covariance matrix Σ are computed. The eigenvectors and eigenvalues of Σ are then computed and sorted in order of decreasing eigenvalue: eigenvector e1 with eigenvalue λ1, eigenvector e2 with eigenvalue λ2, and so on. The k eigenvectors with the largest eigenvalues can be chosen by inspecting the eigenvalue spectrum. These k eigenvectors form the columns of a d × k matrix, and the feature data are then represented by their projection onto these eigenvectors (Caesarendra et al 2013).
Let xt (t = 1, …, l, with Σ xt = 0) be a set of centered input vectors, each of dimension m, xt = [xt(1), xt(2), …, xt(m)]^T, usually with m < l. PCA linearly transforms each vector xt into a new vector st as in (20):
\[ s_t = U^T x_t \tag{20} \]
where U is the m × m orthogonal matrix whose ith column u_i is the ith eigenvector of the sample covariance matrix C. The matrix C can be calculated using (21):
\[ C = \frac{1}{l} \sum_{t=1}^{l} x_t x_t^T \tag{21} \]
The eigenvalue problem in PCA can be solved using equation (22):
\[ \lambda_i u_i = C u_i, \quad i = 1, \dots, m \tag{22} \]
where λ_i is one of the eigenvalues of C. The components of s_t are then calculated as the orthogonal transformations of x_t based on the estimated u_i:
\[ s_t(i) = u_i^T x_t, \quad i = 1, \dots, m \]
The new extracted components are called principal components. The number of principal components in s_t can be reduced by using only the first several eigenvectors, sorted in descending order of the eigenvalues.
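The PCA steps described above (centering, sample covariance, eigen-decomposition, projection onto the leading eigenvectors) can be sketched without external libraries. Power iteration with deflation is used here in place of a library eigen-solver; that choice, the function names, and the toy data are assumptions for illustration and not the authors' implementation.

```python
import math
import random

def center(rows):
    """Subtract the per-feature mean so that the vectors sum to zero."""
    m = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(m)]
    return [[r[j] - means[j] for j in range(m)] for r in rows]

def covariance(xc):
    """Sample covariance C = (1/l) * sum_t x_t x_t^T of centered rows."""
    l, m = len(xc), len(xc[0])
    return [[sum(x[a] * x[b] for x in xc) / l for b in range(m)] for a in range(m)]

def matvec(c, u):
    return [sum(cij * uj for cij, uj in zip(row, u)) for row in c]

def top_eigenpairs(c, k, iters=500, seed=0):
    """Leading k (eigenvalue, eigenvector) pairs by power iteration with deflation."""
    c = [row[:] for row in c]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        u = [rng.uniform(-1.0, 1.0) for _ in c]
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]
        for _ in range(iters):
            v = matvec(c, u)
            norm = math.sqrt(sum(x * x for x in v))
            if norm < 1e-15:  # nothing left to extract
                break
            u = [x / norm for x in v]
        lam = sum(a * b for a, b in zip(u, matvec(c, u)))  # Rayleigh quotient
        pairs.append((lam, u))
        # Deflate: remove the component just found from the matrix.
        for a in range(len(c)):
            for b in range(len(c)):
                c[a][b] -= lam * u[a] * u[b]
    return pairs

def pca(rows, k):
    """Return (scores, eigenvalues) for the first k principal components."""
    xc = center(rows)
    pairs = top_eigenpairs(covariance(xc), k)
    scores = [[sum(ui * xi for ui, xi in zip(u, x)) for _, u in pairs] for x in xc]
    return scores, [lam for lam, _ in pairs]

# Toy feature matrix: 4 samples, 3 features (hypothetical values).
rows = [[1.0, 2.0, 0.1], [2.0, 4.0, -0.1], [3.0, 6.0, 0.1], [4.0, 8.0, -0.1]]
scores, eigenvalues = pca(rows, k=2)
```

Each row of `scores` holds the reduced representation of one sample, and the variance of the first score column equals the largest eigenvalue, which is what makes the descending eigenvalue order a natural ranking of components.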

Feature Classification
The SVM is a supervised learning method that is widely used for classification and regression. It selects the decision boundary between classes with maximal margin. A kernel function is used to build linear boundaries after a non-linear transformation (mapping) of the data, in order to find the best decision plane between the classes: the input vectors are non-linearly mapped to a very high-dimensional feature space.
The input data are given as a matrix x consisting of elements xi (i = 1, 2, …, M), where M is the number of samples. Two classes are assumed, a positive class and a negative class, denoted by yi = 1 and yi = -1, respectively. For linearly separable data, it is possible to determine the hyperplane f(x) = 0 that separates the given data, as in (24):
\[ f(x) = w^T x + b = 0 \tag{24} \]
The M-dimensional vector w and the scalar b define the position of the separating hyperplane. The decision function sign(f(x)) classifies the input data as either positive or negative. The separating hyperplane must satisfy the constraint in (25):
\[ y_i f(x_i) = y_i (w^T x_i + b) \ge 1, \quad i = 1, 2, \dots, M \tag{25} \]
The optimal separating hyperplane is the one with the maximum distance between the plane and the nearest data, i.e. the maximum margin. An example of the optimal hyperplane for two data sets can be seen in Figure 2.
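A small numerical check can make the roles of w and b concrete. The sketch assumes the standard linear form f(x) = w·x + b and the usual margin constraint y·f(x) ≥ 1; the hyperplane, the sample points, and the function names are hypothetical and are not claimed to be optimal for any data set.

```python
import math

def f(w, b, x):
    """Linear function whose zero level set is the separating hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Decision function sign(f(x)), returning +1 or -1."""
    return 1 if f(w, b, x) >= 0 else -1

def satisfies_margin(w, b, samples, labels):
    """True if every sample lies on or outside the margin on its own side."""
    return all(y * f(w, b, x) >= 1.0 for x, y in zip(samples, labels))

def margin_width(w):
    """Distance between the two margin planes f(x) = +1 and f(x) = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical 2-D samples and hyperplane.
samples = [(2.0, 2.0), (3.0, 1.0), (0.0, 0.0), (-1.0, 1.0)]
labels = [1, 1, -1, -1]
w, b = (0.5, 0.5), -1.0
```

Scaling w and b down, for example to `(0.25, 0.25)` and `-0.5`, leaves the boundary in place but violates the constraint, which is why the constraint fixes the scale of w and lets a smaller norm of w correspond to a wider margin.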

A series of data points for two different classes is presented in Figure 2, with black circles for the positive class and white circles for the negative class. The SVM tries to place a linear boundary between the two classes and orients it so that the margin, shown by the dash-dotted lines, is maximized. In other words, the SVM maximizes the distance between the boundary and the nearest data point in each class. The boundary is located in the middle of this margin. Support vectors are the nearest data points, which define the margin. In Figure 2, support vectors are marked by squares around the black and white circles. In this linear system, the normal vector to the hyperplane is w and the perpendicular distance from the hyperplane to the origin is |b|/‖w‖.
Figure 2. Classification of two linearly separable classes using SVM
Taking into account the noise with slack variables ξ_i and the error penalty C, the optimal hyperplane separating the data can be calculated using (26) and (27):
\[ \min_{w, b, \xi} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{M} \xi_i \tag{26} \]
\[ \text{subject to} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, M \tag{27} \]
where ξ_i measures the distance between the margin and the examples x_i lying on the wrong side of the margin. Using the Kuhn-Tucker conditions, the calculation can be simplified into the Lagrangian dual problem (28).
The task is to minimize (26) subject to (27). Solving the dual optimization problem gives the coefficients α_i, which are required to express w in order to solve (26) and (27). The non-linear decision function becomes (31):
\[ f(x) = \operatorname{sign}\left( \sum_{i=1}^{M} \alpha_i y_i K(x_i, x) + b \right) \tag{31} \]
where K is the kernel function. The SVM can use different kernel functions, such as linear, polynomial, and Gaussian RBF. Because the kernel function defines the feature space, it is important to select an appropriate one. Prior to classification, the SVM was trained on the features to define the class categories. Details of SVM classification with PCA, ICA, and LDA are presented in (Turnip, Soetraprawata & Kusumandari 2013).
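To show how the choice of kernel enters the decision function, the sketch below evaluates the usual kernel expansion, the sign of the sum of α_i y_i K(x_i, x) plus b, for the three kernels named above. The support vectors, coefficients, bias, and kernel parameters are hypothetical values chosen by hand, not the result of training on the paper's data.

```python
import math

def linear_kernel(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def polynomial_kernel(a, b, degree=2, coef0=1.0):
    return (linear_kernel(a, b) + coef0) ** degree

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian RBF: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def decide(x, support_vectors, labels, alphas, bias, kernel):
    """Sign of sum_i alpha_i * y_i * K(x_i, x) + bias, returned as +1 or -1."""
    score = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) + bias
    return 1 if score >= 0 else -1

# Hypothetical support vectors and dual coefficients.
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
labels = [1, -1]
alphas = [0.25, 0.25]
bias = 0.0

kernels = (linear_kernel, polynomial_kernel, rbf_kernel)
near_positive = [decide((2.0, 0.0), support_vectors, labels, alphas, bias, k) for k in kernels]
near_negative = [decide((-3.0, 1.0), support_vectors, labels, alphas, bias, k) for k in kernels]
```

Only the kernel changes between the three evaluations; the support vectors and coefficients are reused here purely for comparison, whereas in practice they would be retrained for each kernel.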

MATERIALS
In this paper, an Emotiv EEG device was used to record the EEG data. The device has 14 channels (AF3, AF4, F7, F8, F3, F4, FC5, FC6, T7, T8, P7, P8, O1, O2) and 2 references (gyro x and gyro y). Ten subjects were informed via a consent form and took part in the experiment. The locations of the measured electrodes are shown in Figure 3. Each subject was asked to perform 4 mental tasks: thinking forward, backward, right, and left. Each mental task was recorded for approximately 10 seconds and was repeated five times by each subject. The sampling frequency of the EEG signal acquisition was 128 Hz. Features play an important role in classification accuracy. A previous study investigated 18 features used with an ANN classifier (Caesarendra et al 2015). Its results show that channels F7 and F8 gave the best accuracy, at 80% and 85%, respectively.
The present study includes three nonlinear features to determine the classes of the EEG signal for the four mental tasks. The EEG signal contains a wide range of frequency bands and is therefore difficult to analyse. Wavelet decomposition is used to extract a particular low-frequency band of the EEG signal. The signal used for feature extraction was the third-level wavelet detail (D3). Eighteen features were extracted from the D3 signal. The feature extraction results for each mental task are shown in Figures 4-7. Numbers 1 to 18 on the x-axis represent the features explained in Section 2.2, and the y-axis gives the feature values. Taking the AAC feature as an example, its value is negative for every mental task except backward. Although the mental tasks differ, the differences are not large enough to distinguish them clearly. PCA is therefore needed.
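The recording protocol implies the following approximate data volume. The sketch assumes that every recording lasts exactly 10 s (the text says approximately) and that all repetitions were retained; the variable names are illustrative.

```python
SUBJECTS = 10
TASKS = ("forward", "backward", "right", "left")
REPETITIONS = 5
CHANNELS = 14
DURATION_S = 10   # assumed exact; the recordings were approximately 10 s
FS_HZ = 128

samples_per_channel = DURATION_S * FS_HZ                     # samples in one recording, per channel
recordings = SUBJECTS * len(TASKS) * REPETITIONS             # total recordings in the experiment
values_per_recording = CHANNELS * samples_per_channel        # EEG values in one recording
```

Under these assumptions there are 200 recordings of 1280 samples per channel each.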
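The frequency content of the D3 signal can be estimated from the sampling rate. The sketch below lists the nominal band of each decomposition level, assuming ideal half-band filters in a dyadic decomposition; real wavelet filters overlap at the band edges, and the function name `dwt_bands` is hypothetical.

```python
def dwt_bands(fs, levels):
    """Nominal (low, high) band in Hz for each detail level and the final approximation."""
    bands = {}
    high = fs / 2.0  # Nyquist frequency
    for level in range(1, levels + 1):
        low = high / 2.0
        bands[f"D{level}"] = (low, high)  # detail keeps the upper half
        high = low                        # approximation keeps the lower half
    bands[f"A{levels}"] = (0.0, high)
    return bands

bands = dwt_bands(fs=128, levels=3)
```

With a 128 Hz sampling frequency this gives a nominal D3 band of 8 to 16 Hz, which covers the alpha range quoted in the introduction.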

Feature extraction
In this paper, principal component analysis (PCA) is used to reduce the 18 features to 3 features, the principal components. This is necessary to maximize the difference between the mental tasks and so build a better classification model. From the three principal components, combinations of two were studied. The combination of principal component 2 (PC2) and principal component 3 (PC3) was found to distinguish the features of the 4 mental tasks. The PCA results for the training and testing processes are presented in Figures 8 and 9, respectively. Figure 8 shows that one feature of the "backward" mental task appeared in the area of the "forward" mental task. To increase the classification accuracy, more data were used in the training process than in the testing process.
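One common way to give the training process more data than the testing process, while keeping every mental task represented in both, is a stratified split. The sketch below is an assumption about how such a split could be made; the 80/20 ratio, the seed, and the function name are hypothetical, since the paper does not state the proportions used.

```python
import random

def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split sample indices per class so each class keeps the same train/test ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        cut = int(round(train_fraction * len(idxs)))
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)

# One subject's trials: 4 tasks repeated 5 times (labels only, no signal data).
labels = ["forward", "backward", "right", "left"] * 5
train_idx, test_idx = stratified_split(labels)
```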

Feature Classification
The results of training and testing classification are presented in Figure 10. The combination of PC2 and PC3 was selected based on the distance between the features of each class. A sequential minimal optimization SVM (SMO SVM) with a kernel function was used to train and test these pairs. Figure 10(a) shows that the areas of the mental tasks overlap. For example, one feature of backward (red triangle) lies in the forward area (blue circle). This overlap can reduce the classification accuracy.
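The effect of such an overlap on accuracy can be read off a confusion matrix. The sketch below uses invented toy labels in which a single backward sample is predicted as forward; these are not the paper's data or results, and the function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

classes = ["forward", "backward", "right", "left"]
# Toy labels only: one backward sample falls in the forward region.
true_labels = ["forward", "forward", "backward", "backward", "right", "right", "left", "left"]
predicted = ["forward", "forward", "forward", "backward", "right", "right", "left", "left"]

matrix = confusion_matrix(true_labels, predicted, classes)
acc = accuracy(matrix)
```

The off-diagonal entry in the backward row, forward column records the misplaced sample, and each such entry lowers the overall accuracy by one over the number of test samples.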

CONCLUSION
The SVM can make decisions on testing data based on training data. The accuracy of the SVM classifier on EEG signals for the four mental tasks is 75%. The SVM results indicate that the EEG signals for thinking forward and backward contain poor-quality data, because those data do not match the decision regions.
The methods used here show that EEG signals are generally difficult to classify into multiple classes. Pre-processing the EEG signal, for example with wavelets, prior to classification is a practical and useful way to improve accuracy. This paper also shows that the choice of features can affect classification accuracy.