Survey Analysis of Smart Feature Selection for Machine Learning Techniques Mainly Applied to EEG

This research presents a survey on analyzing and classifying EEG signals based on feature selection approaches. The increasing complexity of high-dimensional medical datasets necessitates efficient feature selection methods for early disease detection and safeguarding public health. Intelligent feature selection represents an advanced stage in machine learning and innovative computer applications, as it reduces the number of features required for accurate classification. The main goal of feature selection is to improve the predictive model's performance and reduce the computational cost of modeling. This paper surveys a considerable body of research on feature selection. The main measures used in the analysis are accuracy, precision, recall, and F1-score. Performance is evaluated on the standard Bonn University EEG dataset. The results show that the best-performing techniques achieve an accuracy of around 99%.


Introduction
An electroencephalogram (EEG) is a useful, inexpensive, non-invasive diagnostic tool used to study the electrical activity of the brain. EEG is the most common way to diagnose changes in brain function. Electrodes placed on the scalp are used to record EEG data. EEG is used to identify and track neurological conditions such as epilepsy and sleep disturbances [1]. Moreover, EEG signals are utilized in many research studies and applications, such as lie detection and gaming [2]. An EEG dataset is high-dimensional and contains various features used in the classification process, but not all of these features are essential for identifying whether a signal is normal or an epileptic seizure. Therefore, choosing effective features improves early detection and reduces the data size and execution time that result from the high dimensionality [3].

Motivation
In recent times, there has been a notable increase in the number of researchers focusing on the analysis of electroencephalography (EEG) signals, owing to its significance in discovering and diagnosing brain diseases [4]. EEG signals reflect the activity of a complex network comprising billions of interconnected neurons, generating a vast number of features per second. This poses a significant challenge for machine learning algorithms when classifying EEG data, as they often struggle with the high feature rate and numerous irrelevant features. Consequently, feature selection techniques aim to address this issue by identifying and selecting the most informative features for improved exploration and exploitation [5].
However, feature selection approaches encounter various limitations, of which two are most prominent. Firstly, wrapper feature selection methods suffer from excessive search time complexity, impeding their efficiency and practicality. Secondly, the standard Particle Swarm Optimization (PSO) algorithm, commonly employed for feature selection, suffers from the stagnation effect, which can lead to convergence towards local optima rather than the global optimum [6]. These limitations hinder the effectiveness and performance of feature selection techniques in the context of EEG signal analysis.

Wrapper and filter feature selection
Wrapper and filter feature selection methods are widely employed in various domains to enhance the performance and interpretability of machine learning models. These methods aim to identify and select a subset of relevant features from a larger feature space, thereby improving the accuracy and efficiency of classification or regression tasks [7].
Wrapper feature selection methods iteratively select different feature subsets and assess the impact of each subset on the performance of a machine learning model. This process involves training and testing the model using different combinations of features, often with cross-validation [8]. The model's performance is then measured using an evaluation metric, such as accuracy or mean squared error. The feature subset that yields the best performance is selected as the optimal set of features.
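The wrapper procedure described above can be sketched as a greedy forward search. This is a minimal illustration, not the method of any surveyed study: the nearest-centroid classifier, the leave-one-out evaluation, the function names, and the toy data are all assumptions chosen to keep the example dependency-free.

```python
def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the given feature columns (assumed model)."""
    correct = 0
    for i in range(len(X)):
        # Build class centroids from every sample except the held-out one.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1
        # Predict the class whose centroid is closest to the held-out sample.
        best_label, best_dist = None, float("inf")
        for label, acc in sums.items():
            dist = sum((X[i][f] - acc[k] / counts[label]) ** 2
                       for k, f in enumerate(features))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)


def forward_selection(X, y, max_features):
    """Greedy wrapper: add the feature that most improves the model score,
    and stop as soon as no candidate improves it."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(loo_accuracy(X, y, selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda s: s[0])
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical toy data: only column 1 separates the two classes.
X = [[0.9, 1.0, 5.0], [0.1, 1.2, 3.0], [0.5, 0.8, 4.0], [0.3, 1.1, 6.0],
     [0.8, 3.0, 5.5], [0.2, 3.2, 3.5], [0.6, 2.9, 4.5], [0.4, 3.1, 6.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected, score = forward_selection(X, y, max_features=2)
```

Every candidate subset requires retraining and re-evaluating the model, which is the source of the search-time cost attributed to wrapper methods.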
In contrast, filter feature selection methods assess the relevance of features independently of the chosen machine learning model. These methods typically employ statistical or information-theoretic measures to evaluate the relationship between each feature and the target variable. Popular filter methods include correlation-based feature selection, mutual information, and chi-square tests. Features are ranked based on their individual relevance, and a predetermined number of top-ranked features are selected for further analysis [9]. Both approaches are illustrated in Fig. 1.
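A filter method of the correlation-based kind can be sketched as follows. This is a minimal illustration under stated assumptions: the absolute Pearson correlation between each feature and a binary label is used as the relevance score, and the function names and toy data are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (v - mean_b) for x, v in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_a * var_b)


def rank_features(X, y, top_k):
    """Score every feature independently of any classifier and
    return the indices of the top_k highest-scoring features."""
    scores = []
    for f in range(len(X[0])):
        column = [row[f] for row in X]
        scores.append((abs(pearson(column, y)), f))
    # Highest score first; ties are broken by the lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [f for _, f in scores[:top_k]]


# Hypothetical toy data: column 1 is strongly related to the label.
X = [[0.9, 1.0, 5.0], [0.1, 1.2, 3.0], [0.5, 0.8, 4.0], [0.3, 1.1, 6.0],
     [0.8, 3.0, 5.5], [0.2, 3.2, 3.5], [0.6, 2.9, 4.5], [0.4, 3.1, 6.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
top_two = rank_features(X, y, top_k=2)
```

Because no model is trained, the ranking costs one pass per feature, but it cannot detect features that are only useful in combination.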

Survey Strategy and Evaluation
This section surveys the feature selection techniques for EEG signal enhancement. These are the most common techniques applied in Evolutionary Algorithms (EA). For each study, the survey analyzes the problem and the proposed solution, evaluates performance using several metrics, and identifies the best results. A critical statement is given for each approach. The studies assess performance using measures such as precision, recall, accuracy, and F1-score. The best results are compared and evaluated within their groups. Furthermore, the best accuracy among all techniques within the same group has been calculated and compared with other techniques. Moreover, most researchers did not evaluate processing time, so the overall system performance of most techniques remains unclear.
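For reference, the four measures used throughout this survey can be computed from the counts of true and false positives and negatives of a binary classifier. The sketch below uses the standard definitions; the function name and the example labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1-score for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = seizure, 0 = seizure-free.
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                                 [1, 1, 0, 0, 0, 1, 0, 1])
```

Accuracy alone can be misleading when the classes are imbalanced, which is why studies that report only accuracy are flagged in the reviews that follow.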
Efficient feature extraction and reduction of data dimensions can significantly improve machine learning algorithms' complexity, processing time, and memory storage requirements. Overcoming the limitations of feature selection approaches, such as the excessive search time complexity and the stagnation effect in PSO-based feature selection, is crucial for achieving optimal feature subsets that enhance the accuracy and efficiency of EEG signal classification [10]. Future research efforts should develop novel and efficient feature selection algorithms that address these limitations, ultimately facilitating more effective analysis and diagnosis of brain diseases based on EEG data.

Feature selection approach
Moein Radman et al. [11] discussed detecting epileptic seizures by classifying EEG signals into the seizure and seizure-free states of an epileptic patient, which is a challenging task. A novel fusion scheme is proposed in this study for Epileptic Seizure Detection (ESD) in brain disorders, utilizing the Dempster-Shafer Evidence Theory (DSET). Initially, various features are extracted from the EEG, and a correlation analysis using the Pearson Correlation Coefficient (PCC) is performed on these extracted features to identify and eliminate highly correlated features.
Next, three distinct filter-type feature selection techniques, namely Relief-F (RF), Compensation Distance Evaluation Technique (CDET), and Fisher Score (FS), are applied to the second feature set to rank the features based on their relevance. A range of performance metrics was employed to assess the method's efficacy, including accuracy, sensitivity, specificity, precision, Cohen's Kappa coefficient, and the Area Under the Curve (AUC). The model was validated experimentally on standard EEG datasets and gives the highest average accuracy of 99%. However, the researchers did not report some metrics, such as execution time.
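The correlation-based elimination step described above can be sketched as follows. This is not the authors' implementation: the threshold of 0.9, the keep-first policy, the function names, and the toy data are assumptions made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (v - mean_b) for x, v in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(X, threshold=0.9):
    """Keep a feature only if its absolute correlation with every
    already-kept feature is below the threshold (assumed value)."""
    columns = [[row[f] for row in X] for f in range(len(X[0]))]
    kept = []
    for f, column in enumerate(columns):
        if all(abs(pearson(column, columns[k])) < threshold for k in kept):
            kept.append(f)
    return kept


# Hypothetical feature matrix: column 1 is almost a multiple of column 0.
X = [[1, 2.1, 5], [2, 3.9, 1], [3, 6.2, 4], [4, 7.8, 2], [5, 10.1, 3]]
kept = drop_correlated(X)
```

The surviving columns would then be passed to the ranking techniques (Relief-F, CDET, Fisher Score), which are not reproduced here.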
A. Phraeson Gini et al. [12] center on epilepsy, a neurological disorder with significant implications for brain function. The electroencephalogram (EEG) serves as a valuable tool in the identification and prediction of epileptic seizures. The paper presents an efficient soft computing framework for detecting seizures from EEG signals, built upon state-of-the-art techniques and aiming for maximum detection accuracy. The pipeline extracts spectral features from the Intrinsic Mode Functions (IMFs) of EEG samples, and a random forest algorithm is then employed for convulsion classification, as it exhibits reliable learning behavior derived from a vast dataset of known instances. Performance was assessed using precision, sensitivity, specificity, and predictivity. The model was validated experimentally on the Bonn University database. The highest average accuracy recorded was 97%, which indicates that the proposed methodology is effective and reliable for seizure classification.
Zayneb et al. [13] address irregularity, a prominent characteristic of electroencephalographic (EEG) signals, which requires specific analysis methods for the accurate diagnosis of neurological diseases. In this context, Sample Entropy is an efficient tool for signal irregularity analysis, and the authors propose a novel approach for EEG-based brain state detection that utilizes it. Firstly, the machine learning model's design incorporates signal derivatives as a preprocessing step. Subsequently, features such as Sample Entropy and Standard Deviation (STD) are extracted from the EEG signals and their first and second derivatives. These features are employed to train a K-Nearest Neighbor (KNN) classifier, resulting in high accuracy. Furthermore, the authors conduct feature selection to identify the most relevant features and propose a classifier with improved accuracy compared to the initial KNN model. Only accuracy was used to assess the method's efficacy. The model was validated experimentally on the Bonn University database. The reported average accuracy of 100% indicates that the approach is a promising solution for the research problem at hand. However, the researchers reported accuracy only and omitted other metrics, such as execution time.
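The feature extraction described above can be sketched as follows, using the commonly cited definition of Sample Entropy (template length m, tolerance r, Chebyshev distance, self-matches excluded). This is not the authors' code: the defaults m = 2 and r = 0.2 times the standard deviation, the use of discrete differences as derivatives, and all names are assumptions.

```python
import math


def std(signal):
    """Population standard deviation."""
    mean = sum(signal) / len(signal)
    return math.sqrt(sum((x - mean) ** 2 for x in signal) / len(signal))


def sample_entropy(signal, m=2, r=None):
    """Sample Entropy: -ln(A / B), where B and A count template pairs of
    length m and m + 1 that lie within tolerance r of each other."""
    n = len(signal)
    if r is None:
        r = 0.2 * std(signal)  # common default, assumed here

    def count_matches(length):
        # Use n - m templates for both lengths so the counts are comparable.
        templates = [signal[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b)
                           for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined in theory; reported as infinity here
    return -math.log(a / b)


def diff(signal):
    """First discrete difference, used as a stand-in for the derivative."""
    return [later - earlier for earlier, later in zip(signal, signal[1:])]


def entropy_std_features(signal):
    """Sample Entropy and STD of the signal and its first two differences."""
    features = []
    for s in (signal, diff(signal), diff(diff(signal))):
        features.extend([sample_entropy(s), std(s)])
    return features
```

A perfectly periodic signal yields an entropy of zero, while an irregular signal yields a larger value, which is the property that makes the measure useful for separating brain states.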
Gurwinder et al. [14] investigate epilepsy, a severe neurological disorder whose diagnosis requires the analysis of electroencephalogram (EEG) data. The proposed methodology detects epileptic seizures by using multiscale entropies and complete ensemble empirical mode decomposition (CEEMD) to extract relevant features. The multiscale entropies are selected by a feature selection method that combines filter-based and wrapper-based elements. Performance is assessed using classification accuracy, sensitivity, and specificity, which provide insight into the effectiveness and robustness of the methodology for epilepsy detection. The model was validated experimentally on the EEG dataset from Bonn University, Germany. The reported average accuracy of 98% indicates that the approach is a promising solution for the research problem at hand.
Merve Açikoğlu et al. [15] focus on reducing high-dimensional data by using feature selection to remove irrelevant features. The study utilized ten different feature selection (FS) algorithms to identify the most informative features from the EEG signals. The purpose of employing multiple algorithms was to explore different feature selection techniques and identify the most effective one for this specific context. By using these algorithms, the study aimed to reduce the number of features required for classification, thereby optimizing computational resources and enhancing the efficiency of the decision support system. Performance is assessed using classification accuracy, sensitivity, and specificity, which provide insight into the effectiveness and robustness of the methodology for epilepsy detection. The dataset included EEG measurements and visual EEG annotations recorded from 79 term neonates at the Helsinki University Central Hospital due to clinically suspected seizures. The highest average accuracy obtained was 98.8% among all channel differences. However, the researchers did not report some metrics, such as execution time.
Virender Kumar Mehla et al. [16] discussed the problem that, to treat individuals with pharmacoresistant focal epilepsy effectively, the brain's epileptogenic center must be precisely identified. Recently, several machine-learning methods have been created to help neurologists correctly diagnose epileptic patients.
Sutrisno Ibrahim et al. [19] focused on detecting signal abnormalities in the EEG to diagnose epileptic seizures and autism spectrum disorders (ASD). In addition to two traditional methods (standard deviation and power of range), the authors investigated two nonlinear techniques, the largest Lyapunov exponent and Shannon entropy, which assess complexity and disorder in EEG data. Accuracy was used to evaluate the proposed model. The model was validated experimentally on two standard datasets, from MIT and the University of Bonn. For a three-class (multi-channel) classification problem, the accuracy was 94.6%. Only accuracy was used to assess the effectiveness of the proposal, and the researchers did not report some metrics, such as execution time.
Shu-Ling Zhang et al. [20] focus on neurophysiological system analysis to detect neurological disorders affecting the brain. The research presents a new framework for characterizing neurophysiological system complexity, based on a weighted FPE complexity measure. To identify epileptic seizures automatically, the feature (W-FPE-F) is extracted from the EEG and then applied with extreme learning machine (ELM) and support vector machine (SVM) classifiers. The classifiers' performance was assessed using sensitivity, specificity, accuracy, and precision. The model was validated experimentally on two standard datasets, the CHB-MIT database and the University of Bonn dataset, and gives the highest average accuracy of 99%. However, the researchers did not report some metrics, such as execution time.
V. S. Hemachandira et al. [21] note that epilepsy is one of the most common neurological conditions and that electroencephalography (EEG) is used in diagnosing it. The study uses three wavelets: Haar, dB4, and Sym 8. A Particle Swarm Optimization (PSO) method chooses the best features of epileptic convulsions. Seven classifiers, including Support Vector Machine (SVM) with linear, polynomial, and Radial Basis Function (RBF) kernels, K-Nearest Neighbor (KNN), and Gaussian Mixture Modeling (GMM), are then used to classify the selected features. The proposed methods are evaluated using benchmark metrics, including g-means, F1-score, sensitivity, specificity, and accuracy. The model was validated experimentally on the standard Bonn University dataset. The best result achieved by this method is an accuracy of 98.2%, with an error rate of 2%. However, the researchers did not report some metrics, such as processing time and complexity, which makes the results difficult to compare.
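PSO-based feature selection of the kind mentioned above is often implemented as binary PSO, in which each particle is a 0/1 mask over the features and a sigmoid maps velocity to the probability of selecting a feature. The sketch below is a generic illustration, not the configuration used in [21]: the inertia and acceleration weights, the velocity clamp, the swarm size, and the toy fitness function are all assumptions.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=10, n_iters=30, seed=0):
    """Maximise fitness(mask) over 0/1 feature masks with binary PSO."""
    rng = random.Random(seed)
    w, c1, c2, v_max = 0.7, 1.5, 1.5, 4.0  # assumed hyperparameters
    positions = [[rng.randint(0, 1) for _ in range(n_features)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * n_features for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [fitness(p) for p in positions]
    g = max(range(n_particles), key=lambda i: personal_score[i])
    global_best, global_score = personal_best[g][:], personal_score[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                v = (w * velocities[i][d]
                     + c1 * rng.random() * (personal_best[i][d] - positions[i][d])
                     + c2 * rng.random() * (global_best[d] - positions[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp the velocity
                velocities[i][d] = v
                # Sigmoid transfer: velocity -> probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-v))
                positions[i][d] = 1 if rng.random() < prob else 0
            score = fitness(positions[i])
            if score > personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score > global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


def toy_fitness(mask):
    """Hypothetical objective: reward features 1 and 3, penalise mask size.
    In practice this would be a classifier's validation accuracy."""
    informative = (1, 3)
    return sum(mask[i] for i in informative) - 0.1 * sum(mask)


best_mask, best_score = binary_pso(toy_fitness, n_features=6)
```

This plain variant does nothing to counter the stagnation effect discussed earlier; once all particles agree with the global best, the attraction terms vanish and the search relies on the random bit sampling alone.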
Anis Malekzadeh et al. [2] discussed detecting epileptic seizures by classifying EEG signals. Several fractal dimension (FD) and related methods, namely the Hurst exponent (HE), detrended fluctuation analysis (DFA), Sevcik, box-counting (BC), multiresolution box-counting (MBC), Maragos-Sun (MSFD), multifractal DFA (MF-DFA), and recurrence quantification analysis (RQA), were used for feature extraction. In the following stage, the minimum redundancy maximum relevance (mRMR) technique was used to select features. The classification step was completed using the convolutional autoencoder (CNN-AE), support vector machine (SVM), and k-nearest neighbors (KNN) algorithms. Accuracy, sensitivity, specificity, and F1-score were used to evaluate the proposed model. The model was validated experimentally on two standard datasets, Bonn and Freiburg. The results show that the suggested CNN-AE approach obtained an accuracy of 99.736% on the Bonn dataset and 99.176% on the Freiburg dataset, outperforming the other approaches. This research achieved high accuracy, but the improvement over other methods was small. Moreover, the research did not address processing time.
Sijia Wang et al. [22] focus on the utilization of smart computing applications for early epilepsy diagnosis. The KNN algorithm determines how far apart the data in the training and validation datasets are from one another. The K-NN method uses the Minkowski distance metric and takes into account three crucial factors: the distance metric, K-value selection, and decision-making. The effectiveness of the classifier was measured using accuracy, precision, recall, sensitivity, specificity, and F1-score. The model was validated experimentally on the standard Bonn EEG dataset. The experimental results show that the proposed average classification accuracy is 100%. A weakness of this method is that using the KNN algorithm with the Minkowski distance results in a loss of features in the high-dimensional dataset.
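The three factors named above (distance metric, choice of K, and decision rule) map directly onto a few lines of code. The following is a minimal sketch rather than the implementation from [22]: the majority-vote rule, the default values of k and p, and the toy data are assumptions.

```python
from collections import Counter


def minkowski(a, b, p=2):
    """Minkowski distance: p = 1 is Manhattan, p = 2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def knn_predict(train_X, train_y, query, k=3, p=2):
    """Label a query by majority vote among its k nearest training samples."""
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: minkowski(pair[0], query, p))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set.
train_X = [[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]]
train_y = ["normal", "normal", "normal", "seizure", "seizure", "seizure"]
label = knn_predict(train_X, train_y, [1.5, 1.5])
```

Every prediction compares the query with all training samples, so the cost grows with both the dataset size and the number of features, which is one reason feature selection is usually applied before KNN.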
Millee Panigrahi et al. [23] focus on detecting epileptic seizures that occur due to a synaptic disorder in the prefrontal cortex through the use of electroencephalography. Visually examining these signals to establish a diagnosis is time-consuming. As part of an integrated strategy, the wavelet packet decomposition (WPD) approach is used to analyze the signal in time and frequency before features are extracted. After feature extraction, four alternative classification models are compared using a balanced train-test split, with 70% of the dataset used for training and 30% for testing. The efficacy of the suggested method was assessed using accuracy, precision, recall, sensitivity, specificity, and F1-score, and seizures were predicted using the EEG recordings from Bonn University's benchmark dataset. According to the findings, WPD with SVM has an accuracy rate of 96%. A weakness of this method is a loss of features in the high-dimensional dataset.
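Wavelet packet decomposition splits both the approximation and the detail branch at every level, which yields 2^L sub-bands after L levels. The sketch below illustrates the idea with the Haar wavelet and sub-band energies as features. The choice of wavelet, the energy feature, and all names are assumptions, since the review above does not state which were used in [23].

```python
import math


def haar_step(signal):
    """One orthonormal Haar step: pairwise averages and differences."""
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s
              for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / s
              for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wavelet_packet(signal, levels):
    """Full packet tree: every node is split again at each level.
    Nodes are returned in natural order, not sorted by frequency."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2 ** levels")
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def band_energies(signal, levels):
    """Energy of each terminal sub-band, usable as a feature vector."""
    return [sum(c * c for c in node) for node in wavelet_packet(signal, levels)]


# Hypothetical eight-sample segment decomposed into four sub-bands.
energies = band_energies([4, 6, 10, 12, 8, 6, 5, 5], levels=2)
```

Because the Haar step is orthonormal, the sub-band energies add up to the energy of the original segment, so the feature vector describes how that energy is distributed across bands.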
Luis Alfredo Moctezuma et al. [24] discussed detecting epileptic seizures by classifying EEG signals. The authors describe a multi-objective optimization technique based on the non-dominated sorting genetic algorithm (NSGA) for electroencephalographic (EEG) channel selection in epileptic seizure classification. The EEG data from each channel is first separated into multiple frequency bands using empirical mode decomposition (EMD) or the discrete wavelet transform (DWT). Then, four features are extracted for each sub-band: two fractal dimension values and two energy values. The performance of the classifier was assessed using accuracy only. The experiments were applied to the public CHB-MIT benchmark dataset and achieved an accuracy of 0.95 using all the channels and 0.975 using just the two NSGA-III-selected channels. A weakness of this study is that only the accuracy metric was used, which leaves the results ambiguous.
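The core idea behind non-dominated sorting can be sketched with two objectives, maximising accuracy and minimising the number of channels. This shows only the selection of the first non-dominated front; a full NSGA-II or NSGA-III also involves genetic operators and a diversity mechanism, which are omitted. The candidate values below are hypothetical and are not results from [24].

```python
def dominates(a, b):
    """a and b are (accuracy, n_channels) pairs. a dominates b if it is
    no worse in both objectives and strictly better in at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical (accuracy, number of channels) pairs for channel subsets.
candidates = [(0.95, 23), (0.975, 2), (0.90, 1), (0.92, 2), (0.97, 5)]
front = pareto_front(candidates)
```

The front contains the trade-off solutions from which a channel subset is finally chosen, for example the smallest subset whose accuracy is still acceptable.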

Analysis and Evaluation
Four important measures are used to assess the outcomes of the feature selection techniques achieved by researchers: accuracy, precision, recall, and F1-score. Most research has focused on measuring accuracy, an essential component of the improvement process. The accuracy results are displayed in the following tables. Table 1 shows the accuracy of some feature selection approaches with their tools. The accuracy reported across this group of publications ranges from 93.6% to 99%. However, the studies used several different classification algorithms and preprocessing tools.
Moreover, all studies that utilized a wrapper method reported promising accuracy, ranging from 80.18% to 90.85%, as presented in Table 2. Bar charts visually represent the accuracy results reported in Tables 1 and 2 for a precise visual analysis (Figures 2 and 3). Figure 2 shows the accuracy results of some feature selection techniques.

Conclusion
This research has explored and evaluated various EEG signal feature selection techniques. The performance evaluation was based on key measures such as accuracy, precision, recall, and F1-score. It is noteworthy that all feature selection methods demonstrated exceptional accuracy. Most approaches achieved accuracy rates ranging from 99% to 100%, surpassing the performance of other techniques. However, the surveyed studies lack information regarding overall system performance and processing time, which limits a comprehensive evaluation of the proposed methodologies. These aspects are crucial in assessing the real-world applicability and efficiency of the feature selection techniques. Further research could include system performance analysis and processing time measurements, which would provide a complete picture of the proposed methodologies and their practical viability.

The epileptogenic area was diagnosed in this study by using a Fourier-based signal decomposition method to find focal (F) EEG signals. A filter bank based on the discrete cosine transform (DCT) is used to separate the focal (F) and non-focal (NF) EEG signals into different Fourier intrinsic band functions (FIBFs), produced by splitting the entire signal bandwidth into equal frequency bands. Six features, including variance, interquartile range, complexity, kurtosis, extent, and mean frequency, are estimated from the FIBFs of the original EEG signals. The classifier's effectiveness was measured using accuracy, sensitivity, precision, recall, specificity, and F1-score. The publicly accessible Bern-Barcelona and Bonn EEG datasets were used to evaluate and validate the model experimentally. The findings indicate that the proposed average classification accuracy is 99.44% on the Bern-Barcelona dataset and 99.64% on the Bonn dataset. However, the researchers did not report some measures, such as execution time.
Yun Jiang et al. [18] discussed detecting epilepsy electroencephalogram (EEG) signals for clinical application. Based on the time-frequency (TF) analysis technique, the authors suggested a novel synchro-extracting chirplet transform (SECT) epilepsy classification model. The proposed model was evaluated using various performance measures, such as accuracy, sensitivity, specificity, and MCC. The model was validated experimentally on two common datasets, Bonn and Children's Hospital Boston-MIT (CHB-MIT), and provides the highest average accuracy of 99.29%. However, the researchers did not report some measures, such as execution time.