A Network Intrusion Detection System Based on a Combination of Fractal Density Peak Clustering and an Artificial Neural Network

Imbalanced data poses a serious problem in intrusion detection systems. In this article, we propose a network intrusion detection system based on fractal density peak clustering and an artificial neural network (FD-ANN). The proposed detection system consists of three parts: data clustering based on the density-peak clustering (DPC) method, a fractal-based membership weight assigning every data point to a cluster, and a neural network to classify the data. The DPC method categorizes the training data into subgroups with strongly correlated attributes to reduce the size of the training data and the imbalance of the samples. Each subgroup has its own neural network trained on its data. Based on the fractal membership weights, the outputs of all sub-neural-network classifiers are combined with an aggregation function. This model is benchmarked on the NSL-KDD and UNSW-NB15 data sets. The proposed solution outperforms other known classification approaches in overall accuracy, recall, precision, and F1 score. https://doi.org/10.29304/jqcm.2023.15.1.1151


Introduction
The popularity of the Internet has led to the need to share and access information, and with it an increase in the amount of information and resource consumption. As data is stored in the cloud or in various databases, it is crucial to protect it and ensure its security [1]. Personal information is an essential asset that needs to be protected from attacks. Nowadays, many organizations use various strategies to authenticate and authorize access to data that should be kept secure and confidential [2]. Network attacks are becoming more numerous and diverse: ransomware is increasing like never before, and zero-day exploits are becoming so critical that they receive media attention [3]-[5].
Antivirus programs and firewalls are no longer sufficient to protect an enterprise network, which must be covered with multiple layers of security. At the same time, the massive increase in computer network usage and the development of applications running on different platforms have drawn attention to network security [6]. An intrusion detection system (IDS), which protects the network environment, is an important aspect of network security. When an IDS is used to detect threats over a network, it is called a NIDS [1], [7]. In general, NIDS are classified into two types. The first type is the signature-based or misuse-based intrusion detection system (SIDS), which detects attacks in network traffic based on recognized patterns. These patterns describe series of activities that have been collected and stored in a database. This pattern database identifies network behavior, but only attacks whose patterns are already stored in it can be detected [1].
Anomaly-based intrusion detection systems (AIDS) are the second type [8]. This system uses network behavior as the primary parameter for analyzing network transactions. The trained system accepts network transactions with predefined behavior; otherwise, it issues a warning. The ultimate goal of a network intrusion detection scheme for a flooding attack is an efficient technique to track and mitigate such an attack on the victim side [9]. In our work, several crucial questions are answered by analyzing, modeling, and implementing a hybrid scheme that achieves excellent multi-classification results and detects low-rate attacks using data mining clusters and neural network analysis.
The main motivation of this study is to develop an attack detection mechanism for information transmission over networks and to reduce the impact of zero-day attacks. Technically, the main component of an IDS is a machine learning model whose quality depends on the amount of data it has been trained on, i.e., on the amount of knowledge extracted from the training data. A machine learning classifier cannot reliably predict categories with few samples, so the data must be weighted before a decision is made. The problems facing NIDS can be summarized as follows: a firewall cannot detect most attacks, because some attacks involve a large number of ICMP messages that consume the bandwidth of the victim network or route unauthorized traffic through ports, as in ICMP flooding and TCP flooding attacks. This leads to early damage to an organization's resources, making early detection by the firewall difficult [10].
Selecting a high-quality network dataset to train the NIDS is one of the most difficult challenges in assessing the credibility of NIDS theories [11].
The presence of missing values in the data under study is a common problem in intrusion detection datasets, and it makes correct analysis impossible. Therefore, missing values must be identified and corrected before any data mining analysis is performed [12], [13].
Multi-classification and low-rate attack detection are big challenges because the actual network environment is unbalanced: threat records in network traffic are rarer than regular records, and at the same time the classifier is biased toward the more frequent data, which lowers the detection rate of low-rate attack records [14].
The proposed model (FD-ANN) uses a robust data mining cluster analysis (DPC) algorithm to solve the multi-class imbalance problem and increase the detection rate of low-rate attacks. A complex dataset from a real network environment is analyzed by improving the DPC algorithm's distance metric. Time and computational complexity are reduced by proposing a new membership matrix, the FFM, based on a fractal factor. The performance of the proposed scheme is evaluated for the different classes of attacks and for the detection rate of low-rate attacks, and the obtained results are validated against other works to show the effectiveness of the proposed scheme.

Related Works
In this section, we discuss several papers by researchers who have worked on intrusion detection techniques. Many researchers have combined data mining clustering techniques, such as point assignment, with neural or deep network classifiers to build hybrid IDS. The decrease in true-positive rate in several recent models has two main causes: either data balancing techniques were not used in the training phase, or an inefficient built-in membership function was used.
Yanqing et al. [15] combined a modified density peak clustering method (MDPCA) with Deep Belief Networks to develop a hybrid intrusion detection technique. MDPCA detects similarities in complex and large network data. To reduce the size of the training set and eliminate the imbalance of the training samples, MDPCA divides the set into numerous subgroups with similar attribute sets. Each subgroup is given its own sub-DBN classifier for training, and membership weights are used to aggregate the outputs of all sub-DBN classifiers. The MDPCA-DBN performance is evaluated on the NSL_KDDTest+, NSL_KDDTest-21, and UNSW_NB15 datasets. Using kernel methods with deep learning, the model achieves fairly good accuracy, but with high complexity. Chaofei et al. [16] proposed an intrusion detection system based on a Stacked Auto-Encoder (SAE), an attention mechanism, and a deep neural network (SAE-DNN). The attention mechanism strengthens the intrusion detection framework because the SAE represents the data in a latent layer. The SAE encoder can automatically extract features and also improves the detection accuracy of the DNN by initializing the weights of the DNN's latent layers. The NSL_KDD dataset was used to evaluate the performance of SAE-DNN in binary and multi-classification. In multi-classification, SAE-DNN outperforms machine learning approaches such as random forest and decision tree. Its limitation is that the method does not exploit the structure between attack classes, which weakens its ability to deal with zero-day attacks. Sydney et al. [17] analyzed the UNSW-NB15 intrusion detection dataset, used to train and test their models. They combine K-means clustering with the XGBoost algorithm and apply a filter-based approach to feature reduction, examining both binary and multiclass classification. The authors found that the XGBoost-based feature selection strategy improves the classifiers' test accuracy; however, static K-means methods are not well suited to intrusion detection environments. In [18], the authors develop two deep-learning models for network intrusion detection with two data preprocessing approaches, a simplified approach and a hybrid two-stage strategy, to produce meaningful features. These models use the CNN paradigm, cover both binary and multiclass classification, and combine dimensionality reduction with feature engineering through deep feature synthesis. Two benchmark datasets, NSL_KDD and UNSW_NB15, are used to evaluate the models. The limitation of using CNNs for intrusion detection is that they fail to encode object orientation and position; data in various positions are challenging for them to categorize.

Density Peak Clustering (DPC) Algorithm
The Density Peak Clustering (DPC) algorithm is one of the density-based methods. It discovers clusters from regions of high local density of data points [19] and deals with continuous regions. The original DPC algorithm starts by defining the distance between two data objects, from which it computes each point's local density and its separation distance from data objects of higher density [20]. Let X = {x_1, x_2, ..., x_n} be a dataset with n data points, where each x_i has m attributes and x_ij is the j-th attribute of data point x_i. Instead of the Euclidean distance, the distance between data points x_i and x_j can be expressed with a Gaussian-kernel distance [21]:

d_ij = sqrt(2 - 2 exp(-||x_i - x_j||^2 / (2σ^2)))    (1)

Equation (2) expresses the local density of each point in the given dataset:

ρ_i = Σ_{j≠i} exp(-(d_ij / C_d)^2)    (2)

where d_ij is the distance between points i and j and C_d is the cutoff distance, a parameter specified in advance that determines the neighborhood radius. The second step is to calculate the separation distance δ_i, the minimum distance between point x_i and any other point of higher density. Equation (3) shows this calculation [21]:

δ_i = min_{j: ρ_j > ρ_i} d_ij,  with δ_i = max_j d_ij for the point of highest density    (3)

After the local density and separation distance of every point have been computed, the method identifies the cluster centers by their anomalously high ρ_i and δ_i. The peak points (center points) are selected from a two-dimensional decision graph with local density on the horizontal axis and separation distance on the vertical axis [19]. After the center points are determined, the remaining points in the dataset are assigned to the nearest center.
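To make the steps above concrete, the following Python sketch implements the DPC pipeline as described: density, separation distance, center selection, and nearest-center assignment. The names `dpc_sketch`, `cutoff`, and `n_centers` are ours, and picking centers by the largest ρ·δ product is a common proxy for reading the decision graph, not necessarily the authors' exact procedure.

```python
import numpy as np

def dpc_sketch(X, cutoff, n_centers):
    """Minimal Density Peak Clustering sketch (illustrative, not the paper's code).

    X: (n, d) data matrix; cutoff: neighborhood radius C_d; n_centers: clusters to pick.
    Returns (labels, center_indices).
    """
    n = len(X)
    # Pairwise Euclidean distances between all points.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density (Eq. 2, Gaussian-kernel form): rho_i = sum_{j!=i} exp(-(d_ij/C_d)^2).
    rho = np.exp(-(d / cutoff) ** 2).sum(axis=1) - 1.0  # subtract the self term
    # Separation distance (Eq. 3): min distance to any point of higher density.
    delta = np.zeros(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        delta[i] = d[i].max() if higher.size == 0 else d[i, higher].min()
    # Centers: the points with anomalously large rho * delta (decision-graph proxy).
    centers = np.argsort(rho * delta)[-n_centers:]
    # Assign every remaining point to its nearest center.
    labels = np.argmin(d[:, centers], axis=1)
    return labels, centers
```

On two well-separated blobs, the two density peaks surface as centers and each blob collapses onto one label.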

Artificial Neural Networks (ANN)
Models inspired by human brain processes are called Artificial Neural Networks (ANNs) [22]. They belong to the family of parametric classifiers, with parameters such as the numbers of neurons and layers, weights, and biases, and a collection of hyper-parameters such as the number of epochs, the learning rate, and the batch size. An ANN is typically made up of an input layer, some hidden layers, and an output layer. Finally, once training is completed, the learned model must be tested to determine its effectiveness. A Multi-Layer Perceptron (MLP) is a simple feed-forward neural network that differs from a linear perceptron in its activation function and layer count. It has three kinds of layers: an input layer, a group of hidden layers, and an output layer [23], [24]. Its neurons use nonlinear activation functions and are trained using backpropagation. The MLP algorithm is technically based on two operations. First, forward propagation is used to make a prediction, calculated with Equations (4) and (5) [25]:

net_j = Σ_{i=1}^{n} w_ij x_i + b_j    (4)

y'_j = f(net_j)    (5)

where net_j is the result of the net inputs x_i multiplied by the interconnection weights w_ij, f is the activation function, n is the number of inputs, and y' denotes the actual output of the net. The second operation trains the network with the Back Propagation (BP) method, which is responsible for updating the weights of the MLP. After each training epoch, the BP method updates the weights based on the prediction error, calculated as in Equation (6):

E = (1/2) Σ_j (t_j - y'_j)^2    (6)

where t_j is the target output.
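A minimal NumPy illustration of Equations (4)-(6), using the ReLU/softmax configuration adopted later in the paper; the function names are ours, and backpropagation itself is omitted for brevity.

```python
import numpy as np

def forward(x, weights, biases):
    """One forward pass of a small MLP (Eqs. 4-5): net = W @ a + b, a = f(net).
    ReLU on hidden layers, softmax on the output layer."""
    a = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        net = W @ a + b                  # weighted sum of inputs plus bias (Eq. 4)
        if i < len(weights) - 1:
            a = np.maximum(net, 0.0)     # ReLU hidden activation (Eq. 5)
        else:
            e = np.exp(net - net.max())  # numerically stable softmax output
            a = e / e.sum()
    return a

def sse_error(target, output):
    """Sum-of-squares training error (Eq. 6): E = 1/2 * sum((t - y')^2)."""
    return 0.5 * np.sum((target - output) ** 2)
```

The softmax output is a probability vector, so its components are non-negative and sum to one.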

Proposed Fractal Density Peak Clustering and Artificial Neural Network (FD-ANN)
The proposed FD-ANN system consists of three phases: preprocessing, clustering, and prediction using an artificial neural network. Figure 1 shows the main steps of FD-ANN.

Preprocessing
In this step, one-hot encoding is used to convert the nominal attributes of both datasets into numeric attributes, and the network characteristics of normal and attack traffic are analyzed. Normalizing the dataset, i.e., scaling the values of each attribute into a certain range, is a necessary step of the preprocessing phase. This strategy has the advantage of removing biases from the datasets without affecting their statistical properties. To train the proposed model, the data are divided into two groups: train and test.
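The two preprocessing steps can be sketched as follows; `one_hot` and `min_max` are illustrative helpers, not the authors' code.

```python
import numpy as np

def one_hot(column):
    """Map a nominal attribute (e.g. protocol_type) to one-hot numeric columns."""
    categories = sorted(set(column))
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(column), len(categories)))
    for row, value in enumerate(column):
        out[row, index[value]] = 1.0
    return out, categories

def min_max(X):
    """Scale each numeric attribute into [0, 1] without changing its distribution shape."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (X - lo) / span
```

Min-max scaling is one common choice of normalization range; the paper does not specify which scaler it uses.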

Clustering
The core idea of DPC for computing cluster centers is based on measuring local density and separation distance. These two metrics are based on the distance between any two points in the dataset. The original DPC calculates this distance with the Euclidean metric; however, the Euclidean distance leads to misclassification when the dataset is complex and has high-dimensional features. Therefore, the Gaussian kernel distance is used to calculate the distance between two points in high-dimensional data. This is the most important step in handling the data geometry, increasing accuracy and decreasing the false alarm rate.
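One plausible form of the Gaussian-kernel distance is the distance between the two points in the kernel's feature space; the paper does not print the formula, so this exact expression and the `sigma` bandwidth parameter are assumptions.

```python
import numpy as np

def gaussian_kernel_distance(xi, xj, sigma=1.0):
    """Feature-space distance under an RBF kernel:
    d = sqrt(K(xi,xi) - 2*K(xi,xj) + K(xj,xj)) = sqrt(2 - 2*exp(-||xi-xj||^2 / (2*sigma^2)))."""
    k = np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))
    return np.sqrt(2.0 - 2.0 * k)
```

Unlike the raw Euclidean distance, this metric is bounded by sqrt(2), which dampens the influence of extreme high-dimensional outliers while remaining monotone in the Euclidean distance.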

Prediction of Test Data by the Fractal Membership Function (FFM) and ANN
After assigning each point in the training data to the most appropriate cluster, we calculate the membership between a test sample and the data points in each cluster. Computing this membership naively requires visiting every point in every cluster, which costs processing time and computational complexity; therefore, a novel membership function based on a fractal factor is proposed to overcome this. The process extracts sub-clusters (FDC'_1, FDC'_2, ..., FDC'_N) from the main clusters (DC_1, DC_2, ..., DC_N); the points of each sub-cluster have highly similar behavior to the test sample. The FFM is inversely proportional to the distance d between the test point and the nearest point in the sub-cluster FDC'_i: a cluster point with a small distance to the test sample receives a greater FFM degree than one with a large distance. Algorithm 1 illustrates the fractal membership function. The result of the previous step is two sub-clusters, DC_1 and DC_2, from the training dataset. Each sub-cluster is used to train its own ANN classifier; since they are trained on different subsets, these ANNs differ from each other in their input, hidden, and output layers. The proposed FFM reduces the search space for the points used to represent a cluster. Figure 2 shows how the points in cluster i used to construct the fractal membership function (FFM) are picked.
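As a hedged sketch of the FFM idea described above: the code below assumes the sub-cluster is the set of points within a Delta-F-scaled band around the closest point, and makes the membership inversely proportional to the nearest distance. This filtering rule is our reading of the text, not Algorithm 1 verbatim, and `delta_f=0.2` is the value the paper later reports as optimal.

```python
import numpy as np

def fractal_membership(test_point, clusters, delta_f=0.2, eps=1e-9):
    """Hypothetical FFM sketch: inverse-distance membership over sub-clusters.

    clusters: list of (n_i, d) arrays, one per main cluster DC_i.
    Returns normalized membership weights, one per cluster.
    """
    raw = []
    for C in clusters:
        d = np.linalg.norm(C - test_point, axis=1)
        # Sub-cluster FDC'_i: points inside a delta_f-scaled band around the nearest point.
        # In a real implementation this band would prune the search; here it
        # illustrates the reduced search space.
        band = d <= d.min() * (1.0 + delta_f)
        raw.append(1.0 / (d[band].min() + eps))  # inverse-distance weight
    raw = np.array(raw)
    return raw / raw.sum()  # normalize into membership weights
```

A test sample near cluster A thus receives a larger membership for A than for a distant cluster B, and the weights always sum to one.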

Aggregation Model
Aggregation is the process of connecting modules by linking the output of one module to that of another. In our proposed system, the output of each sub-ANN is multiplied by its corresponding fractal membership weight, and the weighted outputs are then combined. The prediction of test sample x_j by each sub-classifier ANN_i is denoted ANN_i(x_j). Figure 3 illustrates how the aggregation method predicts the final output.
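A minimal sketch of this step, assuming the aggregation is a membership-weighted sum of the sub-classifier softmax outputs followed by an argmax over classes (the paper's figure is not reproduced here, so the exact combination rule is an assumption):

```python
import numpy as np

def aggregate(memberships, sub_predictions):
    """Membership-weighted aggregation of the sub-ANN outputs:
    final(x_j) = sum_i mu_i * ANN_i(x_j); the predicted class is the argmax."""
    combined = sum(mu * p for mu, p in zip(memberships, sub_predictions))
    return combined, int(np.argmax(combined))
```

Because the memberships sum to one and each sub-ANN outputs a probability vector, the combined vector is itself a valid probability distribution over classes.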

Data Sets Description
The NSL_KDD [26] and UNSW_NB15 [27] datasets consist of a large number of packets; the first contains four major attack categories, while the second has two million packets and contains nine major attack categories. The features of both datasets include TCP/IP header information. The following subsections present the details of these datasets.

NSL_KDD Data Set
The NSL_KDD data set is not the most recent, but it is a revised version that addresses the shortcomings of the KDD Cup 1999 data set. It consists of four files: NSL_KDDTrain+, the complete training data; 20 Percent Training Set, a training subset; and two testing files, NSL_KDDTest+ and NSL_KDDTest-21. The training and testing files used by our system are shown in Table 1.

UNSW_NB15 Data Set
The Australian Centre for Cyber Security (ACCS) developed this dataset to test network intrusion detection, solving the problems of older benchmark datasets such as KDD_CUP99. The UNSW_NB15 dataset contains a combination of real-world and synthetic attack activities in the network traffic. In comparison to the NSL_KDD dataset, the UNSW_NB15 has more forms of attacks. It consists of two million packets, each with 49 attributes plus the class label, while the partitions UNSW_NB15_training-set.csv and UNSW_NB15_testing-set.csv contain only 42 attributes plus the class label. Six attributes (srcip, sport, dstip, dsport, Stime, and Ltime) are removed from the partitions relative to the full dataset. In our proposed system, we use the random-state method to choose 20% of the UNSW_NB15_training-set partition (35,069 records) as training data and 20% of the UNSW_NB15_testing-set partition (16,466 records) as testing data, as shown in Table 2.

Evaluation Metrics
A confusion matrix can be used to calculate the performance measures for intrusion detection. The confusion matrix summarizes a classifier's predicted performance on test data and determines all the standard quantities: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). In cyber security, TN is the number of cases correctly predicted as the Normal class, TP is the number of instances correctly predicted as attacks, FP is the number of normal occurrences identified as attacks, and FN is the number of attack instances classified as normal. To evaluate the proposed system, the standard metrics accuracy, precision, recall, and F1 score are used, while the Silhouette Index is used to evaluate the quality of the clusters [27].
The Silhouette Index of an object i is s(i) = (y(i) - x(i)) / max(x(i), y(i)), where x(i) is the average distance between object i and the other objects of its own cluster and y(i) is the lowest average distance between object i and the objects of any other cluster.
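The four confusion-matrix metrics can be computed directly from the TP, TN, FP, and FN counts defined above:

```python
def ids_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics used in the evaluation."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)       # how many flagged attacks really were attacks
    recall = tp / (tp + fn)          # how many real attacks were caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1
```

For example, with 90 attacks caught, 80 normals passed, 10 false alarms, and 20 missed attacks, precision is 0.9 while recall drops to roughly 0.82.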

Analysis of the FFM Effect on the Performance of the Clustering Model
After the preprocessing phase, we used the DPC algorithm to dynamically divide the training dataset into two clusters using decision and sorting graphs based on feature similarity, in order to break the imbalance of the training data and improve the detection rate of low-rate classes. The peak points are determined by plotting all data points in a decision graph and a sorting graph, as shown in Figure 4 for the NSL_KDD data set and Figure 5 for the UNSW_NB15 data set. The training data are distributed to each peak according to the greatest feature similarity, which breaks the imbalance of the training data. Experiments evaluate the improved DPC's selection performance on both datasets; it enhances detection performance and increases the homogeneity of the training-data distribution in the clustering process. After assigning each point in the training data to its most appropriate cluster, the next step is to calculate the fractal membership matrix. In general, the system searches for the point nearest to the test point T_j in each cluster C_i. This operation requires visiting and examining every point in cluster C_i to select the nearest one, so it consumes a large amount of time and computation. We therefore propose gathering the data whose behavior approximates that of the test point into one sub-cluster of C_i. The novel fractal membership matrix (FFM) is proposed for this purpose. It is based on the fractal phenomenon, rather than on computing the distance between each test sample and all training points in each cluster, which is computationally complex and time-consuming. To solve this problem, we apply the fractal factor (Delta-F) between the test sample T_j and each point in cluster C_i; Delta-F determines the similarity ratio between the fractal of the test point and the fractals of the points in cluster C_i. Several values of the proposed fractal metric were tested; the weights used for Delta-F are 0.2, 0.4, 0.6, and 0.8, and the optimal value is 0.2. Figure 8 shows the effect of Delta-F on the time required to build the membership matrix for the 20 Percent Training Set of NSL_KDD. The experiment shown in Table 3 uses 25,192 records (20% of the full NSL_KDDTrain+) as training data and NSL_KDDTest-21 and NSL_KDDTest+ as testing data, demonstrating how the proposed fractal metric minimizes the time required to build the membership matrix.

ANN Classifiers Analysis in a Training Stage for both Data Sets
Each cluster is trained on a separate ANN classifier. For both the NSL_KDD and UNSW-NB15 data sets, the structure is: two hidden layers, the Adam optimizer, ReLU activation in the hidden layers, and softmax activation in the output layer.
The numbers of neurons in the hidden layers are [80, 10] for the NSL_KDD data set and [120, 20] for the UNSW-NB15 data set.
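The stated topology can be sketched in NumPy as follows; random initialization stands in for the trained weights, and the input and output widths used here (122 one-hot-encoded features and 5 classes for NSL_KDD) are assumptions that depend on the dataset encoding.

```python
import numpy as np

def make_mlp(sizes, seed=0):
    """Random weights/biases for an MLP with the given layer widths,
    e.g. NSL_KDD hidden layers [80, 10] or UNSW-NB15 hidden layers [120, 20]."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(m))
            for n, m in zip(sizes, sizes[1:])]

def predict(params, x):
    """Forward pass: ReLU hidden layers, softmax output, matching the stated configuration."""
    a = x
    for i, (W, b) in enumerate(params):
        z = W @ a + b
        if i < len(params) - 1:
            a = np.maximum(z, 0.0)       # ReLU
        else:
            e = np.exp(z - z.max())      # softmax
            a = e / e.sum()
    return a

# NSL_KDD-style classifier: assumed 122 input features -> 80 -> 10 -> 5 classes.
nsl = make_mlp([122, 80, 10, 5])
```

Swapping the size list to `[n_features, 120, 20, n_classes]` yields the UNSW-NB15 configuration.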

Results Comparison
Table 4 presents the overall performance of the proposed system in terms of accuracy, precision, recall, and F1 score on the three testing data sets. The experiments on these data sets show how well our proposed system performs. We compared the results to five established machine learning classifiers: K-Nearest Neighbor (KNN), Random Forest (RF), Support Vector Machine (SVM), Decision Tree (DT), and Artificial Neural Network (ANN). The comparisons are evaluated on the four metrics (accuracy, precision, recall, F1 score), and the results are shown in Tables 5, 6, and 7.

Comparison with the Related Works
To demonstrate that our system handles the multi-class imbalance problem by enhancing the detection rate of low-rate classes such as R2L, U2R, Analysis, Backdoors, Worms, and Shellcode, we compare it to nine intrusion detection models. On the NSL_KDDTest-21, NSL_KDDTest+, and UNSW_NB15 data sets, our proposed system outperforms them in accuracy, precision, recall, F1 score, and the detection rate of low-rate classes. The comparison results on these datasets are shown in Tables 8, 9, and 10, respectively.

Fig. 2 - Example of the points selected for the Fractal Membership function.

Figure 6 shows the two clusters of the NSL_KDD training dataset, and Figure 7 shows the cluster description of the UNSW_NB15 data set with full features.

Fig. 8 - Effect of Delta-F on the time required to build the membership matrix.

Table 1 - Description of the 20% NSL_KDD training and full testing data set files.
Training dataset: 20% of the full NSL_KDDTrain+
Testing datasets: NSL_KDDTest-21, NSL_KDDTest+
The NSL_KDDTrain+ dataset has 22 different sub-types of attacks, aggregated into four major types. In comparison, the NSL_KDDTest-21 and NSL_KDDTest+ testing datasets have 37 sub-types of attacks, which means the testing data contain an additional 15 novel, unknown attacks that are not available in the training data.

Table 6 - Results comparison of different algorithms using the NSL_KDDTest+ dataset (columns: Model, DoS, Normal, Probe, R2L, U2R, Accuracy, Precision, Recall, F1 score).
As shown in Table 6, our proposed system achieves high detection rates on the NSL_KDDTest+ dataset for low-rate classes such as the U2R and R2L attack types, with detection accuracies of 22.0% and 74.9%, respectively. It also produces a higher overall accuracy, 89.96%, than the other compared methods.

Table 7 - Results comparison of different algorithms using the UNSW_NB15 dataset.
As shown in Table 7, our proposed system achieves high detection rates on the UNSW_NB15 dataset for low-rate classes such as the Analysis, Backdoors, Shellcode, and Worms attack types, with detection accuracies of 88.7%, 96.4%, 45.2%, and 62.5%, respectively. It produces a higher overall accuracy, 94.92%, than the other compared methods.