Eye Blinking for Command Generation Based on Deep Learning



Introduction
Nerve cells become damaged in illnesses such as locked-in syndrome, stroke, and amyotrophic lateral sclerosis (ALS). People with these disorders have limited mobility or are completely immobile, and they have no control over any part of their body except eye movement and blinking. They are unable to speak, write notes, or express ideas to those around them because they have lost control of nearly all voluntary muscles. In this scenario, eye movements appear to be the most appropriate, if not the sole, means of communication. Eye blinks have been used extensively in previous studies, particularly for control purposes [1]. A considerable number of recent articles covering a variety of applications show the current interest in eye state and blink detection. Eye monitoring and eye state detection, for example, are used in driver assistance systems to gauge driver attention and in the field of psychology [2]. Blinking is a natural biological action in which the eyelid closes quickly and semi-automatically. The dynamics of eyelid closure are used to characterize an individual blink. Blinking is an important function of the eye that helps spread tears across the cornea and remove irritants from its surface [3]. Speech difficulties can be caused by medical conditions such as stroke or paralysis. They can also result from accidents that cause people to lose their ability to communicate. According to recent studies in the United States of America, over a million people are affected by such issues. Mouth-actuated joysticks, tongue motion analysis, switches positioned near the user's head, breath-actuated straws, and other technologies exist to enable patient communication. However, because using this equipment requires professional assistance, these solutions prove to be costly and uncomfortable [4].
There are two types of techniques for classifying eye states: non-image-based and image-based methods. Non-image-based methods use data from electroencephalography (EEG), electrooculography (EOG), and electromyography (EMG) to classify eye states. Although data acquisition is faster, these methods have certain drawbacks, including the requirement to wear sensors, which can be uncomfortable. Image-based approaches have been developed to overcome these disadvantages [5]. The method proposed in this paper is image-based. Images acquired from the webcam are in the RGB color space. The RGB model consists of the three additive primaries red, green, and blue; these components are combined to produce any other color. A color image is a collection of numbers arranged in rows and columns and contains a significant amount of data [6][7].
Artificial Neural Networks (ANNs) have been used to solve a variety of problems [3,5], and they have been shown to outperform standard methods when coping with noisy or missing data [8]. Today, deep learning models are employed in a variety of domains, including image classification and pattern matching [9]. To address the limitations of learning that depends on manual feature extraction, deep learning (DL) based classification algorithms have recently been developed. The application of these DL approaches has been made possible by recent advances in processing capability and the introduction of high-performance Graphics Processing Units (GPUs). Deep belief networks [10], stacked autoencoders [11], and convolutional neural networks [12] are examples of DL architectures that have been shown to give good accuracy in the fields of computer vision, image processing, and natural language processing [5].

2. Related Work:
In 2012, Udayashankar A et al. proposed a real-time interactive system that helps paralyzed people operate appliances such as lights and fans, as well as play pre-recorded audio messages, by blinking a certain number of times. Image processing methods were used to identify eye blinks. Face tracking is achieved using a set of trained Haar cascade classifiers, and eye tracking is done using a template matching technique [13].
In 2017, Essa R. Anas et al. proposed a new method for detecting eye state. Unlike the majority of previous methods, this method does not depend on an explicit eye appearance model. The detection is instead based on deep learning, in which the discriminant function is trained from a large set of exemplar images of eyes in various states, appearances, and 3D positions. The technique uses a Convolutional Neural Network (CNN) architecture. To evaluate its effectiveness, the method was compared with two other methods. The CNN architecture was then adapted to a three-class problem with "opened," "closed," and "partially-opened" classes [2].
In 2019, Kapil Juneja and Chhavi Rana presented a model to detect potential eye blinks. The model is divided into three stages. In the first stage, frame similarity evaluation, background separation, and positional and mathematical filters are combined to recognize the appropriate eye region in distinctive frames. In the second stage, the model accepts real-time facial video as input and extracts the effective frames and the eye region. Statistical filters are first used to determine the effective frames by analyzing frame dissimilarity. To segment the eye region, the effective frames are then evaluated using positional or statistical filters. The findings showed that the proposed approach reduced the likelihood of errors and accurately detected eye blinks [14].
The best way to avoid visual problems caused by digital screens is to take appropriate preventive measures, such as seeing an eye doctor regularly. In 2019, Sree Sharmila T. et al. suggested using the Viola-Jones algorithm for eye detection, background subtraction for eye blink detection, and gradient-based corner detection. By measuring the eye blink rate, the method is able to detect frequent cases of fatigue associated with prolonged computer use. As a result, the suggested approach has the potential to considerably reduce symptoms among regular computer users, resulting in better health behaviors [15].
In 2019, M. H. Baccour et al. proposed an algorithm for driver sleepiness monitoring based on an eye blink detection method. They presented an adaptive camera-based eye blink detection system for determining the level of tiredness while driving. The data were gathered using a remote camera during driving simulator experiments. Evaluation showed that the algorithm is robust to inter-individual variation and performs consistently in both the awake and drowsy states. As future work, the authors planned to test the algorithm in real-world driving scenarios, extract features from the recorded blinks, and develop classification algorithms to detect drowsiness based on blinking behavior [16].
Driver drowsiness is one of the major causes of accidents and fatal road crashes, with a high human and economic cost. In 2020, A. Arcaya et al. proposed embedding a convolutional neural network (CNN)-based solution in smart connected glasses to detect eye blinks and use them to determine the drowsiness level. This solution was compared with a more traditional method based on a detection threshold mechanism. The results showed that the CNN outperforms the threshold-based algorithm in accuracy by more than 7% [17].
In 2021, Pothuraju Vishesh et al. presented a study to identify drowsiness in drivers using a CNN, deep learning principles, and image processing. MobileNetV2 is used as the base for training the blink detection model. RMSprop was employed as the optimizer, and binary cross-entropy as the training loss function. The dlib facial landmark detector was used to detect and pre-process faces. The dataset for the deep learning model was obtained from Nanjing University of Aeronautics and Astronautics (Xiaoyang Tan). The proposed method achieved 97 percent accuracy in the experiments. The prototype will be used to further refine the method in order to improve traffic safety [3].
In 2021, Nada B. Jarah assessed the efficacy of a new structure she proposed for distributed deep learning by comparing learning with a Convolutional Neural Network (CNN) against learning with a distributed learning system. Compared with using a single sensor for all computations, the proposed method achieves decentralization [18].
In 2021, S. N. Deshpande et al. built a computer vision program capable of detecting eye blinks in streaming video and measuring their duration, then determining whether each blink is a dash or a dot based on its duration. The sequence of dashes and dots is saved in an array and subsequently decoded into conventional text. The text is converted to audio using the pyttsx Python package, so that the system delivers a message to another person [19].
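The dot/dash scheme described in [19] can be illustrated with a minimal Python sketch. It is not the cited authors' implementation. It assumes International Morse code (suggested by the dot/dash wording), a hypothetical duration threshold of 0.4 s, and blink durations that have already been grouped per letter. The names `classify_blink` and `decode_blinks` are introduced here for illustration only.

```python
# Hypothetical sketch: blink durations -> dots/dashes -> text.
# The 0.4 s threshold and the per-letter grouping are assumptions,
# not values taken from the cited work.

MORSE_TO_CHAR = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E",
    "..-.": "F", "--.": "G", "....": "H", "..": "I", ".---": "J",
    "-.-": "K", ".-..": "L", "--": "M", "-.": "N", "---": "O",
    ".--.": "P", "--.-": "Q", ".-.": "R", "...": "S", "-": "T",
    "..-": "U", "...-": "V", ".--": "W", "-..-": "X", "-.--": "Y",
    "--..": "Z",
}


def classify_blink(duration_s, threshold_s=0.4):
    """Return '-' for a long blink and '.' for a short one."""
    return "-" if duration_s >= threshold_s else "."


def decode_blinks(letters, threshold_s=0.4):
    """Decode a list of letters, each given as a list of blink durations (s)."""
    text = []
    for durations in letters:
        symbol = "".join(classify_blink(d, threshold_s) for d in durations)
        # Unknown dot/dash patterns are marked with '?' instead of failing.
        text.append(MORSE_TO_CHAR.get(symbol, "?"))
    return "".join(text)
```

For example, three short blinks, three long blinks, and three short blinks, grouped as three letters, decode to "SOS" under these assumptions.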

Proposed System:
In the proposed system, a video stream acquired through a web camera is used as input, and the person should directly face the camera. Figure (1) shows the block diagram of the presented command code generation system for paralyzed people.

CNN architecture:
In this paper, a new CNN structure is proposed. It consists of six layers (three convolutional layers and three max pooling layers) for learning features (edges) and four layers for binary classification. Each convolutional layer is followed by a nonlinearity, applying the ReLU activation function to the output of the convolution.
- Input: the input stage of the CNN consists of images of dimension 86×86×3.
- Convolutional layers: the first two convolutional layers each consist of 32 convolutional filters of size 3×3; the third convolutional layer consists of 40 convolutional filters of size 5×5.
- Max pooling layers: the pooling process helps extract features that do not change under small translations, and reducing the feature size reduces the number of parameters to be adjusted. The pooling size in the max pooling layers is 2×2. That is, the feature matrix produced by the convolution is divided into blocks of size 2×2 and the maximum value in each block is kept, so each spatial dimension of the feature matrix is halved.
- Fully connected layers: the first and second of the three FC layers used in this architecture consist of 128 and 64 units, respectively, with the ReLU activation function. The third FC layer contains one unit, matching the binary classification task, and its activation function is sigmoid.
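The layer sizes listed above can be checked with a short calculation. The following Python sketch is illustrative only and assumes details the text does not state: each convolution is followed directly by a pooling layer, convolutions use stride 1 without padding, and pooling is non-overlapping. The helper names are hypothetical.

```python
# Sketch of the feature-map sizes and parameter count implied by the layer
# list. Assumptions (not stated in the text): conv -> pool alternation,
# stride-1 convolutions without padding, non-overlapping 2x2 pooling.

def conv_out(size, kernel):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after non-overlapping pooling (floor division)."""
    return size // pool


def max_pool_2x2(matrix):
    """Keep the maximum of each 2x2 block of a 2-D list."""
    rows, cols = len(matrix) // 2, len(matrix[0]) // 2
    return [
        [max(matrix[2 * r][2 * c], matrix[2 * r][2 * c + 1],
             matrix[2 * r + 1][2 * c], matrix[2 * r + 1][2 * c + 1])
         for c in range(cols)]
        for r in range(rows)
    ]


def describe_network(input_size=86, input_channels=3):
    """Return per-block output shapes and the total trainable parameters."""
    conv_layers = [(32, 3), (32, 3), (40, 5)]  # (filters, kernel) from the text
    dense_units = [128, 64, 1]                 # FC layer widths from the text
    size, channels, params = input_size, input_channels, 0
    shapes = []
    for filters, kernel in conv_layers:
        params += kernel * kernel * channels * filters + filters  # weights + biases
        size = pool_out(conv_out(size, kernel))
        channels = filters
        shapes.append((size, size, channels))
    features = size * size * channels          # flattened feature vector
    for units in dense_units:
        params += features * units + units     # weights + biases
        features = units
    return shapes, params
```

Under these assumptions the feature maps shrink from 86×86×3 to 42×42×32, 20×20×32, and 8×8×40, which gives 2560 flattened features and 378,313 trainable parameters. A different padding or stride choice would change these numbers.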

Command Code Table:
The command table (table (1)) is built from four-bit codes. The four bit positions correspond, in order, to: short blink, long blink, short blink, and long blink. If a bit's value is 1, the corresponding blink is present in the command code; if it is 0, that blink is absent. For example, the command code 0 1 1 1 indicates a long blink, a short blink, and a long blink in succession (LSL).
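The mapping from a four-bit code to a blink sequence can be sketched in a few lines of Python. This is a literal reading of the description above, not the authors' implementation. The function name `code_to_blinks` and the letters S and L for short and long blinks are chosen here for illustration.

```python
# Sketch of the four-bit command code: bit positions map, in order, to
# short, long, short, long blink. A 1 means that blink is present.
BLINK_SLOTS = ("S", "L", "S", "L")


def code_to_blinks(code):
    """Convert a code such as '0 1 1 1' or '0111' to a blink sequence."""
    bits = code.replace(" ", "")
    if len(bits) != 4 or set(bits) - {"0", "1"}:
        raise ValueError("expected four bits, e.g. '0111'")
    # Keep only the slots whose bit is set, preserving their order.
    return "".join(slot for bit, slot in zip(bits, BLINK_SLOTS) if bit == "1")
```

With this reading, `code_to_blinks("0 1 1 1")` yields `"LSL"`, matching the example in the text. Codes such as 0001 and 0100 both yield a single long blink under this reading; the sketch does not attempt to distinguish them.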

Results and Discussions:
The proposed system was built and implemented. The proposed modified CNN architecture was trained and tested for 25 and 62 epochs on two datasets, the mrlEye2018 dataset and the CEW dataset, respectively. Validation accuracy was 99% for both datasets, and accuracy was more than 97% and 96% for the mrlEye2018 and CEW datasets, respectively.
The loss and val_loss on the first dataset were 0.13 and 0.1, respectively. For the second dataset, loss and val_loss were 0.19 and 0.13, respectively. Figures (3) and (4) show the loss metric on validation data from the mrlEye2018 and CEW datasets.
When the proposed CNN architecture is compared with pre-trained CNN architectures such as VGG16, Inception V3, and ResNet, and with the previous works [3] and [20] that were tested on the same datasets (mrlEye2018 and CEW), the proposed CNN architecture achieves a higher validation accuracy of more than 99%, as shown in table (2).

Model                            Accuracy
Proposed CNN                     99%
Pothuraju Vishesh et al. [3]     97%

The command generation system was tested in real time by 10 subjects using a webcam after they were trained on table (1). Each person's blink sequence was counted after every four blinks. The accuracy of command generation was 94%.

Conclusion:
The goal of this research was to create and implement an algorithm to assist paralyzed patients based on their eye blinking. The proposed CNN model outperforms the other models in terms of accuracy. For the command generation table, ten individuals were trained to produce blink sequences. The first command, for example, was a long blink, and the third command was a short blink followed by a long blink, and so on. The eye blinks were transformed into binary codes that represent specific commands according to the command table, with 6% errors.

Fig. (2) illustrates the modified CNN architecture. Fig. (4) shows the proposed CNN model's evaluation metrics on both training and validation data for both datasets (mrlEye2018 and CEW).