
  • Open access
  • Published: 09 October 2023

Sign language recognition using the fusion of image and hand landmarks through multi-headed convolutional neural network

  • Refat Khan Pathan 1,
  • Munmun Biswas 2,
  • Suraiya Yasmin 3,
  • Mayeen Uddin Khandaker, ORCID: orcid.org/0000-0003-3772-294X 4,5,
  • Mohammad Salman 6 &
  • Ahmed A. F. Youssef 6

Scientific Reports volume 13, Article number: 16975 (2023)


  • Computational science
  • Image processing

Sign language recognition is a breakthrough for communication in the deaf-mute community and has been a critical research topic for years. Although some previous studies have successfully recognized sign language, they require many costly instruments, including sensors, devices, and high-end processing power. Such drawbacks can be largely overcome by employing artificial intelligence-based techniques. Since, in this modern era of advanced mobile technology, using a camera to take video or images is much easier, this study demonstrates a cost-effective technique to detect American Sign Language (ASL) using an image dataset. Here, the “Finger Spelling, A” dataset has been used, with 24 letters (excluding j and z, as they involve motion). The main reason for using this dataset is that its images have complex backgrounds with different environments and scene colors. Two layers of image processing have been used: in the first layer, images are processed as a whole for training, and in the second layer, the hand landmarks are extracted. A multi-headed convolutional neural network (CNN) model has been proposed to train these two layers and has been tested with 30% of the dataset. To avoid overfitting, data augmentation and dynamic learning rate reduction have been used. With the proposed model, a test accuracy of 98.981% has been achieved. It is expected that this study may help to develop an efficient human–machine communication system for the deaf-mute community.


Introduction

Spoken language is the medium of communication for the majority of the population, but a section of the population cannot use it to communicate with others. Deafness is a disability that impairs hearing and makes a person unable to hear, while muteness is a disability that impairs speech and makes a person unable to speak; neither affects a person's other abilities. Communication is the main barrier that isolates them from the rest of society 1 . Because there are so many spoken languages in the world, a distinct language is needed for them to express their thoughts and opinions in a way that is understandable to others, and such a language is called sign language. Understanding sign language is an arduous task, an ability that must be learned through training.

Many methods are available that use different inputs such as images (2D, 3D), sensor data (hand gloves 2 , Kinect sensors 3 , neuromorphic sensors 4 ), and videos. In practice, captured images are often excessively noisy, so a high level of pre-processing is required. The available online datasets are usually already processed or captured in a lab environment, where it becomes easy for recent advanced AI models to train and evaluate, making them prone to errors in real-life applications with different kinds of noise. Accordingly, there is a basic need for a model that can deal with noisy images and still deliver good results. Different sorts of machine learning methods can be utilized for the classification and recognition of images. Apart from recognizing static images, work has also been done on depth-camera detection and video processing 5 , 6 , 7 . Various procedures embedded in such systems have been implemented in different programming languages to maximize the final system's effectiveness. The problem can be addressed and deliberately organized into three comparable methodologies: first, using static image recognition techniques and pre-processing procedures; second, using deep learning models; and third, using Hidden Markov Models.

Sign language serves this part of the community and enables smooth communication among people who have difficulty speaking and hearing. They use hand gestures along with facial expressions and body movements to interact. Yet, not many people outside this community become familiar with sign language gestures 8 . Hand gestures constitute a significant part of the sign language vocabulary, while facial expressions and body movements play the role of emphasizing the words and phrases expressed by hand gestures. Hand gestures can be static or dynamic 9 , 10 . There are methodologies for motion detection utilizing the dynamic vision sensor (DVS), a technique similar to the framework introduced in this work. For example, Arnon et al. 11 presented an event-based gesture recognition system, which processes the event stream using a natively event-based processor from International Business Machines called TrueNorth. They use a temporal filter cascade to create spatio-temporal frames that a CNN executes on the event-based processor, and they reported an accuracy of 96.46%. But in a real-life scenario, the background is not static, so the stated power-saving process might not work properly. Jun Haeng Lee et al. 12 proposed a motion classification method with two DVSs to obtain a stereo-vision system; they used spiking neurons to handle the incoming events and face the same real-life issue. Static hand signals, also called hand postures, are formed by different shapes and orientations of the hands without conveying any motion information, whereas dynamic hand gestures comprise a sequence of hand postures with associated motion information 13 . Using facial expressions, static hand images, and hand signals, sign language provides the tools to communicate much as spoken languages do; there are also different kinds of sign languages 14 .

In this work, we have applied a fusion of traditional image processing with extracted hand landmarks, trained on a multi-headed CNN so that the two branches can complement each other's weights at the concatenation layer. The main objective is to achieve a better detection rate without relying on a traditional single-channel CNN. This approach has been proven to work well with less computational power and fewer epochs on medical image datasets 15 . The rest of the paper is organized as follows: the literature review in "Literature review", materials and methods in "Materials and methods" with three subsections (dataset description in "Dataset description", image pre-processing in "Pre-processing of image dataset", and working procedure in "Working procedure"), result analysis in "Result analysis", and the conclusion in "Conclusion".

Literature review

State-of-the-art techniques have centered on utilizing deep learning models to achieve good accuracy with low execution time. CNNs have shown huge improvements in visual object recognition 16 , natural language processing 17 , scene labeling 18 , medical image processing 15 , and so on. Despite these accomplishments, there is relatively little work on applying CNNs to video classification, partly because of the difficulty of adapting CNNs to combine both spatial and temporal information. A model using special hardware components such as a depth camera has been used to capture the depth variation in the image as an extra feature for correlation, with a CNN built on top for classification 19 , but it still has low accuracy. An innovative technique that does not need a pre-trained model was created using a capsule network and adaptive pooling 11 .

Furthermore, it was shown that reducing the number of CNN layers in a greedy manner and building a deep belief network produced better outcomes than other fundamental methodologies 20 . Feature extraction using the scale-invariant feature transform (SIFT) and classification using neural networks have also been developed to obtain good results 21 . In one method, the images were converted to an RGB scheme, the data was augmented using a motion-depth channel, and finally 3D recurrent convolutional neural networks (3DRCNN) were used to build a working system 5 , 22 , where Canny edge detection with Oriented FAST and Rotated BRIEF (ORB) has been used. The ORB feature detection technique and a K-means clustering algorithm create a bag-of-features model for all descriptors; however, with a plain background the approach becomes totally dependent on the easy-to-detect edges, and if the edges give wrong information, accuracy falls, which remains the main problem to solve.

In recent years, deep learning approaches have become standard for improving the recognition accuracy of sign language models. Using a Faster Region-based Convolutional Neural Network (Faster-RCNN) 23 , a CNN model is applied for hand recognition in the image data. Rastgoo et al. 24 proposed a method where they cropped the image properly, used a fusion of RGB and depth images with a restricted Boltzmann machine (RBM), added two noise types (Gaussian noise and salt-and-pepper noise), and prepared the data for training. As biologically inspired deep learning models, CNNs accomplish all three stages with a single framework trained from raw pixel values to classifier outputs, but extreme computation power is needed. The authors in ref. 25 proposed 3D CNNs in which the third dimension joins spatial and temporal stamps; the network accepts several neighboring frames as input and performs 3D convolution in the convolutional layers. Following similar ideas, the study reported in ref. 26 proposed regularizing the outputs with high-level features and combining the predictions of a wide range of models. They applied the developed models to recognize human actions and achieved better performance than benchmark methods, but it is not certain that this works with hand gestures, as they detected the face first and then the body movement 27 .

On the other hand, Microsoft and Leap Motion have developed distinct approaches to identify and track a user's hand and body movements with the Kinect and the Leap Motion Controller (LMC), respectively. Kinect recognizes the body skeleton and tracks the hands, whereas the LMC detects and tracks hands with its built-in cameras and infrared sensors 3 , 28 . Using this framework, Sykora et al. 7 utilized the Kinect system to capture the depth data of 10 hand gestures and classified them using speeded-up robust features (SURF), reaching an accuracy of 82.8%; however, the approach was not tested on a more extensive database, and the modified feature extraction methods (SIFT, SURF) can be non-invariant to the orientation of gestures. Likewise, Huang et al. 29 proposed a 10-word ASL recognition system utilizing Kinect, achieving a precision of 97% with an SVM under tenfold cross-validation using a set of frame-independent features, but the most significant problem in this method is segmentation.

In summary, most of the models in the literature either depend on a single input modality or require high computational power. Moreover, the datasets chosen for training and validating these models have plain backgrounds, which are easier to detect. Our main aim is to reduce the computational power needed for training and the dependency of model training on a single input layer.

Materials and methods

Dataset description

Using a generalized single-color background to classify sign language is very common. We intended to avoid that single-color background and instead use complex backgrounds with many users' hand images to increase the detection difficulty. That is why we have used the “ASL Finger Spelling” dataset 30 , which has images of different sizes and orientations against complex backgrounds, with over 500 images per sign (24 signs in total) from 4 users (non-native to sign language). This dataset contains separate RGB and depth images; we have worked with the RGB images in this research. The photos were taken in 5 sessions with the same background and lighting. The dataset details are shown in Table 1, and some sample images are shown in Fig. 1.

Figure 1. Sample images from a dataset containing 24 signs from the same user.

Pre-processing of image dataset

Images were pre-processed for two operations: preparing the original image training set and extracting the hand landmarks. Traditional CNN has one input data channel and one output channel. We are using two input data channels and one output channel, so data needs to be prepared for both inputs individually.

Raw image processing

In raw image processing, we converted the images from RGB to grayscale to reduce color complexity. Then we used a 2D kernel matrix to sharpen the images, as shown in Fig. 2. After that, we resized the images to 50 × 50 pixels for evaluation through the CNN. Finally, we normalized the grayscale values (0–255) by dividing the pixel values by 255, so the new pixel array contains values in the range (0–1). The primary advantage of this normalization is that the CNN trains faster on inputs in the (0–1) range than on other ranges.
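These steps map directly onto a few OpenCV calls. The following is a minimal sketch under stated assumptions: the paper does not list the exact sharpening-kernel coefficients, so a common 3 × 3 sharpening kernel is used here, and the helper name `preprocess_raw_image` is illustrative.

```python
import cv2
import numpy as np

# A common 3x3 sharpening kernel; the paper uses a 2D sharpening kernel
# (Fig. 2) but does not list its exact coefficients, so this is an assumption.
SHARPEN_KERNEL = np.array([[ 0, -1,  0],
                           [-1,  5, -1],
                           [ 0, -1,  0]], dtype=np.float32)

def preprocess_raw_image(path: str) -> np.ndarray:
    """Return a (50, 50, 1) float32 array scaled to the 0-1 range."""
    bgr = cv2.imread(path)                          # OpenCV loads images as BGR
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)    # drop color complexity
    sharp = cv2.filter2D(gray, -1, SHARPEN_KERNEL)  # apply the 2D sharpening kernel
    small = cv2.resize(sharp, (50, 50))             # CNN input size used in the paper
    norm = small.astype(np.float32) / 255.0         # scale 0-255 values into 0-1
    return norm[..., np.newaxis]                    # add a channel axis for the CNN
```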

Figure 2. Raw image pre-processing with (a) sharpening kernel.

Hand landmark detection

Google’s hand landmark model takes an RGB input with an image size of (224 × 224 × 3). So, we took the RGB images, converted the pixel values to float32, and resized all the images to (256 × 256 × 3). After applying the model, it gives 21 three-dimensional landmark coordinates. The landmark detection process is shown in Fig. 3.

Figure 3. Hand landmarks detection and extraction of 21 coordinates.
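A minimal sketch of this step is given below, assuming that Google's hand landmark model refers to the MediaPipe Hands solution (the paper does not name the exact API); the helper name is illustrative, and only the 21 (x, y, z) landmark coordinates are kept.

```python
import cv2
import numpy as np
import mediapipe as mp

mp_hands = mp.solutions.hands

def extract_hand_landmarks(path: str):
    """Return a (21, 3) float32 array of landmarks, or None if no hand is found."""
    bgr = cv2.imread(path)
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)      # the model expects RGB input
    rgb = cv2.resize(rgb, (256, 256))               # resize as described in the text
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        result = hands.process(rgb)                 # run landmark detection on the frame
    if not result.multi_hand_landmarks:
        return None                                 # no hand detected in this image
    lm = result.multi_hand_landmarks[0].landmark    # 21 three-dimensional points
    return np.array([[p.x, p.y, p.z] for p in lm], dtype=np.float32)
```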

Working procedure

The whole work is divided into two main parts: one is the raw image processing, and the other is the hand landmark extraction. After both had been completed individually, a custom lightweight multi-headed CNN model was built to train on both types of data. Before passing through a fully connected layer for classification, we merged both channels' features so that the model could choose the best weights between them. This working procedure is illustrated in Fig. 4.

Figure 4. Flow diagram of the working procedure.

Model building

In this research, we have used a multi-headed CNN, meaning our model has two input data channels. Before this, we trained the processed images and the hand landmarks with two separate models for comparison. Google's model is not ideal for "in the wild" situations, so we needed the original images to compensate for the occasional faults in Google's model. The first head of the model takes the processed images as input, and the second head takes the hand landmark data. On the hand landmark side, we used two-dimensional convolutional layers with 50 and 25 filters, (3, 3) kernels with ReLU activation and stride 1, 2D max pooling with pool size (2, 2), batch normalization, and a dropout layer. On the image side, we used 2D convolutional layers with 32, 64, 128, and 512 filters and (3, 3) kernels with ReLU, 2D max pooling with pool size (2, 2), batch normalization, and a dropout layer. After both flatten layers, the two heads are concatenated and passed through dense and dropout layers. Finally, the output dense layer has 24 units with softmax activation. The model was compiled with the Adam optimizer and MSE loss and trained for 50 epochs. Figure 5 illustrates the proposed CNN architecture, and Table 2 shows the model details.

Figure 5. Proposed multi-headed CNN architecture. Bottom values are the numbers of filters and top values are output shapes.
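The description above translates into the Keras sketch below. The filter counts, kernel sizes, pooling, the 24-unit softmax output, and the Adam/MSE compilation follow the text; the padding choice, dropout rates, and the 128-unit dense layer before the output are illustrative assumptions, since the text defers those details to Table 2.

```python
from tensorflow.keras import layers, models, optimizers

def build_multi_headed_cnn(num_classes: int = 24) -> models.Model:
    # Head 1: processed 50x50 grayscale images.
    img_in = layers.Input(shape=(50, 50, 1), name="image_input")
    x = img_in
    for filters in (32, 64, 128, 512):
        x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
        x = layers.MaxPooling2D((2, 2))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.25)(x)          # assumed dropout rate
    x = layers.Flatten()(x)

    # Head 2: 21 hand landmarks, each with (x, y, z) coordinates.
    lm_in = layers.Input(shape=(21, 3, 1), name="landmark_input")
    y = layers.Conv2D(50, (3, 3), padding="same", activation="relu")(lm_in)
    y = layers.Conv2D(25, (3, 3), padding="same", activation="relu")(y)
    y = layers.MaxPooling2D((2, 2))(y)
    y = layers.BatchNormalization()(y)
    y = layers.Dropout(0.25)(y)          # assumed dropout rate
    y = layers.Flatten()(y)

    # Concatenate both heads so the classifier can weigh either branch.
    z = layers.concatenate([x, y])
    z = layers.Dense(128, activation="relu")(z)   # assumed width
    z = layers.Dropout(0.3)(z)
    out = layers.Dense(num_classes, activation="softmax")(z)

    model = models.Model(inputs=[img_in, lm_in], outputs=out)
    model.compile(optimizer=optimizers.Adam(), loss="mse", metrics=["accuracy"])
    return model
```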

Training and testing

The input images were augmented to add difficulty to training so that the model would not overfit. Image augmentation was done with an image data generator using 10° rotation, a 0.1 zoom range, 0.1 width and height shift ranges, and horizontal flips. To further guard against overfitting, we used a dynamic learning rate that monitors the validation accuracy with patience 5, factor 0.5, and a minimum learning rate of 0.00001. For training, we used 46,023 images, and for testing, 19,725 images. The training versus testing accuracy and loss over 50 epochs are shown in Fig. 6.

Figure 6. Training versus testing accuracy and loss for 50 epochs.
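The augmentation and learning-rate settings quoted above correspond to standard Keras utilities; a sketch is shown below. The generator and callback values follow the text, while the batch size and the commented-out fit() call are illustrative, since a multi-input model needs a custom generator that pairs the augmented images with the (unaugmented) landmark arrays.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Augmentation settings reported in the text.
augmenter = ImageDataGenerator(
    rotation_range=10,        # up to 10 degrees of rotation
    zoom_range=0.1,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
)

# Dynamic learning-rate reduction reported in the text.
lr_schedule = ReduceLROnPlateau(
    monitor="val_accuracy",   # watch validation accuracy
    patience=5,
    factor=0.5,               # halve the learning rate on a plateau
    min_lr=1e-5,
)

# Illustrative single-input usage; the two-input model in this paper would need
# a custom generator yielding ([augmented_images, landmarks], labels) batches.
# model.fit(augmenter.flow(x_img_train, y_train, batch_size=64),
#           epochs=50, callbacks=[lr_schedule])
```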

For further evaluation, we calculated the precision, recall, and F1 score of the proposed multi-headed CNN model, which shows excellent performance. To compute these values, we first calculated the confusion matrix (shown in Fig. 7). When a class is positive and classified as positive, it is a true positive (TP); when a class is negative and classified as negative, it is a true negative (TN). If a class is negative but classified as positive, it is a false positive (FP), and when a class is positive but classified as negative, it is a false negative (FN). From these, precision, recall, and the F1 score are defined as follows:

Figure 7. Confusion matrix of the testing dataset. The numerical values on the X and Y axes denote the letters in sequence from A = 0 to Y = 24; numbers 9 and 25 are missing because the dataset does not include the letters J and Z.

Precision: the ratio of TP to the total predicted positive observations.

Recall: the ratio of TP to the total positive observations in the actual class.

F1 score: the harmonic mean of precision and recall.
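In standard notation, these three metrics are computed from the confusion-matrix counts as:

```latex
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\text{Precision}\times\text{Recall}}{\text{Precision} + \text{Recall}}.
\end{aligned}
```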

The Precision, Recall, and F1 score for 24 classes are shown in Table 3 .

Result analysis

In human action recognition tasks, sign language has an extra advantage, as it can be used to communicate efficiently. Many techniques have been developed using image processing, sensor data processing, and motion detection by applying different dynamic algorithms and methods such as machine learning and deep learning. Depending on the methodology, researchers have proposed their own ways of classifying sign languages, and as technologies develop, we can explore the limitations of previous works and improve accuracy. Ref. 13 proposes a technique for recognizing hand gestures, an essential part of the sign language vocabulary, based on an efficient deep convolutional neural network (CNN) architecture. The proposed CNN design removes the requirement for detection and segmentation of hands from the captured images, decreasing the computational burden of hand pose recognition compared with classical approaches. In our method, we instead use two input channels, for the images and the hand landmarks, to obtain more robust data, making the process more efficient together with dynamic learning rate adjustment. In ref. 14, the presented results were acquired by retraining and testing a sign language gesture dataset on a convolutional neural network model utilizing Inception v3; the model comprises several convolution filter inputs that are trained on parts of the same data. A capsule-based deep neural network sign posture translator for American Sign Language (ASL) fingerspelling (posture) has been introduced in ref. 20, where the concepts of capsules and pooling are used simultaneously in the network; this research confirms that utilizing pooling and capsule routing in the same network can improve its accuracy and convergence speed. In our method, we use Google's pre-trained model to extract the hand landmarks, much like transfer learning, and we show that utilizing two input channels can also improve accuracy.

Moreover, ref. 5 proposed a 3DRCNN model integrating a 3D convolutional neural network (3DCNN) and an enhanced fully connected recurrent neural network (FC-RNN), where the 3DCNN learns multi-modality features from the RGB, motion, and depth channels, and the FC-RNN captures the temporal information among short video clips segmented from the original video. Consecutive clips with similar semantic meaning are identified by applying a sliding-window approach to the clips over the whole video sequence. Ref. 26 combines a CNN with traditional feature extractors and is capable of accurate, real-time hand posture recognition; the architecture is assessed on three distinct benchmark datasets and compared with state-of-the-art convolutional neural networks, with extensive experimentation using binary, grayscale, and depth data and two different validation techniques. The proposed feature-fusion-based CNN 31 is shown to perform better across combinations of validation procedures and image representations. Similarly, a fusion-based CNN is demonstrated to improve the recognition rate in our study.

After global motion analysis, the hand gesture image sequence was analyzed for keyframe selection. The video sequences of a given gesture were segmented in the RGB color space before feature extraction; this step benefited from the colored gloves worn by the signers. Samples of pixel vectors representative of the glove's color were used to estimate the mean and covariance matrix of the color to be segmented, so the segmentation process was automated with no user intervention. In the color object tracking method, the video frames were converted into the HSV (hue-saturation-value) color space; pixels with the target color were then identified and labeled, and the resulting images were converted to binary (grayscale) images. The system identifies image regions corresponding to human skin by binarizing the input image with a proper threshold value. Then, small regions were eliminated from the binarized image by applying a morphological operator, and the remaining regions were selected to obtain an image as a candidate of the hand.

In the proposed method, we have used a two-headed CNN to train on the processed inputs. Though a single image input stream is widely used, two input streams have an advantage: in the classification layer of the CNN, if one branch gives a false result, it can be complemented by the other branch's weights, so that combining both results can still produce a positive outcome. We used this idea and successfully improved the final validation and test results. Before combining the image and hand landmark inputs, we tested both individually and obtained a test accuracy of 96.29% for the images and 98.42% for the hand landmarks. We did not use binarization, as it would harm images whose background color matches the skin color of the hand. This method is also suitable for in-the-wild situations, as it is not entirely dependent on the hand's position in the image frame. A comparison of the literature and our work is shown in Table 4, which shows that our method surpasses most of the current results in accuracy.

Table 5 illustrates that the combined model, while having a larger number of parameters and consuming more memory, achieves the highest accuracy of 98.98%. This suggests that the combined approach, which incorporates both image and hand landmark information, is effective for this task when accuracy is the priority. On the other hand, the hand landmarks model, despite having fewer parameters and lower memory consumption, also performs impressively with an accuracy of 98.42%, although it inherits the error and memory footprint of Google's landmark model during training. The image model, while consuming less memory, has a slightly lower accuracy of 96.29%. The choice between these models depends on the specific application requirements, the trade-offs between accuracy and resource utilization, and the importance of execution time.

Conclusion

This work proposes a methodology for sign language recognition. Sign language is the core medium of communication between deaf-mute people and the general population. It is highly applicable in real-world scenarios such as communication, human–computer interaction, security, advanced AI, and much more. For a long time, researchers have been working in this field to build a reliable, low-cost, and publicly available SLR system using different sensors, images, videos, and many other techniques. Many datasets have been used, including numeric sensory, motion, and image datasets. Most datasets are prepared under good lab conditions for experiments, which may not reflect practical, real-world use. That is why, looking at the real-world situation, the Fingerspelling dataset has been used, which contains real-world conditions such as complex backgrounds and uneven image shapes. First, the raw images are processed and resized to 50 × 50. Then, the hand landmark points are detected and extracted from these hand images. Passing the images through the two processing techniques yields two data channels, and a multi-headed CNN architecture has been proposed for these two channels. The data was augmented to avoid overfitting, and dynamic learning rate adjustment was applied. From the prepared data, a 70–30% train-test split was made, and on the 30% test set, an accuracy of 98.98% has been achieved. On a dataset of this size, this accuracy is quite reliable.

There are some limitations of the proposed method compared with the literature. Some methods may work with a small number of images, but since we use a simple CNN model, this method requires a good number of images for training. The proposed method also depends on the hand landmark extraction model; a different landmark model may produce different results. In raw image processing, it is possible to detect only the hand region to reduce the image size, which may increase the recognition rate and reduce model training time, so we may try this in future work. Currently, raw image processing takes a considerable amount of training time because we consider the whole image for training.

Data availability

The dataset used in this paper (ASL Fingerspelling Images (RGB & Depth)) is publicly available at Kaggle on this URL: https://www.kaggle.com/datasets/mrgeislinger/asl-rgb-depth-fingerspelling-spelling-it-out .

Anderson, R., Wiryana, F., Ariesta, M. C. & Kusuma, G. P. Sign language recognition application systems for deaf-mute people: A review based on input-process-output. Proced. Comput. Sci. 116 , 441–448. https://doi.org/10.1016/j.procs.2017.10.028 (2017).


Mummadi, C. et al. Real-time and embedded detection of hand gestures with an IMU-based glove. Informatics 5 (2), 28. https://doi.org/10.3390/informatics5020028 (2018).

Kinect for Windows - Windows apps. (2022). Accessed 01 January 2023. https://learn.microsoft.com/en-us/windows/apps/design/devices/kinect-for-windows

Rivera-Acosta, M., Ortega-Cisneros, S., Rivera, J. & Sandoval-Ibarra, F. American sign language alphabet recognition using a neuromorphic sensor and an artificial neural network. Sensors 17 (10), 2176. https://doi.org/10.3390/s17102176 (2017).


Ye, Y., Tian, Y., Huenerfauth, M., & Liu, J. Recognizing American Sign Language Gestures from Within Continuous Videos. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) , 2145–214509 (IEEE, 2018). https://doi.org/10.1109/CVPRW.2018.00280 .

Ameen, S. & Vadera, S. A convolutional neural network to classify American Sign Language fingerspelling from depth and colour images. Expert Syst. 34 (3), e12197. https://doi.org/10.1111/exsy.12197 (2017).

Sykora, P., Kamencay, P. & Hudec, R. Comparison of SIFT and SURF methods for use on hand gesture recognition based on depth map. AASRI Proc. 9 , 19–24. https://doi.org/10.1016/j.aasri.2014.09.005 (2014).

Sahoo, A. K., Mishra, G. S. & Ravulakollu, K. K. Sign language recognition: State of the art. ARPN J. Eng. Appl. Sci. 9 (2), 116–134 (2014).


Mitra, S. & Acharya, T. Gesture recognition: A survey. IEEE Trans. Syst. Man Cybern. Part C 37 (3), 311–324. https://doi.org/10.1109/TSMCC.2007.893280 (2007).

Rautaray, S. S. & Agrawal, A. Vision based hand gesture recognition for human computer interaction: A survey. Artif. Intell. Rev. 43 (1), 1–54. https://doi.org/10.1007/s10462-012-9356-9 (2015).

Amir, A. et al. A low power, fully event-based gesture recognition system. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , 7388–7397 (IEEE, 2017). https://doi.org/10.1109/CVPR.2017.781 .

Lee, J. H. et al. Real-time gesture interface based on event-driven processing from stereo silicon retinas. IEEE Trans. Neural Netw. Learn Syst. 25 (12), 2250–2263. https://doi.org/10.1109/TNNLS.2014.2308551 (2014).


Adithya, V. & Rajesh, R. A deep convolutional neural network approach for static hand gesture recognition. Proc. Comput. Sci. 171 , 2353–2361. https://doi.org/10.1016/j.procs.2020.04.255 (2020).

Das, A., Gawde, S., Suratwala, K., & Kalbande, D. Sign language recognition using deep learning on custom processed static gesture images. In 2018 International Conference on Smart City and Emerging Technology (ICSCET) , 1–6 (IEEE, 2018). https://doi.org/10.1109/ICSCET.2018.8537248 .

Pathan, R. K. et al. Breast cancer classification by using multi-headed convolutional neural network modeling. Healthcare 10 (12), 2367. https://doi.org/10.3390/healthcare10122367 (2022).


Lecun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86 (11), 2278–2324. https://doi.org/10.1109/5.726791 (1998).

Collobert, R., & Weston, J. A unified architecture for natural language processing. In Proceedings of the 25th international conference on Machine learning—ICML ’08 , 160–167 (ACM Press, 2008). https://doi.org/10.1145/1390156.1390177 .

Farabet, C., Couprie, C., Najman, L. & LeCun, Y. Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell. 35 (8), 1915–1929. https://doi.org/10.1109/TPAMI.2012.231 (2013).

Xie, B., He, X. & Li, Y. RGB-D static gesture recognition based on convolutional neural network. J. Eng. 2018 (16), 1515–1520. https://doi.org/10.1049/joe.2018.8327 (2018).

Jalal, M. A., Chen, R., Moore, R. K., & Mihaylova, L. American sign language posture understanding with deep neural networks. In 2018 21st International Conference on Information Fusion (FUSION) , 573–579 (IEEE, 2018).

Shanta, S. S., Anwar, S. T., & Kabir, M. R. Bangla Sign Language Detection Using SIFT and CNN. In 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT) , 1–6 (IEEE, 2018). https://doi.org/10.1109/ICCCNT.2018.8493915 .

Sharma, A., Mittal, A., Singh, S. & Awatramani, V. Hand gesture recognition using image processing and feature extraction techniques. Proc. Comput. Sci. 173 , 181–190. https://doi.org/10.1016/j.procs.2020.06.022 (2020).

Ren, S., He, K., Girshick, R., & Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process Syst. , 28 (2015).

Rastgoo, R., Kiani, K. & Escalera, S. Multi-modal deep hand sign language recognition in still images using restricted Boltzmann machine. Entropy 20 (11), 809. https://doi.org/10.3390/e20110809 (2018).

Jhuang, H., Serre, T., Wolf, L., & Poggio, T. A biologically inspired system for action recognition. In 2007 IEEE 11th International Conference on Computer Vision , 1–8. (IEEE, 2007) https://doi.org/10.1109/ICCV.2007.4408988 .

Ji, S., Xu, W., Yang, M. & Yu, K. 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35 (1), 221–231. https://doi.org/10.1109/TPAMI.2012.59 (2013).

Huang, J., Zhou, W., Li, H., & Li, W. Sign language recognition using 3D convolutional neural networks. In 2015 IEEE International Conference on Multimedia and Expo (ICME) , 1–6 (IEEE, 2015). https://doi.org/10.1109/ICME.2015.7177428 .

Digital worlds that feel human. Ultraleap. Accessed 01 January 2023. https://www.leapmotion.com/

Huang, F., & Huang, S. Interpreting American Sign Language with Kinect. J. Deaf Stud. Deaf Educ. (Oxford University Press, 2011).

Pugeault, N., & Bowden, R. Spelling it out: Real-time ASL fingerspelling recognition. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops) , 1114–1119 (IEEE, 2011). https://doi.org/10.1109/ICCVW.2011.6130290 .

Rahim, M. A., Islam, M. R. & Shin, J. Non-touch sign word recognition based on dynamic hand gesture using hybrid segmentation and CNN feature fusion. Appl. Sci. 9 (18), 3790. https://doi.org/10.3390/app9183790 (2019).

“ASL Alphabet.” Accessed 01 Jan, 2023. https://www.kaggle.com/grassknoted/asl-alphabet


Funding was provided by the American University of the Middle East, Egaila, Kuwait.

Author information

Authors and Affiliations

Department of Computing and Information Systems, School of Engineering and Technology, Sunway University, 47500, Bandar Sunway, Selangor, Malaysia

Refat Khan Pathan

Department of Computer Science and Engineering, BGC Trust University Bangladesh, Chittagong, 4381, Bangladesh

Munmun Biswas

Department of Computer and Information Science, Graduate School of Engineering, Tokyo University of Agriculture and Technology, Koganei, Tokyo, 184-0012, Japan

Suraiya Yasmin

Centre for Applied Physics and Radiation Technologies, School of Engineering and Technology, Sunway University, 47500, Bandar Sunway, Selangor, Malaysia

Mayeen Uddin Khandaker

Faculty of Graduate Studies, Daffodil International University, Daffodil Smart City, Birulia, Savar, Dhaka, 1216, Bangladesh

College of Engineering and Technology, American University of the Middle East, Egaila, Kuwait

Mohammad Salman & Ahmed A. F. Youssef


Contributions

R.K.P. and M.B., conceptualization; R.K.P., methodology; R.K.P., software and coding; M.B. and R.K.P., validation; R.K.P. and M.B., formal analysis; R.K.P., S.Y., and M.B., investigation; S.Y. and R.K.P., resources; R.K.P. and M.B., data curation; S.Y., R.K.P., and M.B., writing—original draft preparation; S.Y., R.K.P., M.B., M.U.K., M.S. and A.A.F.Y., writing—review and editing; R.K.P. and M.U.K., visualization; M.U.K. and M.B., supervision; M.B., M.S. and A.A.F.Y., project administration; M.S. and A.A.F.Y., funding acquisition.

Corresponding author

Correspondence to Mayeen Uddin Khandaker .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Pathan, R.K., Biswas, M., Yasmin, S. et al. Sign language recognition using the fusion of image and hand landmarks through multi-headed convolutional neural network. Sci Rep 13 , 16975 (2023). https://doi.org/10.1038/s41598-023-43852-x

Download citation

Received : 04 March 2023

Accepted : 29 September 2023

Published : 09 October 2023

DOI : https://doi.org/10.1038/s41598-023-43852-x


Sign Language Recognition Using the Electromyographic Signal: A Systematic Literature Review


1. Introduction

2. Background and Objectives

2.1. Background

  • The first type is the surface EMG (sEMG), which is recorded via non-invasive electrodes and is often used to obtain data on the intensity or the timing of superficial muscle activation [ 7 ].
  • The second is the intramuscular EMG that is recorded via invasive electrodes [ 8 ].

2.2. Objectives

  • Systematically searched, identified, and critically evaluated the relevant literature on sign language recognition using EMG signals.
  • Investigated the various data acquisition methods and devices used to capture EMG signals and their impact on the recognition performance.
  • Analyzed the different feature extraction and classification techniques applied to EMG signals for sign language recognition.
  • Identified the most used datasets and evaluated their relevance and suitability for sign language recognition using EMG signals.
  • Assessed the current state of research in terms of the sample size and the diversity of the participants in the studies.
  • Provided a summary of the current state of research on sign language recognition using EMG signals and made recommendations for future research.
  • Identified the challenges and limitations of using EMG signals for sign language recognition, including problems related to signal quality, feature extraction, and classification.

3. Data Acquisition and Devices


4. Feature Extraction

5. Classification Approaches

5.1. K-Nearest Neighbor-Based Approaches

5.2. Support Vector Machine-Based Approaches

5.3. Hidden Markov Model-Based Approaches

5.4. Artificial Neural Network-Based Approaches

5.5. Convolutional Neural Network-Based Approaches

5.6. Long Short-Term Memory-Based Approaches

5.7. Other Proposed Approaches

6. Discussion

7. Conclusions

Author Contributions

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Ref. | Sensor | Device | EMG Channels | Freq. | Hand
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8200 HzRight
[ ]sEMG + accelerometer-51000 HzRight
[ ]sEMG + 2 accelerometers-81000 HzRight/tow
[ ]sEMG + 2 accelerometers +
2 gyroscopes
-81000 HzRight
[ ]sEMG + accelerometer + gyroscopeMyo armband8200 Hz-
[ ]sEMGCustom device:
Conductor muscle electrical sensor, Arduino UNO.
6--
[ ]sEMG + accelerometerDelsys Trigno Lab Wireless System41927 HzRight
[ ]sEMGDelsys Trigno81926 HzRight
[ ]sEMG + accelerometerCustom device41 kHzTow
[ ]sEMG and accelerometerDELSYS TrignoTM Wireless EMG System32000 HzRight
[ ]sEMGMyo armband8200 Hz-
[ ]sEMG + accelerometerCustom device:
sEMG sensors, MMA7361
41000 HzTow
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8100 Hz-
[ ]sEMG +
2 accelerometers
Custom device:
NIPCI-6036 E,
National Instruments
81000 HzTow
[ ]sEMG + accelerometer + gyroscopeMyo armband8200 HzRight
[ ]sEMG + accelerometer + gyroscopeCustom device6500 HzRight
[ ]sEMG + accelerometer + gyroscopeCustom device41 kHzTow
[ ]sEMG + accelerometer + gyroscopeMyo armband8200 HzTow
[ ]sEMG + accelerometerCustom device81 kHzTow
[ ]sEMGDelsys Trigno Lab Wireless System62 kHz-
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8200 HzRight
[ ]sEMG +acceleration, gyroscope, gravity sensorsSmartwatch + Myo armband8200 HzDominant
[ ]sEMG + accelerometer + gyroscope + magnetometerCustom device:
InvenSense MPU9150
ADS1299
41000 HzRight
[ ]sEMG + accelerometer + gyroscope + magnetometerCustom device:
InvenSense MPU9150, TI ADS1299
41000 HzRight
[ ]sEMGMyo armband8200 HzDominant
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8200 HzRight/tow
[ ]sEMGMyo armband8200 Hz-
[ ]sEMG + accelerometer + gyroscope + magnetometer2 Myo armbands8 each armband200 HzTow
sEMG + accelerometer + gyroscope
sEMG + accelerometer + magnetometer
sEMG + accelerometer + magnetometer
sEMG + accelerometer
sEMG + gyroscope + magnetometer
sEMG
[ ]sEMG + accelerometer2 Myo armbands8200 HzTow
[ ]sEMGMyo armband8200 HzRight
[ ]sEMG + FMGCustom device:
ADS1299, Texas instrument
81000 Hz-
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8200 Hz-
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8200 HzTow
[ ]sEMG + accelerometer + gyroscopeMyo armband8200Tow
[ ]HD- sEMGCustom device8 × 16400 HzRight
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8200 Hz-
[ ]sEMG + accelerometer + flexCustom device2--
[ ]sEMGMyo armband8200 Hz-
[ ]sEMG + accelerometer + gyroscope + magnetometer + leap motion + VIVE HMDMyo armband8200 Hz-
[ ]sEMGBio Radio 150 CleveMed8960 HzRight
[ ]sEMG + accelerometer + gyroscopeMyo armband8200 HzTow
[ ]sEMGDelsys Trigno Lab Wireless System62 kHzTow
[ ]sEMGBIOPAC-MP-4541000 HzRight
[ ]sEMGDelsys Trigno Wireless EMG31111 HzDominant
Accelerometer + sEMG
[ ]sEMG + accelerometer + gyroscope + magnetometer 31000 HzTow
[ ]sEMG + accelerometer + gyroscopeDelsys Trigno Lab Wireless System31 kHzTow
[ ]sEMGDelsys Trigno Lab Wireless System51 kHz-
[ ]sEMG + accelerometer + gyroscopeDelsys Trigno Lab Wireless System21 kHzDominant
[ ]sEMGDelsys Trigno Lab Wireless System31 kHzDominant
[ ]sEMGCustom device11 kHzRight
[ ]sEMGDelsys Trigno Lab Wireless System31.1 kHzRight
[ ]sEMG + accelerometer + gyroscopeDelsys Trigno Lab Wireless System3900 kHzTow
[ ]sEMGMyo armband8200 Hz-
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8200 Hz-
[ ]sEMG + accelerometer + gyroscopeMyo armband8200 HzRight
[ ]sEMGMyo armband8200 Hz-
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8200 Hz-
[ ]sEMGCustom device4--
[ ]sEMGMyo armband8100 Hz-
[ ]sEMG + pressureCustom device3--
[ ]Leap motion + sEMGMyo armband8200 HzTow
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8200 Hz-
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8200 Hz-
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8200 HzTow
[ ]sEMGMyo armband8200 HzRight
[ ]sEMGCustom device3500 HzRight
[ ]sEMGMyo armband8200 HzRight
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8200 HzRight
[ ]sEMGMyo armband8200 Hz-
[ ]sEMGMyo armband8200 Hz-
[ ]sEMGMyo armband8200 HzDominant
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8200 HzRight
[ ]sEMGsEMG armband8600 HzRight
[ ]sEMGBIOPAC3 Dominant
[ ]sEMGBIOPAC3--
[ ]sEMGMyo armband8200 Hz-
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8200 Hz-
[ ]sEMG + accelerometer + gyroscope + force-sensing resistorCustom device110 HzRight
[ ]sEMGCustom device4100 HzLeft
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8200 HzDominant
[ ]sEMG + accelerometerCustom device1--
[ ]EMGMyo armband8200 HzRight
[ ]sEMG + accelerometer + gyroscopeCustom device4-Right
[ ]sEMG + accelerometerBTS FREEMG41000 HzRight
[ ]sEMGTMS porti81000 Hz-
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8200 HzRight
[ ]sEMG + accelerometer + gyroscope + magnetometerMyo armband8200 Hz-
[ ]sEMG + accelerometerBioplux851000 HzRight
Ref. | Target | Type | Classes | Dataset Size | Sensor | Placement | Frame Rate | Subjects | Accuracy
[ ]Chinese SLWord8685,4248Armband placed on arm muscles:
Extensor carpi radialis longus
Flexor carpi ulnaris
Flexor carpi radialis
Brachioradialis
Extensor digitorum
Extensor digiti minimi
20020From 94.72% to 98.92%
[ ]Chinese SLWord72ni5Extensor digiti minimi
Palmaris longus
Extensor carpi ulnaris
Extensor carpi radialis
Brachioradialis
1000293.1%
[ ]Chinese SLSubword12124208Extensor digiti minimi
Palmaris
Extensor carpi ulnaris
Extensor carpi radialis
1000195.78%
[ ]Chinese SLSubword15037508Extensor digiti minimi
Palmaris longus
Extensor carpi ulnaris
Extensor carpi radialis
10008From 88.2% to 95.1%
[ ]Chinese SLWord4848008Armband placed on arm muscles:
Same as the study published by the authors of [ ]
200197.12%
[ ]Chinese SLAlphabet48006Pronator quadratus
Flexor digitorum superficialis
Flexor carpi ulnaris
Palmaris longus
Flexor carpi radialis
Brachioradialis
Pronator teres
-486%
[ ]Chinese SLWord5326,5004Extensor digitorum
Palmaris longus
Extensor carpi radialis longus
Flexor carpi ulnari
The hybrid sensor (EMG+IMU) is placed on the extensor digiti minimi.
Extensor pollicis longus
Extensor pollicis brevis
1927596.01 ± 0.83%
Alphabet238028 92.73% ± 1.47
[ ]Chinese SLAlphabet306008Extensor carpi radialis brevis
Extensor digitorum
Brachioradialis
Extensor carpi ulnaris
1926495.48%
[ ]Chinese SLSubword12014,2004-1000591.51%
[ ]Chinese SLWord188643Extensor carpi radialis longus
Extensor carpi ulnaris
Flexor carpi radialis longus
Extensor digitorum
Tendons of extensor digitorum/lumbricals
20008From 84.9% to 91.4%
[ ]Chinese SLWord1552508Armband placed on arm muscles:
Same as the study published by the authors of [ ]
2001088.7%
[ ]Chinese SLSubword12114524Extensor minimi digiti
Palmaris longus
Extensor carpi ulnaris
Extensor carpi radilis
1000598.25%
[ ]Chinese SLWord3544808Armband placed on arm muscles:
Same as the study published by the authors of [ ]
100898.12%
[ ]Chinese SLWord
Sentence
1209754Extensor minimi digiti
Palmaris longus
Extensor carpi ulnaris
Extensor carpi radialis
1000596.5%
20086.7%
[ ]Chinese SLWord5027808Armband placed on arm muscles:
Same as [ ]
2001089%
[ ]Chinese SLWord550006Extensor digitorum
Flexor carpi radialis longus
Extensor carpi radialis longus
Extensor carpi ulnaris
500491.2%
[ ]Chinese SLWord15030,0004Extensor digiti minimi
Palmaris longus
Extensor carpi ulnaris
Extensor carpi radialis
1000890%
[ ]Chinese SLWord6020,4008Armband placed on arm muscles:
Same as the study published by the authors of [ ]
20034WER 19.7
[ ]Chinese SLSubword11627,8408Extensor minimi digiti
Palmaris longus
Extensor carpi ulnaris
Extensor carpi radialis
1000297.55%
[ ]Chinese SLHand shape137806Extensor carpi radialis longus
Extensor digitorum and flexor carpi ulnaris
Palmaris longus
Extensor pollicis longus
Abductor pollicis longus
Extensor digiti minimi
19271078.15%
[ ]Chinese SLWord1020,0008Armband placed on arm muscles:
Same as the study published by the authors of [ ]
2001098.66%
[ ]American SLSentence250106258Armband placed on arm muscles:
Same as the study published by the authors of [ ]
20015WER 0.29%
[ ]American SLWord8024,0004Extensor digitorum
Flexor carpi radialis longus
Extensor carpi radialis longus
Extensor carpi ulnaris
1000485.24–96.16%
[ ]American SLWord4040004Extensor digitorum
Flexor carpi radialis longus
Extensor carpi radialis longus
Extensor carpi ulnaris
1000495.94%
[ ]American SLWord2720808Armband placed on arm muscles:
Same as the study published by the authors of [ ]
2001From 60.85% to 80%
10,40010From 34.00% to 51.54%
[ ]American SLWord70ni8Armband placed on arm muscles:
Same as the study published by the authors of [ ]
2001593.7%
Sentence100
[ ]American SLWord813008Armband placed on arm muscles:
Same as the study published by the authors of [ ]
2005099.31%
Alphabet5
[ ]American SLWord9SCEPTRE database8Armband placed on arm muscles:
Same as the study published by the authors of [ ]
2003From 47.4% to 100%
[ ]American SLWord5010,0008Armband placed on arm muscles:
Same as the study published by the authors of [ ]
2001033.66%
[ ]American SLWord2040008Armband placed on arm muscles:
Same as the study published by the authors of [ ]
2002097.9%
[ ]American SLDigit102508Armband placed on arm muscles:
Same as the study published by the authors of [ ]
1000591.6 ± 3.5%
[ ]American SLAlphabet242408Armband placed on arm muscles:
Same as the study published by the authors of [ ]
200180%
[ ]American SLWord133908Armband placed on arm muscles:
Same as the study published by the authors of [ ]
200381.20%
[ ]American SLWord1326,0008Armband placed on arm muscles:
Same as the study published by the authors of [ ]
200393.79%
[ ]American SLHand shape101208 × 16Intrinsic muscles
Extrinsic muscles
400478%
[ ]American SLAlphabet269368Armband placed on arm muscles:
Same as the study published by the authors of [ ]
200895.36%
[ ]American SLAlphabet261302Flexor carpi radialis
Extensor carpi radialis longus
-195%
[ ]American SLWord1030008Armband placed on arm muscles:
Same as the study published by the authors of [ ]
200-95%
[ ]American SLAlphabet2533,6008Armband placed on arm muscles:
Same as the study published by the authors of [ ]
2007100%
[ ]American SLAlphabet2620808-960192%
[ ]American SLWord20-8Armband placed on arm muscles:
Same as the study published by the authors of [ ]
2001097.72%
[ ]American SLWord2018006-2000397%
[ ]Indian SLWord52504Flexor carpi radialis
Extensor carpi radialis longus
Reference electrode: Palm
1000690%
[ ]Indian SLWord1012003Extensor carpi radialis longus
Extensor digitorum
Flexor carpi radialis
1111687.5%
[ ]Indian SLDigit/word91803Where the maximum movement of the muscles of the forelimbs is observed1000-91.1%
[ ]Indian SLWord10016,0003-10001097%
[ ]Indian SLHand shape1512005-1111-100%
[ ]Indian SLHand shape + word1224002Flexor digitorum
Extensor carpi radialis
11111088.25%
[ ]Indian SLDigit99003Flexor digitorum
Extensor carpi radialis
Brachioradialis
1111590.10%
[ ]Indian SLHand shape41201Flexor carpi radialis1000-97.50%
[ ]Indian SLHand shape + word108003Flexor capri ulnaris
Extensor capri radialis
Brachioradialis
1100492.37%
[ ]Indian SLWord10020,0003-9001090.73%
[ ]Brazilian SLAlphabet2022008Armband placed on arm muscles:
Same as the study published by the authors of [ ]
2001From 4% to 95%
[ ]Brazilian SLAlphabet265208Armband placed on arm muscles:
Same as the study published by the authors of [ ]
2001089.11%
[ ]Brazilian SLAlphabet26-8Armband placed on arm muscles:
Same as the study published by the authors of [ ]
2001599.06%
[ ]Brazilian SLAlphabet208408Extensor carpi ulnar
Flexor carpi radial
200181.60%
[ ]General hand shapesHand shapes12-8Brachial200--
[ ]GeneralDigit/alphabet36-4Lumbric muscles
Hypothenar muscles
Thenar muscles
Flexor radials carpi
---
[ ]GeneralHand shape636008Armband placed on arm muscles:
Same as the study published by the authors of [ ]
100394%
[ ]GeneralDigit105003---86.80%
[ ]Indonesian SLWord + alphabet102008Armband placed on arm muscles:
Same as [ ]
200198.63%
[ ]Indonesian SLAlphabet262608Armband placed on arm muscles:
Same as the study published by the authors of [ ]
200193.08%
[ ]Indonesian SLWord + alphabet5252008Armband placed on arm muscles:
Same as the study published by the authors of [ ]
2002086.75%
[ ]Indonesian SLAlphabet262608Armband placed on arm muscles:
Same as the study published by the authors of [ ]
200-82.31%
[ ]Arabic SLAlphabet2833,6008Armband placed on arm muscles:
Same as the study published by the authors of [ ]
200398.49%
[ ]Arabic SLWords51503-500190.66%
[ ]Arabic SLAlphabet2815,0008Armband placed on arm muscles:
Same as the study published by the authors of [ ]
200897.4%
[ ]Italian SLAlphabet267808Armband placed on arm muscles:
Same as the study published by the authors of [ ]
2003097%
[ ]Italian SLAlphabet 7808Armband placed on arm muscles:
Same as the study published by the authors of [ ]
200193.5%
[ ]Italian SLAlphabet267808Armband placed on arm muscles:
Same as the study published by the authors of [ ]
2001-
[ ]Korean SLWord312008Armband placed on arm muscles:
Same as the study published by the authors of [ ]
200194%
[ ]Korean SLWord30ni8Armband placed on arm muscles:
Same as the study published by the authors of [ ]
200199.6%
[ ]Korean SLWord383008Armband placed on arm muscles:
Same as the study published by the authors of [ ], with the reference channel placed on the flexor carpi radialis
6001797.4%
[ ]Pakistani SLAlphabet267803Flexor carpi radialis
Flexor digitorum superficialis
181%
[ ]Pakistani SLSentence115503--585.40%
[ ]Turkish SLNumber1116568Armband placed on arm muscles:
Same as the study published by the authors of [ ]
200986.61%
[ ]Turkish SLHand shape36-8-2001078%
[ ]Malaysian SLWord51501Extensor carpi ulnaris10391%
[ ]Peru SLAlphabet271354-100193.9%
[ ]Polish SLWord1821,4208Armband placed on arm muscles:
Same as the study published by the authors of [ ]
2001491%
[ ]German SLWord75601Flexor carpi radialis—nearby wrist-896.31%
[ ]French SLAlphabet724808Armband placed on arm muscles:
Same as the study published by the authors of [ ]
200490%
[ ]Persian SLWord2020004Extensor digitorum communis
Flexor carpi radialis longus
Extensor carpi radialis longus
Extensor carpi ulnaris
-1096.13%
[ ]Colombian SLWord123604Extensor digitorum communis
Extensor carpi ulnaris
Flexor carpi ulnaris
Flexor carpi radialis
1000396.66%
[ ]Thai SLAlphabet1020008Armband placed on arm muscles:
Same as the study published by the authors of [ ]
1000195%
[ ]Sinhala SLWord123608Armband placed on arm muscles:
Same as the study published by the authors of [ ]
200694.4%
[ ]Irish SLAlphabet2615608Armband placed on arm muscles:
Same as the study published by the authors of [ ], with focus on the extensor digitorum from the posterior forearm, and the flexor carpi ulnaris from the anterior forearm
2001278%
[ ]Greek SLWord60-5Flexor carpi ulnaris
Flexor digitorum superficialis
Flexor carpi radialis
Extensor digitorum communis
Extensor carpi ulnaris
1000-92%
  • Aviles, M.; Rodríguez-Reséndiz, J.; Ibrahimi, D. Optimizing EMG Classification through Metaheuristic Algorithms. Technologies 2023 , 11 , 87. [ Google Scholar ] [ CrossRef ]
  • Aviles, M.; Sánchez-Reyes, L.M.; Fuentes-Aguilar, R.Q.; Toledo-Pérez, D.C.; Rodríguez-Reséndiz, J. A Novel Methodology for Classifying EMG Movements Based on SVM and Genetic Algorithms. Micromachines 2022 , 13 , 2108. [ Google Scholar ] [ CrossRef ]
  • Toledo-Pérez, D.C.; Martínez-Prado, M.A.; Gómez-Loenzo, R.A.; Paredes-García, W.J.; Rodríguez-Reséndiz, J. A study of movement classification of the lower limb based on up to 4-EMG channels. Electronics 2019 , 8 , 259. [ Google Scholar ] [ CrossRef ]
  • Toledo-Pérez, D.C.; Rodríguez-Reséndiz, J.; Gómez-Loenzo, R.A.; Jauregui-Correa, J.C. Support vector machine-based EMG signal classification techniques: A review. Appl. Sci. 2019 , 9 , 4402. [ Google Scholar ] [ CrossRef ]
  • Amor, A.B.H.; Ghoul, O.; Jemni, M. Toward sign language handshapes recognition using Myo armband 2017 6th. In Proceedings of the International Conference on Information and Communication Technology and Accessibility (ICTA) 2017, Muscat, Oman, 19–21 December 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–6. [ Google Scholar ]
  • Kim, J.; Mastnik, S.; André, E. EMG-based hand gesture recognition for realtime biosignal interfacing. In Proceedings of the 13th International Conference on Intelligent User Interfaces, Gran Canaria, Spain, 13–16 January 2008; pp. 30–39. [ Google Scholar ]
  • Farina, D.; Negro, F. Accessing the neural drive to muscle and translation to neurorehabilitation technologies. IEEE Rev. Biomed. Eng. 2012 , 5 , 3–14. [ Google Scholar ] [ CrossRef ]
  • Merletti, R.; De Luca, C.J. New techniques in surface electromyography. Comput. Aided Electromyogr. Expert Syst. 1989 , 9 , 115–124. [ Google Scholar ]
  • Ahsan, M.R.; Ibrahimy, M.I.; Khalifa, O.O. EMG signal classification for human computer interaction: A review. Eur. J. Sci. Res. 2009 , 33 , 480–501. [ Google Scholar ]
  • Di Pino, G.; Guglielmelli, E.; Rossini, P.M. Neuroplasticity in amputees: Main implications on bidirectional interfacing of cybernetic hand prostheses. Prog. Neurobiol. 2009 , 88 , 114–126. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Kallenberg, L.A.C. Multi-Channel Array EMG in Chronic Neck-Shoulder Pain. Ph.D. Thesis, Roessingh Research and Development, University of Twente, Enschede, The Netherlands, 2007. [ Google Scholar ]
  • Galván-Ruiz, J.; Travieso-González, C.M.; Tejera-Fettmilch, A.; Pinan-Roescher, A.; Esteban-Hernández, L.; Domínguez-Quintana, L. Perspective and evolution of gesture recognition for sign language: A review. Sensors 2020 , 20 , 3571. [ Google Scholar ] [ CrossRef ]
  • Moher, D.; Shamseer, L.; Clarke, M.; Ghersi, D.; Liberati, A.; Petticrew, M.; Shekelle, P.; Stewart, L.A. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst. Rev. 2015 , 4 , 1. [ Google Scholar ] [ CrossRef ]
  • Wang, F.; Zhao, S.; Zhou, X.; Li, C.; Li, M.; Zeng, Z. An recognition—Verification mechanism for real-time chinese sign language recognition based on multi-information fusion. Sensors 2019 , 19 , 2495. [ Google Scholar ] [ CrossRef ]
  • Zhang, X.; Chen, X.; Li, Y.; Lantz, V.; Wang, K.; Yang, J. A framework for hand gesture recognition based on accelerometer and EMG sensors. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2011 , 41 , 1064–1076. [ Google Scholar ] [ CrossRef ]
  • Li, Y.; Chen, X.; Tian, J.; Zhang, X.; Wang, K.; Yang, J. Automatic recognition of sign language subwords based on portable accelerometer and EMG sensors. In Proceedings of the International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, Beijing, China, 8–10 November 2010; pp. 1–7. [ Google Scholar ]
  • Yu, Y.; Chen, X.; Cao, S.; Zhang, X.; Chen, X. Exploration of Chinese sign language recognition using wearable sensors based on deep belief net. IEEE J. Biomed. Health Inform. 2019 , 24 , 1310–1320. [ Google Scholar ] [ CrossRef ]
  • Jane, S.P.Y.; Sasidhar, S. Sign language interpreter: Classification of forearm emg and imu signals for signing exact english. In Proceedings of the 2018 IEEE 14Th International Conference on Control and Automation (ICCA), Anchorage, AK, USA, 12–15 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 947–952. [ Google Scholar ]
  • Chen, H.; Qin, T.; Zhang, Y.; Guan, B. Recognition of American Sign Language Gestures Based on Electromyogram (EMG) Signal with XGBoost Machine Learning. In Proceedings of the 2021 3rd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), Taiyuan, China, 3–5 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 24–29. [ Google Scholar ]
  • Cheng, J.; Chen, X.; Liu, A.; Peng, H. A novel phonology-and radical-coded Chinese sign language recognition framework using accelerometer and surface electromyography sensors. Sensors 2015 , 15 , 23303–23324. [ Google Scholar ] [ CrossRef ]
  • Yuan, S.; Wang, Y.; Wang, X.; Deng, H.; Sun, S.; Wang, H.; Huang, P.; Li, G. Chinese sign language alphabet recognition based on random forest algorithm. In Proceedings of the 2020 IEEE International Workshop on Metrology for Industry 4.0 & IoT, Rome, Italy, 3–5 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 340–344. [ Google Scholar ]
  • Ma, D.; Chen, X.; Li, Y.; Cheng, J.; Ma, Y. Surface electromyography and acceleration based sign language recognition using hidden conditional random fields. In Proceedings of the 2012 IEEE-EMBS Conference on Biomedical Engineering and Sciences, Langkawi, Malaysia, 17–19 December 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 535–540. [ Google Scholar ]
  • Zhuang, Y.; Lv, B.; Sheng, X.; Zhu, X. Towards Chinese sign language recognition using surface electromyography and accelerometers. In Proceedings of the 2017 24Th International Conference on Mechatronics and Machine Vision in Practice (m2VIP), Auckland, New Zealand, 21–23 November 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–5. [ Google Scholar ]
  • Zhang, Z.; Su, Z.; Yang, G. Real-time Chinese Sign Language Recognition based on artificial neural networks. In Proceedings of the 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO), Dali, China, 6–8 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1413–1417. [ Google Scholar ]
  • Su, R.; Chen, X.; Cao, S.; Zhang, X. Random forest-based recognition of isolated sign language subwords using data from accelerometers and surface electromyographic sensors. Sensors 2016 , 16 , 100. [ Google Scholar ] [ CrossRef ]
  • Li, M.; Wang, F.; Jia, K.; Zhao, S.; Li, C. A Sign Language Interactive System based on Multi-feature Fusion. In Proceedings of the 2019 IEEE 9th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), Suzhou, China, 29 July–2 August 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 237–241. [ Google Scholar ]
  • Li, Y.; Chen, X.; Zhang, X.; Wang, K.; Wang, Z.J. A sign-component-based framework for Chinese sign language recognition using accelerometer and sEMG data. IEEE Trans. Biomed. Eng. 2012 , 59 , 2695–2704. [ Google Scholar ]
  • Zeng, Z.; Wang, F. An Attention Based Chinese Sign Language Recognition Method Using sEMG Signal. In Proceedings of the 2022 12th International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), Baishan, China, 27–31 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 457–461. [ Google Scholar ]
  • Wang, N.; Ma, Z.; Tang, Y.; Liu, Y.; Li, Y.; Niu, J. An optimized scheme of mel frequency cepstral coefficient for multi-sensor sign language recognition. In Smart Computing and Communication: Proceedings of the First International Conference, SmartCom 2016, Shenzhen, China, 17–19 December 2016 ; Proceedings 1; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 224–235. [ Google Scholar ]
  • Yang, X.; Chen, X.; Cao, X.; Wei, S.; Zhang, X. Chinese sign language recognition based on an optimized tree-structure framework. IEEE J. Biomed. Health Inform. 2016 , 21 , 994–1004. [ Google Scholar ] [ CrossRef ]
  • Wang, Z.; Zhao, T.; Ma, J.; Chen, H.; Liu, K.; Shao, H.; Wang, Q.; Ren, J. Hear sign language: A real-time end-to-end sign language recognition system. IEEE Trans. Mob. Comput. 2020 , 21 , 2398–2410. [ Google Scholar ] [ CrossRef ]
  • Li, Y.; Chen, X.; Zhang, X.; Wang, K.; Yang, J. Interpreting sign components from accelerometer and sEMG data for automatic sign language recognition. In Proceedings of the 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, USA, 30 August–3 September 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 3358–3361. [ Google Scholar ]
  • Xue, B.; Wu, L.; Wang, K.; Zhang, X.; Cheng, J.; Chen, X.; Chen, X. Multiuser gesture recognition using sEMG signals via canonical correlation analysis and optimal transport. Comput. Biol. Med. 2021 , 130 , 104188. [ Google Scholar ] [ CrossRef ]
  • Li, J.; Meng, J.; Gong, H.; Fan, Z. Research on Continuous Dynamic Gesture Recognition of Chinese Sign Language Based on Multi-Mode Fusion. IEEE Access 2022 , 10 , 106946–106957. [ Google Scholar ] [ CrossRef ]
  • Qian, Z.; JiaZhen, J.; Dong, W.; Run, Z. WearSign: Pushing the Limit of Sign Language Translation Using Inertial and EMG Wearables. ACM Interact. Mob. Wearable Ubiquitous Technol. 2022 , 35 , 1–27. [ Google Scholar ]
  • Wu, J.; Sun, L.; Jafari, R. A wearable system for recognizing American sign language in real-time using IMU and surface EMG sensors. IEEE J. Biomed. Health Inform. 2016 , 20 , 1281–1290. [ Google Scholar ] [ CrossRef ]
  • Wu, J.; Tian, Z.; Sun, L.; Estevez, L.; Jafari, R. Real-time American sign language recognition using wrist-worn motion and surface EMG sensors. In Proceedings of the 2015 IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN), Cambridge, MA, USA, 9–12 June 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1–6. [ Google Scholar ]
  • Savur, C.; Sahin, F. American Sign Language Recognition system by using surface EMG signal. In Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, Hungary, 9–12 October 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 002872–002877. [ Google Scholar ]
  • Zhang, Q.; Wang, D.; Zhao, R.; Yu, Y. MyoSign: Enabling end-to-end sign language recognition with wearables. In Proceedings of the 24th International Conference on Intelligent User Interfaces, Marina del Ray, CA, USA, 17–20 March 2019; pp. 650–660. [ Google Scholar ]
  • Andronache, C.; Negru, M.; Neacsu, A.; Cioroiu, G.; Radoi, A.; Burileanu, C. Towards extending real-time EMG-based gesture recognition system. In Proceedings of the 2020 43rd International Conference on Telecommunications and Signal Processing (TSP), Milan, Italy, 7–9 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 301–304. [ Google Scholar ]
  • Rodríguez-Tapia, B.; Ochoa-Zezzatti, A.; Marrufo, A.I.S.; Arballo, N.C.; Carlos, P.A. Sign Language Recognition Based on EMG Signals through a Hibrid Intelligent System. Res. Comput. Sci. 2019 , 148 , 253–262. [ Google Scholar ] [ CrossRef ]
  • Derr, C.; Sahin, F. Signer-independent classification of American sign language word signs using surface EMG. In Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, 5–8 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 665–670. [ Google Scholar ]
  • Tateno, S.; Liu, H.; Ou, J. Development of sign language motion recognition system for hearing-impaired people using electromyography signal. Sensors 2020 , 20 , 5807. [ Google Scholar ] [ CrossRef ]
  • Jiang, S.; Gao, Q.; Liu, H.; Shull, P.B. A novel, co-located EMG-FMG-sensing wearable armband for hand gesture recognition. Sens. Actuators A Phys. 2020 , 301 , 111738. [ Google Scholar ] [ CrossRef ]
  • Catalan-Salgado, E.A.; Lopez-Ramirez, C.; Zagal-Flores, R. American Sign Language Electromiographic Alphabet Sign Translator. In Telematics and Computing: Proceedings of the 8th International Congress, WITCOM 2019, Merida, Mexico, 4–8 November 2019 ; Proceedings 8; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 162–170. [ Google Scholar ]
  • Fatmi, R.; Rashad, S.; Integlia, R.; Hutchison, G. American Sign Language Recognition using Hidden Markov Models and Wearable Motion Sensors. Trans. Mach. Learn. Data Min. 2017 , 10 , 41–55. [ Google Scholar ]
  • Fatmi, R.; Rashad, S.; Integlia, R. Comparing ANN, SVM, and HMM based machine learning methods for American sign language recognition using wearable motion sensors. In Proceedings of the 2019 IEEE 9th annual computing and communication workshop and conference (CCWC), Las Vegas, NV, USA, 7–9 January 2018; pp. 290–297. [ Google Scholar ]
  • Serdana, F.I. Controlling 3D Model of Human Hand Exploiting Synergistic Activation of The Upper Limb Muscles. In Proceedings of the 2022 International Electronics Symposium (IES), Surabaya, Indonesia, 9–11 August 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 142–149. [ Google Scholar ]
  • Paudyal, P.; Lee, J.; Banerjee, A.; Gupta, S.K. Dynamic feature selection and voting for real-time recognition of fingerspelled alphabet using wearables. In Proceedings of the 22nd International Conference on Intelligent User Interfaces, Limassol, Cyprus, 13–16 March 2017; pp. 457–467. [ Google Scholar ]
  • Anetha, K.; Rejina Parvin, J. Hand talk-a sign language recognition based on accelerometer and SEMG data. Int. J. Innov. Res. Comput. Commun. Eng. 2014 , 2 , 206–215. [ Google Scholar ]
  • Shakeel, Z.M.; So, S.; Lingga, P.; Jeong, J.P. MAST: Myo Armband Sign-Language Translator for Human Hand Activity Classification. In Proceedings of the 2020 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea, 21–23 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 494–499. [ Google Scholar ]
  • Amrani, M.Z.; Borst, C.W.; Achour, N. Multi-sensory assessment for hand pattern recognition. Biomed. Signal Process. Control 2022 , 72 , 103368. [ Google Scholar ] [ CrossRef ]
  • Savur, C.; Sahin, F. Real-time american sign language recognition system using surface emg signal. In Proceedings of the 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 9–11 December 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 497–502. [ Google Scholar ]
  • Paudyal, P.; Banerjee, A.; Gupta, S.K. Sceptre: A pervasive, non-invasive, and programmable gesture recognition technology. In Proceedings of the 21st International Conference on Intelligent User Interfaces, Sonoma, CA, USA, 7–10 March 2016; pp. 282–293. [ Google Scholar ]
  • Qi, S.; Wu, X.; Chen, W.H.; Liu, J.; Zhang, J.; Wang, J. sEMG-based recognition of composite motion with convolutional neural network. Sens. Actuators A Phys. 2020 , 311 , 112046. [ Google Scholar ] [ CrossRef ]
  • Divya, B.; Delpha, J.; Badrinath, S. Public speaking words (Indian sign language) recognition using EMG. In Proceedings of the 2017 International Conference on Smart Technologies for Smart Nation (SmartTechCon), Bengaluru, India, 17–19 August 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 798–800. [ Google Scholar ]
  • Gupta, R. A quantitative performance assessment of surface EMG and accelerometer in sign language recognition. In Proceedings of the 2019 9th Annual Information Technology, Electromechanical Engineering and Microelectronics Conference (IEMECON), Jaipur, India, 13–15 March 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 242–246. [ Google Scholar ]
  • Goel, S.; Kumar, M. A Real Time Sign Language Interpretation of forearm based on Data Acquisition Method. In Proceedings of the 2019 International Conference on Signal Processing and Communication, Noida, India, 7–9 March 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 207–212. [ Google Scholar ]
  • Gupta, R.; Kumar, A. Indian sign language recognition using wearable sensors and multi-label classification. Comput. Electr. Eng. 2021 , 90 , 106898. [ Google Scholar ] [ CrossRef ]
  • Gupta, R. On the selection of number of sensors for a wearable sign language recognition system. In Proceedings of the 2019 Twelfth International Conference on Contemporary Computing (IC3), Noida, India, 8–10 August 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6. [ Google Scholar ]
  • Sharma, S.; Gupta, R.; Kumar, A. On the use of multi-modal sensing in sign language classification. In Proceedings of the 2019 6th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 7–8 March 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 495–500. [ Google Scholar ]
  • Sharma, S.; Gupta, R. On the use of temporal and spectral central moments of forearm surface EMG for finger gesture classification. In Proceedings of the 2018 2nd International Conference on Micro-Electronics and Telecommunication Engineering (ICMETE), Ghaziabad, India, 20–21 September 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 234–239. [ Google Scholar ]
  • Kaginalkar, A.; Agrawal, A. Towards EMG Based Gesture Recognition for Indian Sign Language Interpretation Using Artificial Neural Networks. In Proceedings of the HCI International 2015-Posters’ Extended Abstracts: International Conference, HCI International 2015, Los Angeles, CA, USA, 2–7 August 2015; Proceedings, Part I. Springer International Publishing: Berlin/Heidelberg, Germany, 2015; pp. 718–723. [ Google Scholar ]
  • Suri, K.; Gupta, R. Transfer learning for semg-based hand gesture classification using deep learning in a master-slave architecture. In Proceedings of the 2018 3rd International Conference on Contemporary Computing and Informatics (IC3I), Gurgaon, India, 10–12 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 178–183. [ Google Scholar ]
  • Sharma, S.; Gupta, R.; Kumar, A. Trbaggboost: An ensemble-based transfer learning method applied to Indian Sign Language recognition. J. Ambient Intell. Humaniz. Comput. 2020 , 13 , 3527–3537. [ Google Scholar ] [ CrossRef ]
  • Abreu, J.G.; Teixeira, J.M.; Figueiredo, L.S.; Teichrieb, V. Evaluating sign language recognition using the myo armband. In Proceedings of the 2016 XVIII Symposium on Virtual and Augmented Reality (SVR), Gramado, Brazil, 21–24 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 64–70. [ Google Scholar ]
  • Kawamoto, A.L.S.; Bertolini, D.; Barreto, M. A dataset for electromyography-based dactylology recognition. In Proceedings of the 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Miyazaki, Japan, 7–10 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 2376–2381. [ Google Scholar ]
  • Mendes Junior, J.J.A.; Freitas, M.L.B.; Campos, D.P.; Farinelli, F.A.; Stevan Jr, S.L.; Pichorim, S.F. Analysis of influence of segmentation, features, and classification in sEMG processing: A case study of recognition of brazilian sign language alphabet. Sensors 2020 , 20 , 4359. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Mendes Junior, J.J.A.; Freitas, M.L.B.; Stevan, S.L.; Pichorim, S.F. Recognition of libras static alphabet with myo tm and multi-layer perceptron. In Proceedings of the XXVI Brazilian Congress on Biomedical Engineering: CBEB 2018, Armação de Buzios, RJ, Brazil, 21–25 October 2018; Springer: Singapore, 2019; Volume 2, pp. 413–419. [ Google Scholar ]
  • Kim, J.; Kim, E.; Park, S.; Kim, J. Implementation of a sign language primitive framework using EMG and motion sensors. In Proceedings of the 2016 IEEE 5th Global Conference on Consumer Electronics, Kyoto, Japan, 11–14 October 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–2. [ Google Scholar ]
  • Das, P.; Paul, S.; Ghosh, J.; PalBhowmik, S.; Neogi, B.; Ganguly, A. An approach towards the representation of sign language by electromyography signals with fuzzy implementation. Int. J. Sens. Wirel. Commun. Control 2017 , 7 , 26–32. [ Google Scholar ] [ CrossRef ]
  • Oh, D.C.; Jo, Y.U. Classification of hand gestures based on multi-channel EMG by scale Average wavelet transform and convolutional neural network. Int. J. Control. Autom. Syst. 2021 , 19 , 1443–1450. [ Google Scholar ] [ CrossRef ]
  • Dong, W.; Yang, L.; Gravina, R.; Fortino, G. Soft wrist-worn multi-functional sensor array for real-time hand gesture recognition. IEEE Sens. J. 2021 , 22 , 17505–17514. [ Google Scholar ] [ CrossRef ]
  • Wibawa, A.D.; Sumpeno, S. Gesture recognition for Indonesian Sign Language Systems (ISLS) using multimodal sensor leap motion and myo armband controllers based-on naïve bayes classifier. In Proceedings of the 2017 International Conference on Soft Computing, Intelligent System and Information Technology (ICSIIT), Densepar, Indonesia, 26–29 September 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–6. [ Google Scholar ]
  • Rahagiyanto, A.; Basuki, A.; Sigit, R.; Anwar, A.; Zikky, M. Hand gesture classification for sign language using artificial neural network. In Proceedings of the 2017 21st International Computer Science and Engineering Conference (ICSEC), Bangkok, Thailand, 15–18 November 2017; IEEE: Piscataway, NJ, USA; pp. 1–5. [ Google Scholar ]
  • Anwar, A.; Basuki, A.; Sigit, R. Hand gesture recognition for Indonesian sign language interpreter system with myo armband using support vector machine. Klik—Kumpul J. Ilmu Komput. 2020 , 7 , 164. [ Google Scholar ] [ CrossRef ]
  • Rahagiyanto, A.; Basuki, A.; Sigit, R. Moment invariant features extraction for hand gesture recognition of sign language based on SIBI. EMITTER Int. J. Eng. Technol. 2017 , 5 , 119–138. [ Google Scholar ] [ CrossRef ]
  • Amor, B.H.A.; El Ghoul, O.; Jemni, M. Deep learning approach for sign language's handshapes recognition from EMG signals. In Proceedings of the 2022 IEEE Information Technologies & Smart Industrial Systems (ITSIS), Paris, France, 15–17 July 2022; pp. 1–5. [ Google Scholar ] [ CrossRef ]
  • Hamid Yousuf, F.; Bushnaf Alwarfalli, A.; Ighneiwa, I. Arabic Sign Language Recognition System by Using Surface Intelligent EMG Signal. In Proceedings of the 7th International Conference on Engineering & MIS 2021, Almaty, Kazakhstan, 11–13 October 2021; pp. 1–6. [ Google Scholar ]
  • Amor, A.B.H.; El Ghoul, O.; Jemni, M. A deep learning based approach for Arabic Sign language alphabet recognition using electromyographic signals. In Proceedings of the 2021 8th International Conference on ICT & Accessibility (ICTA), Tunis, Tunisia, 8–10 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–4. [ Google Scholar ]
  • Sernani, P.; Pacifici, I.; Falcionelli, N.; Tomassini, S.; Dragoni, A.F. Italian Sign Language Alphabet Recognition from Surface EMG and IMU Sensors with a Deep Neural Network. In Proceedings of the RTA-CSIT 2021: Recent Trends and Applications in Computer Science and Information Technology, Tirana, Albania, 21–22 May 2021; pp. 74–83. [ Google Scholar ]
  • Saif, R.; Ahmad, M.; Naqvi, S.Z.H.; Aziz, S.; Khan, M.U.; Faraz, M. Multi-Channel EMG Signal analysis for Italian Sign Language Interpretation. In Proceedings of the 2022 International Conference on Emerging Trends in Smart Technologies (ICETST), Karachi, Pakistan, 23–24 September 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–5. [ Google Scholar ]
  • Pacifici, I.; Sernani, P.; Falcionelli, N.; Tomassini, S.; Dragoni, A.F. A surface electromyography and inertial measurement unit dataset for the Italian Sign Language alphabet. Data Brief 2020 , 33 , 106455. [ Google Scholar ] [ CrossRef ]
  • Oh, D.C.; Jo, Y.U. EMG-based hand gesture classification by scale average wavelet transform and CNN. In Proceedings of the 2019 19th International Conference on Control, Automation and Systems (ICCAS), Jeju, Republic of Korea, 15–18 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 533–538. [ Google Scholar ]
  • Shin, S.; Baek, Y.; Lee, J.; Eun, Y.; Son, S.H. Korean sign language recognition using EMG and IMU sensors based on group-dependent NN models. In Proceedings of the 2017 IEEE Symposium Series on Computational Intelligence (SSCI), Honolulu, HI, USA, 27 November–1 December 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–7. [ Google Scholar ]
  • Kim, S.; Kim, J.; Ahn, S.; Kim, Y. Finger language recognition based on ensemble artificial neural network learning using armband EMG sensors. Technol. Health Care 2018 , 26 , 249–258. [ Google Scholar ] [ CrossRef ]
  • Khan, M.U.; Amjad, F.; Aziz, S.; Naqvi, S.Z.H.; Shakeel, M.; Imtiaz, M.A. Surface electromyography based Pakistani sign language interpreter. In Proceedings of the 2020 International Conference on Electrical, Communication, and Computer Engineering (ICECCE), Istanbul, Turkey, 12–13 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–5. [ Google Scholar ]
  • Khan, M.U.; Aziz, S.; Naqvi, S.Z.H.; Amjad, F.; Shakeel, M. Pakistani phrasal sign language classification using surface electromyography. In Proceedings of the 2020 International Conference on Computing and Information Technology (ICCIT-1441), Taibuk, Saudi Arabia, 9–10 September 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–5. [ Google Scholar ]
  • Kaya, E.; Kumbasar, T. Hand gesture recognition systems with the wearable myo armband. In Proceedings of the 2018 6th International Conference on Control Engineering & Information Technology (CEIT), Istanbul, Turkey, 25–27 October 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–6. [ Google Scholar ]
  • Seddiqi, M.; Kivrak, H.; Kose, H. Recognition of Turkish Sign Language (TID) Using sEMG Sensor. In Proceedings of the 2020 Innovations in Intelligent Systems and Applications Conference (ASYU), Istanbul, Turkey, 15–17 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6. [ Google Scholar ]
  • Van Murugiah, K.; Subhashini, G.; Abdulla, R. Wearable IOT based Malaysian sign language recognition and text translation system. J. Appl. Technol. Innov. 2021 , 5 , 51. [ Google Scholar ]
  • Witman, A.D.; Meneses-Claudio, B.; Flores-Medina, F.; Condori, P.; Vargas-Cuentas, N.I.; Roman-Gonzalez, A. Acquisition and classification system of EMG signals for interpreting the alphabet of the sign language. Int. J. Adv. Comput. Sci. Appl. 2019 , 10 , 518–521. [ Google Scholar ] [ CrossRef ]
  • Kowalewska, N.; Łagodziński, P.; Grzegorzek, M. Electromyography Based Translator of the Polish Sign Language. In Information Technology in Biomedicine: Proceedings of the International Conference on Information Technologies in Biomedicine, Kamień Śląski, Poland, 18–20 June 2019 ; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 93–102. [ Google Scholar ]
  • Kim, J.; Wagner, J.; Rehm, M.; André, E. Bi-channel sensor fusion for automatic sign language recognition. In Proceedings of the 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition, Amsterdam, The Netherlands, 17–19 September 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 1–6. [ Google Scholar ]
  • Amor, A.B.H.; El Ghoul, O.; Jemni, M. Sign language handshape recognition using Myo Armband. In Proceedings of the 2019 7th International Conference on ICT & Accessibility (ICTA), Hammamet, Tunisia, 13–15 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–5. [ Google Scholar ]
  • Khomami, S.A.; Shamekhi, S. Persian sign language recognition using IMU and surface EMG sensors. Measurement 2021 , 168 , 108471. [ Google Scholar ] [ CrossRef ]
  • Pereira-Montiel, E.; Pérez-Giraldo, E.; Mazo, J.; Orrego-Metaute, D.; Delgado-Trejos, E.; Cuesta-Frau, D.; Murillo-Escobar, J. Automatic sign language recognition based on accelerometry and surface electromyography signals: A study for Colombian sign language. Biomed. Signal Process. Control 2022 , 71 , 103201. [ Google Scholar ]
  • Amatanon, V.; Chanhang, S.; Naiyanetr, P.; Thongpang, S. Sign language—Thai alphabet conversion based on Electromyogram (EMG). In Proceedings of the 7th 2014 Biomedical Engineering International Conference, Fukuoka, Japan, 26–28 November 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1–4. [ Google Scholar ]
  • Madushanka, A.L.P.; Senevirathne, R.; Wijesekara, L.M.H.; Arunatilake, S.; Sandaruwan, K.D. Framework for Sinhala Sign Language recognition and translation using a wearable armband. In Proceedings of the 2016 Sixteenth International Conference on Advances in ICT for Emerging Regions (ICTer), Negombo, Sri Lanka, 1–3 September 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 49–57. [ Google Scholar ]
  • Galea, L.C.; Smeaton, A.F. Recognising Irish sign language using electromyography. In Proceedings of the 2019 International Conference on Content-Based Multimedia Indexing (CBMI), Dublin, Ireland, 4–6 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–4. [ Google Scholar ]
  • Kosmidou, V.E.; Hadjileontiadis, L.I. Using sample entropy for automated sign language recognition on sEMG and accelerometer data. Med. Biol. Eng. Comput. 2010 , 48 , 255–267. [ Google Scholar ] [ CrossRef ]
  • Toledo-Perez, D.C.; Rodriguez-Resendiz, J.; Gomez-Loenzo, R.A. A study of computing zero crossing methods and an improved proposal for EMG signals. IEEE Access 2020 , 8 , 8783–8790. [ Google Scholar ] [ CrossRef ]
  • Yousefi, J.; Hamilton-Wright, A. Characterizing EMG data using machine-learning tools. Comput. Biol. Med. 2014 , 51 , 1–13. [ Google Scholar ] [ CrossRef ]
  • Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003 , 3 , 1157–1182. [ Google Scholar ]
  • Phinyomark, A.; Limsakul, C.; Phukpattaranont, P. A novel feature extraction for robust EMG pattern recognition. arXiv 2009, arXiv:0912.3973. [ Google Scholar ]
  • Khushaba, R.N.; Al-Jumaily, A. Channel and feature selection in multifunction myoelectric control. In Proceedings of the 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Lyon, France, 22–26 August 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 5182–5185. [ Google Scholar ]
  • Kosmidou, V.E.; Hadjileontiadis, L.J.; Panas, S.M. Evaluation of surface EMG features for the recognition of American Sign Language gestures. In Proceedings of the 2006 International Conference of the IEEE Engineering in Medicine and Biology Society, New York, NY, USA, 30 August–3 September 2006; IEEE: Piscataway, NJ, USA, 2006; pp. 6197–6200. [ Google Scholar ]
  • Phinyomark, A.; Hirunviriya, S.; Nuidod, A.; Phukpattaranont, P.; Limsakul, C. Evaluation of EMG feature extraction for movement control of upper limb prostheses based on class separation index. In Proceedings of the 5th Kuala Lumpur International Conference on Biomedical Engineering 2011: (BIOMED 2011), Kuala Lumpur, Malaysia, 20–23 June 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 750–754. [ Google Scholar ]
  • Veer, K.; Sharma, T. A novel feature extraction for robust EMG pattern recognition. J. Med. Eng. Technol. 2016 , 40 , 149–154. [ Google Scholar ] [ CrossRef ] [ PubMed ]
Parameter | Video | Motion Capture | Surface Electromyography
Sensing technology | Cameras | Infrared cameras/sensors | Electrodes
Data type | Visual/2D/3D | 3D positions/orientations | Muscle activation signals
Sensitivity | Light conditions | Marker occlusions | Muscle contractions and noise
Gesture types | Static/dynamic | Static | Static/dynamic
Spatial resolution | High (depends on camera) | High | Moderate/high
Temporal resolution | High (depends on fps) | High | High
Accuracy | Variable (depends on algo) | High (depends on setup) | Variable (depends on algo and setup)
Application | General sign language | Detailed motion analysis | Muscle analysis for sign language
Portability | Moderate/high | Low | High
Cost | Low/moderate | High | Moderate
Sign Language | Percent | References
Chinese sign language23.86%[ , , , , , , , , , , , , , , , , , , , , ]
American sign language23.86%[ , , , , , , , , , , , , , , , , , , , , ]
Indian sign language11.4%[ , , , , , , , , , ]
Brazilian sign language4.5%[ , , , ]
General sign language4.5%[ , , , ]
Indonesian sign language4.5%[ , , , ]
Arabic sign language3.4%[ , , ]
Italian sign language3.4%[ , , ]
Korean sign language3.4%[ , , ]
Pakistani sign language2.3%[ , ]
Turkish sign language2.3%[ , ]
Malaysian sign language1.1%[ ]
Peru sign language1.1%[ ]
Polish sign language1.1%[ ]
German sign language1.1%[ ]
French sign language1.1%[ ]
Persian sign language1.1%[ ]
Colombian sign language1.1%[ ]
Thai sign language1.1%[ ]
Sinhala sign language1.1%[ ]
Irish sign language1.1%[ ]
Greek sign language1.1%[ ]
Sign Language | Nbr Classes | Type | Subjects | Size | Device
Arabic sign language28Alphabet39350Myo armband
Italian sign language26Alphabet1780Myo armband
American sign language10Word8320Myo armband
Indian sign language6Sentence19223Myo armband
American sign language26Alphabet9234 × 5 sMyo armband
General sign language5Emotion12360-
Feature | Paper Count | Feature Class | Paper Count
Variance (VAR) | 17 | Time-domain/statistical features | 138
Mean absolute value (MAV) | 46
Modified mean absolute value | 4
Root mean square (RMS) | 27
Standard deviation (SDV) | 20
Average amplitude change (AAC) | 8
Maximum (MAX) | 6
Minimum | 4
Median | 4
Average power | 2
Modified mean frequency (MMF) | 3 | Frequency-domain features | 30
Mean frequency (MFR) | 7
Modified median frequency | 2
Median frequency | 5
Reflection coefficient | 1
Power spectral density | 2
Discrete Fourier transform | 2
Spectral mean | 1
Spectral standard deviation | 1
Spectral skewness | 1
Maximum energy frequency | 1
Power in the channel | 2
Standard deviation (SDV) | 2
Temporal and spectral moment | 2 | Time-frequency features | 6
Moving variance | 2
Short-time Fourier transform | 2
Histogram | 2 | Signal shape and distribution features | 8
Minimum fractal length | 1
Maximum fractal length | 4
Shape factor | 1
Kurtosis (KUR) | 9 | Higher-order statistics | 20
Skewness (SKW) | 11
Mel frequency cepstral coefficient | 3 | Mel frequency cepstral coefficients | 4
Mean of gammatone cepstral coefficient | 1
Wavelet transform | 3 | Wavelet transform coefficients | 5
Scale-average wavelet transform (SAWT) | 1
Wavelet energy | 1
Autoregressive coefficient (ARC) | 13 | Autoregressive model coefficients | 13
Waveform length (WVL) | 21 | Waveform-based features | 49
Zero crossing rate (ZCR) | 17
Willison amplitude | 4
Simple square integral (SSI) | 7
Log detector (LGD) | 5 | Other features | 37
Sample entropy | 1
Permutation entropy | 1
Mean power | 1
Power spectrum ratio | 1
Peak frequency | 2
Spurious-free dynamic range | 1
Log energy | 1
Shannon energy | 2
Irregularity factor | 1
Katz fractal dimension | 1
Integrated absolute value | 3
Slope sign changes | 11
Hjorth parameter | 2
Linear prediction coefficient | 1
Difference absolute standard deviation value | 2
Root squared zero-order moment normalized | 1
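Most of the time-domain descriptors in the table above are simple functions of a windowed sEMG signal. As an illustration only, the sketch below shows how a few of the most frequently used ones (MAV, RMS, variance, waveform length, zero crossings and slope sign changes) could be computed with NumPy; the window length, sampling rate and threshold in the example are assumptions and are not taken from any of the surveyed studies.

```python
import numpy as np

def emg_time_domain_features(window: np.ndarray, threshold: float = 0.0) -> dict:
    """Compute a few common time-domain sEMG descriptors for one analysis window.

    `window` is a 1-D array of samples from a single channel; `threshold`
    suppresses noise-induced zero crossings and slope sign changes.
    (Illustrative sketch; window length and threshold are assumptions.)
    """
    diff = np.diff(window)
    return {
        "MAV": np.mean(np.abs(window)),         # mean absolute value
        "RMS": np.sqrt(np.mean(window ** 2)),   # root mean square
        "VAR": np.var(window),                  # variance
        "WL": np.sum(np.abs(diff)),             # waveform length
        # zero crossings: sign changes whose amplitude jump exceeds the threshold
        "ZC": int(np.sum((window[:-1] * window[1:] < 0) &
                         (np.abs(diff) >= threshold))),
        # slope sign changes: local extrema whose neighbouring slopes exceed the threshold
        "SSC": int(np.sum((diff[:-1] * diff[1:] < 0) &
                          ((np.abs(diff[:-1]) >= threshold) |
                           (np.abs(diff[1:]) >= threshold)))),
    }

# Example: a 200 ms window of one armband channel sampled at 200 Hz (assumed values)
rng = np.random.default_rng(0)
window = rng.normal(scale=0.1, size=40)
print(emg_time_domain_features(window, threshold=0.01))
```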

Ben Haj Amor, A.; El Ghoul, O.; Jemni, M. Sign Language Recognition Using the Electromyographic Signal: A Systematic Literature Review. Sensors 2023, 23, 8343. https://doi.org/10.3390/s23198343

Artificial Intelligence Technologies for Sign Language

AI technologies can play an important role in breaking down the communication barriers of deaf or hearing-impaired people with other communities, contributing significantly to their social inclusion. Recent advances in both sensing technologies and AI algorithms have paved the way for the development of various applications aiming at fulfilling the needs of deaf and hearing-impaired communities. To this end, this survey aims to provide a comprehensive review of state-of-the-art methods in sign language capturing, recognition, translation and representation, pinpointing their advantages and limitations. In addition, the survey presents a number of applications, while it discusses the main challenges in the field of sign language technologies. Future research directions are also proposed in order to assist prospective researchers towards further advancing the field.

1. Introduction

Sign language (SL) is the main means of communication between hearing-impaired people and other communities and it is expressed through manual (i.e., body and hand motions) and non-manual (i.e., facial expressions) features. These features are combined together to form utterances that convey the meaning of words or sentences [ 1 ]. Capturing and understanding the relation between utterances and words is crucial for the Deaf community, as it paves the way towards an era where the translation between utterances and words can be achieved automatically [ 2 ]. The research community has long identified the need for developing sign language technologies to facilitate the communication and social inclusion of hearing-impaired people. Although the development of such technologies can be really challenging due to the existence of numerous sign languages and the lack of large annotated datasets, the recent advances in AI and machine learning have played a significant role towards automating and enhancing such technologies.

Sign language technologies cover a wide spectrum, ranging from the capturing of signs to their realistic representation in order to facilitate the communication between hearing-impaired people, as well as the communication between hearing-impaired and speaking people. More specifically, sign language capturing involves the accurate extraction of body, hand and mouth expressions using appropriate sensing devices in marker-less or marker-based setups. The accuracy of sign language capturing technologies is currently limited by the resolution and discrimination ability of sensors and the fact that occlusions and fast hand movements pose significant challenges to the accurate capturing of signs. Sign language recognition (SLR) involves the development of powerful machine learning algorithms to robustly classify human articulations to isolated signs or continuous sentences. Current limitations in SLR lie in the lack of large annotated datasets that greatly affect the accuracy and generalization ability of SLR methods, as well as the difficulty in identifying sign boundaries in continuous SLR scenarios.

On the other hand, sign language translation (SLT) involves the translation between different sign languages, as well as the translation between sign and speaking languages. SLT methods employ sequence-based machine learning algorithms and aim to bridge the communication gap between people signing or speaking different languages. The difficulties in SLT lie in the lack of multilingual sign language datasets, as well as the inaccuracies of SLR methods, considering that gloss recognition (performed by SLR methods) is the initial step of SLT methods. Finally, sign language representation involves the accurate representation and reproduction of signs using realistic avatars or signed video approaches. Currently, avatar movements are deemed unnatural and hard to understand by the Deaf community due to inaccuracies in skeletal pose capturing and the lack of life-like features in the appearance of avatars.

Sign language technologies are interconnected and affect each other, as seen in Figure 1. The accurate extraction of hand and body motions as well as facial expressions plays a crucial role in the success of the machine learning algorithms that are responsible for the robust recognition of signs. Moreover, accurate sign language recognition significantly affects the performance of sign language translation and representation methods. The breakthroughs in sensorial devices and AI have paved the way for the development of sign language applications that can immensely facilitate hearing-impaired people in their everyday life.

Figure 1. Sign language technologies.

Previous literature reviews mainly concentrate on specific sign language technologies, such as video-based and sensor-based sign language recognition [ 3 , 4 , 5 , 6 , 7 ] and sign language translation [ 8 , 9 ]. Lately, with the development of sign language applications, there are also reviews that present sign language systems to facilitate hearing-impaired people in teaching and learning, as well as in voice and text interpretation systems [ 10 , 11 ]. However, there is no systematic review that presents all sign language technologies and their relations with each other. This review aims to fill this gap by presenting the advances of AI in all sign language technologies, ranging from capturing and recognition to translation and representation, and concludes by describing recent sign language applications that can considerably facilitate the communication between hearing-impaired and speaking people. The main purpose of this review is to demonstrate the importance of using AI technologies in sign language to facilitate deaf and hearing-impaired people in their communication with other communities. In addition, this review aims at familiarizing researchers with the state-of-the-art in all sign language technologies and at proposing future research directions that can facilitate the development of even more accurate approaches that can lead to mainstream products for the Deaf community. More specifically, the objectives of this review can be summarized as follows:

  • A comprehensive overview of the use of AI technologies in various sign language tasks (i.e., capturing, recognition, translation and representation), along with their importance to the field, is provided.
  • The advantages and limitations of modern sign language technologies and the relations between them are discussed and explored.
  • Possible future directions in the development of AI technologies for sign language are suggested to facilitate prospective researchers in the field.

The rest of this survey is organized as follows. In Section 2 , the literature search guideline is presented. Sign language capturing sensors are described in Section 3 . In Section 4 , sign language recognition methods are categorized and discussed. Sign language representation approaches and applications are presented in Section 5 and Section 6 , respectively. Finally, conclusions and potential future research directions are highlighted in Section 7 .

2. Literature Search

A systematic literature search was performed by adopting the PRISMA guidelines [ 12 ]. The articles were extracted in June 2021 from three academic databases, namely Scopus ( https://www.scopus.com/home.uri ), (link, accessed on 28 May 2021), ProQuest ( https://www.proquest.com/ ), (link, accessed on 28 May 2021) and IeeeXplore ( https://ieeexplore.ieee.org/Xplore/home.jsp ), (link, accessed on 28 May 2021). Articles that were not peer-reviewed or not written in English were discarded. Since this review deals with AI technologies for sign language, the search was based on the following condition:

TITLE-ABSTRACT-KEYWORDS ( sign AND language AND ( recognition OR application(*) OR avatar(*) OR representation(*) OR translation OR captur(*) OR generation OR production ) ) AND PUBLISH YEAR > 2018 AND ( LIMIT-TO ( DOCTYPE , "ar" ) OR LIMIT-TO ( DOCTYPE , "cp" ) OR LIMIT-TO ( DOCTYPE , "ch" ) ) AND ( LIMIT-TO ( LANGUAGE , "English" ) ) AND ( LIMIT-TO ( PUBSTAGE , "final" ) ) AND ( LIMIT-TO ( SUBJAREA , "COMP" ) OR LIMIT-TO ( SUBJAREA , "ENGI" ) )

The aforementioned search condition requires the existence of the above words (i.e., recognition, translation, etc.) in the title, abstract or keywords of the literature works. In this context, (*) allows for variations in the search terms (i.e., captur(*) matches words such as capture, capturing, etc.). In addition, the search is restricted to papers published after 2018, since the field is evolving at a fast pace and older methods quickly become obsolete. To this end, this review aims to present only the latest and best works related to sign language technologies. Finally, the papers included in this review have been published as journal articles, conference proceedings and book chapters (i.e., DOCTYPE) in the fields of computing and engineering (i.e., SUBJAREA).
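For readers who want to reproduce a comparable screening step offline, the sketch below shows one way the Boolean and wildcard logic of such a condition could be approximated in Python over already exported records; the field names and the example record are hypothetical, and this is not the syntax actually executed by the databases themselves.

```python
import re

# Regex approximation of the task terms, with (*) wildcards mapped to \w*
TASK_TERMS = re.compile(
    r"\b(recognition|application\w*|avatar\w*|representation\w*|translation|"
    r"captur\w*|generation|production)\b",
    re.IGNORECASE,
)

def matches_condition(title: str, abstract: str, keywords: str, year: int) -> bool:
    """Return True if a record satisfies an approximation of the search condition:
    'sign' AND 'language' AND at least one task term in title/abstract/keywords,
    published after 2018."""
    text = " ".join([title, abstract, keywords])
    return (
        re.search(r"\bsign\b", text, re.IGNORECASE) is not None
        and re.search(r"\blanguage\b", text, re.IGNORECASE) is not None
        and TASK_TERMS.search(text) is not None
        and year > 2018
    )

# Hypothetical record, for illustration only
print(matches_condition(
    title="Continuous sign language recognition with transformer networks",
    abstract="We study video-based recognition of sign language sentences.",
    keywords="sign language; CTC; transformers",
    year=2020,
))  # -> True
```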

The number of records retrieved from the three databases is 2368. From this number, 331 duplicate records are removed, leading to 2037 unique records. After screening the titles, abstracts and finally the full texts against various criteria to discard irrelevant records, 106 records remain and are included in this review. The selection procedure is depicted in Figure 2.

Figure 2. Flowchart of the systematic literature search process.

3. Sign Language Capturing

Sign language capturing involves the recording of sign gestures using appropriate sensor setups. The purpose is to capture discriminative information from the signs that will allow the study, recognition and 3D representation of signs at later stages. Moreover, sign language capturing enables the construction of large datasets that can be used to accurately train and evaluate machine learning sign language recognition and translation algorithms.

3.1. Capturing Sensors

The most common means of recording sign gestures is through visual sensors that are able to capture fine-grained information, such as facial expressions and body postures, that is crucial for understanding sign language. Cerna et al. in [ 13 ] employed a Kinect sensor [ 14 ] to simultaneously capture red-green-blue (RGB) image, depth and skeletal information towards the recording of a multimodal dataset with Brazilian sign language. Similarly, Kosmopoulos et al. in [ 15 ] captured realistic real-life scenarios with sign language using the Kinect sensor. The dataset contains isolated and continuous sign language recordings with RGB, depth and skeletal information, along with annotated hand and facial features. Contrary to the previous methods that use a single Kinect sensor, this work additionally employs a machine vision camera, along with a television screen, for sign demonstration. Sincan et al. in [ 16 ], captured isolated Turkish sign language glosses using Kinect sensors with a large variety of indoor and outdoor backgrounds, revealing the importance of capturing videos with various backgrounds. Adaloglou et al. in [ 17 ], created a large sign language dataset with the RealSense D435 sensor that records both RGB and depth information. The dataset contains continuous and isolated sign videos and is appropriate for both isolated and continuous sign language recognition tasks.

Another sensor that has been employed for sign language capturing is Leap Motion, which has the ability to capture the 3D positions of the hands and fingers at the expense of having to operate close to the subject. Mittal et al. in [ 18 ], employed this type of sensor to record sign language gestures. Other setups with antennas and readers of radio-frequency identification (RFID) signals have also been adopted for sign language recognition. Meng et al. in [ 19 ], extracted phase characteristics of RFID signals to detect and recognize sign gestures. The training setup consists of an RFID reader, an RFID tag and a directional antenna. The recorded signer must stand between the reader and the tag for proper capturing. Moreover, the recognition system is signer-dependent.

On the other hand, wearable sensors have been adopted for capturing sign language gestures. Galea et al. in [ 20 ], used electromyography (EMG) to capture the electrical activity produced during arm movement. The Thalmic MYO armband device was used for the recording of the Irish sign language alphabet. Similarly, Zhang et al. [ 21 ] used a wearable device to capture EMG and inertial measurement unit (IMU) signals, while they used a convolutional neural network (CNN) [ 22 ] followed by a long short-term memory (LSTM) [ 23 ] architecture to recognize American sign language at both word and sentence levels. One disadvantage of the method is that its performance has not been evaluated under walking conditions. Hou et al. in [ 24 ], proposed Sign-Speaker, which was deployed on a smartwatch to collect sign signals. These signals were then sent to a smartphone and translated into spoken language in real-time. In this method, a very simple capturing setup is required, consisting of a smartwatch and a smartphone. However, their system recognizes a limited number of signs and it cannot generalize well to new users. Wang et al. in [ 25 ], employed a system with two armbands using both IMU and EMG sensors in order to capture fine-grained finger and hand positions and movements. How et al. in [ 26 ], used a low-cost dataglove with IMU sensors to capture sign gestures that were transmitted through Bluetooth to a smartphone device. Nevertheless, the employment of a single right-hand dataglove limited the number of signs that could be performed by this setup.

Each of the aforementioned sensor setups for sign language capturing has different characteristics, which makes it suitable for different applications. Kinect sensors provide high-resolution RGB and depth information, but their accuracy is restricted by the distance from the sensors. Leap Motion also requires a small distance between the sensor and the subject, but its low computational requirements enable its use in real-time applications. Multi-camera setups are capable of providing highly accurate results at the expense of increased complexity and computational requirements. A Myo armband that can detect EMG and inertial signals is also used in a few works, but the inertial signals may be distorted by body motions when people are walking. Smartwatches are very popular nowadays and they can also be used for sign language capturing, but their output can be quite noisy due to unexpected body movements. Finally, datagloves can provide highly accurate sign language capturing results in real-time. However, the tuning of their components (i.e., flex sensor, accelerometer, gyroscope) may require a trial-and-error process that is impractical and time-consuming. In addition, signers tend not to prefer datagloves for sign language capturing as they are considered invasive.

3.2. Datasets

Datasets are crucial for the performance of methodologies regarding sign language recognition, translation and synthesis and as a result a lot of attention has been drawn towards the accurate capturing of signs and their meticulous annotation. The majority of the existing publicly available datasets are captured with visual sensors and are presented below.

3.2.1. Continuous Sign Language Recognition Datasets

Continuous sign language recognition (CSLR) datasets contain videos of sequences of signs instead of individual signs and are more suitable for developing real-life applications. Phoenix-2014 [ 27 ] is one of the most popular CSLR datasets, with recordings of weather forecasts in German sign language. All videos were recorded with 9 signers at a frame rate of 25 frames per second. The dictionary has 1081 unique glosses and the dataset contains 5672 videos for training, 540 videos for validation and 629 videos for testing. The same authors created an updated version of Phoenix-2014, called Phoenix-2014-T [ 28 ], with spoken language translations, which makes it appropriate for both CSLR and sign language translation experiments. It contains 8257 videos from 9 different signers performing 1088 unique signs and 2887 unique words. Although all recordings are performed in a controlled environment, Phoenix-2014 and Phoenix-2014-T are both challenging datasets with large vocabularies and a varying number of samples per sign, with a few signs having a single sample. Similarly, BSL-1K [ 29 ] contains video recordings from British news broadcasts, along with automatically extracted annotations from the provided subtitles. It is a large database with 273,000 samples from 40 signers that is also used for sign language segmentation. Another notable dataset is CSL [ 30 , 31 ], which contains Chinese words widely used in daily communication. The dataset has 100 sentences with signs that were performed by 50 signers. The recordings are performed in a lab with predefined conditions (i.e., background, lighting). The vocabulary size is 178 words that are performed multiple times, resulting in high recognition rates being achieved by SLR methods. GRSL [ 15 ] is another CSLR dataset of Greek sign language that is used in home care services, which contains multiple modalities, such as RGB, depth and skeletal joints. On the other hand, GSL [ 17 ] is a large Greek sign language dataset created to assist the communication of Deaf people with public service employees. The dataset was created with a RealSense D435 sensor that records both RGB and depth information. Furthermore, it contains both continuous and isolated sign videos from 15 predefined scenarios. It is recorded in a laboratory environment, where each scenario is repeated five consecutive times.

3.2.2. Isolated Sign Language Recognition Datasets

Isolated sign language recognition (ISLR) datasets are important for identifying and learning discriminative features for sign language recognition. CSL-500 [ 31 , 32 ] is the isolated version of CSL, but it contains 500 unique glosses performed by the same 50 signers. CSLR methods usually adopt this dataset for feature learning prior to finetuning on the CSL dataset. MS-ASL [ 33 ] is another widely employed ISLR dataset with 1000 unique American sign language glosses. It contains recordings collected from the YouTube platform from 222 signers with a large variance in background settings, which makes this dataset suitable for training complex methods with strong representation capabilities. Similarly, WASL [ 34 ] is an ISLR dataset with 2000 unique American sign glosses performed by 119 signers. The videos have different background and illumination conditions, which makes it a challenging ISLR benchmark dataset. On the other hand, AUTSL is a Turkish sign language dataset captured under various indoor and outdoor backgrounds, while LSA64 [ 35 ] is an Argentinian sign language dataset that includes 3200 videos, in which 10 non-expert subjects execute 5 repetitions of 64 different types of signs. LSA64 is a small and relatively easy dataset, where SLR methods achieve outstanding recognition performance. Finally, IsoGD [ 36 ] is a gesture recognition dataset that consists of 47,933 RGB-D videos performed by 21 different individuals and contains 249 gesture labels. Although IsoGD is a gesture recognition dataset, its large size and challenging illumination and background conditions allow the training of highly accurate ISLR methods.

3.2.3. Discussion

A discussion about the aforementioned datasets can be made at this stage, while a detailed overview of the dataset characteristics is provided in Table 1. It can be seen that, over time, datasets have become larger in size (i.e., number of samples), involve more signers and contain high-resolution videos captured under various and challenging illumination and background conditions. Moreover, new datasets usually include different modalities (i.e., RGB, depth and skeleton). Recording sign language videos using many signers is very important, since each person performs signs with different speed, body posture and facial expression. Moreover, high-resolution videos capture more clearly small but important details, such as finger movements and facial expressions, which are crucial cues for sign language understanding. Datasets with videos captured under different conditions enable deep networks to extract highly discriminative features for sign language classification. As a result, methodologies trained on such datasets can obtain greatly enhanced representation and generalization capabilities and achieve high recognition performance. Furthermore, although RGB information is the predominant modality used for sign language recognition, additional modalities, such as skeleton and depth information, can provide complementary information to the RGB modality and significantly improve the performance of SLR methods.

Large-scale publicly available SLR datasets.

| Dataset | Language | Signers | Classes | Video Instances | Resolution | Type | Modalities | Year |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Phoenix-2014 [ 27 ] | German | 9 | 1231 | 6841 | 210 × 260 | CSLR | RGB | 2014 |
| CSL [ 30 , 31 ] | Chinese | 50 | 178 | 25,000 | 1920 × 1080 | CSLR | RGB, depth | 2016 |
| Phoenix-2014-T [ 28 ] | German | 9 | 1231 | 8257 | 210 × 260 | CSLR | RGB | 2018 |
| GRSL [ 15 ] | Greek | 15 | 1500 | 4000 | varying | CSLR | RGB, depth, skeleton | 2020 |
| BSL-1K [ 29 ] | British | 40 | 1064 | 273,000 | varying | CSLR | RGB | 2020 |
| GSL [ 17 ] | Greek | 7 | 310 | 10,295 | 848 × 480 | CSLR | RGB, depth | 2021 |
| CSL-500 [ 31 , 32 ] | Chinese | 50 | 500 | 125,000 | 1920 × 1080 | ISLR | RGB, depth | 2016 |
| MS-ASL [ 33 ] | American | 222 | 1000 | 25,513 | varying | ISLR | RGB | 2019 |
| WASL [ 34 ] | American | 119 | 2000 | 21,013 | varying | ISLR | RGB | 2020 |
| AUTSL [ ] | Turkish | 43 | 226 | 38,336 | 512 × 512 | ISLR | RGB, depth | 2020 |
| KArSL [ ] | Arabic | 3 | 502 | 75,300 | varying | ISLR | RGB, depth, skeleton | 2021 |

4. Sign Language Recognition

Sign language recognition (SLR) is the task of recognizing sign language glosses from video streams. It is a very important research area since it can bridge the communication gap between hearing and Deaf people, facilitating the social inclusion of hearing-impaired people. Moreover, sign language recognition can be classified into isolated and continuous based on whether the video streams contain an isolated gloss or a gloss sequence that corresponds to a sentence.

4.1. Continuous Sign Language Recognition

Continuous Sign Language Recognition aims at classifying signed videos to entire sentences (i.e., ordered sequence of glosses). CSLR is a very challenging task as it requires the recognition of glosses from video sequences without any knowledge of the sign boundaries (i.e., lack of ground truth annotations regarding the start and end of glosses). Most works adopt 2D or 3D-CNNs for feature extraction followed by temporal convolutional networks or recurrent neural networks (RNNs) for sequential information modelling. To measure CSLR performance, word error rate (WER) [ 38 ] is commonly adopted. WER measures the number of operations (i.e., substitutions, deletions and insertions) required to transform the predicted sequence into the target sequence.
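
Expressed as a formula, WER = (S + D + I) / N, where S, D and I are the numbers of substitutions, deletions and insertions and N is the number of glosses in the reference sentence. The snippet below is a minimal, illustrative word-level WER computation based on the standard dynamic-programming edit distance; it is a sketch and not code taken from any of the cited works.

```python
def wer(reference, hypothesis):
    """Word error rate: edit distance between gloss sequences, normalized by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum number of edits needed to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("MORGEN REGEN NORD", "MORGEN SONNE NORD"))   # 1 substitution out of 3 glosses -> 0.333...
```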

Cui et al. [ 39 ] adopted a 2D-CNN followed by temporal 1D convolutional layers for feature extraction. The extracted spatio-temporal features were fed to a bidirectional long short-term memory (BLSTM) network to model the context of the entire sequence. The feature extractor was extended with a classifier and trained in a fully supervised setting on isolated glosses for video-to-gloss alignment, while the BLSTM was used for CSLR. This two-step optimization process was conducted iteratively with Connectionist Temporal Classification (CTC) [ 40 ] and Cross-Entropy losses until the network converged. In addition, the recognition model fused the RGB and optical flow modalities and achieved a WER of 22.8% on the Phoenix-2014 dataset. Similarly, Koishybay et al. in [ 41 ], adopted a residual 2D-CNN with cascaded 1D convolutional layers for feature extraction, while a BLSTM was utilized for the CSLR experiments. Their method generated gloss-level alignments using the Levenshtein distance in order to fine-tune the feature extractor. However, the authors stated that during the early iterations the model predicted poor alignment proposals, which hindered the training process and required several iterations to converge. Cheng et al. in [ 42 ], proposed a 2D fully convolutional network with a feature enhancement module that did not require iterative training. Instead, it provided extra supervision and assisted the CSLR network in learning better gloss alignments. Niu et al. in [ 43 ], proposed a 2D-CNN followed by a Transformer network for CSLR. They used three stochastic methods to drop frames of the input video, to randomly stop gradients during back-propagation and to model glosses using hidden states, respectively, which led to better CSLR performance. Nevertheless, the randomness ratio of these stochastic processes must be tuned carefully to achieve good recognition rates. Generally, CSLR methods based on 2D-CNNs achieve great recognition performance. More specifically, 2D-CNNs extract descriptive features from the frame sequences, while the sequence modelling mechanisms efficiently align the input video and the output predictions. However, they usually require complex training strategies, such as iterative optimization techniques, to achieve strong feature extraction capabilities.
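
To make the training recipe of such 2D-CNN + BLSTM pipelines more concrete, the sketch below applies PyTorch's CTC loss to per-frame gloss scores; the toy encoder, feature dimension and vocabulary size are assumptions for illustration and do not reproduce the exact models of [ 39 ] or [ 41 ].

```python
import torch
import torch.nn as nn

T, N, C, S = 120, 2, 1233, 12   # assumed: frames, batch size, gloss vocabulary (+1 blank), gloss sequence length

class FrameEncoder(nn.Module):
    """Toy stand-in for a 2D-CNN + BLSTM CSLR encoder: frame features -> per-frame gloss scores."""
    def __init__(self, feat_dim=512, hidden=256, vocab=C):
        super().__init__()
        self.blstm = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, vocab)

    def forward(self, frame_feats):                 # (N, T, feat_dim), e.g. pooled 2D-CNN features
        out, _ = self.blstm(frame_feats)
        return self.head(out)                       # (N, T, C) unnormalized gloss scores

model = FrameEncoder()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

frame_feats = torch.randn(N, T, 512)                # placeholder for CNN features
targets = torch.randint(1, C, (N, S))               # gloss label sequences (index 0 is the blank)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

log_probs = model(frame_feats).log_softmax(-1).transpose(0, 1)   # CTC expects (T, N, C)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```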

On the other hand, some works chose to incorporate attention mechanisms for CSLR. Pan et al. in [ 44 ], used a key-frame sampling technique to extract the most descriptive frames of the video. Then, a vector representation was constructed from the skeletal data of the key-frames, which was fed to an attention-based BLSTM to model the temporal information. Huang et al. [ 45 ] proposed an adaptive encoder-decoder architecture to learn the temporal boundaries of the video. Furthermore, a hierarchical BLSTM with attention over sliding windows was used on the decoder to weigh the importance of the input frames. Li et al. in [ 46 ], used a pyramid structure of BLSTMs in order to find key actions of the video representations, which were produced from the 2D-CNN. Moreover, an attention-based LSTM was used to align the input and output sequences and the whole network was trained jointly with Cross-Entropy and CTC losses.

Recently, the self-attention mechanism has been introduced in a variety of models, such as the Transformer, and has also been adopted by CSLR methods. Slimane et al. in [ 47 ], proposed two data streams with cropped hand images and full images. The two modalities were passed through two 2D-CNNs to extract the spatial features. Then, the modalities were synchronized by a self-attention module to obtain better contextual information and generate efficient video representations for CSLR. Zhou et al. [ 48 ], adopted a fully-inception architecture with 2D and 1D convolutional layers along with a self-attention mechanism to further improve the feature extraction capabilities of the inception layers.
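
A minimal sketch of fusing two such streams with multi-head attention is shown below; letting the full-frame features attend to the cropped-hand features, as well as the feature sizes, are assumptions for illustration rather than the exact modules of [ 47 ] or [ 48 ].

```python
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    """Sketch: full-frame features attend to cropped-hand features via multi-head attention."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, full_feats, hand_feats):       # both (N, T, dim), produced by two 2D-CNNs
        fused, _ = self.attn(query=full_feats, key=hand_feats, value=hand_feats)
        return self.norm(full_feats + fused)          # residual connection keeps the full-frame stream

fusion = TwoStreamFusion()
out = fusion(torch.randn(2, 120, 512), torch.randn(2, 120, 512))   # (2, 120, 512) fused representation
```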

Reinforcement techniques have also been applied for CSLR, along with Transformer networks. Zhang et al. in [ 49 ], adopted a 3D-CNN followed by a Transformer network that was responsible for recognizing gloss sequences from input videos. Instead of training the model with cross-entropy loss, they used the REINFORCE algorithm [ 50 ] to directly optimize the model by using WER as the reward function of the agent (i.e., the feature extractor). Wei et al. in [ 51 ], used a semantic boundary detection algorithm with reinforcement learning to improve CSLR performance. A spatio-temporal feature extractor learned the video representations. Then, the detection algorithm used reinforcement learning to detect gloss timestamps from video sequences and refine the final video representations. The evaluation metric was used again as the reward function. The major limitation of this method is the need for a careful selection of the pooling size, which defines the action search space for the reinforcement learning agent.
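
The general recipe behind these approaches can be illustrated with the short REINFORCE sketch below, where the negative WER of a sampled gloss sequence serves as the reward; the per-frame sampling and the crude blank-dropping decoder are simplifying assumptions and not the exact agents of [ 49 ] or [ 51 ].

```python
import torch

def sequence_reward(decoded, reference):
    """Negative word error rate over gloss indices (edit distance / reference length)."""
    d = [[0] * (len(decoded) + 1) for _ in range(len(reference) + 1)]
    for i in range(len(reference) + 1):
        d[i][0] = i
    for j in range(len(decoded) + 1):
        d[0][j] = j
    for i in range(1, len(reference) + 1):
        for j in range(1, len(decoded) + 1):
            d[i][j] = min(d[i - 1][j - 1] + (reference[i - 1] != decoded[j - 1]),
                          d[i - 1][j] + 1, d[i][j - 1] + 1)
    return -d[-1][-1] / max(len(reference), 1)

def reinforce_step(per_frame_logits, reference, baseline=0.0):
    """One REINFORCE step: sample a gloss per frame, reward = -WER of the collapsed sequence."""
    dist = torch.distributions.Categorical(logits=per_frame_logits)   # (T, C) scores from the extractor
    sampled = dist.sample()
    log_prob = dist.log_prob(sampled).sum()
    decoded = [g for g in sampled.tolist() if g != 0]                 # drop blanks (index 0); crude decoding
    reward = sequence_reward(decoded, reference)
    return -(reward - baseline) * log_prob                            # minimizing this maximizes the reward

logits = torch.randn(120, 1233, requires_grad=True)                   # stand-in for extractor outputs
loss = reinforce_step(logits, reference=[5, 42, 7])
loss.backward()
```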

Papastratis et al. [ 52 ] constructed a cross-modal approach in order to effectively model intra-gloss dependencies by leveraging information from text. This method extracted video features using a video encoder that consisted of a 2D-CNN followed by temporal convolutions and a BLSTM, while text representations were obtained from an LSTM. Finally, these embeddings were aligned in a joint latent space. The improved representations led to great CSLR performance, achieving WERs of 24.0% and 3.52% on Phoenix-2014 and GSL SI, respectively. Papastratis et al. in their latest work [ 53 ], employed a generative adversarial network to evaluate the predictions of the video encoder. In addition, contextual information was incorporated to improve recognition performance on sign language conversations.

Due to their efficient feature extraction capabilities, 3D-CNNs have also been adopted by many researchers for CSLR. Wei et al. in [ 54 ], used a 3D residual CNN along with a BLSTM and additionally applied the grammatical rules of sign language. The text was split into isolated words and n -grams, which were modelled using two classifiers. The two classifiers aimed to recognize each word independently and based on its context, in contrast to CTC, which models the whole sequence. Pu et al. in [ 55 ], employed a 3D-CNN with an LSTM decoder and a CTC decoder that were jointly aligned with a soft dynamic time warping (soft-DTW) [ 56 ] alignment constraint. The network was trained recursively with the proposed alignments from soft-DTW. The method achieved WERs of 6.1% and 32.7% on CSL Split 1 and CSL Split 2, respectively. Guo et al. in [ 57 ], developed a fully convolutional approach with a 3D-CNN followed by 1D temporal convolutional layers. The 1D CNN block had a hierarchical structure with small and large receptive fields to capture short- and long-term correlations in the video, while the entire architecture was trained with the CTC loss. 3D-CNNs are computationally expensive, require pre-training on large-scale datasets and cannot be tuned directly for CSLR; to this end, sliding window techniques are adopted to create informative features. To tackle this problem, some works incorporated pseudo-labelling, an optimization process that adds predicted labels to the training set. Pei et al. in [ 58 ], trained a deep 3D-CNN with CTC and generated clip-level pseudo-labels from the CTC alignment to obtain better feature representations. To improve the quality of the pseudo-labels, Zhou et al. in [ 59 ], proposed a dynamic decoding method instead of greedy decoding to find better alignment paths and filter out wrong pseudo-labels. Their method applied the I3D [ 60 ] network from the action recognition field along with temporal convolutions and bidirectional gated recurrent units (BGRU) [ 61 ], and achieved a WER of 34.5% on the Phoenix-2014 dataset. However, pseudo-labelling required many iterations, while the initial labels affected the convergence of the optimization process.
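
The sliding-window strategy mentioned above can be sketched as follows, using torchvision's 3D ResNet as a generic stand-in for I3D and the other 3D backbones; the window and stride sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

backbone = r3d_18()                  # randomly initialized 3D ResNet; pre-trained weights would normally be loaded
backbone.fc = nn.Identity()          # keep 512-d clip features instead of action-classification logits
backbone.eval()

def sliding_window_features(video, window=16, stride=8):
    """Split a video (C, T, H, W) into overlapping clips and extract one feature vector per clip."""
    clips = [video[:, s:s + window] for s in range(0, video.shape[1] - window + 1, stride)]
    batch = torch.stack(clips)                       # (num_clips, C, window, H, W)
    with torch.no_grad():
        return backbone(batch)                       # (num_clips, 512)

video = torch.randn(3, 120, 112, 112)                # placeholder RGB video
clip_feats = sliding_window_features(video)          # e.g. (14, 512), then fed to a temporal model for CSLR
```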

In Table 2 , several methods are compared on the test set of the most commonly adopted datasets for continuous sign language recognition. From the experimental results it is shown that multi-modal methods achieve the lowest WERs. More specifically, STMC [ 62 ] has the best recognition rates on Phoenix-2014, CSL Split 1 and CSL Split 2 datasets using RGB, hands and skeleton modalities, while SLRGAN [ 53 ], employing the RGB and text modality, achieves superior performance on the GSL SI and GSL SD datasets.

Performance comparison of CSLR approaches categorized by dataset measured in WER (%). The best performance for each dataset appears in bold.

| Method | Input Modality | Dataset | Test Set (WER) |
| --- | --- | --- | --- |
| PL [ ] | RGB | Phoenix-2014 | 40.6 |
| RL [ ] | RGB | Phoenix-2014 | 38.3 |
| Align-iOpt [ ] | RGB | Phoenix-2014 | 36.7 |
| DenseTCN [ ] | RGB | Phoenix-2014 | 36.5 |
| DPD [ ] | RGB | Phoenix-2014 | 34.5 |
| CNN-1D-RNN [ ] | RGB | Phoenix-2014 | 34.4 |
| Fully-Inception Networks [ ] | RGB | Phoenix-2014 | 31.3 |
| SAN [ ] | RGB | Phoenix-2014 | 29.7 |
| SFD [ ] | RGB | Phoenix-2014 | 25.3 |
| CrossModal [ ] | RGB | Phoenix-2014 | 24.0 |
| Fully-Conv-Net [ ] | RGB | Phoenix-2014 | 23.9 |
| SLRGAN [ ] | RGB | Phoenix-2014 | 23.4 |
| CNN-TEMP-RNN [ ] | RGB + Optical flow | Phoenix-2014 | 22.8 |
| STMC [ ] | RGB + Hands + Skeleton | Phoenix-2014 | |
| DenseTCN [ ] | RGB | CSL Split 1 | 14.3 |
| Key-action [ ] | RGB | CSL Split 1 | 9.1 |
| Align-iOpt [ ] | RGB | CSL Split 1 | 6.1 |
| WIC-NGC [ ] | RGB | CSL Split 1 | 5.1 |
| DPD [ ] | RGB | CSL Split 1 | 4.7 |
| Fully-Conv-Net [ ] | RGB | CSL Split 1 | 3.0 |
| CrossModal [ ] | RGB | CSL Split 1 | 2.4 |
| SLRGAN [ ] | RGB | CSL Split 1 | |
| STMC [ ] | RGB + Hands + Skeleton | CSL Split 1 | |
| Key-action [ ] | RGB | CSL Split 2 | 49.1 |
| DenseTCN [ ] | RGB | CSL Split 2 | 44.7 |
| Align-iOpt [ ] | RGB | CSL Split 2 | 32.7 |
| STMC [ ] | RGB + Hands + Skeleton | CSL Split 2 | |
| CrossModal [ ] | RGB | GSL SI | 3.52 |
| SLRGAN [ ] | RGB | GSL SI | |
| CrossModal [ ] | RGB | GSL SD | 41.98 |
| SLRGAN [ ] | RGB | GSL SD | |

4.2. Isolated Sign Language Recognition

Isolated sign language recognition refers to the task of accurately detecting single sign gestures from videos, and thus it is usually tackled similarly to action and gesture recognition, as well as other types of video processing and classification tasks, through the extraction and learning of highly discriminative features [ 63 , 64 , 65 ]. In the literature, a common approach to isolated sign language recognition is the extraction of hand and mouth regions from the video sequences in an attempt to remove noisy backgrounds that can inhibit classification performance. Liao et al. in [ 66 ], proposed a video-based SLR method based on hand region extraction and classification using 3D ResNet networks and BLSTM layers. Similarly, Aly et al. in [ 67 ], developed an ISLR method that segmented hand regions from images using the DeepLabv3+ algorithm [ 68 ], extracted features from these regions using a Convolutional Self-Organizing Map and classified the features using a deep recurrent neural network consisting of 3 BLSTM layers. Gökçe et al. in [ 69 ], proposed 3D-CNN networks for the processing of hand, upper body and face image regions and the fusion of these streams at the score level to accurately classify isolated signs. The authors stated that their method performs comparatively worse on mono-morphemic signs performed with a single hand than on temporally more complex signs with two-handed gestures. On the other hand, Zhang et al. in [ 70 ], proposed the Multiple extraction and Multiple prediction (MEMP) network, which consists of alternating 3D-CNN networks and Convolutional LSTM layers that extract spatio-temporal features from video sequences multiple times, enabling the network to achieve 99.06% and 78.85% accuracy on the LSA64 and IsoGD datasets, respectively. Li et al. in [ 71 ], proposed a SLR method based on transferring cross-domain knowledge of news signs to a base model and improving its performance using domain-invariant features.

To further improve the accuracy and robustness of SLR methods, several researchers proposed the extraction of other types of features, such as optical flow and skeletal joints from visual cues. These multi-stream networks are more computationally expensive than their single stream counterparts, but they have the advantage of overcoming confusing cases regularly met when a single type of features is employed. Sarhan et al. in [ 72 ], proposed a two-stream network architecture that received as input RGB and optical flow data, extracted features using I3D networks and performed late fusion at the score level for accurate sign language recognition. Rastgoo et al. in [ 73 ], proposed a multi-stream SLR method that utilized as input hand image regions, hand heatmaps and 2D projections of hand skeletal joints to images. These input data were processed using 3D-CNN networks, concatenated and fed to LSTM layers for sign recognition. Konstantinidis et al. in [ 74 ], proposed a SLR methodology that was based on the processing and late fusion of body and hand skeletal features using LSTM layers. Apart from the raw joint coordinates, the authors also utilized joint-line distances, which led to a significant improvement in the performance of the method, reaching 98.09% accuracy in the LSA64 dataset. In a later work [ 75 ], the same authors introduced additional streams that processed RGB video sequences and optical flow data, enhancing even more the performance of their method, ultimately achieving 99.84% accuracy in the LSA64 dataset. Similarly, Papadimitriou et al. in [ 76 ], proposed a multi-stream SLR method that processes hand and mouth regions, as well as optical flow and skeletal features for the accurate classification of signs. These features were concatenated and fed to a temporal deformable convolutional attention-based encoder-decoder that predicts the sign class. Gündüz et al. in [ 77 ], employed a multi-stream SLR approach that received as input RGB video sequences, optical flow sequences and body and hand skeletal features and performed a late fusion to accurately classify Turkish signs. Bilge et al. in [ 78 ], proposed a SLR method that can generalize well on unseen signs. To achieve this, the authors employed two 3D-CNN networks followed by BLSTM layers for the extraction of short-term and long-term feature representations from body and hand video sequences. In addition, the authors employed a BERT model [ 79 ] for the extraction of textual sign representations from text descriptions of how the signs were performed. Finally, they used a bi-linear compatibility function to associate video and text representations.
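
Score-level (late) fusion of such streams reduces to a weighted average of per-stream class probabilities, as in the small sketch below; the two-stream setup and equal weights are illustrative assumptions.

```python
import torch

def late_fusion_predict(rgb_logits, flow_logits, weights=(0.5, 0.5)):
    """Score-level fusion: weighted average of per-stream class probabilities."""
    fused = weights[0] * rgb_logits.softmax(dim=-1) + weights[1] * flow_logits.softmax(dim=-1)
    return fused.argmax(dim=-1), fused

# Placeholder scores for a batch of 4 videos and 64 sign classes (e.g. LSA64).
pred, probs = late_fusion_predict(torch.randn(4, 64), torch.randn(4, 64))
```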

In an effort to derive more discriminative features, Rastgoo et al. in [ 63 ], proposed a multi-stream SLR method that receives as input hand regions, 3D hand pose features and Extra Spatial Hand Relation features (i.e., the orientation and slope of the hands). These features were concatenated and fed to an LSTM layer to derive the sign class. In this way, the authors managed to achieve a very high accuracy of 86.32% on the challenging IsoGD dataset. Kumar et al. in [ 64 ], proposed Spatial 3D Relational Features for sign language recognition. These features were computed from the area and perimeter of polygons formed by quadruples of skeletal joints. Then, the class of a test sign was predicted by comparing the sign with the training set using global alignment kernels. In another work [ 80 ], Kumar et al. introduced two novel features for accurate sign language recognition, named colour-coded topographical descriptors. These descriptors were formed as images from the computation of joint distances and angles. Finally, these descriptors were processed by 2D CNNs and merged to derive the class of the sign.
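
Hand-crafted skeletal descriptors of this kind can be illustrated by computing pairwise joint distances and joint-line orientations, as sketched below; the joint count and the exact feature set are assumptions and not the precise formulations of [ 64 ] or [ 80 ].

```python
import numpy as np

def skeleton_descriptors(joints):
    """joints: (J, 2) array of 2D joint coordinates for one frame.
    Returns pairwise joint-to-joint distances and the orientations of the connecting joint lines."""
    diffs = joints[:, None, :] - joints[None, :, :]        # (J, J, 2) pairwise displacement vectors
    distances = np.linalg.norm(diffs, axis=-1)             # (J, J) Euclidean joint distances
    angles = np.arctan2(diffs[..., 1], diffs[..., 0])      # (J, J) joint-line orientations
    iu = np.triu_indices(len(joints), k=1)                 # keep each unordered joint pair once
    return np.concatenate([distances[iu], angles[iu]])

frame_features = skeleton_descriptors(np.random.rand(25, 2))   # e.g. 25 body joints -> 600-d descriptor
```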

Recently, advances in deep learning have led several isolated SLR methods to leverage attention mechanisms, transformer networks and graph convolutional networks. Attention mechanisms in particular enable a deep network to pay more attention to features that are important for a classification task and are widely employed by most state-of-the-art SLR methods. Parelli et al. in [ 81 ], proposed a multi-stream SLR method that processes hand and mouth image regions as well as 3D hand skeletal data. All streams were concatenated and fed to an attention CNN network that accurately predicts the class of the sign. Attention LSTM, attention GRU and Transformer networks were also tested but led to inferior performance. De Amorim et al. in [ 82 ], proposed an American SLR method that extracts skeletal data from video sequences and then processes them using a Spatio-Temporal Graph Convolutional Network (GCN) [ 83 ]. Tunga et al. in [ 84 ], proposed a SLR method that extracts skeletal features from video sequences and then employs a GCN to model spatial dependencies among the skeletal data, as well as a BERT model to model their temporal dependencies. The two representations were finally merged to derive the class of the sign. A limitation of this approach is that the model cannot differentiate between in-plane and out-of-plane movements due to the use of only 2D spatial information. In a similar fashion, Meng et al. in [ 85 ], proposed a GCN with multi-scale attention modules to process the extracted skeletal data and model their long-term spatial and temporal dependencies. In this way, the authors achieved a very high accuracy of 97.36% on the CSL-500 dataset. GCNs are computationally lighter than image processing networks, but they often cannot extract highly enriched features, thus leading to inferior performance, as noted in [ 82 ].
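
For reference, a single spatial graph convolution over skeletal joints can be written as in the sketch below; the toy adjacency matrix and the purely spatial (non-temporal, non-attentive) formulation are deliberate simplifications of the ST-GCN-style models cited above.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One spatial graph convolution step: X' = relu(normalize(A + I) @ X @ W)."""
    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        a_hat = adjacency + torch.eye(adjacency.shape[0])      # add self-loops
        self.register_buffer("norm_a", a_hat / a_hat.sum(dim=1, keepdim=True))  # row-normalized adjacency
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):                  # x: (N, J, in_dim) joint features, e.g. 2D/3D coordinates
        return torch.relu(self.linear(self.norm_a @ x))

# Toy 5-joint chain (e.g. one finger); real models use the full body/hand graph.
A = torch.zeros(5, 5)
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
layer = GraphConv(in_dim=3, out_dim=16, adjacency=A)
out = layer(torch.randn(8, 5, 3))          # (8, 5, 16)
```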

Finally, the wide adoption of RGB-D sensors for action and gesture recognition has led several researchers to adopt them for multi-modal sign language recognition as well. However, the performance of such multi-modal methodologies is currently limited by the small number of large publicly available RGB-D datasets and the mediocre accuracy of depth information. Tur et al. in [ 86 ], proposed a Siamese deep network for the concurrent processing of RGB and depth sequences. The extracted features were then concatenated and passed to an LSTM layer for isolated sign language recognition. Ravi et al. in [ 87 ], proposed a multi-modal SLR method that was based on the processing of RGB, depth and optical flow sequences. Each stream employed CNN layers to process the sequences and then, all features were fused together and fed to a CNN model for classification. Rastgoo et al. in [ 88 ], proposed a multi-modal SLR method that leverages RGB and depth video sequences to achieve an accuracy of 86.1% in the IsoGD dataset. More specifically, the authors extracted pixel-level, optical flow, deep hand and hand pose features for each modality, concatenated these features across both modalities and classified them to sign classes using an LSTM layer. The authors stated that there were signs with similar appearance and motion features that led to misclassification errors and thus they proposed the use of augmentation strategies, high capacity networks and more data samples.

Huang et al. in [ 89 ], proposed the use of RGB, depth and skeletal data as input to attention-based 3D-CNNs and attention-based BLSTMs, so that the proposed SLR method attends to spatio-temporal dependencies in the input data and fuses the input streams in an optimal way. Huang et al. in [ 90 ], proposed a sequence-to-sequence approach that detects key frames to remove noisy information from video sequences. Then, they extracted CNN features from these key frames, histogram of oriented gradients (HOG) features from depth motion maps and trajectory features from skeletal data. These features were finally concatenated and fed to an encoder-decoder LSTM network that predicted the sub-words forming the signed word. Zhang et al. in [ 91 ], proposed a highly accurate SLR method that initially selected pairs of aligned RGB-D images to reduce redundancy. Then, the proposed method computed discriminative features from hand regions using a spatial stream and extracted depth motion features using a temporal stream. Both streams were finally fused by a convolutional fusion layer and the output feature vector was used for classification. The authors reported that occlusions and surface materials can significantly affect the quality of depth images, degrading the performance of their model. Common failure cases among most ISLR methodologies are the difficulty in differentiating signs when they are performed differently by users and the inability to accurately classify signs with similar hand shapes and positions. An overview of the performance of ISLR methods on well-known datasets is presented in Table 3.

Performance of ISLR methods on well-known datasets. The best performance for each dataset appears in bold.

| Method | Dataset | Accuracy (%) |
| --- | --- | --- |
| Konstantinidis et al. [ ] | LSA64 [ 35 ] | 98.09 |
| Zhang et al. [ ] | LSA64 [ 35 ] | 99.06 |
| Konstantinidis et al. [ ] | LSA64 [ 35 ] | 99.84 |
| Gündüz et al. [ ] | LSA64 [ 35 ] | 99.9 |
| Huang et al. [ ] | CSL-500 [ 31 , 32 ] | 91.18 |
| Zhang et al. [ ] | CSL-500 [ 31 , 32 ] | 96.7 |
| Meng et al. [ ] | CSL-500 [ 31 , 32 ] | 97.36 |
| Sarhan et al. [ ] | IsoGD [ 36 ] | 62.09 |
| Zhang et al. [ ] | IsoGD [ 36 ] | 63.78 |
| Zhang et al. [ ] | IsoGD [ 36 ] | 78.85 |
| Rastgoo et al. [ ] | IsoGD [ 36 ] | 86.1 |
| Rastgoo et al. [ ] | IsoGD [ 36 ] | 86.32 |

4.3. Sign Language Translation

Sign Language Translation is the task of translating videos with sign language into spoken language by modeling not only the glosses but also the language structure and grammar. It is an important research area that facilitates the communication between the Deaf and other communities. Moreover, the SLT task is more challenging than CSLR due to the additional linguistic rules and the representation of spoken languages. SLT methods are usually evaluated using the bilingual evaluation understudy (BLEU) metric [ 92 ]. BLEU is a translation quality score that evaluates the correspondence between the predicted translation and the ground truth text. More specifically, BLEU- n measures the n -gram overlap between the output and the reference sentences. BLEU-1,2,3,4 are reported to provide a clear view of the actual translation performance of a method. Camgoz et al. in [ 28 ], adopted an attention-based neural machine translation architecture for SLT. The encoder consisted of a 2D-CNN and an LSTM network, while the decoder consisted of word embeddings with an attention LSTM. The authors stated that the method is prone to errors when spoken words are not explicitly signed in the video but inferred from the context. Their method set the baseline performance on Phoenix-2014-T with a BLEU-4 score of 18.4. Orbay et al. in [ 93 ], compared different gloss tokenization methods using either 2D-CNN, 3D-CNN, LSTM or Transformer networks. In addition, they investigated the importance of using full frames compared to hand images, as the former provide useful information regarding the face and arms of the signer for SLT. On the other hand, Ko et al. in [ 94 ], utilized human keypoints extracted from the video, which were then fed to a recurrent encoder-decoder network for sign language translation. Furthermore, the skeletal features were extracted with OpenPose and then normalized to improve the overall performance. Then, they were fed to the encoder, while the translation was generated by the attention decoder. In contrast, Zheng et al. in [ 95 ], used a preprocessing algorithm to remove similar and redundant frames of the input video and increase the processing speed of the neural network without losing information. Then, they employed an SLT architecture that consisted of a 2D-CNN, temporal convolutional layers and bidirectional GRUs. Their method was able to deal with long videos that have long-term dependencies, improving the translation quality. Zhou et al. in [ 62 ], proposed a multi-modal framework for CSLR and SLT tasks. The proposed method used a 2D-CNN, 1D convolutional layers and several BLSTMs and learned both spatial and temporal dependencies between different modalities. The proposed method achieved a BLEU-4 score of 23.65 on the test set of Phoenix-2014-T. However, due to the multi-modal cues, this method is very computationally heavy and requires several hours of training.
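
For completeness, the sketch below computes a sentence-level BLEU-4 score from clipped n-gram precisions and a brevity penalty; it is an illustrative implementation, whereas the reported results are obtained with standard evaluation toolkits.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hypothesis, reference, max_n=4):
    """Minimal sentence-level BLEU: geometric mean of clipped n-gram precisions times a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = Counter(ngrams(hyp, n)), Counter(ngrams(ref, n))
        overlap = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())  # clipped matches
        total = max(sum(hyp_counts.values()), 1)
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return brevity * math.exp(sum(log_precisions) / max_n)

print(bleu("the weather will be sunny tomorrow", "the weather will be sunny tomorrow"))   # 1.0
```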

Recently, Transformer networks have also been employed for sign language translation due to their success in natural language processing tasks. Camgoz et al. in [ 96 ], introduced a joint architecture for CSLR and SLT with a Transformer encoder-decoder network. The network was trained with CTC and Cross-Entropy losses, while the gloss-level supervision improved the SLT performance. The authors evaluated various configurations of their method and stated that directly translating from video representations can improve the translation quality. A limitation of this approach was the translation of numbers, as no such context was available during training. In their latest work, Camgoz et al. in [ 97 ], adopted additional modalities and a cross-modal attention mechanism to synchronize the different streams and model both inter- and intra-contextual information. Kim et al. in [ 98 ], used a deep neural network to extract human keypoints, which were normalized based on the neck location and fed to a Transformer encoder-decoder network. A comparison of existing SLT methods evaluated on the Phoenix-2014-T dataset is shown in Table 4. Overall, Transformer-based SLT methods achieve slightly better performance than RNN-based methods, which indicates the importance of the attention mechanism for SLT. In addition, using multiple modalities can also improve the translation quality.
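
A minimal encoder-decoder sketch in the spirit of these Transformer-based SLT methods is given below; the layer counts, vocabulary size and the omission of positional encodings are simplifying assumptions, and the model does not reproduce the exact architectures of [ 96 ], [ 97 ] or [ 98 ].

```python
import torch
import torch.nn as nn

class Video2TextTransformer(nn.Module):
    """Sketch: video frame features in, spoken-language token scores out (positional encodings omitted)."""
    def __init__(self, feat_dim=512, vocab_size=3000, d_model=512):
        super().__init__()
        self.src_proj = nn.Linear(feat_dim, d_model)
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(d_model=d_model, nhead=8,
                                          num_encoder_layers=3, num_decoder_layers=3,
                                          batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, frame_feats, tgt_tokens):
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt_tokens.shape[1])
        hidden = self.transformer(self.src_proj(frame_feats),
                                  self.tgt_embed(tgt_tokens),
                                  tgt_mask=tgt_mask)
        return self.out(hidden)                       # (N, tgt_len, vocab_size)

model = Video2TextTransformer()
tgt_in = torch.randint(0, 3000, (2, 20))              # decoder input tokens (shifted targets in practice)
tgt_out = torch.randint(0, 3000, (2, 20))             # ground-truth spoken-language tokens
scores = model(torch.randn(2, 120, 512), tgt_in)
loss = nn.CrossEntropyLoss()(scores.reshape(-1, 3000), tgt_out.reshape(-1))
```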

Reported results on sign language translation on Phoenix-2014-T. The best performance appears in bold.

| Method | Validation BLEU-1 | Validation BLEU-2 | Validation BLEU-3 | Validation BLEU-4 | Test BLEU-1 | Test BLEU-2 | Test BLEU-3 | Test BLEU-4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Sign2Gloss2Text [ ] | 42.88 | 30.30 | 23.02 | 18.40 | 43.29 | 30.39 | 22.82 | 18.13 |
| MCT [ ] | - | - | - | 19.51 | - | - | - | 18.51 |
| S2(G+T)-Transformer [ ] | 47.26 | 34.40 | 27.05 | 22.38 | 46.61 | 33.73 | 26.19 | 21.32 |
| STMC-T [ ] | | | | | | | | 23.65 |

5. Sign Language Representation

Automatic and realistic sign language representation is vital for every sign language system. Representing a sentence in sign language instead of plain text can make a system friendlier and more accessible to members of the deaf community. Signs are commonly represented using avatars or synthesized videos of a real human. The challenges of this task include the difficulty of creating realistic representations due to complex hand shapes and rapid arm movements.

5.1. Realistic Avatars

A common approach to sign language representation is the use of 3D avatars that can reproduce facial expressions and body/hand movements with a high degree of accuracy and realism, so that the represented signs are understandable by deaf or hearing-impaired people. Balayn et al. in [ 99 ], developed a virtual communication agent for sign language to recognize Japanese sign language sentences from video recordings and synthesize sign language animations. Their system adopted a deep LSTM encoder-decoder network to translate sign language videos to spoken text, while a separate encoder-decoder network took the sign language glosses as input and extracted specific encodings, which were then used to synthesize the avatar motion. However, the network employed for the generation task did not have enough parameters to learn complete sentence expressions, lacking an attention module that could assist in learning longer-term dependencies. Shaikh et al. in [ 100 ], employed a system to generate sign animations from audio announcements in railway stations. At first, language rules and grammar were applied to the input text to transform it into a specific format. Then, inverse kinematics were applied to calculate the avatar target positions for each word and render the final video representation. Melchor et al. in [ 101 ], used a speech recognition system that translates Mexican spoken language into sign language. Then, the signs were represented through an avatar that was digitally animated on a mobile device. Uchida et al. in [ 102 ], developed an application that automatically produces sign language animations for sports games and is able to operate on live game broadcasts. A disadvantage of the application is the large delay between the video occurrence and the video display.

Das et al. in [ 103 ], developed a 3D avatar to convert Indian text or speech into sign language. The input was translated to English and then to the corresponding Indian sign language using Natural Language Processing (NLP) rules and techniques. The final avatar movements were generated using a predefined sign vocabulary and Blender. A limitation of the system is that it was developed for a limited corpus and that the avatar had no facial expressions. Mehta et al. in [ 104 ], introduced a system to translate online videos into Indian Sign Language (ISL) and produce sign animations with a 3D cartoon-like avatar. The audio from the videos was captioned using NLP algorithms and mapped to signs that were finally rendered with the avatar. Nevertheless, due to the limited resources available for ISL, the performance of the system may degrade when dealing with complex grammatical structures and interactions. Patel et al. in [ 105 ], developed an application for animation generation. The input speech was recognised and translated with the Google Cloud Speech Recognizer. Then, the translated text was converted to Hamburg notation system (HamNoSys) [ 106 ] and sign gesture markup language (SigML) [ 107 ] notations to effectively generate animations. Kumar et al. in [ 108 , 109 ] developed a mobile application to translate English text into ISL. HamNoSys was used for sign representation, SigML for its conversion to an XML file, and an avatar was employed to generate signs. A weakness of the developed system is that it struggles to represent complex animations and facial expressions of ISL signs. Moreover, the proposed system does not index the signs based on their context, which can cause confusion with directional signs that require different handling depending on the context. Brock et al. in [ 110 ], adopted deep recurrent neural networks to generate 3D skeleton data from sign language videos. Subsequently, inverse kinematics were applied to calculate joint angles and positions that were mapped to a sign language avatar for animation synthesis.

5.2. Sign Language Production

Sign language production (SLP) has gained a lot of attention lately due to the huge advances in deep learning that allows the production of realistic signed videos. Sign language production techniques aim to replace the rigid body and facial features of an avatar with the natural features of a real human. To this end, these techniques usually receive as input sign language glosses and a reference image of a human and synthesize a signed video with the human performing signs in a more realistic way than the one that could have been achieved by an avatar.

Stoll et al. in [ 111 ], proposed an SLP method using a machine translation encoder-decoder network to translate spoken language into gloss sequences. Then, each gloss was assigned a unique 2D skeleton pose; the poses were extracted from sign videos, normalized and aligned. Finally, a pose-guided generative adversarial network took the skeleton pose sequence and a reference image to generate the gloss video. However, this method fails to generate precise videos when the hand keypoints are not detected by the pose estimation method or the timing of the glosses is not predicted correctly. In their latest work, Stoll et al. in [ 112 ], used an improved architecture with additional components. The NMT network directly transforms spoken text to pose sequences, while a motion graph was adopted to generate smooth 2D skeletal poses. An improved generative adversarial network (GAN) was used to produce videos with higher resolution. The motion graph and the GAN modules significantly improved the quality of the generated videos. Stoll et al. in [ 113 ], adopted an auto-regressive gloss-to-pose network that can generate skeleton poses and velocities for each sign language gloss. In addition, a pose-to-video network generated the output video using a 2D-CNN along with a GAN. This approach resulted in smooth transitions between glosses and refined details of hand and finger shapes. Saunders et al. in [ 114 ], employed Transformers to automatically generate 3D human poses from spoken text using a multiple-level configuration. A text-to-gloss-to-pose (T2G2P) network with Transformer layers translated text sentences to sign language glosses and finally to 3D poses, while a text-to-pose (T2P) network directly transformed text into human poses. A progressive Transformer decoder was used to generate continuous and smooth human poses one frame at a time, and the method achieved superior performance compared to NMT-based and GAN-based methods. Xiao et al. in [ 115 ] developed a bidirectional system for SLR and SLP. A deep RNN was used to jointly recognize sign language from input skeleton poses and to generate skeleton sequences that were used to move an avatar or synthesize a signed video. The generated sequences were also used for SLR and improved the robustness of the system.
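
To make the gloss-to-pose stage more concrete, the sketch below auto-regressively generates a short 2D pose sequence conditioned on a gloss embedding; the GRU-based decoder, joint count and number of steps are assumptions that only abstract the pipelines described above.

```python
import torch
import torch.nn as nn

class GlossToPose(nn.Module):
    """Sketch: auto-regressively generate a sequence of 2D poses (J joints) for a single gloss."""
    def __init__(self, num_glosses=1000, num_joints=50, hidden=256):
        super().__init__()
        self.gloss_embed = nn.Embedding(num_glosses, hidden)
        self.gru = nn.GRUCell(num_joints * 2, hidden)
        self.to_offset = nn.Linear(hidden, num_joints * 2)

    def forward(self, gloss_id, start_pose, steps=30):
        h = self.gloss_embed(gloss_id)                  # the gloss conditions the hidden state
        pose, poses = start_pose, []
        for _ in range(steps):
            h = self.gru(pose, h)
            pose = pose + self.to_offset(h)             # predict a pose offset for smooth motion
            poses.append(pose)
        return torch.stack(poses, dim=1)                # (N, steps, J*2), input to a pose-to-video GAN

generator = GlossToPose()
pose_seq = generator(torch.tensor([42]), torch.zeros(1, 100))   # (1, 30, 100)
```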

Cui et al. in [ 116 ], used a pose predictor network, which contains an LSTM and an autoencoder to generate the future human poses given a reference pose and the gloss label. Moreover, an image synthesis module accepted as input the current frame and the next pose to predict the next frame of the video using a U-Net based architecture with a CNN and an LSTM. Furthermore, it extracted regions of interest to improve details, such as the hands, which were crucial for generating high-quality sign language videos. This approach was able to synthesize realistic signs with naturally evolving hand shapes.

6. Applications

The advances in sign language capturing, recognition and representation have led to the development of several related applications. Each application is compatible either with desktop computers or with Android and iOS smartphones, as illustrated in Table 5. The majority of the methods integrate one or two CNN models into their applications. The use of lightweight CNN models ensures the real-time performance of the applications.

Characteristics of sign language applications.

| Method | Operating System | Sign Language | Scenario |
| --- | --- | --- | --- |
| Liang et al. [ 117 ] | Windows desktop | British | Dementia screening |
| Zhou et al. [ 118 ] | iOS | Hong Kong | Translation |
| Ozarkar et al. [ 119 ] | Android | Indian | Translation |
| Joy et al. [ 120 ] | Android | Indian | Learning |
| Paudyal et al. [ 121 ] | Android | American | Learning |
| Luccio et al. [ 122 ] | Android | Multiple | Learning |
| Chaikaew et al. [ 123 ] | Android, iOS | Thai | Learning |
| Ku et al. [ 124 ] | - | American | Translation |
| Potamianos et al. [ 125 ] | - | Greek | Learning |
| Lee et al. [ 126 ] | - | Korean | Translation |
| Schioppo et al. [ 127 ] | - | American | Learning |
| Bansal et al. [ 128 ] | - | American | Learning |
| Quandt et al. [ 129 ] | - | American | Learning |

Liang et al. in [ 117 ], introduced an automatic toolkit to recognize early stages of dementia among British Sign Language (BSL) users. Hand trajectory data, facial data and elbow distribution data were employed for feature extraction. The data were extracted using the OpenPose and dlib libraries. The final decision on whether the user was healthy or not was taken by a CNN model. Zhou et al. in [ 118 ], created a Hong Kong sign language recognition platform consisting of a mobile application and a Jetson Nano [ 130 ]. The mobile application was the front-end of the platform and preprocessed the sign language video. After the preprocessing, the video was transferred to the Jetson Nano, which translated the video into spoken language using a pre-trained deep learning model. Moreover, the authors created a Hong Kong sign language dataset for the purposes of the study. However, the method provides only word-level translation and predicts a relatively small vocabulary. Furthermore, Ku et al. in [ 124 ], employed the 2D camera of the smartphone to record the signer. Hand skeleton information was extracted by OpenPose and a CNN model identified the meaning of the sign. The user could also choose to translate a pre-recorded video. However, very few gestures (three) are recognized, and only finger positions, rather than the entire hand, are employed for feature extraction. Moreover, the application does not run in real time. On the other hand, Ozarkar et al. in [ 119 ], implemented a smartphone application consisting of three modules. The sound classification module detected and classified input sounds and alerted the user through vibrations. The gesture recognition module recognized the input Indian sign language video and converted it to natural language. In addition, the multilingual translation module could either convert text to speech in different Indian regional languages or convert speech to text. Some limitations of the method are the performance degradation when more than one person appears in front of the camera, as well as the sensitivity of the sound classification module in noisy environments. Finally, Lee et al. in [ 126 ], described multiple technologies that could be integrated into a smartphone to ease the communication between speaking and hearing-impaired people. These technologies were: Text-To-Speech (TTS), Speech-To-Text (STT), Augmentative and Alternative Communication (AAC) and motion recognition.

Numerous education-oriented applications employing SLR have also been developed. These applications aim to help users learn or practice sign language. Potamianos et al. in [ 125 ], presented a summary of the SL-ReDu project. The goal of the project was to teach Greek sign language as a second language through recognition. The educational process was supported by self-monitoring and objective evaluation of the learners. Furthermore, a deep learning-based approach for isolated sign recognition of GSL was introduced. On the other hand, Joy et al. in [ 120 ], proposed a mobile application that could be used as a visual dictionary for children. It consisted of two modules: an object detection module and a word recognition module. The former enabled the user to select an object, for which the application displayed the corresponding sign. The latter took as input a picture of a text and demonstrated the corresponding sign. However, the word recognition module is limited to translating a maximum of 950 characters from a text. In addition, there are delays in loading sign animation videos due to the limited number of videos that can be stored on the mobile device. Moreover, Paudyal et al. in [ 121 ], designed a smartphone application that provides feedback to a sign language learner based on the location, movement, orientation and hand shape of their signs. A dataset was also created from 100 learners for 25 American Sign Language (ASL) signs. However, the system does not perform continuous SLR. Schioppo et al. in [ 127 ], created a virtual environment for learning sign language employing a virtual reality headset with a Leap Motion sensor attached to it. The system was evaluated on the 26 letters of the ASL alphabet. Luccio et al. in [ 122 ], employed an Elf Sandbot robot [ 131 ] to help people with hearing impairments learn sign language. Two smartphone and tablet applications were also developed, with the first one controlling the movement of the robot and the second one taking a verbal or textual input of a word or sentence, translating it to sign language and demonstrating the corresponding video. Furthermore, Chaikaew et al. in [ 123 ], introduced an application that supports the communication of hearing-impaired people who want to learn Thai sign language. The learners were able to choose the preferred vocabulary and practice with animations. Bansal et al. in [ 128 ], designed a game aiming to help Deaf children who lack continuous access to sign language, using only a high-resolution camera and pose estimation software. The learner was asked to describe a scene and, if the description was correct, advanced to the next scene. Moreover, a dataset with RGB and depth features was created from adults with little experience with ASL. Nevertheless, the dataset contains too few samples to effectively train a deep learning model. Finally, Quandt et al. in [ 129 ], designed an avatar that served as the teacher in a virtual environment in order to teach introductory ASL to novice signers. The users could also see a digital representation of their hands due to the use of the Leap Motion sensor. However, the system could not capture signs that involved touching a specific part of the body or signs that involved body part occlusion.

7. Conclusions and Future Directions

In this paper, the broad spectrum of AI technologies in the field of sign language is covered. Starting from sign language capturing methods for the collection of sign language data and moving on to sign language recognition and representation techniques for the identification and translation of sign language, this review highlights all important technologies for the construction of a complete AI-based sign language system. Additionally, it explores the in-between relations among the AI technologies and presents their advantages and challenges. Finally, it presents groundbreaking sign language applications that facilitate the communication between hearing-impaired and speaking people, as well as enable the social inclusion of hearing-impaired people in their everyday life. The aim of this review is to familiarize researchers with sign language technologies and assist them towards developing better approaches.

In the field of sign language capturing, it is essential to select an optimal sensor for capturing signs, a choice that highly depends on various constraints (e.g., cost, speed and accuracy). For instance, wearable sensors (i.e., gloves) are expensive and capture only hand joints and arm movements, and recognition applications built on them require the user to wear gloves. On the other hand, camera sensors, such as web or smartphone cameras, are inexpensive and capture the most substantial information, such as the face and the body posture, which are crucial for sign language.

Concerning CSLR approaches, most of the existing works adopt 2D CNNs with temporal convolutional networks or recurrent neural networks that use video as input. In general, 2D methods have lower training complexity compared to 3D architectures and produce better CSLR performance. Moreover, it is experimentally shown that multi-modal architectures that utilize optical flow or human pose information, achieve slightly higher recognition rates than unimodal methods. In addition, CSLR performance on datasets with large vocabularies of more than 1000 words, such as Phoenix-2014, or datasets with unseen words on the test sets, such as CSL Split 2 and GSL SD, is far from perfect. Furthermore, ISLR methods have been extensively explored and have achieved high recognition rates on large-scale datasets. However, they are not suitable for real-life applications since they are trained to detect and classify isolated signs on pre-segmented videos.

Sign language translation methods have shown promising results although they are not exhaustively explored. The majority of the SLT methods adopt architectures from the field of neural machine translation and video captioning. These approaches are of great importance, since they translate sign language into spoken counterparts and can be used to facilitate the communication between the Deaf community and other groups. To this end, this research field requires additional attention from the research community.

Sign language representation approaches adopt either 3D avatars or video generation architectures. 3D animations require the manual design of the movement and position of each joint of the avatar, which is very time-consuming. In addition, it is extremely difficult to generate smooth and realistic animations of the fine-grained movements that compose a sign without sophisticated motion capturing systems/technologies that employ multiple cameras and specialised wearable sensors. On the other hand, recent deep learning methods for sign language production have shown promising results at synthesizing sign language videos automatically. Moreover, they can generate realistic videos using a reference image or video of a human, which the Deaf community also prefers over avatars.

Regarding sign language applications, they are mostly developed to be integrated into a smartphone operating system and perform sign language translation or recognition. A distinct category is the education-oriented applications, which are very useful for anyone with little or no knowledge of sign language. In order to create better and more easily accessible applications, research should focus on the development of more robust and less computationally expensive AI models, along with the further improvement of the existing software for integrating AI models into smart devices.

Figure 3 is designed to provide objective and subjective comparisons of AI technologies and DNN architectures for sign language as seen from the perspective and the experience of the authors in the field. More specifically, Figure 3 a presents and compares the characteristics of the different AI technologies for sign language. Volume of works is used to measure the number of published papers for each sign language technology and it is calculated based on the results of the query search in the databases. Challenges is used to subjectively measure the difficulty in accurately dealing with each sign language technology and it is based on the performance of the methods on the specific area. Finally, future potential is used to express the view of the authors on which sign language technology has the most potential to deliver future research works.

Figure 3. Radar charts showcasing the findings of this survey regarding ( a ) the literature methods for CSLR, ISLR and SLP and ( b ) the characteristics of each AI sign language technology.

From the chart in Figure 3 a, it can be seen that most existing works deal with sign language recognition, while sign language capturing and translation methods are still not thoroughly explored. It is strongly believed that these research areas should be explored more in future works. Furthermore, it is assumed that there is still great room for improvement for applications, especially mobile ones, that can assist the Deaf community. Regarding future directions, improvements can still be achieved in the accuracy of sign language recognition and production systems. In addition, advances should be made in the extraction of robust skeletal features, especially in the presence of occlusions, as well as in the realism of avatars. Finally, it is crucial to develop fast and robust sign language applications that can be integrated in the everyday life of hearing-impaired people and facilitate their communication with other people and services.

On the other hand, Figure 3 b draws a comparison between various DNN architectures in terms of the performance of the proposed networks (Accuracy), hardware requirements for inference and training of the proposed networks (Hardware requirements), scope for improvement based on the performance gains and the volume of works (Future potential), computational complexity during training (Training complexity) and the number of recorded datasets that are currently available (Existing datasets). Except for the existing datasets, whose values are based on a search for publicly available datasets, all other metrics presented in the chart of Figure 3 b are calculated based on the study of the reviewed papers and the opinions and experience of the authors. As can be observed, ISLR methods have high accuracy with small hardware requirements, but such methods have been extensively explored, resulting in limited future potential. On the other hand, CSLR and SLP methods have high hardware and training requirements, and demonstrate significant future potential as there is still great room for improvement in future research works.

Author Contributions

Conceptualization, I.P., C.C., D.K., K.D. and P.D.; Formal analysis, I.P., C.C., D.K., K.D. and P.D.; Funding acquisition, P.D.; Project administration, P.D.; Supervision, K.D.; Writing—original draft, I.P., C.C., D.K.; Writing—review and editing, K.D. and P.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Greek General Secretariat of Research and Technology under contract T1EΔK-02469 EPIKOINONO.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Sign language: a systematic review on classification and recognition

  • Published: 21 February 2024
  • Volume 83, pages 77077–77127 (2024)

  • S Renjith   ORCID: orcid.org/0009-0005-2803-2703 1 &
  • Rashmi Manazhy 2  

932 Accesses

5 Citations

Sign language serves as an alternative communication mode for those with limitations in hearing and speaking abilities. In India, an estimated population of around 2.7 million people have hearing or speech impairments. Among this population, a significant majority of 98% use sign language as their primary mode of communication. Unfortunately, the limited availability of human interpreters is a considerable obstacle in recognizing and identifying diverse sign languages. To tackle this issue, the present study aims to undertake a thorough examination of various computational methodologies used in various geographical areas for the purpose of categorizing and identifying discrete sign languages. Among the pool of 587 research papers deemed appropriate for qualitative assessment, a total of 95 studies were specifically focused on the categorization and identification of sign language using artificial intelligence-based techniques. The study examines prior research involving deep learning and machine learning methods for a systematic literature review of Sign Language Recognition (SLR). The categorization is made based on language, and the study investigates several facets, including sign type, signing modes, processing techniques, classification methodologies, and evaluation measures. The study reveals that extensive studies were carried out in Chinese, Arabic, and American sign languages. The findings of this review show that among the machine learning techniques, Support Vector Machine (SVM) exhibited higher performance measures, while Convolutional Neural Networks (CNN) exhibited higher performance among the deep learning techniques.

Data Availability

All data generated or analysed during this study are included in this published article.


Acknowledgements

Dr Sumi Suresh M S carefully reviewed the paper, correcting both grammatical and technical errors to improve its overall quality.

The authors declare that no funding was received for the preparation of the manuscript.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India

Department of Electronics and Communication Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India

Rashmi Manazhy


Corresponding author

Correspondence to S. Renjith.

Ethics declarations

Competing interests

The authors declare that they have no conflict of interest.

Ethical and informed consent for data used

By submitting this manuscript, the authors affirm that appropriate informed consent was obtained for the use of the data in this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Renjith, S., Manazhy, R. Sign language: a systematic review on classification and recognition. Multimed Tools Appl 83, 77077–77127 (2024). https://doi.org/10.1007/s11042-024-18583-4


Received: 11 May 2023

Revised: 08 January 2024

Accepted: 03 February 2024

Published: 21 February 2024

Issue Date: September 2024

DOI: https://doi.org/10.1007/s11042-024-18583-4


Keywords: Sign language; Classification; Recognition; Systematic review


Sign Language Recognition Using the Electromyographic Signal: A Systematic Literature Review

Affiliations

  • 1 Research Laboratory LaTICE, University of Tunis, Tunis 1008, Tunisia.
  • 2 Mada-Assistive Technology Center Qatar, Doha P.O. Box 24230, Qatar.
  • 3 Arab League Educational, Cultural, and Scientific Organization, Tunis 1003, Tunisia.
  • PMID: 37837173
  • PMCID: PMC10574929
  • DOI: 10.3390/s23198343

The analysis and recognition of sign languages are currently active fields of research focused on sign recognition. Various approaches differ in terms of analysis methods and the devices used for sign acquisition. Traditional methods rely on video analysis or spatial positioning data calculated using motion capture tools. In contrast to these conventional recognition and classification approaches, electromyogram (EMG) signals, which measure muscle electrical activity, offer potential technology for detecting gestures. These EMG-based approaches have recently gained attention due to their advantages. This prompted us to conduct a comprehensive study on the methods, approaches, and projects utilizing EMG sensors for sign language handshape recognition. In this paper, we provided an overview of the sign language recognition field through a literature review, with the objective of offering an in-depth review of the most significant techniques. These techniques were categorized in this article based on their respective methodologies. The survey discussed the progress and challenges in sign language recognition systems based on surface electromyography (sEMG) signals. These systems have shown promise but face issues like sEMG data variability and sensor placement. Multiple sensors enhance reliability and accuracy. Machine learning, including deep learning, is used to address these challenges. Common classifiers in sEMG-based sign language recognition include SVM, ANN, CNN, KNN, HMM, and LSTM. While SVM and ANN are widely used, random forest and KNN have shown better performance in some cases. A multilayer perceptron neural network achieved perfect accuracy in one study. CNN, often paired with LSTM, ranks as the third most popular classifier and can achieve exceptional accuracy, reaching up to 99.6% when utilizing both EMG and IMU data. LSTM is highly regarded for handling sequential dependencies in EMG signals, making it a critical component of sign language recognition systems. In summary, the survey highlights the prevalence of SVM and ANN classifiers but also suggests the effectiveness of alternative classifiers like random forests and KNNs. LSTM emerges as the most suitable algorithm for capturing sequential dependencies and improving gesture recognition in EMG-based sign language recognition systems.
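To make the surveyed classifier families more concrete, the sketch below shows how a CNN-LSTM classifier of the kind highlighted in the review might be applied to windowed, multichannel sEMG data. It is a minimal illustration only, not taken from any of the reviewed systems: the channel count, window length, class count, and all hyperparameters are hypothetical placeholders, assuming TensorFlow/Keras is available.

```python
# Minimal sketch: 1D CNN + LSTM over windowed sEMG for gesture classification.
# All sizes below are illustrative assumptions, not values from the survey.
import numpy as np
from tensorflow.keras import layers, models

NUM_CHANNELS = 8    # e.g. an 8-electrode armband (assumption)
WINDOW_SIZE = 200   # samples per sliding window (assumption)
NUM_CLASSES = 24    # number of handshapes/signs (assumption)


def build_cnn_lstm():
    # 1D convolutions extract local muscle-activation patterns within a window;
    # the LSTM then models the sequential dependencies the survey emphasizes.
    model = models.Sequential([
        layers.Input(shape=(WINDOW_SIZE, NUM_CHANNELS)),
        layers.Conv1D(64, kernel_size=7, activation="relu", padding="same"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, activation="relu", padding="same"),
        layers.MaxPooling1D(pool_size=2),
        layers.LSTM(64),                 # temporal modelling of the window
        layers.Dropout(0.5),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


if __name__ == "__main__":
    # Random stand-in data purely to show the expected tensor shapes.
    x = np.random.randn(32, WINDOW_SIZE, NUM_CHANNELS).astype("float32")
    y = np.random.randint(0, NUM_CLASSES, size=(32,))
    model = build_cnn_lstm()
    model.fit(x, y, epochs=1, batch_size=8, verbose=0)
    print(model.predict(x[:1]).shape)  # -> (1, NUM_CLASSES)
```

In practice, such a model would be trained on labelled sEMG windows (optionally fused with IMU channels, as several of the reviewed systems do) rather than the random arrays used here to demonstrate shapes.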

Keywords: electromyographic signal; sEMG; sign language recognition; systematic review.


Conflict of interest statement

The authors declare no conflict of interest.

Figures: number of papers targeting each muscle during data collection; the EMG signal generation principle; arm muscles; review steps; categories of recognized gestures by sign language; data acquisition devices; subject number vs. device frequency; class number vs. dataset size; evolution of the features used in EMG-based sign language recognition; evolution of approaches in EMG-based sign language recognition; comparative analysis of accuracy in the selected papers relative to dataset size.




Title: Sign Language Production: A Review

Abstract: Sign Language is the dominant yet non-primary form of communication language used in the deaf and hearing-impaired community. To make an easy and mutual communication between the hearing-impaired and the hearing communities, building a robust system capable of translating the spoken language into sign language and vice versa is fundamental. To this end, sign language recognition and production are two necessary parts for making such a two-way system. Sign language recognition and production need to cope with some critical challenges. In this survey, we review recent advances in Sign Language Production (SLP) and related areas using deep learning. This survey aims to briefly summarize recent achievements in SLP, discussing their advantages, limitations, and future directions of research.
Subjects: Computer Vision and Pattern Recognition (cs.CV)
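The abstract above frames recognition (SLR) and production (SLP) as the two halves of a bidirectional translation system. The sketch below is a purely conceptual illustration of that composition, with every class and method name hypothetical; real systems would wrap trained deep models behind these interfaces.

```python
# Illustrative-only composition of SLR and SLP into a two-way translator.
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class PoseFrame:
    """One frame of signing, e.g. 2D/3D skeleton keypoints."""
    keypoints: List[float]


class Recognizer(Protocol):
    def video_to_text(self, frames: List[PoseFrame]) -> str: ...


class Producer(Protocol):
    def text_to_sign(self, text: str) -> List[PoseFrame]: ...


class TwoWayTranslator:
    """Glues SLR and SLP together so each party can use their own modality."""

    def __init__(self, recognizer: Recognizer, producer: Producer) -> None:
        self.recognizer = recognizer
        self.producer = producer

    def from_signer(self, frames: List[PoseFrame]) -> str:
        # Signing side -> hearing side: sign video/pose sequence to text.
        return self.recognizer.video_to_text(frames)

    def to_signer(self, text: str) -> List[PoseFrame]:
        # Hearing side -> signing side: text to a pose sequence for an avatar.
        return self.producer.text_to_sign(text)


if __name__ == "__main__":
    # Stub components standing in for trained SLR/SLP models.
    class EchoRecognizer:
        def video_to_text(self, frames: List[PoseFrame]) -> str:
            return f"<recognized {len(frames)} frames>"

    class EchoProducer:
        def text_to_sign(self, text: str) -> List[PoseFrame]:
            return [PoseFrame(keypoints=[0.0]) for _ in text.split()]

    translator = TwoWayTranslator(EchoRecognizer(), EchoProducer())
    print(translator.from_signer([PoseFrame(keypoints=[0.0])] * 10))
    print(len(translator.to_signer("hello how are you")))
```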



COMMENTS

  1. Sign Language Recognition Systems: A Decade Systematic Literature Review

    Despite the importance of sign language recognition systems, there is a lack of a Systematic Literature Review and a classification scheme for it. This is the first identifiable academic literature review of sign language recognition systems. It provides an academic database of literature between the duration of 2007-2017 and proposes a classification scheme to classify the research articles ...

  2. Review Sign Language Recognition: A Deep Survey

    The remainder of this paper is organized as follows. Section 2 includes a brief review of Deep Learning algorithms. Section 3 presents a taxonomy of the sign language recognition area. Hand sign language, face sign language, and human sign language literature are reviewed in Sections 4, 5, and 6, respectively. Section 7 presents the recent models in continuous sign language recognition.

  3. Sign Language Recognition Systems: A Decade Systematic Literature Review

    This is the first identifiable academic literature review of sign language recognition systems. It provides an academic database of literature between the duration of 2007-2017 and proposes a ...

  4. Machine learning methods for sign language recognition: A critical

    The literature review presented in this paper shows the importance of incorporating intelligent solutions into the sign language recognition systems and reveals that perfect intelligent systems for sign language recognition are still an open problem. ... There are limited numbers of researchers who have used mean filters in sign language ...

  5. A survey on sign language literature

    2. Sign language recognition. Sign language recognition poses significant challenges due to the intricate hand gestures, body postures, and facial expressions involved, which often incorporate rapid and complex movements (Jiang et al., 2021).Hand gesture recognition, in particular, is a complex aspect of sign language recognition, characterized by high inter-class similarities, significant ...

  6. Recent progress in sign language recognition: a review

    Sign language is a predominant form of communication among a large group of society. The nature of sign languages is visual, making them distinct from spoken languages. Unfortunately, very few able people can understand sign language making communication with the hearing-impaired infeasible. Research in the field of sign language recognition (SLR) can help reduce the barrier between deaf and ...

  7. Sign Language Recognition Using the Electromyographic Signal: A

    This review demonstrates that sign language analysis and recognition, which recognizes signs using EMG signals, is a very recent and emerging area of research. Most of the studies reviewed use both sEMG and IMU data, while a relatively limited number of studies only use sEMG data for sign language gesture recognition.

  8. (PDF) Machine learning methods for sign language recognition: A

    The literature review presented in this paper shows the importance of incorporating intelligent solutions into the sign language recognition systems and reveals that perfect intelligent systems ...

  9. [2204.03328] A Comprehensive Review of Sign Language Recognition

    A machine can understand human activities, and the meaning of signs can help overcome the communication barriers between the inaudible and ordinary people. Sign Language Recognition (SLR) is a fascinating research area and a crucial task concerning computer vision and pattern recognition. Recently, SLR usage has increased in many applications, but the environment, background image resolution ...

  10. Sign language recognition using the fusion of image and hand landmarks

    Sign Language Recognition is a breakthrough for communication among deaf-mute society and has been a critical research topic for years. Although some of the previous studies have successfully ...

  11. Advancements in Sign Language Recognition: A Comprehensive Review and

    This review found that deep learning techniques, such as convolutional and recurrent neural networks, have shown high accuracy in sign language recognition, and their performance in recognizing the variety of signs has steadily improved over time. Additionally, integrating non-manual features has proven pivotal in enhancing recognition accuracy.

  12. Sign Language Recognition Using the Electromyographic Signal: A ...

    This prompted us to conduct a comprehensive study on the methods, approaches, and projects utilizing EMG sensors for sign language handshape recognition. In this paper, we provided an overview of the sign language recognition field through a literature review, with the objective of offering an in-depth review of the most significant techniques.

  13. Artificial Intelligence Technologies for Sign Language

    Previous literature reviews mainly concentrate on specific sign language technologies, such as video-based and sensor-based sign language recognition [3,4,5,6,7] and sign language translation [8,9].Lately, with the development of sign language applications, there are also reviews that presented sign language systems to facilitate hearing-impaired people in teaching and learning, as well as in ...

  14. Wearable Sensor-Based Sign Language Recognition: A Comprehensive Review

    Advancements in technology and machine learning techniques have led to the development of innovative approaches for gesture recognition. This literature review focuses on analyzing studies that use wearable sensor-based systems to classify sign language gestures. A review of 72 studies from 1991 to 2019 was performed to identify trends, best ...

  15. (PDF) Sign Language Recognition

    Sign Language Recognition (SLR) deals with recognizing hand gestures, from gesture acquisition until text or speech is generated for the corresponding hand gestures. ...

  16. Sign language : a systematic review on classification and recognition

    Sign language serves as an alternative communication mode for those with limitations in hearing and speaking abilities. In India, an estimated population of around 2.7 million people have hearing or speech impairments. Among this population, a significant majority of 98% use sign language as their primary mode of communication. Unfortunately, the limited availability of human interpreters is a ...

  17. A Comprehensive Review of Sign Language Recognition: Different Types, Modalities, and Datasets

    M. Madhiarasan and Partha Pratim Roy, Department of Computer Science and Engineering, Indian Institute of Technology Roorkee.

  18. Sign Language Recognition Using the Electromyographic Signal: A

    This prompted us to conduct a comprehensive study on the methods, approaches, and projects utilizing EMG sensors for sign language handshape recognition. In this paper, we provided an overview of the sign language recognition field through a literature review, with the objective of offering an in-depth review of the most significant techniques.

  19. Deep Learning for Sign Language Recognition: Current Techniques

    People with hearing impairments are found worldwide; therefore, the development of effective local level sign language recognition (SLR) tools is essential. We conducted a comprehensive review of automated sign language recognition based on machine/deep learning methods and techniques published between 2014 and 2021 and concluded that the current methods require conceptual classification to ...

  20. Systematic Literature Review: American Sign Language Translator

    Abstract. Sign Language Recognition (SLR) is a relatively popular research area, yet contrary to its popularity, the implementation of SLR on a daily basis is rare; this is due to the complexity and the various resources required. In this literature review, the authors have analyzed various techniques that can be used to implement an automated sign ...

  21. [2103.15910] Sign Language Production: A Review

    Sign Language is the dominant yet non-primary form of communication language used in the deaf and hearing-impaired community. To make an easy and mutual communication between the hearing-impaired and the hearing communities, building a robust system capable of translating the spoken language into sign language and vice versa is fundamental. To this end, sign language recognition and production ...

  22. PDF Sign Language Recognition Systems: A Decade Systematic Literature Review

    Despite the importance of sign language recognition systems, there is a lack of a Systematic Literature Review and a classification scheme for it. This is the first identifiable academic literature ...

  23. PDF Literature Review on Indian Sign Language Recognition System

    language recognition research are reviewed in this research. Data capture, preprocessing, segmentation, feature extraction, and classification are all stages of the methodologies that are examined. Each stage of the above-mentioned literature review is completed independently. Artificial neural networks, Support Vector Machines and Fuzzy ...

  24. A Review for Sign Language Recognition Techniques

    Recent literature on Indian sign language recognition using feature extraction has highlighted the importance of selecting appropriate features for accurately recognizing sign language gestures [2].