Glove Detection System on Laboratory Members Using YOLOv4

− The use of gloves has become mandatory in laboratory work, with the aim of protecting workers from the spread or side effects of substances handled in the laboratory; however, some workers still violate the rules by not wearing gloves while in the laboratory room. This study aims to detect the use of gloves by laboratory workers. The method used in this research is You Only Look Once (YOLO) version 4. YOLOv4 can complete computer vision tasks, detecting objects quickly and in real time. Based on the experiments and testing conducted, the model obtained an Average IoU of 55.56%.


INTRODUCTION
The laboratory is a high-risk workplace that requires workers to wear safety attributes. Laboratory work must follow procedures according to general standards because there are many risks for workers in the laboratory, such as the spread of viruses and exposure to the effects of chemicals [1]. One of the body parts that must be protected is the hand. Hands are arguably the body part used most frequently by laboratory workers.
Laboratories have general safety standards governing the attributes used by laboratory workers [2]. Safety equipment, including gloves, can reduce workers' risk. Gloves have become mandatory because they protect hands from various hazards, such as exposure to chemicals or sharp objects, which could otherwise cause side effects [3]. The effects of chemical exposure can develop quickly. Negligence is still common among laboratory workers who violate the health protocols that have become rules in the laboratory [4].
Research related to glove detection has been studied before [5]. A safety glove detection system was built for workers who operate power grids, using a VGG-16 network-based method and a self-built dataset. Another study examined glove detection to reduce defects in glove production [6]. The method used in that research was YOLOv5, which produced the highest mAP of 0.9951.
Another study detected helmet and mask violations and identified license plates using the YOLOv4 method combined with Tesseract OCR [7]. Tesseract model training was used to improve license plate character identification. The COCO dataset, containing 600 images, was used to build a system that obtained an mAP of 93.38% and an F1-Score of 0.77. The system's shortcoming is the need to add geometric transformations to detect license plates using YOLOv4.
Related research built a ball detection system using You Only Look Once (YOLO). The dataset comprised 202 photos divided into 70% training data, 20% validation data, and 10% test data [8]. YOLOv4 performance was evaluated using a confusion matrix that calculates accuracy, recall, and precision, with tests conducted under different conditions. The tests were conducted in real time with varying levels of occlusion. The system could detect the ball when it was obstructed by other objects at 50%, 60%, and 70% occlusion, but not at 80%, 90%, or 100%. The system also could not detect the ball at distances above 200 cm, which leaves room for improvement.
Glove detection as part of the Personal Protective Equipment (PPE) attributes was also studied by Nurfirmansyah et al. [9]. Their detection system, using You Only Look Once (YOLO), was developed to reduce the negligence of building construction workers in wearing PPE. The system detects PPE objects in real time using a webcam or CCTV. The dataset used in the training stage consists of 500 images containing five types of PPE across ten classes. Real-time tests using a webcam and CCTV obtained an accuracy of 80%.
In this study, we built a system to detect gloves as one of the health protocols that must be followed during a pandemic. The detection system uses the You Only Look Once (YOLO) version 4 method to detect gloves on laboratory workers. This method was chosen to obtain optimal accuracy and speed, because YOLO is an algorithm that can detect objects in real time [10]. The results of this study are expected to help officers monitor whether laboratory workers comply with the glove protocol, and to increase workers' awareness of the importance of safety protocols when working in the laboratory [11].

The image is divided into a grid of small cells of a specific size. Each cell represents a part of the image to be analyzed. After the image is divided into grids, the object detection process begins. YOLOv4 performs object detection using a deep convolutional neural network (CNN) [12]. This network contains feature extraction layers and object detection layers. The first convolution layers in the YOLOv4 network are responsible for feature extraction from the image [13]; they extract the essential features that help in recognizing objects. After feature extraction, the object detection layers predict the location and class of each object. Each cell in the grid predicts multiple bounding boxes that may contain an object, and a class prediction and confidence score are produced for each bounding box. After location prediction and object classification, the non-maximum suppression (NMS) method eliminates overlapping bounding boxes and retains the bounding box with the highest confidence score. After the detection stage, the detected objects are classified based on their class [14].
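The greedy NMS step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the boxes and scores are made-up values.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop any remaining box that overlaps it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Illustrative predictions: the second box heavily overlaps the first and is suppressed.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # -> [0, 2]
```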
YOLOv4 performs object detection on the validation data and calculates evaluation metrics to measure model performance. The validation process calculates the loss of the model by comparing the model predictions with the actual annotations on the validation images. The validation stage calculates the CIoU loss to measure how well the model detects objects and identifies the correct bounding box for each object. The CIoU loss can be formulated as follows:

\mathcal{L}_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v \quad (1)

where $b$ and $b^{gt}$ denote the center points of the predicted and ground truth boxes, respectively, $\rho^2$ is the squared Euclidean distance between them, $c$ is the diagonal length of the smallest enclosing box that covers both boxes, $\alpha$ is a positive trade-off parameter, and $v$ measures the consistency of the aspect ratio [15]. The validation stage also measures performance using the Intersection over Union (IoU) metric. IoU measures the extent to which the predicted bounding box overlaps with the ground truth bounding box. If the IoU between the prediction and the ground truth exceeds a threshold of 0.5, the bounding box is considered a positive detection; below the threshold, it is considered a negative detection, or the cell contains no object. This stage can evaluate whether the model is adequate or whether further tuning of the model architecture and hyperparameters is required.
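The CIoU terms defined above can be computed directly from two boxes. The sketch below is illustrative only, using the (x1, y1, x2, y2) box format; it is not the training code used in this study.

```python
import math

def ciou_loss(pred, gt):
    """CIoU loss = 1 - IoU + rho^2/c^2 + alpha*v for boxes (x1, y1, x2, y2)."""
    # IoU term
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter)

    # rho^2: squared Euclidean distance between the box center points
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gcx, gcy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (pcx - gcx) ** 2 + (pcy - gcy) ** 2

    # c^2: squared diagonal of the smallest box enclosing both boxes
    cx1, cy1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    cx2, cy2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2

    # v measures aspect-ratio consistency; alpha is the trade-off weight
    wp, hp = pred[2] - pred[0], pred[3] - pred[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0

    return 1 - iou + rho2 / c2 + alpha * v

print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # identical boxes -> 0.0
```

A perfect prediction gives a loss of 0, while disjoint boxes are still penalized by the center-distance term even though their IoU is 0, which is what makes CIoU useful as a regression loss.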
Testing is done to determine the detection results on the test data images. The testing process loads the YOLOv4 model that has been trained. The test displays the images detected by the system, with a predicted bounding box for each detected glove. The performance of glove detection testing is measured by calculating the IoU.

Evaluation
The evaluation in this study measures the performance of the method using several performance parameters, which serve as the basis for analyzing the method's performance.

a) Average IoU. This performance measurement compares the image's ground truth with the model's predicted bounding box. The further the predicted bounding box is from the ground truth, the smaller the IoU value [16], [17]. The IoU equation can be seen in Equation 2.

IoU = \frac{\text{Area of Overlap}}{\text{Area of Union}} \quad (2)
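Equation 2 can be demonstrated with a small worked example; the coordinates below are illustrative, not taken from the dataset.

```python
def iou(box_a, box_b):
    """IoU = area of overlap / area of union (Equation 2); boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    overlap = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return overlap / (area_a + area_b - overlap)

# Two 100x100 boxes shifted by 50 px: overlap 5000, union 15000 -> IoU = 1/3
print(iou((0, 0, 100, 100), (50, 0, 150, 100)))
```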
b) mAP. Average Precision (AP) is calculated as the weighted mean of precisions at each threshold, where the weight is the increase in recall from the prior threshold. Mean Average Precision (mAP) is the average of the AP of each class [17]. The mAP equation can be seen in Equation 3.

mAP = \frac{1}{n} \sum_{i=1}^{n} AP_i \quad (3)

There are a total of n object classes in the dataset, so the variable i goes from 1 to n, covering each object class individually. In this context, i represents the index of the object class being evaluated.
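Equation 3 is a plain mean over the per-class AP values, as sketched below. The class names and AP values are illustrative only, not results from this study.

```python
def mean_average_precision(ap_per_class):
    """mAP = (1/n) * sum of the AP of each class (Equation 3)."""
    n = len(ap_per_class)
    return sum(ap_per_class) / n

# Hypothetical per-class APs for a two-class detector
aps = {"glove": 0.80, "no_glove": 0.72}
print(mean_average_precision(list(aps.values())))  # -> 0.76
```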

You Only Look Once (YOLO) Version 4
YOLOv4 (You Only Look Once version 4) is an object detection model designed to identify objects in images or videos quickly and accurately [18]. It further develops the one-stage object detection approach, prioritizing speed and efficiency. YOLOv4 includes several improvements and innovations compared to the previous version, YOLOv3, including advanced techniques for model optimization, accuracy improvement, and noise reduction. YOLOv4 achieves 10% higher AP and 12% higher FPS than the previous version [15]. YOLOv4 uses the CSPDarknet53 CNN backbone architecture. The working system of YOLOv4 is to perform convolution followed by max pooling to reduce the image dimensions. A confidence value and class label are then assigned to each bounding box. The last process is Non-Max Suppression, which eliminates all bounding boxes with low confidence values and leaves the one with the highest confidence value [17]. The architecture can be seen in Figure 2. YOLOv4 mainly consists of three parts: Backbone, Neck, and Head. The Backbone and Neck perform feature extraction and aggregation, while the Head performs detection and prediction. In the YOLOv4 algorithm, the modules used are CSPDarknet53 (Backbone), PANet (Neck), and the YOLOv3 head (Head) [15]. Darknet is a powerful and efficient convolutional neural network architecture; in YOLOv4, the Darknet backbone is enhanced with residual blocks, which allows the model to learn more robust feature representations. YOLOv4 also incorporates collections of training and architectural techniques, known as "Bag of Freebies" and "Bag of Specials", which reduce detection errors and improve model reliability without a large inference cost.
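The dimension reduction by max pooling mentioned above can be illustrated on a tiny feature map. This toy sketch (2x2 window, stride 2, pure Python, no framework) only demonstrates the arithmetic; real YOLOv4 layers operate on multi-channel tensors.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2: each output cell is the max of a 2x2 block,
    so a HxW map becomes (H/2)x(W/2)."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[y][x], feature_map[y][x + 1],
             feature_map[y + 1][x], feature_map[y + 1][x + 1])
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 3, 2],
        [2, 6, 0, 1]]
print(max_pool_2x2(fmap))  # 4x4 -> 2x2: [[4, 5], [6, 3]]
```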

RESULTS AND DISCUSSION
In this section, data training was conducted using YOLOv4. The training process was carried out three times, and the best training result was used for testing with the test data.

Dataset
The dataset used in this study is taken from Kaggle's "Medical Personal Protective Equipment Dataset" [19]. The glove dataset consists of training, validation, and testing data in the form of images in jpg format. Table 1 shows the division of dataset images used in the system development, and Figure 3 shows an example of a glove image from the dataset. Experiments were conducted by training three times with different hyperparameters. Batch size is one of the most important hyperparameters, because it determines the number of training samples used in one batch per iteration [20]. The experiments were conducted to find the hyperparameters that give the best model results. Table 2 shows the results of the experiments with different hyperparameters. The hyperparameters used in the first training were batch size 32, maximum batches 6000, and a random value of 0, with the images resized to 224x224. The results of the first experiment were suboptimal because the hyperparameter configuration was not suited to obtaining maximum results from the training. The training process takes little time with this configuration, but it does not produce good results. In addition, the random value also affects the results and duration of training: with a random value of 0, the training process is fast, but this greatly affects the results obtained, so the model from the first scenario does not perform well.
The second experiment was conducted with a different hyperparameter configuration: a batch size of 64 and max batches of 4000. The image size was also adjusted, resizing the images to 416x416. With this hyperparameter configuration, the performance obtained from the training process in the second experiment increased in terms of IoU. The third experiment used a batch size of 64, max batches of 6000, and a random value of 1. Figure 4 shows a decrease in training loss at each iteration, indicating good results. This third experiment was also conducted with the dataset resized to 416x416. With this hyperparameter configuration and image size, the training results showed a further performance improvement. The best performance was obtained from the training process at the 2000th iteration. Although the training duration was much longer, the resulting model performed better.
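In Darknet-based YOLOv4 training, hyperparameters like these are normally set in the network `.cfg` file. A fragment matching the third experiment might look like the following; the values named in the text (batch, max_batches, random, 416x416) are taken from the experiments above, while the other fields are illustrative defaults, not the configuration actually used in this study.

```ini
[net]
batch=64            # batch size from the third experiment
subdivisions=16     # illustrative; not stated in the text
width=416           # input images resized to 416x416
height=416
max_batches=6000    # maximum training iterations

[yolo]
classes=1           # single glove class, as described in this study
random=1            # enable multi-scale training (random value of 1)
```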
There are differences in the results of the training processes performed. The hyperparameters and image size used in training determine the performance obtained. Table 3 shows the difference between each configuration. The improvement of the training results can be seen from the first experiment to the last. The best training result comes from the third experiment, using a batch size of 64 and a random value of 1, with an mAP of 76.54% and an Average IoU of 58.25%.

Testing
The system trained using the YOLOv4 method was tested to find out how well it performs when detecting gloves. Tests were conducted on 64 images from the test data. Samples of the test results can be seen in Figure 5 and Figure 6. Figure 6 shows the results of incorrect glove detection. There are detection defects where the bounding box coordinates do not match the intended glove. Figure 6(a) shows a mismatch in the detection result caused by a blurred image; the system is unable to detect the gloves correctly. Figure 6(b) shows an undetected object: the gloves have a similar colour to their surroundings, so the system cannot detect the object as a glove. Figure 6(c) shows that undetected gloves are also caused by the object being truncated, so it is not fully visible as a glove.
In the results of the image tests carried out, some gloves were detected correctly and some incorrectly. There are three conditions of incorrect glove detection: when the image is blurred, when the object has a similar colour to its surroundings, and when the glove is truncated. There are also detection defects with bounding boxes that deviate from the gloves. The system test results obtained an average IoU of 55.56% over the whole test.

CONCLUSION
Glove detection using the YOLOv4 method effectively supports safety and quality in laboratory environments. The system can help prevent contamination risks and maintain high hygiene standards through fast and accurate glove detection. Using YOLOv4, the system obtained an average IoU of 55.56% on the test. The glove detection system using the YOLOv4 method has the potential to be developed further: model updates and larger datasets can optimise and improve the method, and performance can also be improved by training with more iterations and a larger image size.

Figure 1. Glove Detection System Design

In this stage, training is performed on the training data using YOLOv4. The YOLOv4 model recognizes objects based on the existing class, namely the glove class. YOLOv4 performs image recognition with the number of images determined by the batch size at each iteration. After training, a weight file containing the updated network weights is obtained. The data used for training is divided into training data and validation data.

Figure 3. Sample of Glove Image Dataset

Figure 4. Graph of Training Process

Figure 5. Sample of Correct Glove Detection Test

Figure 5 shows examples of correct glove detection. With the trained model, the glove detection process can be seen from the predicted bounding boxes on the gloves.

Figure 6. Sample of Incorrect Glove Detection Test

Table 2. Result of Training