Content notes
-
Other
application/pdf
-
Abstract
Autonomous systems are developing rapidly around the world, and it is becoming important to solve the many problems this raises. In human-machine interaction, machines such as an autonomous car or a robot that works in a human living environment need to know a human's future motion in order to plan their own trajectories. To estimate the future motion of a human, an autonomous system would need to know all possible human behaviors. However, current technology is still far from capturing every movement behavior, which differs from person to person. Even so, research toward predicting such trajectories needs to be carried out while the underlying technologies continue to mature.
Some previous works used the Kinect RGB-D camera, whose depth sensor can provide the pose of the human body.
This research uses an RGB camera as an alternative we can rely on, since RGB-D cameras are currently not widely available in everyday devices. We recognize that pose estimation obtained from an RGB camera is not yet as precise as that from an RGB-D camera, but it is still reliable enough once the data are properly processed, and this limitation is an obstacle we need to get through. We propose a system that predicts human body motion using a regular digital RGB camera, including a smartphone camera or even a surveillance camera. We set the goal of predicting the motion 1 second ahead, and prepared 30 fps videos that include simple motions such as hand gestures and walking.
We used the OpenPose library from OpenCV to extract features of the human body pose consisting of 14 keypoints. Since OpenPose estimation is not always as precise as expected, we restricted the image area in which human pose estimation is performed by using YOLOv3, in order to minimize the OpenPose estimation error. The distance and direction of each keypoint, calculated from the features by comparing two consecutive frames, are input into a Recurrent Neural Network Long Short-Term Memory (RNN-LSTM) model and a Kalman Filter. For the evaluation, we compare the prediction with the ground truth, i.e., the position of the keypoint 1 second later in the video, and count the predictions whose distance to the ground truth is lower than 1.8% of the diagonal frame size; we call this the successful prediction percentage. As a result, the Kalman Filter reached 93% on average and the RNN-LSTM reached 75% on average on our dataset, while the Kalman Filter reached 77% on average and the RNN-LSTM reached 52% on average on the CMU dataset. In most cases the Kalman Filter showed better estimation accuracy than the RNN-LSTM, and with respect to the motion type, motions such as a hand gesture combined with moving to the right side were easier to predict than more complex motions such as a hand gesture combined with moving to the left side. From these results we confirmed the validity of the RGB-camera-based method for simple human motions, and we conclude that this is an important step toward realizing the prediction of more complex human motion.
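To make the pipeline above concrete, the following is a minimal sketch (not the thesis code) of two core steps: computing the per-keypoint distance and direction between two consecutive frames, and extrapolating a keypoint 30 frames (1 second at 30 fps) ahead with a constant-velocity Kalman Filter. The state layout, noise parameters, and helper names are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

def frame_features(prev_pts, curr_pts):
    """Distance and direction of each keypoint between two consecutive frames.

    prev_pts, curr_pts: (14, 2) arrays of (x, y) pixel coordinates, e.g. the
    14 keypoints returned by a pose estimator such as OpenPose.
    """
    delta = curr_pts - prev_pts                       # per-keypoint displacement
    dist = np.linalg.norm(delta, axis=1)              # Euclidean distance in pixels
    direction = np.arctan2(delta[:, 1], delta[:, 0])  # direction angle in radians
    return dist, direction

class ConstantVelocityKF:
    """2-D constant-velocity Kalman Filter for a single keypoint.

    State x = [px, py, vx, vy]; velocities are in pixels per frame.
    The process/measurement noise values are assumed, not from the thesis.
    """
    def __init__(self, q=1e-2, r=1.0):
        self.x = np.zeros(4)                  # state estimate
        self.P = np.eye(4) * 1e3              # state covariance (large initial uncertainty)
        self.F = np.eye(4)                    # state transition: position += velocity
        self.F[0, 2] = self.F[1, 3] = 1.0
        self.H = np.zeros((2, 4))             # we only observe the (x, y) position
        self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = np.eye(4) * q                # process noise
        self.R = np.eye(2) * r                # measurement noise

    def update(self, z):
        """Advance one frame, then correct the state with an observed position z = (x, y)."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(z, dtype=float) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P

    def predict_ahead(self, n_frames=30):
        """Extrapolate the keypoint position n_frames ahead (30 frames = 1 s at 30 fps)."""
        x = self.x.copy()
        for _ in range(n_frames):
            x = self.F @ x
        return x[:2]  # predicted (x, y) position
```

In this sketch, one filter would be kept per keypoint, updated with each observed OpenPose position, and queried 30 frames ahead to obtain the 1-second prediction.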
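The evaluation criterion described above could be computed as in the short sketch below; the function name and the example frame size are assumptions for illustration, while the 1.8%-of-diagonal threshold comes from the abstract.

```python
import numpy as np

def successful_prediction_percentage(pred, truth, frame_w, frame_h, ratio=0.018):
    """Percentage of predictions within 1.8% of the frame diagonal of the ground truth.

    pred, truth: (N, 2) arrays of predicted / ground-truth keypoint positions in pixels.
    """
    threshold = ratio * np.hypot(frame_w, frame_h)   # 1.8% of the diagonal frame size
    errors = np.linalg.norm(np.asarray(pred) - np.asarray(truth), axis=1)
    return 100.0 * float(np.mean(errors < threshold))

# With an assumed 1920x1080 frame, the diagonal is about 2203 px,
# so the success threshold is roughly 39.6 px.
```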
-
Other
Human Interface Laboratory, Division of Information Engineering, Graduate School of Engineering, Mie University
-
Other
42p