This document proposes an exercise recognition system using facial features extracted from a mobile device's camera. It aims to help motivate exercise by automatically measuring exercises without additional equipment. The system obtains facial images during exercise, extracts tracking points and distances as features, and uses SVM classification on the FFT of features to recognize 9 exercises with 88.2% accuracy. Experiments show the system is robust to changes in window size and user standing position, but face tracking is sometimes lost and floor exercises have lower accuracy.
Exercise Recognition System using Facial Image Information from a Mobile Device (LifeTech 2021)
1. Exercise Recognition System using
Facial Image Information from a Mobile Device
2021 IEEE 3rd Global Conference on Life Sciences and Technologies
March 10, 2021
Kaho Kato, Chengshuo Xia, Yuta Sugiura
Keio University
2. • Exercise has many benefits.
• Physical effects [1]
• prevents lifestyle diseases
• prevents the decline of physical functions
• Mental effects [2]
• maintains cognitive functions
• relieves stress and anxiety
• Keeping up exercise on one's own is difficult for some people.
• go to a gym and take lessons
• exercise with someone and encourage each other
⇒ measure exercises automatically with an information system
Exercise's Effects and Barriers
[1] S.R. Colberg, R.J. Sigal, J.E. Yardley, M.C.Riddell, D.W. Dunstan, et al, “Physical activity/exercise and diabetes: A position statement of the American Diabetes Association,” Diabetes Care, vol. 39, no. 11, pp. 2065-2079, 2016.
[2] B. Stubbs, A. Koyanagi, M. Hallgren, J.Firth, J.Richards, et al, “Physical activity and anxiety: A perspective from the World Health Survey,” J. Affect Disord., vol. 208, pp. 545-552, 2017.
3. • With a camera, the user does not have to attach any device to the body.
Exercise Measurement by Camera
[3] I. Ar and Y.S. Akgul, "A computerized recognition system for the home-based physiotherapy exercises using an RGBD camera," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 22, no. 6, pp. 1160-1171, 2014.
[4] R. Khurana, K. Ahuja, Z. Yu, J. Mankoff, C. Harrison, and M. Goel, "GymCam: Detecting, Recognizing and Tracking Simultaneous Exercises in Unconstrained Scenes," Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., vol. 2, no. 4, Article 185, 2018.
Multiple people's exercise recognition by an RGB camera [4]
Recognition of motion patterns, the user's pose, and the exercise object [3]
• requires a wide space
• requires installing a dedicated camera
4. • Some commercially available applications exist.
• easy to install
Exercise Support System by Mobile Device
[5] VAY Fitness Coach, VAY, https://www.vay-sports.com/index (Accessed on 07/03/2020).
[6] Personal Trainer, Kaia health, https://www.kaiahealth.com/ (Accessed on 07/06/2020).
Personal Trainer [6]
VAY Fitness Coach [5]
• requires a wide space
• requires preparing sports clothes
5. • Purpose
• realize an exercise recognition system that improves the user's exercise motivation
• Requirements
• use a mobile device that is familiar in daily life
• reduce the installation barrier
• no need to track the whole body
• needs only a table for the device and space for the user to exercise
• let the user wear whatever clothes they like
• recognize exercises as quickly as possible
• realize real-time feedback to the user
Our Purpose and Requirements
Exercises with our application
7. • Develop an exercise recognition system using the built-in camera of a mobile device
• obtain the user's facial image from the built-in camera on the mobile device
• extract facial features from the image and record their coordinate changes as time-series data
• use their frequency components to recognize the kind of exercise
• count how many times the user performs each exercise
Approach
The flow of the proposed system: obtain the facial image during exercise on a mobile device → extract features from the image → machine learning → recognize the kind of exercise
8. • An application for exercise measurement built with Unity (for the UI) and Python (for recognition)
• displays a pink marker when the user's face is within the camera view
• sends and receives the time-series data via HTTP communication
• recognizes the kind of exercise and counts repetitions for each exercise
• saves 7 days' exercise records
Exercise Measurement Application
(Figures: the measurement application view with the recognition label and repetition count, e.g. "Squat: 5"; the counting result of each exercise; and the 7 days' exercise record)
10. • To collect data, we use the exercise measurement application.
• obtain 60 s of data per exercise
• write the data into a CSV file
• The user makes sure their own face is within the camera view.
Getting Data for Learning
The state when getting data
Pipeline: getting data phase (obtain a camera image → extract features → write into a CSV file) → data preprocessing phase (divide data → remove trends & window function → FFT) → learning phase (SVM)
11. • We used the Single Face Tracker for Unity Plugin [7]
and extracted 60 tracking points
(= 30 points × 2 parameters (x, y)).
• calculate 2 distances as a z-axis parameter
• use a total of 62 features for classifier training
Facial Features Obtained from a Camera Image
62 facial features consisting of
tracking points (0~59) and distances (60, 61)
[7] Single Face Tracker Plugin, Unity Asset Store, https://assetstore.unity.com/packages/tools/integration/single-face-tracker-plugin-lite-version-30-face-tracking-points-90212 (Accessed on 12/02/2019).
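As a rough illustration of how the 62 features fit together, the per-frame feature vector can be assembled as follows (a minimal sketch; the function name and the way the distances are passed in are our own, not part of the plugin's API):

```python
import numpy as np

def frame_features(points, dist_a, dist_b):
    """Flatten 30 tracked facial points (x, y) into 60 values and
    append the 2 distances used as a z-axis proxy: 62 features."""
    points = np.asarray(points)
    assert points.shape == (30, 2)
    return np.concatenate([points.ravel(), [dist_a, dist_b]])

# Example: 30 placeholder points plus two placeholder distances
vec = frame_features(np.zeros((30, 2)), 1.2, 0.8)
print(vec.shape)  # (62,)
```

Collecting one such vector per frame at 30 fps yields the 62-channel time series that the later preprocessing stages consume.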
12. The Flow of Making the Classifier (Remove Trends & Window Function)
For each feature's original time series (coordinate value vs. elapsed time): divide the data every arbitrary number of frames → remove trends (constant fit) → apply a Hanning window
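The preprocessing steps above can be sketched as follows (a minimal illustration, assuming the constant fit simply subtracts the segment mean; the function and variable names are our own):

```python
import numpy as np

def preprocess_segment(segment):
    """Remove the trend (constant fit = subtract the mean) from one
    feature's time series, then apply a Hanning window."""
    detrended = segment - np.mean(segment)   # constant-fit detrend
    window = np.hanning(len(segment))        # tapers both ends to zero
    return detrended * window

# Example: a 100-frame segment of one facial coordinate with an offset
segment = np.sin(np.linspace(0, 4 * np.pi, 100)) + 5.0
processed = preprocess_segment(segment)
```

Removing the constant offset keeps the DC bin of the later FFT from dominating, and the Hanning window reduces spectral leakage at the segment boundaries.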
13. The Flow of Making the Classifier (FFT)
For each feature, after removing trends (constant fit) and applying a Hanning window: fill the lack of data by zero-padding from both sides, apply the FFT (sample size 128, sampling rate 30 fps), and extract the first-half components of the amplitude spectrum (0 to 14.8 Hz)
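This FFT step can be sketched under the stated parameters (128-sample FFT, 30 fps, symmetric zero-padding; the helper name is our own):

```python
import numpy as np

FS = 30       # sampling rate (fps)
N_FFT = 128   # FFT sample size

def fft_amplitudes(segment, n_fft=N_FFT):
    """Zero-pad the windowed segment from both sides to n_fft samples,
    apply the FFT, and keep the first-half amplitude components."""
    pad = n_fft - len(segment)
    padded = np.pad(segment, (pad // 2, pad - pad // 2))  # pad with zeros
    spectrum = np.abs(np.fft.fft(padded))
    return spectrum[: n_fft // 2]  # bins 0..63, i.e. 0 to ~14.8 Hz

amps = fft_amplitudes(np.random.randn(100))
# bin k corresponds to k * FS / N_FFT Hz; bin 63 is about 14.8 Hz
```

Keeping only the first half is lossless for a real-valued signal, whose spectrum is conjugate-symmetric about the Nyquist frequency (15 Hz here).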
14. The Flow of Making the Classifier (SVM)
Use the first-half components (0 to 14.8 Hz) of all features: Feature 1 [amplitude values], Feature 2 [amplitude values], ..., Feature 62 [amplitude values], plus the label, giving 3968-dimensional data (= 64 dimensions × 62 features); then apply standardization and train the SVM classifier
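The learning step can be sketched with scikit-learn (an assumption: the slides do not name a library or an SVM kernel, so the defaults here are illustrative only), using random placeholder data in place of the real feature vectors:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy stand-in for the real data: 16 segments x 9 exercises, each a
# 3968-dimensional vector (64 spectral components x 62 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(144, 64 * 62))
y = np.repeat(np.arange(9), 16)   # one label per exercise

# Standardize each dimension, then train the SVM classifier
clf = make_pipeline(StandardScaler(), SVC())
clf.fit(X, y)
```

Putting the scaler and the SVM in one pipeline ensures that the standardization statistics are learned from training data only and reapplied consistently at prediction time.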
15. Counting of Exercise Repetitions
• Our system counts exercise repetitions automatically.
• The threshold is adjusted according to the kind of exercise.
Counting of exercise repetitions by face tracking
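The threshold-based counting described above can be sketched as follows (a minimal illustration; the actual signal used and the per-exercise thresholds are not specified in the slides):

```python
import math

def count_repetitions(signal, threshold):
    """Count repetitions as upward crossings of an exercise-specific
    threshold applied to a tracked value (e.g. a facial y-coordinate)."""
    count = 0
    above = signal[0] > threshold
    for value in signal[1:]:
        if value > threshold and not above:
            count += 1                 # rising edge = one repetition
        above = value > threshold
    return count

# Five oscillations of a coordinate around a threshold of 0.0
trace = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
print(count_repetitions(trace, 0.0))  # prints 5
```

Counting only rising edges, rather than every threshold crossing, ensures one full up-down cycle is counted as a single repetition.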
16. Experiment 1-1: Evaluation of Classification Accuracy
• Evaluate the classification accuracy for nine exercises
• evaluation method: Leave-one-subject-out cross-validation (LOSO)
Experimental conditions:
• Participants: 8 (male: 3, female: 5)
• Frame rate: 30 fps
• Kinds of exercise: 9
• Number of features: 62
• Mobile device: a laptop computer
• Time of each exercise: about 60 s (1,800 frames)
• Frame size for dividing: 100 frames
• Number of data after dividing: 16 per exercise
9 exercises selected
・Standing exercises: squat, heel raise and lower, jogging, high knee raise, walking, no exercise (standing straight)
・Floor exercises: sit-ups, push-ups, back extension
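Leave-one-subject-out cross-validation, as used in this experiment, can be sketched with scikit-learn's `LeaveOneGroupOut` (an assumption: the slides do not name a library, and the data here is random placeholder data):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(72, 20))            # toy features
y = np.tile(np.arange(9), 8)             # 9 exercise labels
groups = np.repeat(np.arange(8), 9)      # 8 participants

# Hold out one participant per fold, train on the other seven
scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    clf = make_pipeline(StandardScaler(), SVC())
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))
print(f"mean LOSO accuracy: {np.mean(scores):.3f}")
```

Holding out a whole participant per fold measures generalization to unseen users, which is stricter than a random train/test split over segments.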
17. Experiment 1-1: Result and Discussion
• Result
• The average classification accuracy was 88.2%.
• The processing time was 0.0066 s.
• Discussion
• Face tracking was occasionally lost during the floor exercises.
• The system may not sustain a high frame rate because lighting is dimmer during the floor exercises.
Result of classification by LOSO (%)
18. Experiment 1-2: Evaluation of the Influence of Window Size
• Evaluate the accuracy when changing the dividing window size
• to investigate a suitable window size that allows the operation speed to be accelerated
• Result and Discussion
• The accuracy was over 90% and roughly stable above 70 frames.
⇒ The suitable window size may be close to the average period of the exercise (= 81 frames).
• The accuracy was over 80% above 45 frames.
⇒ A shorter window size may be usable at the cost of some accuracy.
Relation between window size and accuracy
19. Experiment 1-3: Evaluation of Feature Reduction
• Evaluate the accuracy when using only 4 features
• to accelerate the operation speed
• 4 features consisting of
• the 2 distances
• the average coordinate values (x, y)
• Result and Discussion
• The accuracy was 87.1%.
• The processing time became 1/15.
• The accuracy of "Back extension" decreased by 17.2%.
⇒ The operation speed can be improved, but the reduced features may not compensate for the partial loss of information.
Result of classification with 4 features (%)
20. Experiment 2: Evaluation of the Influence of Standing Position
• Evaluate the classification accuracy when changing the user's standing position
• so the user need not take care of the standing position during exercise
• Participants did 6 exercises at 10 positions.
• Training data: the front × 60 cm
• Test data: the other positions
Experimental conditions:
• Participants: 3 (male: 1, female: 2)
• Number of features: 4
• Kinds of exercise: 6 (the standing exercises)
• Distance from the camera: 4 kinds (60 cm, 90 cm, 120 cm, 150 cm)
• Position at each distance: 3 kinds (front, right, left)
(Figure: the 10 kinds of standing positions relative to the camera, with front, right, and left positions at each distance; some positions fall outside the camera's view)
21. Experiment 2: Result and Discussion
• Result
• The average classification accuracy was over 80.0%.
• Discussion
• At first, face tracking did not activate at almost all positions more than 120 cm away.
• Face tracking sometimes failed during the exercises at all positions at 150 cm.
⇒ ・The standing position has little influence on the accuracy.
・The system may be able to measure multiple people's exercises simultaneously.
・The face-tracking middleware needs to be changed.
Result of classification by changing the standing position (%)
22. Limitation and Future Work
• Limitations
• Face tracking is sometimes lost.
• Ambient light conditions and usable mobile devices are limited.
• The middleware cannot track multiple faces.
• The classification accuracy of the floor exercises is lower because of individual differences.
• Future Work
• introduce facial-part tracking
• implement a machine learning method that is usable on a smartphone
• use other middleware that can track multiple faces
• develop a method for estimating an exercise's pace and intensity
23. Conclusion
Background: Keeping up exercise is important for health but difficult for some people.
Related Work: Measuring exercises with cameras and mobile devices
Suggestion: An exercise recognition system using facial features from a camera built into a mobile device
Application: Exercise measurement application, exercise game
Implementation: Preprocessing (remove trends → window function → FFT); recognizing exercises with an SVM classifier
Evaluation: Evaluate the classification accuracy using 62 and 4 features; investigate the suitable window size and the influence of the standing position
Result: The accuracy for 9 exercises was 88.2% (62 features) and 87.1% (4 features); the suitable window size is close to the exercise's period; the system is robust to the user's standing position.
Limitation: Loss of face tracking, limited usable devices, no multiple-face tracking