A Proposal of a Non-Intrusive, Global Movement Analysis of Hemiparesis Treatment

Hemiparesis is the most disabling condition after a stroke. Hemiparetic individuals suffer from a loss of muscle strength on one side of the body, resulting in a decreased capacity to perform movements. To assess the quality of Physiotherapy treatment, rating scales are commonly used, but they have the shortcoming of being subjective. With the aim of developing a system that objectively measures how a hemiparetic individual is responding to a Physiotherapy treatment, this paper proposes a method to analyze human functional movement by means of an apparatus comprising multiple low-cost RGB-D cameras. After extrinsic calibration of the cameras, the system should be able to build a composite skeleton of the target patient and to globally analyze the patient's movement according to reachable workspace and specific energy. Both are to be obtained by tracking the hand movements of the patient and the movement volume produced. Here we present the concept of the proposed system, as well as an outline of its parts.


João Paulo Vieira, Diedre Carmo, Yan Jovita and Luciano Oliveira

I. INTRODUCTION
Hemiparesis is a loss of muscle strength on one side of the body, resulting in decreased capacity and ability to perform movements. Stroke is a leading cause of death and functional dependence in the world [1]. About 60% to 80% of persons who have had a stroke diagnosis tend to develop hemiparesis [2]. The direct consequence for a hemiparetic patient is a loss of functional independence and the gradual onset of a disability process, which demands Physiotherapy treatment as early as possible [3]. Accurate assessment of the disease and the determination of a precise functional diagnosis by a Physiotherapist allow correct monitoring of progression and rapid rehabilitation [4].
The main focus of the functional diagnosis is human movement, which has different levels of complexity. A movement is defined as simple when performed by one joint in a single anatomical plane (e.g., elbow flexion or extension); otherwise it is complex. Movement complexity increases with the use of two or three anatomical planes (e.g., shoulder circumduction) or more joints (e.g., putting the hand on the opposite elbow). Functional human movements, that is, those applied to the execution of a necessary human function, are always complex and, when their execution is task-oriented, they are defined as activities [5]. Examples of activities are walking, moving around, lying down, standing up and transferring in a sitting or lying position, among others. Hemiparetic patients have many activity limitations, which usually arise from hemiparesis alone or together with problems such as lack of coordination and spasticity. These limitations are mainly characterized by slower and uncoordinated reaching and grasping movements, excessive compensatory trunk movements, difficulties in walking, and reduced gross and fine manual dexterity [4]. A complete understanding and objective analysis of activities are necessary for the evaluation and characterization of the patient's condition.
When the Physiotherapist meets the patient to start the process of evaluation and treatment, the movement analysis is carried out either by considering each component (joint or plane) separately or by observing all aspects from an integrated perspective. Health professionals generally use clinical tests, such as rating scales, which can be less comprehensive and require subjective input [4]. Other tools are also used, such as force platforms [7], wearable devices [8], optoelectronic systems [6], [9], electrogoniometers [10], electromyography [11] and accelerometers [12]. These latter tools are more sophisticated, requiring more physical space, considerable time to prepare the patient and a special setup structure [13], which makes routine clinical use less practical. Table I shows a qualitative comparison between our method and others.

A. Proposal
The aforementioned instruments and clinical tests may not establish a true picture of the patient's functional situation [4], because they cannot accurately, objectively and quantitatively measure whether the performance of activities is improving or not. Hence the importance of developing new instruments and methods to assess functional movement in these patients. The recent availability of low-cost RGB-D cameras opens up interesting perspectives in this field [6]. These devices provide depth images, allowing for new means of evaluating human functional movements in a non-intrusive way. On that account, the aim of this paper is to introduce a proposal for developing a method of functional assessment of hemiparetic subjects using RGB-D cameras. The direct consequence of choosing these devices is the possibility of routine use, with practicality and better use of time, since less preparation of the patient is needed (i.e., no joint markers or wearable devices). Furthermore, these cameras are becoming cheaper, making this proposal quite feasible. Our purpose is then to build an integrated hardware and software system with four RGB-D cameras arranged in such a way that allows an entire reconstruction and tracking of a human skeleton (see Fig. 1). Inside the circle of cameras, the patient should perform actions requested by the physiotherapist (e.g., reaching). Figure 1 illustrates the physical setup of the system.
Rather than calculating simple components, like joint angles, our goal is to globally evaluate the entire functional movement volume in order to assess treatment quality. By doing so, a critical problem should be solved: instead of focusing on local actions, global movements need to be compared to measure the progress of the treatment applied to the patient. The calculation of the movement volume will provide the amount of reachable workspace regardless of the patient's body mass and, along with the computation of energy expenditure through specific energy, will form the kernel of our method. With that, the proposed method will offer a fast and precise way (available in daily clinical practice) of stochastically comparing the evolution of the patient's treatment using a sequence of normalized movement volumes.
To achieve all these goals, a composite skeleton must be computed from all the cameras, thus representing a 360 degree view of the patient. It is noteworthy that, although a single RGB-D camera could provide ways to detect 3D movements, a composition of four cameras is expected to yield a more reliable structure of the patient, accounting for different poses and views, and also avoiding the patient's self-occlusions. Since the goal is to have a structured environment to analyze patients, occlusion of the patient by man-made objects will be avoided. After building the composite skeleton, the patient's hand movement must be tracked, and a probabilistic graph structure defined to represent the movement. With that, it is possible to analyze the treatment improvement by comparing the movement volume and its specific energy in different periods of time. In this paper, we present a framework to accomplish these goals, and we discuss in general terms how to build the proposed system and the problems that may arise in its development. The framework addressed here is an evolution of the one found in [14].

B. Structure of the paper
The remainder of this paper is structured as follows: in Section II, related works are discussed; Section III presents the general idea of the proposed system and its parts; Section IV shows how to build a composite skeleton from multiple Kinect cameras; Section V presents the traditional methods to measure hemiparesis treatment performance and our proposal to innovate the way to do that; Section VI draws some conclusions and outlines future work.

II. RELATED WORK
One of the main problems in the evaluation and treatment of neurological conditions has been the lack of outcome measures that can be useful for clinical therapeutic efficacy studies, especially regarding human functional movement [10]. In clinical practice, this assessment commonly uses rating scales, which are cheap and easy to handle. Despite their widespread use and numerous validation studies, these scales can be less comprehensive and often require subjective input [7]. Instruments like the Motor Activity Log (MAL) [16], the Wolf Motor Function Test (WMFT) [17], the Functional Independence Measure (FIM) [18] and the Fugl-Meyer Scale (FMS) [19] have been validated for this use in Brazil.
In the assessment of functional movement by rating scales, the individual performs actions that are graded by an appraiser. The WMFT consists of 17 tasks, such as "pile up" and "catch and kick", that are graded on a scale ranging from zero (makes no attempt to move the upper limb) to five (the movement seems to be normal) [4]. The FIM instrument, one of the most used in Brazil for this purpose, comprises 18 items assessed against a seven-point ordinal scale. The Physiotherapist observes the performance of activities such as eating, dressing the upper body or writing, and asks the patient about, for example, sphincter control. The rating scale designates major gradations in behavior from dependence (grade 1), through moderate and minimal assistance (grades 3 and 4), to independence (grade 7). The scale classifies individuals by their ability to carry out an activity independently, versus their need for assistance from another person or a device. If aid is needed, the scale assesses the degree of that need, measured as a percentage from the appraiser's observation. The differences between the grades are, for example: 2 (the subject performs less than half of the effort, 25-49 percent), 3 (performs 50-75 percent of the action) and 4 (more than 75 percent) [18]. These percentages are measured by observation and, as much as the scale tries to be objective and quantitative, subjective aspects of the appraiser can influence the grading.
Scales such as MAL and FMS use similar structures, and none of them is able to establish an objective and measurable parameter that allows plotting a curve of the improvement in movement execution. On the other hand, quantitative devices such as force platforms, video analysis, optoelectronic systems, electrogoniometers, electromyography and accelerometers are not only costly, but also require specialized training to be handled. They detect and analyze angular displacement, ground reaction forces and motor control [12], and are generally applied to the measurement of single-plane human movement. Important motion analysis centers [7] generally use stereophotogrammetry with a marker-based system (MBS), like Vicon® [11] or Qualisys® [9]. Figure 2 shows human body markers and their respective skeleton produced by a Vicon® system for human movement analysis. Once the markers are positioned on the skin surface, the skeleton is obtained via either conventional photography or optoelectronic sensors. The system proposed in [6] uses eight to twelve cameras to reconstruct the human body skeleton, requiring a lot of space and an additional setup structure to accomplish this task.
In motion capture, joint angles and distances are measured with the aim of evaluating human movement. Alone, these variables do not solve the problem of quantifying and categorizing complex human functional movement, because they provide information only about the components of simple movements [10]. In fact, joints and angles themselves do not generally show the progress of a Physiotherapy treatment, since they are not able to show, for example, compensatory body movements during body-part evaluation. Another drawback was reported by Bonnechere et al. [6], who claimed that accuracy and reproducibility are the main problems of MBS, and that these evaluation systems are still controversial for the estimation of joint centers and relative segment orientations. In our work, the hand point in the skeleton will be tracked, and its speed and movement volume calculated; with that, joint positions and angles need not be found precisely.
Other studies exploit RGB-D cameras to assess human movement. Gabel et al. [20] and Auvinet et al. [21] analyzed human gait using RGB-D cameras. The first study used an RGB-D camera and a virtual skeleton as the input to a learned model, and found that the device yielded accurate and robust measurements of a rich set of gait features. The second work exploited three cameras and a mathematical process to solve symmetrical gait problems. To establish the validity and reliability of measures, in [6] the volunteers performed single-plane movements, which were recorded by an RGB-D camera along with an MBS. The MBS was taken as the reference, and discrepancies between the markerless system (MLS) and the MBS were evaluated by comparing the range of motion (ROM) measured by both systems. MLS reproducibility was found to be statistically similar to the MBS results for the exercises performed. The measured ROMs, however, were found to differ between the systems. In [7], the authors compared a Vicon® system with an RGB-D-based system, considering 19 individuals, to establish the accuracy of the latter in measuring clinically relevant movements.

III. OUTLINE OF THE PROPOSED SYSTEM
The aim of the system is to assess the quality of the Physiotherapy treatment using a skeleton composition built from four camera views. In short, besides building a skeleton from each camera (following the idea of [22]), after an extrinsic calibration, a 360 degree view of a unique composite skeleton will be calculated similarly to [23]. Figure 3 illustrates our proposed framework. Although calibrating the four cameras is feasible in principle, the main problem is the cross-talk effect when two cameras are pointing at each other; this problem is discussed in more detail in Section IV-A.
With the composite skeleton, the limbs of the patient will be tracked in order to spatially define the patient's movement. After that, the movement volume will be segmented in a semi-automatic fashion and normalized according to a reference coordinate axis. This latter step is of fundamental importance when comparing 2D or 3D shapes. Comparisons will be made by combining the ideas of reachable workspace and minimum energy cost, calculated according to the specific energy (see more details in Sections V-B and V-A).
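As a minimal sketch of such a normalization, assume (hypothetically) that the reference is a tracked joint such as the shoulder and that lengths are scaled by a body-segment length such as the arm length. The function name `normalize_points` and the choice of origin and scale are illustrative assumptions, not part of the proposed system; the intent is only to show how tracked points could be expressed in a patient-centered, size-independent frame before volumes are compared.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def normalize_points(points: Sequence[Point], origin: Point, scale: float) -> List[Point]:
    """Translate points so that `origin` becomes (0, 0, 0) and divide by `scale`.

    `origin` is a hypothetical reference joint (e.g., the shoulder) and
    `scale` a hypothetical reference length (e.g., the arm length in metres).
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return [tuple((c - o) / scale for c, o in zip(p, origin)) for p in points]


# Two hand positions (metres) relative to a shoulder at (1, 1, 1), arm length 0.5 m.
normalized = normalize_points([(1.0, 1.0, 1.0), (2.0, 1.0, 1.0)], (1.0, 1.0, 1.0), 0.5)
```

After this step, coordinates are expressed in units of the chosen reference length, so sessions recorded with the patient standing at slightly different positions become comparable.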
Comparing movement volumes alone is not conclusive for detecting patient recovery. A patient could perform a larger movement that does not represent an actual improvement in his/her ability to perform the activity. This is so because activities, as complex movements, are composed of a sum of simple movements. For instance, in a reaching movement, a human being uses the shoulder, elbow and wrist joints to naturally perform the action. In hemiparetic patients, the lack of coordination and muscle strength in these joints may cause problems in the execution of the activity. With treatment, the improvement in shoulder muscle strength can lead to an increase in the range of motion but, without proper coordination, the individual may still be unable to perform an accurate reach. The reachable workspace will then increase without a corresponding improvement in functional movement. Hence the need to use another variable, such as energy cost, to determine whether or not the activity performance has improved. The actual expectation is that functional movements become more effective, which is achieved through a better relationship between the reachable workspace and the lowest possible energy cost (see Section V-A for more details).
In our proposal, the system will establish this improvement through a combination of the amount of movement and the reduction of energy consumption within the same path or similar trajectories. The idea is to assess the treatment by analyzing the reduction in energy expenditure and the movement trajectory; the latter will be achieved by comparing movement volumes (see more details in Section V).

A. System Setup
The system comprises a desktop computer and four RGB-D cameras; as each camera demands its own USB controller, the computer has four controllers, three in the back and one in the front. The desktop has an Intel i7-4770 3.4 GHz processor, 16 GB of DDR3 RAM and an NVIDIA® GeForce GTX 780 graphics card. Figure 4 illustrates the system setup for our initial experiments.
Fig. 3: Framework of the proposed system. From a pool of RGB-D cameras, after acquisition of the patient's depth image, a single view skeleton is built after composite calibration of the cameras. With the composite skeleton, the patient's body parts are tracked in order to define the movement volume. Along with the specific energy calculation, it will be possible to analyze the movement objectively.
The work in [24] introduced a method based on complex exponential maps and twist motions; Brubaker et al. [25] tracked body joints using physics and mathematical models of body dynamics. Hou et al. [26] proposed a Gaussian Process Latent Variable Model with multiple cameras to successfully track complex movements from multiple views. Recently, [27] used the intrinsic symmetry of the human body for pose detection. None of these methods runs in real time. One interesting example of real-time body tracking without RGB-D sensors, or even cameras, is [28], which used radio reflections and time-of-flight calculations to track a user in 3D within the field of view of the system, even through walls.
Nowadays, with the recent development of the Kinect, the use of depth images has become one of the top trending topics in body joint tracking. These devices provide a relatively accurate depth map of the scene at low computational cost and with no need for markers on the body. Most of the computational work to build a 3D representation of the scene is done within the sensor firmware, and only a stream of data is sent to the connected computer [13]. This way, frames are easily acquired at 30 Hz with that sensor. From the depth map provided by the Kinect sensor, a single skeleton is obtained following the work proposed in [22], which uses random decision forests over millions of extracted features to estimate the positions of the joints of the human body in real time. Arai et al. [29] present a non-real-time way to compute the skeleton from depth maps.
One of the main problems of a single skeleton computed by only one Kinect is the occlusion of body parts when the limbs turn, or the poor detection of movements along the camera axis. These problems can become a challenge for those who want high accuracy [30]. In our proposed work, losing the hand position is critical for the analysis of patient movement; on that account, a composite skeleton from a 360 degree view of the tracked user is necessary. A method to calculate composite skeletons can be found in [23].
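To make the idea of a composite skeleton concrete, the sketch below fuses the estimates of one joint coming from several cameras by a confidence-weighted mean. This is not the method of [23]; it is a simple stand-in, and it assumes that every camera reports a position already expressed in a common reference frame together with a per-joint confidence in [0, 1] (a hypothetical value, e.g., derived from the tracking state of the sensor). A camera that has lost the hand contributes a confidence of zero and is ignored.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


def fuse_joint(observations: List[Tuple[Point, float]]) -> Optional[Point]:
    """Confidence-weighted mean of one joint seen by several cameras.

    Each observation is (position, confidence). Positions must already be in
    the same reference frame. Returns None when no camera reports the joint
    with positive confidence.
    """
    total = sum(w for _, w in observations if w > 0)
    if total == 0:
        return None
    return tuple(
        sum(p[i] * w for p, w in observations if w > 0) / total
        for i in range(3)
    )


# Hand seen by three cameras; the third one is occluded (confidence 0).
hand = fuse_joint([
    ((0.0, 0.0, 0.0), 1.0),
    ((2.0, 2.0, 2.0), 1.0),
    ((9.0, 9.0, 9.0), 0.0),
])
```

With four cameras around the patient, at least one view is expected to report the hand with non-zero confidence in most frames, which is the motivation for the 360 degree arrangement.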

A. Challenges in Composing a Unique Skeleton from Multiple Views
Some challenges in composing a unique 360 degree view skeleton come from using multiple Kinects. Due to the nature of Kinect active sensing, multiple infrared (IR) emitters may cause noise in the captured depth map. Berger et al. [31] measured a high error in an experiment with motion capture and body joint calculation using multiple Kinects. In practice, errors due to noise can be easily observed with multiple Kinects, even if the devices are not pointing at each other; this is caused by reflections of the infrared light from one Kinect being captured by the others [32]. Figure 6 shows an example of a skeleton misaligned over the body when multiple Kinects are used, caused not only by the cross-interference among the cameras, but also by the lack of calibration between the integrated RGB and IR cameras. To solve the interference problem, Schroder et al. [33] proposed blocking the IR emitters of all Kinects with a spinning wooden wheel with a hole; working as a time-domain modulation, the wheel allows only one Kinect to emit its infrared light at a time, consequently reducing cross-talk. Butler et al. [34] found what is arguably the best solution for this problem, named "Shake 'n' Sense". By simply vibrating each Kinect camera, they almost completely removed the problems caused by cross-interference.

B. Our Goals
We aim to use a 3D skeleton representation of a hemiparetic individual to track his/her hand movement by means of a 360 degree view. Although one Kinect can reconstruct 3D information from the scene, accurate 3D tracking may demand more information because of possible occlusion of limbs by the body. For that reason, our goal is to transform all coordinates into the coordinate system of one chosen Kinect for better accuracy and reliability. Figure 7 depicts a 360 degree view skeleton in a circle of four Kinects, placed at intervals of 90 degrees. Some works have also used multiple Kinects to reach the same goal. Asteriadis et al. [35] solved the occlusion problem by using multiple Kinects in the evaluation of human motion. Introducing a new method to solve the occlusion problem using a Fuzzy Inference System, Kaenchan et al. [23] used one camera as the reference and found a fuzzy homography of the others with respect to it; the goal was to analyze walking posture.
So far, it is clear that an extrinsic calibration of the four cameras is necessary to achieve our composite skeleton. Yang et al. [36] discussed the importance of calibrating multiple Kinects, presenting some methods to achieve that goal. Williamson and Laviola [40] and Kaenchan et al. [23] also propose other methods to extrinsically calibrate multiple Kinects by using one of them as the reference.
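Once an extrinsic calibration is available, mapping a joint from any camera into the frame of the reference camera reduces to a rigid transform, p_ref = R p + t. The sketch below illustrates only that final step; it does not estimate R and t, which would come from a calibration procedure. The camera layout (a camera 3 m in front of the reference one and facing it), the axis convention (y vertical, z pointing away from the sensor) and the function names are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]
Matrix = List[List[float]]


def yaw_rotation(degrees: float) -> Matrix:
    """Rotation matrix for a turn of `degrees` about the vertical (y) axis."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def to_reference(point: Point, rotation: Matrix, translation: Point) -> Point:
    """Map a point from a camera frame to the reference frame: R p + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Hypothetical layout: the camera opposite the reference one, 3 m away,
# rotated 180 degrees about the vertical axis so that it faces back.
hand_in_camera = (0.1, 1.0, 1.5)
hand_in_reference = to_reference(hand_in_camera, yaw_rotation(180.0), (0.0, 0.0, 3.0))
# A hand 1.5 m in front of the opposite camera lies 1.5 m in front of the
# reference camera, with its lateral coordinate mirrored: about (-0.1, 1.0, 1.5).
```

For cameras placed at intervals of 90 degrees, the same function applies with the corresponding rotation and translation for each camera.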

PERFORMANCE
Reaching for objects is a type of movement affected in patients with hemiparesis. To perform an activity such as picking up a glass of water or a piece of paper to dry the hands, humans can move along different paths. When leaving a starting position, one can follow any of the possible paths (e.g., 1, 2 or 3 in Fig. 8). The differences among the infinite possible paths are determined by the ability of each individual. Healthy persons almost always perform straight and easy trajectories, similar to path 1 in the figure, since the necessary coordination of muscle contractions and the motor control required to move are fully available. Figure 9 shows an example of a common reach activity performed by a person without muscle control problems.
Persons with functional motor control problems have a narrow range of choices regarding movement patterns. Hence they try to perform the activity within their means. Hemiparetic patients tend to follow the easiest path available (e.g., 2 or 3 in Fig. 8) within their range of possibilities, due to the injury in the central nervous system. In short, healthy persons choose the path with the best efficiency, always trying to waste as little energy as possible (path 1 in Fig. 8), while patients try to do the best within their possibilities. Figures 9 and 10 demonstrate these ideas with examples of reach, with a Physiotherapist representing a common healthy person in Fig. 9 and a hemiparetic patient in Fig. 10. In Figure 9, the path exemplifies a choice for smooth coordination between hand, wrist, elbow and shoulder movements, while the trunk is virtually motionless. The result is a straight motor trajectory. According to [47], hemiparetic patients use excessive trunk or shoulder girdle movement in reaching movements for targets placed close to the body, as illustrated in Fig. 10. This occurs because of a compensatory mechanism by which the central nervous system may extend the reach of the arm when the control of the active range of arm joints is limited and inter-joint coordination is poor. These patients have a lower efficiency in the movement, with greater participation of muscles and body segments, resulting in higher energy expenditure. In this case, the energy cost calculation could reveal the difference between paths 1, 2 and 3 (see Fig. 8).
Regarding activities, the functional diagnosis must consider the global movement. The simple analysis of joint ROM or muscle force may not be sufficient to establish the correct patient status. As previously stated, the use of scales takes into account the movement as a whole, and has been widely applied. However, the influence of subjective aspects impairs the establishment of the diagnosis.
Theories of motor learning establish that decreased energy waste may indicate improvements in the performance of a human activity [42]. This is supported by the assertion that the human body always attempts to minimize the energy cost of its movement [41]. For example, when athletes try to perform an action that demands rhythm and length, they always train on more suitable strategies to minimize energy consumption [41]. In the rehabilitation of hemiparetic patients, there is a quest for the recovery of movement, aiming to make it as effective as possible. The reduction of energy waste in motor learning for the activities of hemiparetic patients is the main target [42]. Based on that, our system must allow the path performed by the patient to be evaluated and the energy wasted to be calculated. This makes it possible to establish improvement or worsening during treatment. On that account, we suggest exploiting an association between specific energy cost and reachable workspace as the determinant parameter for a "better" or "worse" motor learning. This idea is supported by [44]. In a classic study on motor control, Hoff applied the concept of the "minimum jerk model" to reach trajectory planning. The concept is used to quantify the trade-off between trajectory, quickness and effort in reaching movements. The author suggests that the reach movement improves when it presents a smoother trajectory and minimized energy.
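For reference, the widely used closed form of a point-to-point minimum-jerk trajectory (starting and ending at rest) can be sketched as follows. This is the textbook fifth-order polynomial, given here only to illustrate what a "smooth" reach looks like along one coordinate; it is not necessarily the exact formulation used in [44], and the function name and parameters are illustrative.

```python
def minimum_jerk(x0: float, xf: float, duration: float, t: float) -> float:
    """Position at time `t` of a minimum-jerk move from `x0` to `xf`.

    Uses x(t) = x0 + (xf - x0) * (10 tau^3 - 15 tau^4 + 6 tau^5),
    with tau = t / duration clamped to [0, 1].
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    tau = min(max(t / duration, 0.0), 1.0)
    shape = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    return x0 + (xf - x0) * shape


# A 0.4 m reach lasting 1 s, sampled every 0.1 s.
profile = [minimum_jerk(0.0, 0.4, 1.0, k / 10) for k in range(11)]
```

A measured hand trajectory could be compared against such a profile as one possible indicator of smoothness, alongside the energy measure discussed in this section.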

A. Energy cost
The measurement of energy cost is generally performed by calculating the maximum oxygen consumption, or an estimate of it. As the name implies, this calculation estimates the energy consumption of the entire body. Another way is to estimate the consumption through the metabolic equivalent (MET) of the activity by means of general equations, although this is a general measure and does not provide an accurate value [46]. Neither the maximum oxygen consumption nor the MET estimate is able to determine the energy expenditure of specific movements. Here, we propose the use of the specific energy of each movement, defined as energy per unit of mass. To calculate this variable, the velocity of the movement should be obtained by the system. The specific energy formula derives from kinetic energy, which is calculated as a function of mass and velocity, given by
$$E_k = \int \left( \int v \, dm \right) dv,$$
where the inner integral sums over the mass element dm and the second over the velocity dv. For a solid body with constant mass, one can eliminate the mass dependence by estimating the specific energy, defined as
$$e = \frac{E_k}{m} = \int v \, dv = \frac{v^2}{2}.$$
To estimate the specific energy, our system needs to measure the velocity of movements performed in three dimensions, by calculating the distance covered by the tracked point over time within the movement volume. This is usually done by optical flow methods in order to calculate pixel speed in the image.
Fig. 8: Evaluating the minimization of movement energy.
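A minimal sketch of this computation is given below. It takes the kinetic energy per unit of mass, v^2 / 2, and estimates the speed v of the tracked point by finite differences between consecutive 3D samples. The function name, the input layout (positions in metres, timestamps in seconds) and the use of simple finite differences instead of optical flow are assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def specific_energy(positions: Sequence[Point], times: Sequence[float]) -> List[float]:
    """Specific kinetic energy (J/kg) for each interval between samples.

    Speed is the distance covered by the tracked point divided by the
    elapsed time; the specific energy is then 0.5 * speed**2.
    """
    if len(positions) != len(times):
        raise ValueError("positions and times must have the same length")
    energies = []
    for i in range(1, len(positions)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        speed = math.dist(positions[i - 1], positions[i]) / dt
        energies.append(0.5 * speed**2)
    return energies


# Hand moves 1 m in the first second and 2 m in the next one.
e = specific_energy([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)], [0.0, 1.0, 2.0])
# e == [0.5, 2.0]
```

Because the mass does not appear, the same routine applies to any patient; in practice the noisy joint positions delivered by a depth sensor would probably need smoothing before differentiation.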

B. Reachable workspace
The rationale of the reachable workspace is to make a graphical representation of the boundaries of the movement volume. The concept was first used to measure the reach of a robot arm, and its first application in humans took place in the evaluation of the activities of an airplane pilot [45].
In [10], the authors postulate that the 3D reachable workspace should be used to identify clinical changes in human functional movement. Searching for new ways of analyzing human functional movement, Kurillo et al. [43] propose a methodology to assess the 3D reachable workspace of human arms, measuring it with a low-cost stereo camera. That camera system was capable of capturing and reconstructing the 3D reachable workspace with robustness and minimal loss of data points. In [45], an RGB-D camera was used to analyze the 3D reachable workspace, in comparison with a motion capture system (MCS). The results showed that RGB-D cameras are as accurate and reliable as MCS. Here, we intend to evolve this concept to quantify movement volume. From the comparison between the "before" and "after" volumes, one can observe the evolution of the patient in the treatment. Along with the computation of specific energy, it will be possible to objectively and globally quantify the improvement in the treatment.
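One simple way to turn a set of tracked hand positions into a volume figure is voxel occupancy: space is divided into cubes of a fixed edge length and the cubes visited by the hand are counted. The sketch below follows that approach. It is a coarse stand-in for a proper reachable-workspace reconstruction, not the method of [43] or [45]; the function name and the voxel size are assumptions, and the result measures the space actually visited rather than the envelope enclosed by the trajectory.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def voxel_volume(points: Iterable[Point], voxel_size: float) -> float:
    """Volume (m^3) of the distinct voxels visited by the tracked points."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    # Each point is mapped to the integer index of the cube that contains it.
    occupied = {
        tuple(math.floor(c / voxel_size) for c in p)
        for p in points
    }
    return len(occupied) * voxel_size**3


# Hypothetical hand samples (metres) from two sessions, with 10 cm voxels.
before = voxel_volume([(0.01, 0.01, 0.01), (0.02, 0.02, 0.02)], 0.1)
after = voxel_volume([(0.01, 0.01, 0.01), (0.02, 0.02, 0.02), (0.15, 0.0, 0.0)], 0.1)
```

Comparing `after` with `before` gives a first indication of how the visited volume changed between sessions; the voxel size trades resolution against sensitivity to tracking noise.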

VI. CONCLUSION
A proposal of a human functional movement analysis by specific energy of movement volume was presented here. The aim is to apply a system comprising RGB-D cameras to hemiparetic individuals in order to assess the quality of Physiotherapy treatment. The rationale of using the specific energy of the movement volume over the patient's 3D reachable workspace is to avoid common errors in the calculation of joints and angles, which are found when using low-cost RGB-D cameras. By computing the specific energy, one can easily obtain an amount of energy without taking into consideration the body-part mass, which would be very expensive to compute using camera sensors. In the future, the goal is to build the apparatus to develop our method and quantify its performance. It is noteworthy that the comparison between the "before" and "after" events that will be analyzed by the system must be done stochastically, making the method more robust to inherent system noise.

Fig. 1: Physical setup of the proposed system: four RGB-D cameras to build a 360 degree view of a person.

Fig. 2: Illustration of the use of a Vicon® system.

Fig. 5: Front view skeleton on depth map generated by an RGB-D camera.

Fig. 6: Multiple views of a target body. Misaligned skeletons due to infrared interference and lack of intrinsic calibration.

Fig. 9: A sequence of reaching activity performed by a normal person.

Fig. 10: Sequence of reaching activity performed by a hypothetical patient.

TABLE I: Qualitative comparison of movement analysis methods (a '+' corresponds to the level of the feature).