Continuous Gestural Interaction with Mobile Devices
Introduction
Much of the interface work on wearable computers (more complex and fully equipped computers than PDAs) tends to focus on visual displays, often presented through head-mounted displays [1]. These are often heavy and hard to use in bright daylight, and they occupy the user’s visual attention [5]. Our aim here is to create a system that uses little of the user’s visual attention and to see how effective such a system can be.
Initial work has shown non-speech audio to be very effective in improving interaction on mobile devices [6, 7]. It allows users to keep their visual attention on navigating the world around them while information is presented to their ears. Our aim is to develop this further.
The user will wear a pair of lightweight open-backed headphones to hear the sounds, which will be spatialised in a plane around the user’s head; the open-backed design avoids obscuring the sounds of the real world. The user holds a PDA, which serves as the screen of the wearable if information must be displayed visually and is connected to the wearable via a cable or a wireless network connection. An accelerometer mounted on top of the PDA allows it to be used for pointing or gesturing. The user might also wear a tracker on a finger to allow further pointing or gesturing.
How would such a system work? Whilst walking, a user might point towards an audio source indicating a menu by tilting the PDA. The user might select the audio source of an MP3/WAV file and enter the options of that audio menu in audio space. Sonification methods such as the Doppler effect and changes in volume, pitch and timbre help users to know their position in audio space and to select targets.
Much work has gone into gesture recognition in static situations. For example, hand gestures are often used for control in virtual environments and in sign language recognition [8]; this is often done wearing an instrumented glove [12]. Recognition is also often done using video cameras [9]. Neither gloves nor camera-based systems are effective for the types of fully mobile applications in which we are interested. The image-processing approach has also had the disadvantage that much of the research effort has gone into the image processing, and not enough into how to model and recognise gestures, an area which is far from well understood. If we have a good modelling framework for gestures, it can be used with any sensing equipment. We therefore plan to use standard motion trackers from InterTrax and Polhemus, data gloves from Essential Reality and MEMS accelerometers from Memsic. These are not all usable in completely mobile settings, but we can track within a large enough space to allow users to move around freely (we will also use a GPS receiver with compass for calibration to help reduce sensor drift).
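As a rough illustration of the tilt-based pointing scenario described above, the following Python sketch shows one way a static accelerometer reading could be mapped to a direction in the audio plane and matched against spatialised sources, with a simple volume cue as feedback. It is not the project's implementation. The axis convention (readings in units of g, positive ax meaning the PDA is tilted to the right, positive ay meaning it is tilted forward), the source layout in SOURCES, the function names and the thresholds are all assumptions made for illustration.

```python
import math

# Hypothetical layout: audio sources placed at azimuths (degrees) in a
# horizontal plane around the user's head, 0 = straight ahead, 90 = right.
SOURCES = {"music": 0.0, "messages": 90.0, "calendar": 180.0, "settings": 270.0}


def tilt_to_pointer(ax, ay, az):
    """Convert a static accelerometer reading (in g) to (azimuth, tilt) in degrees.

    Assumed convention: positive ax = tilted right, positive ay = tilted
    forward, az close to 1 when the device lies flat.
    """
    azimuth = math.degrees(math.atan2(ax, ay)) % 360.0
    tilt = math.degrees(math.atan2(math.hypot(ax, ay), az))
    return azimuth, tilt


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def select_source(azimuth, tilt, sources, min_tilt=10.0, tolerance=30.0):
    """Return the name of the source being pointed at, or None.

    Tilts below min_tilt are treated as 'not pointing', so that small
    movements while walking do not trigger a selection.
    """
    if tilt < min_tilt:
        return None
    name, angle = min(sources.items(),
                      key=lambda item: angular_distance(azimuth, item[1]))
    return name if angular_distance(azimuth, angle) <= tolerance else None


def feedback_gain(azimuth, source_angle, tolerance=30.0):
    """Volume gain in [0, 1] that rises linearly as the pointer nears a source."""
    return max(0.0, 1.0 - angular_distance(azimuth, source_angle) / tolerance)


azimuth, tilt = tilt_to_pointer(0.3, 0.0, 0.95)
selected = select_source(azimuth, tilt, SOURCES)
```

Under these assumptions, the reading (0.3, 0.0, 0.95) points to the right and selects the hypothetical "messages" source, while a device lying flat, (0, 0, 1), selects nothing. A real system would also need to filter out the accelerations caused by walking, which this sketch ignores.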
We will investigate three basic types of gestures:
Head gestures:
[Figure: Examples of simple gestures for interacting with mobile phones by physically gesturing with the device.]
There are many approaches to advanced gesture recognition, such as artificial neural networks, principal components analysis (PCA), Hidden Markov Models (HMMs) [10] and prototype trajectories [11]. There are currently no good solutions to gesture recognition on the move, and this project will make a strong novel contribution in this area. The approach will be to view the gesture (see the figure above) not as an observed image to be decoded, but as the result of a running dynamic system. This approach has been used in the modelling of cursive handwriting [12] and seems more likely to lead to insight and sustained development of theory and algorithms than the pattern recognition approach. This is expected to be especially true in gestural interaction with mobiles, where we have to understand and ignore the effect on the measured gestures of disturbances that come from the user's movement through the environment.
The approach was inspired by Murray-Smith's previous work with helicopter aerodynamics [13, 14]. Learning the motion of an aircraft through space is closely related to characterising hand motion during a gesture: in both cases there is a trajectory through a state space including yaw, pitch and roll, with accelerations and velocities in the x, y and z axes.
This project will make strong use of the latest approaches to modelling complex non-linear systems. We plan to use recent developments in nonparametric statistical inference (Gaussian Process (GP) priors and Functional Data Analysis (FDA) [15]) to represent complex gestures. FDA is a general framework that is especially promising for performing inference based on functional information from a number of correlated occurrences, which is exactly the setting of the gesture-modelling problem. It has also already been used specifically for dynamic handwriting analysis.
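To make the idea of a GP prior over gesture trajectories more concrete, the following is a minimal sketch of GP regression on a single gesture channel (for example, pitch angle over time). It is an illustration under stated assumptions, not the models developed in this project: the squared-exponential covariance, the hyperparameter values, the zero-mean prior and the synthetic sinusoidal "gesture" are all chosen arbitrarily, and a small pure-Python linear solver is used only to keep the example self-contained.

```python
import math


def rbf(t1, t2, length=0.2, variance=1.0):
    """Squared-exponential covariance between two time points (assumed kernel)."""
    return variance * math.exp(-0.5 * ((t1 - t2) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def gp_predict(train_t, train_y, test_t, noise=1e-2):
    """Posterior mean of a zero-mean GP at test_t, given noisy samples.

    Computes k(test, train) @ (K + noise * I)^-1 @ train_y, where noise is
    the assumed observation noise variance.
    """
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(train_t)]
         for i, a in enumerate(train_t)]
    alpha = solve(K, train_y)
    return [sum(rbf(t, a) * w for a, w in zip(train_t, alpha)) for t in test_t]


# Synthetic stand-in for one recorded gesture: pitch angle sampled at 11 times.
times = [i / 10 for i in range(11)]
pitch = [math.sin(2 * math.pi * t) for t in times]

# Estimate the trajectory at times between and at the samples.
mean = gp_predict(times, pitch, [0.25, 0.5])
```

The posterior mean gives a smooth estimate of the trajectory at times that were not sampled. A fuller treatment would also compute the posterior variance and learn the hyperparameters from several recorded examples of the same gesture.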
Some related approaches based on mixtures of GPs, which have been developed in project GR/M76379 as models of paraplegic patients’ standing-up trajectories, will also be tested [16]. The adaptability provided by data-driven, nonparametric models has many advantages. We can
One reason for the lack of use of gesture recognition systems in the past is that they were not reliable enough. We believe that improved recognition software will help, but that a major breakthrough will be achieved by coupling this with improved feedback. If the user realises immediately and naturally that a gesture has been misunderstood, then repeating the gesture has a low cost. The key question is how to generate this natural feedback, given the problems of visual display discussed earlier. In our initial work, audio feedback on gestures drawn on the screen of a mobile device to control a music player was very effective when users were on the move [17]. Hermann’s [18] Principal Curve Sonification is similar in ethos, although based on static assumptions rather than the dynamic models used here. We will look at two novel approaches to the feedback issue:
Acknowledgments
This project is supported by SFI BRG project Continuous Gestural Interaction with Mobile devices, Science Foundation Ireland grant 00/PI.1/C067, the Multi-Agent Control Research Training Network (EC TMR grant HPRN-CT-1999-00107), and EPSRC grant Audioclouds: three-dimensional auditory and gestural interfaces for mobile and wearable computers GR/R98105/01.
References
[1] Barfield, W. and Caudell, T., Eds. Fundamentals of wearable computers and augmented reality. Lawrence Erlbaum Associates, Mahwah, New Jersey, 2001.
[2] Harrison, B.L., Fishkin, K.P., Gujar, A., Mochon, C. and Want, R. Squeeze me, hold me, tilt me! An exploration of manipulative user interfaces. In Proceedings of ACM CHI'98 (Los Angeles, CA), ACM Press Addison-Wesley, 1998, pp. 17-24.
[3] Hinckley, K., Pierce, J., Sinclair, M. and Horvitz, E. Sensing techniques for mobile interaction. In Proceedings of ACM UIST 2000, ACM Press, 2000, pp. 91-100.
[4] Hindus, D., Arons, B., Stifelman, L., Gaver, W., Mynatt, E. and Back, M. Designing auditory interactions for PDAs. In Proceedings of ACM UIST'95, ACM Press, 1995, pp. 143-146.
[5] Geelhoed, E., Falahee, M. and Latham, K. Safety and comfort of eyeglass displays. In Handheld and Ubiquitous Computing, Thomas, P. and Gellersen, H.W. (Eds.), Springer, Berlin, 2000, pp. 236-247.
[6] Pirhonen, A., Brewster, S.A. and Holguin, C. Gestural and Audio Metaphors as a Means of Control for Mobile Devices. Accepted for publication at ACM CHI 2002 (Minneapolis, MN), ACM Press, Addison Wesley, 2002.
[7] Sawhney, N. and Schmandt, C. Nomadic Radio: speech and audio interaction for contextual messaging in nomadic environments. ACM Transactions on Computer-Human Interaction 7, 3 (2000), 353-383.
[8] Braffort, A. A gesture recognition architecture for sign language. In Proceedings of ACM ASSETS'96 (Vancouver, Canada), ACM Press, 1996, pp. 102-109.
[9] Segen, J. and Kumar, S. Gesture VR: vision-based 3D hand interace for spatial interaction. In Proceedings of ACM Multimedia'98 (Bristol, UK), ACM Press, 1998, pp. 455-464.
[10] Bregler, C., Omohundro, S., Covell, M., Slaney, M., Ahmad, S., Forsyth, D. and Feldman, J. Probabilistic Models of Verbal and Body Gestures. In Computer Vision in Man-Machine Interfaces, Chipolla and Pentland, A. (Eds.), Cambridge University Press, Cambridge, 1996.
[11] Wilson, A. and Bobick, A. Using configuration states for the representation and recognition of gesture. MIT Media Lab, 1995, Technical Report 308.
[12] Singer, Y. and Tishby, N. Dynamical encoding of cursive handwriting. 1994.
[13] Murray-Smith, R. Modelling human control behaviour with a Markov-chain switched bank of control laws. In Proceedings of the IFAC Symposium on Man-Machine Systems (Kyoto, Japan), 1998.
[14] Murray-Smith, R., Johansen, T.A. and Murray-Smith, D.J. Modelling Human Control Behaviour and Cooperative Control Systems. Daimler-Benz Research, 1996, Daimler-Benz Technical Report.
[15] Ramsay, J.O. and Silverman, B.W. Functional Data Analysis. Springer, 1997.
[16] Shi, J.Q., Murray-Smith, R. and Titterington, D.M. Hierarchical Gaussian Process Mixtures for Regression. In Proceedings of 5th ICSA International Conference (Hong Kong), 2001.
[17] Pirhonen, A., Brewster, S.A. and Holguin, C. Gestural and Audio Metaphors as a Means of Control for Mobile Devices. Accepted for publication at ACM CHI 2002 (Minneapolis, MN), ACM Press, Addison Wesley, 2002.
[18] Hermann, T., Meinicke, P. and Ritter, H. Principal Curve Sonification. In Proceedings of ICAD 2000 (Atlanta, GA), ICAD, 2000.
[19] Williamson, J. and Murray-Smith, R. Audio feedback for gesture recognition. DCS Technical Report TR-2002-127, 2002.