Human-Computer Interaction Based Only on Auditory and Visual Information

Hui Sha/Arvin Agah
Transactions on Control, Automation, and Systems Engineering, vol. 2, no. 4, pp. 285-297, 2000

Abstract: One of the research objectives in the area of multimedia human-computer interaction is the application of artificial intelligence and robotics technologies to the development of computer interfaces. This involves utilizing many forms of media, integrating speech input, natural language, graphics, hand pointing gestures, and other methods for interactive dialogues. Although current human-computer communication methods include computer keyboards, mice, and other traditional devices, the two basic ways by which people communicate with each other are voice and gesture. This paper reports on research focusing on the development of an intelligent multimedia interface system modeled on the manner in which people communicate. This work explores the interaction between humans and computers based only on the processing of speech (words uttered by the person) and the processing of images (hand pointing gestures). The purpose of the interface is to control a pan/tilt camera, pointing it to a location specified by the user through the utterance of words and the pointing of the hand. The system utilizes a second, stationary camera to capture images of the user's hand and a microphone to capture the user's words. Upon processing the images and sounds, the system responds by pointing the camera. Initially, the interface uses hand pointing to locate the general position to which the user is referring; it then uses voice commands provided by the user to fine-tune the location and, if requested, change the zoom of the camera. The image of the location is captured by the pan/tilt camera and sent to a color TV monitor to be displayed. This type of system has applications in tele-conferencing and other remote operations, where the system must respond to a user's command in a manner similar to how the user would communicate with another person.

Keywords: Human-computer interaction, intelligent interfaces, multimedia systems, speech processing, image processing
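
The two-stage control flow described in the abstract (a coarse location from hand pointing, followed by fine-tuning and zooming by voice) can be illustrated with a minimal sketch. This is not the authors' implementation: the command vocabulary, the step sizes, and the names `CameraState`, `coarse_from_pointing`, and `refine_with_voice` are all assumptions made for illustration, and the image and speech processing stages are taken as already having produced a pointing estimate and a recognized word.

```python
from dataclasses import dataclass, replace

# Hypothetical fine-tuning increments; the abstract gives no numeric values.
PAN_STEP_DEG = 2.0
TILT_STEP_DEG = 2.0
ZOOM_STEP = 1
MIN_ZOOM = 1


@dataclass(frozen=True)
class CameraState:
    """Pose of the pan/tilt camera (illustrative representation)."""
    pan: float = 0.0
    tilt: float = 0.0
    zoom: int = MIN_ZOOM


def coarse_from_pointing(pan_deg: float, tilt_deg: float) -> CameraState:
    """Stage 1: take the general position estimated from the hand-pointing image."""
    return CameraState(pan=pan_deg, tilt=tilt_deg)


def refine_with_voice(state: CameraState, command: str) -> CameraState:
    """Stage 2: adjust the coarse pose with one recognized voice command."""
    word = command.strip().lower()
    if word == "left":
        return replace(state, pan=state.pan - PAN_STEP_DEG)
    if word == "right":
        return replace(state, pan=state.pan + PAN_STEP_DEG)
    if word == "up":
        return replace(state, tilt=state.tilt + TILT_STEP_DEG)
    if word == "down":
        return replace(state, tilt=state.tilt - TILT_STEP_DEG)
    if word == "zoom in":
        return replace(state, zoom=state.zoom + ZOOM_STEP)
    if word == "zoom out":
        return replace(state, zoom=max(MIN_ZOOM, state.zoom - ZOOM_STEP))
    return state  # unrecognized commands leave the camera unchanged


# Example: pointing gives a rough direction, then two voice commands refine it.
state = coarse_from_pointing(30.0, -10.0)
for spoken in ("left", "zoom in"):
    state = refine_with_voice(state, spoken)
```

Keeping the pointing estimate and the voice refinement as separate steps mirrors the order given in the abstract: gesture first for the general position, speech afterwards for small corrections and zoom.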

