Tom Sephton, updated 14 November 1998
An Information Technology Evolution
Computer technology has evolved from room-sized mainframes to personal desktop systems to portable laptops. Recent improvements in the miniaturization of powerful processors, storage devices, displays, and wireless communications make the next logical step possible. This step will move personal computing off the desktop and integrate it into everyday life. Computers will soon be able to provide, transmit, and record relevant information while the user is walking down the street, driving a car, riding a subway, changing a diaper, or mowing a lawn.
The Next Generation Computer System
A computer for living should be wearable like a comfortable shirt. The interface should be as natural as talking with an old friend. The display should be as unobtrusive and convenient as a stylish pair of sunglasses, yet provide anything from the latest stock quotes to a private showing of a favorite movie. Lightweight Walkman style headphones can whisper a reminder about the next meeting or play music on request.
The information is displayed and heard as an augmentation of the real world. A wiring diagram is overlaid on the damaged device as the user looks at it, or a translation is whispered in the ear while the user converses with a foreign business associate. These are idealized descriptions of an information companion for the near future. The core technologies to achieve this vision are here now.
The Device: an Augmented Reality Multimedia Computer System
Our project group will develop the basic technology, user interface, and sample content for a wearable augmented reality computer system. This time-, history-, and location-aware system will deliver audiovisual information that relates to the immediate location, movement, and requests of the user. This multimedia content will include voice, sound, music, pictures, text, animation, and video. The device will combine these existing technologies:
| 1. | Information Storage | Hard Disk and DVD-ROM drive |
| 2. | Information Processing | Wearable Computer |
| 3. | Location Detector | Differential GPS satellite receiver |
| 4. | Visual Display | Heads-up Display glasses |
| 5. | Sound Playback | Headphones, 3D sound card |
| 6. | User Input | Headmic, Voice Recognition |
| 7. | Communication | Wireless LAN or IP link |
What is this Thing Good for Anyway?
The potential applications for this device may number in the thousands. Wearable computers without the location awareness or augmented reality interface that we are proposing have been used to show wiring diagrams and repair instructions to civilian and military technicians. The personal secretary/teleconferencing/internet communications application will be widely useful. A Napa grape grower has expressed the need for such a system to map problems in his vineyard while making the rounds on his tractor.
The device can provide background information, realistic sound environments, reconstructions, and video or animated reenactments to tourists at historical or archeological sites. The wearable device becomes an individualized virtual tour guide. Visitors to a site would rent the device for a few hours and take a self-guided tour. At each important location within the site, the tourist hears a description of the historical significance or a relevant story. Visual icons indicate the availability of visual information, maps, and stories. On voice request, relevant images, video clips, or 3D animations are displayed. Why not see a time-lapse video of the growth cycle of a tree or flower while on a nature walk in the woods?
Entertainment is also an obvious choice. Walk down Park Avenue and watch a huge dinosaur stomp down the street in front of you as you hear the earth rumble and people behind you scream. Take aim with a laser gun visible only to you and shoot down the flying saucer about to disintegrate San Francisco.
Requirements for the Next Generation User Interface
This new wearable information system is fundamentally different from existing multimedia systems. Unlike most systems, it is specific to time, user history, and location. It will require a new approach to both user interface design and the structuring of information delivery.
While each application will have its own interface needs, a general consistency in interface design will help the user quickly become comfortable with a new application. With some applications, multimedia content must be delivered to the user based on his or her position and path of travel. Delivery of information must be flexible and responsive to the user's needs and interests.
For both safety and aesthetic reasons, multimedia content should not intrude on the user's connection to the real environment until the user decides it is safe and appropriate. A video news flash while driving on the freeway would be dangerous, as would a ubiquitous menu bar. This requires an interface that alerts the user to the availability of information at any particular time or location, but leaves it to the user to request delivery. The system should be able to direct the user to other relevant information and points of interest. The system will also have to keep track of the position of other users and relay communications over the wireless LAN or IP link for multi-user applications. The communications link is essential to keeping the user connected with co-workers, other users, and anyone on the global Internet.
Information Structure in a Real World Computer System
The current model for structuring and accessing information is based conceptually on the office file cabinet. This works well in an office oriented computer system, but will be awkward in real world applications. Location based applications such as a virtual tour guide would work better with a two or three dimensional location based information structure. Voice, text, video clips, etc. relevant to a particular place would be grouped by location, offered when the user nears that location, and delivered when the user requests the information. Time based applications like an appointment book/calendar might better structure information on a timeline. Files and directories can still exist electronically, but the representation to the user would be designed to fit each application.
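The location-based grouping described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the project's actual data model: the `ContentItem` record, its `radius_m` field, and the function names `distance_m` and `items_on_offer` are all hypothetical. Each item is anchored to a latitude and longitude, and an item is "offered" when the user's GPS position falls within its radius.

```cpp
#include <cmath>
#include <string>
#include <vector>

// Hypothetical record: one piece of content anchored to a place.
struct ContentItem {
    std::string title;  // label shown or spoken when the item is offered
    double lat_deg;     // anchor latitude, degrees
    double lon_deg;     // anchor longitude, degrees
    double radius_m;    // offer the item when the user is within this range
};

// Great-circle distance in meters between two points (haversine formula).
double distance_m(double lat1_deg, double lon1_deg,
                  double lat2_deg, double lon2_deg) {
    const double kEarthRadiusM = 6371000.0;  // mean Earth radius
    const double kDegToRad = 3.14159265358979323846 / 180.0;
    const double phi1 = lat1_deg * kDegToRad;
    const double phi2 = lat2_deg * kDegToRad;
    const double dphi = (lat2_deg - lat1_deg) * kDegToRad;
    const double dlam = (lon2_deg - lon1_deg) * kDegToRad;
    const double a = std::sin(dphi / 2) * std::sin(dphi / 2) +
                     std::cos(phi1) * std::cos(phi2) *
                     std::sin(dlam / 2) * std::sin(dlam / 2);
    return 2.0 * kEarthRadiusM * std::atan2(std::sqrt(a), std::sqrt(1.0 - a));
}

// Return the items whose offer radius contains the user's current position.
std::vector<ContentItem> items_on_offer(const std::vector<ContentItem>& all,
                                        double user_lat_deg,
                                        double user_lon_deg) {
    std::vector<ContentItem> nearby;
    for (const ContentItem& item : all) {
        if (distance_m(user_lat_deg, user_lon_deg,
                       item.lat_deg, item.lon_deg) <= item.radius_m) {
            nearby.push_back(item);
        }
    }
    return nearby;
}
```

A linear scan like this is adequate for a single site with a few hundred items. A larger data set would probably call for a spatial index such as a grid or quadtree, which this sketch does not attempt.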
A Conversational Interface
Because most users will initially be unfamiliar with the system, we must create an interface that is highly intuitive. This is not a desktop system, so the familiar desktop metaphor is irrelevant and cumbersome. Typing, pointing, and clicking are comfortable when sitting at a desk, but not when carrying a sack of groceries or soldering a wiring harness in an airplane.
The most natural and intuitive model for communication is human conversation. We propose to create an interface that responds primarily to voice input from the user and prompts primarily by verbal suggestions tied where appropriate to visual cues. While natural language is an active area of research, it is not essential to build full natural language capability into a first generation system. We will develop a simple menu of voice commands that the system can quickly train to recognize with each new user.
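A fixed menu of voice commands can be as simple as a lookup table from recognized phrases to actions. The sketch below assumes that some speech recognizer has already turned the user's utterance into text; the `Command` values and phrases are hypothetical examples, since the actual command set is not specified here.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical command set, for illustration only.
enum class Command { Unknown, ShowMore, Stop, Repeat, WhereAmI };

// Lower-case a phrase so that matching ignores capitalization.
std::string normalize(std::string phrase) {
    std::transform(phrase.begin(), phrase.end(), phrase.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return phrase;
}

// Look up the text returned by a speech recognizer in a fixed command menu.
// Anything outside the menu maps to Command::Unknown.
Command parse_command(const std::string& recognized_text) {
    static const std::map<std::string, Command> kMenu = {
        {"show more", Command::ShowMore},
        {"stop", Command::Stop},
        {"repeat", Command::Repeat},
        {"where am i", Command::WhereAmI},
    };
    const auto it = kMenu.find(normalize(recognized_text));
    return it == kMenu.end() ? Command::Unknown : it->second;
}
```

Keeping the menu small and closed is what makes this approach workable without full natural language understanding: an unrecognized phrase simply yields `Command::Unknown`, and the character can prompt the user to try again.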
The voice and face of the system will be represented by a character appropriate to each application. The character can be anything from a cartoon rabbit to a human teacher, depending on the application. Creating an engaging and lifelike character as an interface to the system helps break down the resistance to interacting with a machine. It also facilitates the use of humor and emotion to avoid boredom and enhance understanding.
Because the system is location and motion aware, we will also use directional cues, both visual and aural, to help orient the user. The character can beckon the user to turn in some direction or guide the user to a particular place. A well designed script of prompts and responses can take the place of pull down menus. 3D sound cues can draw the user's attention outside the visual field when needed. Where visual choice icons are appropriate, the voice interface can activate the choices and the character will confirm the selection.
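To beckon the user toward a place, the system needs the compass bearing from the user to the target and the signed angle between that bearing and the direction the user is facing. The following is a minimal sketch using the standard great-circle bearing formula. The function names `bearing_deg` and `turn_deg` are hypothetical, and the heading is assumed to come from an orientation sensor as degrees clockwise from north.

```cpp
#include <cmath>

// Initial compass bearing (degrees clockwise from north, in [0, 360)) from
// the user's position to a target, using the great-circle bearing formula.
double bearing_deg(double lat1_deg, double lon1_deg,
                   double lat2_deg, double lon2_deg) {
    const double kDegToRad = 3.14159265358979323846 / 180.0;
    const double phi1 = lat1_deg * kDegToRad;
    const double phi2 = lat2_deg * kDegToRad;
    const double dlam = (lon2_deg - lon1_deg) * kDegToRad;
    const double y = std::sin(dlam) * std::cos(phi2);
    const double x = std::cos(phi1) * std::sin(phi2) -
                     std::sin(phi1) * std::cos(phi2) * std::cos(dlam);
    const double deg = std::atan2(y, x) / kDegToRad;  // in (-180, 180]
    return std::fmod(deg + 360.0, 360.0);
}

// Signed turn in degrees, in [-180, 180): negative means turn left,
// positive means turn right.
double turn_deg(double heading_deg, double target_bearing_deg) {
    double diff = std::fmod(target_bearing_deg - heading_deg, 360.0);
    if (diff >= 180.0) diff -= 360.0;
    if (diff < -180.0) diff += 360.0;
    return diff;
}
```

For example, a user facing 350 degrees with a target at bearing 10 degrees gets a turn of +20 degrees, so the character would gesture slightly to the right rather than asking for a 340 degree turn to the left.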
The Project: Research & Development
Our project group consists of five graduate students and a faculty adviser. We propose to research the available technology and assemble a complete working prototype system. We will use real time data input from a differential GPS receiver, an orientation sensor and a voice recognition system. We will employ 3D sound technology to enhance spatial perception and immersion in the augmented environment. Our prototype system will be a wearable computer with a head mounted visual display, lightweight headphones and mic input.
To demonstrate the interface concept we will create a representative prototype of the character driven conversational user interface. This will be a complete software application limited in scope, but including real time interactive conversation with a 3D animated character which allows both user and character to move around and interact in a natural outdoor setting.
Current Progress
Our group has researched current developments in wearable computer technology by studying the literature in print and on the Internet. We have attended the 2nd annual International Symposium on Wearable Computers, hosted by Carnegie Mellon University in Pittsburgh, Pennsylvania, and the first IEEE sponsored Workshop on Augmented Reality in San Francisco. We have met with several leading researchers in the field and begun forming academic collaborations.
The augmented reality environment is viewed through a semi-transparent eyeglass display that overlays a VGA image on the view of the real world. The Sony Glasstron head mounted display is our current development platform. We are also working on a prototype of a lighter eyeglass type display. We have assembled working hardware and software that provide an effective overlay of computer images on the real world. We are currently developing means of testing the cognitive effectiveness of these displays.
We are pursuing a fully immersive visual and aural experience for the user. To accomplish this we need to know with a good degree of accuracy where the user is and where he or she is looking at any given moment. We have researched GPS and orientation tracking sensors to accomplish this. We are currently testing GPS and geomagnetic tracking devices for accuracy, repeatability, and responsiveness within our interface.
We have written simple working application code to test both these devices and our concept. This test application contains 3D icons and a real time animated 3D Bee character. The interface and control code is currently being programmed in C++ under a Windows NT environment. The OpenGL API with hardware accelerated rendering is being employed to create the real time 3D visual interface. We have written and are currently perfecting control code that links location and direction input from the GPS and orientation tracking devices to the virtual world within our application. This will allow a real time update of the visual and aural environment as the user walks around.
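One way to link sensor input to the virtual world is to convert each GPS fix into meters east and north of a fixed site origin and to turn the compass heading into a view direction. The sketch below is an assumption about how such control code might look, not the project's actual implementation. `CameraPose` and `pose_from_sensors` are hypothetical names, and the conversion uses a flat-earth approximation that is only reasonable over a site a few kilometers across.

```cpp
#include <cmath>

// Hypothetical camera pose in a local right-handed frame matching OpenGL's
// default conventions: +X is east, +Y is up, -Z is north; units are meters.
struct CameraPose {
    double x, y, z;       // eye position
    double dir_x, dir_z;  // horizontal unit vector the user is facing
};

// Map a GPS fix and a compass heading to a camera pose relative to a fixed
// site origin, using a flat-earth (equirectangular) approximation.
CameraPose pose_from_sensors(double origin_lat_deg, double origin_lon_deg,
                             double lat_deg, double lon_deg,
                             double heading_deg, double eye_height_m) {
    const double kEarthRadiusM = 6371000.0;  // mean Earth radius
    const double kDegToRad = 3.14159265358979323846 / 180.0;
    const double north_m =
        (lat_deg - origin_lat_deg) * kDegToRad * kEarthRadiusM;
    // Meridians converge toward the poles, so scale east-west by cos(lat).
    const double east_m =
        (lon_deg - origin_lon_deg) * kDegToRad * kEarthRadiusM *
        std::cos(origin_lat_deg * kDegToRad);
    const double h = heading_deg * kDegToRad;  // 0 = north, 90 = east
    CameraPose pose;
    pose.x = east_m;
    pose.y = eye_height_m;
    pose.z = -north_m;
    pose.dir_x = std::sin(h);
    pose.dir_z = -std::cos(h);
    return pose;
}
```

In an OpenGL renderer, a pose like this could feed a call such as `gluLookAt`, with the eye at `(x, y, z)`, the look-at point at `(x + dir_x, y, z + dir_z)`, and the up vector `(0, 1, 0)`. Head pitch and roll from the orientation sensor are left out of this sketch.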
We have been evaluating several of the voice recognition software packages on the market to select the one with the best performance at an affordable price. We are currently adapting the IBM ViaVoice system to provide voice input to our system.
The Aureal 3D sound card is currently being tested to provide real time hardware accelerated sound output that can place a voice or sound effect in the user's 360 degree sound field. We will use this to create a spatial sound field that updates with the visual display.
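Hardware 3D audio does far more than this, but the basic idea of placing a sound relative to the listener can be illustrated with a simple constant-power stereo pan. The sketch below is not the sound card's API and does not reproduce its processing; `StereoGain` and `pan_for_azimuth` are hypothetical names, and the input is the source's azimuth relative to the direction the user is facing.

```cpp
#include <cmath>

// Gain applied to each headphone channel, each in the range 0..1.
struct StereoGain {
    double left;
    double right;
};

// Constant-power pan for a source at the given azimuth relative to the
// listener's facing direction (degrees; 0 = ahead, +90 = right, -90 = left).
// Sources behind the listener are mirrored to the front, so front/back cues
// are lost in this simplification.
StereoGain pan_for_azimuth(double relative_azimuth_deg) {
    const double kPi = 3.14159265358979323846;
    // sin() folds rear azimuths onto the front: 135 degrees pans like 45.
    const double side = std::sin(relative_azimuth_deg * kPi / 180.0);  // -1..1
    const double angle = (side + 1.0) * 0.25 * kPi;  // 0..pi/2
    StereoGain g;
    g.left = std::cos(angle);   // full left when angle = 0
    g.right = std::sin(angle);  // full right when angle = pi/2
    return g;
}
```

Because the two gains are the cosine and sine of the same angle, their squares always sum to one, which keeps perceived loudness roughly steady as the user turns. Recomputing the relative azimuth whenever the heading changes is what keeps the sound field in step with the visual display.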
At the completion of the technology evaluation phase of the project, we will assemble a working prototype system. This will combine the input devices we are now testing with the smallest, lightest computer system we can assemble that provides adequate performance for real time multimedia delivery. We will then design and develop a demonstration multimedia application featuring the character driven conversational interface in an Augmented Reality environment.