PC-telephone convergence for multi-modal interactive environments
David Horner
Dr. Beomjin Kim
CS494 Topics in virtual reality - 2006
Indiana University-Purdue University Fort Wayne
This document is available from Google Docs:
VirtualTelephonyWorld - http://docs.google.com/Doc?id=d8bmkqt_5zz8r43
Latest Windows binary installer release:
Release Date: April 06, 2007
Source code access is available at:
SVN access is: https://dave.thehorners.com/svn/virtualtelephonyworld/trunk
VirtualTelephonyWorld is a prototype in which you can view and animate a 3d robot model within a shared multi-user virtual world. Each user views the virtual world from a shared 3d perspective and can interact with it using a touchtone phone or a keyboard. The virtual world also exposes a high-performance conference bridge, which lets users collaborate remotely with other people exploring the same environment. This prototype demonstrates how open source software has made it possible to develop sophisticated multi-modal PC-telephone converged applications. It uses free software libraries that strive to be cross-platform and allow both free and commercial use.
Giving a group of users the ability to interact with each other on a telephone conference enables people to work remotely. Many people spend a large amount of their work week on the telephone in conference calls. However, complex subjects like science, engineering, and math problems are much easier to solve and understand when a team of people can visualize data and communicate together.
Since the advent of the telephone, people have struggled to communicate
simple things like phone numbers, credit card numbers, URLs, and other
words that are hard to pronounce or spell over the phone. We've developed
tools like email, web browsers, internet chat, and SMS to communicate
information accurately between colleagues. But each of these mechanisms
requires a certain amount of prior knowledge and setup.
Users can not only talk to one another, but can also interact with a software interactive voice response (IVR) system. Instead of having to listen to the available options one at a time, the user can interact with a visual representation of the IVR. This lets the user navigate IVR prompts with ease, because they can move through the menu in a non-linear fashion.
This prototype is written in C and C++. I will be using the C library libapr for cross-platform ADTs, threads, sockets, and mutexes. I reviewed several open source libraries for 3d rendering and chose the C++ Ogre3d framework because it is packaged for multiple platforms and well documented. Last, and most importantly, I will use Freeswitch for everything relating to the conference and event subsystems. FreeSWITCH is an open source telephony platform written in C/C++ to facilitate the creation of telephony services. It handles all the termination, network events, and channel mixing for the conference.
Features / Requirements:
Trigger visual 3d animations using a keyboard or touchtone phone.
Report participant activity and robot state information using a speech synthesis engine (Cepstral or Festival).
Allow interactive two-way communication between the virtual robot model and the phone system.
Provide multi-party conferencing from traditional phones, cellphones, and softphones.
Develop an installable executable that people can download and use easily (NSIS).
Use open and free libraries to accomplish my goal (Freeswitch, Asterisk, Ogre3d, libapr, and other code contributions).
This project explores the development of a multi-modal human-computer
interface (HCI). It provides the following modalities of interaction:
visual display, speech synthesis, keyboard, mouse, and touchtone phone.
Speech recognition is also easy to implement using a commercial speech engine
or by using a VXML provider. Applications that expose many
modalities are more accessible to people with disabilities and provide a
richer human-computer interaction.
What does the prototype do?
VirtualTelephonyWorld is a prototype which allows many users to animate a
robot model placed in the center of a virtual world. It provides a
telephone conference bridge which can be accessed through a SIP softphone,
Google Talk, or a plain old telephone. When a user presses a button 1
through 5 (using a keyboard or the telephone), the robot transitions to
the animation state representing the button pressed. The robot provides
text-to-speech notifications for the connected telephone users. Button
press events are triggered from a telephone's DTMF touchpad or from a PC
keyboard.
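The button-to-animation behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the prototype's actual code: the function name animationForButton is hypothetical, the state names are the ones I believe ship with the Ogre3d SDK's robot.mesh, and the ordering of buttons 1 through 5 is an assumption.

```cpp
#include <string>

// Animation state names assumed to ship with the Ogre3d SDK's robot.mesh.
// The button-to-state ordering is hypothetical, chosen only for illustration.
static const char* const kRobotStates[] = {"Idle", "Walk", "Shoot", "Slump", "Die"};

// Returns the animation state for a button ('1'..'5'), or an empty string
// for any other key so the caller can simply ignore it.
std::string animationForButton(char button) {
    if (button < '1' || button > '5') {
        return std::string();
    }
    return kRobotStates[button - '1'];
}
```

Because a keyboard press and a DTMF digit both arrive as a single character, one lookup like this can serve both input paths.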
What do you mean by model?
This prototype relies on the Ogre3d rendering framework. A model in this prototype refers to a model entity in Ogre3d terms. Ogre3d provides the functionality needed to render and display the first person perspective of the shared virtual 3d world. Ogre3d exposes a scene graph which allows the programmer to add models into the 3d world and maintain relationships between these model entities. I've implemented a very simple scene graph for this prototype which just loads the robot.mesh resource from the Ogre3d SDK distribution. This robot.mesh includes all the polygons, textures, animation states, etc. needed to load and display the robot within my prototype. I've also included some code to add textual overlay information. The picture to the right is a keyboard map that shows the user which keys to press (in yellow) and the name of the animation state each key triggers.
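To make the scene graph idea concrete, here is a toy sketch of parent/child nodes. It is deliberately NOT Ogre3d's API: the Vec3 and SceneNode types below are hypothetical stand-ins, and only translation is modelled (no rotation or scale). It shows why a graph is useful: moving a parent moves everything attached to it.

```cpp
#include <memory>
#include <string>
#include <vector>

struct Vec3 { float x, y, z; };

// Toy scene node (hypothetical, not Ogre3d's class): each node stores a
// position relative to its parent and owns its children.
class SceneNode {
public:
    SceneNode(const std::string& name, Vec3 local)
        : name_(name), local_(local), parent_(nullptr) {}

    // Attach a new child; the returned pointer stays owned by this node.
    SceneNode* createChild(const std::string& name, Vec3 local) {
        children_.emplace_back(new SceneNode(name, local));
        children_.back()->parent_ = this;
        return children_.back().get();
    }

    void setLocal(Vec3 local) { local_ = local; }
    const std::string& name() const { return name_; }

    // World position = own offset plus the offsets of every ancestor.
    Vec3 worldPosition() const {
        Vec3 p = local_;
        for (const SceneNode* n = parent_; n != nullptr; n = n->parent_) {
            p.x += n->local_.x;
            p.y += n->local_.y;
            p.z += n->local_.z;
        }
        return p;
    }

private:
    std::string name_;
    Vec3 local_;
    SceneNode* parent_;
    std::vector<std::unique_ptr<SceneNode>> children_;
};
```

In a real renderer the robot entity and any label attached above it would hang off nodes like these, so repositioning the robot's node carries the label along.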
High level network diagram
The diagram below indicates how information flows between all of the components needed for this prototype.
The user picks up a phone and calls the published phone number. Audio and DTMF are transmitted from my voice provider's T1 interface into a PRI card on a Linux server which runs Asterisk. Asterisk answers the call and automatically dials the SIP URL for the Freeswitch server. The Freeswitch server then answers the SIP call and places the user in a conference bridge. From this point on, the phone caller can talk and interact with the other callers and virtual models that are connected to the same virtual world.
A one-time download and install provides the user with the required software. The application includes two threads of execution. One worker thread makes a connection to the Freeswitch mod_event_socket port using libapr TCP sockets. This connection is used to read DTMF events from, and send DTMF events to, the Freeswitch server. It is also used for sending commands to the TTS interface within Freeswitch. The main thread of execution manages the Ogre3d event loop. This event loop renders the virtual world on the screen and reads keyboard and mouse events from the user using the OIS library included within the Ogre3d SDK. The two threads communicate with each other using libapr's shared memory and mutex support.
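The hand-off between the two threads might look something like the sketch below. Note the substitution: it uses the C++ standard library's std::mutex in place of libapr's mutex support, and the ButtonEvent and EventQueue names are hypothetical. The worker thread would call push() whenever an event arrives from the socket, and the render thread would call drain() once per frame, so the render loop never waits on the network.

```cpp
#include <mutex>
#include <vector>

// One button press, from either the phone (DTMF) or the keyboard.
struct ButtonEvent {
    char digit;
    bool fromPhone;
};

// Mutex-protected mailbox shared by the worker thread and the render thread.
class EventQueue {
public:
    // Called by the worker thread when an event arrives.
    void push(ButtonEvent e) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(e);
    }

    // Called by the render thread once per frame. Takes everything queued
    // so far and returns immediately, even if nothing is pending.
    std::vector<ButtonEvent> drain() {
        std::vector<ButtonEvent> out;
        std::lock_guard<std::mutex> lock(mutex_);
        out.swap(pending_);
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<ButtonEvent> pending_;
};
```

Swapping the whole vector out under the lock keeps the critical section short; the render thread then processes the events without holding the mutex.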
Note: Why use both Asterisk and Freeswitch? Because I can. I've included Asterisk for the sake of interop; Freeswitch could have provided this T1 interface itself and removed the need for Asterisk. I'm just showing that the two projects interoperate nicely. This prototype's event subsystem is based on Freeswitch's mod_event_socket, so Freeswitch is required. I could also have implemented an event interface to Asterisk, but I did not.
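As a rough idea of what reading a DTMF event off the event socket involves, here is a sketch that pulls the digit out of one plain-text event. The assumptions: events arrive as "Name: value" header lines ended by a blank line, and the relevant headers are Event-Name and DTMF-Digit (as I recall them from Freeswitch's plain-text events). The function names are hypothetical, and authentication, Content-Length framing, and URL-decoding of values are left out.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parse "Name: value" lines up to the first blank line.
std::map<std::string, std::string> parseHeaders(const std::string& text) {
    std::map<std::string, std::string> headers;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // tolerate CRLF line endings
        }
        if (line.empty()) {
            break;  // a blank line ends the header block
        }
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) {
            continue;  // skip anything that is not a header line
        }
        std::string::size_type start = line.find_first_not_of(' ', colon + 1);
        headers[line.substr(0, colon)] =
            (start == std::string::npos) ? std::string() : line.substr(start);
    }
    return headers;
}

// Returns the pressed digit for a DTMF event, or "" for any other event.
// Header names are assumptions about the event format, not verified here.
std::string dtmfDigitFromEvent(const std::string& eventText) {
    const std::map<std::string, std::string> h = parseHeaders(eventText);
    std::map<std::string, std::string>::const_iterator name = h.find("Event-Name");
    if (name == h.end() || name->second != "DTMF") {
        return std::string();
    }
    std::map<std::string, std::string>::const_iterator digit = h.find("DTMF-Digit");
    return digit == h.end() ? std::string() : digit->second;
}
```

The digit returned here is what the worker thread would pass along to the rendering side to pick the robot's next animation state.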
Free open software works
This project stands on the shoulders of giants by using free software libraries and tools available on the internet today. Open source software has come a long way in the 10 years I've been working with it, and it is only going to get better. With open source you've got the code, so you can make the change. Trouble? You can find a developer like me to provide support! Good projects have documentation, bug/feature trackers, wikis, and other content. User communities grow and people congregate on irc.freenode.net, mailing lists, and forums. For many things, the open source way is the right way.
A big thanks goes out to....
This project would not have been possible without the hard work of many contributors. I'd like to thank all the developers associated with the following projects: Freeswitch, libapr, Ogre3d, Asterisk, and Linux. Each of these projects also includes other openly licensed libraries, and I'd like to thank those contributors as well. Thanks again to Dr. Beomjin Kim for accepting this project as an independent study project to apply towards my Purdue CS degree.
Now for a little self promotion... ;)
If you are looking for help integrating advanced web/telephony services,
custom software development, or support....
Please check out my company TecDev - http://www.tecdev.com/
Please visit me on my website http://dave.thehorners.com/
I'd love to hear your thoughts and comments!
--David A. Horner
BTW: I will be presenting this work at the 2007 ClueCon telephony developer conference in Chicago. Please come if you've got the time!