VirtualTelephonyWorld
PC-telephone convergence for multi-modal interactive environments
David Horner
Dr. Beomjin Kim
CS494 Topics in virtual reality - 2006
Indiana University-Purdue University Fort Wayne
Latest windows binary installer release:
http://dave.thehorners.com/personal/virtualtelephonyworld/VirtualTelephonyWorld-1.0.0.5.exe
Release Date: April 06, 2007
Source code is available through SVN at:
https://dave.thehorners.com/svn/virtualtelephonyworld/trunk
Giving a group of users the ability to interact with each other on a telephone conference enables people to work remotely. Many people spend a large amount of their work week on the telephone in conference calls. However, complex subjects like science, engineering, and math problems are much easier to solve and understand when a team of people can visualize data and communicate together.
Since the advent of the telephone, people have struggled to communicate
simple things like phone numbers, credit card numbers, URLs, and words that
are hard to pronounce or spell over the phone. We've developed tools like
email, web browsers, internet chat, and SMS to communicate information
accurately between colleagues, but each of these mechanisms requires a
certain amount of prior knowledge and setup.
Users can not only talk to one another, but also interact with a software
interactive voice response (IVR) system. Instead of having to listen to the
available options one at a time, the user can interact with a visual
representation of the IVR. This lets the user navigate IVR prompts with
ease because they can move through the menu in a non-linear fashion.
This prototype is written in C and C++. I use the C library libapr for
cross-platform ADTs, threads, sockets, and mutexes. I reviewed several open
source libraries for 3d rendering and chose the C++ Ogre3d framework because
it is packaged for multiple platforms and well documented. Last and most
importantly, I use FreeSWITCH for everything relating to the conference and
event subsystems. FreeSWITCH is an open source telephony platform written in
C/C++ to facilitate the creation of telephony services. It handles all the
call termination, network events, and channel mixing for the conference.
Features / Requirements:
Trigger visual 3d animations using a keyboard or touchtone phone.
Report participant activity and robot state information using a speech synthesis engine (Cepstral or Festival).
Allow interactive two-way communication between the virtual robot model and the phone system.
Provide an installer (built with NSIS) that people can download and use easily.
Use open and free libraries to accomplish my goal (FreeSWITCH, Asterisk, Ogre3d, libapr, and other code contributions).
This project explores the development of a multi-modal human-computer
interface (HCI). It provides the following modalities of interaction:
visual display, speech synthesis, keyboard, mouse, and touchtone phone.
Speech recognition could also be added using a commercial speech engine or
a VXML provider. Applications which expose many modalities are more
accessible for people with disabilities and provide a richer human-computer
interaction.
What does the prototype do?
VirtualTelephonyWorld is a prototype which allows many users to animate a
robot model placed in the center of a virtual world. It provides a telephone
conference bridge which can be accessed through a SIP softphone, Google Talk,
or a plain old telephone. When a user presses a button from 1 through 5
(using a keyboard or the telephone), the robot transitions to the animation
state assigned to that button. The robot also provides text-to-speech
notifications for the connected telephone users. Button press events are
triggered from a telephone's DTMF touchpad or from a PC keyboard.
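The button handling described above can be sketched as a small lookup table shared by the keyboard and DTMF paths. This is only an illustration: the function name animationForDigit is hypothetical, the state names are the ones I believe ship with the Ogre3d SDK's robot.mesh, and the order in which digits map to states is an assumption.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical mapping from buttons 1-5 to animation state names.
// The digit-to-state order is an assumption made for this sketch.
static const std::map<char, std::string> kAnimationForDigit = {
    {'1', "Idle"}, {'2', "Walk"}, {'3', "Shoot"}, {'4', "Slump"}, {'5', "Die"},
};

// Returns the animation state for a pressed button, or nothing when the
// button is not one of 1-5. Keyboard and DTMF presses can share this lookup.
std::optional<std::string> animationForDigit(char digit) {
    auto it = kAnimationForDigit.find(digit);
    if (it == kAnimationForDigit.end()) {
        return std::nullopt;
    }
    return it->second;
}
```

Because both input sources reduce to a single character, the rest of the application does not need to know whether a press came from the phone or the keyboard. The sketch requires C++17 for std::optional.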
What do you mean by model?
This prototype relies on the Ogre3d rendering framework. A model in this
prototype refers to a model entity in Ogre3d terminology. Ogre3d provides
the functionality needed to render and display the first-person perspective
of the shared virtual 3d world. Ogre3d exposes a scene graph which allows
the programmer to add models to the 3d world and maintain relationships
between these model entities. I've implemented a very simple scene graph
for this prototype which simply loads the robot.mesh resource from the Ogre3d
SDK distribution. This robot.mesh includes all the polygons, textures,
animation states, etc. needed to load and display the robot within my
prototype. I've also included some code to add textual overlay
information. The picture to the right is a keyboard map which shows the user
the keys to press (in yellow) and the name of the animation state each one
triggers.
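To illustrate what "maintaining relationships between model entities" means in a scene graph, here is a minimal stand-alone sketch. It is not Ogre3d's API: the Vec3 and SceneNode types below are hypothetical, and only translation is modelled. The idea it shows is that a child node's position is stored relative to its parent, so moving a parent implicitly moves everything attached to it.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simple 3d vector; only translation is modelled in this sketch.
struct Vec3 {
    float x, y, z;
};

inline Vec3 operator+(const Vec3& a, const Vec3& b) {
    return Vec3{a.x + b.x, a.y + b.y, a.z + b.z};
}

// Hypothetical scene node: a name, a position relative to its parent,
// and the child nodes it owns. This is not Ogre3d's SceneNode class.
class SceneNode {
public:
    SceneNode(std::string name, Vec3 localPosition,
              const SceneNode* parent = nullptr)
        : name_(std::move(name)), local_(localPosition), parent_(parent) {}

    // Attaches a child positioned relative to this node.
    SceneNode& addChild(const std::string& name, Vec3 localPosition) {
        children_.push_back(
            std::make_unique<SceneNode>(name, localPosition, this));
        return *children_.back();
    }

    // World position is the sum of the local offsets up to the root.
    Vec3 worldPosition() const {
        return parent_ ? parent_->worldPosition() + local_ : local_;
    }

    const std::string& name() const { return name_; }
    std::size_t childCount() const { return children_.size(); }

private:
    std::string name_;
    Vec3 local_;
    const SceneNode* parent_;
    std::vector<std::unique_ptr<SceneNode>> children_;
};
```

In this sketch a root node could own a "robot" node, which in turn could own a "label" node placed two units above it; asking the label for its world position walks back up through the robot to the root.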
High level network diagram
The diagram below indicates how information flows between all of the
components needed for this prototype.
Telephone users
The user picks up a phone and calls the published phone number. Audio and
DTMF are transmitted from my voice provider's T1 interface into a PRI card
on a Linux server running Asterisk. Asterisk answers the call and
automatically dials the SIP URL for the FreeSWITCH server. The FreeSWITCH
server then answers the SIP call and places the user in a conference bridge.
From this point on, the phone caller can talk and interact with the other
callers and virtual models that are connected to the same virtual world.
Computer users
A one-time download and install provides the user with the required
software. The application includes two threads of execution. One worker
thread makes a connection to the FreeSWITCH mod_event_socket port using
libapr TCP sockets. This connection is used to read DTMF events from, and
send DTMF events to, the FreeSWITCH server. It is also used for sending
commands to the TTS interface within FreeSWITCH. The main thread of
execution manages the Ogre3d event loop. This event loop renders the virtual
world on the screen and reads keyboard and mouse events from the user using
the OIS library included within the Ogre3d SDK. The two threads communicate
with each other using libapr's shared memory and mutex support.
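The hand-off between the network worker and the render thread can be sketched as a mutex-protected queue. This is an illustration under stated assumptions, not the prototype's code: std::thread and std::mutex stand in for libapr's thread and mutex support, and the names DtmfQueue and simulateWorker are hypothetical.

```cpp
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Queue of DTMF digits shared by the network worker and the render thread.
// std::mutex stands in for libapr's mutex support in this sketch.
class DtmfQueue {
public:
    // Called by the worker thread when a DTMF event arrives.
    void push(char digit) {
        std::lock_guard<std::mutex> lock(mutex_);
        digits_.push(digit);
    }

    // Called once per frame by the render thread. Returns all pending
    // digits in arrival order without waiting on the network.
    std::vector<char> drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<char> out;
        while (!digits_.empty()) {
            out.push_back(digits_.front());
            digits_.pop();
        }
        return out;
    }

private:
    std::mutex mutex_;
    std::queue<char> digits_;
};

// Simulates a worker thread delivering digits, followed by one drain such
// as the main loop would perform each frame.
std::vector<char> simulateWorker(const std::vector<char>& incoming) {
    DtmfQueue queue;
    std::thread worker([&queue, &incoming] {
        for (char digit : incoming) {
            queue.push(digit);
        }
    });
    worker.join();  // a real render loop would keep draining, not join
    return queue.drain();
}
```

Draining the whole queue once per frame keeps the render thread from ever blocking on socket I/O, which is the main reason to split the work across two threads in the first place.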
Note: Why use both Asterisk and FreeSWITCH? Because I can. I've included
Asterisk for the sake of interoperability; FreeSWITCH could have provided
the T1 interface itself and removed the need for Asterisk. I'm just showing
that both projects can interoperate nicely. This prototype's event subsystem
is based on FreeSWITCH's mod_event_socket, so FreeSWITCH is required. I could
also have implemented an event interface to Asterisk, but I did not.
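As a rough idea of what consuming events from FreeSWITCH's event socket module involves, the sketch below parses a block of "Name: value" header lines and extracts a DTMF digit. The function names parseEventHeaders and dtmfDigit are hypothetical, and the header names Event-Name and DTMF-Digit are my assumption about the plain-text event format; check them against the FreeSWITCH documentation before relying on them.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parses a block of "Name: value" header lines into a map, stopping at the
// first blank line. Lines without a ": " separator are skipped.
std::map<std::string, std::string> parseEventHeaders(const std::string& block) {
    std::map<std::string, std::string> headers;
    std::istringstream in(block);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // tolerate CRLF line endings
        }
        if (line.empty()) {
            break;  // a blank line ends the header block
        }
        const std::string::size_type sep = line.find(": ");
        if (sep == std::string::npos) {
            continue;
        }
        headers[line.substr(0, sep)] = line.substr(sep + 2);
    }
    return headers;
}

// Returns the pressed digit when the headers describe a DTMF event,
// or '\0' for any other event. Header names are assumptions (see above).
char dtmfDigit(const std::map<std::string, std::string>& headers) {
    auto name = headers.find("Event-Name");
    auto digit = headers.find("DTMF-Digit");
    if (name == headers.end() || name->second != "DTMF") {
        return '\0';
    }
    if (digit == headers.end() || digit->second.empty()) {
        return '\0';
    }
    return digit->second[0];
}
```

A worker thread could call parseEventHeaders on each block read from the socket and forward any non-zero result of dtmfDigit to the render thread.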
Free open software works
This project stands on the shoulders of giants by using free software libraries and tools available on the internet today. Open source software has come a long way in the 10 years I've been working with it, and it is only going to get better. With open source you've got the code, so you can make the change yourself. Trouble? You can find a developer like me to provide support! Good projects have documentation, bug/feature trackers, wikis, and other content. User communities grow, and people congregate on irc.freenode.net, mailing lists, and forums. For many things, the open source way is the right way.
A big thanks goes out to....
This project would not have been possible without the hard work of many contributors. I'd like to thank all the developers associated with the following projects: FreeSWITCH, libapr, Ogre3d, Asterisk, and Linux. Each of these projects also includes other openly licensed libraries, and I'd like to thank those contributors as well. Thanks again to Dr. Beomjin Kim for accepting this project as an independent study project to apply towards my Purdue CS degree.
Now for a little self promotion... ;)
If you are looking for help integrating advanced web/telephony services,
custom software development, or support, please check out my company TecDev:
http://www.tecdev.com/
Please visit me on my website http://dave.thehorners.com/
I'd love to hear your thoughts and comments!
Thanks.
--David A. Horner
http://dave.thehorners.com/