"hello world"

VirtualTelephonyWorld - multi-modal PC-telephone convergence prototype.

VirtualTelephonyWorld

PC-telephone convergence for multi-modal interactive environments

David Horner

Dr. Beomjin Kim
CS494 Topics in virtual reality - 2006
Indiana University-Purdue University Fort Wayne

 


This document is available from Google Docs:
VirtualTelephonyWorld - http://docs.google.com/Doc?id=d8bmkqt_5zz8r43

Latest Windows binary installer release:
http://dave.thehorners.com/personal/virtualtelephonyworld/VirtualTelephonyWorld-1.0.0.5.exe

Release Date: April 06, 2007

Source code is available via SVN:
https://dave.thehorners.com/svn/virtualtelephonyworld/trunk



VirtualTelephonyWorld is a prototype in which you can view and animate a 3D robot model within a shared multi-user virtual world. Each user views the virtual world from a shared 3D perspective and can interact with it using a touchtone phone or a keyboard. The virtual world also exposes a high performance conference bridge, which lets users collaborate remotely with other people exploring the same environment. This prototype demonstrates how open source software has made it possible to develop sophisticated multi-modal PC-telephone converged applications. It uses free software libraries which strive to be cross platform and allow for both free and commercial use.

Giving a group of users the ability to talk with each other on a telephone conference enables people to work remotely. Many people spend a large part of their work week on conference calls. However, complex problems in science, engineering, and math are much easier to solve and understand when a team can visualize data while they communicate.

Since the advent of the telephone, people have struggled to communicate simple things like phone numbers, credit card numbers, URLs, and words that are hard to pronounce or spell over the phone. We've developed tools like email, web browsers, internet chat, and SMS to communicate information accurately between colleagues. But each of these mechanisms requires a certain amount of prior knowledge and setup.

Users can not only talk to one another, they can also interact with interactive voice response (IVR) software. Instead of having to listen to the available options one at a time, the user can interact with a visual representation of the IVR. This lets the user navigate IVR prompts with ease, because they can move through the menu in a non-linear fashion.
 
This prototype is written in C and C++. I use the C library libapr for cross platform ADTs, threads, sockets, and mutexes. I reviewed several open source libraries for 3D rendering and chose the C++ Ogre3d framework because it is packaged cross platform and well documented. Last and most importantly, I use FreeSWITCH for everything relating to the conference and event subsystems. FreeSWITCH is an open source telephony platform written in C/C++ to facilitate the creation of telephony services. It handles all the call termination, network events, and channel mixing for the conference.

Features / Requirements:

What do you mean multi-modal HCI?

This project explores the development of a multi-modal human-computer interface (HCI). It provides the following modalities of interaction: visual display, speech synthesis, keyboard, mouse, and touchtone phone. Speech recognition could also be added using a commercial speech engine or a VXML provider. Applications which expose many modalities are more accessible for people with disabilities and provide a richer human-computer interaction.
 

What does the prototype do?

VirtualTelephonyWorld is a prototype which allows many users to animate a robot model placed in the center of a virtual world. It provides a telephone conference bridge which can be accessed through a SIP softphone, Google Talk, or a plain old telephone. Button press events are triggered from a telephone's DTMF touchpad or from a PC keyboard. When a user presses a button 1 through 5, the robot transitions to the animation state assigned to that button. The robot also provides text-to-speech notifications for the connected telephone users.
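
The button handling described above can be sketched as a small lookup shared by the keyboard and telephone input paths. This is only an illustration: the state names are assumed to be the ones shipped in the Ogre3d SDK's robot.mesh, and the key order and the function name animationForKey are hypothetical, so the prototype's actual table may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical lookup from a pressed key ('1'..'5') to an animation state name.
// The names are assumed to match the states in the Ogre3d SDK's robot.mesh;
// the key order is illustrative only.
std::string animationForKey(char key) {
    switch (key) {
        case '1': return "Idle";
        case '2': return "Walk";
        case '3': return "Shoot";
        case '4': return "Slump";
        case '5': return "Die";
        default:  return "";  // unmapped key: caller keeps the current animation
    }
}

// Usage sketch: a DTMF digit and a keyboard key are both plain characters,
// so the same call serves either input device.
void exampleUsage() {
    assert(animationForKey('2') == "Walk");
    assert(animationForKey('9').empty());
}
```

Returning an empty string for unmapped keys lets the caller ignore stray input such as '*' or '#' without special cases.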

What do you mean by model?
This prototype relies on the Ogre3d rendering framework. A model in this prototype refers to a model entity in Ogre3d terminology. Ogre3d provides the functionality needed to render and display the first person perspective of the shared virtual 3D world. It exposes a scene graph which allows the programmer to add models to the 3D world and maintain relationships between these model entities. I've implemented a very simple graph for this prototype which just loads the robot.mesh resource from the Ogre3d SDK distribution. This robot.mesh includes all the polygons, textures, animation states, etc. needed to load and display the robot within my prototype. I've also included some code to add textual overlay information. The picture to the right is a keyboard map that shows the user which keys to press (in yellow) and the name of the animation state each key triggers.

High level network diagram
The diagram below indicates how information flows between all of the components needed for this prototype.

Telephone users
The user picks up a phone and calls the published phone number. Audio and DTMF are transmitted from my voice provider's T1 interface into a PRI card on a Linux server running Asterisk. Asterisk answers the call and automatically dials the SIP URL for the FreeSWITCH server. The FreeSWITCH server then answers the SIP call and places the user in a conference bridge. From this point on, the phone caller can talk and interact with the other callers and virtual models that are connected to the same virtual world.

Computer users
A one-time download and install provides the user with the required software. The application runs two threads of execution. A worker thread connects to the FreeSWITCH mod_event_socket port using libapr TCP sockets. This connection is used to read DTMF events from, and send DTMF events to, the FreeSWITCH server. It is also used for sending commands to the TTS interface within FreeSWITCH. The main thread of execution manages the Ogre3d event loop. This event loop renders the virtual world on the screen and reads keyboard and mouse events from the user using the OIS library included with the Ogre3d SDK. The two threads communicate with each other using libapr's shared memory and mutex support.
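
The hand-off between the two threads can be sketched roughly as follows. This is not the prototype's code: std::mutex and std::queue stand in for the libapr primitives, the names KeyQueue, dtmfDigitFromEvent, and handleEvent are hypothetical, and the event header names (Event-Name, DTMF-Digit) are assumed from FreeSWITCH's plain-text event format. Real events may also URL-encode header values (for example '#'), which this sketch ignores.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <sstream>
#include <string>

// Mutex-protected queue of key presses, standing in for the libapr
// shared memory and mutex pair that the text describes.
class KeyQueue {
public:
    // Called by the worker thread for each digit it receives.
    void push(char key) {
        std::lock_guard<std::mutex> lock(mutex_);
        keys_.push(key);
    }

    // Called by the render thread once per frame. Never waits for input:
    // returns false immediately when the queue is empty.
    bool tryPop(char& key) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (keys_.empty()) return false;
        key = keys_.front();
        keys_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<char> keys_;
};

// Returns the digit carried by one plain-text event, or '\0' when the event
// is not a DTMF event. The header names are assumptions (see the lead-in).
char dtmfDigitFromEvent(const std::string& eventText) {
    const std::string digitHeader = "DTMF-Digit: ";
    std::istringstream lines(eventText);
    std::string line;
    bool isDtmf = false;
    char digit = '\0';
    while (std::getline(lines, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line == "Event-Name: DTMF") {
            isDtmf = true;
        } else if (line.size() > digitHeader.size() &&
                   line.compare(0, digitHeader.size(), digitHeader) == 0) {
            digit = line[digitHeader.size()];
        }
    }
    return isDtmf ? digit : '\0';
}

// Worker-side glue: parse one event and queue the digit if there was one.
void handleEvent(const std::string& eventText, KeyQueue& queue) {
    const char digit = dtmfDigitFromEvent(eventText);
    if (digit != '\0') queue.push(digit);
}

// Usage sketch: one DTMF event in, one key press out; other events are ignored.
void exampleUsage() {
    KeyQueue queue;
    handleEvent("Event-Name: DTMF\nDTMF-Digit: 3\n", queue);
    handleEvent("Event-Name: HEARTBEAT\n", queue);
    char key = '\0';
    assert(queue.tryPop(key) && key == '3');
    assert(!queue.tryPop(key));
}
```

A non-blocking tryPop suits a render loop: the main thread drains whatever arrived since the last frame and carries on drawing, while the worker thread is free to block on the socket.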

Note: Why use both Asterisk and FreeSWITCH? Because I can. I've included Asterisk for the sake of interop; FreeSWITCH could have provided the T1 interface itself and removed the need for Asterisk. I'm just showing that the two projects interoperate nicely. This prototype's event subsystem is based on FreeSWITCH's mod_event_socket, so FreeSWITCH is required. I could also have implemented an event interface to Asterisk, but I did not.
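
Because the event subsystem rides on the FreeSWITCH event socket, the traffic a client sends is plain text. The helpers below are a rough sketch of how such commands might be framed, assuming the usual event socket conventions: each command ends with a blank line, "auth" carries the password, "event plain" subscribes to events, and "api conference <name> say <text>" asks the conference to speak. The helper names, and any conference name or password passed in, are hypothetical and not taken from the prototype's source.

```cpp
#include <cassert>
#include <string>

// A command is one line of text terminated by a blank line.
std::string frameCommand(const std::string& command) {
    return command + "\n\n";
}

// First message after connecting: authenticate with the socket password.
std::string authCommand(const std::string& password) {
    return frameCommand("auth " + password);
}

// Ask the server to deliver DTMF events in plain-text form.
std::string subscribeToDtmf() {
    return frameCommand("event plain DTMF");
}

// Have the conference speak a text-to-speech notification to its callers.
std::string sayInConference(const std::string& conference,
                            const std::string& text) {
    return frameCommand("api conference " + conference + " say " + text);
}

// Usage sketch: "3000" is a made-up conference name.
void exampleUsage() {
    assert(sayInConference("3000", "robot is walking") ==
           "api conference 3000 say robot is walking\n\n");
}
```

Keeping the framing in one place means the worker thread only has to write the returned strings to its TCP socket in order: authenticate, subscribe, then send notifications as keys are pressed.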


Free open software works

This project stands on the shoulders of giants by using free software libraries and tools available on the internet today. Open source software has come a long way in the 10 years I've been working with it... and it is only going to get better. With open source you've got the code, so you can make the change yourself. Trouble? You can find a developer like me to provide support! Good projects have documentation, bug/feature trackers, wikis, and other content. User communities grow and people congregate on irc.freenode.net, mailing lists, and forums. For many things, the open source way is the right way.


A big thanks goes out to....

This project would not have been possible without the hard work of many contributors. I'd like to thank all the developers associated with the following projects: FreeSWITCH, libapr, Ogre3d, Asterisk, and Linux. Each of these projects also includes other openly licensed libraries, and I'd like to thank those contributors as well. Thanks again to Dr. Beomjin Kim for accepting this project as an independent study project to apply towards my Purdue CS degree.

 


Now for a little self promotion... ;)

 

If you are looking for help integrating advanced web/telephony services, custom software development, or support...
Please check out my company TecDev - http://www.tecdev.com/


 


If you'd like to hear more about me and my pet projects......

Please visit me on my website http://dave.thehorners.com/


I'd love to hear your thoughts and comments!
Thanks.

--David A. Horner
http://dave.thehorners.com/



BTW: I will be presenting this work at the 2007 ClueCon telephony developer conference in Chicago. Please come if you've got the time!
Created: 2007-05-14 16:45:30 Modified: 2013-08-30 18:51:16