Fusion of Neural Networks, Fuzzy Systems and Genetic Algorithms: Industrial Applications
by Lakhmi C. Jain; N.M. Martin
CRC Press, CRC Press LLC
ISBN: 0849398045   Pub Date: 11/01/98
  



Chapter 10
Multimedia Telephone for Hearing-Impaired People

F. Lavagetto
Dept. of Telecommunications, Computer and Systems Sciences
University of Genova
Italy

Human behaviors are expressed through different complementary modalities that cooperate toward sensorimotor goals, and those goals are reached only when the modalities are successfully coordinated. Everyday experience offers abundant evidence of this phenomenon, such as visuomotor coordination in grasping, sewing, or walking. Speech production and perception are further examples of biological multimodal mechanisms: different sensory channels are used to convey information (production), and their outputs are fused to decode it (perception).

The conversion of speech into visual information belongs to the fascinating world of multimedia integration and multimodal communication. The possibility of converting the communication modality while preserving the conveyed information further broadens the foreseeable applications of the related techniques and systems. The advent of a unified worldwide market will require new standards for each of the many components of the service-integrated environment, such as terminal equipment, interfaces, and networks. The prospect of only a few telecommunication companies operating worldwide will reinforce the need for a less fragmented market, cooperatively oriented toward providing large-scale services.

Rehabilitation technology (RT) will play a central role in the scenario depicted above, since an increasing share of consumers will explicitly demand applications and services in interpersonal communication, man-machine interaction, tele-work, and tele-education, with special aids for overcoming impairments due to age, disability, and temporary or permanent disease. In a future technological society based on integrated multimedia services offering low cost, easy access, widespread availability, and privacy, multimedia approaches to interpersonal communication [1-4] will be a powerful means of overcoming most of the social barriers that still exist.

As far as interpersonal communication is concerned, low-cost, compact terminal equipment interfaced to mass communication lines and capable of converting incoming messages in real time from any source modality to a more suitable destination modality would be a significant step toward the goal of social integration. While waiting for future revolutionary services provided by intelligent networks relying on large-bandwidth connections, short-term applications must be oriented primarily to exploiting the existing public networks and, first of all, the analogue telephone lines. The switched telephone network offers many advantages: above all, it guarantees home-to-home connections at very affordable cost, and it also provides sound network management and plant maintenance, facilities for international connections, progressive upgrading as technology improves, interconnection to other digital communication services, and an interface to terrestrial cellular communication between mobile stations.

1. Introduction

Speech communication is considered the richest means of human interaction. Telecommunication service providers have always managed this worldwide business without paying enough attention to the needs of hearing-impaired consumers, for whom the acoustic medium is evidently unsuitable and who are therefore partially or totally excluded from this primary source of communication. Several attempts have recently been made to process the analogue speech signal in order to filter out noise, reduce distortion, enhance quality and, finally, drive suitable electro-acoustic-visual devices [5-7]. Thanks to these sophisticated techniques, many communication barriers have been overcome, and new relay and mediation services are offered in the field of social and cultural integration.

Moreover, a priori knowledge of the bimodal acoustic-visual nature of speech production and perception makes it possible to process the speech signal and extract suitable parameters capable of driving the animation of a synthetic mouth in which lip movements are faithfully reproduced [8-9]. The hardware required is very simple and is located essentially at the receiver (some suitable preprocessing can optionally be performed at the transmitter). It consists of electronics for processing the incoming speech and driving the animation of the lip icon on a small display device. Such “intelligent” receiver equipment is low in complexity, cost-effective, and reasonably compact, and could optionally be plugged into any conventional telephone set.
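To make the idea of "parameters driving a lip icon" concrete, the following is a minimal sketch and not the method used in the SPLIT prototype. It assumes a mono signal sampled at 8 kHz (a typical telephone rate, assumed here), cuts it into 20 ms frames, and maps the short-time energy of each frame to a single hypothetical mouth-opening parameter in [0, 1]. The function names, the energy-based mapping, and the smoothing factor are all illustrative assumptions; a real system would estimate several articulatory parameters from richer acoustic features.

```python
import math


def frame_energies(samples, frame_len):
    """Return the RMS energy of each non-overlapping frame of the signal."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def mouth_openings(energies, smoothing=0.5):
    """Map frame energies to a mouth-opening value in [0, 1].

    Hypothetical mapping: normalise by the peak energy, then apply
    exponential smoothing so the animated mouth does not jitter.
    """
    peak = max(energies, default=0.0)
    if peak == 0.0:
        return [0.0] * len(energies)
    openings = []
    previous = 0.0
    for energy in energies:
        target = energy / peak
        previous = smoothing * previous + (1.0 - smoothing) * target
        openings.append(previous)
    return openings


# Synthetic test signal: 100 ms of silence followed by 100 ms of a 200 Hz tone.
sample_rate = 8000          # assumed telephone-band sampling rate
frame_len = 160             # 20 ms frames at 8 kHz
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
params = mouth_openings(frame_energies(silence + tone, frame_len))
```

Each value in `params` would be handed to the display routine once per frame: the mouth stays closed during the silent frames and opens progressively once the tone begins. The smoothing trades responsiveness for visual stability, which is a design choice of this sketch rather than something stated in the text.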

Within the European TIDE initiative (Technology Initiative for Disabled and Elderly people) in the field of Rehabilitation Technology, the SPLIT¹ consortium has directly addressed the ambitious goal of converting speech into lip-readable visualization for the development and experimentation of a multimedia telephone for hearing-impaired people. A prototype version of this system is shown in Figure 1. Here a normal-hearing caller is connected, through a conventional telephone (PTS) line, to an Intelligent Network (IN) node equipped for processing incoming speech and converting it into corresponding visual parameters, which are then transmitted along an ISDN (Integrated Services Digital Network) down-link to a PC terminal located at the hearing-impaired receiver.


¹ SPLIT, activated in 1994 with the participation of DIST as a full member, was a 2-year pre-competitive project oriented to the development and experimentation of advanced multimedia technology in the field of speech rehabilitation of deaf persons.


Figure 1  The multimedia telephone: a system for real-time conversion of analogue telephone calls into digital graphic animation (1063-6528/95$04.00 © 1995 IEEE).

Promising industrial applications are foreseen, considering the large market of potential consumers interested not only in interpersonal telephone communication but also in many other unconventional applications, including better human interfaces to public offices and environments. Simple examples include a school or university class, a conference room, an open-air talk, or a theater performance.

Many other multimedia applications can be mentioned in which lip synchronization with speech is a key issue, both in the short/medium term, as cited above, and in the medium/long term, where the objectives are more ambitious: multimedia digital manipulation for content creation, virtual reality, augmented reality, and interactive gaming.



Copyright © CRC Press LLC