Fusion of Neural Networks, Fuzzy Systems and Genetic Algorithms: Industrial Applications
by Lakhmi C. Jain; N.M. Martin. CRC Press, CRC Press LLC. ISBN: 0849398045. Pub Date: 11/01/98
Figure 13 shows an isolated word recognition system in which the recurrent neural fuzzy technique performs the recognition step. Speech signals are coded as LPC cepstrum coefficients and then vector quantized (VQ). The system uses a VQ codebook of size 256, 9-pole LPC, a 16 kHz sampling rate (with 16-bit speech amplitudes), and 300 samples per frame. The TIMIT [3] database (a speech database developed by Texas Instruments and Massachusetts Institute of Technology, and sponsored by the United States Defense Advanced Research Projects Agency) is used both to develop the codebook and to train the RNFS. The recurrent neural net is trained with about 200 speakers from different U.S. regions using 11 words from the SA (dialect) sentences. Testing is also done with the TIMIT database, using speakers from both the test and train directories. The recognition accuracy is 90%, comparable to HMM-based recognition.
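To make the framing parameters concrete, the short sketch below computes the duration of one 300-sample frame at 16 kHz and cuts a signal into frames. The function names are hypothetical, and the framing policy (consecutive, non-overlapping frames, with a trailing partial frame dropped) is an assumption, since the text does not say how frames are positioned.

```python
SAMPLE_RATE_HZ = 16_000  # sampling rate given in the text
FRAME_LENGTH = 300       # samples per frame given in the text


def frame_duration_ms(frame_length=FRAME_LENGTH, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return the duration of one analysis frame in milliseconds."""
    return 1000.0 * frame_length / sample_rate_hz


def split_into_frames(samples, frame_length=FRAME_LENGTH):
    """Cut a sample sequence into consecutive, non-overlapping frames.

    Assumption: no overlap between frames, and a trailing partial frame
    is dropped. The source does not specify either choice.
    """
    n_frames = len(samples) // frame_length
    return [samples[i * frame_length:(i + 1) * frame_length]
            for i in range(n_frames)]


if __name__ == "__main__":
    one_second = [0] * SAMPLE_RATE_HZ          # one second of silence
    print(frame_duration_ms())                 # frame length in ms
    print(len(split_into_frames(one_second)))  # whole frames in one second
```

Under these assumptions each frame spans 18.75 ms, so one second of speech yields 53 whole frames.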
In the training phase, the sampled speech signal from the TIMIT database is applied to the LPC cepstrum block, which produces cepstrum coefficients. A VQ codebook is generated by grouping cepstrum coefficients from many speakers into basic speech units, using the binary split codebook generation algorithm [17]; there are 256 such units, i.e., the size of the codebook. Each basic speech unit is represented by a cepstrum vector of dimension 9. The next step of the training phase is to train the RNFS. Sampled speech data from TIMIT is used again, but only from the files that contain the words to be recognized. This time, the LPC cepstrum coefficients are applied directly to the VQ index generation block.
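The codebook generation step can be illustrated with a generic binary-splitting procedure in the LBG style. This is only a sketch and not necessarily the exact algorithm of [17]: the additive perturbation `epsilon`, the fixed number of refinement passes, the squared Euclidean distance, and all function names are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(vector, codebook):
    """Index of the codeword closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def refine(vectors, codebook, iterations):
    """Alternate nearest-codeword assignment and centroid update."""
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in vectors:
            cells[nearest_index(v, codebook)].append(v)
        # An empty cell keeps its previous codeword.
        codebook = [centroid(cell) if cell else codeword
                    for cell, codeword in zip(cells, codebook)]
    return codebook


def binary_split_codebook(vectors, size, epsilon=0.01, iterations=10):
    """Grow a codebook from 1 to `size` codewords by repeated splitting.

    `size` must be a power of two (256 in the text). `epsilon` and
    `iterations` are illustrative assumptions, not values from the source.
    """
    if size < 1 or size & (size - 1):
        raise ValueError("size must be a power of two")
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        split = []
        for codeword in codebook:
            # Replace each codeword by two slightly perturbed copies.
            split.append([c + epsilon for c in codeword])
            split.append([c - epsilon for c in codeword])
        codebook = refine(vectors, split, iterations)
    return codebook


if __name__ == "__main__":
    # Toy one-dimensional data with four obvious groups.
    data = [[-1.0], [1.0], [9.0], [11.0], [19.0], [21.0], [29.0], [31.0]]
    print(sorted(binary_split_codebook(data, 4)))
```

With `size=256` and 9-dimensional cepstrum vectors as input, the same routine would produce a codebook of the shape described in the text.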
The VQ index generator compares the input coded speech (i.e., cepstrum coefficients) with the contents (i.e., basic speech units) of the VQ codebook produced by the VQ generation block. It determines the basic speech unit in the codebook that most closely matches the input coded speech and outputs the corresponding address, known as the VQ codebook index. The VQ indices for the desired words, collected from many speakers, are then used to train the RNFS.
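The index generation step amounts to a nearest-codeword search, which can be sketched as follows. The text does not say how the "closest match" is measured, so the squared Euclidean distance is an assumption, and the function names and the tiny two-dimensional codebook are purely illustrative (the system described uses 256 codewords of dimension 9).

```python
def squared_distance(a, b):
    """Squared Euclidean distance (assumed closeness measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_index(vector, codebook):
    """Return the address of the codeword that best matches `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def vq_indices(frames, codebook):
    """Map a sequence of per-frame vectors to a sequence of VQ indices."""
    return [vq_index(frame, codebook) for frame in frames]


if __name__ == "__main__":
    # Hypothetical toy codebook with three codewords.
    toy_codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
    toy_frames = [[0.1, -0.2], [3.5, 0.4], [0.9, 1.2]]
    print(vq_indices(toy_frames, toy_codebook))  # [0, 2, 1]
```

The resulting index sequence, one index per frame, is what would be passed on to the RNFS during training.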
Figure 13 Isolated word recognition using Recurrent Fuzzy Logic.
In the recognition phase, sampled speech data from the TIMIT database or from a microphone is applied to the LPC cepstrum generator, whose output is used by the VQ index generation block. The VQ indices generated for a word are applied to the RFL classifier, which then recognizes the input word. There is one properly labeled RFL for each word, and the RFL that provides the maximum output (or lowest error) indicates the recognized word. Note that once training is complete, the RNFS automatically generates the recurrent fuzzy logic rules and membership functions that the RFL uses in the recognition phase.
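The decision rule in the recognition phase (one labeled RFL per word, with the maximum output winning) can be sketched as below. The recurrent fuzzy logic itself is not reproduced here: each classifier is a hypothetical stand-in callable that maps a VQ index sequence to a score, and the word labels and scoring functions are invented for illustration only.

```python
def recognize(vq_index_sequence, classifiers):
    """Pick the word whose classifier gives the largest output.

    `classifiers` maps a word label to a callable that scores a VQ
    index sequence. The callables are stand-ins for trained RFLs.
    """
    scores = {word: classify(vq_index_sequence)
              for word, classify in classifiers.items()}
    best_word = max(scores, key=scores.get)
    return best_word, scores


def make_toy_classifier(preferred_indices):
    """Hypothetical scorer: fraction of indices in a preferred set."""
    def classify(sequence):
        return sum(i in preferred_indices for i in sequence) / len(sequence)
    return classify


if __name__ == "__main__":
    toy_classifiers = {
        "word_a": make_toy_classifier({1, 2}),
        "word_b": make_toy_classifier({7, 8}),
    }
    word, scores = recognize([1, 2, 2, 7], toy_classifiers)
    print(word, scores)
```

If the classifiers reported an error instead of an output strength, the same structure would apply with `min` in place of `max`.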
The need to solve highly nonlinear, time variant problems has been growing rapidly, as many of today's applications have nonlinear and uncertain behavior that changes with time. Conventional mathematical model based techniques can effectively address linear, time invariant problems. However, their ability to address more complex nonlinear, time variant problems is limited. Currently, no model based method exists that can effectively address complex, nonlinear, and time variant problems in a general way. These problems, coupled with others (such as problems in decision making, prediction, etc.), have inspired a growing interest in intelligent techniques including Fuzzy Logic, Neural Networks, Genetic Algorithms, Expert Systems, and Probabilistic Reasoning. Intelligent Systems, in general, use various combinations of these techniques to address complex real world problems. In this chapter, we have addressed intelligent systems based on various combinations of neural nets and fuzzy logic, called Neural Fuzzy Systems (NFS). The rationale for combining fuzzy logic with neural nets is to alleviate the limitations of each technology while adding their advantages. We have presented elegant algorithms to combine neural nets with fuzzy logic, resulting in both feed forward and recurrent neural fuzzy systems. NFS provide several key advantages and features, which were highlighted and discussed. Because of these added features, NFS can address problems in a wide range of application areas including control, battery charging, handwriting recognition, speech recognition, language translation, decision making, and forecasting.
NFS techniques have been applied to many real world problems, of which we reported a few: motor control, toaster control, and speech recognition. For the motor control application, NFS reduced overshoot and settling time at start-up while maintaining approximately the same rise time. Both the PID and NFS controllers produced zero steady state error. However, when the load changed, NFS produced considerably less error than the PID approach. For the toaster control problem, NFS essentially solved the key problem of maintaining the desired darkness level despite variations in moisture content, type of bread, size of bread, and initial temperature. For speech recognition, which is a more complex problem, the performance of the recurrent NFS was found to be comparable to that of conventional approaches. The application of NFS to speech recognition is relatively new, and we believe that NFS will significantly help improve speech recognition performance in the future.