
Voice Recognition Car with Webcam

ABSTRACT

Creating a car controlled by the human voice is an innovative concept. In this seminar topic we use speech recognition algorithms that act on the user's commands. A switching concept is used initially: the remote is provided with a button, and the speech recognition process starts when that button is pressed. When the user then gives the command to open a window, the speech recognition system processes it and the respective window opens. Other commands are processed in the same way.

INTRODUCTION

In this seminar topic we introduce a new concept of voice recognition in a car, based on a speech recognition algorithm. It draws on the electrical and mechanical domains, and digital image processing is also used. Voice recognition is coming to remote controls and car navigation systems. The user gives commands through a microphone installed in the car's remote control. The signal arrives in analogue form and needs to be converted into digital form. The car is equipped with a large database holding the vocabulary, which is composed of all the keywords used for commanding the car. The system runs on a full computer system; the size of a voice-recognition program’s effective vocabulary is directly related to the random access memory capacity of the computer on which it is installed. The car is also fitted with a display that shows all the available commands and instructions, to make the system user friendly. If the user gives an incorrect command, the display shows an error message along with the most closely related commands available in the system vocabulary. Automatic Speech Recognition (ASR) is a model of voice recognition designed for dictation, and this model is installed in the car. Our concept is based entirely on artificial intelligence and robotics.

The paper is organised as follows: Section 2 gives general information about the system, Section 3 describes how the system works, Section 4 describes how speech recognition works, Section 5 presents the transformation of PCM digital audio, Section 6 describes which phonemes are spoken, Section 7 describes how to reduce computation and increase accuracy, Section 8 presents context-free grammars, Sections 9 and 10 describe continuous dictation and adaptation respectively, and Section 11 gives the conclusion.
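The error handling described above, where the display suggests the closest known commands after an incorrect one, can be sketched in a few lines. This is only an illustration under stated assumptions: the command list `VOCABULARY` and the function name `suggest_commands` are hypothetical, and the comparison is done on recognized text with Python's standard `difflib` module rather than on audio.

```python
import difflib

# Hypothetical command vocabulary; the source does not list the real keywords.
VOCABULARY = ["open window", "close window", "play music", "lock steering"]


def suggest_commands(heard, vocabulary=VOCABULARY, limit=3):
    """Return [heard] if it is a known command, else the closest known commands.

    An empty list means nothing in the vocabulary is similar enough,
    in which case the display would show only the error message.
    """
    if heard in vocabulary:
        return [heard]
    # get_close_matches ranks candidates by string similarity, best first.
    return difflib.get_close_matches(heard, vocabulary, n=limit, cutoff=0.5)
```

Calling `suggest_commands("open windw")` returns a list whose first entry is "open window", which the display could offer to the user as the most related command.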

GENERAL INFORMATION

When the voice recognition system is used in conjunction with the Multi Function Steering Wheel (available on many recent models), you can also operate all principal functions and accessories. You can access phone functions, including recalling stored numbers and dialing, operate Navigation System functions, or take notes through the built-in memo function. Currently, the speaker-independent vocabulary includes around 30 words, including numbers and commands. Spoken sequences of commands of up to five words and columns of numbers can be recognized with a high degree of accuracy[3]. You can create a telephone book with up to 40 numbers. Dialing is then simply a matter of speaking a name. Other normal telephone functions, such as repeat dialing and call hang-up, are also voice activated.

HOW IT WORKS

Voice recognition uses a neural net to “learn” to recognize your voice. As you speak, the voice recognition software remembers the way you say each word. This customization makes voice recognition possible even though everyone speaks with a different accent and inflection. The voice commands you use in your car are chosen from a fixed vocabulary and are passed on to the car telephone or navigation system via the telephone interface. The system gives acoustic feedback on everything it recognizes. The system requires no lengthy voice recognition protocol and responds to a simple series of set voice commands that are not sensitive to the accent or dialect of the speaker. The voice control is a finite speech dialog system, which follows a predefined structure. Faulty operation or recognition errors can easily be corrected by simply repeating the desired command. The voice recognizer is resistant to stationary environmental noise.

[Figure: Block Diagram]

HOW SPEECH RECOGNITION WORKS

You might have already used speech recognition in products, and maybe even incorporated it into your own application, but you still don’t know how it works. This document will give you a technical overview of speech recognition so you can understand how it works, and better understand some of the capabilities and limitations of the technology.

Speech recognition fundamentally functions as a pipeline that converts PCM (Pulse Code Modulation) digital audio from a sound card into recognized speech. The elements of the pipeline are:

  1. Transform the PCM digital audio into a better acoustic representation
  2. Apply a “grammar” so the speech recognizer knows what phonemes to expect. A grammar could be anything from a context-free grammar to full-blown English.
  3. Figure out which phonemes are spoken.
  4. Convert the phonemes into words.

I’ll cover each of these steps individually.

TRANSFORM THE PCM DIGITAL AUDIO

The first element of the pipeline converts digital audio coming from the sound card into a format that’s more representative of what a person hears. The digital audio is a stream of amplitudes, sampled about 16,000 times per second. If you visualize the incoming data, it looks just like the output of an oscilloscope: a wavy line that periodically repeats while the user is speaking. In this form, the data isn’t useful to speech recognition because it’s too difficult to identify any patterns that correlate to what was actually said.

To make pattern recognition easier, the PCM digital audio is transformed into the “frequency domain” using a windowed fast Fourier transform.[6] The output is similar to what a spectrograph produces. In the frequency domain, you can identify the frequency components of a sound, and from those components it’s possible to approximate how the human ear perceives the sound.

The fast Fourier transform analyzes every 1/100th of a second of audio and converts it into the frequency domain. Each 1/100th of a second results in a graph of the amplitudes of the frequency components, describing the sound heard during that 1/100th of a second. The speech recognizer has a database of several thousand such graphs (called a codebook) that identify the different types of sounds the human voice can make. The sound is “identified” by matching it to its closest entry in the codebook, producing a number that describes the sound. This number is called the “feature number.” (Actually, several feature numbers are generated for every 1/100th of a second, but the process is easier to explain assuming only one.)

The input to the speech recognizer began as a stream of 16,000 PCM values per second. By using fast Fourier transforms and the codebook, it is boiled down to the essential information: 100 feature numbers per second.
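The steps above (frame the audio, window it, move to the frequency domain, look up the nearest codebook entry) can be sketched as follows. This is a minimal illustration, not a real recognizer: it uses a plain discrete Fourier transform in place of an FFT to avoid external libraries, a Hann window as one common choice of window, and a two-entry toy codebook instead of several thousand entries. The names `tone`, `magnitude_spectrum` and `feature_numbers` are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 16_000             # PCM samples per second, as in the text
FRAME_LEN = SAMPLE_RATE // 100   # 160 samples = 1/100th of a second


def tone(freq_hz, n_samples, start=0):
    """Generate a pure sine tone as a list of PCM-like amplitudes (test input)."""
    return [math.sin(2 * math.pi * freq_hz * (start + t) / SAMPLE_RATE)
            for t in range(n_samples)]


def magnitude_spectrum(frame, bins=16):
    """Hann-window one frame and return DFT magnitudes for the first `bins` bins.

    With 160-sample frames at 16 kHz, bin k corresponds to k * 100 Hz.
    A plain DFT is used here for simplicity; an FFT gives the same values faster.
    """
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1)))
                for t, s in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(windowed)))
            for k in range(bins)]


def feature_numbers(pcm, codebook, bins=16):
    """Return one codebook index ("feature number") per 1/100th-second frame."""
    features = []
    for start in range(0, len(pcm) - FRAME_LEN + 1, FRAME_LEN):
        spectrum = magnitude_spectrum(pcm[start:start + FRAME_LEN], bins)
        # Pick the codebook entry with the smallest squared Euclidean distance.
        nearest = min(
            range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(spectrum, codebook[i])),
        )
        features.append(nearest)
    return features


# Toy codebook: entry 0 is a 300 Hz tone, entry 1 is a 1000 Hz tone.
codebook = [magnitude_spectrum(tone(300, FRAME_LEN)),
            magnitude_spectrum(tone(1000, FRAME_LEN))]
# One second of audio: half a second of each tone.
pcm = tone(300, 8000) + tone(1000, 8000, start=8000)
features = feature_numbers(pcm, codebook)
```

Here the 16,000 input samples are reduced to 100 feature numbers, matching the ratio described in the text; the first 50 point at codebook entry 0 and the last 50 at entry 1.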

REDUCING COMPUTATION AND INCREASING ACCURACY

The speech recognizer can now identify what phonemes were spoken. Figuring out what words were spoken should be an easy task. If the user spoke the phonemes, “h eh l oe”, then you know they spoke “hello”. The recognizer should only have to do a comparison of all the phonemes against a lexicon of pronunciations. It’s not that simple.

  1. The user might have pronounced “hello” as “h uh l oe”, which might not be in the lexicon.
  2. The recognizer may have made a mistake and recognized “hello” as “h uh l oe”.
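To illustrate why an exact lexicon comparison is not enough, the sketch below looks up a phoneme sequence by nearest match instead. Note that this swaps in a simple technique, edit distance over phoneme lists, purely for illustration; it is not the hypothesis scoring a real recognizer uses. The lexicon entries and their phoneme spellings are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete a phoneme
                            curr[j - 1] + 1,      # insert a phoneme
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


# Hypothetical toy lexicon: word -> phoneme sequence.
LEXICON = {
    "hello": ["h", "eh", "l", "oe"],
    "help": ["h", "eh", "l", "p"],
    "yellow": ["y", "eh", "l", "oe"],
}


def closest_word(phonemes, lexicon=LEXICON):
    """Return (word, distance) for the lexicon entry nearest to `phonemes`."""
    return min(
        ((word, edit_distance(phonemes, pron)) for word, pron in lexicon.items()),
        key=lambda pair: pair[1],
    )
```

With this lexicon, `closest_word(["h", "uh", "l", "oe"])` returns `("hello", 1)`: the mispronounced or misrecognized sequence is one substitution away from “hello” and two away from the other entries. Comparing against every entry this way is expensive for a large vocabulary, which is part of the motivation for the techniques that follow.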

CONTEXT FREE GRAMMARS

One of the techniques to reduce computation and increase accuracy is called a “Context Free Grammar” (CFG). CFGs work by limiting the vocabulary and syntax structure of speech recognition to only those words and sentences that are applicable to the application’s current state.

The speech recognizer gets the phonemes for each word by looking the word up in a lexicon. If the word isn’t in the lexicon, it predicts the pronunciation; see the “How Text-to-Speech Works” document for an explanation of pronunciation prediction. Some words have more than one pronunciation, such as “read”, which can be pronounced like “reed” or “red”. The recognizer basically treats one word with multiple pronunciations the same as two “words”.

CFGs slightly change the hypothesis portion of speech recognition. Rather than hypothesizing the transition to all phonemes, the recognizer merely hypothesizes the transition to the next acceptable phonemes. From the initial “silence” phoneme, the recognizer hypothesizes the “s” in “send”, the “k” in “call”, and the “eh” in “exit”. If the recognizer hypothesizes phoneme transitions from the “s” phoneme, it will only hypothesize “eh”, followed by “n”, “d”, “m”, “ae”, “l”, etc.

You can see how this significantly reduces the computation. Instead of increasing the number of hypotheses by a factor of 50 each time, the number of hypotheses stays constant within a word and only increases a little on word transitions. Given a normal amount of pruning, there are no more than about 10 hypotheses around at a time.[11] When the user has finished speaking, the recognizer returns the hypothesis with the highest score, and the words that the user spoke are returned to the application.
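The narrowing of hypotheses described above can be sketched with a prefix tree built from the phrases the grammar allows. This is a simplified illustration: a real CFG also has rules and nesting, while this sketch only handles a flat list of phrases. The phrase list, the phoneme spellings for “call” and “exit”, and the function names are assumptions.

```python
# Hypothetical grammar: allowed phrase -> phoneme sequence.
GRAMMAR = {
    "send mail": ["s", "eh", "n", "d", "m", "ae", "l"],
    "call": ["k", "aw", "l"],
    "exit": ["eh", "k", "s", "ih", "t"],
}

END = "<end>"  # marker stored where a complete phrase finishes


def build_trie(grammar):
    """Build a nested-dict prefix tree over the phoneme sequences."""
    root = {}
    for phrase, phonemes in grammar.items():
        node = root
        for phoneme in phonemes:
            node = node.setdefault(phoneme, {})
        node[END] = phrase
    return root


def next_phonemes(trie, heard):
    """Return the set of phonemes the grammar accepts after the `heard` prefix."""
    node = trie
    for phoneme in heard:
        if phoneme not in node:
            return set()  # the prefix is not allowed by the grammar
        node = node[phoneme]
    return {p for p in node if p != END}


trie = build_trie(GRAMMAR)
```

From the start, `next_phonemes(trie, [])` yields only “s”, “k” and “eh”, and after “s” the only candidate is “eh”, so the recognizer scores a handful of transitions instead of every phoneme in the language.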

CONTINUOUS DICTATION

Continuous dictation allows the user to speak anything he or she wants out of a large vocabulary. This is more difficult than discrete dictation because the speech recognition engine doesn’t easily know where one word ends and the next begins. For example, say “recognize speech” and “wreck a nice beach” out loud quickly; they sound similar.[15]

Continuous dictation works similarly to discrete dictation, except that the end of a word is not detected by silence. Rather, when a hypothesis reaches the end of a word in continuous dictation, it produces thousands of new hypotheses and then prunes them. The language model probability helps prune the hypotheses down much further in continuous dictation.[12]

Recognizers use many more optimizations to reduce processing and memory use in continuous dictation systems. This article won’t cover them because describing them doesn’t help explain the underlying technology.
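To make the role of the language model concrete, here is a small sketch of pruning competing word hypotheses by combining an acoustic score with a bigram language model score. Everything numeric here is invented for illustration: the bigram probabilities, the fallback value `UNSEEN`, and the acoustic scores are toy values, and the function names are hypothetical.

```python
import math

# Toy bigram probabilities, invented purely for illustration.
BIGRAM = {
    ("<s>", "recognize"): 0.02,
    ("recognize", "speech"): 0.30,
    ("<s>", "wreck"): 0.001,
    ("wreck", "a"): 0.10,
    ("a", "nice"): 0.05,
    ("nice", "beach"): 0.02,
}
UNSEEN = 1e-6  # assumed probability for word pairs missing from the table


def lm_log_prob(words):
    """Sum of log bigram probabilities, starting from the sentence marker <s>."""
    total = 0.0
    previous = "<s>"
    for word in words:
        total += math.log(BIGRAM.get((previous, word), UNSEEN))
        previous = word
    return total


def prune(hypotheses, beam_width):
    """Keep the `beam_width` best hypotheses.

    `hypotheses` is a list of (words, acoustic_log_score) pairs; the combined
    score is the acoustic log score plus the language model log probability.
    """
    scored = [(acoustic + lm_log_prob(words), words) for words, acoustic in hypotheses]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [words for _, words in scored[:beam_width]]
```

If “recognize speech” and “wreck a nice beach” receive the same acoustic score, the language model term decides between them, and with these toy numbers `prune` keeps “recognize speech” first.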

ADAPTATION

Speech recognition systems “adapt” to the user’s voice, vocabulary, and speaking style to improve accuracy. A system that has had enough time to adapt to an individual can have one fourth the error rate of a speaker-independent system. Adaptation works because the speech recognizer is often informed (directly or indirectly) by the user whether its recognition was correct, and if not, what the correct recognition is.

The recognizer can adapt to the speaker’s voice and variations of phoneme pronunciations in a number of ways. First, it can gradually adapt the codebook vectors used to calculate the acoustic feature number. Second, it can adapt the probability that a feature number will appear in a phoneme. Both of these are done by weighted averaging[13].

The language model can also be adapted in a number of ways. The recognizer can learn new words, and slowly increase the probabilities of word sequences so that commonly used word sequences are expected. Both of these techniques are useful for learning names.
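The weighted averaging mentioned above can be sketched as a small update rule that nudges a stored codebook vector toward what the speaker actually produced. This is a minimal sketch: the function name `adapt_vector` and the default weight of 0.1 are assumptions, and the source does not say what weights real systems use.

```python
def adapt_vector(codebook_vector, observed_vector, weight=0.1):
    """Move a codebook vector toward an observed vector by weighted averaging.

    `weight` is the share given to the new observation: 0 keeps the old
    vector unchanged, 1 replaces it with the observation.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return [(1.0 - weight) * old + weight * new
            for old, new in zip(codebook_vector, observed_vector)]
```

A small weight makes adaptation gradual, so a single misrecognized utterance shifts the codebook only slightly, while repeated confirmations from the same speaker pull the vector steadily toward that speaker's voice. The same update can be applied to the probability that a feature number appears in a phoneme.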

CONCLUSION

This seminar topic presentation was a high-level overview of how speech recognition works in cars. Using voice control in automobiles is a complex process because some applications are more complex to install and use. With voice recognition, one can easily open and close the windows. Other possible applications include controlling the music system, operating the power windows, and locking the steering. Voice recognition is an innovative and sensitive concept in the field of automobiles, and it can be made more secure by combining it with fingerprint analysis.


This is Mr. Jose John, a 21-year-old currently pursuing his final year of mechanical engineering, who has become an enthusiastic blogger and a successful entrepreneur.

© 2012 Latest Seminar Topics