From: batltt26
Speech To Sign Language Interpreter System 

Published:  May 05, 2010
 
Slide 1: International Islamic University Malaysia, Kulliyyah of Engineering, ECE Dept. Speech to Sign Language Interpreter System. By: Khalid El-Darymli (G0327887). Supervisor: Dr. Othman O. Khalifa.
Slide 2: OUTLINE. Problem statement. Research goal and objectives. Main parts of our system. The structure of ASR: signal processing (SP); training: AM, dictionary and LM; decoding: the Viterbi beam search. Sign language, ASL and the ASL alphabet. Signed English. Demonstration of ASL in our software. Milestones.
Slide 3: Problem Statement. There is no free software, nor even reasonably priced software, that converts speech into sign language in live mode. Only one commercially available product converts live speech into sign-language video. This product is called iCommunicator, and a deaf person has to pay USD 6,499 to purchase it. Is that fair?
Slide 4: RESEARCH GOAL AND OBJECTIVES. Design and implementation of a Speech to Sign Language Interpreter System. The software is open source and freely available, which in turn will benefit the deaf community. The objectives are: to bridge the gap between deaf and hearing people in two ways, first by using this software for the education of deaf people and second by facilitating communication between deaf and hearing people; to increase the independence and self-confidence of deaf people; to increase opportunities for advancement and success in education, employment, personal relationships and public-access venues; and to improve quality of life.
Slide 5: Main Parts of the Speech to Sign Language Interpreter System: continuous input speech → speech-recognition engine → recognized text → database of pre-recorded ASL video clips → ASL translation.
Slide 6: Automatic Speech Recognition (ASR): input voice → SR engine → recognized text. SR systems are classified along three dimensions: isolated vs. continuous speech, speaker-dependent vs. speaker-independent, and small vs. large vocabulary. The task of our software requires a large-vocabulary, speaker-independent, continuous speech recognizer.
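Slide 7: The Structure of the SR Engine (LVCSR). Input audio → signal processing, which produces the feature sequence $X = \{x_1, x_2, \dots, x_T\}$. Training: the AM $P(A_1, \dots, A_T \mid P_1, \dots, P_k)$, the dictionary $P(P_1, P_2, \dots, P_k \mid W)$ and the LM $P(W_n \mid W_1, \dots, W_{n-1})$. Decoding: each hypothesis $H = \{W_1, W_2, \dots, W_k\}$ is evaluated by $P(X \mid W)\,P(W)$, and the decoder outputs the best hypothesis $W_{\text{BEST}} = \arg\max_W P(X \mid W)\,P(W)$.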
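The decoding rule above can be illustrated with a small sketch that compares complete hypotheses by $\log P(X \mid W) + \log P(W)$. The scores below are invented for illustration; a real decoder never enumerates whole sentences like this, it searches over HMM states (see the Viterbi beam search in slide 15). Log probabilities are used so that the product becomes a sum and long utterances do not underflow.

```python
import math


def best_hypothesis(hypotheses):
    """Return the word sequence W that maximizes P(X | W) * P(W).

    `hypotheses` maps each candidate word sequence (a tuple of words) to a pair
    (acoustic_logprob, lm_logprob) = (log P(X | W), log P(W)).
    Summing log probabilities is equivalent to multiplying the probabilities.
    """
    return max(hypotheses, key=lambda w: hypotheses[w][0] + hypotheses[w][1])


# Hypothetical scores for two competing transcriptions of the same audio X:
# the acoustic model slightly prefers the second one, the LM strongly prefers the first.
scores = {
    ("RECOGNIZE", "SPEECH"): (math.log(1e-40), math.log(1e-4)),
    ("WRECK", "A", "NICE", "BEACH"): (math.log(3e-40), math.log(1e-7)),
}
print(best_hypothesis(scores))  # ('RECOGNIZE', 'SPEECH')
```

The first hypothesis wins because $10^{-40} \cdot 10^{-4} = 10^{-44}$ is larger than $3 \cdot 10^{-40} \cdot 10^{-7} = 3 \cdot 10^{-47}$.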
Slide 8: SIGNAL PROCESSING (FRONT-END): MFCC computation.
Input: speech samples $x[n]$, 16-bit integer data. MFCC is used to extract spectral features, and first- and second-order differences may be used to capture the dynamic evolution of the signal.
Pre-emphasis: $y[n] = x[n] - \alpha\, x[n-1]$, where $\alpha$ is the pre-emphasis parameter.
Framing and windowing: $y'_t[n] \equiv y[n + tQ]$, $0 \le n \le N-1$, $1 \le t \le T$, where $Q$ is the frame shift in samples. Each frame is windowed, $y_t[n] \equiv w[n]\, y'_t[n]$, with the Hamming window $w[n] = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right)$ for $n = 0, \dots, N-1$ and $w[n] = 0$ otherwise. Typical frame duration in speech recognition is 10 ms, while typical window duration is 25 ms.
DFT: the short-time Fourier transform of each windowed frame is $Y_t(e^{j\omega}) = \sum_{n=-\infty}^{\infty} y_t[n]\, e^{-j\omega n}$. It is evaluated only for a discrete set of frequencies $\omega = 2\pi k/N$, which gives the DFT of every frame: $Y_t[k] = Y_t(e^{j2\pi k/N})$, $k = 0, \dots, N-1$. To reduce computational complexity, the DFT is performed with the FFT.
Power spectrum: the phase information of the DFT is discarded, and the power spectrum is $S_t[k] = (\operatorname{Re} Y_t[k])^2 + (\operatorname{Im} Y_t[k])^2$.
Mel filterbank: the mel spectrum is obtained by integrating the power spectrum through triangular mel-weighting filters, which gives the energy in defined frequency ranges: $S_t[m] = \sum_{k=0}^{N-1} S_t[k]\, H_m[k]$, $m = 0, 1, \dots, M-1$, with
$H_m[k] = 0$ for $k < f[m-1]$,
$H_m[k] = \dfrac{2\,(k - f[m-1])}{(f[m+1] - f[m-1])\,(f[m] - f[m-1])}$ for $f[m-1] \le k \le f[m]$,
$H_m[k] = \dfrac{2\,(f[m+1] - k)}{(f[m+1] - f[m-1])\,(f[m+1] - f[m])}$ for $f[m] \le k \le f[m+1]$,
$H_m[k] = 0$ for $k > f[m+1]$,
where $k$ is the DFT-domain index, $N$ is the length of the DFT, $M$ is the total number of triangular mel-weighting filters, and $f[m]$ are the filter boundary points in DFT bins.
MFCC computation: the MFCC is the real cepstrum of the log mel spectrum, computed with an inverse (cosine) transform: $c_t[n] = \sum_{m=0}^{M-1} \ln\{S_t[m]\}\cos\left(\frac{\pi n\,(m - 1/2)}{M}\right)$, $n = 0, \dots, M-1$. Typically, only the first 13 coefficients are used for speech recognition.
Delta computation: the first-order delta is $\Delta c_t = c_{t+2} - c_{t-2}$ and the second-order delta is $\Delta\Delta c_t = \Delta c_{t+1} - \Delta c_{t-1}$.
Final output of this stage: 39 features per processed frame (13 $c_t[n]$, 13 $\Delta c_t[n]$ and 13 $\Delta\Delta c_t[n]$), i.e., one observation vector $x_t$ per frame.
Slide 9: Explanatory example (figure): speech waveform of the phoneme /ae/; the same frame after pre-emphasis and Hamming windowing; its power spectrum; its MFCC.
Slide 10: TRAINING. Training is the process of learning the AM, the dictionary and the LM.
Acoustic Model (AM), $P(A_1, \dots, A_T \mid P_1, \dots, P_k)$: the AM provides a mapping between a unit of speech and an HMM that can be scored against the incoming features provided by the front-end. It contains a pool of hidden Markov models (HMMs). For large vocabularies, each word is represented as a sequence of phonemes, so there has to be an acoustic model for each phoneme. Moreover, the model has to depend on context (e.g., co-articulation), and the context dependence may even cross word boundaries. Phones are therefore further refined into context-dependent triphones, i.e., phones occurring in given left and right phonetic contexts.
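The triphone idea can be sketched by expanding a word sequence into context-dependent units. This is an illustration only: the `L-P+R` notation (left context, phone, right context) is the common HTK-style convention rather than necessarily the one used inside Sphinx, and padding the utterance with a `SIL` context is an assumption. Because the phones are concatenated across word boundaries, the last phone of one word takes the first phone of the next word as its right context (a cross-word triphone).

```python
def to_triphones(words, lexicon, boundary="SIL"):
    """Expand a word sequence into context-dependent triphones (L-P+R notation).

    Phones are concatenated across word boundaries, so a word-final phone gets
    the first phone of the next word as its right context (cross-word triphone).
    """
    phones = [p for w in words for p in lexicon[w]]
    padded = [boundary] + phones + [boundary]
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}" for i in range(1, len(padded) - 1)]


# Pronunciations taken from the dictionary examples in slide 12.
lexicon = {"ZERO": ["Z", "IH", "R", "O"], "EIGHT": ["EY", "TD"]}
print(to_triphones(["ZERO", "EIGHT"], lexicon))
# ['SIL-Z+IH', 'Z-IH+R', 'IH-R+O', 'R-O+EY', 'O-EY+TD', 'EY-TD+SIL']
```

The unit `R-O+EY` is the cross-word case: its right context comes from EIGHT, not from ZERO.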
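Slide 11: HMMs. (Figure: a chain of states S0 → S1 → S2 → S3, where S0, S1 and S2 have self-transition probabilities a00, a11, a22 and output distributions b0(k), b1(k), b2(k).)
An HMM is defined by the model parameters $\Phi = (A, B, \pi)$: the state-transition probabilities $A$, the output probability distributions $B$ and the initial-state probabilities $\pi$. For each acoustic segment (state $i$), there is a probability distribution over the acoustic observations, $b_i(k)$. The leading technique is to represent this distribution as a mixture of Gaussians, or Gaussian mixture (GM) for short.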
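The Gaussian-mixture output distribution can be sketched as follows: $b_i(x) = \sum_k c_k\, \mathcal{N}(x; \mu_k, \Sigma_k)$, evaluated in the log domain with the log-sum-exp trick. Diagonal covariance matrices and the function names are assumptions made for this illustration; the observation $x$ is a continuous feature vector such as the 39-dimensional MFCC vector of slide 8.

```python
import math


def log_gaussian_diag(x, mean, var):
    # log N(x; mean, diag(var)) for a single Gaussian component
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - mu) ** 2 / v
                      for xi, mu, v in zip(x, mean, var))


def log_b(x, weights, means, variances):
    """log b_i(x) for a state whose output distribution is a Gaussian mixture,
    b_i(x) = sum_k c_k N(x; mu_k, Sigma_k), computed with log-sum-exp for stability."""
    terms = [math.log(c) + log_gaussian_diag(x, mu, var)
             for c, mu, var in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# A one-dimensional, single-component "mixture" reduces to a standard normal density.
print(log_b([0.0], [1.0], [[0.0]], [[1.0]]))  # equals -0.5 * log(2*pi)
```

Subtracting the largest term before exponentiating keeps the sum from underflowing when individual component likelihoods are tiny, which is the usual situation with high-dimensional features.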
Slide 12: Dictionary, $P(P_1, P_2, \dots, P_k \mid W)$. The dictionary is a file that contains pronunciations for all the words of interest to the decoder. For large-vocabulary speech recognizers, pronunciations are specified as a linear sequence of phonemes.
Some digit pronunciations:
ZERO → Z IH R O
EIGHT → EY TD
Multiple pronunciations:
ACTUALLY → AE K CH AX W AX L IY
ACTUALLY (2nd) → AE K SH AX L IY
ACTUALLY (3rd) → AE K SH L IY
Compound words:
WANT_TO → W AA N AX
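A pronunciation dictionary of this kind can be loaded with a few lines of code. The file layout is an assumption for this sketch: one entry per line, the word followed by its phones, with alternative pronunciations marked `WORD(2)`, `WORD(3)` as in the CMU dictionary convention. All pronunciations of a word are grouped under its base form.

```python
import re
from collections import defaultdict


def load_dictionary(lines):
    """Parse entries of the form 'WORD  PH1 PH2 ...' into {word: [pronunciations]}.

    Alternative pronunciations are assumed to be marked 'WORD(2)', 'WORD(3)', ...
    and are grouped under the base word.
    """
    prons = defaultdict(list)
    for line in lines:
        parts = line.split()
        if not parts:
            continue                                  # skip blank lines
        word = re.sub(r"\(\d+\)$", "", parts[0])      # ACTUALLY(2) -> ACTUALLY
        prons[word].append(parts[1:])
    return dict(prons)


entries = """ZERO        Z IH R O
EIGHT       EY TD
ACTUALLY    AE K CH AX W AX L IY
ACTUALLY(2) AE K SH AX L IY
ACTUALLY(3) AE K SH L IY
WANT_TO     W AA N AX""".splitlines()

lex = load_dictionary(entries)
print(len(lex["ACTUALLY"]))  # 3
```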
Slide 13: Language Model (LM), $P(W_n \mid W_1, \dots, W_{n-1})$. A statistical LM is used, since the speaker could be talking about any topic. The main model is n-gram statistics, in particular the trigram (n = 3), $P(W_t \mid W_{t-1}, W_{t-2})$. Bigram and unigram LMs have to be employed as well, to cover word sequences for which trigram statistics are missing.
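One simple way to combine trigram, bigram and unigram statistics is linear interpolation of their maximum-likelihood estimates, sketched below. Note that this is a swapped-in technique: Sphinx language models are typically back-off n-gram models, and the interpolation weights (0.6, 0.3, 0.1), the `<s>`/`</s>` sentence markers and the toy corpus are assumptions for illustration only.

```python
from collections import Counter


class InterpolatedTrigramLM:
    """P(w_t | w_{t-2}, w_{t-1}) as a linear interpolation of maximum-likelihood
    trigram, bigram and unigram estimates. The weights are illustrative, not tuned."""

    def __init__(self, sentences, weights=(0.6, 0.3, 0.1)):
        self.l3, self.l2, self.l1 = weights
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.bi_hist, self.tri_hist = Counter(), Counter()
        for s in sentences:
            words = ["<s>", "<s>"] + s.split() + ["</s>"]
            for i in range(2, len(words)):
                u, v, w = words[i - 2], words[i - 1], words[i]
                self.uni[w] += 1
                self.bi[(v, w)] += 1
                self.bi_hist[v] += 1
                self.tri[(u, v, w)] += 1
                self.tri_hist[(u, v)] += 1
        self.total = sum(self.uni.values())

    def prob(self, w, u, v):
        """P(w | u, v), where u = w_{t-2} and v = w_{t-1}."""
        h3 = self.tri_hist[(u, v)]
        p3 = self.tri[(u, v, w)] / h3 if h3 else 0.0
        h2 = self.bi_hist[v]
        p2 = self.bi[(v, w)] / h2 if h2 else 0.0
        p1 = self.uni[w] / self.total
        return self.l3 * p3 + self.l2 * p2 + self.l1 * p1


lm = InterpolatedTrigramLM(["I HAVE A LOT", "YOU HAVE A LOT", "YOU HAVE IT"])
print(round(lm.prob("A", "YOU", "HAVE"), 3))  # 0.514
```

The unseen trigram "I HAVE IT" still receives a nonzero probability through its bigram and unigram terms, which is exactly why the lower-order models are needed.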
Slide 14: RECOGNITION. Given an input speech utterance, the goal is to uncover the best hidden state sequence. Let $S = (s_1, s_2, \dots, s_T)$ be the sequence of recognized states and $x_t$ the feature vector computed at time $t$, so that the feature sequence from time 1 to $t$ is $X = (x_1, x_2, \dots, x_t)$. The best state sequence is then $S^* = \arg\max_S P(S, X \mid \Phi)$. (Figure: at each time $t$, the search algorithm combines the feature $x_t$, the static structure $\Phi$ and the dynamic structure $\{s_{t-1}\}$ by evaluating $P(x_t, \{s_t\} \mid \{s_{t-1}\}, \Phi)$, and outputs $s_t^*$.)
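Slide 15: The Viterbi Beam Search.
Initialization: for $1 \le i \le N$: $V_1(i) = \pi_i\, b_i(X_1)$; go to XX.
Recursive step: for $2 \le t \le T$ { for $1 \le k \le N$: $V_t(k) = \max_j [V_{t-1}(j)\, a_{jk}]\, b_k(X_t)$, where $j$ ranges over the memorized paths; go to XX }.
Backtracking: $s_T^* = \arg\max_i V_T(i)$; for $t = T-1, T-2, \dots, 1$: $s_t^* = \arg\max_j [V_t(j)\, a_{j s_{t+1}^*}]$. $S^* = (s_1^*, s_2^*, \dots, s_T^*)$ is the best sequence.
XX (beam pruning): find $p_t(s_t^*) = \max_{1 \le i \le N} V_t(i)$ and calculate the threshold $\vartheta_b = p_t(s_t^*) \cdot b$, where $b$ is the beam factor. For $1 \le j \le N$ { if $p_t(s_t = j) = V_t(j) \ge \vartheta_b$, memorize both $V_t(j)$ and path "j"; else discard $V_t(j)$ }. Return.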
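The algorithm above can be sketched for a small discrete HMM. This is an illustration, not the Sphinx decoder: the emission probabilities are passed in as a precomputed table `B[t][k]` $= b_k(X_t)$, states and times are indexed from 0, and everything is computed in the log domain, so the threshold $b \cdot \max_i V_t(i)$ becomes $\log b + \max_i \log V_t(i)$. The two-state example model is invented for the usage line.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    # log(0) is treated as -infinity (an impossible transition or output)
    return math.log(p) if p > 0 else NEG_INF


def viterbi_beam(pi, A, B, beam=1e-3):
    """Viterbi beam search over an HMM with N states.

    pi[i]   : initial probability of state i
    A[j][k] : transition probability from state j to state k
    B[t][k] : output probability b_k(X_t) of observation t in state k
    beam    : beam factor b in (0, 1]; states with V_t(j) < b * max_i V_t(i) are discarded
    Returns the best state sequence S* as a list of 0-based state indices.
    """
    N, T = len(pi), len(B)
    log_beam = math.log(beam)

    def prune(V):
        # Step XX: keep only states whose score lies within the beam of the best one.
        threshold = max(V) + log_beam
        return [v if v >= threshold else NEG_INF for v in V]

    V = prune([_log(pi[i]) + _log(B[0][i]) for i in range(N)])   # initialization
    backpointers = []                                            # memorized paths
    for t in range(1, T):                                         # recursive step
        scores, ptr = [], []
        for k in range(N):
            j = max(range(N), key=lambda i: V[i] + _log(A[i][k]))
            scores.append(V[j] + _log(A[j][k]) + _log(B[t][k]))
            ptr.append(j)
        V = prune(scores)
        backpointers.append(ptr)
    # Backtracking: start from the best final state and follow the memorized paths.
    path = [max(range(N), key=lambda i: V[i])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical 2-state HMM and the output probabilities of 3 observations.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.1], [0.4, 0.3], [0.1, 0.6]]
print(viterbi_beam(pi, A, B))  # [0, 0, 1]
```

A smaller beam factor keeps more paths alive and approaches exact Viterbi; $b = 1$ keeps only the single best state per frame, which is fast but can miss the best path on harder inputs.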
Slide 16: SIGN LANGUAGE. A sign language is a communication system that uses gestures interpreted visually. As a whole, sign languages share the same visual-gestural modality, the sign, but they differ from country to country.
Slide 17: AMERICAN SIGN LANGUAGE (ASL). ASL is the dominant sign language in the US, anglophone Canada and parts of Mexico. Currently, approximately 450,000 deaf people in the United States use ASL as their primary language. ASL signs follow a certain order, just as words do in spoken English. However, in ASL one sign can express a meaning that would require several words in speech. The grammar of ASL uses spatial locations, motion and context to indicate syntax.
Slide 18: ASL ALPHABET. The manual alphabet represents all the letters of the English alphabet using only the hands. Making words with a manual alphabet is called fingerspelling. Manual alphabets are part of sign languages; for ASL, the one-handed manual alphabet is used. Fingerspelling complements the ASL vocabulary when spelling out the individual letters of a word is the preferred or only option, such as with proper names or the titles of works. (Chart: the handshapes for the letters Aa to Zz.)
Slide 19: SIGNED ENGLISH (SE). SE is a reasonable manual parallel to English. The idea behind SE and other signing systems parallel to English is that deaf people will learn English better if they are exposed, visually through signs, to the grammatical features of English. SE uses two kinds of gestures: sign words and sign markers. Each sign word stands for a separate entry in a standard English dictionary. The sign words are signed in the same order as the words appear in an English sentence, and they are presented in singular, non-past form. Sign markers are added to these basic signs to show, for example, that you are talking about more than one thing or that something happened in the past. When no sign represents the intended word, the manual alphabet can be used to fingerspell it. Most signs in SE are taken from American Sign Language, but in SE they are used in the same order as English words and with the same meaning.
Slide 20: ASL vs. SE (an example). English sentence: "It is all right if you have a lot." ASL translation: shown as signs on the original slide. SE translation (word for word, in English order): IT IS ALL RIGHT IF YOU HAVE A LOT.
Slide 21: DEMONSTRATION OF ASL IN OUR SOFTWARE.
1. Input: a recognized word (the SR engine's output).
2. If it is a non-basic word, extract the basic word from it.
3. Is the basic word within the vocabulary of the ASL database (2,600 pre-recorded ASL video clips)?
   Yes: output the equivalent ASL video clip of the word; only in the case of a non-basic input word, append a suitable sign marker.
   No (none of the database contents matched the input basic word): fingerspell the original input word using the American manual alphabet.
4. Final output.
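The word-level decision flow above can be sketched as a small lookup function. Everything concrete here is hypothetical: the database contents and clip names, the marker names (`MARKER_PLURAL` and so on) and the suffix rules used to extract the basic word. A real system would need a proper morphological analyser (naive stripping turns LIKED into LIK, for example). The word is looked up as-is first, so a basic word that happens to end in -S, such as BUS, is not stripped.

```python
# Hypothetical suffix rules: (suffix, sign marker). They only illustrate the control flow.
SUFFIX_MARKERS = [("ING", "MARKER_ING"), ("ED", "MARKER_PAST"), ("S", "MARKER_PLURAL")]


def basic_form(word):
    """Return (basic_word, marker) for an upper-case word; marker is None if no rule applies."""
    for suffix, marker in SUFFIX_MARKERS:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)], marker
    return word, None


def word_to_clips(word, asl_db):
    """Map one recognized word to the list of clips to play.

    asl_db maps basic words to pre-recorded ASL clip names (hypothetical names).
    """
    word = word.upper()
    if word in asl_db:                       # already a basic word in the database
        return [asl_db[word]]
    base, marker = basic_form(word)
    if marker is not None and base in asl_db:
        return [asl_db[base], marker]        # basic sign plus a sign marker
    # No match in the database: fingerspell the original input word letter by letter.
    return [f"FINGERSPELL_{ch}" for ch in word if ch.isalpha()]


db = {"CAT": "cat.mp4", "HAVE": "have.mp4", "BUS": "bus.mp4"}
print(word_to_clips("cats", db))   # ['cat.mp4', 'MARKER_PLURAL']
print(word_to_clips("Khalid", db)) # fingerspelled: K, H, A, L, I, D
```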
Slide 22: Speech to Sign Language Interpreter System: MILESTONES.
Thesis writing outline and progress (% drafted):
Chapter 1: Introduction
Chapter 2: State of the Art of SR
Chapter 3: Sphinx SR
Chapter 4: Sphinx Decoder
Chapter 5: Sign Language
Chapter 6: SW Demo, Conclusions and Further Work
Appendices
SW development and progress (% completed):
SR engine
ASL database
Overall integrated SW
Slide 23: Thank you. Your questions are most welcome.

   