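To help with spoken English practice, I spent a day putting together some code that automatically generates the phonetic symbols and a three-line grid for a given article. This post briefly describes how it was implemented. It draws on web crawling, Python, and JavaScript. I wrote the post in English to practice my English writing.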
To make practicing spoken English easier, I developed the Phonetic Symbol Generator. This article describes how I developed it; you can download it HERE.
How to use it
- Open “phonetic-symbol-generator.html” in a browser.
- Enter your article in the box and click “Convert”. When the phonetic symbols appear, you can click one to change its vertical position.
- If a phonetic symbol is incorrect, you can click the word above the line to modify it.
- Enjoy it!
How I developed it
- Collect the most frequently used words.
Category | Source |
---|---|
Basic words | words required in primary school/middle school/high school/CET-4/CET-6/TOEFL/IELTS/GRE |
Special forms | plurals (e.g. phenomenon-phenomena), past tense and past participle (e.g. give-gave-given), comparative and superlative (e.g. bad-worse-worst) |
Name | frequently used names |
Place | popular cities |
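The collection step can be sketched as a simple merge of the four categories into one list. This is only an illustration: the function name `merge_word_lists` and the sample words are hypothetical, and the real lists would be read from wherever the words were gathered.

```python
def merge_word_lists(*lists):
    """Merge several word lists, dropping blanks and duplicates, keeping order."""
    seen = set()
    merged = []
    for words in lists:
        for raw in words:
            word = raw.strip()  # entries read from a file often end with "\n"
            if word and word not in seen:
                seen.add(word)
                merged.append(word)
    return merged


# Illustrative entries only, one small list per category.
basic = ["give\n", "bad\n"]
special = ["gave\n", "given\n", "worse\n", "give\n"]
names_and_places = ["London\n", ""]

wordlist = merge_word_lists(basic, special, names_and_places)
```

Case is left untouched on purpose, so that names and places keep their capital letters.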
Use Python, including the `urllib2` and `re` modules, to grab the phonetic symbols of the words I collected.
Get the HTML content:

    import urllib2  # Python 2 standard library

    def get_url_content(url):
        # Send browser-like headers so the request looks like a normal page visit.
        i_headers = {"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.1) Gecko/20090624 Firefox/3.5",
                     "Referer": "http://www.baidu.com"}
        req = urllib2.Request(url, headers=i_headers)
        return urllib2.urlopen(req).read()

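`urllib2` exists only in Python 2. For readers on Python 3, a rough equivalent using `urllib.request` might look like the sketch below. The helper `build_request` and the `timeout` parameter are additions made here for illustration and are not part of the original script.

```python
import urllib.request

# Same browser-like headers as in the original function.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.1) "
                  "Gecko/20090624 Firefox/3.5",
    "Referer": "http://www.baidu.com",
}


def build_request(url):
    """Build a request object that carries the headers above."""
    return urllib.request.Request(url, headers=HEADERS)


def get_url_content(url, timeout=10):
    """Fetch a page and return its raw bytes (timeout is an assumed default)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

Note that in Python 3 `read()` returns bytes, so the result has to be decoded (for example with `.decode("utf-8")`) before a text regular expression is applied to it.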
Extract the phonetic symbol and store it in a dictionary:
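
    import re
    # [0:-1] drops the last character of the entry (presumably a trailing newline).
    url = 'http://www.macmillandictionary.com/us/dictionary/american/' + wordlist[i+j][0:-1]
    html_text = get_url_content(url)
    # Match from a closing </span> up to the separator that follows the pronunciation.
    pron_str = re.findall(r'</span>\D*<span class="SEP" context="PRON-after"', html_text)
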
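To show how such a pattern can be turned into a dictionary entry, here is a small self-contained sketch. The HTML fragment is made up so that the pattern has something to match; it is not copied from the real page, whose markup may differ. The capture group and the names `extract_pron` and `pron_dict` are also assumptions added for illustration.

```python
import re

# Made-up fragment shaped like the markup the pattern expects.
SAMPLE_HTML = (
    '<span class="PRON">'
    '<span class="SEP" context="PRON-before">/</span>'
    'ˈæpəl'
    '<span class="SEP" context="PRON-after">/</span>'
    '</span>'
)

# Same idea as the pattern above, with a lazy capture group around the symbols.
PRON_RE = re.compile(r'</span>(\D*?)<span class="SEP" context="PRON-after"')


def extract_pron(html_text):
    """Return the first captured phonetic string, or None if nothing matches."""
    match = PRON_RE.search(html_text)
    return match.group(1).strip() if match else None


pron_dict = {}
pron = extract_pron(SAMPLE_HTML)
if pron is not None:
    pron_dict["apple"] = pron
```

Capturing only the group avoids having to trim the surrounding tags off the matched string afterwards.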
Write a JavaScript program to generate the phonetic symbols and the “stress & light” marks for a given passage.
- Find the word in the dictionary.
- If it cannot be found in the dictionary, adjust the word to recover its base form:
- Ends with -ing:
Adjustment on word | Adjustment on phonetic symbols |
---|---|
remove -ing | add ‘ɪŋ’ at the end |
remove -ing and add -e | add ‘ɪŋ’ at the end |
remove -ing and the preceding consonant | add ‘ɪŋ’ at the end |
remove -ying and add -ie | add ‘ɪŋ’ at the end |
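The -ing table can be read as "try each candidate base form in order and stop at the first one found in the dictionary". The original program is written in JavaScript and is not shown here; the Python sketch below only illustrates the table. The tiny `PRON` dictionary and the function names are hypothetical.

```python
# Tiny stand-in dictionary; the entries are illustrative only.
PRON = {"walk": "wɔk", "make": "meɪk", "run": "rʌn", "die": "daɪ"}


def ing_candidates(word):
    """Yield possible base forms of a word ending in -ing, in table order."""
    if not word.endswith("ing"):
        return
    stem = word[:-3]
    yield stem              # walking -> walk
    yield stem + "e"        # making  -> make
    yield stem[:-1]         # running -> run (drop the preceding consonant)
    if word.endswith("ying"):
        yield word[:-4] + "ie"  # dying -> die


def lookup_ing(word):
    """Return the base pronunciation plus 'ɪŋ', or None if no base form is known."""
    for base in ing_candidates(word):
        if base in PRON:
            return PRON[base] + "ɪŋ"
    return None
```

For example, `lookup_ing("running")` tries "runn" and "runne" first, finds neither, and then succeeds with "run".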
- Ends with -s:
Adjustment on word | Adjustment on phonetic symbols |
---|---|
remove -s | if the last letter of the base word is ‘e’, add ‘iz’ at the end; else if the phonetic symbol ends with one of the characters the program treats as voiceless consonants (ptkfθsw∫rh), add ‘s’ at the end; otherwise add ‘z’ at the end |
remove -es | the same as above |
remove -ies and add -y | the same as above |
remove -ves and add -f | add ‘vz’ at the end |
remove -ves and add -fe | add ‘vz’ at the end |
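A minimal Python sketch of the first three rows of the -s table follows. It assumes that "origin word" means the base form found in the dictionary, and it leaves out the two -ves rows. The `PRON` entries and function names are hypothetical, and the consonant set is copied as listed in the table.

```python
# Character set copied from the table; the program treats these as voiceless.
VOICELESS = set("ptkfθsw∫rh")

# Tiny stand-in dictionary; the entries are illustrative only.
PRON = {"cat": "kæt", "bed": "bɛd", "horse": "hɔrs", "city": "sɪti"}


def s_candidates(word):
    """Yield possible base forms of a word ending in -s, in table order."""
    if not word.endswith("s"):
        return
    yield word[:-1]                 # cats -> cat
    if word.endswith("es"):
        yield word[:-2]             # remove -es
    if word.endswith("ies"):
        yield word[:-3] + "y"       # cities -> city


def s_suffix(base, pron):
    """Pick the ending according to the rule in the first row of the table."""
    if base.endswith("e"):
        return "iz"
    if pron[-1] in VOICELESS:
        return "s"
    return "z"


def lookup_s(word):
    """Return the adjusted pronunciation, or None if no base form is known."""
    for base in s_candidates(word):
        if base in PRON:
            return PRON[base] + s_suffix(base, PRON[base])
    return None
```

The rule is a heuristic: it looks only at the last letter of the base word and the last character of its phonetic string, so some words will still need the manual correction described in the usage section.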
- Ends with -ed:
Adjustment on word | Adjustment on phonetic symbols |
---|---|
remove -d | if the phonetic symbol ends with ‘t’ or ‘d’, add ‘ɪd’ at the end; else if it ends with one of the characters the program treats as voiceless consonants (ptkfθsw∫rh), add ‘t’ at the end; otherwise add ‘d’ at the end |
remove -ed | the same as above |
remove -ied and add -y | the same as above |
remove -ed and the preceding consonant | the same as above |
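The -ed table follows the same pattern, with the ending chosen from the last character of the base pronunciation. Again this is a Python illustration of the table, not the original JavaScript; the `PRON` entries and function names are hypothetical.

```python
# Character set copied from the table; the program treats these as voiceless.
VOICELESS = set("ptkfθsw∫rh")

# Tiny stand-in dictionary; the entries are illustrative only.
PRON = {"want": "wɑnt", "like": "laɪk", "play": "pleɪ",
        "stop": "stɑp", "study": "stʌdi"}


def ed_candidates(word):
    """Yield possible base forms of a word ending in -ed, in table order."""
    if not word.endswith("d"):
        return
    yield word[:-1]                 # liked   -> like
    if word.endswith("ed"):
        yield word[:-2]             # wanted  -> want
    if word.endswith("ied"):
        yield word[:-3] + "y"       # studied -> study
    if word.endswith("ed"):
        yield word[:-3]             # stopped -> stop


def ed_suffix(pron):
    """'ɪd' after t/d, 't' after a listed voiceless character, otherwise 'd'."""
    if pron[-1] in ("t", "d"):
        return "ɪd"
    if pron[-1] in VOICELESS:
        return "t"
    return "d"


def lookup_ed(word):
    """Return the adjusted pronunciation, or None if no base form is known."""
    for base in ed_candidates(word):
        if base in PRON:
            return PRON[base] + ed_suffix(PRON[base])
    return None
```

The check for ‘t’ or ‘d’ has to come first, because ‘t’ is also in the voiceless set and would otherwise produce "wɑntt" for "wanted".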
- Ends with -er:
Adjustment on word | Adjustment on phonetic symbols |
---|---|
remove -r | add ‘ər’ at the end |
remove -er | the same as above |
remove -ier and add -y | the same as above |
remove -er and the preceding consonant | the same as above |
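For -er every row adds the same ending, so only the candidate base forms differ. A short Python sketch of the table, with a hypothetical `PRON` dictionary and function names:

```python
# Tiny stand-in dictionary; the entries are illustrative only.
PRON = {"large": "lɑrdʒ", "tall": "tɔl", "happy": "hæpi", "hot": "hɑt"}


def er_candidates(word):
    """Yield possible base forms of a word ending in -er, in table order."""
    if not word.endswith("r"):
        return
    yield word[:-1]                 # larger  -> large
    if word.endswith("er"):
        yield word[:-2]             # taller  -> tall
    if word.endswith("ier"):
        yield word[:-3] + "y"       # happier -> happy
    if word.endswith("er"):
        yield word[:-3]             # hotter  -> hot


def lookup_er(word):
    """Return the base pronunciation plus 'ər', or None if no base form is known."""
    for base in er_candidates(word):
        if base in PRON:
            return PRON[base] + "ər"
    return None
```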
- Ends with -est:
Adjustment on word | Adjustment on phonetic symbols |
---|---|
remove -st | add ‘ɪst’ at the end |
remove -est | the same as above |
remove -iest and add -y | the same as above |
remove -est and the preceding consonant | the same as above |
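Because the rows of these tables share one shape (strip an ending, optionally add letters back), they can also be written as data. The Python sketch below does this for -est, assuming the rows mirror the -er table with -st, -est and -iest as the endings to strip. The rule list, the `PRON` dictionary and the function names are all hypothetical.

```python
# (ending to strip, letters to add back), in the order the rules are tried.
EST_RULES = [("st", ""), ("est", ""), ("iest", "y")]

# Tiny stand-in dictionary; the entries are illustrative only.
PRON = {"large": "lɑrdʒ", "tall": "tɔl", "happy": "hæpi", "hot": "hɑt"}


def est_candidates(word):
    """Yield possible base forms of a word ending in -est."""
    for strip, add in EST_RULES:
        if word.endswith(strip):
            yield word[:-len(strip)] + add
    if word.endswith("est"):
        yield word[:-4]             # hottest -> hot (drop the preceding consonant)


def lookup_est(word):
    """Return the base pronunciation plus 'ɪst', or None if no base form is known."""
    for base in est_candidates(word):
        if base in PRON:
            return PRON[base] + "ɪst"
    return None
```

Keeping the rules in a list makes it easy to add or reorder endings without touching the lookup function.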