Last modified: 2013-04-22 19:42:52 UTC
The goal is to give MediaWiki, and especially Wiktionary, an IPA/SAMPA
module that works like the ISBN module in Wikipedia.
Just type:
IPA : [toto]
SAMPA : [toto]
and it will make a link (and maybe an icon) to hear the pronunciation as MIDI audio.
Very good idea! Of course it should read: IPA: [toto] or SAMPA: [toto]. The
square brackets shouldn't be obligatory, since phonological transcriptions
may sometimes be used, in which case it would read: IPA: /toto/.
I'm with you. However, IMHO a form like <ipa>roman_transcription</ipa> would be
better. This could be done using the LaTeX extension named TIPA and the software
made for <math></math>. As I imagine it, the enhancement would generate an image
from the code in <ipa></ipa>, exactly like <math>.
I am working on this right now. It's a generalized module that can input and
output phonetic representations in a variety of formats:
Unicode IPA (UTF-8)
Unicode IPA (HTML entities)
as well as a few more obscure ones:
tipa (the TeX IPA package)
a modified version of the system used in _Big Book of Beastly
Mispronunciations_, which gives things like <small>KAL</small>-i-FOR-nyuh
I'm still working on coding each of the modules--I currently have a Unicode IPA
(UTF-8) reader, and a writer for Unicode IPA (UTF-8), Unicode IPA (HTML
entities), X-SAMPA, and Big_Beastly. I'm working on the X-SAMPA reader
currently. I'm not sure readers will be needed for Kirshenbaum, tipa, and
Also, I haven't yet worked out the syntax for denoting phonetic strings.
Thankfully, I've designed it so the syntax is not integral and the modules
should be compatible with any syntax.
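To illustrate the reader/writer split described above, here is a minimal sketch in Python (not the actual module). It assumes a neutral intermediate form, a list of IPA symbols, that every reader produces and every writer consumes. The function names, the format keys, and the tiny symbol table are all hypothetical.

```python
# Hypothetical, deliberately tiny symbol table: X-SAMPA -> IPA.
XSAMPA_TO_IPA = {
    '"': "\u02c8",    # primary stress
    "%": "\u02cc",    # secondary stress
    ".": ".",         # syllable break
    "E": "\u025b",    # open-mid front unrounded vowel
    "U": "\u028a",    # near-close near-back rounded vowel
    "@": "\u0259",    # schwa
    "r\\": "\u0279",  # alveolar approximant (two-character X-SAMPA symbol)
    "h": "h", "l": "l", "o": "o",
}
IPA_TO_XSAMPA = {ipa: xs for xs, ipa in XSAMPA_TO_IPA.items()}


def read_xsampa(text):
    """Parse X-SAMPA into a list of IPA symbols, longest match first."""
    longest = max(len(key) for key in XSAMPA_TO_IPA)
    symbols = []
    i = 0
    while i < len(text):
        for width in range(longest, 0, -1):
            chunk = text[i:i + width]
            if chunk in XSAMPA_TO_IPA:
                symbols.append(XSAMPA_TO_IPA[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"unknown X-SAMPA symbol at offset {i}: {text[i]!r}")
    return symbols


def read_ipa(text):
    """Simplification: treat every code point as one symbol."""
    return list(text)


def write_ipa_utf8(symbols):
    return "".join(symbols)


def write_ipa_entities(symbols):
    """Emit non-ASCII characters as decimal numeric entities."""
    return "".join(
        ch if ord(ch) < 128 else f"&#{ord(ch)};"
        for symbol in symbols for ch in symbol
    )


def write_xsampa(symbols):
    return "".join(IPA_TO_XSAMPA[symbol] for symbol in symbols)


READERS = {"ipa": read_ipa, "xsampa": read_xsampa}
WRITERS = {"ipa": write_ipa_utf8, "ipa-entities": write_ipa_entities,
           "xsampa": write_xsampa}


def convert(text, source="ipa", target="ipa"):
    """Any reader can be paired with any writer via the neutral form."""
    return WRITERS[target](READERS[source](text))
```

With this layout, adding a new format means adding one reader and/or one writer, and the tag syntax only has to pick the two keys.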
A few other notes:
* Generating audio pronunciations will require the installation and use of a TTS
system. Unfortunately, current Free TTS systems sound like crap. I don't think
there is much point at this juncture to invest development time in automatically
generating audio pronunciations.
* Generating an image of the IPA _could_ be done by connecting the phonetics
module to texvc and installing the LaTeX TIPA package. I'm not going to invest
development time in this right now other than generating TIPA-compatible output.
* The syntax should be able to specify what format the input is in and what
format(s) the output should be in. Further, there should be a reasonable default
for both of these. I would advocate Unicode IPA -> Unicode IPA as the defaults.
Further, there should be a set of standard templates for generating phonetic
outputs in various formats. We can add some IE-specific CSS to explicitly
specify the font to a set of fonts known to contain IPA symbols (this isn't
necessary with other browsers because they substitute in Unicode characters from
other fonts when the current font doesn't contain the requested character).
Finally, there should be a user preference to set the preferred output format
for phonetic data that would override anything that uses the defaults.
<phon input=xsampa>"hE.loU</phon> -> IPA output
<phon input=xsampa output=bb>"hE.loU</phon> -> HE-loh
but there would be templates for this
This needs more thought.
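A rough sketch of how the proposed tag could be parsed, written in Python purely for illustration. It assumes unquoted attribute values as in the two examples above, Unicode IPA as the default on both sides, and a hypothetical user_output argument standing in for the user preference. None of these names come from an existing implementation.

```python
import re

# Matches <phon ...>body</phon> with zero or more unquoted key=value attributes.
PHON_TAG = re.compile(r"<phon((?:\s+\w+=\w+)*)\s*>(.*?)</phon>", re.DOTALL)
ATTRIBUTE = re.compile(r"(\w+)=(\w+)")

# Assumed defaults: Unicode IPA in, Unicode IPA out.
DEFAULTS = {"input": "ipa", "output": "ipa"}


def parse_phon_tags(wikitext, user_output=None):
    """Return one dict per <phon> tag with its input format, output format and body.

    user_output (hypothetical) replaces the default output format only;
    an explicit output= attribute on the tag still wins.
    """
    defaults = dict(DEFAULTS)
    if user_output is not None:
        defaults["output"] = user_output

    results = []
    for match in PHON_TAG.finditer(wikitext):
        options = dict(defaults)
        for key, value in ATTRIBUTE.findall(match.group(1)):
            if key in DEFAULTS:  # ignore attributes this sketch does not know
                options[key] = value
        options["body"] = match.group(2)
        results.append(options)
    return results
```

For example, parse_phon_tags('<phon input=xsampa>"hE.loU</phon>') yields a single entry with input "xsampa" and the default output "ipa".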
Created attachment 82 [details]
Patch to Setup.php in support of files to be uploaded subsequently
This is a diff for Setup.php that includes "Phonetics.php" (to be uploaded)
Created attachment 83 [details]
Phonetics.php file to go into includes/
This is the Phonetics.php file which supports the phonetics extensions. To be
described more fully in a forthcoming comment.
Created attachment 84 [details]
archive containing files used to generate Phonetics.php
This is an archive containing the files that are used to generate
Phonetics.php, including a Makefile. To be described in a forthcoming comment.
OK. I have uploaded 3 attachments that implement the IPA/SAMPA solution I have created.
Overview of what it does: it supports the following new tags: <ipa> <ipa-en> <xsampa> <xsampa-en>. The <ipa> tag takes IPA
Unicode input (either UTF-8 or numeric entities) and returns 2 <span>s: one containing the IPA Unicode in all numeric entities, and the
other containing the equivalent X-SAMPA. The <xsampa> tag takes X-SAMPA input and returns the same <span>s as <ipa>. The -en
versions of the tags are identical, except they also return a third <span> containing the phonetics in a "simple English" phonetic format.
This option is in a separate tag because this only works with English phonemes.
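For readers who want to picture the output of the <ipa> tag, here is a minimal Python sketch of the behaviour described above. It is not taken from Phonetics.php: the span class names and the three-entry symbol table are assumptions made for the example.

```python
import html

# Tiny illustrative subset of an IPA -> X-SAMPA table (assumed, not the real one).
IPA_TO_XSAMPA = {"\u02c8": '"', "\u025b": "E", "\u028a": "U"}


def to_entities(text):
    """Escape ASCII for HTML and write everything else as numeric entities."""
    return "".join(
        html.escape(ch) if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text
    )


def render_ipa_tag(body):
    """Accept UTF-8 or numeric-entity IPA and return the two <span>s."""
    ipa = html.unescape(body)  # normalise entity input to plain Unicode
    xsampa = "".join(IPA_TO_XSAMPA.get(ch, ch) for ch in ipa)
    return (
        f'<span class="ipa">{to_entities(ipa)}</span>'
        f'<span class="xsampa">{html.escape(xsampa)}</span>'
    )
```

Both render_ipa_tag("ˈhɛ.loʊ") and render_ipa_tag("&#712;h&#603;.lo&#650;") produce the same pair of spans, since the input is normalised before either span is written.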
Overview of how it works: Phonetics.php is auto-generated from some files in the phonetics.tar.gz archive. The translation tables are
generated via a Perl script from a tab-separated text file containing all the correspondences between phonetic systems. The translation
tables are then #included via cpp into the PHP source (Phonetics.phpi). PHP functions include() or require() won't work for this because
they can't be called from within a class definition.
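The table-generation step can be pictured with the short Python sketch below. The real pipeline uses a Perl script and cpp. This version only shows the idea of turning one tab-separated correspondence file into a lookup table per pair of systems. The column names and sample rows are assumptions, not the contents of the archive.

```python
# Hypothetical sample of the correspondence file: one header row naming the
# phonetic systems, then one row per symbol.
SAMPLE_TSV = (
    "ipa\txsampa\n"
    "\u02c8\t\"\n"
    "\u025b\tE\n"
    "\u028a\tU\n"
    "\u0259\t@\n"
)


def build_tables(tsv_text):
    """Return {(source, target): {source_symbol: target_symbol}} for every column pair."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    systems = lines[0].split("\t")
    tables = {(a, b): {} for a in systems for b in systems if a != b}
    for line in lines[1:]:
        row = dict(zip(systems, line.split("\t")))
        for (source, target), table in tables.items():
            # Skip symbols that have no equivalent in one of the two systems.
            if row.get(source) and row.get(target):
                table[row[source]] = row[target]
    return tables
```

Adding another phonetic system then only requires a new column in the file; every pairwise table is regenerated from it.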
The newest versions of this depend on a Parser that supports parameters in tags.
Created attachment 88 [details]
archive containing files used to generate Phonetics.php
Newer version that eliminates the previous tags and now supports just the
<phon> tag, which takes the attributes "encoding" and "display".
Created attachment 89 [details]
Revised version of Phonetics.php
New version of Phonetics.php, generated by files in attachment 88 [details]
Does the patch work correctly?
This is handled with templates now. See http://en.wiktionary.org/wiki/sententious#Pronunciation for an example.
Reopening after discussion on IRC. I would suggest that this be done as an extension instead of a patch to MediaWiki proper.
I'm removing the blocker on bug 26207; these days we'd want this implemented as a parser function, so no new syntax extension system is needed.
The old patch above should be looked over to see if it can be adapted or used to inspire a modern version.
John, looks like you looked at the patch and found it obsolete enough that we cannot adapt it into a modern version?
Any news on this bug? It would also be a good candidate for Wikidata, to be able to extract the pronunciation of a word in many languages and generate the associated sound.