Last modified: 2013-04-22 19:42:52 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T2224, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 224 - IPA or SAMPA module
Product: MediaWiki extensions
Classification: Unclassified
Extensions requests (Other open bugs)
Hardware: All, OS: All
Importance: Low enhancement with 13 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on: 684
Reported: 2004-08-26 11:07 UTC by xmlizer
Modified: 2013-04-22 19:42 UTC (History)
CC List: 8 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---

Patch to Setup.php in support of files to be uploaded subsequently (594 bytes, patch)
2004-10-10 22:55 UTC, David Friedland
Phonetics.php file to go into includes/ (46.03 KB, text/plain)
2004-10-10 22:57 UTC, David Friedland
archive containing files used to generate Phonetics.php (22.94 KB, application/x-gzip)
2004-10-10 22:59 UTC, David Friedland
archive containing files used to generate Phonetics.php (23.19 KB, application/x-gzip)
2004-10-11 08:20 UTC, David Friedland
Revised version of Phonetics.php (45.82 KB, text/plain)
2004-10-11 08:22 UTC, David Friedland

Description xmlizer 2004-08-26 11:07:45 UTC
The goal is to give MediaWiki, and especially Wiktionary, an IPA/SAMPA
module that works like the ISBN module in Wikipedia.

Just type:
IPA : [toto]
SAMPA : [toto]

and it will make a link (and maybe an icon) to hear the phonetic transcription played as MIDI.
Comment 1 Martin Haase aka Maha 2004-08-26 14:13:31 UTC
Very good idea! Of course it should read: IPA: [toto] or SAMPA: [toto]. The
square brackets shouldn't be obligatory, since phonological transcriptions are
sometimes used instead, in which case it would read: IPA: /toto/.
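To make the proposed syntax concrete, here is a minimal sketch of how an ISBN-style "magic link" scan might recognize these labels, accepting both [square brackets] for phonetic and /slashes/ for phonemic transcriptions as suggested above. The regex and the function name `find_transcriptions` are hypothetical illustrations, not part of MediaWiki.

```python
import re

# Hypothetical pattern modelled loosely on ISBN-style magic links:
# a label (IPA or SAMPA), optional spaces, a colon, then a transcription
# delimited either by [square brackets] (phonetic) or /slashes/ (phonemic).
PHON_LINK = re.compile(
    r"\b(?P<scheme>IPA|SAMPA)\s*:\s*"
    r"(?:\[(?P<phonetic>[^\]\n]+)\]|/(?P<phonemic>[^/\n]+)/)"
)


def find_transcriptions(text):
    """Return (scheme, kind, transcription) tuples found in wikitext."""
    results = []
    for m in PHON_LINK.finditer(text):
        if m.group("phonetic") is not None:
            results.append((m.group("scheme"), "phonetic", m.group("phonetic")))
        else:
            results.append((m.group("scheme"), "phonemic", m.group("phonemic")))
    return results
```

For example, `find_transcriptions("IPA : [toto] and SAMPA: /toto/")` yields one phonetic and one phonemic match, so both the original spacing and the corrected form are accepted.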
Comment 2 Oberon 2004-09-11 09:41:27 UTC
I'm with you. However, IMHO a form like <ipa>roman_transcription</ipa> is better.
This could be done using the LaTeX package TIPA and the software made
for <math></math>. I imagine the enhancement generating an image from
the code in <ipa></ipa>, exactly like <math>.
Comment 3 David Friedland 2004-10-04 20:56:20 UTC
I am working on this right now. It's a generalized module that can input and
output phonetic representations in a variety of formats:

Unicode IPA (UTF-8)
Unicode IPA (HTML entities)

as well as a few more obscure ones:
tipa (the TeX IPA package)
a modified version of the system used in _Big Book of Beastly
Mispronunciations_, which gives things like <small>KAL</small>-i-FOR-nyuh

I'm still working on coding each of the modules. I currently have a Unicode IPA
(UTF-8) reader, and writers for Unicode IPA (UTF-8), Unicode IPA (HTML
entities), X-SAMPA, and Big_Beastly. I'm currently working on the X-SAMPA
reader. I'm not sure readers will be needed for Kirshenbaum, tipa, and

Also, I haven't yet worked out the syntax for denoting phonetic strings.
Thankfully, I've designed it so the syntax is not integral and the modules
should be compatible with any syntax.
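The reader/writer split described above can be sketched as follows: every reader turns its format into a common intermediate form (here, a list of Unicode IPA segments), and every writer turns that form back into text, so any reader pairs with any writer regardless of the surrounding wiki syntax. This is an illustrative sketch, not the code in the attachments; the table is a small fragment of X-SAMPA, and `convert`, `READERS`, and `WRITERS` are hypothetical names.

```python
# Hypothetical fragment of an X-SAMPA <-> IPA correspondence table.
# A real table would cover the full X-SAMPA inventory.
XSAMPA_TO_IPA = {
    '"': "\u02c8",    # primary stress
    "%": "\u02cc",    # secondary stress
    ".": ".",         # syllable break
    ":": "\u02d0",    # length mark
    "@": "\u0259",    # schwa
    "E": "\u025b",    # open-mid front unrounded vowel
    "I": "\u026a",    # near-close near-front unrounded vowel
    "U": "\u028a",    # near-close near-back rounded vowel
    "O": "\u0254",    # open-mid back rounded vowel
    "{": "\u00e6",    # near-open front unrounded vowel
    "S": "\u0283",    # voiceless postalveolar fricative
    "T": "\u03b8",    # voiceless dental fricative
    "D": "\u00f0",    # voiced dental fricative
    "N": "\u014b",    # velar nasal
    "r\\": "\u0279",  # alveolar approximant (X-SAMPA r\)
}
IPA_TO_XSAMPA = {ipa: xs for xs, ipa in XSAMPA_TO_IPA.items()}
LONGEST_XSAMPA = max(len(k) for k in XSAMPA_TO_IPA)


def read_xsampa(text):
    """Reader: X-SAMPA string -> list of IPA segments (longest match first)."""
    segments, i = [], 0
    while i < len(text):
        for size in range(min(LONGEST_XSAMPA, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in XSAMPA_TO_IPA:
                segments.append(XSAMPA_TO_IPA[chunk])
                i += size
                break
        else:
            # Plain letters such as h, l, o are identical in both systems.
            segments.append(text[i])
            i += 1
    return segments


def read_ipa(text):
    """Reader: Unicode IPA -> segments (one per code point in this sketch;
    a real reader would keep diacritics attached to their base symbol)."""
    return list(text)


def write_ipa(segments):
    return "".join(segments)


def write_xsampa(segments):
    return "".join(IPA_TO_XSAMPA.get(s, s) for s in segments)


READERS = {"ipa": read_ipa, "xsampa": read_xsampa}
WRITERS = {"ipa": write_ipa, "xsampa": write_xsampa}


def convert(text, source, target):
    """Convert text between any registered pair of formats."""
    return WRITERS[target](READERS[source](text))
```

With this, `convert('"hE.loU', "xsampa", "ipa")` gives ˈhɛ.loʊ, and converting that result back with `"ipa", "xsampa"` restores the original string. Adding a format means adding one reader and/or one writer rather than a converter for every pair.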

A few other notes:

* Generating audio pronunciations will require the installation and use of a TTS
system. Unfortunately, current Free TTS systems sound like crap. I don't think
there is much point at this juncture to invest development time in automatically
generating audio pronunciations.

* Generating an image of the IPA _could_ be done by connecting the phonetics
module to texvc and installing the LaTeX TIPA package. I'm not going to invest
development time in this right now other than generating TIPA-compatible output. 

* The syntax should be able to specify what format the input is in and what
format(s) the output should be in. Further, there should be a reasonable default
for both of these. I would advocate Unicode IPA -> Unicode IPA as the defaults.
Further, there should be a set of standard templates for generating phonetic
outputs in various formats. We can add some IE-specific CSS to explicitly
set the font to one of a set of fonts known to contain IPA symbols (this isn't
necessary with other browsers because they substitute in Unicode characters from
other fonts when the current font doesn't contain the requested character).
Finally, there should be a user preference to set the preferred output format
for phonetic data that would override anything that uses the defaults.

For example:
<phon input=xsampa>"hE.loU</phon> -> IPA output
<phon input=xsampa output=bb>"hE.loU</phon> -> HE-loh

In practice, though, there would be templates wrapping this syntax.


This needs more thought.
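One way to read the defaults-and-preferences proposal above is sketched below: parse the attributes of a `<phon>` element, take the input format from the tag or the site default, and let an explicit output attribute win, otherwise falling back to the user's preference and only then to the site default. The regexes, `parse_phon`, and `resolve_formats` are hypothetical, and the sketch only handles unquoted attribute values like those in the examples.

```python
import re

# Site-wide defaults proposed in the comment: Unicode IPA in, Unicode IPA out.
DEFAULT_INPUT = "ipa"
DEFAULT_OUTPUT = "ipa"

# Matches the proposed <phon attr=value ...>body</phon> syntax
# (unquoted attribute values only, as in the examples above).
PHON_TAG = re.compile(r"<phon((?:\s+\w+=\w+)*)\s*>(.*?)</phon>", re.DOTALL)
ATTR = re.compile(r"(\w+)=(\w+)")


def parse_phon(wikitext):
    """Return a list of (attrs, body) pairs, one per <phon> element."""
    return [(dict(ATTR.findall(m.group(1))), m.group(2))
            for m in PHON_TAG.finditer(wikitext)]


def resolve_formats(attrs, user_pref=None):
    """Choose (input, output) formats for one element.

    An explicit output attribute always wins; otherwise the user's
    preferred output format overrides the site-wide default.
    """
    source = attrs.get("input", DEFAULT_INPUT)
    if "output" in attrs:
        target = attrs["output"]
    elif user_pref is not None:
        target = user_pref
    else:
        target = DEFAULT_OUTPUT
    return source, target
```

Under this reading, the first example above renders as IPA for most readers but as X-SAMPA for a user whose preference is X-SAMPA, while the second example, with its explicit `output=bb`, renders the same for everyone.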
Comment 4 David Friedland 2004-10-10 22:55:38 UTC
Created attachment 82
Patch to Setup.php in support of files to be uploaded subsequently

This is a diff for Setup.php that includes "Phonetics.php" (to be uploaded).
Comment 5 David Friedland 2004-10-10 22:57:12 UTC
Created attachment 83
Phonetics.php file to go into includes/

This is the Phonetics.php file which supports the phonetics extensions. To be
described more fully in a forthcoming comment.
Comment 6 David Friedland 2004-10-10 22:59:21 UTC
Created attachment 84
archive containing files used to generate Phonetics.php

This is an archive containing the files that are used to generate
Phonetics.php, including a Makefile. To be described in a forthcoming comment.
Comment 7 David Friedland 2004-10-10 23:07:10 UTC
OK. I have uploaded 3 attachments that implement the IPA/SAMPA solution I have created. 

Overview of what it does: it supports the following new tags: <ipa> <ipa-en> <xsampa> <xsampa-en>. The <ipa> tag takes IPA 
Unicode input (either UTF-8 or numeric entities) and returns 2 <span>s: one containing the IPA Unicode in all numeric entities, and the 
other containing the equivalent X-SAMPA. The <xsampa> tag takes X-SAMPA input and returns the same <span>s as <ipa>. The -en 
versions of the tags are identical, except they also return a third <span> containing the phonetics in a "simple English" phonetic format. 
This option is in a separate tag because this only works with English phonemes.
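The two-span output of the `<ipa>` tag described above can be sketched like this: the input is first normalized so that UTF-8 and numeric-entity input behave identically, then one span carries the IPA with every non-ASCII character written as a numeric entity and the other carries the X-SAMPA equivalent. The span class names, the tiny conversion table, and `render_ipa_tag` are hypothetical; the attached Phonetics.php may structure its output differently.

```python
import html

# Hypothetical IPA -> X-SAMPA fragment, just enough for the example word.
IPA_TO_XSAMPA = {"\u02c8": '"', "\u025b": "E", "\u028a": "U", "\u0259": "@"}


def to_numeric_entities(text):
    """Write non-ASCII characters as decimal numeric entities (&#NNN;);
    ASCII characters are only HTML-escaped where necessary."""
    return "".join(
        html.escape(ch) if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text
    )


def to_xsampa(ipa):
    return "".join(IPA_TO_XSAMPA.get(ch, ch) for ch in ipa)


def render_ipa_tag(body):
    """Render <ipa>body</ipa> as two spans: entity-encoded IPA and X-SAMPA."""
    ipa = html.unescape(body)  # accept both UTF-8 and numeric-entity input
    return (
        f'<span class="ipa">{to_numeric_entities(ipa)}</span>'
        f'<span class="xsampa">{html.escape(to_xsampa(ipa))}</span>'
    )
```

Because the input is unescaped first, `render_ipa_tag("&#712;h&#603;.lo&#650;")` and the same word typed directly in UTF-8 produce identical markup. An `-en` variant would simply append a third span from an English-only respelling writer.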

Overview of how it works: Phonetics.php is auto-generated from some files in the phonetics.tar.gz archive. The translation tables are 
generated via a Perl script from a tab-separated text file containing all the correspondences between phonetic systems. The translation 
tables are then #included via cpp into the PHP source (Phonetics.phpi). PHP's include() and require() won't work for this because 
they can't be called from within a class definition.
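The table-generation step described above (a Perl script reading a tab-separated correspondence file) can be approximated in Python as follows. This is a stand-in, not the script from the archive, and the file layout is an assumption: the first row names the phonetic systems, each later row gives one sound in every system, and an empty cell means that system has no symbol for the sound. Quoting is disabled because X-SAMPA uses `"` as a symbol.

```python
import csv
import io


def load_tables(tsv_text):
    """Build {(src, dst): {src_symbol: dst_symbol}} from a TSV correspondence file.

    Assumed (hypothetical) layout: header row of system names, then one row
    per sound; empty cells are skipped.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t",
                           quoting=csv.QUOTE_NONE))
    systems, body = rows[0], rows[1:]
    tables = {}
    for i, src in enumerate(systems):
        for j, dst in enumerate(systems):
            if i == j:
                continue
            table = {}
            for row in body:
                if len(row) > max(i, j) and row[i] and row[j]:
                    # On duplicate source symbols the first row wins.
                    table.setdefault(row[i], row[j])
            tables[(src, dst)] = table
    return tables
```

Generating every directed pair from a single file keeps all correspondences in one place, which is presumably why the original build derived its tables from one TSV rather than maintaining them by hand.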
Comment 8 David Friedland 2004-10-11 06:28:36 UTC
The newest versions of this depend on a Parser that supports parameters in tags.
Comment 9 David Friedland 2004-10-11 08:20:43 UTC
Created attachment 88
archive containing files used to generate Phonetics.php

newer version that eliminates the previous tags and now supports just the
<phon> tag, which takes the attributes "encoding" and "display".
Comment 10 David Friedland 2004-10-11 08:22:38 UTC
Created attachment 89
Revised version of Phonetics.php

New version of Phonetics.php, generated from the files in attachment 88.
Comment 11 xmlizer 2005-04-02 13:00:13 UTC
Does the patch work correctly?
Comment 12 Mark A. Hershberger 2010-12-02 17:57:24 UTC
This is handled with templates now.  See for an example.
Comment 13 Mark A. Hershberger 2010-12-02 18:13:23 UTC
Reopening after discussion on IRC.  I would suggest that this be done as an extension instead of a patch to MediaWiki proper.
Comment 14 Brion Vibber 2011-05-26 17:51:35 UTC
I'm removing the blocker on bug 26207; these days we'd want this implemented as a parser function, so no new syntax extension system is needed.

The old patch above should be looked over to see if it can be adapted or used to inspire a modern version.
Comment 15 Sumana Harihareswara 2011-08-24 14:53:04 UTC
John, looks like you looked at the patch and found it obsolete enough that we cannot adapt it into a modern version?
Comment 16 xmlizer 2013-01-17 14:49:11 UTC
Any news on this bug? It would also be a good candidate for Wikidata, which could then extract the pronunciation of a word in many languages and generate the associated sound.
