Last modified: 2013-04-22 19:42:52 UTC
The goal is to give MediaWiki, and especially Wiktionary, an IPA/SAMPA module that works like the ISBN module in Wikipedia. Just type IPA: [toto] or SAMPA: [toto] and it will make a link (and maybe an icon) to hear the phonetic transcription in MIDI.
Very good idea! Of course it should read IPA: [toto] or SAMPA: [toto]. The square brackets shouldn't be obligatory, since phonological transcriptions may sometimes be used, in which case it would read IPA: /toto/.
I'm with you. However, IMHO a form like <ipa>roman_transcription</ipa> would be better. It could be done using the LaTeX extension named TIPA and the software made for <math></math>. As I imagine it, the enhancement would make an image from the code in <ipa></ipa>, exactly like <math>.
I am working on this right now. It's a generalized module that can input and output phonetic representations in a variety of formats:

* Unicode IPA (UTF-8)
* Unicode IPA (HTML entities)
* X-SAMPA

as well as a few more obscure ones:

* Kirshenbaum
* tipa (the TeX IPA package)
* a modified version of the system used in _Big Book of Beastly Mispronunciations_, which gives things like <small>KAL</small>-i-FOR-nyuh

I'm still working on coding each of the modules--I currently have a Unicode IPA (UTF-8) reader, and a writer for Unicode IPA (UTF-8), Unicode IPA (HTML entities), X-SAMPA, and Big_Beastly. I'm working on the X-SAMPA reader currently. I'm not sure readers will be needed for Kirshenbaum, tipa, and Big_Beastly.

Also, I haven't yet worked out the syntax for denoting phonetic strings. Thankfully, I've designed it so the syntax is not integral and the modules should be compatible with any syntax.

A few other notes:

* Generating audio pronunciations will require the installation and use of a TTS system. Unfortunately, current Free TTS systems sound like crap. I don't think there is much point at this juncture to invest development time in automatically generating audio pronunciations.
* Generating an image of the IPA _could_ be done by connecting the phonetics module to texvc and installing the LaTeX TIPA package. I'm not going to invest development time in this right now other than generating TIPA-compatible output.
* The syntax should be able to specify what format the input is in and what format(s) the output should be in. Further, there should be a reasonable default for both of these. I would advocate Unicode IPA -> Unicode IPA as the defaults. Further, there should be a set of standard templates for generating phonetic outputs in various formats.
We can add some IE-specific CSS to explicitly set the font to a set of fonts known to contain IPA symbols (this isn't necessary with other browsers because they substitute in Unicode characters from other fonts when the current font doesn't contain the requested character). Finally, there should be a user preference to set the preferred output format for phonetic data that would override anything that uses the defaults.

<phon input=xsampa>"hE.loU</phon> -> IPA output
<phon input=xsampa output=bb>"hE.loU</phon> -> HE-loh

but there would be templates for this:

{{xsampa_to_ipa|"hE.loU}}
{{ipa_to_xsampa_and_ipa|toto}}

This needs more thought.
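To make the reader/writer split and the proposed defaults concrete, here is a minimal Python sketch. It is not the PHP module from this bug: the table is a tiny hand-picked excerpt, and the names `xsampa_to_ipa`, `to_html_entities`, `render`, and the format keys `"ipa"`, `"xsampa"`, `"ipa-entities"` are all assumptions made for illustration.

```python
# Hypothetical, deliberately tiny X-SAMPA -> IPA table.
# A real table would cover the whole chart.
XSAMPA_TO_IPA = {
    '"': "\u02c8",    # primary stress
    "%": "\u02cc",    # secondary stress
    ".": ".",         # syllable break
    "E": "\u025b",    # open-mid front unrounded vowel
    "U": "\u028a",    # near-close near-back rounded vowel
    "@": "\u0259",    # schwa
    "tS": "t\u0283",  # two-character symbol, exercises longest match
    "S": "\u0283",
    "h": "h", "l": "l", "o": "o", "t": "t",
}


def xsampa_to_ipa(text, table=XSAMPA_TO_IPA):
    """Reader: convert X-SAMPA to Unicode IPA by greedy longest match."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        for width in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + width]
            if chunk in table:
                out.append(table[chunk])
                i += width
                break
        else:
            # Unknown symbol: pass it through unchanged rather than guess.
            out.append(text[i])
            i += 1
    return "".join(out)


def to_html_entities(ipa):
    """Writer: emit non-ASCII characters as decimal numeric entities."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in ipa)


# Writers are keyed by output format; Unicode IPA is the internal form.
WRITERS = {"ipa": lambda s: s, "ipa-entities": to_html_entities}


def render(text, input_format="ipa", output_format="ipa"):
    """Defaults follow the proposal above: Unicode IPA in, Unicode IPA out."""
    ipa = xsampa_to_ipa(text) if input_format == "xsampa" else text
    return WRITERS[output_format](ipa)
```

With this sketch, `render('"hE.loU', input_format="xsampa")` produces the Unicode IPA string, and passing `output_format="ipa-entities"` produces the same transcription as numeric entities. Keeping one internal form (Unicode IPA here) means each new format needs only one reader and one writer rather than a converter for every pair.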
Created attachment 82
Patch to Setup.php in support of files to be uploaded subsequently

This is a diff for Setup.php that includes "Phonetics.php" (to be uploaded).
Created attachment 83
Phonetics.php file to go into includes/

This is the Phonetics.php file which supports the phonetics extensions. To be described more fully in a forthcoming comment.
Created attachment 84
archive containing files used to generate Phonetics.php

This is an archive containing the files that are used to generate Phonetics.php, including a Makefile. To be described in a forthcoming comment.
OK. I have uploaded 3 attachments that implement the IPA/SAMPA solution I have created.

Overview of what it does: it supports the following new tags: <ipa>, <ipa-en>, <xsampa>, <xsampa-en>. The <ipa> tag takes IPA Unicode input (either UTF-8 or numeric entities) and returns 2 <span>s: one containing the IPA Unicode in all numeric entities, and the other containing the equivalent X-SAMPA. The <xsampa> tag takes X-SAMPA input and returns the same <span>s as <ipa>. The -en versions of the tags are identical, except they also return a third <span> containing the phonetics in a "simple English" phonetic format. This option is in a separate tag because it only works with English phonemes.

Overview of how it works: Phonetics.php is auto-generated from some files in the phonetics.tar.gz archive. The translation tables are generated by a Perl script from a tab-separated text file containing all the correspondences between phonetic systems. The translation tables are then #included via cpp into the PHP source (Phonetics.phpi). The PHP functions include() and require() won't work for this because they can't be called from within a class definition.
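The table-driven approach described above can be sketched roughly as follows. This is a Python illustration only, not the Perl/cpp/PHP pipeline in the attachments: the column names, the sample rows, the `class` attributes on the spans, and the function names `load_tables` and `render_ipa_tag` are all assumptions.

```python
import html

# Hypothetical excerpt of a tab-separated correspondence file:
# one row per sound, one column per notation.
CORRESPONDENCES_TSV = (
    "ipa\txsampa\n"
    "\u02c8\t\"\n"
    "\u025b\tE\n"
    "\u028a\tU\n"
    "h\th\n"
    "l\tl\n"
    "o\to\n"
    ".\t.\n"
)


def load_tables(tsv_text):
    """Build one lookup dict per direction from the tab-separated rows."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    rows = [dict(zip(header, line.split("\t"))) for line in lines[1:]]
    ipa_to_xsampa = {row["ipa"]: row["xsampa"] for row in rows}
    xsampa_to_ipa = {row["xsampa"]: row["ipa"] for row in rows}
    return ipa_to_xsampa, xsampa_to_ipa


def render_ipa_tag(ipa_text, ipa_to_xsampa):
    """Return two <span>s: IPA as numeric entities, then the X-SAMPA form.

    Simplification: characters are mapped one at a time, so multi-character
    symbols and diacritics are not handled here.
    """
    entities = "".join(f"&#{ord(ch)};" for ch in ipa_text)
    xsampa = "".join(ipa_to_xsampa.get(ch, ch) for ch in ipa_text)
    return (f'<span class="ipa">{entities}</span>'
            f'<span class="xsampa">{html.escape(xsampa)}</span>')
```

Usage would look like `ipa_to_xsampa, _ = load_tables(CORRESPONDENCES_TSV)` followed by `render_ipa_tag(some_ipa_string, ipa_to_xsampa)`. The X-SAMPA text is HTML-escaped because X-SAMPA uses characters such as `"` and `<` that are significant in markup.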
The newest versions of this depend on a Parser that supports parameters in tags.
Created attachment 88
archive containing files used to generate Phonetics.php

Newer version that eliminates the previous tags and now supports just the <phon> tag, which takes the attributes "encoding" and "display".
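For readers unfamiliar with attribute-bearing tags, here is a rough Python sketch of pulling the "encoding" and "display" attributes out of a <phon> tag. It does not reflect how the MediaWiki Parser or the attached Phonetics.php actually does this; the regular expressions, the default values, the attribute values used in the example, and the name `parse_phon_tags` are assumptions.

```python
import re

# Matches <phon ...>body</phon>; group 1 is the raw attribute text.
PHON_TAG = re.compile(r"<phon\b([^>]*)>(.*?)</phon>", re.DOTALL)
# Matches name="value" or name=value inside the attribute text.
ATTRIBUTE = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|(\S+))')

# Assumed defaults, following the earlier suggestion of IPA in, IPA out.
DEFAULTS = {"encoding": "ipa", "display": "ipa"}


def parse_phon_tags(wikitext):
    """Return a list of (attributes, body) pairs, one per <phon> tag."""
    results = []
    for match in PHON_TAG.finditer(wikitext):
        attrs = dict(DEFAULTS)
        for name, quoted, bare in ATTRIBUTE.findall(match.group(1)):
            attrs[name] = quoted if quoted else bare
        results.append((attrs, match.group(2)))
    return results
```

For example, `parse_phon_tags('<phon encoding=xsampa>toto</phon>')` yields one pair whose attributes have "encoding" set to "xsampa" and "display" left at the assumed default.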
Created attachment 89
Revised version of Phonetics.php

New version of Phonetics.php, generated by the files in attachment 88.
Does the patch work correctly?
This is handled with templates now. See http://en.wiktionary.org/wiki/sententious#Pronunciation for an example.
Reopening after discussion on IRC. I would suggest that this be done as an extension instead of a patch to MediaWiki proper.
I'm removing the blocker on bug 26207; these days we'd want this implemented as a parser function, so no new syntax extension system is needed. The old patch above should be looked over to see if it can be adapted or used to inspire a modern version.
John, looks like you looked at the patch and found it obsolete enough that we cannot adapt it into a modern version?
Any news on this bug? It would also be a good candidate for Wikidata, to be able to extract the pronunciation of a word in many languages and generate the associated sound.