Last modified: 2014-04-08 12:11:21 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from pages displaying bug reports and their history, links might be broken. See T48610, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 46610 - Pronunciation recording tool (tracking)
Status: ASSIGNED
Product: MediaWiki extensions
Classification: Unclassified
Component: PronunciationRecording
Version: master
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Rahul Maliakkal
URL: http://thread.gmane.org/gmane.org.wik...
Depends on: 53128 53129 54351 32135
Blocks:
Reported: 2013-03-27 17:40 UTC by Matthew Flaschen
Modified: 2014-04-08 12:11 UTC (History)
14 users (show)

Description Matthew Flaschen 2013-03-27 17:40:32 UTC
Several people (e.g. https://bugzilla.wikimedia.org/show_bug.cgi?id=31221, though the original report asks for computer text-to-speech, and http://comments.gmane.org/gmane.org.wikimedia.wiktionary/1265) have requested a tool to simplify the workflow of recording the pronunciation of a word.

The basic idea is to provide a wizard flow for picking a word (which may be the page you're on), recording it, choosing a free license, then uploading it to Wikimedia Commons with the appropriate metadata.
Comment 1 Amgine 2013-03-27 18:39:56 UTC
Note: This will also need to take into account the L2 sections, which are used to indicate the language. For example, https://en.wiktionary.org/wiki/chance#English https://en.wiktionary.org/wiki/chance#French etc.
Comment 2 Nemo 2013-03-28 21:59:42 UTC
Well, the tool could simply be added by a parser function called with both word and language (and possibly something else for homographs), this seems the least of the problems. :)
Comment 3 Quim Gil 2013-03-28 22:33:00 UTC
Just like the "Edit" link appears next to each section, the [Record button] could be placed next to any word missing voice recorded pronunciation, right?

Question: which page describes the current workflow? It is not evident how a user can contribute a pronunciation now.

Also: what happens if an audio file already exists but I think I can contribute a better one e.g. because of audio quality or some other defect?

Needless to say, this is a feature that is calling for a mobile UI sooner or later... Think of all those languages spoken in countries with a high penetration of mobile devices.
Comment 4 Matthew Flaschen 2013-03-28 22:44:04 UTC
The current procedure on English Wiktionary is https://en.wiktionary.org/wiki/Help:Audio_pronunciations .  Other projects probably have somewhat different procedures.

I suggest the tool initially only show on pages without existing recordings.  It would be good to solve that problem eventually, but it is more likely to require discussion (should we keep both because they have slightly different accents?, etc.)

Also, I skipped the final part of the flow, adding the template (e.g. Template:audio on English Wiktionary) to the Wiktionary page.
Comment 5 Amgine 2013-03-29 12:30:49 UTC
> I suggest the tool initially only show on pages without existing recordings. 

Not sure I would agree. The many dialects of English, for example, can be dramatically different. 'Schedule' springs to mind[1].

Although I'd love to get into a discussion about collecting metadata with recordings (geoip location of author, self-identity of dialectic origins, etc.) I think at this point we should focus on the basic mechanics: user button to record a brief audio snippet which is auto-uploaded to commons with authoring/license templates, and the local wiktionary page updated.

[1] https://en.wiktionary.org/wiki/schedule#Pronunciation
Comment 6 Matthew Flaschen 2013-03-29 19:09:41 UTC
I agree.  I wasn't proposing complicated metadata, just the basics (license template of course, Category:$LANGUAGE pronunciation, maybe a hidden category to mark recordings from the tool).

The reason I suggested keeping it simple by showing on pages without recordings is to avoid collisions.  E.g. what happens if I live in the U.S. but have a different pronunciation of https://commons.wikimedia.org/wiki/File:En-us-associate.ogg ?  But it looks like they resolve collisions by just adding a number, https://commons.wikimedia.org/wiki/File:En-us-associate-2.ogg, which is easy enough for a tool to do.
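
The numbering convention described above can be sketched in a few lines. This is only an illustration, not code from any existing tool: the function name `next_free_filename` and the `existing` set are hypothetical, and a real tool would have to ask Commons which file names are already taken instead of using a local set.

```python
def next_free_filename(word, lang_code, existing, ext="ogg"):
    """Return the first unused name in the sequence
    En-us-word.ogg, En-us-word-2.ogg, En-us-word-3.ogg, ...

    `existing` is a set of file names assumed to be already in use
    (hypothetical input; a real tool would query the wiki for this).
    """
    # Capitalise only the first letter, as in "En-us-associate.ogg".
    prefix = lang_code[:1].upper() + lang_code[1:]
    base = f"{prefix}-{word}"
    candidate = f"{base}.{ext}"
    n = 2  # the first collision gets the suffix "-2"
    while candidate in existing:
        candidate = f"{base}-{n}.{ext}"
        n += 1
    return candidate
```

For example, if `En-us-associate.ogg` is already taken, the function proposes `En-us-associate-2.ogg`, matching the two Commons files linked above.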
Comment 7 Rahul Maliakkal 2013-04-05 20:22:24 UTC
I have prepared a rough project proposal. Please do give me your feedback and
suggestions so that I can improve on it.
https://www.mediawiki.org/wiki/User:Rahul21/Gsoc
Comment 8 Quim Gil 2013-04-09 16:03:32 UTC
Hi Rahul,

Through the different discussions so far we have seen that this project might be more tricky than what it looked like initially. And the main problem is still that no mentor is stepping in. 

I recommend that you wait a couple more days and then make a decision: bet blindly on this proposal in the hope that things will be solved in the next weeks, or put it aside and bet on some other idea for GSoC.

You can still work on a voice recording tool as a pet project, but from my point of view it still lacks some essential factors to consider it for this GSOC: no mentor and no enthusiastic response from Wiktionary community.
Comment 9 Matthew Flaschen 2013-04-09 19:46:42 UTC
On the Wiktionary front, has anyone reached out to them on a prominent place on-wiki?

As far as the technology, there seems to be at least one workable approach using the HTML5 Media Capture API (http://www.w3.org/2009/dap/wiki/ImplementationStatus#HTML_Media_Capture and https://news.ycombinator.com/item?id=4001140).  I haven't tested this myself yet; the browsers are mostly mobile.  See http://mobilehtml5.org/ts/?id=23 for the syntax and a simple page for testing.

As getUserMedia develops that could become an alternate approach.

So it might be workable if a mentor becomes available.
Comment 10 Rahul Maliakkal 2013-04-09 20:02:22 UTC
I will surely have a look at them. Michael Dale is ready to mentor :)
Comment 11 Michael Dale 2013-04-09 21:35:12 UTC
Confirmed. As mentioned on IRC, it would be nice to also support recording an article or paragraph read out loud, for the spoken articles project.
Comment 12 Nemo 2013-04-09 22:04:22 UTC
(In reply to comment #9)
> On the Wiktionary front, has anyone reached out to them on a prominent place
> on-wiki?

The central discussion place for Wiktionary is Wiktionary-l. It's not Wikipedia or Wikisource. Anyway I'll send more notifications to all languages.
Comment 13 Quim Gil 2013-04-09 22:19:32 UTC
Sometimes you help by doing something and sometimes you help by NOT doing something.  :)

I'm happy to have helped indirectly finding a mentor for this project. The GSoC process continues. Thank you Rahul, thank you Michael and thank you to the rest of people helping to move this feature forward.
Comment 14 Quim Gil 2013-04-11 21:46:29 UTC
Adding a dependency on WAV support to document the discussion over the past hours at wikitech-l. Feel free to change this if the plan changes.
Comment 15 Ron Surratt 2013-04-28 04:13:49 UTC
(In reply to comment #13)
> I'm happy to have helped indirectly finding a mentor for this project. The
> GSoC process continues. Thank you Rahul, thank you Michael and thank you to
> the rest of people helping to move this feature forward.

Hello all, I have developed a mediawiki extension (not released yet) that plays an ogg file on hover for any word (so far just English... abt 2500 words) that it knows about in any page. Also it inserts a play button on hover to keep playing and highlighting all words it knows.  In addition a very short definition is displayed also with words hoverable and playable. Words are from simple wiktionary. 

I also have made a javascript sound recorder that uses wami (https://code.google.com/p/wami-recorder/) to record and soundmanager (http://www.schillmania.com/projects/soundmanager2/) for playback. I believe both use HTML5 if it is available with a fallback to Flash if it is not available.  I also use ffmpeg and sox to do some server side processing (wav->ogg and trimming silence at start and end of word). This is part of another project I have been working on to help someone learn a new language. I could make this available to anyone who would want it or I could maybe take a shot at implementing Rahul's suggestions. I am totally new to mediawiki so any guidance would be helpful.
Comment 16 Matthew Flaschen 2013-04-28 04:51:17 UTC
(In reply to comment #15)
> I also have made a javascript sound recorder that uses wami
> (https://code.google.com/p/wami-recorder/) to record and soundmanager
> (http://www.schillmania.com/projects/soundmanager2/) for playback. I believe
> both use HTML5 if it is available with a fallback to Flash if it is not
> available.

As far as I can tell from the source code (https://code.google.com/p/wami-recorder/source/browse/), WAMI requires Flash.
Comment 17 Ron Surratt 2013-04-28 05:37:55 UTC
> As far as I can tell from the source code
> (https://code.google.com/p/wami-recorder/source/browse/), WAMI requires
> Flash.

I had no idea. Is Flash a no-no for MediaWiki?
Comment 18 Bawolff (Brian Wolff) 2013-04-28 05:47:56 UTC
(In reply to comment #17)
> > As far as I can tell from the source code
> > (https://code.google.com/p/wami-recorder/source/browse/), WAMI requires
> > Flash.
> 
> I had no idea. Is Flash a no-no for MediaWiki?

Flash is very controversial for political reasons. It may be acceptable as a fallback mechanism on old browsers that don't support the latest and greatest HTML features; however, it's a no-no to require Flash (it might be considered acceptable if the particular Flash component works with Gnash). Java is considered much more acceptable, but is still not exactly loved either.
Comment 19 Michael Dale 2013-04-29 15:05:14 UTC
(In reply to comment #15)
> an ogg file on hover for any word (so far just English... abt 2500 words)
> that it knows about in any page. 

Sounds like fun. It would work best as a gadget that queries the real Wiktionary.

> I also have made a javascript sound recorder that uses wami
> (https://code.google.com/p/wami-recorder/) to record and soundmanager
> (http://www.schillmania.com/projects/soundmanager2/) for playback. 

I like Flash fallbacks. I understand we should not require Flash, but as a fallback it's great. It's much less of a patent, fragmentation, and security problem than in-browser Java.

>I also use ffmpeg and sox to do some server side processing
> (wav->ogg and trimming silence at start and end of word). This is part of
> another project I have been working on to help someone learn a new language.

We should try to trim client side if possible.

> I could make this available to anyone who would want it or I could maybe take a
> shot at implementing Rahul's suggestions. I am totally new to mediawiki so any
> guidance would be helpful.

Cool, I am sure Rahul will touch base with you.
Comment 20 Rahul Maliakkal 2013-04-29 15:13:38 UTC
(In reply to comment #15)

>I also use ffmpeg and sox to do some server side processing
> (wav->ogg and trimming silence at start and end of word). This is part of
> another project I have been working on to help someone learn a new language.

I am interested. Could you join us on IRC, where we can have a good discussion about this?
Comment 21 Ron Surratt 2013-04-30 15:18:21 UTC
(In reply to comment #19)
> (In reply to comment #15)
> > an ogg file on hover for any word (so far just English... abt 2500 words)
> > that it knows about in any page. 
> 
> sounds like a fun, would work best as a gadget,

How does one go about getting a user script approved as a gadget?

> and query the real wikitionary.
 
The extension opens a new tab on the real Wiktionary for the definition of any word on mouse click, but on hover it just uses the definition (if it exists) from the small dictionary I have made (a JSON file, about 60k compressed).

The sound files it uses are mostly from what is used in the English wiktionary but I am in the process of recording new ones that "flow" better when spoken one after another in a sentence.

> 
> >I also use ffmpeg and sox to do some server side processing
> > (wav->ogg and trimming silence at start and end of word). This is part of
> > another project I have been working on to help someone learn a new language.
> 
> We should to try to trim client side if possible. 

Yes that would be a great solution to the problem of getting ffmpeg and sox executables running on a variety of servers but I have no idea on how to do that.  Perhaps Java? If anyone knows how to do that I am all ears.
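
For reference, the trimming step on its own does not need ffmpeg or sox. The sketch below shows a simple amplitude-threshold trim in Python; it is an assumption about what "trimming silence" could mean here, not a description of what sox does. The function name `trim_silence`, the default threshold and the `padding` parameter are all hypothetical. The same logic could be ported to JavaScript and run in the browser on an array of float samples, assuming the recorder exposes the raw samples.

```python
def trim_silence(samples, threshold=0.02, padding=0):
    """Drop leading and trailing samples quieter than `threshold`.

    `samples` is a sequence of floats in the range [-1.0, 1.0].
    `padding` keeps that many extra samples on each side of the
    detected sound. Quiet samples in the middle are left untouched.
    Returns an empty list if nothing reaches the threshold.
    """
    # Indices of all samples that count as "sound".
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    start = max(loud[0] - padding, 0)
    end = min(loud[-1] + 1 + padding, len(samples))
    return list(samples[start:end])
```

A fixed threshold is the crudest possible choice; a real recorder would probably want it to be adjustable by the user, since background noise levels differ between microphones.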
Comment 22 Ron Surratt 2013-04-30 15:33:37 UTC
> I am interested. Could you come on the irc where we can have good discussion
> regarding this.

Hi Rahul,  sure,  how do I go about doing that?
Comment 23 Andre Klapper 2013-04-30 16:35:15 UTC
See http://www.mediawiki.org/wiki/MediaWiki_on_IRC for more info.
Comment 24 Matthew Flaschen 2013-04-30 19:03:27 UTC
(In reply to comment #21)
> (In reply to comment #19)
> > (In reply to comment #15)
> > > an ogg file on hover for any word (so far just English... abt 2500 words)
> > > that it knows about in any page. 
> > 
> > sounds like a fun, would work best as a gadget,
> 
> How does one go about getting a user script approved as a gadget?

Each wiki approves them separately.  See https://en.wiktionary.org/wiki/Wiktionary:Gadgets, though I'm not sure where you ask for it to be approved.  You can ask at https://en.wiktionary.org/wiki/Wiktionary:Grease_pit .

Please use a separate bug report for the mini-dictionary (ogg, short definition) on-hover idea.  It's interesting, but separate from this.
Comment 25 Ron Surratt 2013-05-01 00:36:37 UTC
(In reply to comment #18)
> (In reply to comment #17)
 
> Flash is very controversial for political reasons. It may be acceptable as a
> fallback mechanism on old browsers that don't support the latest and greatest
> html features, however its a no-no to require flash (might be considered
> acceptable if the particular flash thing works with gnash). Java is much more
> considered ok, but still not exactly loved either.

I was wrong: WAMI knows nothing of HTML5 and only uses Flash, and only on the client side, with nothing on the server. From its project page (https://code.google.com/p/wami-recorder/):

"The WAMI recorder uses a light-weight Flash app to ship audio from client to server via a standard HTTP POST. Apart from the security settings to allow microphone access, the entire interface can be constructed in HTML and Javascript."


So is this a deal breaker? Sounds like it. If so, sorry for wasting your time, and perhaps it would be better to wait for a GSoC solution that uses the latest and greatest technology, unless someone has a suggestion.
Comment 26 Rahul Maliakkal 2013-05-01 05:45:25 UTC
(In reply to comment #25)

> I was wrong, WAMI knows nothing of HTML5 and only uses Flash... just client
> side , nothing on server. from them (at
> https://code.google.com/p/wami-recorder/)

It's okay, Ron. You took an interest and wanted to help, and that in itself is a positive sign!
Comment 27 Michael Dale 2013-05-01 14:31:37 UTC
I don't think it's a deal breaker; Flash makes a great fallback. If we can use the WebRTC solution for browsers that support it, then using WAMI as a fallback is fine.

The restriction against flash for wikimedia projects is based on the idea, that you don't exclusively deliver an experience for proprietary platforms. Using flash or java as a fallback is fine, as long as an open standard / free browser solution is also equally well supported.
Comment 28 Amgine 2013-05-01 14:35:15 UTC
Adobe stopped producing Flash for Linux last year or the year before.
Comment 29 Rahul Maliakkal 2013-05-01 14:48:53 UTC
Version 11.2 is the last version supported on Linux.
Comment 31 Michael Dale 2013-05-01 15:38:18 UTC
Flash support for linux is not relevant. The point is you can get the same experience ( with webRTC ) with free software. The idea is to give an equal experience on flash vs free software platforms.
Comment 32 Ron Surratt 2013-05-06 04:38:32 UTC
(In reply to comment #31)
> Flash support for linux is not relevant. The point is you can get the same
> experience ( with webRTC ) with free software. The idea is to give an equal
> experience on flash vs free software platforms.

I have been able to get a sound recorder working with the HTML5 Web Audio API in Google's Chrome (Canary version).  It is much nicer than the Flash version using WAMI I already had in that it allows things such as user controlled silence removal in the browser.  I will next try to cram them both together so as to have the Flash fallback work as closely as possible to the HTML5 version.

I also want to be able to do the editing of pre-existing sounds as well as sounds input with a microphone.
Comment 33 Rahul Maliakkal 2013-05-06 07:31:14 UTC
(In reply to comment #32)

>I have been able to get a sound recorder working with the HTML5 Web Audio API
>in Google's Chrome (Canary version).

Could you please specify the version? Also, did you enable the "Web Audio Input" flag via chrome://flags?
Comment 34 Ron Surratt 2013-05-06 14:43:39 UTC
(In reply to comment #33)
> (In reply to comment #32)
> 
> >I have been able to get a sound recorder working with the HTML5 Web Audio API
> >in Google's Chrome (Canary version).
> 
> Please can you specify the version and did you enable the flag "Web Audio
> Input" via "chrome://flags

The Chrome version is 28.0.1499.0 canary (https://www.google.com/intl/en/chrome/browser/canary.html), and there is no "Web Audio Input" flag in chrome://flags for that version of Chrome.
Comment 35 Matthew Flaschen 2013-06-14 19:32:07 UTC
I'm removing bug 20252 as a dependency, and moving to see also.  It's a nice-to-have, but it's not a blocker in my opinion.  These are going to be short files (< 5 seconds, most likely).

Bug 20252 could also be done later, and the files transcoded internally.
Comment 36 Rahul Maliakkal 2013-06-16 19:06:27 UTC
I have undertaken this as my GSoC project; Michael Dale and Matthew Flaschen will be my mentors during the course. The primary benefit is laying the groundwork for contributor-created audio on MediaWiki sites in any current browser. I have done a little research so far on how to upload the pronunciations, and based on that, use of the Upload API is essential; other APIs, like the Edit API, will also come in handy. The first step that I plan to take is adding .wav support to the TMH (TimedMediaHandler) extension. Link to my proposal: http://www.mediawiki.org/wiki/User:Rahul21/Gsoc
Comment 37 Gerrit Notification Bot 2013-07-24 20:24:08 UTC
Change 75770 had a related patch set uploaded by Rahul21:
Pronunciation Recording Tool( Not working )

https://gerrit.wikimedia.org/r/75770
Comment 38 Gerrit Notification Bot 2013-07-24 20:53:57 UTC
Change 75770 abandoned by Rahul21:
Pronunciation Recording Tool( Not working )

https://gerrit.wikimedia.org/r/75770
Comment 39 Matthew Flaschen 2013-08-20 21:53:02 UTC
6f9c18509b858d89e50d145a685eb5308dcdff7e implemented a special page, Special:PronunciationRecording.  That includes support for recording a pronunciation and playing it back.  The next main step is allowing uploading to the same wiki where the special page is (bug 53127).

There is also now a Bugzilla component for this extension.
Comment 40 Quim Gil 2013-09-17 16:22:06 UTC
GSoC "soft pencils down" date was yesterday and all coding must stop on 23 September. Has this project been completed?
Comment 41 Matthew Flaschen 2013-09-17 20:04:06 UTC
The overall project has not been completed, so Rahul will have to keep working until the final pencil down (September 23, as you noted).

The following parts are complete (parts that still need final review are noted).  Rahul can add anything I'm missing:

* Uploading to the stash is complete.  Fitting this into the overall upload flow (initially publishing from the stash to the main File page) is in progress and under review.
* Extension and special page setup
* WAV support for TimedMediaHandler
* Some refactoring to UploadWizard (which PronunciationRecording is using as a library).  Mostly merged, a little more in progress
* Upload permissions check (not merged)
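
The two-step stash flow mentioned in the first item can be sketched as the request parameters for each step. This is a rough illustration based on the MediaWiki `action=upload` API module as generally documented, not the extension's own code; the helper names are hypothetical, and the parameter names should be checked against the API documentation of the MediaWiki version in use. The file bytes themselves would be sent separately as a multipart `file` field, which is left out here.

```python
def stash_upload_params(filename, token):
    """Parameters for step 1: upload the recording to the stash.

    The server is expected to answer with a file key that
    identifies the stashed file (assumption: returned as
    upload.filekey in the JSON response).
    """
    return {
        "action": "upload",
        "format": "json",
        "stash": "1",          # keep the file in the stash, do not publish yet
        "filename": filename,
        "token": token,        # edit/CSRF token of the logged-in user
    }


def publish_from_stash_params(filekey, filename, page_text, token,
                              comment="Uploaded pronunciation recording"):
    """Parameters for step 2: publish the stashed file to its File page."""
    return {
        "action": "upload",
        "format": "json",
        "filekey": filekey,    # key returned by the stash step
        "filename": filename,
        "text": page_text,     # initial wikitext of the File page
        "comment": comment,
        "token": token,
    }
```

Splitting the upload this way lets the user listen to the stashed recording and fill in the metadata before anything becomes publicly visible.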
Comment 42 Matthew Flaschen 2013-09-27 23:48:22 UTC
This is not fully complete.  However, it's complete enough that it could be useful.  You can try it at http://pronunciationrecording.instance-proxy.wmflabs.org/wiki/Special:PronunciationRecording .

The main aspect that is not ready is integration into Wiktionary pages.  It also cannot currently upload to Commons from another wiki; it uploads to the current wiki.

However, it does generate the Information template and categories needed for Commons.
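
As a rough idea of what such generated page text can look like, here is a sketch that assembles an Information template, a license template and the "$LANGUAGE pronunciation" category suggested in comment 6. It is not the extension's actual output, which is not shown in this report; the function name, the description wording and the license template name passed in are hypothetical.

```python
def build_file_page(word, language, author, date, license_template):
    """Assemble wikitext for the File page of a pronunciation recording.

    All arguments are plain strings. `license_template` is the bare
    template name, e.g. "cc-by-sa-3.0" (illustrative value only).
    """
    lines = [
        "{{Information",
        "|description=Pronunciation of the " + language + " word \"" + word + "\"",
        "|date=" + date,
        "|source={{own}}",
        "|author=[[User:" + author + "]]",
        "}}",
        "",
        "{{" + license_template + "}}",
        "",
        "[[Category:" + language + " pronunciation]]",
    ]
    return "\n".join(lines)
```

A hidden tracking category for recordings made with the tool, as also suggested in comment 6, could be appended in the same way.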
Comment 43 Matthew Flaschen 2013-09-27 23:54:46 UTC
Actually, use http://pronunciationrecording.instance-proxy.wmflabs.org/wiki/Special:PronunciationRecording?debug=true due to bug 54351.

Also, note that you need to use a modern browser with sufficient Web Audio support.  Currently, that probably means Chrome, but Firefox is working on the same standards, so it will eventually work in Firefox and other browsers.
Comment 44 Quim Gil 2013-10-22 19:39:24 UTC
If you have open tasks or bugs left, one possibility is to list them at https://www.mediawiki.org/wiki/Google_Code-In and volunteer yourself as mentor.

We have heard from Google and free software projects participating in Code-in that students participating in this program have done great work finishing and polishing GSoC projects, often mentored by the former GSoC student. The key is to be able to split the pending work into little tasks.

More information in the wiki page. If you have questions you can ask there or you can contact me directly.
Comment 45 Andre Klapper 2014-02-27 17:25:04 UTC
Rahul: Are you (still) working on this? If not, please reset the assignee to default and the status to NEW. Thanks!
Comment 46 Matthew Flaschen 2014-02-27 19:21:46 UTC
I don't know if we should keep this open now that it's an in-progress extension with its own Bugzilla component.

However, if we want to, we can use it to mark when the initial Wiktionary functionality (see https://www.mediawiki.org/wiki/User:Rahul21/Gsoc2013/Proposal#Simple_workflow) is done.  Basically, a Minimum Viable Product.
Comment 47 Rainer Rillke @commons.wikimedia 2014-03-08 22:57:52 UTC
I didn't see any progress here, therefore I re-launched
https://meta.wikimedia.org/wiki/Grants:IEG/Finish_Pronunciation_Recording

You may see this as a competing product or a chance to get some useful feedback. Cheers!
Comment 48 Matthew Flaschen 2014-03-27 17:36:55 UTC
(In reply to Matthew Flaschen from comment #42)
> This is not fully complete.  However, it's complete enough that it could be
> useful.  You can try it at
> http://pronunciationrecording.instance-proxy.wmflabs.org/wiki/Special:PronunciationRecording .

It's moved to http://pronunciationrecording.wmflabs.org/wiki/Special:PronunciationRecording?debug=true .  The new server is open for normal account creations.

If anyone would like special access (e.g. an admin account to test gadgets), let me know.
