Last modified: 2014-04-08 12:11:21 UTC
Several people (e.g. https://bugzilla.wikimedia.org/show_bug.cgi?id=31221, though the original report asks for computer text-to-speech, and http://comments.gmane.org/gmane.org.wikimedia.wiktionary/1265) have requested a tool to simplify the workflow of recording the pronunciation of a word. The basic idea is to provide a wizard flow for picking a word (which may be the page you're on), recording it, choosing a free license, then uploading it to Wikimedia Commons with the appropriate metadata.
Note: This will also need to take into account the L2 sections, which are used to indicate the language. For example, https://en.wiktionary.org/wiki/chance#English https://en.wiktionary.org/wiki/chance#French etc.
Well, the tool could simply be invoked via a parser function called with both the word and the language (and possibly something else for homographs); this seems the least of the problems. :)
Just like the "Edit" link appears next to each section, the [Record button] could be placed next to any word missing voice recorded pronunciation, right? Question: what is the page describing the current workflow? It is not evident to see how a user can contribute a pronunciation now. Also: what happens if an audio file already exists but I think I can contribute a better one e.g. because of audio quality or some other defect? Needless to say, this is a feature that is calling for a mobile UI sooner or later... Think of all those languages spoken in countries with a high penetration of mobile devices.
The current procedure on English Wiktionary is https://en.wiktionary.org/wiki/Help:Audio_pronunciations . Other projects probably have somewhat different procedures. I suggest the tool initially only show on pages without existing recordings. It would be good to solve that problem eventually, but it is more likely to require discussion (should we keep both because they have slightly different accents?, etc.) Also, I skipped the final part of the flow, adding the template (e.g. Template:audio on English Wiktionary) to the Wiktionary page.
> I suggest the tool initially only show on pages without existing recordings.
Not sure I would agree. The many dialects of English, for example, can be dramatically different; 'schedule' springs to mind[1]. Although I'd love to get into a discussion about collecting metadata with recordings (geoip location of the author, self-identified dialect origins, etc.), I think at this point we should focus on the basic mechanics: a user button to record a brief audio snippet which is auto-uploaded to Commons with authoring/license templates, and the local Wiktionary page updated. [1] https://en.wiktionary.org/wiki/schedule#Pronunciation
I agree. I wasn't proposing complicated metadata, just the basics (license template of course, Category:$LANGUAGE pronunciation, maybe a hidden category to mark recordings from the tool). The reason I suggested keeping it simple by showing on pages without recordings is to avoid collisions. E.g. what happens if I live in the U.S. but have a different pronunciation of https://commons.wikimedia.org/wiki/File:En-us-associate.ogg ? But it looks like they resolve collisions by just adding a number, https://commons.wikimedia.org/wiki/File:En-us-associate-2.ogg, which is easy enough for a tool to do.
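For illustration, a minimal sketch of how a tool could pick the next free filename under the "append a number" convention mentioned above, using the standard Commons query API; the function name and pattern are illustrative, not part of any existing tool:

```javascript
// Sketch: find the first unused Commons filename for a pronunciation,
// following the convention En-us-associate.ogg, En-us-associate-2.ogg, ...
// Uses only the standard MediaWiki action API (action=query).
async function nextFreeFilename(base, ext) {
    for (let n = 1; ; n++) {
        const name = n === 1 ? `File:${base}.${ext}` : `File:${base}-${n}.${ext}`;
        const url = 'https://commons.wikimedia.org/w/api.php?action=query&format=json&origin=*' +
            '&titles=' + encodeURIComponent(name);
        const data = await (await fetch(url)).json();
        const page = Object.values(data.query.pages)[0];
        if (page.missing !== undefined) {
            return name; // this title does not exist yet
        }
    }
}

// Usage (illustrative): nextFreeFilename('En-us-associate', 'ogg').then(console.log);
```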
I have prepared a rough project proposal. Please do give me your feedback and suggestions so that I can improve on it: https://www.mediawiki.org/wiki/User:Rahul21/Gsoc
Hi Rahul, through the different discussions so far we have seen that this project might be trickier than it looked initially. And the main problem is still that no mentor is stepping in. I recommend you wait a couple more days and then make a decision: bet blindly on this proposal with the hope that things will be solved in the next weeks, or put it aside and bet on some other idea for GSoC. You can still work on a voice recording tool as a pet project, but from my point of view it still lacks some essential factors to consider it for this GSoC: no mentor and no enthusiastic response from the Wiktionary community.
On the Wiktionary front, has anyone reached out to them in a prominent place on-wiki? As for the technology, there seems to be at least one workable approach using the HTML Media Capture API (http://www.w3.org/2009/dap/wiki/ImplementationStatus#HTML_Media_Capture and https://news.ycombinator.com/item?id=4001140). I haven't tested this myself yet; the browsers that support it are mostly mobile. See http://mobilehtml5.org/ts/?id=23 for the syntax and a simple page for testing. As getUserMedia matures, that could become an alternative approach. So it might be workable if a mentor becomes available.
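For reference, a minimal sketch of what the HTML Media Capture approach looks like from script, assuming a browser that honours the capture hint (mostly mobile browsers); everything here is illustrative rather than taken from an existing tool:

```javascript
// Sketch: HTML Media Capture lets a (mostly mobile) browser hand us a
// recorded audio file through a plain file input; no Flash involved.
const input = document.createElement('input');
input.type = 'file';
input.accept = 'audio/*';          // ask for audio
input.setAttribute('capture', ''); // hint: capture from the microphone
input.addEventListener('change', () => {
    const file = input.files[0];   // a File (Blob) containing the recording
    if (file) {
        console.log('Recorded', file.name, file.type, file.size, 'bytes');
        // From here a wizard could upload the blob via the upload API.
    }
});
document.body.appendChild(input);
```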
I will surely have a look at them. Michael Dale is ready to mentor :)
Confirmed. As mentioned on IRC, it would be nice to also support recording an article or paragraph read out loud, for the spoken articles project.
(In reply to comment #9)
> On the Wiktionary front, has anyone reached out to them in a prominent place on-wiki?
The central discussion place for Wiktionary is Wiktionary-l. It's not Wikipedia or Wikisource. Anyway, I'll send more notifications to all languages.
Sometimes you help by doing something, and sometimes you help by NOT doing something. :) I'm happy to have helped indirectly finding a mentor for this project. The GSoC process continues. Thank you Rahul, thank you Michael, and thank you to the rest of the people helping to move this feature forward.
Adding a dependency on WAV support to document the discussion of the past hours on wikitech-l. Feel free to change this if the plan changes.
(In reply to comment #13)
> I'm happy to have helped indirectly finding a mentor for this project. The GSoC process continues. Thank you Rahul, thank you Michael, and thank you to the rest of the people helping to move this feature forward.
Hello all, I have developed a MediaWiki extension (not released yet) that plays an ogg file on hover for any word (so far just English, about 2,500 words) that it knows about in any page. It also inserts a play button on hover to keep playing and highlighting all the words it knows. In addition, a very short definition is displayed, also with words hoverable and playable. The words are from Simple English Wiktionary. I also have made a JavaScript sound recorder that uses WAMI (https://code.google.com/p/wami-recorder/) to record and SoundManager (http://www.schillmania.com/projects/soundmanager2/) for playback. I believe both use HTML5 if it is available, with a fallback to Flash if it is not. I also use ffmpeg and sox to do some server-side processing (wav->ogg and trimming silence at the start and end of the word). This is part of another project I have been working on to help someone learn a new language. I could make this available to anyone who wants it, or I could maybe take a shot at implementing Rahul's suggestions. I am totally new to MediaWiki, so any guidance would be helpful.
(In reply to comment #15)
> I also have made a JavaScript sound recorder that uses WAMI (https://code.google.com/p/wami-recorder/) to record and SoundManager (http://www.schillmania.com/projects/soundmanager2/) for playback. I believe both use HTML5 if it is available, with a fallback to Flash if it is not.
As far as I can tell from the source code (https://code.google.com/p/wami-recorder/source/browse/), WAMI requires Flash.
> As far as I can tell from the source code (https://code.google.com/p/wami-recorder/source/browse/), WAMI requires Flash.
I have no idea; is Flash a no-no for MediaWiki?
(In reply to comment #17)
> > As far as I can tell from the source code (https://code.google.com/p/wami-recorder/source/browse/), WAMI requires Flash.
>
> I have no idea; is Flash a no-no for MediaWiki?
Flash is very controversial for political reasons. It may be acceptable as a fallback mechanism on old browsers that don't support the latest and greatest HTML features; however, it's a no-no to require Flash (it might be considered acceptable if the particular Flash thing works with Gnash). Java is considered much more OK, but still not exactly loved either.
(In reply to comment #15)
> an ogg file on hover for any word (so far just English, about 2,500 words) that it knows about in any page.
Sounds like fun; would work best as a gadget, and query the real Wiktionary.
> I also have made a JavaScript sound recorder that uses WAMI (https://code.google.com/p/wami-recorder/) to record and SoundManager (http://www.schillmania.com/projects/soundmanager2/) for playback.
I like Flash fallbacks. I understand we should not require Flash, but as a fallback it's great. It's much less of a patent, fragmentation, and security failure than in-browser Java.
> I also use ffmpeg and sox to do some server-side processing (wav->ogg and trimming silence at the start and end of the word). This is part of another project I have been working on to help someone learn a new language.
We should try to trim client-side if possible.
> I could make this available to anyone who wants it, or I could maybe take a shot at implementing Rahul's suggestions. I am totally new to MediaWiki, so any guidance would be helpful.
Cool, I am sure Rahul will touch base with you.
(In reply to comment #15)
> I also use ffmpeg and sox to do some server-side processing (wav->ogg and trimming silence at the start and end of the word). This is part of another project I have been working on to help someone learn a new language.
I am interested. Could you come on IRC where we can have a good discussion regarding this?
(In reply to comment #19)
> (In reply to comment #15)
> > an ogg file on hover for any word (so far just English, about 2,500 words) that it knows about in any page.
>
> Sounds like fun; would work best as a gadget,
How does one go about getting a user script approved as a gadget?
> and query the real Wiktionary.
The extension opens a new tab on the real Wiktionary for the definition of any word on mouse click, but on hover it just uses the definition (if it exists) in the small dictionary I have made (a JSON file, about 60 KB compressed). The sound files it uses are mostly from what is used in the English Wiktionary, but I am in the process of recording new ones that "flow" better when spoken one after another in a sentence.
> > I also use ffmpeg and sox to do some server-side processing (wav->ogg and trimming silence at the start and end of the word). This is part of another project I have been working on to help someone learn a new language.
>
> We should try to trim client-side if possible.
Yes, that would be a great solution to the problem of getting ffmpeg and sox executables running on a variety of servers, but I have no idea how to do that. Perhaps Java? If anyone knows how, I am all ears.
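On the client-side trimming question, a rough sketch of how leading and trailing silence could be dropped with the Web Audio API, assuming the recording has already been decoded into an AudioBuffer; the threshold is an arbitrary placeholder:

```javascript
// Sketch: drop near-silent samples from the start and end of an AudioBuffer.
// The threshold is an arbitrary amplitude; real code would probably expose it.
function trimSilence(ctx, buffer, threshold = 0.01) {
    const data = buffer.getChannelData(0);   // mono is enough for a single word
    let start = 0;
    let end = data.length - 1;
    while (start < end && Math.abs(data[start]) < threshold) start++;
    while (end > start && Math.abs(data[end]) < threshold) end--;

    // Copy the kept region into a new buffer at the same sample rate.
    const trimmed = ctx.createBuffer(1, end - start + 1, buffer.sampleRate);
    trimmed.getChannelData(0).set(data.subarray(start, end + 1));
    return trimmed;
}
```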
> I am interested. Could you come on IRC where we can have a good discussion regarding this?
Hi Rahul, sure, how do I go about doing that?
See http://www.mediawiki.org/wiki/MediaWiki_on_IRC for more info.
(In reply to comment #21)
> (In reply to comment #19)
> > (In reply to comment #15)
> > > an ogg file on hover for any word (so far just English, about 2,500 words) that it knows about in any page.
> >
> > Sounds like fun; would work best as a gadget,
>
> How does one go about getting a user script approved as a gadget?
Each wiki approves them separately. See https://en.wiktionary.org/wiki/Wiktionary:Gadgets, though I'm not sure where you ask for it to be approved. You can ask at https://en.wiktionary.org/wiki/Wiktionary:Grease_pit . Please use a separate bug report for the mini-dictionary (ogg, short definition) on-hover idea. It's interesting, but separate from this.
(In reply to comment #18)
> (In reply to comment #17)
> Flash is very controversial for political reasons. It may be acceptable as a fallback mechanism on old browsers that don't support the latest and greatest HTML features; however, it's a no-no to require Flash (it might be considered acceptable if the particular Flash thing works with Gnash). Java is considered much more OK, but still not exactly loved either.
I was wrong: WAMI knows nothing of HTML5 and only uses Flash, just client-side, nothing on the server. From their page (at https://code.google.com/p/wami-recorder/): "The WAMI recorder uses a light-weight Flash app to ship audio from client to server via a standard HTTP POST. Apart from the security settings to allow microphone access, the entire interface can be constructed in HTML and Javascript." So, is this a deal breaker? Sounds like it. If so, sorry for wasting your time; perhaps it would be better to wait for a GSoC solution that uses the latest and greatest technology. Unless someone has a suggestion.
(In reply to comment #25)
> I was wrong: WAMI knows nothing of HTML5 and only uses Flash, just client-side, nothing on the server. From their page (at https://code.google.com/p/wami-recorder/):
It's okay Ron; you took interest and wanted to help, and that itself is a positive sign!
I don't think it's a deal breaker; Flash makes a great fallback. If we can use the WebRTC solution for browsers that support it, then using WAMI as a fallback is fine. The restriction against Flash for Wikimedia projects is based on the idea that you don't exclusively deliver an experience for proprietary platforms. Using Flash or Java as a fallback is fine, as long as an open-standard / free-browser solution is also equally well supported.
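As an illustration of that policy, a minimal sketch of the detection logic, where recordWithWebAudio() and launchFlashRecorder() are hypothetical wrappers (e.g. around a Web Audio recorder and WAMI respectively), not real library calls:

```javascript
// Sketch: prefer getUserMedia (open standard) and only fall back to a
// Flash-based recorder (e.g. WAMI) when the browser has no microphone API.
// recordWithWebAudio() and launchFlashRecorder() are hypothetical wrappers.
function startRecorder() {
    const hasGetUserMedia = navigator.mediaDevices && navigator.mediaDevices.getUserMedia;
    if (hasGetUserMedia) {
        return navigator.mediaDevices.getUserMedia({ audio: true })
            .then((stream) => recordWithWebAudio(stream))      // hypothetical
            .catch((err) => console.error('Mic access denied:', err));
    }
    // No native support: hand off to the Flash fallback.
    return launchFlashRecorder();                               // hypothetical
}
```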
Adobe stopped producing Flash for Linux last year or the year before.
v11.2 is the last version supported on Linux.
http://9to5google.com/2012/02/22/pepper-based-flash-player-coming-to-chrome-later-this-year-adobe-dropping-standalone-plug-in-download-on-linux/ Nice article to read :)
Flash support for Linux is not relevant. The point is that you can get the same experience (with WebRTC) with free software. The idea is to give an equal experience on Flash vs. free-software platforms.
(In reply to comment #31)
> Flash support for Linux is not relevant. The point is that you can get the same experience (with WebRTC) with free software. The idea is to give an equal experience on Flash vs. free-software platforms.
I have been able to get a sound recorder working with the HTML5 Web Audio API in Google's Chrome (Canary version). It is much nicer than the Flash version using WAMI that I already had, in that it allows things such as user-controlled silence removal in the browser. I will next try to cram them both together so as to have the Flash fallback work as closely as possible to the HTML5 version. I also want to be able to edit pre-existing sounds as well as sounds recorded with a microphone.
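For context, a rough sketch of microphone capture with the Web Audio API along the lines described above (ScriptProcessorNode was the API available in browsers of that era; it has since been deprecated in favour of AudioWorklet, and the buffer handling here is simplified):

```javascript
// Sketch: capture raw PCM from the microphone with an AudioContext and a
// ScriptProcessorNode; the collected Float32 chunks could later be encoded
// to WAV client-side before upload.
const chunks = [];

navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
    const ctx = new AudioContext();
    const source = ctx.createMediaStreamSource(stream);
    const processor = ctx.createScriptProcessor(4096, 1, 1);

    processor.onaudioprocess = (e) => {
        // Copy each block; the underlying buffer is reused by the browser.
        chunks.push(new Float32Array(e.inputBuffer.getChannelData(0)));
    };

    source.connect(processor);
    processor.connect(ctx.destination); // keeps the node processing in some engines
});
```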
(In reply to comment #32)
> I have been able to get a sound recorder working with the HTML5 Web Audio API in Google's Chrome (Canary version).
Can you please specify the version, and did you enable the "Web Audio Input" flag via chrome://flags?
(In reply to comment #33)
> (In reply to comment #32)
> > I have been able to get a sound recorder working with the HTML5 Web Audio API in Google's Chrome (Canary version).
>
> Can you please specify the version, and did you enable the "Web Audio Input" flag via chrome://flags?
The Chrome is Version 28.0.1499.0 canary (https://www.google.com/intl/en/chrome/browser/canary.html), and there is no "Web Audio Input" flag in chrome://flags for that version of Chrome.
I'm removing bug 20252 as a dependency and moving it to "see also". It's a nice-to-have, but it's not a blocker in my opinion. These are going to be short files (< 5 seconds, most likely). Bug 20252 could also be done later, and the files transcoded internally.
I have undertaken this as my GSoC project; Michael Dale and Matthew Flaschen will be my mentors during the course. The primary benefit is laying the groundwork for contributor-created audio on MediaWiki sites in any current browser. I have done a little research so far on the method to upload the pronunciations, and based on that the use of the upload API is essential; other APIs like the edit API will also come in handy. The first step that I plan to do is add .wav support to the TimedMediaHandler (TMH) extension. Link to my proposal: http://www.mediawiki.org/wiki/User:Rahul21/Gsoc
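A rough sketch of the upload step, using only the documented action=upload parameters with the stash option; the endpoint path, filename, and summary are illustrative, and error handling is omitted:

```javascript
// Sketch: push a recorded WAV blob into the upload stash via the core
// action=upload API, so it can be published to a File page in a later step.
// Filename and summary are illustrative; assumes a modern MediaWiki API.
async function stashRecording(wavBlob, filename) {
    const api = '/w/api.php';

    // 1. Fetch a CSRF token for the logged-in user.
    const tokenRes = await fetch(api + '?action=query&meta=tokens&type=csrf&format=json',
        { credentials: 'same-origin' });
    const token = (await tokenRes.json()).query.tokens.csrftoken;

    // 2. POST the blob with stash=1 so nothing is published yet.
    const form = new FormData();
    form.append('action', 'upload');
    form.append('format', 'json');
    form.append('filename', filename);          // e.g. 'En-us-example.wav'
    form.append('stash', '1');
    form.append('token', token);
    form.append('file', wavBlob, filename);

    const res = await fetch(api, { method: 'POST', body: form, credentials: 'same-origin' });
    return (await res.json()).upload.filekey;    // used later to publish from the stash
}
```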
Change 75770 had a related patch set uploaded by Rahul21: Pronunciation Recording Tool( Not working ) https://gerrit.wikimedia.org/r/75770
Change 75770 abandoned by Rahul21: Pronunciation Recording Tool( Not working ) https://gerrit.wikimedia.org/r/75770
6f9c18509b858d89e50d145a685eb5308dcdff7e implemented a special page, Special:PronunciationRecording. That includes support for recording a pronunciation and playing it back. The next main step is allowing uploading to the same wiki where the special page is (bug 53127). There is also now a Bugzilla component for this extension.
GSoC "soft pencils down" date was yesterday and all coding must stop on 23 September. Has this project been completed?
The overall project has not been completed, so Rahul will have to keep working until the final pencils down (September 23, as you noted). The following parts are complete (parts that still need final review are noted); Rahul can add anything I'm missing:
* Uploading to the stash is complete. Fitting this into the overall upload flow (initially publishing from the stash to the main File page) is in progress and under review.
* Extension and special page setup
* WAV support for TimedMediaHandler
* Some refactoring to UploadWizard (which PronunciationRecording is using as a library); mostly merged, a little more in progress
* Upload permissions check (not merged)
This is not fully complete. However, it's complete enough that it could be useful. You can try it at http://pronunciationrecording.instance-proxy.wmflabs.org/wiki/Special:PronunciationRecording . The main aspect that is not ready is integration into Wiktionary pages. It also cannot currently upload to Commons from another wiki (it uploads to the current wiki). However, it does generate the Information template and categories needed for Commons.
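For illustration, a sketch of what generating the file description and publishing a stashed upload could look like; the Information template fields, license, and category follow common Commons practice and are illustrative rather than the extension's exact output:

```javascript
// Sketch: build a Commons-style file description and publish a stashed
// upload with it. Template fields, license, and category are illustrative.
function buildDescription(word, langName, username) {
    return [
        '== {{int:filedesc}} ==',
        '{{Information',
        `|description={{en|1=Pronunciation of "${word}" in ${langName}.}}`,
        '|source={{own}}',
        `|author=[[User:${username}|${username}]]`,
        '|date=' + new Date().toISOString().slice(0, 10),
        '}}',
        '',
        '== {{int:license-header}} ==',
        '{{self|cc-by-sa-4.0}}',
        '',
        `[[Category:${langName} pronunciation]]`
    ].join('\n');
}

// Publishing from the stash reuses action=upload with the filekey returned
// earlier instead of re-sending the audio data; token is the same CSRF token.
async function publishFromStash(filekey, filename, text, token) {
    const form = new FormData();
    form.append('action', 'upload');
    form.append('format', 'json');
    form.append('filekey', filekey);
    form.append('filename', filename);
    form.append('text', text);         // the file description page wikitext
    form.append('token', token);
    const res = await fetch('/w/api.php', { method: 'POST', body: form, credentials: 'same-origin' });
    return res.json();
}
```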
Actually, use http://pronunciationrecording.instance-proxy.wmflabs.org/wiki/Special:PronunciationRecording?debug=true due to bug 54351. Also, note that you need to use a modern browser with sufficient Web Audio support. Currently, that probably means Chrome, but Firefox is working on the same standards, so it will eventually work in Firefox and other browsers.
If you have open tasks or bugs left, one possibility is to list them at https://www.mediawiki.org/wiki/Google_Code-In and volunteer yourself as a mentor. We have heard from Google and from free software projects participating in Code-in that students participating in this program have done great work finishing and polishing GSoC projects, many times mentored by the former GSoC student. The key is to be able to split the pending work into little tasks. More information is on the wiki page. If you have questions you can ask there, or you can contact me directly.
Rahul: Are you (still) working on this? If not, please reset the assignee to default and the status to NEW. Thanks!
I don't know if we should keep this open now that it's an in-progress extension with its own Bugzilla component. However, if we want to, we can use it to mark when the initial Wiktionary functionality (see https://www.mediawiki.org/wiki/User:Rahul21/Gsoc2013/Proposal#Simple_workflow) is done; basically, a Minimum Viable Product.
I didn't see any progress here, therefore I re-launched https://meta.wikimedia.org/wiki/Grants:IEG/Finish_Pronunciation_Recording . You may see this as a competing product or as a chance to get some useful feedback. Cheers!
(In reply to Matthew Flaschen from comment #42)
> This is not fully complete. However, it's complete enough that it could be useful. You can try it at http://pronunciationrecording.instance-proxy.wmflabs.org/wiki/Special:PronunciationRecording .
It's moved to http://pronunciationrecording.wmflabs.org/wiki/Special:PronunciationRecording?debug=true . The new server is open for normal account creations. If anyone would like special access (e.g. an admin account to test gadgets), let me know.