Last modified: 2013-06-13 17:26:31 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T23164, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 21164 - Regularly publish updated word lists and definition lists
Status: RESOLVED WORKSFORME
Product: Wiktionary tools
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2009-10-17 06:37 UTC by Andrew Dunbar
Modified: 2013-06-13 17:26 UTC


Description Andrew Dunbar 2009-10-17 06:37:22 UTC
The world at large would greatly benefit from Wiktionary publishing lists of words and definitions on a regular basis, much as Wikimedia publishes raw dump files of all its wikis.

I envisage several levels. The rougher ones will be trivial to implement; the better ones will take a little more work. For each level, English is obviously wanted, with all other languages also desired.

1. Raw list of words (page titles).
2. List of words with all "common misspellings" removed.
3. As per 2. but with all inflected forms removed (alternative spellings should stay)

4. List of words per 3 (or possibly 2) with all definitions but lacking information on homonyms, example sentences, quotations, etc
5. As per 3 but with senses clearly separated from homonyms
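
Levels 1 to 3 could be sketched roughly as below. This is only an illustration under stated assumptions: `pages` is a hypothetical mapping from page title to wikitext, definition lines are assumed to start with a single `#`, and misspellings and inflected forms are assumed to be marked with templates such as `{{misspelling of}}`, `{{plural of}}` and `{{inflection of}}`. A real filter would need a much longer template list and per-language handling.

```python
import re

# Assumed template names; a real list would be longer and language-specific.
MISSPELLING = re.compile(r"\{\{\s*misspelling of\s*\|")
FORM_OF = re.compile(r"\{\{\s*(?:plural of|inflection of)\s*\|")


def definition_lines(wikitext):
    """Return lines starting with a single '#', skipping '#:', '#*' and '##'."""
    return [ln for ln in wikitext.splitlines() if re.match(r"#(?![#:*])", ln)]


def all_match(wikitext, pattern):
    """True if the page has definitions and every one matches the pattern."""
    defs = definition_lines(wikitext)
    return bool(defs) and all(pattern.search(ln) for ln in defs)


def word_lists(pages):
    """Build the level 1, 2 and 3 word lists from a title -> wikitext mapping."""
    level1 = sorted(pages)
    level2 = [w for w in level1 if not all_match(pages[w], MISSPELLING)]
    level3 = [w for w in level2 if not all_match(pages[w], FORM_OF)]
    return level1, level2, level3


# Hypothetical sample pages, not taken from any real dump.
pages = {
    "cat": "# A domesticated feline.\n#: I have a cat.",
    "cats": "# {{plural of|en|cat}}",
    "recieve": "# {{misspelling of|en|receive}}",
    "colour": "# {{alternative spelling of|en|color}}",
}
level1, level2, level3 = word_lists(pages)
print(level1)
print(level2)
print(level3)
```

A page is dropped only when every definition line is a misspelling or form-of entry, so a title that is also a word in its own right stays in the list, and alternative spellings are kept.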

Note that 4 and 5 will require some structure. A very basic XML format seems obvious.
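
As one possible shape for such a format, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names (`wordlist`, `entry`, `homonym`, `sense`) and the input structure are assumptions made for illustration; nothing here is an agreed or existing Wiktionary format.

```python
import xml.etree.ElementTree as ET


def build_wordlist_xml(entries, lang="en"):
    """Build an XML tree from {word: [[sense, ...], ...]}.

    Each inner list holds the senses of one homonym, so senses stay
    grouped under the homonym they belong to (level 5).
    """
    root = ET.Element("wordlist", {"lang": lang})
    for word in sorted(entries):
        entry = ET.SubElement(root, "entry", {"word": word})
        for homonym_senses in entries[word]:
            homonym = ET.SubElement(entry, "homonym")
            for sense in homonym_senses:
                ET.SubElement(homonym, "sense").text = sense
    return root


# Hypothetical sample data, written for this example.
sample = {
    "bank": [
        ["An institution that handles money.", "A store of something held in reserve."],
        ["The sloping land beside a river."],
    ],
}
xml_text = ET.tostring(build_wordlist_xml(sample), encoding="unicode")
print(xml_text)
```

Level 4 would be the same structure with the `homonym` layer left out, so that all senses sit directly under `entry`.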
Comment 1 Andrew Dunbar 2009-12-27 01:12:03 UTC
There is Python code for a Wiktionary translation extractor now on toolserver: https://svn.toolserver.org/svnroot/p_enwikt/translations/

It's not fully documented and is no longer maintained by its author, but it apparently worked on many different Wiktionaries.
Comment 2 Andre Klapper 2013-06-13 17:26:31 UTC
Wiktionary dumps are available at http://dumps.wikimedia.org/backup-index.html - closing as WORKSFORME.
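
For the simplest case in the description, a raw list of page titles, a dump can be streamed rather than loaded whole. The sketch below assumes the dump is MediaWiki export XML in which each `<page>` has a `<title>` and an `<ns>` child, with `0` meaning the main namespace; older export schema versions may lack `<ns>`. The sample document and its namespace URI are made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_main_titles(stream):
    """Yield titles of main-namespace pages from an export XML stream."""
    title = ns = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = elem.text
        elif name == "page":
            if ns == "0" and title:
                yield title
            title = ns = None
            elem.clear()  # release the finished page to keep memory flat


# Tiny stand-in for a real dump file.
sample = io.BytesIO(b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>dictionary</title><ns>0</ns></page>
  <page><title>Talk:dictionary</title><ns>1</ns></page>
  <page><title>word</title><ns>0</ns></page>
</mediawiki>""")
titles = list(iter_main_titles(sample))
print(titles)
```

For a real bzip2-compressed dump, the stream could come from `bz2.open(path, "rb")`, where `path` is whichever dump file was downloaded.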
