Last modified: 2013-06-13 17:26:31 UTC
The world at large would benefit greatly if Wiktionary published lists of words and definitions on a regular basis, much as Wikimedia publishes raw dump files of all its wikis. I envisage several levels: the rougher ones will be trivial to implement, while the better ones will take a little more work. For each level, English is obviously wanted, and all other languages are also desired.

1. Raw list of words (page titles).
2. List of words with all "common misspellings" removed.
3. As per 2, but with all inflected forms removed (alternative spellings should stay).
4. List of words as per 3 (or possibly 2) with all definitions, but lacking information on homonyms, example sentences, quotations, etc.
5. As per 3, but with senses clearly distinguished from homonyms.

Note that levels 4 and 5 will require some structure; a very basic XML format seems the obvious choice.
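The "very basic XML format" for levels 4 and 5 could look something like the sketch below. The element names (`wordlist`, `entry`, `homonym`, `sense`) and the `word`/`lang` attributes are hypothetical, not an existing Wiktionary format. The sample definitions are made up for illustration. The only difference between the two levels is whether senses are grouped per homonym.

```python
import xml.etree.ElementTree as ET

def build_wordlist(entries, separate_homonyms=True):
    """Build a hypothetical <wordlist> document.

    entries: list of (word, language, homonyms), where homonyms is a list of
    homonyms and each homonym is a list of sense definitions (plain strings).
    With separate_homonyms=False (level 4) all senses are flattened directly
    under <entry>; with True (level 5) they are grouped in <homonym> elements.
    """
    root = ET.Element("wordlist")
    for word, language, homonyms in entries:
        entry = ET.SubElement(root, "entry", word=word, lang=language)
        for senses in homonyms:
            # Level 5 groups senses per homonym; level 4 attaches them to the entry.
            parent = ET.SubElement(entry, "homonym") if separate_homonyms else entry
            for definition in senses:
                ET.SubElement(parent, "sense").text = definition
    return root

# Made-up sample data, for illustration only.
SAMPLE = [
    ("bank", "English", [
        ["An institution that accepts deposits of money."],
        ["The sloping edge of a river."],
    ]),
]

if __name__ == "__main__":
    print(ET.tostring(build_wordlist(SAMPLE), encoding="unicode"))
    print(ET.tostring(build_wordlist(SAMPLE, separate_homonyms=False), encoding="unicode"))
```

Keeping both levels in one schema means a consumer that only wants level 4 can simply ignore the `homonym` grouping.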
There is now Python code for a Wiktionary translation extractor on the Toolserver: https://svn.toolserver.org/svnroot/p_enwikt/translations/ It is not fully documented and is no longer maintained by its author, but it apparently worked on many different Wiktionaries.
Wiktionary dumps are available at http://dumps.wikimedia.org/backup-index.html - closing as WORKSFORME.
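With the dumps available, level 1 (the raw list of page titles) can be pulled straight out of a pages XML dump. The sketch below streams the MediaWiki export XML and keeps only main-namespace pages. It assumes a dump recent enough to carry an `<ns>` element on each `<page>`, and it assumes main namespace `0` is where the dictionary entries live. The inline `SAMPLE` document is a made-up miniature dump for testing, not real dump content.

```python
import bz2
import io
import sys
import xml.etree.ElementTree as ET

def local(tag):
    """Strip the '{namespace}' prefix the MediaWiki export schema puts on every tag."""
    return tag.rsplit("}", 1)[-1]

def iter_titles(fileobj, main_namespace_only=True):
    """Yield page titles from a MediaWiki XML dump without loading it all into memory."""
    title = ns = None
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = elem.text
        elif name == "page":
            # Assumption: namespace "0" holds the dictionary entries.
            if not main_namespace_only or ns == "0":
                yield title
            title = ns = None
            elem.clear()  # free the finished page's children

# Made-up miniature dump, for illustration only.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
<siteinfo><sitename>Wiktionary</sitename></siteinfo>
<page><title>dictionary</title><ns>0</ns><id>1</id></page>
<page><title>Wiktionary:Main Page</title><ns>4</ns><id>2</id></page>
</mediawiki>"""

if __name__ == "__main__":
    # With a path argument, read a bz2-compressed dump; otherwise use the inline sample.
    if len(sys.argv) > 1:
        with bz2.open(sys.argv[1], "rb") as dump:
            for t in iter_titles(dump):
                print(t)
    else:
        for t in iter_titles(io.BytesIO(SAMPLE)):
            print(t)
```

For example, `python titles.py enwiktionary-latest-pages-articles.xml.bz2` (the script name and dump filename are assumptions) would print one title per line. Levels 2 and 3 would then need further filtering based on each page's content, which this sketch does not attempt.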