Last modified: 2014-09-22 15:21:52 UTC
MediaWiki has the extracts API module. It should be implemented in Pywikibot too.

* prop=extracts (ex)
  Returns plain-text or limited HTML extracts of the given page(s).
  https://www.mediawiki.org/wiki/Extension:TextExtracts#API
  This module requires read rights.

Parameters:
  exchars          - How many characters to return. The actual text returned might be slightly longer.
                     The value must be no less than 1.
  exsentences      - How many sentences to return.
                     The value must be between 1 and 10.
  exlimit          - How many extracts to return.
                     No more than 20 (20 for bots) allowed.
                     Default: 1
  exintro          - Return only content before the first section.
  explaintext      - Return extracts as plain text instead of limited HTML.
  exsectionformat  - How to format sections in plaintext mode:
                       plain - No formatting
                       wiki  - Wikitext-style formatting (== like this ==)
                       raw   - This module's internal representation (section titles prefixed with <ASCII 1><ASCII 2><section level><ASCII 2><ASCII 1>)
                     One value: plain, wiki, raw
                     Default: wiki
  excontinue       - When more results are available, use this to continue.
  exvariant        - Convert content into this language variant.

Example:
  Get a 175-character extract:
  api.php?action=query&prop=extracts&exchars=175&titles=Therion
  https://nl.wikipedia.org/w/api.php?action=query&prop=extracts&exchars=175&titles=Nicolaas_IJzendoorn&format=json
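As a rough illustration of what a wrapper would need to cover, here is a minimal Python sketch that turns the parameters listed above into an api.php query, enforcing the documented ranges. `build_extracts_params` and `build_extracts_url` are hypothetical names, not existing Pywikibot functions; how the resulting dict would be wired into Pywikibot's own request machinery is left open. It relies on MediaWiki treating a boolean parameter as true whenever it is present, and it does not handle `excontinue`.

```python
from urllib.parse import urlencode

# Allowed exsectionformat values, taken from the module help above.
SECTION_FORMATS = ("plain", "wiki", "raw")


def build_extracts_params(titles, chars=None, sentences=None, limit=1,
                          intro=False, plaintext=False,
                          sectionformat="wiki", variant=None):
    """Build query parameters for action=query&prop=extracts.

    Hypothetical helper (not a Pywikibot API): it checks the ranges
    listed in the module help and only emits parameters that are set.
    """
    params = {
        "action": "query",
        "prop": "extracts",
        "titles": "|".join(titles),
        "format": "json",
    }
    if chars is not None:
        if chars < 1:
            raise ValueError("exchars must be no less than 1")
        params["exchars"] = chars
    if sentences is not None:
        if not 1 <= sentences <= 10:
            raise ValueError("exsentences must be between 1 and 10")
        params["exsentences"] = sentences
    if not 1 <= limit <= 20:
        raise ValueError("exlimit must be between 1 and 20")
    if limit != 1:  # 1 is the documented default, so omit it
        params["exlimit"] = limit
    # A boolean parameter counts as true when present, so an empty
    # value is enough.
    if intro:
        params["exintro"] = ""
    if plaintext:
        params["explaintext"] = ""
        # exsectionformat only applies in plaintext mode.
        if sectionformat not in SECTION_FORMATS:
            raise ValueError("exsectionformat must be plain, wiki or raw")
        params["exsectionformat"] = sectionformat
    if variant is not None:
        params["exvariant"] = variant
    return params


def build_extracts_url(api_url, titles, **options):
    """Return a full GET URL for the given api.php endpoint."""
    return api_url + "?" + urlencode(build_extracts_params(titles, **options))


if __name__ == "__main__":
    # Same request as the nl.wikipedia example above (parameter order differs).
    print(build_extracts_url("https://nl.wikipedia.org/w/api.php",
                             ["Nicolaas_IJzendoorn"], chars=175))
```

Running it prints `https://nl.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Nicolaas_IJzendoorn&format=json&exchars=175`, which asks for the same extract as the example URL.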
How do you intend to use this?
I'm already using it to extract dates of birth and death. The extract already strips out the infobox template and images, so I don't have to do that myself.
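To illustrate the use case just described, here is a minimal Python sketch that reads a `format=json` reply and pulls birth and death years out of the extract. It assumes the classic response layout (`query` → `pages` → page id → `extract`) and that `explaintext` was set. `extracts_by_title` and `life_years` are hypothetical names, and the year heuristic (take the first parenthetical, then its first and last four-digit years) is a naive stand-in, not the commenter's actual method; it will miss many article styles.

```python
import json
import re

# Four-digit years such as 1770.
YEAR = re.compile(r"\b(\d{4})\b")


def extracts_by_title(response_text):
    """Map page title -> extract text from a prop=extracts JSON reply.

    Assumes {"query": {"pages": {id: {...}}}}; pages without an
    "extract" key (e.g. missing pages) are skipped.
    """
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", {})
    return {page["title"]: page["extract"]
            for page in pages.values() if "extract" in page}


def life_years(extract):
    """Hypothetical heuristic for 'Name (birth ... - death ...)' openings.

    Returns (birth_year, death_year) from the first parenthetical,
    or None when it does not contain at least two years.
    """
    match = re.search(r"\(([^)]*)\)", extract)
    if not match:
        return None
    years = YEAR.findall(match.group(1))
    if len(years) < 2:
        return None
    return int(years[0]), int(years[-1])
```

Because the infobox is already gone from the extract, the heuristic only has to look at running text, which is the advantage described above; a real implementation would still need per-language date patterns.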
Why not extract those dates from the infobox?
A lot of articles don't have an infobox with this information.