Last modified: 2014-09-22 15:21:52 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T72682, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 70682 - Implement extracts in Pywikibot


Summary:	Implement extracts in Pywikibot

Status:	NEW

Product:	Pywikibot
Classification:	Unclassified
Component:	General (Other open bugs)
Version:	core-(2.0)
Hardware:	All All

Importance:	Unprioritized enhancement
Target Milestone:	---
Assigned To:	Pywikipedia bugs

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-09-10 19:48 UTC by Maarten Dammers
Modified:	2014-09-22 15:21 UTC
CC List:	1 user

See Also:	54569
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Description Maarten Dammers 2014-09-10 19:48:22 UTC

MediaWiki has the extracts API module (prop=extracts, provided by the TextExtracts extension). It should be implemented in Pywikibot too.

* prop=extracts (ex) *
  Returns plain-text or limited HTML extracts of the given page(s)
  https://www.mediawiki.org/wiki/Extension:TextExtracts#API

This module requires read rights
Parameters:
  exchars             - How many characters to return, actual text returned might be slightly longer.
                        The value must be no less than 1
  exsentences         - How many sentences to return
                        The value must be between 1 and 10
  exlimit             - How many extracts to return
                        No more than 20 (20 for bots) allowed
                        Default: 1
  exintro             - Return only content before the first section
  explaintext         - Return extracts as plaintext instead of limited HTML
  exsectionformat     - How to format sections in plaintext mode:
                         plain - No formatting
                         wiki - Wikitext-style formatting == like this ==
                         raw - This module's internal representation (section titles prefixed with <ASCII 1><ASCII 2><section level><ASCII 2><ASCII 1>)
                        One value: plain, wiki, raw
                        Default: wiki
  excontinue          - When more results are available, use this to continue
  exvariant           - Convert content into this language variant
Example:
  Get a 175-character extract:
    api.php?action=query&prop=extracts&exchars=175&titles=Therion
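
A minimal sketch, in Python (the language Pywikibot is written in), of how a Pywikibot-side helper might assemble these parameters and enforce the limits stated in the help text before sending a request. The names `build_extracts_query` and `to_query_string` are hypothetical, not existing Pywikibot APIs. In Pywikibot the resulting dictionary could presumably be handed to `pywikibot.data.api.Request`; that wiring is an assumption and is not shown.

```python
from urllib.parse import urlencode

# Allowed values for exsectionformat, as listed in the API help.
SECTION_FORMATS = {"plain", "wiki", "raw"}


def build_extracts_query(titles, exchars=None, exsentences=None,
                         exlimit=None, exintro=False, explaintext=False,
                         exsectionformat=None, exvariant=None):
    """Build parameters for action=query&prop=extracts (hypothetical helper).

    Only the limits documented in the module help are checked here;
    the server remains the final authority.
    """
    if exchars is not None and exchars < 1:
        raise ValueError("exchars must be no less than 1")
    if exsentences is not None and not 1 <= exsentences <= 10:
        raise ValueError("exsentences must be between 1 and 10")
    if exlimit is not None and exlimit > 20:
        raise ValueError("exlimit: no more than 20 allowed")
    if exsectionformat is not None and exsectionformat not in SECTION_FORMATS:
        raise ValueError("exsectionformat must be plain, wiki or raw")

    params = {"action": "query", "prop": "extracts"}
    if exchars is not None:
        params["exchars"] = exchars
    if exsentences is not None:
        params["exsentences"] = exsentences
    if exlimit is not None:
        params["exlimit"] = exlimit
    # MediaWiki boolean parameters are true when present, whatever the value.
    if exintro:
        params["exintro"] = ""
    if explaintext:
        params["explaintext"] = ""
    if exsectionformat is not None:
        params["exsectionformat"] = exsectionformat
    if exvariant is not None:
        params["exvariant"] = exvariant
    # Multiple titles are joined with "|", the API's multi-value separator.
    params["titles"] = titles if isinstance(titles, str) else "|".join(titles)
    return params


def to_query_string(params):
    """Encode the parameters as they would appear after api.php?."""
    return urlencode(params)
```

For instance, `to_query_string(build_extracts_query("Therion", exchars=175))` yields `action=query&prop=extracts&exchars=175&titles=Therion`, the query of the help example. Following `excontinue` for large batches is deliberately left out of this sketch.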

https://nl.wikipedia.org/w/api.php?action=query&prop=extracts&exchars=175&titles=Nicolaas_IJzendoorn&format=json
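
The URL above returns JSON. Below is a minimal sketch of pulling the extracts out of such a reply. It assumes the default JSON layout, in which `query.pages` is an object keyed by page ID and nonexistent pages carry a `missing` key. `extracts_by_title` is a hypothetical name. Fetching the URL (for example with `urllib.request.urlopen`) and following continuation are left out.

```python
import json


def extracts_by_title(response_text):
    """Map page title -> extract for a format=json prop=extracts reply.

    Assumes query.pages is an object keyed by page ID (the default
    JSON layout); pages flagged "missing" or "invalid" are skipped.
    """
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", {})
    result = {}
    for page in pages.values():
        if "missing" in page or "invalid" in page:
            continue  # no extract exists for these titles
        result[page["title"]] = page.get("extract", "")
    return result
```

Keying the result by title rather than page ID makes it easy to match extracts back to the titles that were requested.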

Comment 1 John Mark Vandenberg 2014-09-18 09:43:35 UTC

How do you intend to use this?

Comment 2 Maarten Dammers 2014-09-20 09:02:24 UTC

I'm already using it to extract dates of birth and death. The extract already strips the infobox template and images, so I don't have to do that myself.
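
As an illustration of this use case, here is a rough sketch that pulls birth and death dates out of a plain-text (`explaintext`) extract. It assumes the common nl.wikipedia lead style `Naam (Plaats, 1 januari 1900 – Plaats, 2 februari 1980)`. That pattern, the function name `birth_and_death` and the sample sentence below are assumptions for illustration only. Leads that give only years, or phrase the dates differently, will return `None`.

```python
import re

# Dutch month names as written in nl.wikipedia leads (lowercase).
MONTHS = ["januari", "februari", "maart", "april", "mei", "juni", "juli",
          "augustus", "september", "oktober", "november", "december"]

# One date such as "1 januari 1900": day, month name, year.
DATE = r"(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{3,4})"

# Assumed lead pattern: "(<place>, <date> – <place>, <date>)", where the
# places are optional and the separator is a hyphen or an en dash.
LIFESPAN = re.compile(
    r"\((?:[^,()]*, )?" + DATE + r"\s*[-–]\s*(?:[^,()]*, )?" + DATE + r"\)"
)


def birth_and_death(extract):
    """Return ((year, month, day), (year, month, day)) or None."""
    match = LIFESPAN.search(extract)
    if match is None:
        return None
    d1, m1, y1, d2, m2, y2 = match.groups()
    birth = (int(y1), MONTHS.index(m1) + 1, int(d1))
    death = (int(y2), MONTHS.index(m2) + 1, int(d2))
    return birth, death
```

For the invented sentence `"Jan Jansen (Amsterdam, 1 januari 1900 – Haarlem, 2 februari 1980) was een schilder."` this returns `((1900, 1, 1), (1980, 2, 2))`.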

Comment 3 John Mark Vandenberg 2014-09-21 12:44:13 UTC

Why not extract those dates from the infobox?

Comment 4 Maarten Dammers 2014-09-22 15:21:52 UTC

A lot of articles don't have an infobox with this information.
