Last modified: 2014-05-12 02:49:46 UTC
It's handy to have a means of summarizing or describing page contents, so as to generate meta description tags, blurbs for inclusion in feeds, etc. Several approaches have been tried:

1) Grabbing the first x characters of an article without regard to where sentences cut off (e.g. [[mw:Extension:Blurb]] or [[mw:Extension:TextExtracts]])

2) Using a template, e.g.

{{PageSummary|'''[[Humility]]''' is a psychological state, that is the opposite of [[dominance]]. |Humility allows one to see the intrinsic value of others (as opposed to only extrinsic value), and is therefore the largest factor of [[empathy]]. A person with humility therefore sees minors as having intrinsic value, as contrasted with being objects of domination, which they are mostly regarded as being by the laws and practices of the status quo. Like dominance, humility is an innate psychological trait.}}

See docs at http://childwiki.net/wiki/Template:PageSummary . This is implemented by [[mw:Extension:BedellPenDragon]]. Notice that there are two parameters here: parameter #1 for the first sentence of the lead and parameter #2 for the remainder of the lead.

3) Adding/modifying the description somewhere separate from the article text: a text box ([[mw:Extension:Advanced_Meta]]), a separate page ([[mw:Extension:ExplicitDescription]]), or Wikidata.

Ideally, we could implement a feature to automatically grab the first sentence of the lead; however, it's hard for software to detect the ends of sentences, since punctuation marks such as the period can appear in the middle of sentences ("Afterward, Mr. Brown went to the U.S. District Courthouse . . . and when he came back, everyone was gone.")

If you have any ideas on the best way to do this, feel free to post them. Thanks.
MobileFrontend implemented that for one of its APIs: https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&prop=extracts&format=json&exlimit=1&exintro=&explaintext=&titles=Barack_Obama
It has supposedly since been migrated to https://www.mediawiki.org/wiki/Extension:TextExtracts.
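For reference, the sandbox link above corresponds to an ordinary API query. The sketch below only builds that request URL and sends nothing. The parameter names and values are copied from the sandbox link; the `/w/api.php` endpoint path and the helper name `build_extract_url` are assumptions added for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint: the sandbox link does not spell out the api.php path.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title, endpoint=API_ENDPOINT):
    """Build the query URL for a plain-text extract of a page's intro.

    Hypothetical helper; the parameters mirror the sandbox link above.
    """
    params = {
        "action": "query",
        "prop": "extracts",
        "format": "json",
        "exlimit": 1,
        "exintro": "",      # flag parameter: restrict to the lead section
        "explaintext": "",  # flag parameter: plain text instead of HTML
        "titles": title,
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_extract_url("Barack_Obama"))
```

Fetching the resulting URL with any HTTP client should return JSON that contains the extract, provided the wiki has the extension installed.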
*** Bug 5335 has been marked as a duplicate of this bug. ***
(In reply to Nathan Larson from comment #0)
> Ideally, we could implement a feature to automatically grab the first
> sentence of the lead; however, it's hard for software to detect the ends of
> sentences, since punctuation marks such as the period can appear in the
> middle of sentences ("Afterward, Mr. Brown went to the U.S. District
> Courthouse . . . and when he came back, everyone was gone.")
>
> If you have any ideas on the best way to do this, feel free to post them.
> Thanks.

For TextExtracts, that would be bug 57669. And yeah, any insights on better sentence handling would be highly appreciated :)
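One possible starting point for the sentence handling, offered as a rough sketch and not as what any of the extensions above actually do: treat a terminator as a sentence end only when it is followed by whitespace and an uppercase letter or digit (or the end of the text), and skip periods that follow a known abbreviation or a run of dotted initials such as "U.S.". The abbreviation list and the function name `first_sentence` are illustrative assumptions, and the heuristic is English-only.

```python
import re

# Illustrative, deliberately incomplete list of abbreviations (assumption).
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "sr", "jr", "st", "vs", "etc"}

# A terminator, optional closing quotes/brackets, then either whitespace plus
# an (optionally quoted) uppercase letter or digit, or the end of the text.
CANDIDATE = re.compile(r'[.!?]["\')\]]*(?=\s+["\'(\[]?[A-Z0-9]|\s*$)')

# Single initials or dotted abbreviations without the final period: "u.s", "e.g", "j".
DOTTED = re.compile(r"(?:[a-z]\.)*[a-z]")


def first_sentence(text):
    """Return a best guess at the first sentence of `text` (heuristic only)."""
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    for match in CANDIDATE.finditer(text):
        if text[match.start()] == ".":
            words = text[:match.start()].split()
            token = words[-1].lstrip("\"'([").lower() if words else ""
            if token in ABBREVIATIONS or DOTTED.fullmatch(token):
                continue  # the period belongs to an abbreviation, keep scanning
        return text[:match.end()]
    return text  # no plausible boundary found: return everything


if __name__ == "__main__":
    sample = ("Afterward, Mr. Brown went to the U.S. District Courthouse . . . "
              "and when he came back, everyone was gone. Nobody knew why.")
    print(first_sentence(sample))
```

On the example from comment #0 this skips "Mr.", "U.S." and the spaced ellipsis, because the ellipsis is followed by a lowercase word. Known gaps: a sentence that really ends in a single letter or an abbreviation ("It was I.", "... in the U.S. He left.") is run together with the next one, and languages without capitalization or spaces need a different approach entirely.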