Last modified: 2014-11-17 10:35:13 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T42633, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 40633 - Auto-categorize pages that contain invalid HTML
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Low enhancement with 2 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2012-09-30 02:05 UTC by MZMcBride
Modified: 2014-11-17 10:35 UTC

Description MZMcBride 2012-09-30 02:05:35 UTC
Split from bug 40329: MediaWiki should, if possible, auto-categorize pages that contain invalid HTML attributes or elements. This would allow diligent and caring wiki editors to improve the code that these pages use, if they feel inclined to.
Comment 1 Antoine "hashar" Musso (WMF) 2012-10-01 07:21:12 UTC
We first need a system to record any invalid HTML, and I would prefer we do not use categories for that but a special page instead.

I am wondering how we will be able to report that error Foo is happening at line XXX, character YYYY.
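One possible way to get a line and character position, sketched in Python purely for illustration (MediaWiki itself is PHP, and nothing below is existing MediaWiki code): take the offset of each match in the page text and convert it to a 1-based line and column. The pattern and the function name `locate_deprecated_tags` are assumptions made up for this sketch, and it only looks at the raw wikitext, not at the parser's output.

```python
import re

# Hypothetical pattern: matches an opening <font> or <center> tag only.
DEPRECATED_TAG = re.compile(r"<(font|center)\b", re.IGNORECASE)


def locate_deprecated_tags(wikitext):
    """Return a list of (tag, line, column) tuples, line and column 1-based."""
    results = []
    for match in DEPRECATED_TAG.finditer(wikitext):
        offset = match.start()
        # Line number: newlines before the match, plus one.
        line = wikitext.count("\n", 0, offset) + 1
        # Column: distance from the previous newline. rfind() returns -1
        # when there is none, which makes the first character column 1.
        column = offset - wikitext.rfind("\n", 0, offset)
        results.append((match.group(1).lower(), line, column))
    return results
```

Positions computed this way refer to the source text as saved, so they would not account for markup that only appears after template expansion.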
Comment 2 TMg 2012-10-29 19:05:18 UTC
Don't make this too complicated. A simple category or special page "ContainsDeprecatedFeature" is enough. Use a generic name so it can be used for everything, including deprecated parser functions and such. Add a way to filter the special page by namespace if you can. Done.

Start with an empty list of deprecated features. Add one feature at a time. I suggest the <font> tag. Let the Wikipedia community know when that specific HTML tag or attribute will be dropped. I suggest something between 3 and 12 months. The community will do the work.

If the special page is almost empty in most Wikipedia languages, add the next feature to the list. I suggest the <center> tag. Then <strike>. Then <big>. Then <tt>. The last ones will be align="..." and valign="..." because they are the most used and therefore require a lot of work to replace. (Note: As explained in bug 40329, it is *NOT* possible to simply replace all align="..." with text-align: ... Replacing such stuff always requires a user to look at the code and understand what it does. Some replacements can be done with a semi-automatic bot or user scripts. But it should never be done by the MediaWiki software.)
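The rollout described above could look roughly like the following Python sketch. It is illustrative only and not MediaWiki code: the registry `DEPRECATED_FEATURES`, the helpers `register_feature`, `find_deprecated_features` and `list_pages`, and the regular expressions are all hypothetical names and patterns. The list of features starts empty, grows one entry at a time, and the listing can be filtered by namespace.

```python
import re

# Hypothetical registry of deprecated features. It starts empty and is
# extended one feature at a time.
DEPRECATED_FEATURES = {}


def register_feature(name, pattern):
    """Add one deprecated feature, identified by a regular expression."""
    DEPRECATED_FEATURES[name] = re.compile(pattern, re.IGNORECASE)


def find_deprecated_features(wikitext):
    """Return the sorted names of all registered features found in the text."""
    return sorted(
        name for name, rx in DEPRECATED_FEATURES.items() if rx.search(wikitext)
    )


def list_pages(pages, namespace=None):
    """List titles of pages that use any registered feature.

    pages is an iterable of (title, namespace_id, wikitext) tuples.
    If namespace is given, only pages in that namespace are listed.
    """
    return [
        title
        for title, ns, wikitext in pages
        if (namespace is None or ns == namespace)
        and find_deprecated_features(wikitext)
    ]
```

Starting with `register_feature("font", r"<font\b")` and adding the next tag only once the list is nearly empty would follow the order suggested above. A plain pattern match like this only finds candidates; as the note says, the actual replacement still needs a person to look at each case.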

We are currently collecting possible replacements at http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_HTML5 and would like to start, but we need a way to search for "<font", for example.
Comment 3 Antoine "hashar" Musso (WMF) 2012-10-30 18:42:58 UTC
My point is that categories should probably not be used as a way to add metadata on articles. Moreover the category table is really huge :/
Comment 4 MZMcBride 2012-10-30 19:49:22 UTC
(In reply to comment #1)
> We first need a system to record any invalid HTML, and I would prefer we do not
> use categories for that but a special page instead.

How do you envision that working? The benefit to categorization is that you can have "lazy-loading": when a page is reparsed, it can be auto-categorized. How would a Special page work?
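A minimal sketch of what "lazy-loading" could mean here, in Python and with made-up names (`CategoryIndex`, `on_reparse`, and the category title are assumptions, not MediaWiki APIs): membership in the tracking category is only recomputed when a page is reparsed, so no separate full scan of the wiki is needed.

```python
import re

# Hypothetical tracking category name and detection pattern.
TRACKING_CATEGORY = "Pages using deprecated HTML"
DEPRECATED = re.compile(r"<(font|center)\b", re.IGNORECASE)


class CategoryIndex:
    """Toy stand-in for category membership storage."""

    def __init__(self):
        self.members = {}  # category name -> set of page titles

    def on_reparse(self, title, wikitext):
        """Update the tracking category for one page when it is reparsed."""
        members = self.members.setdefault(TRACKING_CATEGORY, set())
        if DEPRECATED.search(wikitext):
            members.add(title)
        else:
            # The page was cleaned up, so it drops out of the category.
            members.discard(title)
```

Pages that have not been reparsed since the check was introduced would simply not appear yet, which is the trade-off of doing this lazily.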

> I am wondering how we will be able to report that error Foo is happening at
> line XXX, character YYYY.

I'm not sure that's necessary.

(In reply to comment #3)
> My point is that categories should probably not be used as a way to add
> metadata on articles.

Umm, can you expand on this point, please? Categories are _classic_ page metadata, aren't they?

> Moreover the category table is really huge :/

And?
