Last modified: 2014-07-07 12:09:23 UTC
SUMMARY: I propose that each page have an "attention score" describing how often
the page has been edited. This feature is intended to address the "Siegenthaler
problem" by indicating whether a page has been lightly or heavily edited.
The "attention score" would be calculated from the data displayed in page
history, and would ideally have the following features:
(1) Attention scores would be higher if more total edits were made;
(2) Attention scores would be higher if more unique authors made edits;
(3) Attention scores would be higher if the time between edits were lower.
Item (3) is the key point, as a page that has gone through fierce editing (or an
"edit war") should be distinguished from one that hasn't been gone over
carefully. Since (1) and (3) are simple counting tasks over the edit history, the
algorithm should be O(n) with respect to the number of edits. Including (2) might
cause the algorithm to be O(n^2) if each author were compared pairwise against
every other; the exact impact on Wikipedia server load would depend on the
number of unique authors, and might need to be established empirically.
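To make the proposal concrete, here is a minimal sketch of how factors (1)-(3) could be combined. The input format (a list of (author, timestamp) pairs), the weights, and the inverse-gap term are all illustrative assumptions, not part of the proposal; note also that a hash set keeps the unique-author count of (2) near O(n) in practice.

```python
from datetime import datetime

def attention_score(edits):
    """Compute an illustrative attention score from an edit history.

    edits: list of (author, timestamp) tuples, oldest first.
    Weights and the exact combination are placeholder choices.
    """
    if len(edits) < 2:
        return float(len(edits))
    total_edits = len(edits)                     # factor (1): more edits, higher score
    unique_authors = len({a for a, _ in edits})  # factor (2): set membership is near O(n)
    # factor (3): shorter average time between edits raises the score
    gaps = [(edits[i + 1][1] - edits[i][1]).total_seconds()
            for i in range(len(edits) - 1)]
    mean_gap_days = (sum(gaps) / len(gaps)) / 86400.0
    return total_edits + unique_authors + 10.0 / (1.0 + mean_gap_days)
```

Because all three factors depend only on the history, the score would indeed only need recomputing once per edit, as the proposal notes.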
Another possible feature of an "attention score" would be the following:
(4) Attention scores would be higher if the page was edited recently.
However, this might be dropped for computational expense reasons. If only
factors (1) - (3) were involved, the attention score would only need to be
calculated once per edit; adding (4) would require a dynamic calculation with
each page view.
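The per-view cost of factor (4) could be kept small by making the recency term a cheap function of the last edit's age, computed at render time. This is a sketch under assumed parameters (an exponential decay with a 30-day half-life, neither of which is specified in the proposal):

```python
import time

def recency_factor(last_edit_ts, now=None, half_life_days=30.0):
    """Decay from 1.0 for a just-edited page, halving every
    half_life_days. Both the decay shape and the half-life are
    illustrative assumptions, not part of the proposal."""
    now = time.time() if now is None else now
    age_days = max(0.0, (now - last_edit_ts) / 86400.0)
    return 0.5 ** (age_days / half_life_days)
```

Only this single term would be dynamic; the (1)-(3) portion could still be cached per edit.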
The advantages of such a feature are twofold:
(1) it allows any user to make a snap judgement as to whether the page is rarely
edited (and therefore potentially questionable) or heavily edited (and
therefore, if not necessarily trustworthy, at least well examined).
(2) it allows for the possibility of editors targeting either lightly or heavily
edited pages, as necessary, using one convenient statistic.
I made a similar proposal at
In my opinion you don't even need a calculated "attention score". It would already
be helpful to indicate 1. the number of registered users who edited a page, 2.
the number of anonymous users (IPs), 3. the total number of edits.
This data could be presented quite simply, like (34, 65, 340). This offers more
transparency than a calculated attention score. Nevertheless, I agree that a
short summary of the page history should be presented.
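The (registered, anonymous, total)-style triple suggested above could be derived directly from the list of per-edit author names. A minimal sketch, assuming anonymous editors are identified by an IPv4-shaped author string (real histories would also need IPv6 handling):

```python
import re

# Crude heuristic: an author name shaped like an IPv4 address is treated
# as anonymous. This is an assumption for illustration only.
IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def history_triple(authors):
    """Return (registered, anonymous, total_edits) for a list of
    per-edit author names, as in the (34, 65, 340) example."""
    registered = {a for a in authors if not IPV4.match(a)}
    anonymous = {a for a in authors if IPV4.match(a)}
    return (len(registered), len(anonymous), len(authors))
```

Since the first two components count unique users and the third counts edits, the triple carries the same raw data a score would be computed from, which is the transparency argument above.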
Please compare this study: Andrew Lih, "Wikipedia as Participatory Journalism:
Reliable Sources? Metrics for evaluating collaborative media as a news resource"
Lih writes: For the purposes of the study, two metrics are used as a simple
measure for the reputation of the article within the Wikipedia:
• Rigor (total number of edits for an article) – The assumption is that more
editing cycles on an article provides for a deeper treatment of the subject or
more scrutiny of the content. In Wikipedia, edits can be marked as major or
minor, with the latter used for indicating something that can largely be ignored
by others and inconsequential to the overall editorial position, such as fixing a
typo or reformatting the page. Since this is a voluntary flag, and the use of the
minor edit flag is inconsistent, at this time the study considers all edits, major
or minor, as equal. In the future, a more intelligent decision could be used with
minor edits in combination with the edit comments.
• Diversity (total number of unique users) – With more editors, there are more
voices and different points of view for a given subject. Users come in the
form of registered users (ie. User:Bob) or anonymous users, who do not
register but show up as Internet addresses (ie. 192.168.0.10). The study tracks
the number of unique users who have edited the article in question, regardless
of whether they are registered or anonymous.
The problem with this is that articles with low attention scores will be targeted.