Last modified: 2011-03-13 18:04:35 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T3843, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 1843 - Automatic archiving


Summary:	Automatic archiving

Status:	CLOSED WONTFIX

Product:	MediaWiki
Classification:	Unclassified
Component:	Page editing
Version:	unspecified
Hardware:	All All

Importance:	Lowest enhancement with 2 votes
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Duplicates:	23814
Depends on:
Blocks:

Reported:	2005-04-08 05:59 UTC by Brian Jason Drake
Modified:	2011-03-13 18:04 UTC
CC List:	4 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Brian Jason Drake 2005-04-08 05:59:56 UTC

On talk pages, as well as an [edit] link, there should be 
an [archive] link that will take you to a list of existing 
archives for that talk page that you can add that section 
to. This should significantly reduce the amount of time 
spent refactoring talk pages.

Comment 1 David Kernow 2005-11-01 02:55:48 UTC

I very much second this idea - thanks Brian!

Comment 2 Brion Vibber 2005-11-01 02:59:10 UTC

How would you define archives in a machine-understandable way?

Comment 3 Brian Jason Drake 2005-11-01 05:35:15 UTC

(In reply to comment #2)
> How would you define archives in a machine-understandable way?

I don't understand this.

Comment 4 Brion Vibber 2005-11-01 05:39:03 UTC

As the computer, I don't necessarily know what "existing archives for that talk 
page" are. Can you explain how one would go about looking for them in a regular, 
mechanistic way, given a talk page?

How would they work? What would they look like? How are they organized?

Comment 5 Brian Jason Drake 2005-11-01 06:34:59 UTC

This is not my area of expertise; however, others might 
have ideas. I was thinking that the users could define 
the archives, or the computer could create a new page 
to be an archive, since it seems like most talk pages 
don't have archives.

Comment 6 David Kernow 2005-11-01 07:29:24 UTC

First thoughts: When the software detects that a talk page exceeds a certain
size (say 48kb) it automatically creates a new archive webpage and, working from
== Heading == to == Heading ==, determines how many posts it would need to move
from the top of the page to the new archive page such that the talk page drops
below (say) 16kb in size.

Example:
   Talk:Blah is detected to be 50kb. No archives previously created.
   Software creates Talk:Blah/Archive 1
   Software determines first to fourth post by == Heading == to be (say) 30kb,
      but first to fifth post to be 43kb. 50kb less 30kb not < 16kb, but 50kb
      less 43kb < 16kb.
   Software therefore moves first to fifth posts from Talk:Blah to
      Talk:Blah/Archive 1
   Talk:Blah now 7kb, Talk:Blah/Archive 1 is 43kb.
   The next time Talk:Blah > 48kb, software detects Talk:Blah/Archive 1 already
      created, so next new archive would be /Archive 2.
   ...etc.
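
The size-threshold scheme in this comment can be sketched in Python as follows. This is only an illustration under stated assumptions, not MediaWiki code: the function names, the byte-based size measure, and the treatment of any text before the first heading as staying on the talk page are all assumptions. The 48kb and 16kb limits are the ones suggested above.

```python
import re

HIGH_WATER = 48 * 1024  # archive once the talk page exceeds this many bytes
LOW_WATER = 16 * 1024   # keep moving posts until the page drops below this

# A level-2 heading such as "== Heading ==" on a line of its own.
HEADING = re.compile(r"^==[^=].*?==[ \t]*$", re.MULTILINE)


def size(text):
    """Page size in bytes (assumed here to mean UTF-8 encoded length)."""
    return len(text.encode("utf-8"))


def split_sections(text):
    """Return (lead, sections); each section starts at its == Heading ==."""
    starts = [m.start() for m in HEADING.finditer(text)]
    if not starts:
        return text, []
    bounds = starts + [len(text)]
    return text[:starts[0]], [text[a:b] for a, b in zip(bounds, bounds[1:])]


def plan_archive(text):
    """Return (remaining_text, archived_text) for one archiving pass."""
    if size(text) <= HIGH_WATER:
        return text, ""
    lead, sections = split_sections(text)
    remaining = size(text)
    moved = 0
    # Move posts from the top until the page falls below the low-water mark.
    while moved < len(sections) and remaining >= LOW_WATER:
        remaining -= size(sections[moved])
        moved += 1
    return lead + "".join(sections[moved:]), "".join(sections[:moved])


def next_archive_title(talk_title, existing_titles):
    """Pick the first unused "<talk page>/Archive N" title."""
    n = 1
    while "%s/Archive %d" % (talk_title, n) in existing_titles:
        n += 1
    return "%s/Archive %d" % (talk_title, n)
```

With a 50kb page whose first four posts total 30kb and whose fifth post is 13kb, `plan_archive` moves the first five posts (43kb) and leaves 7kb behind, matching the worked example.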

The archive names could be more sophisticated, e.g. the software detects
earliest and most recent months included in (say) signatures and then names
archive using those months to give some idea of timespan covered (e.g. /Archive
1 - Jun 2005 to Oct 2005).
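
A rough Python sketch of the date-based naming idea follows. It assumes signatures carry timestamps of the form "02:55, 1 November 2005 (UTC)" with English month names; the `archive_title` helper and that timestamp pattern are illustrative assumptions, and a real wiki may format dates differently.

```python
import re
from datetime import datetime

# Assumed signature timestamp format, e.g. "02:55, 1 November 2005 (UTC)".
TIMESTAMP = re.compile(r"\d{2}:\d{2}, (\d{1,2} [A-Z][a-z]+ \d{4}) \(UTC\)")


def archive_title(talk_title, number, archived_text):
    """Build "<talk page>/Archive N - Mon YYYY to Mon YYYY" from signatures."""
    dates = []
    for match in TIMESTAMP.finditer(archived_text):
        try:
            # %B expects English month names under the default C locale.
            dates.append(datetime.strptime(match.group(1), "%d %B %Y"))
        except ValueError:
            continue  # looked like a timestamp but was not a valid date
    base = "%s/Archive %d" % (talk_title, number)
    if not dates:
        return base  # no recognisable signatures: fall back to the plain name
    first, last = min(dates), max(dates)
    return "%s - %s to %s" % (
        base, first.strftime("%b %Y"), last.strftime("%b %Y"))
```

Falling back to the plain numbered name when no timestamp is found keeps the titles predictable even on pages with unsigned posts.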

Yes, this would probably mean the archiving process would need to be taken out
of the hands of (standard) users so that the archive-naming process etc would be
kept consistent by the software (and therefore remain useable by the software).

...Well, something like that. I'd offer to code it myself, except I don't know
mainstream languages. I certainly think something is possible and would benefit
Wikimedia if implemented, if only to standardise the creation and handling of
archive material.

Comment 7 Brian Jason Drake 2005-11-01 07:40:33 UTC

If you were to do that, 32kb might be better, since I 
think that is the threshold for displaying a warning on 
the edit page that some browsers can't handle it.

However, I just thought of an even better idea: Instead 
of arbitrarily grouping topics together into "archive" 
pages (are they really archives if the topics are still 
active?) or under specific talk pages (some sections 
are posted multiple times; this creates other 
problems), store each topic separately and group them 
into pages for viewing.

In most cases, it should be sufficient to view the 
contents page and then either view one topic or add a 
new one.

However, the page may have to be refactored, in which 
case it might be good to group certain topics together.

There should be an easy way of finding discussions you 
participated in that have not been resolved.

For those who are just looking for questions to answer, 
there should be an easy way of doing just that.

Comment 8 David Kernow 2005-11-01 14:01:42 UTC

I suggested limits such as 16kb and 48kb to give the software some bandwidth to
work in, otherwise with a particularly active talk page it might start trying to
create archive pages too quickly in succession. Then again, that might not be
much of a problem.

I see your thinking re storing each topic (which I take means 'thread under a ==
Heading ==') separately but wonder if this might overly multiply the number of
pages and/or parameters to be handled by a wiki... This would also need someone
in the know to comment.

Re using dates and deciding whether or not topics are still active: Yes, this
would be heuristic, but, in addition to using posts' dates (not in signatures, I
realise, but the date recorded when submitted or last edited) a sufficiently
sophisticated routine should be able to retain topics that, although begun
sometime previously, have seen recent addition or editing. It wouldn't be
foolproof, but I reckon the trade-off between the software overzealously or
underwhelmingly archiving material should probably be acceptable. Again, this
really needs a developer to evaluate and comment on.

Thanks for your interest - I hope something along the lines we're discussing can
be implemented.

Comment 9 Rowan Collins [IMSoP] 2005-11-02 16:51:43 UTC

Hm, I'm not sure I like the idea of the software *automatically* archiving
discussions; as with so many other things, I think we should think more in terms
of tools that *assist* a user in doing this. My main reason for this is that,
although they sometimes seem that way, current discussion pages are *not*
threaded forums, they are a free-form page, and the sections may be arranged in
all sorts of ways, and subject to rearranging, refactoring, splitting, merging,
etc. There's also no straight-forward way - within the current setup - for the
software to determine when a section was last edited; it could be computed from
the history of the page as a whole, but that would involve analysing a lot of
diffs...

A much simpler, and more flexible, feature would be a way of selecting (tick-box
style) one or more sections of a page, and telling the software to append them
to a given page. So, a user would view Talk:Blah and click an "archive" tab (or
perhaps a more general label?); they would then tick the boxes for the
discussions which seem to have "concluded", and supply a pagename (such as
Talk:Blah/Archive1) and those sections would be appended to that page. If the
page didn't exist, it could of course be automatically created first.
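
A minimal Python sketch of this tick-box approach is shown below. It assumes sections are delimited by level-2 == Heading == lines and that the ticked boxes arrive as a set of 0-based section indexes; `archive_selected` and its parameters are hypothetical names, not an existing MediaWiki interface.

```python
import re

# A level-2 heading such as "== Heading ==" on a line of its own.
HEADING = re.compile(r"^==[^=].*?==[ \t]*$", re.MULTILINE)


def archive_selected(talk_text, archive_text, selected):
    """Move the ticked sections from the talk page to the archive page.

    `selected` holds 0-based indexes of the sections to move. Pass an empty
    string as `archive_text` when the archive page does not exist yet.
    Returns (new_talk_text, new_archive_text).
    """
    starts = [m.start() for m in HEADING.finditer(talk_text)]
    bounds = starts + [len(talk_text)]
    # Text before the first heading always stays on the talk page.
    kept = [talk_text[:starts[0]]] if starts else [talk_text]
    moved = []
    for index, (a, b) in enumerate(zip(bounds, bounds[1:])):
        (moved if index in selected else kept).append(talk_text[a:b])
    # Make sure appended sections start on a fresh line.
    if archive_text and not archive_text.endswith("\n"):
        archive_text += "\n"
    return "".join(kept), archive_text + "".join(moved)
```

Because the user picks the sections and the target page name, the software never has to guess which discussions have concluded or how archives are organised.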

Meanwhile, there's a lot of discussion on the wikitech-l mailing list right now
about the more fundamental problems of discussion pages. See
http://www.mediawiki.org/wiki/Communication for details of how to access it.

Comment 10 David Kernow 2005-11-04 08:11:27 UTC

Thanks for your comments, Rowan; your idea of adding an "archive" tab followed
by checklist is much better and probably far easier to code. The routine could
also add a link to the (newly-created) archive page at the top of the talk page,
perhaps with a TOC-style list of the discussions (i.e. headings) in the archive
- but, unlike a TOC, with a default state of hidden (so that clicking on a
'show' link beside it would display the list).

I now second Rowan's idea - and if any developers are (still) following this
thread, would appreciate some idea if it plus my TOC-style idea stand a chance
of being incorporated.

Thanks also, Rowan, for the pointer to the wikitech-l discussion. I certainly
believe adding something like your "archive" tab along with the opportunity to
standardise how and where archive page names are created and placed would improve
the maintenance of talk pages considerably. Would you say I need to sign onto
the list and add this comment?

Comment 11 Brian Jason Drake 2005-11-05 08:16:37 UTC

We seem to like our current "free-form" talk pages; yet 
we don't use them - we use mailing lists that are much 
more difficult to work with than the talk pages, even 
when using a newsreader.

Comment 12 Rob Church 2005-11-05 17:00:54 UTC

Talk pages are for discussing particular articles. The mailing list is for
discussing particular issues on the wikis.

Comment 13 Brian Jason Drake 2005-11-07 06:40:39 UTC

Why can't talk pages be used for discussing "issues" on 
the wikis as well (they are already used for discussing 
some of them)?

Comment 14 Rob Church 2005-11-07 07:41:17 UTC

Some of them are. Some of the Wikimedia wikis have specific pages and talk pages
devoted to discussing policies, ideas, guidelines, best practises, etc. -
examples springing to mind include the Village Pump on the English Wikipedia;
the Water Cooler on Wikinews, etc.

At the development end of things, it's not up to us to dictate how the
individual projects use the functionality; that's their job.

Comment 15 Filip Maljkovic [Dungodung] 2005-11-07 19:44:54 UTC

Automatic archiving would require a daemon, which is not very good. I like the
variant with the button (or a tab) for archiving, although a separate subpage
for every talk page is pretty good too (I imagine the main talk page to be the
table of contents here).

Comment 16 Brian Jason Drake 2005-11-08 07:25:07 UTC

(In reply to comment #14)
> Some of them are. Some of the Wikimedia wikis have specific pages and talk pages
> devoted to discussing policies, ideas, guidelines, best practises, etc. -
> examples springing to mind include the Village Pump on the English Wikipedia;
> the Water Cooler on Wikinews, etc.
>
> At the development end of things, it's not up to us to dictate how the
> individual projects use the functionality; that's their job.

What is to stop us from using freeform talk pages too?

Of course it's not up to us to dictate how individual 
projects use their pages.

Comment 17 Brian Jason Drake 2005-11-08 07:26:26 UTC

(In reply to comment #15)
> Automatic archiving would require a daemon, which is not very good. I like the
> variant with the button (or a tab) for archiving, although a separate subpage
> for every talk page is pretty good too (I imagine the main talk page to be the
> table of contents here).

Why would automatic archiving require a daemon?

Comment 18 Filip Maljkovic [Dungodung] 2005-11-08 10:31:24 UTC

There has to be a process that will run occasionally (or all the time) in the
background which will determine whether a page needs archiving or not. And
daemons aren't really good for web applications.

Comment 19 Brian Jason Drake 2005-11-08 11:11:13 UTC

The page only needs to be checked:
* once, when the automatic process is introduced,
* once, when the automatic process is updated, or
* when it is updated.

Comment 20 David Kernow 2005-11-09 02:06:44 UTC

I suggest we turn from thinking of -automatic- processes (which would place
further demands on software and its speed) to something like the 'archive' tab
idea above. Talk pages where each thread involves votes (e.g. VfD-type pages)
could also place [archive] links beside the [edit] links so visitors could
easily archive votes that have been completed. Yes, perhaps more open to abuse,
but as easily revertable as usual.

Are any developers (still) reading this and care to comment?

Comment 21 Brian Jason Drake 2005-11-10 08:03:05 UTC

This idea was discussed in comments 9 and 10, and if we 
insist on manual processes, it sounds good.

However, it would be nice to have "archive" links on 
*all* sections in *all* talk pages, so I can decide 
that a section seems to have "concluded" and archive it 
*immediately*.

Comment 22 Steve Sanbeg 2006-10-25 18:49:01 UTC

There should be ways of simplifying the archive without using a daemon.

One example, if we had the labeled section transclusion feature described in
[[wikisource:project:Labeled section transclusion]] and bug #5881, then you
could mark closed discussions like:

<section begin=closed>
==some topic==

this is resolved
<section end=closed>

So that you can easily find the closed discussions between the markers.

Then, you could archive by creating a new page like [[talk:page/archive 1]] with
the contents:
{{subst:talk:page|include=closed}}

Which would substitute only the closed discussions into the archive. 
Conversely, you could replace the contents of talk:page with
{{subst:talk:page|exclude=closed}}

Which would grab the contents of the current page, minus the closed discussions.

Although that was written for something completely different, it works pretty
well there.

Other alternatives would be to mark the begin/end of closed discussions with a
template, and have a bot check the size, and move closed discussions to a new
archive; or write a custom extension with markers for the begin/end of each
closed discussion; where the extension would check the page size/number of
closed discussions, etc, and add a job to the job queue to archive the page if
needed.  Of course, that assumes they would allow making visible edits from the
job queue.
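
The include/exclude idea with begin/end markers can be illustrated with a short Python sketch. It works on plain page text and copies the marker syntax from this comment (an optional trailing slash is tolerated); it is not the labeled section transclusion extension itself, and `include_labeled` and `exclude_labeled` are hypothetical names.

```python
import re


def _marked_span(label):
    """Compile a pattern for one begin..end span carrying the given label."""
    name = re.escape(label)
    begin = r"<section\s+begin=%s\s*/?>[ \t]*\n?" % name
    end = r"<section\s+end=%s\s*/?>[ \t]*\n?" % name
    # DOTALL lets the lazy group span several lines of discussion text.
    return re.compile(begin + r"(.*?)" + end, re.DOTALL)


def include_labeled(text, label):
    """Return only the text between matching begin/end markers."""
    return "".join(m.group(1) for m in _marked_span(label).finditer(text))


def exclude_labeled(text, label):
    """Return the text with every marked span, and its markers, removed."""
    return _marked_span(label).sub("", text)
```

Saving `include_labeled(page, "closed")` to the archive and `exclude_labeled(page, "closed")` back to the talk page corresponds to the two substitutions described above.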

Comment 23 Brian Jason Drake 2006-10-26 00:18:05 UTC

We scan through every page every time it's saved anyway, to check for banned links; 
how hard would it be to check for whatever marker we used for closed discussions and 
move them to the archive immediately, when the page is saved?
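
A save-time check of this kind might look like the Python sketch below. The `{{closed}}` marker, the `on_save` function, the single "/Archive" subpage, and the dict standing in for page storage are all assumptions made for illustration; nothing here reflects how MediaWiki actually processes saves.

```python
import re

# A level-2 heading such as "== Heading ==" on a line of its own.
HEADING = re.compile(r"^==[^=].*?==[ \t]*$", re.MULTILINE)
CLOSED_MARKER = "{{closed}}"  # hypothetical marker for a concluded discussion


def on_save(title, new_text, pages):
    """Store new_text under title, moving marked sections to title/Archive.

    `pages` is a plain dict of title -> text standing in for the wiki's
    storage. Returns the number of sections that were archived.
    """
    starts = [m.start() for m in HEADING.finditer(new_text)]
    bounds = starts + [len(new_text)]
    # Text before the first heading is never archived.
    kept = [new_text[:starts[0]]] if starts else [new_text]
    closed = []
    for a, b in zip(bounds, bounds[1:]):
        section = new_text[a:b]
        (closed if CLOSED_MARKER in section else kept).append(section)
    if closed:
        archive_title = title + "/Archive"
        old = pages.get(archive_title, "")
        if old and not old.endswith("\n"):
            old += "\n"
        pages[archive_title] = old + "".join(closed)
    pages[title] = "".join(kept)
    return len(closed)
```

The check runs only when a page is saved, so no background process is involved; the cost is one extra pass over the text that is already being scanned.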

Comment 24 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-10-26 02:48:03 UTC

This will presumably be automatic when LiquidThreads is finally finished, so I
doubt anyone will waste their time putting in the effort only for it to become
completely obsolete in a year or whatever.  And yes, this would require a fair
bit of effort, which could be better invested elsewhere.  Bots work fine for now.

Comment 25 Brian Jason Drake 2006-10-27 00:49:37 UTC

(In reply to comment #24)
> This will presumably be automatic when LiquidThreads is finally finished, so I
> doubt anyone will waste their time putting in the effort only for it to become
> completely obsolete in a year or whatever. And yes, this would require a fair
> bit of effort, which could be better invested elsewhere. Bots work fine for now.

(bug 1234, [[Meta:LiquidThreads]])

Comment 26 ipatrol 2009-06-06 14:50:14 UTC

Stale

Comment 27 Chad H. 2010-06-07 11:45:12 UTC

*** Bug 23814 has been marked as a duplicate of this bug. ***
