Last modified: 2014-11-18 18:07:25 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T8220, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 6220 - Shared repositories support for Special:WantedFiles
Status: PATCH_TO_REVIEW
Product: MediaWiki
Classification: Unclassified
Component: Special pages
Version: unspecified
Hardware: All All
Importance: Normal normal with 22 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Duplicates: 8683 9924 13314 15688 27107 28580 69391
Depends on:
Blocks:
Reported: 2006-06-06 13:28 UTC by Eugene Zelenko
Modified: 2014-11-18 18:07 UTC (History)
CC List: 17 users

Description Eugene Zelenko 2006-06-06 13:28:51 UTC
It would be great to be able to list all missing files (missing on both the
local wiki and Commons). The list could be used to fix pages that reference such files.

In any case, if I understand correctly, a list of all images is already constructed for
Special:Mostimages, so only a check for file existence would need to be added.
Comment 1 Rob Church 2006-07-12 11:44:56 UTC
A special page which loaded a list of all images, then checked for file
existence on each, would be too expensive.

A special page which checks for inline inclusion of images which don't appear to
exist won't work with shared image repositories.
Comment 2 Daniel Kinzler 2006-08-09 19:32:08 UTC
It works fine with shared repositories if there's access to the image table of
the repository - which is needed anyway in order to use it, right? SQL mockup:

SELECT page_namespace, page_title, il_to as img_name
FROM imagelinks
JOIN page ON page_id = il_from
WHERE NOT EXISTS( SELECT * FROM image WHERE img_name = il_to )
AND NOT EXISTS( SELECT * FROM commonswiki.image WHERE img_name = il_to )

Using LEFT JOIN instead of NOT EXISTS would be faster for a full list, but
slower if a limit in the hundreds is used.
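
As a rough illustration of the two query shapes compared in this comment, here is a minimal Python sketch using the standard sqlite3 module. The reduced toy schema, the sample rows, and the function names are assumptions made for the example. An attached in-memory database named commonswiki stands in for the shared repository, and SQLite says nothing about how either form would perform on a production database server.

```python
import sqlite3

# NOT EXISTS form, as in the mockup (tables qualified for SQLite).
NOT_EXISTS_QUERY = """
SELECT page_namespace, page_title, il_to AS img_name
FROM imagelinks
JOIN page ON page_id = il_from
WHERE NOT EXISTS (SELECT 1 FROM main.image AS l WHERE l.img_name = il_to)
  AND NOT EXISTS (SELECT 1 FROM commonswiki.image AS s WHERE s.img_name = il_to)
ORDER BY page_title, il_to
"""

# Equivalent LEFT JOIN form: keep only links with no match in either table.
LEFT_JOIN_QUERY = """
SELECT page_namespace, page_title, il_to AS img_name
FROM imagelinks
JOIN page ON page_id = il_from
LEFT JOIN main.image AS l ON l.img_name = il_to
LEFT JOIN commonswiki.image AS s ON s.img_name = il_to
WHERE l.img_name IS NULL AND s.img_name IS NULL
ORDER BY page_title, il_to
"""


def build_demo_db() -> sqlite3.Connection:
    """Create a toy local wiki plus an attached 'commonswiki' database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS commonswiki")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                           page_namespace INTEGER, page_title TEXT);
        CREATE TABLE imagelinks (il_from INTEGER, il_to TEXT);
        CREATE TABLE image (img_name TEXT PRIMARY KEY);
        CREATE TABLE commonswiki.image (img_name TEXT PRIMARY KEY);

        INSERT INTO page VALUES (1, 0, 'Main_Page'), (2, 0, 'Sandbox');
        INSERT INTO imagelinks VALUES
            (1, 'Local.png'), (1, 'Shared.png'), (2, 'Missing.png');
        INSERT INTO image VALUES ('Local.png');
        INSERT INTO commonswiki.image VALUES ('Shared.png');
        """
    )
    return conn


def wanted_files(conn: sqlite3.Connection, query: str) -> list:
    """Run one of the two queries and return the rows."""
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    print(wanted_files(demo, NOT_EXISTS_QUERY))
    print(wanted_files(demo, LEFT_JOIN_QUERY))
```

Both forms report only the link to Missing.png in this toy data; which one is cheaper depends on the database engine and on whether a LIMIT is applied, as the comment notes.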
Comment 3 Rob Church 2007-01-18 10:13:41 UTC
*** Bug 8683 has been marked as a duplicate of this bug. ***
Comment 4 Rob Church 2007-05-15 20:00:43 UTC
*** Bug 9924 has been marked as a duplicate of this bug. ***
Comment 5 Eugene Zelenko 2008-03-10 19:01:35 UTC
*** Bug 13314 has been marked as a duplicate of this bug. ***
Comment 6 Eugene Zelenko 2008-04-15 20:53:36 UTC

*** This bug has been marked as a duplicate of bug 13702 ***
Comment 7 Siebrand Mazeland 2008-04-15 21:06:17 UTC
Not a dupe. The patch in bug 13702 also does not take shared repositories into account.
Comment 8 Chad H. 2009-03-03 22:29:22 UTC
Broken implementation or not, this is still a dupe to 13702 (or it's a dupe to here, but that bug was marked FIXED :)

*** This bug has been marked as a duplicate of bug 13702 ***
Comment 9 Guillaume Paumier 2009-12-28 19:38:16 UTC
Reopening the bug and making it explicit that it requests support for shared repos.
Comment 10 Krinkle 2010-06-15 21:31:58 UTC
See also https://bugzilla.wikimedia.org/show_bug.cgi?id=15688
Comment 11 p858snake 2010-06-20 01:17:50 UTC
*** Bug 15688 has been marked as a duplicate of this bug. ***
Comment 12 Ilmari Karonen 2010-12-04 16:14:30 UTC
r77725 at least makes images on shared repos show up as struck-out bluelinks instead of redlinks in the output.  It does nothing to fix the actual problem, but at least now you can visually tell the false positives apart from the actually missing files.
Comment 13 Chad H. 2011-02-02 14:50:21 UTC
*** Bug 27107 has been marked as a duplicate of this bug. ***
Comment 14 Alexandre Emsenhuber [IAlex] 2011-04-17 08:43:57 UTC
*** Bug 28580 has been marked as a duplicate of this bug. ***
Comment 15 Nemo 2011-04-19 20:58:57 UTC
This is not an enhancement request; the page as it is just doesn't make any sense.
Example: http://meta.wikimedia.org/wiki/Special:WantedFiles
Comment 16 Purodha Blissenbach 2011-05-04 08:02:44 UTC
This page as it is lends itself nicely to being turned into a "List of files used from remote (shared) repositories" page; see bug 28807
Comment 17 Gregor Hagedorn 2011-12-31 13:46:29 UTC
(In reply to comment #12)
> r77725 at least makes images on shared repos show up as struck-out bluelinks
> instead of redlinks in the output.  It does nothing to fix the actual problem,
> but at least now you can visually tell the false positives apart from the
> actually missing files.

Given that this has been achieved, I wonder whether the bug cannot be closed by simply adding a filter option to hide the struck-out bluelinks? I have no insight into the code, but it seems the filter could be added with very little performance loss, provided we don't expect the precise number of returns and the filter automatically switches to a high browsing interval (2000-5000), and adds an explanation like:

"2000/ACTUAL NUMBER files have been found that are not present in the local wiki. Of these, some or many are available in a shared file repository. These are not shown below. As a result, the number of missing files shown is variable."

This may be not ideal, but clearly better than the present consistent, but rather useless behavior. Who is likely to browse through 100s of pages of struck-out blue links to find the truly missing red-links? In fact on metawiki nobody seems to be doing this, so many broken links exist...
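
The filter proposed above could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: `filter_wanted_files` and the `exists_in_shared_repo` callback are hypothetical names, not MediaWiki functions, and the notice wording is adapted from the suggested message rather than taken from any real interface text.

```python
from typing import Callable, List, Tuple

WantedRow = Tuple[str, int]  # (file name, number of pages using it)


def filter_wanted_files(
    batch: List[WantedRow],
    exists_in_shared_repo: Callable[[str], bool],
) -> Tuple[List[WantedRow], str]:
    """Hide rows whose file exists in a shared repository.

    `batch` is one large browsing interval of locally missing files.
    `exists_in_shared_repo` is a hypothetical lookup supplied by the caller.
    """
    shown = [row for row in batch if not exists_in_shared_repo(row[0])]
    hidden = len(batch) - len(shown)
    notice = (
        f"{len(batch)} files have been found that are not present in the "
        f"local wiki. Of these, {hidden} are available in a shared file "
        "repository and are not shown below."
    )
    return shown, notice


if __name__ == "__main__":
    shared = {"B.png"}
    rows, message = filter_wanted_files(
        [("A.png", 5), ("B.png", 3), ("C.png", 1)], shared.__contains__
    )
    print(message)
    print(rows)
```

Because the filter runs after the batch is fetched, the number of rows actually displayed varies from page to page, which is the trade-off the proposal accepts.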

--

Mark: you changed priority from Highest to Low without arguing - I think it would be better interaction with the community if you could argue or comment why. In some of your changes that may be evident from previous discussion, here I think not. You may well have much more information than Jan Kucera. Please share it.
Comment 18 Nemo 2011-12-31 14:03:40 UTC
(In reply to comment #17)
> Mark: you changed priority from Highest to Low without arguing - I think it
> would be better interaction with the community if you could argue or comment
> why. In some of your changes that may be evident from previous discussion, here
> I think not. You may well have much more information than Jan Kucera. Please
> share it.

It's not a matter of interaction with the community, you probably missed bug 23816.
As a member of the community who voted for this bug, I'd rather mark it lowest priority or LATER, and disable the special page entirely on WMF wikis (see bug 31491).
Comment 19 Gregor Hagedorn 2011-12-31 14:46:13 UTC
a) I certainly miss bug 23816 if nobody is referring to it. Thank you for doing so!

b) There are certainly multiple "communities" with different opinions here.

c) I don't see through this at all. Either the bug should be closed, and a new one opened, or ... The largest Wikipedias may have reached a number of broken file links that make this functionality less likely to be essential, but smaller Wikis can substantially improve their quality by fixing these errors. I believe many who voted for this bug see this as an important function, even if Nemo_bis does not. It is widely agreed that the present implementation is broken. The bluelink-solution is a very good step, but it is still off-putting to potential users (the first pages are usually all clean). I am opening a new Bug 33446 in an attempt to focus on my proposal for a possible solution that makes it more likely that editors are willing to research and fix broken file links.

I am sure I have overlooked many other things :-)
Comment 20 Nemo 2011-12-31 15:23:54 UTC
(In reply to comment #19)
> b) There are certainly multiple "communities" with different opinions here.

Questionable. 

> c) I don't see through this at all. Either the bug should be closed, and a new
> one opened, or ... 

...we could close this and not open any.

> The largest Wikipedias may have reached a number of broken
> file links that make this functionality less likely to be essential, but
> smaller Wikis can substantially improve their quality by fixing these errors. 

I don't see any usefulness in this page on any of the (many) small projects I'm active in, now that there's the tracking category.

> I
> believe many who voted for this bug see this as an important function, even if
> Nemo_bis does not. 

Not really, those votes are very old and they all came before the tracking category (mine too).

> It is widely agreed that the present implementation is
> broken. The bluelink-solution is a very good step, but it is still offputting
> potential users (the first pages are usually all clean). I am opening a new Bug
> 33446 in an attempt to focus on my proposal for a possible solution that makes
> it more likely that editors are willing to research fix broken file links.

:/
Comment 21 Gregor Hagedorn 2011-12-31 16:15:20 UTC
(ignoring what is best ignored:) I disagree that Special:WantedFiles is redundant.

However, the basic assumption that it is easier to work by page than by file is, in my opinion, erroneous. A missing file often occurs on dozens of pages. Look at Metawiki (there for multilinguality mostly). In other cases it is because repo files are renamed without keeping redirects. Or, out of old habit, deleted and re-uploaded under a different name.

In cases where a file is missing on dozens of pages, I consider an improved Special:WantedFiles desirable.
Comment 22 Bawolff (Brian Wolff) 2012-01-01 10:13:59 UTC
So I have some ideas how to fix this.

Basically, GlobalUsage stores which images that don't exist locally are in use. So I was thinking of a query something like:

select '6' as namespace, gil_to as title, count(*) as value
from globalimagelinks
LEFT JOIN image on gil_to = img_name
where img_name is null and gil_wiki = 'jawikinews'
group by gil_to
order by count(*) DESC;

(Using jawikinews as an example, since it's a smallish size wiki (5480 entries in global usage) thus I can easily test these queries on toolserver). 6 == NS_FILE.

This seemed to work, however with one problem. Image redirects were still included. I'm not sure if that's a globalusage issue (should the links be to the target image) or if it's intentional behaviour. Filtering those out in the sql gives:

select '6' as namespace, gil_to as title, count(*) as value
from globalimagelinks
LEFT JOIN image on gil_to = img_name
LEFT JOIN page on (gil_to = page_title and page_namespace=6)
where img_name is null and gil_wiki = 'jawikinews'
and (page_is_redirect is null or page_is_redirect = 0)
group by gil_to
order by count(*) DESC;

However, that seems to slow down the query by quite a bit (10 seconds went to 2 minutes). OTOH, the query is slow regardless, and it's going to be cached (I'm not sure how slow is too slow). This still would mess up on some edge cases though, such as if the page is a redirect to a nonexistent file (or even to something not in NS_FILE). [And of course it doesn't address the more general problem of files from Foreign repos in general. I'm not sure if the general problem is addressable without a schema change]


So a possible way forward: add to the GlobalUsage extension a new special page that overrides the built-in special:wantedfiles with the new query. Even with the first query I mentioned, it would cut down on false positives significantly.
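
The two queries in this comment can be tried against toy data with the following Python sketch, which uses the standard sqlite3 module. The reduced table layouts, the sample rows, and the helper names are assumptions for illustration only; the real globalimagelinks table has more columns, the namespace is returned as the integer 6 rather than the string '6', and the timings mentioned above cannot be reproduced this way.

```python
import sqlite3

# First query: files used by one wiki that have no row in the repo's image table.
BASIC_QUERY = """
SELECT 6 AS namespace, gil_to AS title, COUNT(*) AS value
FROM globalimagelinks
LEFT JOIN image ON gil_to = img_name
WHERE img_name IS NULL AND gil_wiki = ?
GROUP BY gil_to
ORDER BY COUNT(*) DESC
"""

# Second query: additionally drop titles that are redirects in the File namespace.
REDIRECT_AWARE_QUERY = """
SELECT 6 AS namespace, gil_to AS title, COUNT(*) AS value
FROM globalimagelinks
LEFT JOIN image ON gil_to = img_name
LEFT JOIN page ON (gil_to = page_title AND page_namespace = 6)
WHERE img_name IS NULL AND gil_wiki = ?
  AND (page_is_redirect IS NULL OR page_is_redirect = 0)
GROUP BY gil_to
ORDER BY COUNT(*) DESC
"""


def build_demo_repo() -> sqlite3.Connection:
    """Create a toy shared-repository database with GlobalUsage-style links."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE globalimagelinks (gil_wiki TEXT, gil_page INTEGER, gil_to TEXT);
        CREATE TABLE image (img_name TEXT PRIMARY KEY);
        CREATE TABLE page (page_namespace INTEGER, page_title TEXT,
                           page_is_redirect INTEGER);

        INSERT INTO image VALUES ('OnCommons.jpg'), ('NewName.jpg');
        INSERT INTO page VALUES (6, 'OnCommons.jpg', 0), (6, 'NewName.jpg', 0),
                                (6, 'OldName.jpg', 1);  -- image redirect
        INSERT INTO globalimagelinks VALUES
            ('jawikinews', 1, 'OnCommons.jpg'),
            ('jawikinews', 1, 'Gone.jpg'),
            ('jawikinews', 2, 'Gone.jpg'),
            ('jawikinews', 3, 'OldName.jpg'),
            ('enwiki', 9, 'Gone.jpg');
        """
    )
    return conn


def wanted_files(conn: sqlite3.Connection, query: str, wiki: str) -> list:
    """Return (namespace, title, usage count) rows for one wiki."""
    return conn.execute(query, (wiki,)).fetchall()


if __name__ == "__main__":
    repo = build_demo_repo()
    print(wanted_files(repo, BASIC_QUERY, "jawikinews"))
    print(wanted_files(repo, REDIRECT_AWARE_QUERY, "jawikinews"))
```

In this toy data the first query still lists OldName.jpg because it is only a redirect, while the second query drops it and keeps the genuinely missing Gone.jpg.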
Comment 23 Krinkle 2012-01-01 23:15:24 UTC
So it determines that a remote file exists by checking if it is used anywhere according to global usage. That's a smart idea. Although maybe not semantically correct, it should be good in practice.

If there is a link to an image on a local wiki and the image doesn't exist on the local wiki, it's going into global usage.

One problem though, right now the system works in such a way that if a file exists neither locally nor in the repository, globalusage catches it, not the local wiki (meaning, it's added to GlobalUsage as a redlink, not to the local wiki as a redlink). This means four things.

Three good things, which would hold us back from changing this behaviour
* This is used to fix things if a file in the repo was deleted and is restored, the usage in globalusage is still there and can be restored if needed
* This is used by gadget authors to track global usage. They put a comment in the script with the [[File:]] syntax in it, using a nonexistent file name. Requesting global usage for it will yield locations of copies of the script. This one can be worked around by uploading a bogus image to the repo, were this behavior to change to only tracking usage of existing images.
* It acts a little bit like a global WantedFiles, files that are wanted by multiple wikis.

One bad thing that can compromise Bawolff's proposal:
* Being in globalfileusage does not mean the file exists there...
Comment 24 Krinkle 2012-01-01 23:16:52 UTC
..

* Being in globalfileusage does not mean the file exists there..., just like an entry in the local *links table doesn't mean the target exists.

-

On the other hand, if a connection to globalfileusage is possible, perhaps a connection to the actual repository wiki database is possible as well ? One could (ahum) "simply" check the commonswiki database.
Comment 25 Bawolff (Brian Wolff) 2012-01-01 23:33:04 UTC
[mid air collision]
>This is used to fix things if a file in the repo was deleted and is restored,
>the usage in globalusage is still there and can be restored if needed

I'm not sure I understand. Do you mean if a file at Commons is deleted and then
restored? The outer join on image should take care of that (I'm assuming that
global usage is in the same db as Commons). If you mean the local file was
deleted/restored, I assumed that would re-add/delete the entries in global
usage. Is that incorrect?


>* This is used by gadget authors to track global usage. They make a comment in
>the script with the [[File:]] syntax in it with an inexisting file name.
>Requesting global usage for it will yield locations of copies of the script.
>This one can be worked around by uploading a bogus image to the repo, were this
>behavior to change and only tracking usage of existing images.

Hmm, that is an interesting hack. At the end of the day, those would still
appear in special:wantedfiles if it was working properly. I don't really think
we should worry too much about that; getting special:wantedfiles moving in a
somewhat working direction, even with such links, is an improvement over the
current situation.

>It acts a little bit like a global WantedFiles, files that are wanted by
>multiple wikis.

Well in my example query i filter by gil_wiki to do only one wiki. But we could
also make a special:globallywantedfiles which gives the most wanted file across
all the wikis.


>One bad thing that can compromise Bawolff's proposal:
>* Being in globalfileusage does not mean the file exists there...

I'm not sure I know what you mean. My proposal relies on the fact that there
are entries in globalusage for files that don't exist on the commons repo.
Comment 26 Krinkle 2012-01-02 00:00:05 UTC
Can you join IRC for a sec?
Comment 27 Gerrit Notification Bot 2014-07-03 07:42:53 UTC
Change 143835 had a related patch set uploaded by Brian Wolff:
Make Special:Wantedfiles not include foreign false positives.

https://gerrit.wikimedia.org/r/143835
Comment 28 Bartosz Dziewoński 2014-08-11 10:33:02 UTC
*** Bug 69391 has been marked as a duplicate of this bug. ***
