Last modified: 2005-12-07 15:06:52 UTC

Bug 3848 - googlebot shouldn't be given spoilers
Status: CLOSED INVALID
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: unspecified
Hardware: All All
Importance: High enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://wikipedia.org
Depends on:
Blocks:
Reported: 2005-10-31 15:33 UTC by Stefan Monov
Modified: 2005-12-07 15:06 UTC
CC List: 0 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Stefan Monov 2005-10-31 15:33:30 UTC
IMPORTANT!!! DO *NOT* READ THIS BUG REPORT IF YOU HAVEN'T READ THE BOOK "Harry
Potter and the Half-Blood Prince" AND YOU PLAN TO READ IT!
- see the bug report below -
...  
...  
...  
...  
...  
...  
...  
...  
How to reproduce:  
1. Query Google for "half blood prince"  
2. See the fourth result - it reads:  
  
Harry Potter and the Half-Blood Prince - Wikipedia, the free ...  
For information on the character, see Half-Blood Prince (character). ...  
 Harry pursues Snape, who identifies himself as the Half-Blood Prince before fleeing ...  
en.wikipedia.org/wiki/Harry_Potter_and_the_Half-Blood_Prince - 47k - Cached - Similar pages
  
As you can see, anybody who searches for information about the book in Google will be spoiled  
instantly. Internally, Wikipedia has the solution - the {{spoiler}} template. However, in  
Google searches the warning does not display. Therefore another approach is needed.  
Expected behavior:  
1. On article access check the user-agent string sent.  
2. If it's Googlebot's, return the page with all spoilers replaced by "---SPOILER---" or
something similar. Spoilers would be marked in articles like <spoiler>dumbledore dies</spoiler>
(or other markup).
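The two steps under "Expected behavior" can be sketched as follows. This is a minimal Python illustration, not MediaWiki code (MediaWiki is written in PHP). It assumes spoilers are marked with the proposed `<spoiler>...</spoiler>` tags and treats any User-Agent containing the substring "Googlebot" as Google's crawler; the names `is_googlebot`, `render_for_agent` and `SPOILER_PLACEHOLDER` are hypothetical.

```python
import re

# Hypothetical placeholder; the report only asks for "---SPOILER---" or something similar.
SPOILER_PLACEHOLDER = "---SPOILER---"

# Assumed markup from the proposal: <spoiler>...</spoiler>.
# Non-greedy so adjacent spoilers are matched separately; DOTALL lets a spoiler span lines.
SPOILER_RE = re.compile(r"<spoiler>(.*?)</spoiler>", re.IGNORECASE | re.DOTALL)


def is_googlebot(user_agent: str) -> bool:
    """Step 1 (naive assumption): Googlebot's User-Agent contains the token 'Googlebot'."""
    return "googlebot" in user_agent.lower()


def render_for_agent(wikitext: str, user_agent: str) -> str:
    """Step 2: hide spoiler spans from Googlebot; show them (tags removed) to everyone else."""
    if is_googlebot(user_agent):
        # The crawler only ever sees the placeholder, never the spoiler text.
        return SPOILER_RE.sub(SPOILER_PLACEHOLDER, wikitext)
    # Ordinary readers get the spoiler text itself, without the markup.
    return SPOILER_RE.sub(lambda m: m.group(1), wikitext)
```

For example, `render_for_agent("Harry pursues Snape. <spoiler>dumbledore dies</spoiler>", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")` returns `"Harry pursues Snape. ---SPOILER---"`, while an ordinary browser User-Agent gets the spoiler text back with the tags stripped. The User-Agent header is chosen by the client, so this check only catches crawlers that identify themselves honestly.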
Comment 1 Ævar Arnfjörð Bjarmason 2005-10-31 15:46:36 UTC
INVALID. Google has a "Dissatisfied? Help us improve" link on every search
result; I suggest you use it.

Comment 2 Mathias Schindler 2005-10-31 15:53:44 UTC
Snape kills Dumbledore.
Comment 3 Stefan Monov 2005-10-31 16:32:13 UTC
>INVALID. Google has a "Dissatisfied? Help us improve" link on every search result; I suggest you use it.

How's a computer algorithm supposed to recognize something as abstract as plot spoilers?
No, users should deal with it. Please, before replying, consider whether a workable
anti-spoiler algorithm in Google is really possible.

>Snape kills Dumbledore.
No comment...
Comment 4 Ævar Arnfjörð Bjarmason 2005-10-31 16:38:22 UTC
(In reply to comment #3)
> >INVALID. Google has a "Dissatisfied? Help us improve" link on every search result; I suggest you use it.
>
> How's a computer algorithm supposed to recognize something as abstract as plot spoilers?
> No, users should deal with it. Please, before replying, consider whether a workable
> anti-spoiler algorithm in Google is really possible.

Well, that's something for the Google people to work out, not us.

Comment 5 Stefan Monov 2005-10-31 22:14:12 UTC
I don't agree, but I have nothing else to say. I used the method you recommended. 
Comment 6 Rob Church 2005-11-01 01:05:37 UTC
It is not technically possible for us to prevent Google from indexing spoilers -
how is the software supposed to know what is a spoiler and what isn't?
Comment 7 Stefan Monov 2005-11-01 08:49:40 UTC
>It is not technically possible for us to prevent Google from indexing spoilers -
>how is the software supposed to know what is a spoiler and what isn't?
 
I wrote that above. For example, we surround spoiling parts with <spoiler></spoiler> manually. As I 
said, IMO it's humans' job to tell spoilers apart from regular text. 
Comment 8 Ævar Arnfjörð Bjarmason 2005-12-07 14:19:29 UTC
See also: http://lists.w3.org/Archives/Public/www-html/2005Dec/0009.html
Comment 9 Rob Church 2005-12-07 14:23:44 UTC
(In reply to comment #7)
> >It is not technically possible for us to prevent Google from indexing spoilers -
> >how is the software supposed to know what is a spoiler and what isn't?
>
> I wrote that above. For example, we surround spoiling parts with <spoiler></spoiler>
> manually. As I said, IMO it's humans' job to tell spoilers apart from regular text.

So? Surround something in those tags either at the wikitext or XHTML level, and
you'll find it's ignored at the former and rejected as invalid XHTML at the
latter. And how does that stop GoogleBot seeing it?
Comment 10 Stefan Monov 2005-12-07 15:06:52 UTC
(In reply to comment #9)
> (In reply to comment #7)
> > >It is not technically possible for us to prevent Google from indexing spoilers -
> > >how is the software supposed to know what is a spoiler and what isn't?
> >
> > I wrote that above. For example, we surround spoiling parts with <spoiler></spoiler>
> > manually. As I said, IMO it's humans' job to tell spoilers apart from regular text.
>
> So? Surround something in those tags either at the wikitext or XHTML level, and
> you'll find it's ignored at the former and rejected as invalid XHTML at the
> latter. And how does that stop GoogleBot seeing it?
I'm sorry, apparently I didn't make myself clear. What I meant was that some markup      
analogous to <spoiler></spoiler> has to be added to the wikicode specs, meaning that    
MediaWiki itself should be changed.      

(In reply to comment #8)
> See also: http://lists.w3.org/Archives/Public/www-html/2005Dec/0009.html
Now that I have read this, I realize that such a thing would indeed be much better off in
general XHTML. However, as seen at
http://lists.w3.org/Archives/Public/www-html/2005Dec/0021.html
the proposal seems to have been declined. I'm probably going to argue with them,
because I don't agree with some of their points. For now, I am convinced that the
INVALID resolution fits this bug.
