Last modified: 2012-01-10 18:26:25 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T12023, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 10023 - Should return http status code 304 NOT MODIFIED appropriately when accessing a &action=raw page
Should return http status code 304 NOT MODIFIED appropriately when accessing ...
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Parser (Other open bugs)
unspecified
All All
: Normal enhancement (vote)
: ---
Assigned To: Nobody - You can work on this!
: easy, patch, patch-need-review
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2007-05-24 14:52 UTC by Gary van der Merwe
Modified: 2012-01-10 18:26 UTC (History)
5 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
patch for 1.15.1 (3.39 KB, patch)
2009-10-28 23:50 UTC, Nikola Kovacs
Details
patch 2 (3.45 KB, patch)
2009-10-29 01:39 UTC, Nikola Kovacs
Details
Patch for last-checked (717 bytes, patch)
2011-11-20 04:56 UTC, Jose
Details

Description Gary van der Merwe 2007-05-24 14:52:53 UTC
Most web browsers will, if a page is cached, but it is reloading the page, it will send a If-Modified-Since http header, containing the last modified date of the version in the cache. 

MediaWiki should examine the above mentioned header, and if the article has not changed, it should retunr a http status code 304 NOT MODIFIED, with no body, there by instructing the browser to us the version from it cache, and saving bandwidth.

Also investigate writing the article version id to the ETag header and examine the If-None-Match header.

I am going to try do this. I am clear on how to do this for installations where there is now squid server, as I have done this in other cms system I have developed. I have no clue how to get this working with the squid server setup the Wikipedia uses. If anyone has ideas - please let me know.
Comment 1 Gary van der Merwe 2007-05-24 19:49:47 UTC
After I did some investigation, I found that mediawiki already dose this, but not for &action=raw pages. Patch to follow.
Comment 2 Brion Vibber 2007-05-24 19:51:09 UTC
Before you patch, ensure that you're using correct values for the caching parameters on action=raw.
Comment 3 Nikola Kovacs 2009-10-28 23:50:20 UTC
Created attachment 6732 [details]
patch for 1.15.1

I copied the code from OutputPage.php and applied it to RawPage.php
Comment 4 Nikola Kovacs 2009-10-29 01:29:51 UTC
Comment on attachment 6732 [details]
patch for 1.15.1

>--- RawPage.php.ori	2009-01-25 02:31:48.000000000 +0100
>+++ RawPage.php	2009-10-29 01:34:21.044675925 +0100
>@@ -197,6 +197,84 @@
> 		}
> 	}
> 
>+	function checkLastModified( $timestamp ) {
>+		global $wgCachePages, $wgCacheEpoch, $wgUser;
>+		
>+		if ( !$timestamp || $timestamp == '19700101000000' ) {
>+			wfDebug( __METHOD__ . ": CACHE DISABLED, NO TIMESTAMP\n" );
>+			return false;
>+		}
>+		if( !$wgCachePages ) {
>+			wfDebug( __METHOD__ . ": CACHE DISABLED\n", false );
>+			return false;
>+		}
>+		if( $wgUser->getOption( 'nocache' ) ) {
>+			wfDebug( __METHOD__ . ": USER DISABLED CACHE\n", false );
>+			return false;
>+		}
>+
>+		$timestamp = wfTimestamp( TS_MW, $timestamp );
>+		$modifiedTimes = array(
>+			'page' => $timestamp,
>+			'user' => $wgUser->getTouched(),
>+			'epoch' => $wgCacheEpoch
>+		);
>+
>+		$maxModified = max( $modifiedTimes );
>+		$lastmod = wfTimestamp( TS_RFC2822, $maxModified );
>+		header( "Last-modified: $lastmod" );
>+
>+		if( empty( $_SERVER['HTTP_IF_MODIFIED_SINCE'] ) ) {
>+			wfDebug( __METHOD__ . ": client did not send If-Modified-Since header\n", false );
>+			return false;
>+		}
>+
>+		# Make debug info
>+		$info = '';
>+		foreach ( $modifiedTimes as $name => $value ) {
>+			if ( $info !== '' ) {
>+				$info .= ', ';
>+			}
>+			$info .= "$name=" . wfTimestamp( TS_ISO_8601, $value );
>+		}
>+
>+		# IE sends sizes after the date like this:
>+		# Wed, 20 Aug 2003 06:51:19 GMT; length=5202
>+		# this breaks strtotime().
>+		$clientHeader = preg_replace( '/;.*$/', '', $_SERVER["HTTP_IF_MODIFIED_SINCE"] );
>+
>+		wfSuppressWarnings(); // E_STRICT system time bitching
>+		$clientHeaderTime = strtotime( $clientHeader );
>+		wfRestoreWarnings();
>+		if ( !$clientHeaderTime ) {
>+			wfDebug( __METHOD__ . ": unable to parse the client's If-Modified-Since header: $clientHeader\n" );
>+			return false;
>+		}
>+		$clientHeaderTime = wfTimestamp( TS_MW, $clientHeaderTime );
>+
>+		wfDebug( __METHOD__ . ": client sent If-Modified-Since: " . 
>+			wfTimestamp( TS_ISO_8601, $clientHeaderTime ) . "\n", false );
>+		wfDebug( __METHOD__ . ": effective Last-Modified: " . 
>+			wfTimestamp( TS_ISO_8601, $maxModified ) . "\n", false );
>+		if( $clientHeaderTime < $maxModified ) {
>+			wfDebug( __METHOD__ . ": STALE, $info\n", false );
>+			return false;
>+		}
>+
>+		# Not modified
>+		# Give a 304 response code and disable body output 
>+		wfDebug( __METHOD__ . ": NOT MODIFIED, $info\n", false );
>+		ini_set('zlib.output_compression', 0);
>+		header( "HTTP/1.1 304 Not Modified" );
>+
>+		// Don't output a compressed blob when using ob_gzhandler;
>+		// it's technically against HTTP spec and seems to confuse
>+		// Firefox when the response gets split over two packets.
>+		wfClearOutputBuffers();
>+
>+		return true;
>+	}
>+
> 	function getArticleText() {
> 		$found = false;
> 		$text = '';
>@@ -208,13 +286,23 @@
> 				# If the message doesn't exist, return a blank
> 				if( wfEmptyMsg( $key, $text ) )
> 					$text = '';
>+				$rev = Revision::newFromTitle( $this->mTitle, 0 );
>+				if( $rev ) {
>+					$lastmod = /*wfTimestamp( TS_RFC2822,*/ $rev->getTimestamp() /*)*/;
>+					if ( $this->checkLastModified( $lastmod ) ) {
>+						return '';
>+					}
>+				}
> 				$found = true;
> 			} else {
> 				// Get it from the DB
> 				$rev = Revision::newFromTitle( $this->mTitle, $this->mOldId );
> 				if( $rev ) {
>-					$lastmod = wfTimestamp( TS_RFC2822, $rev->getTimestamp() );
>-					header( "Last-modified: $lastmod" );
>+					$lastmod = /*wfTimestamp( TS_RFC2822,*/ $rev->getTimestamp() /*)*/;
>+					//header( "Last-modified: $lastmod" );
>+					if ( $this->checkLastModified( $lastmod ) ) {
>+						return '';
>+					}
> 
> 					if( !is_null($this->mSection ) ) {
> 						global $wgParser;
Comment 5 Nikola Kovacs 2009-10-29 01:33:43 UTC
Comment on attachment 6732 [details]
patch for 1.15.1

--- RawPage.php.ori	2009-01-25 02:31:48.000000000 +0100
+++ RawPage.php	2009-10-29 02:32:10.924676164 +0100
@@ -197,6 +197,84 @@
 		}
 	}
 
+	function checkLastModified( $timestamp ) {
+		global $wgCachePages, $wgCacheEpoch, $wgUser;
+		
+		if ( !$timestamp || $timestamp == '19700101000000' ) {
+			wfDebug( __METHOD__ . ": CACHE DISABLED, NO TIMESTAMP\n" );
+			return false;
+		}
+		if( !$wgCachePages ) {
+			wfDebug( __METHOD__ . ": CACHE DISABLED\n", false );
+			return false;
+		}
+		if( $wgUser->getOption( 'nocache' ) ) {
+			wfDebug( __METHOD__ . ": USER DISABLED CACHE\n", false );
+			return false;
+		}
+
+		$timestamp = wfTimestamp( TS_MW, $timestamp );
+		$modifiedTimes = array(
+			'page' => $timestamp,
+			'user' => $wgUser->getTouched(),
+			'epoch' => $wgCacheEpoch
+		);
+
+		$maxModified = max( $modifiedTimes );
+		$lastmod = wfTimestamp( TS_RFC2822, $maxModified );
+		header( "Last-modified: $lastmod" );
+
+		if( empty( $_SERVER['HTTP_IF_MODIFIED_SINCE'] ) ) {
+			wfDebug( __METHOD__ . ": client did not send If-Modified-Since header\n", false );
+			return false;
+		}
+
+		# Make debug info
+		$info = '';
+		foreach ( $modifiedTimes as $name => $value ) {
+			if ( $info !== '' ) {
+				$info .= ', ';
+			}
+			$info .= "$name=" . wfTimestamp( TS_ISO_8601, $value );
+		}
+
+		# IE sends sizes after the date like this:
+		# Wed, 20 Aug 2003 06:51:19 GMT; length=5202
+		# this breaks strtotime().
+		$clientHeader = preg_replace( '/;.*$/', '', $_SERVER["HTTP_IF_MODIFIED_SINCE"] );
+
+		wfSuppressWarnings(); // E_STRICT system time bitching
+		$clientHeaderTime = strtotime( $clientHeader );
+		wfRestoreWarnings();
+		if ( !$clientHeaderTime ) {
+			wfDebug( __METHOD__ . ": unable to parse the client's If-Modified-Since header: $clientHeader\n" );
+			return false;
+		}
+		$clientHeaderTime = wfTimestamp( TS_MW, $clientHeaderTime );
+
+		wfDebug( __METHOD__ . ": client sent If-Modified-Since: " . 
+			wfTimestamp( TS_ISO_8601, $clientHeaderTime ) . "\n", false );
+		wfDebug( __METHOD__ . ": effective Last-Modified: " . 
+			wfTimestamp( TS_ISO_8601, $maxModified ) . "\n", false );
+		if( $clientHeaderTime < $maxModified ) {
+			wfDebug( __METHOD__ . ": STALE, $info\n", false );
+			return false;
+		}
+
+		# Not modified
+		# Give a 304 response code and disable body output 
+		wfDebug( __METHOD__ . ": NOT MODIFIED, $info\n", false );
+		ini_set('zlib.output_compression', 0);
+		header( "HTTP/1.1 304 Not Modified" );
+
+		// Don't output a compressed blob when using ob_gzhandler;
+		// it's technically against HTTP spec and seems to confuse
+		// Firefox when the response gets split over two packets.
+		wfClearOutputBuffers();
+
+		return true;
+	}
+
 	function getArticleText() {
 		$found = false;
 		$text = '';
@@ -208,13 +286,22 @@
 				# If the message doesn't exist, return a blank
 				if( wfEmptyMsg( $key, $text ) )
 					$text = '';
+				$rev = Revision::newFromTitle( $this->mTitle, 0 );
+				if( $rev ) {
+					$lastmod = $rev->getTimestamp();
+					if ( $this->checkLastModified( $lastmod ) ) {
+						return '';
+					}
+				}
 				$found = true;
 			} else {
 				// Get it from the DB
 				$rev = Revision::newFromTitle( $this->mTitle, $this->mOldId );
 				if( $rev ) {
-					$lastmod = wfTimestamp( TS_RFC2822, $rev->getTimestamp() );
-					header( "Last-modified: $lastmod" );
+					$lastmod = $rev->getTimestamp();
+					if ( $this->checkLastModified( $lastmod ) ) {
+						return '';
+					}
 
 					if( !is_null($this->mSection ) ) {
 						global $wgParser;
Comment 6 Nikola Kovacs 2009-10-29 01:39:09 UTC
Created attachment 6733 [details]
patch 2

Fixed it to return an empty string if not modified to prevent further processing.
Comment 7 p858snake 2011-04-30 00:09:47 UTC
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
Comment 8 Sumana Harihareswara 2011-11-09 03:50:01 UTC
+need-review to signal to developers that this patch needs reviewing. Nicola, thank you for your patch, and my apologies that it's been so long since you submitted it and you haven't gotten a response yet.  I hope we will review it soon.
Comment 9 Gabriel Wicke 2011-11-09 20:44:48 UTC
Adding the following near the top of onView in RawAction.php also checks the last-modified time with much less code and duplication:

if ( $this->getOutput()->checkLastModified( $this->page->getTouched() ) ) {
	return; // Client cache fresh and headers sent, nothing more to do.
}

Unfortunately this triggers the page data to be loaded in WikiPage, which should be more expensive than what is currently done in RawAction. To prevent the expensive load, an option would be to retrieve the revision outside getRawText and use the returned timestamp to call checkLastModified. The full text retrieval can then be avoided if checkLastModified returns true.
Comment 10 Sumana Harihareswara 2011-11-09 20:56:09 UTC
Nikola, sorry for misspelling your name earlier.

Gabriel noted that the patch you wrote duplicates code unnecessarily; "might work though, but won't get the fixes the central function gets."  I'm marking your patch as reviewed.

Also adding the "easy" keyword since Gabriel has a hint in comment 9 on how to fix this with a few minutes' work.
Comment 11 Jose 2011-11-20 04:56:47 UTC
Created attachment 9498 [details]
Patch for last-checked

Fix to check last-modified and return 304 code HTTP.
Comment 12 Jose 2011-11-20 04:58:47 UTC
(In reply to comment #11)
> Created attachment 9498 [details]
> Patch for last-checked
> 
> Fix to check last-modified and return 304 code HTTP.

this also revises previous patch.
Comment 13 Gabriel Wicke 2012-01-10 18:21:17 UTC
Committed the simple version in r10023. Performance impact for plain GETs was not too bad (drop from 108 to 104 req/second), and should be offset by unmodified responses. Performance for plain GETs could still be improved though, as alluded to in comment #9.
Comment 14 Sumana Harihareswara 2012-01-10 18:26:25 UTC
The commit is in r108529 .  Thanks, Jose.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links