Last modified: 2012-01-10 18:26:25 UTC
Most web browsers will, if a page is cached, but it is reloading the page, it will send a If-Modified-Since http header, containing the last modified date of the version in the cache. MediaWiki should examine the above mentioned header, and if the article has not changed, it should retunr a http status code 304 NOT MODIFIED, with no body, there by instructing the browser to us the version from it cache, and saving bandwidth. Also investigate writing the article version id to the ETag header and examine the If-None-Match header. I am going to try do this. I am clear on how to do this for installations where there is now squid server, as I have done this in other cms system I have developed. I have no clue how to get this working with the squid server setup the Wikipedia uses. If anyone has ideas - please let me know.
After I did some investigation, I found that mediawiki already dose this, but not for &action=raw pages. Patch to follow.
Before you patch, ensure that you're using correct values for the caching parameters on action=raw.
Created attachment 6732 [details] patch for 1.15.1 I copied the code from OutputPage.php and applied it to RawPage.php
Comment on attachment 6732 [details] patch for 1.15.1 >--- RawPage.php.ori 2009-01-25 02:31:48.000000000 +0100 >+++ RawPage.php 2009-10-29 01:34:21.044675925 +0100 >@@ -197,6 +197,84 @@ > } > } > >+ function checkLastModified( $timestamp ) { >+ global $wgCachePages, $wgCacheEpoch, $wgUser; >+ >+ if ( !$timestamp || $timestamp == '19700101000000' ) { >+ wfDebug( __METHOD__ . ": CACHE DISABLED, NO TIMESTAMP\n" ); >+ return false; >+ } >+ if( !$wgCachePages ) { >+ wfDebug( __METHOD__ . ": CACHE DISABLED\n", false ); >+ return false; >+ } >+ if( $wgUser->getOption( 'nocache' ) ) { >+ wfDebug( __METHOD__ . ": USER DISABLED CACHE\n", false ); >+ return false; >+ } >+ >+ $timestamp = wfTimestamp( TS_MW, $timestamp ); >+ $modifiedTimes = array( >+ 'page' => $timestamp, >+ 'user' => $wgUser->getTouched(), >+ 'epoch' => $wgCacheEpoch >+ ); >+ >+ $maxModified = max( $modifiedTimes ); >+ $lastmod = wfTimestamp( TS_RFC2822, $maxModified ); >+ header( "Last-modified: $lastmod" ); >+ >+ if( empty( $_SERVER['HTTP_IF_MODIFIED_SINCE'] ) ) { >+ wfDebug( __METHOD__ . ": client did not send If-Modified-Since header\n", false ); >+ return false; >+ } >+ >+ # Make debug info >+ $info = ''; >+ foreach ( $modifiedTimes as $name => $value ) { >+ if ( $info !== '' ) { >+ $info .= ', '; >+ } >+ $info .= "$name=" . wfTimestamp( TS_ISO_8601, $value ); >+ } >+ >+ # IE sends sizes after the date like this: >+ # Wed, 20 Aug 2003 06:51:19 GMT; length=5202 >+ # this breaks strtotime(). >+ $clientHeader = preg_replace( '/;.*$/', '', $_SERVER["HTTP_IF_MODIFIED_SINCE"] ); >+ >+ wfSuppressWarnings(); // E_STRICT system time bitching >+ $clientHeaderTime = strtotime( $clientHeader ); >+ wfRestoreWarnings(); >+ if ( !$clientHeaderTime ) { >+ wfDebug( __METHOD__ . ": unable to parse the client's If-Modified-Since header: $clientHeader\n" ); >+ return false; >+ } >+ $clientHeaderTime = wfTimestamp( TS_MW, $clientHeaderTime ); >+ >+ wfDebug( __METHOD__ . ": client sent If-Modified-Since: " . >+ wfTimestamp( TS_ISO_8601, $clientHeaderTime ) . "\n", false ); >+ wfDebug( __METHOD__ . ": effective Last-Modified: " . >+ wfTimestamp( TS_ISO_8601, $maxModified ) . "\n", false ); >+ if( $clientHeaderTime < $maxModified ) { >+ wfDebug( __METHOD__ . ": STALE, $info\n", false ); >+ return false; >+ } >+ >+ # Not modified >+ # Give a 304 response code and disable body output >+ wfDebug( __METHOD__ . ": NOT MODIFIED, $info\n", false ); >+ ini_set('zlib.output_compression', 0); >+ header( "HTTP/1.1 304 Not Modified" ); >+ >+ // Don't output a compressed blob when using ob_gzhandler; >+ // it's technically against HTTP spec and seems to confuse >+ // Firefox when the response gets split over two packets. >+ wfClearOutputBuffers(); >+ >+ return true; >+ } >+ > function getArticleText() { > $found = false; > $text = ''; >@@ -208,13 +286,23 @@ > # If the message doesn't exist, return a blank > if( wfEmptyMsg( $key, $text ) ) > $text = ''; >+ $rev = Revision::newFromTitle( $this->mTitle, 0 ); >+ if( $rev ) { >+ $lastmod = /*wfTimestamp( TS_RFC2822,*/ $rev->getTimestamp() /*)*/; >+ if ( $this->checkLastModified( $lastmod ) ) { >+ return ''; >+ } >+ } > $found = true; > } else { > // Get it from the DB > $rev = Revision::newFromTitle( $this->mTitle, $this->mOldId ); > if( $rev ) { >- $lastmod = wfTimestamp( TS_RFC2822, $rev->getTimestamp() ); >- header( "Last-modified: $lastmod" ); >+ $lastmod = /*wfTimestamp( TS_RFC2822,*/ $rev->getTimestamp() /*)*/; >+ //header( "Last-modified: $lastmod" ); >+ if ( $this->checkLastModified( $lastmod ) ) { >+ return ''; >+ } > > if( !is_null($this->mSection ) ) { > global $wgParser;
Comment on attachment 6732 [details] patch for 1.15.1 --- RawPage.php.ori 2009-01-25 02:31:48.000000000 +0100 +++ RawPage.php 2009-10-29 02:32:10.924676164 +0100 @@ -197,6 +197,84 @@ } } + function checkLastModified( $timestamp ) { + global $wgCachePages, $wgCacheEpoch, $wgUser; + + if ( !$timestamp || $timestamp == '19700101000000' ) { + wfDebug( __METHOD__ . ": CACHE DISABLED, NO TIMESTAMP\n" ); + return false; + } + if( !$wgCachePages ) { + wfDebug( __METHOD__ . ": CACHE DISABLED\n", false ); + return false; + } + if( $wgUser->getOption( 'nocache' ) ) { + wfDebug( __METHOD__ . ": USER DISABLED CACHE\n", false ); + return false; + } + + $timestamp = wfTimestamp( TS_MW, $timestamp ); + $modifiedTimes = array( + 'page' => $timestamp, + 'user' => $wgUser->getTouched(), + 'epoch' => $wgCacheEpoch + ); + + $maxModified = max( $modifiedTimes ); + $lastmod = wfTimestamp( TS_RFC2822, $maxModified ); + header( "Last-modified: $lastmod" ); + + if( empty( $_SERVER['HTTP_IF_MODIFIED_SINCE'] ) ) { + wfDebug( __METHOD__ . ": client did not send If-Modified-Since header\n", false ); + return false; + } + + # Make debug info + $info = ''; + foreach ( $modifiedTimes as $name => $value ) { + if ( $info !== '' ) { + $info .= ', '; + } + $info .= "$name=" . wfTimestamp( TS_ISO_8601, $value ); + } + + # IE sends sizes after the date like this: + # Wed, 20 Aug 2003 06:51:19 GMT; length=5202 + # this breaks strtotime(). + $clientHeader = preg_replace( '/;.*$/', '', $_SERVER["HTTP_IF_MODIFIED_SINCE"] ); + + wfSuppressWarnings(); // E_STRICT system time bitching + $clientHeaderTime = strtotime( $clientHeader ); + wfRestoreWarnings(); + if ( !$clientHeaderTime ) { + wfDebug( __METHOD__ . ": unable to parse the client's If-Modified-Since header: $clientHeader\n" ); + return false; + } + $clientHeaderTime = wfTimestamp( TS_MW, $clientHeaderTime ); + + wfDebug( __METHOD__ . ": client sent If-Modified-Since: " . + wfTimestamp( TS_ISO_8601, $clientHeaderTime ) . "\n", false ); + wfDebug( __METHOD__ . ": effective Last-Modified: " . + wfTimestamp( TS_ISO_8601, $maxModified ) . "\n", false ); + if( $clientHeaderTime < $maxModified ) { + wfDebug( __METHOD__ . ": STALE, $info\n", false ); + return false; + } + + # Not modified + # Give a 304 response code and disable body output + wfDebug( __METHOD__ . ": NOT MODIFIED, $info\n", false ); + ini_set('zlib.output_compression', 0); + header( "HTTP/1.1 304 Not Modified" ); + + // Don't output a compressed blob when using ob_gzhandler; + // it's technically against HTTP spec and seems to confuse + // Firefox when the response gets split over two packets. + wfClearOutputBuffers(); + + return true; + } + function getArticleText() { $found = false; $text = ''; @@ -208,13 +286,22 @@ # If the message doesn't exist, return a blank if( wfEmptyMsg( $key, $text ) ) $text = ''; + $rev = Revision::newFromTitle( $this->mTitle, 0 ); + if( $rev ) { + $lastmod = $rev->getTimestamp(); + if ( $this->checkLastModified( $lastmod ) ) { + return ''; + } + } $found = true; } else { // Get it from the DB $rev = Revision::newFromTitle( $this->mTitle, $this->mOldId ); if( $rev ) { - $lastmod = wfTimestamp( TS_RFC2822, $rev->getTimestamp() ); - header( "Last-modified: $lastmod" ); + $lastmod = $rev->getTimestamp(); + if ( $this->checkLastModified( $lastmod ) ) { + return ''; + } if( !is_null($this->mSection ) ) { global $wgParser;
Created attachment 6733 [details] patch 2 Fixed it to return an empty string if not modified to prevent further processing.
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
+need-review to signal to developers that this patch needs reviewing. Nicola, thank you for your patch, and my apologies that it's been so long since you submitted it and you haven't gotten a response yet. I hope we will review it soon.
Adding the following near the top of onView in RawAction.php also checks the last-modified time with much less code and duplication: if ( $this->getOutput()->checkLastModified( $this->page->getTouched() ) ) { return; // Client cache fresh and headers sent, nothing more to do. } Unfortunately this triggers the page data to be loaded in WikiPage, which should be more expensive than what is currently done in RawAction. To prevent the expensive load, an option would be to retrieve the revision outside getRawText and use the returned timestamp to call checkLastModified. The full text retrieval can then be avoided if checkLastModified returns true.
Nikola, sorry for misspelling your name earlier. Gabriel noted that the patch you wrote duplicates code unnecessarily; "might work though, but won't get the fixes the central function gets." I'm marking your patch as reviewed. Also adding the "easy" keyword since Gabriel has a hint in comment 9 on how to fix this with a few minutes' work.
Created attachment 9498 [details] Patch for last-checked Fix to check last-modified and return 304 code HTTP.
(In reply to comment #11) > Created attachment 9498 [details] > Patch for last-checked > > Fix to check last-modified and return 304 code HTTP. this also revises previous patch.
Committed the simple version in r10023. Performance impact for plain GETs was not too bad (drop from 108 to 104 req/second), and should be offset by unmodified responses. Performance for plain GETs could still be improved though, as alluded to in comment #9.
The commit is in r108529 . Thanks, Jose.