Last modified: 2012-12-03 17:44:28 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T5028, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 3028 - Raw access to apache/squid logs would be nice
Product: Datasets
Classification: Unclassified
Component: Webstatscollector
Hardware: All All
Importance: Lowest enhancement
Assigned To: Nobody - You can work on this!
Duplicates: 3029 3030
Reported: 2005-08-03 15:58 UTC by Philip Stoev
Modified: 2012-12-03 17:44 UTC (History)


Description Philip Stoev 2005-08-03 15:58:33 UTC
Raw access to apache/squid http access log files would be nice. This would allow 
individual enthusiasts (including me) to run various statistics on those log files 
for the purpose of boosting communal participation.

For example, I am interested in knowing which articles in English are most accessed 
from Bulgaria and which of them are missing so that I can spend some effort improving 
them. To the best of my understanding, this information is not available in any of 
the reports automatically generated by Wikipedia.

I also believe that many other legitimate uses of the raw log files would be found, 
including academic ones, which could regard Wikipedia as a mini-Internet of sorts, 
for which the full contents (the SQL article dump), the change history, and the 
access logs are all known. None of this is available for the real Internet, which may make 
Wikipedia a valuable playground for the evaluation of PageRank-like relevancy metrics 
and such.

Finally, I believe that downloading compressed logs should not place undue burden on 
Wikipedia's servers.

Thank you in advance for considering this suggestion and keep up the good work.
Comment 1 Philip Stoev 2005-08-03 17:15:11 UTC
*** Bug 3029 has been marked as a duplicate of this bug. ***
Comment 2 Philip Stoev 2005-08-03 17:16:04 UTC
*** Bug 3030 has been marked as a duplicate of this bug. ***
Comment 3 Antoine "hashar" Musso (WMF) 2005-08-03 17:26:59 UTC
Isn't that what logwood is for? Innocence can probably help you there.
Comment 4 Antoine "hashar" Musso (WMF) 2005-08-03 17:28:30 UTC
Sorry, I forgot the link:

Data are in a database, so most probably more reports can be made.
Comment 5 River Tarnell 2005-08-03 17:30:53 UTC
this has been requested several times in the past and refused each time.
Comment 6 Philip Stoev 2005-08-03 17:37:52 UTC
Thanks a lot for the quick reply. Can I find any links to past discussions on 
this? Is that a decision on principle, or are there technical limitations? In other 
words, can people sponsor Wikipedia to get the logs?
Comment 7 River Tarnell 2005-08-03 17:42:02 UTC
the most recent one was here: on
wikitech-l, although i think there were previous discussions as well (i don't
have links handy, but Google might be able to find them...)
Comment 8 Andre Klapper 2012-12-03 14:00:15 UTC
[mass-moving wikistats reports from Wikimedia→Statistics to Analytics→Wikistats to have stats issues under one Bugzilla product (see bug 42088) - sorry for the bugspam!]
