Last modified: 2007-11-11 20:17:57 UTC
SleepResearch_Facility is a valid page on Wikipedia (the title actually contains the underscore). When I try to fetch info on this page through the web API using this URL:

http://en.wikipedia.org/w/api.php?action=query&prop=info&format=xml&titles=SleepResearch_Facility

it automatically normalizes the page title "SleepResearch_Facility" to "SleepResearch Facility", like this:

<api>
  <query>
    <normalized>
      <n from="SleepResearch_Facility" to="SleepResearch Facility"/>
    </normalized>
    <pages>
      <page pageid="8149769" ns="0" title="SleepResearch Facility" touched="2007-11-02T00:32:23Z" lastrevid="168245210" counter="0" length="10923"/>
    </pages>
  </query>
</api>

Of course this page title works as well, but it should really contain the underscore. So two solutions:

1) Do not normalize if the actual page contains underscores.
2) Add an API option "normalize=0" to disable normalization altogether.
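As a possible client-side workaround, the <normalized> block in the response already records which requested title each normalized title came from, so a caller can map the result back to the spelling it asked for. The following is only a minimal sketch: the function name requested_titles is hypothetical, and the response text is the one quoted above, embedded as a string rather than fetched over the network.

```python
import xml.etree.ElementTree as ET

# The API response quoted in this report, embedded so the sketch runs offline.
RESPONSE = """<api>
  <query>
    <normalized>
      <n from="SleepResearch_Facility" to="SleepResearch Facility"/>
    </normalized>
    <pages>
      <page pageid="8149769" ns="0" title="SleepResearch Facility"
            touched="2007-11-02T00:32:23Z" lastrevid="168245210"
            counter="0" length="10923"/>
    </pages>
  </query>
</api>"""


def requested_titles(xml_text):
    """Map each returned page title to the title that was originally requested.

    Hypothetical helper: pages whose title was not normalized map to themselves.
    """
    root = ET.fromstring(xml_text)
    # Each <n from="..." to="..."/> records one normalization step.
    back = {n.get("to"): n.get("from") for n in root.iter("n")}
    return {
        page.get("title"): back.get(page.get("title"), page.get("title"))
        for page in root.iter("page")
    }


if __name__ == "__main__":
    print(requested_titles(RESPONSE))
```

This only recovers what the client itself sent; it says nothing about how the title is stored on the server.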
(In reply to comment #0)
> Of course this page title works as well, but it should really contain the
> underscore.
> So two solutions:
> 1) Do not normalize if the actual page contains underscores.
> 2) Add an API option "normalize=0" to disable normalization altogether.

This is technically impossible. All page titles are stored in the database with spaces changed to underscores. So [[United States]] is stored as "United_States" in the database, and there's no way to figure out whether it was created with a space or an underscore. There's also no way to have two different pages called [[United States]] and [[United_States]]; they're just aliases for the same title.

Note that if you go to [[SleepResearch_Facility]], the big title on top of the page also has a space instead of an underscore (i.e. is also normalized).
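The aliasing described above can be illustrated with a small sketch. It is deliberately simplified and is not MediaWiki's actual code: the helper names to_db_key and to_display are hypothetical, and only the space/underscore swap mentioned in this comment is modelled (any other normalization the software performs is left out).

```python
def to_db_key(title):
    """Storage form as described above: spaces become underscores."""
    return title.replace(" ", "_")


def to_display(title):
    """Display form as described above: underscores become spaces."""
    return title.replace("_", " ")


if __name__ == "__main__":
    # Both spellings collapse to the same stored key, so the original
    # spelling cannot be recovered from the key alone.
    print(to_db_key("United States") == to_db_key("United_States"))
    print(to_display("SleepResearch_Facility"))
```

Because both conversions discard the distinction between the two characters, neither the stored key nor the displayed title carries enough information to tell which spelling the page was created with.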