Last modified: 2013-07-05 03:56:15 UTC
Child elements under <results> are given the name of the page. It is easy to create pages with titles that result in illegal XML tag names. Just to name a few that I've tried:

4me
Some "quoted" text <- an important one for special purpose wikis
xml

Example query:
http://www.mywikidev.com/wiki/api.php?action=ask&query=[[Modification%20date::%3E4%20February%202013]]%20[[Has property::%2B]]|?Has property&format=xml

<?xml version="1.0"?>
<api>
  <query>
    <printrequests>
      <printrequest label="" typeid="_wpg" mode="2" />
      <printrequest label="Has property" typeid="_txt" mode="1" />
    </printrequests>
    <results>
      <some_"quoted"_text fulltext="some "quoted" text" fullurl="http://www.mywikidev.com/wiki/index.php/some_%22quoted%22_text">
        <printouts>
          <Has_property>
            <value>1234</value>
          </Has_property>
        </printouts>
      </some_"quoted"_text>
    </results>
  </query>
</api>

Workaround(s): Unknown, but would love to hear of one.
I would like to suggest that the result tag names (currently set to the page names) be replaced by something simple, such as <result> or <result-<index>>, since the title and page are already specified by the fulltext and fullurl properties. So, the sample output would instead look like this:

<?xml version="1.0"?>
<api>
  <query>
    <printrequests>
      <printrequest label="" typeid="_wpg" mode="2" />
      <printrequest label="Has property" typeid="_txt" mode="1" />
    </printrequests>
    <results>
      <result fulltext="some "quoted" text" fullurl="http://www.mywikidev.com/wiki/index.php/some_%22quoted%22_text">
        <printouts>
          <Has_property>
            <value>1234</value>
          </Has_property>
        </printouts>
      </result>
    </results>
  </query>
</api>

Thank you
Of course, I'm not suggesting to break backwards compatibility with the above suggestion :) So, maybe a new format/query param will be acceptable?
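The <result> layout suggested above can be sketched in a few lines of Python. This is only an illustration, not SMW code: the function name build_results and the dictionary keys are hypothetical, and the property tags still use the underscore form from the sample output, so property labels with illegal characters would remain a problem.

```python
import xml.etree.ElementTree as ET


def build_results(pages):
    """Build a <results> element with one generic <result> child per page.

    `pages` is a list of dicts with the hypothetical keys "fulltext",
    "fullurl" and "printouts" (property label -> list of values).
    """
    results = ET.Element("results")
    for page in pages:
        # The page title only appears as an attribute value, never as a tag.
        result = ET.SubElement(results, "result", {
            "fulltext": page["fulltext"],
            "fullurl": page["fullurl"],
        })
        printouts = ET.SubElement(result, "printouts")
        for label, values in page["printouts"].items():
            # Mirrors the sample output; a label with characters that are
            # illegal in tag names would still break at this point.
            prop = ET.SubElement(printouts, label.replace(" ", "_"))
            for value in values:
                ET.SubElement(prop, "value").text = str(value)
    return results


sample = [{
    "fulltext": 'some "quoted" text',
    "fullurl": "http://www.mywikidev.com/wiki/index.php/some_%22quoted%22_text",
    "printouts": {"Has property": [1234]},
}]

xml_text = ET.tostring(build_results(sample), encoding="unicode")
reparsed = ET.fromstring(xml_text)  # raises ParseError if not well-formed
print(xml_text)
```

Because the serializer escapes the quotes inside the fulltext attribute, the output can be parsed again and the original title is recovered unchanged.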
I think it would be acceptable to break bc, most APIs I know of do <pages><p> and this should be no different.
Whoa ... not so fast. We are not jumping ship here and breaking things. The SMW\DISerializer provides serialization for the SMWAPI, the JSON format, and the SMW\ApiResultPrinter (since SMW 1.9). Before considering any change, please be aware of the legacy support that comes with the serialization and its content structure.
I am not sure how it has been useful until now; I would find it hard to parse. Still, if you think there has to be bc support, please add it in a follow-up change or put precise comments in https://gerrit.wikimedia.org/r/#/c/47707/
[1] broke compatibility and was therefore abandoned. The problem only matters for XML and similar formats, so it is suggested to change the output only for these formats and not for JSON. [1] https://gerrit.wikimedia.org/r/#/c/47707/
It is not a tag problem but rather a problem in how 'fulltext' => $title->getFullText() encodes special characters (&, ' etc.). It results in encoded strings like &#039; and &quot; that cause problems in the XML output format.
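If attribute values are the concern, as this comment suggests, the usual remedy is to escape them when the XML is written. A minimal Python sketch, assuming a hypothetical result_tag helper that is not part of SMW:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def result_tag(fulltext):
    """Return an empty <result> element whose fulltext attribute is escaped.

    quoteattr() escapes &, < and >, picks a quote delimiter and escapes
    embedded quotes where needed.
    """
    return "<result fulltext=%s />" % quoteattr(fulltext)


titles = ['some "quoted" text', "Tom's \"A & B\""]
for title in titles:
    element = ET.fromstring(result_tag(title))
    # The parser hands back the original, unescaped title.
    print(element.get("fulltext") == title)
```

This only covers attribute values. It does nothing for a page title that is used as the element name itself.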
Another issue with XML is namespaced page titles. For example, for Property:GG the XML parser complains that "Namespace prefix Property on ... is not defined".

## Example

<Property:GG fulltext="..." namespace="106" exists="1">
  <printouts>
    <Modification_date>
      <value>1365684120</value>
    </Modification_date>
  </printouts>
</Property:GG>
James, I think trying to create XML tag names that are page titles is just asking for trouble. The XML spec has restrictions on what characters can be in a tag name[1] so any character that can be in a page title will have to be mapped into an XML element. It also makes the XML unnecessarily verbose and hard to read... just looks flaky, imo. Finally, it is also redundant information since the page name is provided by the fulltext attribute already. I propose putting in the change just for the XML format if that solves the JSON compatibility conflict. 1. http://www.w3.org/TR/REC-xml/#NT-NameStartChar
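To see which page titles survive as element names, one can simply hand them to an XML parser. The sketch below uses Python's standard ElementTree parser; the helper name is_usable_element_name is hypothetical, and the titles are the ones mentioned in this thread.

```python
import xml.etree.ElementTree as ET


def is_usable_element_name(name):
    """Return True if `name` parses as the name of a single empty element."""
    try:
        root = ET.fromstring("<%s/>" % name)
    except ET.ParseError:
        # Illegal name characters, or an unbound prefix such as "Property:".
        return False
    # Guards against names that smuggle in attributes, e.g. 'a x="1"'.
    return root.tag == name


for title in ["4me", 'some_"quoted"_text', "Property:GG", "Has_property", "result"]:
    print(title, is_usable_element_name(title))
```

A fixed name such as result always passes, which is the point of moving the title into an attribute.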
> read... just looks flaky, imo. Finally, it is also redundant information since
> the page name is provided by the fulltext attribute already.

For more information about SMW related serialization see [1].

PS: I will not take a crack at it in the near future, so feel free to tackle this issue, but please keep in mind to add PHPUnit/QUnit tests to ensure consistency among the output serialization.

[1] http://www.semantic-mediawiki.org/wiki/Serialization_%28JSON%29
Hi James, what was the reference for? BTW, I'm afraid I'm not qualified to hack on the wiki code myself.
Hey, in case it matters. This is a major pain for me.. I hit it while trying to upgrade my SMW installation and it is a real blocker for downstream code.
(In reply to comment #11)
> Hi James, what was the reference for? BTW, I'm afraid I'm not qualified to
> hack on the wiki code myself.

It will give some insight into how serialization works in SMW and why [1] wasn't a fit, as it only eliminates a possible tag parameter at the head by replacing

$results[$diWikiPage->getTitle()->getFullText()] = $result;

with

$results[] = $result;

This solves the issue only halfway: if you happen to use a property like "Has_xml'_label" as a printout parameter, it faces the same problem, but at this level you need to know which printout you are referring to, since the label is a reference key into the printrequests array.

While the subject "tag" at the head might seem like redundant information (it isn't, but that's not the issue of this discussion), you clearly can't get away with eliminating the property label from the structure, as it is used as a key precisely to eliminate redundancy by splitting printrequest and result information.

XML (pretty-print) output

<?xml version="1.0"?>
<api>
  <query>
    <printrequests>
      <printrequest label="" typeid="_wpg" mode="2" format="" />
      <printrequest label="Has date" typeid="_dat" mode="1" format="ISO" />
      <printrequest label="Has xml" typeid="_wpg" mode="1" format="" />
      <printrequest label="Has xml' label" typeid="_wpg" mode="1" format="" />
    </printrequests>
    <results>
      <XML_Example fulltext="XML Example" fullurl=".." namespace="0" exists="1">
        <printouts>
          <Has_date>
            <value>631152000</value>
          </Has_date>
          <Has_xml>
            <value fulltext="Test" fullurl=".." namespace="0" exists="" />
          </Has_xml>
          <Has_xml'_label>
            <value fulltext="Test" fullurl=".." namespace="0" exists="" />
          </Has_xml'_label>
        </printouts>
      </XML_Example>
    </results>
    <meta hash="d3a1a814ff424003d9cfaa9a3ab7221f" count="1" offset="0" />
  </query>
</api>

JSON (pretty-print) output

{
  "query": {
    "printrequests": [
      { "label": "", "typeid": "_wpg", "mode": 2, "format": false },
      { "label": "Has date", "typeid": "_dat", "mode": 1, "format": "ISO" },
      { "label": "Has xml", "typeid": "_wpg", "mode": 1, "format": "" },
      { "label": "Has xml' label", "typeid": "_wpg", "mode": 1, "format": "" }
    ],
    "results": {
      "XML Example": {
        "printouts": {
          "Has date": [ "631152000" ],
          "Has xml": [
            { "fulltext": "Test", "fullurl": "...", "namespace": 0, "exists": false }
          ],
          "Has xml' label": [
            { "fulltext": "Test", "fullurl": "...", "namespace": 0, "exists": false }
          ]
        },
        "fulltext": "XML Example",
        "fullurl": "...",
        "namespace": 0,
        "exists": true
      }
    },
    "meta": {
      "hash": "d3a1a814ff424003d9cfaa9a3ab7221f",
      "count": 1,
      "offset": 0
    }
  }
}

[1] https://gerrit.wikimedia.org/r/#/c/47707/
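One way to keep the property label as a reference key without using it as a tag name is to move it into an attribute of a generic element. The Python sketch below shows that idea on a trimmed copy of the JSON above. The element names subject and property and the function results_to_xml are assumptions made for illustration; this is not necessarily how SMW or any Gerrit change serializes the result.

```python
import xml.etree.ElementTree as ET


def results_to_xml(results):
    """Serialize the JSON-style "results" mapping with fixed element names.

    Page titles and property labels are kept, but only as attribute values.
    """
    root = ET.Element("results")
    for subject in results.values():
        node = ET.SubElement(root, "subject", {"fulltext": subject["fulltext"]})
        printouts = ET.SubElement(node, "printouts")
        for label, values in subject["printouts"].items():
            # The label still matches the "label" of a printrequest entry.
            prop = ET.SubElement(printouts, "property", {"label": label})
            for value in values:
                item = ET.SubElement(prop, "value")
                if isinstance(value, dict):
                    item.set("fulltext", value["fulltext"])
                else:
                    item.text = str(value)
    return root


results = {
    "XML Example": {
        "fulltext": "XML Example",
        "printouts": {
            "Has date": ["631152000"],
            "Has xml": [{"fulltext": "Test"}],
            "Has xml' label": [{"fulltext": "Test"}],
        },
    }
}

doc = ET.fromstring(ET.tostring(results_to_xml(results), encoding="unicode"))
labels = [p.get("label") for p in doc.iter("property")]
print(labels)
```

A consumer can then select a printout by label, for example with the path subject/printouts/property[@label='Has date'], instead of relying on the label being a legal tag name.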
*** Bug 48705 has been marked as a duplicate of this bug. ***
Related URL: https://gerrit.wikimedia.org/r/65646 (Gerrit Change Icbc92c9e74161c1ec626775bf6f95703a6df8de1)
I don't see any use for the printrequests element in the XML format other than just confirmation of the output part of the query. Consumers will know what elements they are looking for and their XPath. Maybe it would be easier to let the XML format diverge from the JSON format by eliminating the printrequests element. I don't think the two formats need to mirror one another element-for-element; the formats are too different. It's issues like this that are already known to cause problems with JSON->XML conversion. Just my $.02
JSON/XML will mirror available information in order to support interoperability which means output formats will stay as close as possible. A content consumer (Custom parser that implements the individual parsing on client-side) can ignore the information if necessary.
Interoperability between what?
I see, but perfect interoperability btw JSON and XML is impossible... as you may have noticed. This is a major bug, 5 months old, w/an easy fix by Nischay, but it's been rolled back in an attempt to do the impossible (commendable, but impossible). Google JSON to XML conversion and you'll see that no solution is perfect and will fail exactly like this one does with invalid tags.
Change 65646 merged by jenkins-bot: (Bug 44696) AskApi to support valid XML using the SMW\ApiQueryResultFormatter https://gerrit.wikimedia.org/r/65646