Last modified: 2008-03-13 19:55:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from links that display bug reports and their history, links might be broken. See T14120, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 12120 - Unescaped quote in YAML output
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: API
Version: unspecified
Hardware: All All
Importance: Normal normal with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:

Reported: 2007-11-26 14:42 UTC by Patrick Sinclair
Modified: 2008-03-13 19:55 UTC

Description Patrick Sinclair 2007-11-26 14:42:09 UTC
Quotes are not escaped properly in the YAML output, e.g.
http://en.wikipedia.org/w/api.php?action=query&prop=info&titles=%27N_Sync&format=yamlfm

This breaks the YAML parser in Ruby:

require 'yaml'
require 'open-uri'
p YAML::load( open('http://en.wikipedia.org/w/api.php?action=query&prop=info&titles=%27N_Sync&format=yamlfm') )
Comment 1 Patrick Sinclair 2007-11-27 12:03:39 UTC
I have also encountered the following pages that cause errors in the Ruby YAML parser:

http://en.wikipedia.org/w/api.php?action=query&format=yamlfm&prop=info%7Crevisions%7Ccategories&titles=Lalo%20Schifrin&rvprop=timestamp%7Cids
http://en.wikipedia.org/w/api.php?action=query&format=yamlfm&prop=info%7Crevisions%7Ccategories&titles=Lisa%20Gerrard&rvprop=timestamp%7Cids

In both cases this seems to be an error with the formatting of the categories YAML (one has a line break, one has a ':' in the title). 
Comment 2 Willemo 2008-01-09 11:49:12 UTC
I took a look at this one, since I also found that titles containing ": " fail to parse in Ruby.

According to the YAML 1.0 specification (http://yaml.org/spec/history/2004-01-29/2004-01-29.html#id2569840), " #" and ": " (as well as strings starting with "!!", "[" and some others) are forbidden in the so-called 'plain style' scalar syntax.

Looking at ApiFormatYaml_spyc.php, the function _dumpNode only supports plain style:

 // It's mapped
 $string = $spaces.$key.': '.$value."\n";

This approach is too simplistic to render valid YAML in some situations.

To solve this, the _dumpNode function needs to be extended with some kind of YAML escaping for the cases where plain style is not possible.
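
The failure mode can be reproduced without the API. The following is only a minimal sketch in Ruby, assuming the standard yaml library (Psych); naive_line and parses? are hypothetical helpers that mimic the plain-style concatenation quoted above and are not part of MediaWiki or Spyc.

```ruby
require 'yaml'

# Hypothetical stand-in for the plain-style line built in _dumpNode:
# key and value are simply joined with ": ", with no escaping at all.
def naive_line(key, value)
  "#{key}: #{value}\n"
end

# Hypothetical helper: true if the text loads, false on a YAML syntax error.
def parses?(yaml_text)
  YAML.load(yaml_text)
  true
rescue Psych::SyntaxError
  false
end

PLAIN_OK  = naive_line('title', 'Main Page')  # harmless plain scalar
COLON_BAD = naive_line('title', 'Foo: Bar')   # ": " inside the value
QUOTE_BAD = naive_line('title', "'N Sync")    # leading, unclosed quote

puts parses?(PLAIN_OK)
puts parses?(COLON_BAD)
puts parses?(QUOTE_BAD)
```

Only the first value should load; the other two are the kinds of titles reported in this bug.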
Comment 3 Roan Kattouw 2008-03-13 16:51:39 UTC
I've committed a fix in r31927 which (hopefully; I don't have a YAML parser handy) fixes this issue. Requesting api.php?action=query&prop=info&titles=Main_Page|Talk:Main_Page now results in what I hope is correct YAML (those with YAML parsers, please test!). Note the difference between:

title: Main Page

and:

title: |
        Talk:Main Page

The entire YAML output of the sample request is at the end of this message for completeness's sake. The criteria I used are:
* If the string contains newlines, use literal syntax (with the | character and all that) (was already present)
* If the string starts with : or # use literal syntax
* If the string starts with any of - ? , [ ] { } ! * & | > ' " % @ ` also use literal syntax
* In all other situations, use plain syntax (folded if the string is longer than 40 characters)
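
For readers who want to experiment, the criteria above can be sketched roughly as follows. This is a Ruby illustration, not the PHP committed in r31927: dump_scalar and UNSAFE_START are hypothetical names, the check for : and # uses "contains" (as corrected in comment 4), and folding of long plain strings is left out.

```ruby
require 'yaml'

# Leading characters that the criteria treat as unsafe for plain style.
UNSAFE_START = %q(-?,[]{}!*&|>'"%@`)

# Hypothetical helper: render one "key: value" pair, switching to the
# literal block style ("|") whenever plain style would be ambiguous.
def dump_scalar(key, value, indent = 0)
  pad  = ' ' * indent
  text = value.to_s
  literal = text.include?("\n") ||
            text.include?(':') || text.include?('#') ||
            (!text.empty? && UNSAFE_START.include?(text[0]))
  if literal
    # Each line of the value goes on its own line, indented below the key.
    body = text.lines.map { |line| "#{pad}  #{line.chomp}\n" }.join
    "#{pad}#{key}: |\n#{body}"
  else
    "#{pad}#{key}: #{text}\n"
  end
end

puts dump_scalar('title', 'Main Page')
puts dump_scalar('title', 'Talk:Main Page')
puts dump_scalar('title', "'N Sync")
```

One design consequence worth knowing: a value written with a bare | is read back by current YAML parsers with a single trailing newline, so a consumer may need to strip it (the |- indicator avoids this).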

YAML CODE STARTS HERE 

---
query: 
  normalized: 
    - 
      from: Main_Page
      to: Main Page
    - 
      from: |
        Talk:Main_Page
      to: |
        Talk:Main Page
  pages: 
    - 
      pageid: 54
      ns: 0
      title: Main Page
      touched: |
        2008-03-06T17:36:33Z
      lastrevid: 440
      counter: 86
      length: 76
    - 
      pageid: 12
      ns: 1
      title: |
        Talk:Main Page
      touched: |
        2008-03-11T15:09:07Z
      lastrevid: 448
      counter: 64
      length: 173

YAML CODE ENDS HERE
Comment 4 Roan Kattouw 2008-03-13 19:55:19 UTC
(In reply to comment #3)
> * If the string starts with : or # use literal syntax
That should be: "If the string *contains* : or #" (good catch, Loek)
