Last modified: 2009-07-08 22:16:57 UTC
action=parse&prop=sections should list the line number of each section on the page. At present it is very hard, if not impossible, to map a section to the correct line in the page text (wikitext, not HTML). With line numbers it would become possible to break the full page text (e.g. from action=query&prop=revisions&rvprop=content) into sections, match them with the output of action=parse&prop=sections, and link those sections properly. Thank you very much.
In the trunk version of action=parse, the byte offset is output, which is just as good. This version is not live on Wikipedia yet, however.
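To illustrate why an offset is as good as a line number for the use case in the original request, here is a minimal Python sketch. It assumes the prop=sections result has already been decoded into a list of dicts, each with the heading text under "line" and the offset under "byteoffset", and that the offset is None or absent for sections that come from a transcluded template. That result shape is an assumption, and the helper name split_sections is hypothetical.

```python
def split_sections(wikitext: str, sections: list[dict]) -> list[tuple[str, int, str]]:
    """Split wikitext into (heading, line_number, text) tuples using section offsets.

    Hypothetical helper. Assumes each entry in `sections` has a "line" key (heading
    text) and a "byteoffset" key holding a character offset into `wikitext`, or
    None/absent for sections transcluded from templates.
    """
    # Transcluded sections have no position in the original wikitext: skip them.
    local = [s for s in sections if s.get("byteoffset") is not None]
    local.sort(key=lambda s: s["byteoffset"])

    result = []
    for i, sec in enumerate(local):
        start: int = sec["byteoffset"]
        # A section runs until the next local heading (any level) or the end of the page.
        end: int = local[i + 1]["byteoffset"] if i + 1 < len(local) else len(wikitext)
        # The 1-based line number of the heading follows directly from the offset.
        line_no = wikitext.count("\n", 0, start) + 1
        result.append((sec["line"], line_no, wikitext[start:end]))
    return result
```

For "Intro\n== A ==\nfoo\n{{Tmpl}}\n== B ==\nbar\n" with offsets 6 for A and 27 for B, this yields heading A on line 2 and heading B on line 5. Text before the first heading (the lead section) is not returned; slice wikitext[:first_offset] if you need it.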
(In reply to comment #1)
> In the trunk version of action=parse, the byte offset is output, which is just
> as good. This version is not live on Wikipedia yet, however.
Thanks for your answer!
I heard about this new value, but I was wondering what happens with Unicode character encodings (which can use more than one byte per character). Does it count each character as one byte?
Another thing: what happens if the page contains templates that themselves contain headings (as retrieved with action=query&prop=revisions&rvprop=content&rvexpandtemplates)? Are those headings taken into account too, or how are they handled?
Greetings
(In reply to comment #2)
> Thanks for your answer!
>
> I heard about this new value, but I was wondering what happens with Unicode
> character encodings (which can use more than one byte per character). Does it
> count each character as one byte?
Yes. Charoffset would have been a better name.
> Another thing: what happens if the page contains templates that themselves
> contain headings (as retrieved with
> action=query&prop=revisions&rvprop=content&rvexpandtemplates)? Are those
> headings taken into account too, or how are they handled?
They aren't counted. The offset you get is the offset in the *original* wikitext, which of course doesn't contain these sections, just the template call. Transcluded sections themselves don't have offsets.
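If the offset really counts characters, as stated above, it can be applied directly to decoded text but not to raw UTF-8 bytes (e.g. an undecoded HTTP response body). A minimal Python sketch of converting between the two, assuming UTF-8 and assuming the server's notion of a "character" matches a Python code point; both helper names are hypothetical.

```python
def char_to_byte_offset(text: str, char_offset: int, encoding: str = "utf-8") -> int:
    """Hypothetical helper: map a character offset in `text` to a byte offset."""
    # Encode only the prefix; its length in bytes is the byte offset.
    return len(text[:char_offset].encode(encoding))


def byte_to_char_offset(data: bytes, byte_offset: int, encoding: str = "utf-8") -> int:
    """Hypothetical helper: map a byte offset in `data` back to a character offset."""
    # Decoding a prefix that ends in the middle of a multi-byte character
    # raises UnicodeDecodeError, which signals an invalid byte offset.
    return len(data[:byte_offset].decode(encoding))
```

For "Größe\n== Ä ==\n", the heading starts at character offset 6 but at byte offset 8, because "ö" and "ß" each take two bytes in UTF-8.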
(In reply to comment #3)
> (In reply to comment #2)
> > Thanks for your answer!
> >
> > I heard about this new value, but I was wondering what happens with Unicode
> > character encodings (which can use more than one byte per character). Does it
> > count each character as one byte?
> Yes. Charoffset would have been a better name.
Cool! This is good news! Thank you!
> > Another thing: what happens if the page contains templates that themselves
> > contain headings (as retrieved with
> > action=query&prop=revisions&rvprop=content&rvexpandtemplates)? Are those
> > headings taken into account too, or how are they handled?
> They aren't counted. The offset you get is the offset in the *original*
> wikitext, which of course doesn't contain these sections, just the template
> call. Transcluded sections themselves don't have offsets.
So will there be a byte/char offset for action=query&rvexpandtemplates in the future? And/or for action=parse? (feature request ;)
(In reply to comment #4)
> So will there be a byte/char offset for action=query&rvexpandtemplates in the
> future? And/or for action=parse? (feature request ;)
It's in action=parse already; I said that in comment #1. rvexpandtemplates and action=expandtemplates don't have offsets, and they shouldn't, because they don't output a section tree (and aren't even able to produce one).