Last modified: 2013-08-15 19:29:05 UTC
Liangent had the idea to make LanguageConverter check for lang attributes in the page content and disable conversion for those pieces of text, similar to how -{}- disables conversion. This would avoid markup such as <span lang="ja">-{...}-</span>, which is double work, especially when extensions have to disable conversion themselves.
In LanguageConverter::autoConvert, I see this code snippet:

    // disable convert to variants between <code></code> tags
    $codefix = '<code>.+?<\/code>|';
    // disable conversion of <script type="text/javascript"> ... </script>
    $scriptfix = '<script.*?>.*?<\/script>|';
    // disable conversion of <pre xxxx> ... </pre>
    $prefix = '<pre.*?>.*?<\/pre>|';

Maybe we want to replace these with a real parser, to make the feature requested in this bug easier to implement.
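For readers unfamiliar with the approach, here is a minimal sketch of how an alternation of "skip" patterns like the ones quoted above can be used to leave some spans untouched. It is written in Python for illustration only and is not the MediaWiki implementation; the names PROTECTED and convert_outside_protected are hypothetical, and the DOTALL flag is an assumption so that a tag's content may span several lines.

```python
import re

# Hypothetical stand-in for the three quoted patterns joined into one
# alternation. DOTALL is an assumption, so that "." also matches newlines.
PROTECTED = re.compile(
    r"<code>.+?</code>"
    r"|<script.*?>.*?</script>"
    r"|<pre.*?>.*?</pre>",
    re.DOTALL,
)


def convert_outside_protected(text, convert):
    """Apply convert() only to the text between protected spans."""
    out = []
    pos = 0
    for match in PROTECTED.finditer(text):
        # Text before the protected span is converted.
        out.append(convert(text[pos:match.start()]))
        # The protected span itself is copied through unchanged.
        out.append(match.group(0))
        pos = match.end()
    # Convert whatever follows the last protected span.
    out.append(convert(text[pos:]))
    return "".join(out)
```

With str.upper standing in for a real variant conversion, convert_outside_protected('abc <code>def</code> ghi', str.upper) leaves the <code> element as it is and upper-cases the rest. A pattern of the same shape for <span lang="..."> would stop at the first </span>, so nested spans would be only partly protected, which is one reason a real parser looks attractive here.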
Another way: I believe there are some HTML parsers inside the wikitext parser that remove harmful attributes. Maybe we can add markNoConversion calls where they see elements with a different lang="" value.
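A rough sketch of that idea, assuming a parser that reports start and end tags: keep a stack of open elements, flag those whose lang attribute differs from the page language, and convert text only while no flagged element is open. This is Python using the standard html.parser module, not MediaWiki code. LangAwareConverter, convert_html, and the page_lang parameter are hypothetical names, and comparing lang values by exact string is a simplifying assumption.

```python
from html.parser import HTMLParser


class LangAwareConverter(HTMLParser):
    """Convert text except inside elements whose lang differs from the page's."""

    def __init__(self, convert, page_lang):
        # convert_charrefs=False so entities are passed through untouched.
        super().__init__(convert_charrefs=False)
        self.convert = convert
        self.page_lang = page_lang
        self.out = []
        self.stack = []  # (tag, protected) for each open element

    def handle_starttag(self, tag, attrs):
        lang = dict(attrs).get("lang")
        protected = lang is not None and lang != self.page_lang
        self.stack.append((tag, protected))
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never stay open.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; this also discards
        # unclosed elements such as <br> that were opened inside it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if any(protected for _, protected in self.stack):
            self.out.append(data)  # inside a foreign-language element
        else:
            self.out.append(self.convert(data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def convert_html(html, convert, page_lang="zh"):
    parser = LangAwareConverter(convert, page_lang)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, with str.upper standing in for the real conversion, convert_html('abc <span lang="ja">def</span> ghi', str.upper) converts "abc " and " ghi" but leaves the Japanese span alone, including any elements nested inside it.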
See bug 42490 comment 1.
Sure, I'll take this for now. If/when I write the Parsoid language converter, it would be straightforward to fix this. Fixing it in PHP is harder...