Last modified: 2014-11-17 10:35:23 UTC
I tried to find an existing bug for this, but surprisingly, I wasn't able to find one. We need this especially for Notifications/Echo. Currently there is Extension:ShortUrl, but the URLs generated by this aren't much shorter than the existing URLs. We also can't use most 3rd party services due to privacy concerns. The main thing we need is a short domain name. This would most likely have to be donated or granted to us, as short domain names are not cheap.
(In reply to comment #0)
> I tried to find an existing bug for this, but surprisingly, I wasn't able to
> find one. We need this especially for Notifications/Echo.
What kind of URLs does Echo need shortened? And why?
> Currently there is Extension:ShortUrl, but the URLs generated by this aren't
> much shorter than the existing URLs.
ShortUrl only shortens regular article paths (e.g. no diff URLs or custom actions). However, it is not true that its URLs aren't much shorter: they can be as short as you want them to be. The recommended setup is to use URL rewriting and configure wgShortUrlTemplate. For example, on test.wikipedia and test2.wikipedia it is installed as /s/<ID>:
* http://test.wikipedia.org/s/1f
* http://test2.wikipedia.org/s/59
> The main thing we need is a short domain name. This would most likely have to
> be donated or granted to us, as short domain names are not cheap.
I'd recommend we use subdomains or paths to make it usable for all WMF projects, e.g. lang.abbrev-site.<short>.org/<id>, abbrev-site.<short>.org/lang/<id>, or <short>.org/abbrev-site/lang/<id>. Something like this:
* wp.wi.ki/1f
* en.wp.wi.ki/1f
* wp.wi.ki/en/1f
* wi.ki/wp/en/1f
This would be backed by Extension:ShortUrl and rewriting the root path of the short domain.
So... ideas for short domains?
* w.org (exists, for sale)
* w.co (available)
* w.ly (exists, unavailable)
* wmf.org (exists, unavailable)
* wmf.co (exists, for sale)
* wmf.ly (available)
* wi.ki (exists, for sale?)
* w.mf (available)
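To make the path-based variant concrete, here is a minimal Python sketch of how a rewrite layer on the short domain might expand wi.ki/wp/en/1f into the /s/<ID> path that ShortUrl serves on test.wikipedia. Everything beyond the URL shapes quoted above is an assumption made for illustration: the `expand` function, the `PROJECTS` table (only "wp" appears in this thread), the choice of wi.ki as the domain, and the use of https for the target.

```python
from urllib.parse import urlsplit

# Hypothetical abbreviation table; only "wp" is mentioned in the thread.
PROJECTS = {"wp": "wikipedia.org"}

# Placeholder: one of the candidate domains listed above, not a decision.
SHORT_DOMAIN = "wi.ki"


def expand(short_url: str) -> str:
    """Expand <short>/<abbrev-site>/<lang>/<id> to the wiki's /s/<id> path."""
    parts = urlsplit(short_url)
    if parts.hostname != SHORT_DOMAIN:
        raise ValueError(f"unexpected host: {parts.hostname!r}")

    # Drop empty segments produced by leading or trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 3:
        raise ValueError("expected /<abbrev-site>/<lang>/<id>")

    abbrev, lang, short_id = segments
    if abbrev not in PROJECTS:
        raise ValueError(f"unknown project abbreviation: {abbrev!r}")

    # Assumes ShortUrl is installed at /s/<ID> on the target wiki.
    return f"https://{lang}.{PROJECTS[abbrev]}/s/{short_id}"


print(expand("http://wi.ki/wp/en/1f"))
```

The subdomain variants (en.wp.wi.ki/1f and so on) would need the same lookup applied to the host name instead of the path; in practice this mapping would more likely live in web server rewrite rules than in application code.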
(In reply to comment #1)
> (In reply to comment #0)
>> I tried to find an existing bug for this, but surprisingly, I wasn't able to
>> find one. We need this especially for Notifications/Echo.
>
> What kind of urls does Echo needs shortened? And why?
I don't think it's very surprising. I'll be more surprised to hear what legitimate use-case you've found for shortened URLs. For what it's worth, many wikis ban public URL shorteners as they are a spam vector.
Certain types of Echo notifications are going to be broadcast using XMPP. These notifications can then be picked up and rebroadcast as IRC posts, Twitter messages, SMS messages, etc. For some of these mediums, shortened URLs are very useful. Even for email notifications, it would be nice to be able to use shortened URLs. Most of the URLs will be article or user page links, but not always. Sometimes they will be links to diffs.
Another legitimate use-case is for the WMF Communications Dept. They would very much like to be able to use shortened URLs in tweets, blog posts, etc. without resorting to a 3rd party service.
A 3rd legitimate use-case is fundraising. We commonly post links to our fundraising landing pages on social media and in emails to past donors. Right now these URLs look something like: https://donate.wikimedia.org/wiki/Special:LandingCheck?landing_page=L12_0615_JW&utm_medium=email&utm_campaign=none&utm_source=B12_061522_Jimmy&language=en&country=US
FYI, http://ur1.ca is running on free software and may or may not be useful. I'd love to see this, too--especially for long bugzilla and Gerrit URLs.
(In reply to comment #3)
> Another legitimate use-case is for the WMF Communications Dept. They would very
> much like to be able to use shortened URLs in tweets, blog posts, etc. without
> resorting to a 3rd party service.
This seems to mostly come down to Twitter, which already has a bazillion URL shorteners floating around it, including its own (t.co). I can't fathom a reason you would use a shortened URL in a blog post and it's exactly that kind of mis-use that I worry about.
> A 3rd legitimate use-case is fundraising. We commonly post links to our
> fundraising landing pages on social media and in emails to past donors. Right
> now these URLs look something like:
> https://donate.wikimedia.org/wiki/Special:LandingCheck?landing_page=L12_0615_JW&utm_medium=email&utm_campaign=none&utm_source=B12_061522_Jimmy&language=en&country=US
This feels like a red herring, honestly. Fundraising-specific shortened URLs have been in place for _years_. There's nothing wrong with using http://donate.wikimedia.org in an e-mail or on social media if you're concerned about using a long URL. And while geeks and nerds are very focused on and concerned with URL length, the rest of humanity really isn't.
We really do want to be able to send donors to *specific* landing pages, because sometimes we want to send a specific set of donors to a specific page that we've customized to appeal to them. And that URI might be more than 80 characters long and get mucked by some email clients when the line wraps. I've worked in enough marketing to know that much. On Identi.ca, in https://identi.ca/settings/url , one can choose to use ur1.ca or to choose no URL shortener. For WMF Identi.ca accounts, we could choose the latter and then use our own shortened URLs. The Communications Department has a point -- we try to respect our users' privacy in how we communicate with them in general, so using our own URI shortener when possible, instead of subjecting other people to the tracking of bit.ly or t.co, etc., is a good idea.
Regarding ur1.ca, it's not viable for us. We've tried to use it in the past for high-volume things and it just dies.
> This seems to mostly come down to Twitter, which already has a bazillion URL
> shorteners floating around it, including its own (t.co). I can't fathom a
> reason you would use a shortened URL in a blog post and it's exactly that kind
> of mis-use that I worry about.
Why would using a short URL in a blog post be a 'mis-use'? Is there something wrong with telling people URLs they can remember?
Sending everyone to http://donate.wikimedia.org/ is an awful solution. The fundraisers would like to have more control over customization and A/B testing without having to create complicated software solutions involving referrer-sniffing, randomized bucketing, etc.
A couple more use cases: URL sharing via the Mobile App, and file sharing from Commons.
(In reply to comment #8)
>> This seems to mostly come down to Twitter, which already has a bazillion URL
>> shorteners floating around it, including its own (t.co). I can't fathom a
>> reason you would use a shortened URL in a blog post and it's exactly that kind
>> of mis-use that I worry about.
>
> Why would using a short URL in a blog post be a 'mis-use'? Is there something
> wrong with telling people URLs they can remember?
You seem to assume that people have an easier time remembering bit.ly/fdjs3423 than en.wikipedia.org/wiki/Foo.
It's a mis-use because there's an actual cost to URL shortening. The primary cost is in the introduction of a middle-man dependency, though there are other costs associated as well. In instances where there's no (or almost no) benefit to using a shortened URL (such as a blog, where you're not constrained by arbitrary character limitations), the cost easily outweighs any benefit of using a shortened URL.
This is all a bit tangential, though. I think it would be best if you started an RFC on mediawiki.org about a URL shortener service, listing the various use-cases and possibilities for a domain name, etc.
Well sure, I wouldn't expect them to use it in every blog post. Maybe once every year or two for a URL they are wanting to push virally - something on the level of the SOPA blackout, for example. But anyway, you're right, this is getting tangential. I'll start an RfC on MediaWiki.org.
(In reply to comment #9)
> You seem to assume that people have an easier time remembering bit.ly/fdjs3423
> than en.wikipedia.org/wiki/Foo.
We aren't talking about en.wikipedia.org/wiki/Foo here, so we are running in circles. Again, see comment 3 for an example showing that long donation URLs are needed and that linking to the generic landing page isn't appropriate.
> The primary cost is in the introduction of a middle-man dependency
"Cost" is unclear to me, if WMF controls the service.
> In instances where there's no benefit to using a shortened URL
> arbitrary character limitations), the cost easily outweighs any
> benefit of using a shortened URL.
I fail to understand the "cost". People are still free to use the long URLs in blog posts.
RFC created: https://www.mediawiki.org/wiki/Requests_for_comment/URL_shortener
Some short URLs that already work:
https://en.wikipedia.org?curid=15580374
https://en.wikipedia.org?diff=521573414
But Wikimedia should have its own shortener, for traffic and privacy reasons and to avoid broken URLs if a third-party shortener closes.
(In reply to comment #13)
> Some short urls, which works:
>
> https://en.wikipedia.org?curid=15580374
> https://en.wikipedia.org?diff=521573414
These have been discussed in the past and should never be used as a short URL.
Diff is a difference view, hardly useful. I assume that should be "oldid" instead: https://en.wikipedia.org?oldid=521573414
That is a permanent link to a certain revision; however, revisions can be removed, hidden or merged. And one probably wants to link to the latest version of an article.
"curid" isn't very useful either, because it links to the page ID, and articles can be renamed, deleted, re-created, merged, and split.
What we want is a link to a title, and this is what the ShortUrl extension does. It creates a short unique ID for titles.
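The distinction between a page ID and a title ID can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyWiki` and its methods are hypothetical names, the ID allocation is simplified, and none of this is the ShortUrl extension's actual code or schema. It only shows why the two kinds of ID diverge once a page is renamed and the old title is re-created.

```python
class ToyWiki:
    """Hypothetical in-memory model of page IDs versus title IDs."""

    def __init__(self):
        self.pages = {}       # page_id -> current title of that page
        self.title_ids = {}   # title -> title_id, assigned once per title
        self._next_page_id = 1

    def _title_id(self, title):
        # A title keeps the first ID it was ever given.
        return self.title_ids.setdefault(title, len(self.title_ids) + 1)

    def create(self, title):
        page_id = self._next_page_id
        self._next_page_id += 1
        self.pages[page_id] = title
        return page_id, self._title_id(title)

    def move(self, page_id, new_title):
        # Renaming keeps the page ID but the page now lives at a new title.
        self.pages[page_id] = new_title
        self._title_id(new_title)

    def resolve_page_id(self, page_id):
        return self.pages[page_id]

    def resolve_title_id(self, title_id):
        for title, tid in self.title_ids.items():
            if tid == title_id:
                return title
        raise KeyError(title_id)


wiki = ToyWiki()
page_id, title_id = wiki.create("Foo")
wiki.move(page_id, "Foo (band)")   # the original page is renamed
wiki.create("Foo")                 # a new article takes over the title

print(wiki.resolve_page_id(page_id))    # follows the renamed page
print(wiki.resolve_title_id(title_id))  # still points at the title "Foo"
```

After the rename, the page ID leads to "Foo (band)" while the title ID still leads to whatever currently lives at "Foo", which is the behaviour argued for above.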
(In reply to comment #12) > RFC created: https://www.mediawiki.org/wiki/Requests_for_comment/URL_shortener Thanks for starting this. I expanded it and responded to some of the points inline. I think the biggest barrier(s) to implementation right now are the domain name and the maintenance question. If the Wikimedia Foundation picks this project up, there's a reasonable expectation that it would have to maintain the service indefinitely (particularly if these URLs start being printed on tote bags, end up in printed publications, etc.). Resources are finite. The maintenance point is a big one to consider, in my opinion.
*** Bug 20610 has been marked as a duplicate of this bug. ***
(In reply to comment #14)
> (In reply to comment #13)
> > Some short urls, which works:
> >
> > https://en.wikipedia.org?curid=15580374
> > https://en.wikipedia.org?diff=521573414
> >
>
> These have been discussed in the past and should never be used as a short url. [...]
Please discuss this at bug 21572 (I'm adding it as a blocker, but this bug is very unclear so I've no idea if linking wiki pages is what it wants).
(In reply to comment #17)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > Some short urls, which works:
> > >
> > > https://en.wikipedia.org?curid=15580374
> > > https://en.wikipedia.org?diff=521573414
> > >
> >
> > These have been discussed in the past and should never be used as a short url. [...]
>
> Please discuss this at bug 21572 (I'm adding it as a blocker, but this bug is
> very unclear so I've no idea if linking wiki pages is what it wants).
No, we don't discuss it there. I've undone the dependency on bug 21572. For short URLs intended for sharing articles on subjects we want title IDs, not page IDs.
There is no further discussion or implementation needed: the ShortUrl extension already implements the title ID concept and has been deployed on several wikis. It is working fine.
I was pointing out that curid/diff/oldid are wrong for this purpose. I didn't do that because we're still trying to figure out what is right; we already know how to solve this problem (namely with title IDs, not page or revision IDs) and already did (with the ShortUrl extension). I was merely pointing that out because someone else suggested we should use those instead.
(In reply to comment #18)
> No, we don't discuss it there. I've undid the dependency on bug 21572. For
> short urls intended for sharing articles on subjects we want title ids, not
> page ids.
The "sharing articles on subjects" is a very rare use-case, nobody except us wikignomes cares about exact titles and the-content-which-is-not-there-but-eventually-will-because-so-MOS-commands; see bug 21572 comment 27.
> There is no further discussion or implementation needed, the ShortUrl
> extension already implements the title id concept and has been deployed on
> several wikis. It is working fine.
Maybe. Anyway, as you pointed out yourself in comment 1, it doesn't address what comment 0 asked. I guess we have at least three different things asked here; we already have a request for one of them, I'll now open the remaining two to track the plan more clearly.
(In reply to comment #19)
> (In reply to comment #18)
> > No, we don't discuss it there. I've undid the dependency on bug 21572. For
> > short urls intended for sharing articles on subjects we want title ids, not
> > page ids.
>
> The "sharing articles on subjects" is a very rare use-case, nobody except us
> wikignomes cares about exact titles and
> the-content-which-is-not-there-but-eventually-will-because-so-MOS-commands; see
> bug 21572 comment 27.
It isn't a rare use case. In fact, it is the only use case we should care about. What is slightly rare, however, is the case where using titles or page IDs makes a difference. And when there is a difference, the title will be the right choice (as the page ID will likely be no longer visible, or refer to a title that is about a subset or superset of the subject after a delete, split or merge).
> > There is no further discussion or implementation needed, the ShortUrl
> > extension already implements the title id concept and has been deployed on
> > several wikis. It is working fine.
>
> Maybe. Anyway, as you pointed out yourself in comment 1, it doesn't address
> what comment 0 asked. I guess we have at least three different things asked
> here; we already have a request for one of them, I'll now open the remaining
> two to track the plan more clearly.
Don't twist my words ;-). There are two parts to this service in the current schedule: an implementation on the MediaWiki side, and a redirect set up from a short domain. I'm not sure what you meant by "it doesn't address what was asked", since ShortUrl implements exactly what was asked as regards the software/database implementation. All it needs further is a short domain, which we'll need regardless of the implementation, and the domain is not related to the short-ID database itself.
I changed the summary to be more specific, since this bug is mostly about the domain name. Implementing some sort of subdomain scheme should be trivial once we have the domain name, and we have Extension:ShortUrl for the path, and bug 21572 for the permanent link issue.
On 2nd thought, since this has turned into a tracking bug, I'm going to create a new bug for the domain name.
*** Bug 54459 has been marked as a duplicate of this bug. ***
Below is a copy-paste of my proposal from Bug 54459, which I feel improves on the above discussion, particularly in terms of further usage considerations and improvements to the encoding. I have also started an RfC on the MediaWiki wiki per request: https://www.mediawiki.org/wiki/Requests_for_comment/URL_shortening_system_for_Wikimedia_sites_to_support_QR_codes (yeah, this link could have been shorter)
--------------------------------
I'd like to propose a schema for a redirect system for all MediaWiki sites for improved QR code compatibility. My proposal has several parts.
Version 1 is the simplest QR code variant, with 21x21 modules (rows x columns). With high error correction this allows a total of 17 (30% correction)/27 (25% correction) alpha-numeric characters (0 to 9, A to Z, space, $ % * + - . / :). Version 2 is 25x25, with 34 (30% correction)/48 (25% correction).
Article titles can be light years long, so a shorter redirect would be helpful. However, even something as simple as http://r.wikimedia.org/ has 23 characters to begin with. I don't know if enwp.org is under WMF control, but http://r.enwp.org/ would be 18 characters. The remaining characters would be used for the redirect itself.
Each page on a wiki has a page ID, a decimal value. However, on some wikis these can get fairly large. En.wp has 40,592,460 pages, for example. This can be expressed in base 36, since QR codes will have to be alpha-numeric anyway. So instead of 8 base 10 digits, the same number would be expressed with 5 base 36 digits (O61CC).
Furthermore, with an encoding of something like PLLCCCCCC, 9 characters would be enough to determine the Project (Wikipedia, Wikinews, Wikisource, etc - 36 total projects), Language (en, fr, es, ru, etc - 36 * 36 = 1296 languages), and Code-word for Article ID (36^6 = 2,176,782,336 possible IDs).
So... possibly with http://r.enwp.org/PLLCCCCCC, 27 characters would fit the 25% correction version 1 scheme. Once this is implemented, perhaps WMUK's QRpedia may be easier to implement.
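The base 36 arithmetic in the proposal can be checked with a short Python sketch. The function names `to_base36` and `pack`, and the idea of numbering projects and languages by index, are assumptions made for illustration; the proposal does not say how projects or languages would be assigned to code points. The sketch only verifies the character counts and does not model QR capacity.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(n: int, width: int = 0) -> str:
    """Encode a non-negative integer in upper-case base 36, zero-padded to width."""
    if n < 0:
        raise ValueError("n must be non-negative")
    out = ""
    while True:
        n, remainder = divmod(n, 36)
        out = DIGITS[remainder] + out
        if n == 0:
            break
    return out.rjust(width, "0")


def pack(project: int, lang: int, page_id: int) -> str:
    """Build a 9-character PLLCCCCCC code from hypothetical numeric indexes."""
    if not (0 <= project < 36 and 0 <= lang < 36 ** 2 and 0 <= page_id < 36 ** 6):
        raise ValueError("value out of range for the PLLCCCCCC layout")
    return to_base36(project, 1) + to_base36(lang, 2) + to_base36(page_id, 6)


print(to_base36(40_592_460))          # the page count quoted for en.wp
print(36 ** 6)                        # number of article IDs in six digits

code = pack(0, 0, 40_592_460)         # project 0 and language 0 are placeholders
url = "http://r.enwp.org/" + code
print(url, len(url))
```

Decoding is the reverse: `int("O61CC", 36)` gives back 40592460, so the redirect service would only need to split the code at fixed offsets and convert each part.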
This relates to: https://www.wikidata.org/wiki/Wikidata:Project_chat#Article_specific_QR_codes I realize the above link may become obsolete with archiving... :p