Last modified: 2014-11-17 10:35:23 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T44085, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 42085 - Wikimedia needs a URL shortener
Status: NEW
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All, OS: All
Importance: Normal enhancement with 3 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Duplicates: 20610 54459
Depends on: 21572 42242 42243 42270
Blocks:

Reported: 2012-11-13 22:46 UTC by Ryan Kaldari
Modified: 2014-11-17 10:35 UTC
CC List: 16 users

Description Ryan Kaldari 2012-11-13 22:46:39 UTC
I tried to find an existing bug for this, but surprisingly, I wasn't able to find one. We need this especially for Notifications/Echo.

Currently there is Extension:ShortUrl, but the URLs generated by this aren't much shorter than the existing URLs.

We also can't use most 3rd party services due to privacy concerns.

The main thing we need is a short domain name. This would most likely have to be donated or granted to us, as short domain names are not cheap.
Comment 1 Krinkle 2012-11-13 23:16:42 UTC
(In reply to comment #0)
> I tried to find an existing bug for this, but surprisingly, I wasn't able to
> find one. We need this especially for Notifications/Echo.
> 

What kind of URLs does Echo need shortened? And why?

> Currently there is Extension:ShortUrl, but the URLs generated by this aren't
> much shorter than the existing URLs.
> 

ShortUrl only shortens regular article paths (e.g. no diff URLs or custom actions). However, the claim that its URLs aren't much shorter is not true: they can be as short as you want them to be. The recommended setup is to use URL rewriting and configure wgShortUrlTemplate. For example, on test.wikipedia and test2.wikipedia it is installed as /s/<ID>:

* http://test.wikipedia.org/s/1f
* http://test2.wikipedia.org/s/59

> 
> The main thing we need is a short domain name. This would most likely have to
> be donated or granted to us, as short domain names are not cheap.

I'd recommend we use subdomains or paths to make it usable for all WMF projects.

e.g. lang.abbrev-site.<short>.org/<id>
or abbrev-site.<short>.org/lang/<id>
or <short>.org/abbrev-site/lang/<id>

something like this:
* wp.wi.ki/1f
* en.wp.wi.ki/1f
* wp.wi.ki/en/1f
* wi.ki/wp/en/1f

This would be backed by Extension:ShortUrl and rewriting the root path of the short domain.
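To make the four proposed layouts concrete, here is a minimal Python sketch of how a front end on the short domain might map each of them to a (project, language, ID) triple before handing off to Extension:ShortUrl. The `parse_short_url` function, the abbreviation table, and the assumption that IDs are base-36 strings (as `1f` and `59` appear to be) are all hypothetical; this is not code from the extension.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlsplit

SHORT_DOMAIN = "wi.ki"  # hypothetical short domain from the examples above
# Hypothetical table of site abbreviations -> project names.
PROJECTS = {"wp": "wikipedia", "wn": "wikinews", "ws": "wikisource"}


def parse_short_url(url: str) -> Tuple[str, Optional[str], int]:
    """Map any of the four example layouts to (project, language, numeric ID)."""
    parts = urlsplit(url if "://" in url else "https://" + url)
    host = (parts.hostname or "").lower()
    if host == SHORT_DOMAIN:
        labels: List[str] = []
    elif host.endswith("." + SHORT_DOMAIN):
        # e.g. "en.wp.wi.ki" -> ["en", "wp"]
        labels = host[: -len(SHORT_DOMAIN) - 1].split(".")
    else:
        raise ValueError(f"not a {SHORT_DOMAIN} URL: {url}")
    segments = [s for s in parts.path.split("/") if s]

    if len(labels) == 2:          # lang.abbrev-site.<short>/<id>
        lang, site = labels
        (short_id,) = segments
    elif len(labels) == 1:        # abbrev-site.<short>/<id> or .../<lang>/<id>
        site = labels[0]
        if len(segments) == 1:
            lang, short_id = None, segments[0]
        else:
            lang, short_id = segments
    else:                         # <short>/<abbrev-site>/<lang>/<id>
        site, lang, short_id = segments
    # Assumption: IDs are base-36 strings, as "1f" and "59" appear to be.
    return PROJECTS[site], lang, int(short_id, 36)
```

With this sketch, `wp.wi.ki/en/1f`, `en.wp.wi.ki/1f` and `wi.ki/wp/en/1f` all resolve to `("wikipedia", "en", 51)`, so the choice between layouts is mostly about which one is easiest to type and to configure in DNS and URL rewriting.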

So... ideas for short domains?

* w.org (exists, for sale)
* w.co (available)
* w.ly (exists, unavailable)
* wmf.org (exists, unavailable)
* wmf.co (exists, for sale)
* wmf.ly (available)
* wi.ki (exists, for sale?)
* w.mf (available)
Comment 2 MZMcBride 2012-11-13 23:20:07 UTC
(In reply to comment #1)
> (In reply to comment #0)
>> I tried to find an existing bug for this, but surprisingly, I wasn't able to
>> find one. We need this especially for Notifications/Echo.
> 
> What kind of URLs does Echo need shortened? And why?

I don't think it's very surprising. I'd be more surprised to hear what legitimate use-case you've found for shortened URLs.

For what it's worth, many wikis ban public URL shorteners as they are a spam vector.
Comment 3 Ryan Kaldari 2012-11-13 23:45:58 UTC
Certain types of Echo notifications are going to be broadcast using XMPP. These notifications can then be picked up and rebroadcast as IRC posts, Twitter messages, SMS messages, etc. For some of these mediums, shortened URLs are very useful. Even for email notifications, it would be nice to be able to use shortened URLs. Most of the URLs will be article or user page links, but not always. Sometimes they will be links to diffs.

Another legitimate use-case is for the WMF Communications Dept. They would very much like to be able to use shortened URLs in tweets, blog posts, etc. without resorting to a 3rd party service.

A 3rd legitimate use-case is fundraising. We commonly post links to our fundraising landing pages on social media and in emails to past donors. Right now these URLs look something like:
https://donate.wikimedia.org/wiki/Special:LandingCheck?landing_page=L12_0615_JW&utm_medium=email&utm_campaign=none&utm_source=B12_061522_Jimmy&language=en&country=US
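The diff and fundraising cases in this comment need a general table from arbitrary long URLs to short codes, which the title-based IDs of Extension:ShortUrl do not cover. Below is a minimal in-memory Python sketch; the `UrlShortener` class, the `w.example` domain and the sequential base-36 codes are assumptions for illustration only, and a real service would also need persistent storage plus the privacy and maintenance guarantees raised in this thread.

```python
import string
from typing import Dict

ALPHABET = string.digits + string.ascii_lowercase  # base-36 digits 0-9, a-z


def to_base36(n: int) -> str:
    """Encode a non-negative integer in base 36."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
        if n == 0:
            return "".join(reversed(digits))


class UrlShortener:
    """Illustrative in-memory table from arbitrary long URLs to short codes."""

    def __init__(self, base: str = "https://w.example/") -> None:
        self.base = base  # hypothetical short domain
        self._code_for: Dict[str, str] = {}
        self._url_for: Dict[str, str] = {}

    def shorten(self, long_url: str) -> str:
        # The same long URL always gets the same code.
        code = self._code_for.get(long_url)
        if code is None:
            code = to_base36(len(self._url_for) + 1)  # sequential: 1, 2, ..., z, 10, ...
            self._code_for[long_url] = code
            self._url_for[code] = long_url
        return self.base + code

    def resolve(self, code: str) -> str:
        # Raises KeyError for unknown codes; an HTTP front end would answer 404.
        return self._url_for[code]


donate = ("https://donate.wikimedia.org/wiki/Special:LandingCheck?landing_page=L12_0615_JW"
          "&utm_medium=email&utm_campaign=none&utm_source=B12_061522_Jimmy&language=en&country=US")
s = UrlShortener()
print(s.shorten(donate))  # https://w.example/1
```

Sequential codes keep links as short as possible but make them enumerable; a random code space is the usual alternative if that matters for privacy.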
Comment 4 Mark Holmquist 2012-11-13 23:50:49 UTC
FYI, http://ur1.ca is running on free software and may or may not be useful.

I'd love to see this, too--especially for long Bugzilla and Gerrit URLs.
Comment 5 MZMcBride 2012-11-13 23:54:40 UTC
(In reply to comment #3)
> Another legitimate use-case is for the WMF Communications Dept. They would very
> much like to be able to use shortened URLs in tweets, blog posts, etc. without
> resorting to a 3rd party service.

This seems to mostly come down to Twitter, which already has a bazillion URL shorteners floating around it, including its own (t.co). I can't fathom a reason you would use a shortened URL in a blog post and it's exactly that kind of mis-use that I worry about.

> A 3rd legitimate use-case is fundraising. We commonly post links to our
> fundraising landing pages on social media and in emails to past donors. Right
> now these URLs look something like:
> https://donate.wikimedia.org/wiki/Special:LandingCheck?landing_page=L12_0615_JW&utm_medium=email&utm_campaign=none&utm_source=B12_061522_Jimmy&language=en&country=US

This feels like a red herring, honestly. Fundraising-specific shortened URLs have been in place for _years_. There's nothing wrong with using http://donate.wikimedia.org in an e-mail or on social media if you're concerned about using a long URL. And while geeks and nerds are very focused on and concerned with URL length, the rest of humanity really isn't.
Comment 6 Sumana Harihareswara 2012-11-14 00:08:06 UTC
We really do want to be able to send donors to *specific* landing pages, because sometimes we want to send a specific set of donors to a specific page that we've customized to appeal to them.  And that URI might be more than 80 characters long and get mangled by some email clients when the line wraps. I've worked in enough marketing to know that much.

On Identi.ca, in https://identi.ca/settings/url , one can choose to use ur1.ca or to choose no URL shortener.  For WMF Identi.ca accounts, we could choose the latter and then use our own shortened URLs.  The Communications Department has a point -- we try to respect our users' privacy in how we communicate with them in general, so using our own URI shortener when possible, instead of subjecting other people to the tracking of bit.ly or t.co, etc., is a good idea.
Comment 7 Brandon Harris 2012-11-14 00:14:40 UTC
Regarding ur1.ca, it's not viable for us.  We've tried to use it in the past for high-volume things and it just dies.
Comment 8 Ryan Kaldari 2012-11-14 00:15:35 UTC
> This seems to mostly come down to Twitter, which already has a bazillion URL
> shorteners floating around it, including its own (t.co). I can't fathom a
> reason you would use a shortened URL in a blog post and it's exactly that kind
> of mis-use that I worry about.

Why would using a short URL in a blog post be a 'mis-use'? Is there something
wrong with telling people URLs they can remember?

Sending everyone to http://donate.wikimedia.org/ is an awful solution. The
fundraisers would like to have more control over customization and A/B testing
without having to create complicated software solutions involving
referrer-sniffing, randomized bucketing, etc.

A couple more use cases: URL sharing via the Mobile App, and file sharing from
Commons.
Comment 9 MZMcBride 2012-11-14 00:25:16 UTC
(In reply to comment #8)
>> This seems to mostly come down to Twitter, which already has a bazillion URL
>> shorteners floating around it, including its own (t.co). I can't fathom a
>> reason you would use a shortened URL in a blog post and it's exactly that kind
>> of mis-use that I worry about.
> 
> Why would using a short URL in a blog post be a 'mis-use'? Is there something
> wrong with telling people URLs they can remember?

You seem to assume that people have an easier time remembering bit.ly/fdjs3423 than en.wikipedia.org/wiki/Foo.

It's a mis-use because there's an actual cost to URL shortening. The primary cost is in the introduction of a middle-man dependency, though there are other costs associated as well. In instances where there's no (or almost no) benefit to using a shortened URL (such as a blog, where you're not constrained by arbitrary character limitations), the cost easily outweighs any benefit of using a shortened URL.

This is all a bit tangential, though. I think it would be best if you started an RFC on mediawiki.org about a URL shortener service, listing the various use-cases and possibilities for a domain name, etc.
Comment 10 Ryan Kaldari 2012-11-14 00:30:57 UTC
Well sure, I wouldn't expect them to use it in every blog post. Maybe once every year or two for a URL they are wanting to push virally - something on the level of the SOPA blackout, for example. But anyway, you're right, this is getting tangential. I'll start an RfC on MediaWiki.org.
Comment 11 Andre Klapper 2012-11-14 00:39:36 UTC
(In reply to comment #9)
> You seem to assume that people have an easier time remembering bit.ly/fdjs3423
> than en.wikipedia.org/wiki/Foo.

We're not talking about en.wikipedia.org/wiki/Foo here, so we're running in circles. Again, see comment 3 for an example showing that long donation URLs are needed and that linking to the generic landing page isn't appropriate.

> The primary cost is in the introduction of a middle-man dependency

"Cost" is unclear to me, if WMF controls the service.

> In instances where there's no (or almost no) benefit to using a shortened URL
> (such as a blog, where you're not constrained by arbitrary character
> limitations), the cost easily outweighs any benefit of using a shortened URL.

I fail to understand the "cost".
People are still free to use the long URLs in blog posts.
Comment 12 Ryan Kaldari 2012-11-14 01:04:39 UTC
RFC created: https://www.mediawiki.org/wiki/Requests_for_comment/URL_shortener
Comment 13 db [inactive,noenotif] 2012-11-17 19:42:00 UTC
Some short URLs that work:

https://en.wikipedia.org?curid=15580374
https://en.wikipedia.org?diff=521573414

But Wikimedia should have its own shortener:
* for traffic reasons,
* for privacy, and
* to avoid broken URLs if a third-party shortener closes.
Comment 14 Krinkle 2012-11-17 20:36:35 UTC
(In reply to comment #13)
> Some short urls, which works:
> 
> https://en.wikipedia.org?curid=15580374
> https://en.wikipedia.org?diff=521573414
> 

These have been discussed in the past and should never be used as a short URL. A diff is a difference view, which is hardly useful here. I assume that should be "oldid" instead:
https://en.wikipedia.org?oldid=521573414

That is a permanent link to a certain revision; however, revisions can be removed, hidden, or merged. And one probably wants to link to the latest version of an article.
"curid" isn't very useful either, because it links to the page ID, and articles can be renamed, deleted, re-created, merged, and split. What we want is to link to a title; this is what the ShortUrl extension does. It creates a short unique ID for titles.
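The objection to ?curid= links can be illustrated with a toy model. In the hypothetical `ToyWiki` below, deleting and re-creating a page assigns it a new page ID, so a link keyed on the old ID dies, while a short link keyed on the title still finds the current page. This is a deliberate simplification (it models only deletion and re-creation, not moves, merges or splits) and is not how MediaWiki or Extension:ShortUrl actually store their data.

```python
import itertools
from typing import Dict, Optional


class ToyWiki:
    """Toy model: titles map to page IDs, and page IDs are never reused."""

    def __init__(self) -> None:
        self._next_page_id = itertools.count(1)
        self.page_id_by_title: Dict[str, int] = {}

    def create(self, title: str) -> int:
        page_id = next(self._next_page_id)
        self.page_id_by_title[title] = page_id
        return page_id

    def delete(self, title: str) -> None:
        del self.page_id_by_title[title]

    def resolve_curid(self, page_id: int) -> Optional[str]:
        """What a ?curid= link finds: the title currently holding that ID, if any."""
        for title, pid in self.page_id_by_title.items():
            if pid == page_id:
                return title
        return None

    def resolve_title(self, title: str) -> Optional[int]:
        """What a title-keyed short link finds: the current page with that title."""
        return self.page_id_by_title.get(title)


wiki = ToyWiki()
old_id = wiki.create("Foo")        # page ID 1
wiki.delete("Foo")
new_id = wiki.create("Foo")        # re-created: page ID 2
print(wiki.resolve_curid(old_id))  # None: the ?curid= link is now dead
print(wiki.resolve_title("Foo"))   # 2: a title-keyed short link still works
```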
Comment 15 MZMcBride 2012-11-17 21:08:53 UTC
(In reply to comment #12)
> RFC created: https://www.mediawiki.org/wiki/Requests_for_comment/URL_shortener

Thanks for starting this. I expanded it and responded to some of the points inline. I think the biggest barrier(s) to implementation right now are the domain name and the maintenance question.

If the Wikimedia Foundation picks this project up, there's a reasonable expectation that it would have to maintain the service indefinitely (particularly if these URLs start being printed on tote bags, end up in printed publications, etc.). Resources are finite. The maintenance point is a big one to consider, in my opinion.
Comment 16 Nemo 2012-11-18 10:26:11 UTC
*** Bug 20610 has been marked as a duplicate of this bug. ***
Comment 17 Nemo 2012-11-18 10:36:38 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > Some short URLs that work:
> > 
> > https://en.wikipedia.org?curid=15580374
> > https://en.wikipedia.org?diff=521573414
> > 
> 
> These have been discussed in the past and should never be used as a short url. [...]

Please discuss this at bug 21572 (I'm adding it as a blocker, but this bug is very unclear so I've no idea if linking wiki pages is what it wants).
Comment 18 Krinkle 2012-11-18 13:39:12 UTC
(In reply to comment #17)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > Some short URLs that work:
> > > 
> > > https://en.wikipedia.org?curid=15580374
> > > https://en.wikipedia.org?diff=521573414
> > > 
> > 
> > These have been discussed in the past and should never be used as a short url. [...]
> 
> Please discuss this at bug 21572 (I'm adding it as a blocker, but this bug is
> very unclear so I've no idea if linking wiki pages is what it wants).

No, we don't discuss it there. I've undone the dependency on bug 21572. For short URLs intended for sharing articles on subjects, we want title IDs, not page IDs. There is no further discussion or implementation needed; the ShortUrl extension already implements the title ID concept and has been deployed on several wikis. It is working fine.

I was pointing out that curid/diff/oldid are wrong for this purpose. I didn't point that out because we're still trying to figure out what is right; we already know how to solve this problem (namely with title IDs, not page or revision IDs) and already did (with the ShortUrl extension).

I was merely pointing that out because someone else suggested we should use that instead.
Comment 19 Nemo 2012-11-18 15:57:00 UTC
(In reply to comment #18)
> No, we don't discuss it there. I've undone the dependency on bug 21572. For
> short urls intended for sharing articles on subjects we want title ids, not
> page ids. 

The "sharing articles on subjects" is a very rare use-case, nobody except us wikignomes cares about exact titles and the-content-which-is-not-there-but-eventually-will-because-so-MOS-commands; see bug 21572 comment 27.

> There is no further discussion or implementation needed, the ShortUrl
> extension already implements the title id concept and has been deployed on
> several wikis. It is working fine.

Maybe. Anyway, as you pointed out yourself in comment 1, it doesn't address what comment 0 asked. I guess we have at least three different things asked here; we already have a request for one of them, I'll now open the remaining two to track the plan more clearly.
Comment 20 Krinkle 2012-11-19 02:59:54 UTC
(In reply to comment #19)
> (In reply to comment #18)
> > No, we don't discuss it there. I've undone the dependency on bug 21572. For
> > short urls intended for sharing articles on subjects we want title ids, not
> > page ids. 
> 
> The "sharing articles on subjects" is a very rare use-case, nobody except us
> wikignomes cares about exact titles and
> the-content-which-is-not-there-but-eventually-will-because-so-MOS-commands; see
> bug 21572 comment 27.
> 

It isn't a rare use case. In fact, it is the only use case we should care about. What is slightly rare, however, is the case where using titles or page IDs makes a difference. And when there is a difference, the title will be the right choice, as the page ID will likely no longer be visible, or will refer to a title that is about a subset or superset of the subject (after a deletion, split, or merge).

> > There is no further discussion or implementation needed, the ShortUrl
> > extension already implements the title id concept and has been deployed on
> > several wikis. It is working fine.
> 
> Maybe. Anyway, as you pointed out yourself in comment 1, it doesn't address
> what comment 0 asked. I guess we have at least three different things asked
> here; we already have a request for one of them, I'll now open the remaining
> two to track the plan more clearly.

Don't twist my words ;-). There are two parts to this service in the current schedule: an implementation on the MediaWiki side, and a redirect set up from a short domain. I'm not sure what you meant by "it doesn't address what was asked", since ShortUrl implements exactly what was asked as regards the software/database implementation. All it still needs is a short domain, which we'll need regardless of the implementation; the domain is not related to the short-ID database itself.
Comment 21 Ryan Kaldari 2012-11-19 18:25:37 UTC
I changed the summary to be more specific, since this bug is mostly about the domain name. Implementing some sort of subdomain scheme should be trivial once we have the domain name, and we have Extension:ShortUrl for the path, and bug 21572 for the permanent link issue.
Comment 22 Ryan Kaldari 2012-11-19 19:39:15 UTC
On 2nd thought, since this has turned into a tracking bug, I'm going to create a new bug for the domain name.
Comment 23 Andre Klapper 2013-09-23 12:13:17 UTC
*** Bug 54459 has been marked as a duplicate of this bug. ***
Comment 24 とある白い猫 2013-09-23 13:06:58 UTC
Below is a copy-paste of my proposal from Bug 54459, which I feel improves on the above discussion, particularly in terms of further usage considerations and improvements to the encoding.

I have also started an RfC on MediaWiki wiki per request: https://www.mediawiki.org/wiki/Requests_for_comment/URL_shortening_system_for_Wikimedia_sites_to_support_QR_codes (yeah, this link could have been shorter)

--------------------------------

I'd like to propose a schema for a redirect system for all MediaWiki sites for
improved QR code compatibility. My proposal has several parts.

Version 1 is the simplest QR code variant, at 21x21 modules (rows x columns).
With high error correction it holds 17 (30% correction, level H) or 27 (25%
correction, level Q) numeric characters, but only 10 or 16 alphanumeric
characters (0 to 9, A to Z, space, $ % * + - . / :). Version 2 is 25x25 and
holds 34/48 numeric or 20/29 alphanumeric characters at the same two levels.

Article titles can be light-years long, so a shorter redirect would be helpful.
However, even something as simple as http://r.wikimedia.org/ has 23 characters
to begin with. I don't know if enwp.org is under WMF control, but
http://r.enwp.org/ would be 18 characters. The remaining characters would be
used for the redirect itself.

Each page on a wiki has a page ID, a decimal value. However, on some wikis these
can get fairly large; en.wp has 40,592,460 pages, for example. Since QR codes
will have to be alphanumeric anyway, the ID can be expressed in base 36: instead
of 8 base-10 digits, the same number would be expressed with 5 base-36 digits
(O61CC).

Furthermore, with an encoding such as PLLCCCCCC, 9 characters would be enough
to determine the Project (Wikipedia, Wikinews, Wikisource, etc.; 36 projects in
total), the Language (en, fr, es, ru, etc.; 36 * 36 = 1296 languages), and the
Code-word for the article ID (36^6 = 2,176,782,336 possible IDs).

So... possibly with http://r.enwp.org/PLLCCCCCC, the 27 characters would fit a
version 2 code at 25% correction in alphanumeric mode (with the URL written in
upper case, since the alphanumeric set has no lowercase letters).
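The arithmetic in this proposal can be checked with a short Python sketch. It packs a project index, a two-digit language index and a six-digit base-36 page ID into the PLLCCCCCC form, then finds the smallest QR version whose alphanumeric capacity fits the resulting URL. The `PROJECTS` and `LANGUAGES` index tables and the function names are hypothetical; the capacity table holds the standard ISO/IEC 18004 alphanumeric figures for versions 1 and 2 at levels L, M, Q and H (roughly 7%, 15%, 25% and 30% recovery). Upper-casing the path assumes the redirect service treats codes case-insensitively.

```python
import string
from typing import Optional

DIGITS36 = string.digits + string.ascii_uppercase  # 0-9, A-Z
# Characters allowed in QR alphanumeric mode.
QR_ALNUM = set(string.digits + string.ascii_uppercase + " $%*+-./:")
# Alphanumeric-mode capacity in characters per (version, EC level).
QR_ALNUM_CAPACITY = {
    (1, "L"): 25, (1, "M"): 20, (1, "Q"): 16, (1, "H"): 10,
    (2, "L"): 47, (2, "M"): 38, (2, "Q"): 29, (2, "H"): 20,
}

# Hypothetical index tables: one base-36 digit per project, two per language.
PROJECTS = ["wikipedia", "wikinews", "wikisource"]
LANGUAGES = ["en", "fr", "es", "ru"]


def base36(n: int, width: int) -> str:
    """Zero-padded base-36 encoding of n in exactly `width` digits."""
    if not 0 <= n < 36 ** width:
        raise ValueError(f"{n} does not fit in {width} base-36 digits")
    out = []
    for _ in range(width):
        n, rem = divmod(n, 36)
        out.append(DIGITS36[rem])
    return "".join(reversed(out))


def short_code(project: str, lang: str, page_id: int) -> str:
    """Build the 9-character PLLCCCCCC code."""
    return (base36(PROJECTS.index(project), 1)
            + base36(LANGUAGES.index(lang), 2)
            + base36(page_id, 6))


def smallest_qr(text: str, ec_level: str) -> Optional[int]:
    """Smallest QR version (1 or 2) whose alphanumeric capacity fits `text`."""
    if not set(text) <= QR_ALNUM:
        raise ValueError("text is not QR-alphanumeric; upper-case it first")
    for version in (1, 2):
        if len(text) <= QR_ALNUM_CAPACITY[(version, ec_level)]:
            return version
    return None  # would need version 3 or larger


url = ("http://r.enwp.org/" + short_code("wikipedia", "en", 40_592_460)).upper()
print(url, len(url))          # HTTP://R.ENWP.ORG/0000O61CC 27
print(smallest_qr(url, "Q"))  # 2
print(smallest_qr(url, "H"))  # None
```

At 27 characters the URL is over every version 1 alphanumeric limit (the largest is 25 at level L), so a version 2 code is the realistic target; dropping `http://` or shortening the host would be the levers for getting back to version 1.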

Once this is implemented, WMUK's QRpedia may be easier to implement.

This relates to:
https://www.wikidata.org/wiki/Wikidata:Project_chat#Article_specific_QR_codes

I realize the above link may become obsolete with archiving... :p
