Last modified: 2013-09-18 15:39:17 UTC
It'd be great if the entity selector were smarter about the order of items/properties and put the ones most likely to be relevant to the current statement at the top.
As we now have an increasing number of links, the easiest and fastest way might be to count the number of incoming wikilinks to each potential item, and sort them by most incoming links first. Paris (France) will have more incoming links than Paris (the mythological figure) or Paris (Texas), which will have more than other obscure uses.
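A minimal sketch of the proposed ordering, assuming each candidate already carries a precomputed incoming-link count. The `Candidate` type, the `incoming_links` field, and the counts themselves are made up for illustration and do not come from any real data or API.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical search hit; not an actual Wikibase data structure."""
    label: str
    description: str
    incoming_links: int  # assumed to be counted ahead of time


def rank_by_incoming_links(candidates):
    """Return candidates ordered by incoming links, most linked first.

    sorted() is stable, so candidates with equal counts keep their
    original relative order.
    """
    return sorted(candidates, key=lambda c: c.incoming_links, reverse=True)


# Illustrative numbers only.
candidates = [
    Candidate("Paris", "city in Texas", 800),
    Candidate("Paris", "capital of France", 250000),
    Candidate("Paris", "figure in Greek mythology", 3000),
]
ranked = rank_by_incoming_links(candidates)
```

With these invented counts, the capital of France sorts first and the Texas city last.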
Addendum: If no hits are available in the current language, try other languages.
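One way the fallback could look, sketched against a hypothetical in-memory index that maps a language code to a label-to-items table. The function name, the index layout, and the item identifiers are all assumptions made for the example.

```python
def search_with_fallback(query, language, index, fallback_languages):
    """Look up `query` in `language` first, then in each fallback language.

    `index` is a hypothetical mapping: language code -> {lowercased label: [item ids]}.
    Returns (language_that_matched, hits), or (None, []) if nothing matched.
    """
    key = query.strip().lower()
    for lang in [language, *fallback_languages]:
        hits = index.get(lang, {}).get(key, [])
        if hits:
            return lang, hits
    return None, []


# Made-up index contents and item ids.
index = {
    "en": {"berlin": ["item-1"]},
    "de": {"berlin": ["item-1"], "münchen": ["item-2"]},
}

# No English label "München" exists here, so the German hit is used.
matched_language, hits = search_with_fallback("München", "en", index, ["de", "fr"])
```

Returning the language that matched lets the selector indicate that a label comes from a language other than the user's own.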
The problem is the same as product advice to customers, where customers are the properties and products are the items used. It will also trigger the same scalability issues. That is, simple counting is not enough to get good guesses about which items should be sorted first. For example, it would mean that all municipalities of Brazil (or France) are listed before municipalities in Norway, which is bad if you are trying to find municipalities in Norway.
We're not going to make this sort order perfect. But taking the number of sitelinks should give a good-enough sort order most of the time. This is what counts.
Number of sitelinks doesn't really make sense in this case. A low number actually is an indication that a high count does not make sense, because the values you are looking for aren't common. That is, it is a feature with negative correlation with your wanted entries. It's a classic automatic data classifier problem.
Addendum: Allow language prefixes, e.g. "de:Berlin", to show the item that has the language link for "Berlin" on de.wikipedia.
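A rough sketch of how such a prefix might be split off the search string. The `parse_query` helper and the small `known_languages` set are hypothetical; a real implementation would check against the full list of site languages.

```python
def parse_query(raw, known_languages=frozenset({"de", "en", "fr"})):
    """Split an optional language prefix off a search string.

    Returns (language, term). language is None when there is no prefix,
    or when the text before the colon is not a known language code, so
    titles that merely contain a colon are left untouched.
    """
    text = raw.strip()
    prefix, separator, rest = text.partition(":")
    rest = rest.strip()
    if separator and prefix in known_languages and rest:
        return prefix, rest
    return None, text


with_prefix = parse_query("de:Berlin")
without_prefix = parse_query("Berlin")
colon_in_title = parse_query("Star Trek: Voyager")
```

Checking the prefix against a list of known codes, rather than accepting anything before a colon, avoids misreading ordinary titles as language-qualified searches.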
(In reply to comment #5)
> Number of sitelinks doesn't really make sense in this case. A low number
> actually is an indication that a high count does not make sense, because the
> values you are looking for aren't common. That is, it is a feature with
> negative correlation with your wanted entries. It's a classic automatic data
> classifier problem.

Not sure I understand. I want Paris, France, to show up on top for the search "Paris", as I am most likely adding a person's birth or death place, or the location of an object.
Addendum: For each item, show the "is a(n)" field if no description is set.
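The display fallback could be sketched as below, assuming each item is a plain dictionary with an optional `description` string and an optional `instance_of` list of labels. Both keys and the `display_hint` helper are invented for the example.

```python
def display_hint(item):
    """Pick the secondary text shown next to an item's label.

    Prefers the description; falls back to the "is a(n)" values when the
    description is missing or blank. Returns "" if neither is available.
    """
    description = (item.get("description") or "").strip()
    if description:
        return description
    instance_of = item.get("instance_of") or []
    if instance_of:
        return "is a(n) " + ", ".join(instance_of)
    return ""


described = display_hint({"label": "Paris", "description": "capital of France"})
undescribed = display_hint({"label": "Paris", "instance_of": ["city"]})
bare = display_hint({"label": "Paris"})
```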
(In reply to comment #3)
> The problem is the same as product advice to customers, where customers are
> the properties and products are the items used. It will also trigger the same
> scalability issues.

This is interesting. But I think you meant item=>product and property=>customer, since we are recommending properties to items (then, based on the recommendation scores, we can sort the list), much like recommending "products" to "customers".
i) What kind of scalability issues, and why?
ii) Do you think this would be a better method (accuracy-wise) for sorting than using incoming wikilinks as a metric?
(In reply to comment #9)
> But I think you meant item=>product and property=>customer, since we are
> recommending properties to items (then based upon the recommendation scores we
> can sort the list), much like recommending "products" to "customers".

Sorry - my mistake. Please ignore the above section of my comment.
Entity search is now weighted by number of sitelinks.