Last modified: 2014-09-16 14:25:11 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T59812, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 57812 - Annotation tool that uses Wikidata concepts to annotate statements from books
Status: ASSIGNED
Product: MediaWiki extensions
Classification: Unclassified
Component: Extensions requests
Version: unspecified
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: apsdehal
Depends on:
Blocks: Wikisource

Reported: 2013-12-01 16:16 UTC by vladjohn2013
Modified: 2014-09-16 14:25 UTC


Description vladjohn2013 2013-12-01 16:16:35 UTC
Annotation tool that extracts statements from books and feeds them into Wikidata

Wikidata is a free knowledge base that can be read and edited by humans and machines alike. If you understand the difference between plain text and data, you will understand that this project is Wikipedia's game-changer. The conversion from text to Wikidata content fields has started in Wikipedia and its sister projects and keeps going deeper, but there is still a lot to do!

Now think about this: you are at home, reading and studying for pleasure, for an assignment, or for your PhD thesis. When you study, you engage with the text, and you often annotate and take notes. What about a tool that would let you share important quotes and statements to Wikidata?

A statement in Wikidata is often a simple subject-predicate-object triple, plus a source. Many of the facts in the books you read can be represented in this structure. We can think of a way to share them.

The idea is a client-side browser plugin, script or app that takes some highlighted text, offers you a GUI to fix up the statement and source, and then feeds it into Wikidata.

We could unveil a brand-new world of sharing and collaborating, directly from your reading.

Possible projects:

    Pundit: http://www.thepund.it/ (the team is aware of Wikidata and willing to collaborate).
    Annotator: https://github.com/okfn/annotator

Mentors: Aubrey is available for mentorship, paired with a technical expert.

URL: https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Annotation_tool_that_extracts_statements_from_books_and_feed_them_on_Wikidata
Comment 1 vladjohn2013 2013-12-01 16:16:52 UTC
This proposal has been listed at https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects and we are filing a report to gather community feedback and share updates.
Comment 2 Lydia Pintscher 2013-12-04 19:09:41 UTC
I am not sure I understand what kind of data would be put into Wikidata.
Comment 3 dacuetu 2013-12-04 22:46:49 UTC
I agree with Lydia that the description is confusing. The title of this bug should be "Annotation tool that uses Wikidata concepts to annotate statements from books". Thepund.it almost does that, but it is using dbpedia instead of Wikidata. In my opinion text annotations shouldn't be stored directly in Wikidata, but in another database. IIRC, the implementation of Extension:Annotator creates another table on MW to store the annotations there.
Comment 4 Matthew Flaschen 2013-12-05 03:37:54 UTC
Am I understanding right that someone would highlight something like:

"John Smith was born on February 3rd, 1952" on a web site, then natural language processing would be used to suggest Wikidata statements (e.g. https://www.wikidata.org/wiki/Property:P569)?  If so, it doesn't seem like the original quote/annotation needs to be stored, though the web site should be cited as a source.
Comment 5 Lydia Pintscher 2013-12-05 08:08:01 UTC
(In reply to comment #3)
> I agree with Lydia that the description is confusing. The title of this bug
> should be "Annotation tool that uses Wikidata concepts to annotate statements
> from books". Thepund.it almost does that, but it is using dbpedia instead of
> Wikidata. In my opinion text annotations shouldn't be stored directly in
> Wikidata, but in another database. IIRC, the implementation of
> Extension:Annotator creates another table on MW to store the annotations
> there.

Ok that makes a lot more sense and would have my support.
Comment 6 Andrea Zanni 2013-12-05 12:35:44 UTC
Thank you, Andre, for changing it, and David, for being my official interpreter. We are actually working (slowly) with the Pundit team to implement this thing. Pundit is a third-party application for annotation on the web and, as David says, needs minor development to enable this kind of action. Of course, having a proper MediaWiki extension on Wikisource would be more appropriate and useful, IMHO.
Comment 7 Andrea Zanni 2013-12-05 12:37:26 UTC
(In reply to comment #4)
> Am I understanding right that someone would highlight something like:
> 
> "John Smith was born on February 3rd, 1952" on a web site, then natural
> language processing would be used to suggest Wikidata statements (e.g.
> https://www.wikidata.org/wiki/Property:P569)?  If so, it doesn't seem like
> the original quote/annotation needs to be stored, though the web site should
> be cited as a source.

Yes, Matthew, that is what I have in mind. You take statements/citations/quotes from books and texts and you make them Wikidata statements: 2 Wikidata items, a Property and a source.
Comment 8 Lydia Pintscher 2013-12-05 13:06:23 UTC
(In reply to comment #7)
> (In reply to comment #4)
> > Am I understanding right that someone would highlight something like:
> > 
> > "John Smith was born on February 3rd, 1952" on a web site, then natural
> > language processing would be used to suggest Wikidata statements (e.g.
> > https://www.wikidata.org/wiki/Property:P569)?  If so, it doesn't seem like
> > the original quote/annotation needs to be stored, though the web site
> > should be cited as a source.
> 
> Yes, Matthew, that is what I have in mind. You take
> statements/citations/quotes from books and texts and you make them Wikidata
> statements: 2 Wikidata items, a Property and a source.

Hmm this again sounds to me like this is supposed to be saved in Wikidata. This is something I want to see a well thought out plan for with examples.
Comment 9 Andrea Zanni 2013-12-05 14:01:16 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #4)
> > > Am I understanding right that someone would highlight something like:
> > > 
> > > "John Smith was born on February 3rd, 1952" on a web site, then natural
> > > language processing would be used to suggest Wikidata statements (e.g.
> > > https://www.wikidata.org/wiki/Property:P569)?  If so, it doesn't seem like
> > > the original quote/annotation needs to be stored, though the web site
> > > should be cited as a source.
> > 
> > Yes, Matthew, that is what I have in mind. You take
> > statements/citations/quotes from books and texts and you make them
> > Wikidata statements: 2 Wikidata items, a Property and a source.
> 
> Hmm this again sounds to me like this is supposed to be saved in Wikidata.
> This is something I want to see a well thought out plan for with examples.

Sorry, Lydia, my English is very bad and it's difficult for me to use the proper terms. Yes, that is supposed to be saved in Wikidata. Is it a problem?

The idea is simple: I read a statement in a text, like "Jorge Luis Borges was born in Buenos Aires". I have a tool to highlight it; the tool parses the sentence, processes the natural language and suggests 2 WD items (Borges and Buenos Aires) and a WD property (place of birth). I also have a source, which is the webpage I read the sentence on. This statement should now go into WD: the tool would log in to WD and post it with my account. Is it clearer now?
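
A minimal sketch of the flow just described, under loud assumptions: a single regular expression stands in for real natural language processing, the function names are hypothetical, and nothing is sent over the network. It only drafts a statement for the user to confirm and builds lookup URLs for the Wikidata wbsearchentities API module, so the user can pick the right items.

```python
import re
from urllib.parse import urlencode

# Assumption: one toy pattern stands in for real natural language processing.
BORN_IN = re.compile(r"^(?P<subject>.+?) was born in (?P<place>.+?)\.?$")

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_url(label, language="en"):
    """Build a wbsearchentities URL that lists candidate items for a label."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": language,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def suggest(highlight, source_url):
    """Return a draft statement for the user to confirm, or None if no match."""
    match = BORN_IN.match(highlight.strip())
    if match is None:
        return None
    return {
        "subject_label": match.group("subject"),
        "property": "P19",  # place of birth
        "object_label": match.group("place"),
        "source_url": source_url,
        # The user still has to choose the right item from each lookup result.
        "subject_lookup": search_url(match.group("subject")),
        "object_lookup": search_url(match.group("place")),
    }


draft = suggest(
    "Jorge Luis Borges was born in Buenos Aires",
    "http://example.com/borges",  # hypothetical source page
)
```

The draft keeps labels rather than item IDs on purpose: choosing between homonymous items is left to the person, not the tool.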
Comment 10 Matthew Flaschen 2013-12-05 20:14:16 UTC
(In reply to comment #8)
> Hmm this again sounds to me like this is supposed to be saved in Wikidata.
> This is something I want to see a well thought out plan for with examples.

I don't think there's any actual proposed change to the Wikidata data model.  It's just a way to get input/data.

Using my example, a user would highlight "John Smith was born on February 3rd, 1952".  Wikidata would give the user choices for John Smith (the user would have to pick the right Q-item).  Then, it would suggest P569 (date of birth).  Finally, it would generate source statements referring to the website.

So it would be something like:

Q245903 P569 February 3rd, 1952 (with the value using the normal date datatype).
   Source (standard Wikidata source):
      P854 http://example.com
      etc.

Then the user could confirm it's correct and post it through OAuth.
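
For illustration, the example above could be serialized roughly as follows. This is a sketch, assuming the Wikibase JSON claim layout as I understand it; `build_payload` is a hypothetical helper, and the OAuth-signed request that would actually post the data to Q245903 is left out.

```python
import json


def build_payload(property_id, iso_date, reference_url):
    """Return a dict holding one date claim with a reference URL (P854) source."""
    time_value = {
        "time": "+" + iso_date + "T00:00:00Z",
        "timezone": 0,
        "before": 0,
        "after": 0,
        "precision": 11,  # 11 means day precision
        # Proleptic Gregorian calendar
        "calendarmodel": "http://www.wikidata.org/entity/Q1985727",
    }
    reference = {
        "snaks": {
            "P854": [
                {
                    "snaktype": "value",
                    "property": "P854",
                    "datavalue": {"value": reference_url, "type": "string"},
                }
            ]
        }
    }
    claim = {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": property_id,
            "datavalue": {"value": time_value, "type": "time"},
        },
        "references": [reference],
    }
    return {"claims": [claim]}


# Values taken from the example: P569 (date of birth) sourced to example.com.
payload = build_payload("P569", "1952-02-03", "http://example.com")
print(json.dumps(payload, indent=2))
```

The item ID is not part of the claim itself; in this sketch it would be passed separately when the confirmed payload is posted.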
Comment 11 dacuetu 2013-12-05 20:27:21 UTC
(In reply to comment #10)
> I don't think there's any actual proposed change to the Wikidata data model. 
> It's just a way to get input/data.

And still, if you want to highlight the source text later on, then you need to store the quotation (which may be copyrighted) or the annotation reference (start position, end position) somewhere.
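
A rough sketch of the second option, storing only an annotation reference outside Wikidata. The record layout, the names and the statement identifier are all hypothetical; the point is that the quoted text itself never has to be stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationRef:
    """Points at a highlighted span without copying the quoted text."""
    source_url: str
    start: int  # offset of the first highlighted character
    end: int  # offset just past the last highlighted character
    statement_id: str  # hypothetical link to the resulting Wikidata statement

    def resolve(self, page_text):
        """Recover the highlighted text from the current page content."""
        return page_text[self.start:self.end]


def annotate(page_text, quote, source_url, statement_id):
    """Create a reference for the first occurrence of quote in page_text."""
    start = page_text.find(quote)
    if start == -1:
        raise ValueError("quote not found in page text")
    return AnnotationRef(source_url, start, start + len(quote), statement_id)


page = "Intro. John Smith was born on February 3rd, 1952. More text."
quote = "John Smith was born on February 3rd, 1952"
ref = annotate(page, quote, "http://example.com", "example-statement-id")
```

Plain character offsets break as soon as the page changes, so a real tool would probably need a more robust anchor; this sketch ignores that.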

I would recommend taking a look at the Pund.it server side to see how much could be reused: https://github.com/net7/pundit-server
Comment 12 Andrea Zanni 2014-02-19 12:18:47 UTC
User:Apsdehal is interested in working on this project for GSoC 2014. More info is here: https://www.mediawiki.org/wiki/User:Apsdehal. We are in contact with the Pundit team to help us with the work, and they will probably be the mentors.
Comment 13 apsdehal 2014-03-20 20:21:40 UTC
The final proposal is here: https://www.mediawiki.org/wiki/Wikidata_annotation_tool
Please feel free to comment and provide feedback.
Comment 14 Quim Gil 2014-03-22 19:05:07 UTC
Just to confirm: we have two GSoC proposals aiming to work on this project, see https://www.mediawiki.org/wiki/Google_Summer_of_Code_2014

Questions: 

* Is the Wikidata community aware of these proposals? Have the students shared them on the Wikidata mailing list?

* Are the Wikidata maintainers fine with these plans?

* We have four mentors (!) available, including two from the Pund.it project, which is great. Still, I must ask: do you feel you have the experience required to deal with Wikidata? Even if not as an official co-mentor, it would be good to have someone following the current evaluation, and the eventual project if we have a candidate accepted.
Comment 15 apsdehal 2014-07-14 22:30:59 UTC
Kindly assign this bug to me, as I am working on this under GSoC.
Comment 16 Quim Gil 2014-09-12 09:56:26 UTC
GSoC is over and this project was evaluated as PASSED by its mentors. However, looking at the reports, it is unclear whether this project is in fact completed, or whether there are still open tasks pending. We are also missing the required wrap-up post:

https://www.mediawiki.org/wiki/Wikidata_annotation_tool/updates

Please wrap up your project properly.
Comment 17 Matthew Flaschen 2014-09-12 21:33:51 UTC
It seems the main implementation is at https://github.com/apsdehal/WikidataAnnotationFeeder/ , with helper libraries listed at https://www.mediawiki.org/wiki/Wikidata_annotation_tool/updates#Notes
Comment 18 apsdehal 2014-09-14 10:43:29 UTC
Sorry for the delay. Give me a day or two and the wrap-up post will be there.
I posted a mail on Wikidata a long time ago about the completion of the project, asking for reviews, but didn't get a reply from others.

The project has been completed with everything implemented in a proper way.
Comment 19 apsdehal 2014-09-16 14:25:11 UTC
Here is the project completion report:
https://www.mediawiki.org/wiki/Wikidata_annotation_tool/project_completion_report
