Last modified: 2014-02-18 16:20:35 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T60555, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 58555 - Create API Upload Wizard Smoke Tests
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: UploadWizard (Other open bugs)
Version: unspecified
Hardware: All
OS: All
Importance: Normal enhancement (vote)
Target Milestone: ---
Assigned To: aarcos.wiki
Depends on: 58923
Blocks: 58353
Reported: 2013-12-16 22:41 UTC by aarcos.wiki
Modified: 2014-02-18 16:20 UTC (History)
10 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description aarcos.wiki 2013-12-16 22:41:28 UTC
From our discussion it seems that we should start spending our time on
"API testing/monitoring". From what I have gathered so far, there is nothing in
place (test-wise) to monitor the sanity of the APIs used by the UW.
I will be happy to create the basic testing hooks to perform this API testing.
Comment 1 aarcos.wiki 2013-12-17 00:26:25 UTC
I've been playing with the UI for some days now and I think I have a good idea of what API calls are made, and I came up with curl commands that do the same. There are a few details missing, but these are my main findings so far:

The main workflow for uploading an image using the UW makes the following API calls:

0) Request an edit token
  https://commons.wikimedia.org/w/api.php?action=tokens&format=json&type=edit

1) Upload the image using stash=1, which means that the file is going to
be uploaded but kept in a temporary space. If the upload succeeds, a 'filekey' is returned that can be used later to actually move the uploaded file from the temporary space to permanent storage. Ex:

curl 'https://commons.wikimedia.org/w/api.php' -H <some-header-stuff...> --form action=upload --form format=json --form 'token=7fcb554f0f44cc77f2c44bcf9ce86496+\' --form "filename=1387232355370test-image-15x15.gif" --form stash=1 --form ignorewarnings=true --form "file=@test-image-15x15.gif" --trace-ascii 15x15.out

Response:

{"upload":{"result":"Success","filekey":"11xxgr2del40.36x3m1.3268187.gif","sessionkey":"11xxgr2del40.36x3m1.3268187.gif","imageinfo":{"timestamp":"2013-12-16T23:45:02Z","user":null,"userid":null,"anon":"","size":68,"width":15,"height":15,"parsedcomment":"","comment":null,"url":"https://commons.wikimedia.org/wiki/Special:UploadStash/file/11xxgr2del40.36x3m1.3268187.gif","descriptionurl":"https://commons.wikimedia.org/wiki/Special:UploadStash/file/11xxgr2del40.36x3m1.3268187.gif","sha1":"7b21163f31a7894f43342cd8839da82a6bab0f41","metadata":[{"name":"frameCount","value":1},{"name":"looped","value":false},{"name":"duration","value":0},{"name":"metadata","value":[{"name":"_MW_GIF_VERSION","value":1}]}],"extmetadata":{"DateTime":{"value":"2013-12-16T23:45:02Z","source":"mediawiki-metadata","hidden":""},"ObjectName":{"value":"20131216234502!phpZnZnJB","source":"mediawiki-metadata","hidden":""},"CommonsMetadataExtension":{"value":1.1,"source":"extension","hidden":""},"Categories":{"value":"","source":"commons-categories","hidden":""},"Assessments":{"value":"","source":"commons-categories","hidden":""}},"mime":"image/gif","mediatype":"UNKNOWN","bitdepth":0,"html":"<div>\n<div class=\"thumb tright\"><div class=\"thumbinner\" style=\"width:182px;\"><a href=\"/w/index.php?title=Special:Upload&amp;wpDestFile=20131216234502!phpZnZnJB.gif\" class=\"new\" title=\"File:20131216234502!phpZnZnJB.gif\">File:20131216234502!phpZnZnJB.gif</a>  <div class=\"thumbcaption\">Existing file</div></div></div>\n<p><span id=\"wpUploadWarningFileexists\">A file with this name already exists; please check <b><a href=\"/w/index.php?title=File:20131216234502!phpZnZnJB.gif&amp;action=edit&amp;redlink=1\" class=\"new\" title=\"File:20131216234502!phpZnZnJB.gif (page does not exist)\">the existing file</a></b> if you are not sure whether you want to change it. Please choose another filename, unless you are uploading a technically improved version of the same file. 
<br>Do not overwrite an image with a different one of the same topic (see <a href=\"/wiki/Commons:File_naming\" title=\"Commons:File naming\">file naming</a>).</span>\n</p>\n<div style=\"clear:both;\"></div>\n</div>\n"}}}

2) An API call asking for information about the image, probably just to check that the file doesn't already exist.

https://commons.wikimedia.org/w/api.php?action=query&format=json&titles=File%3AMagenta-dot-for-test%2Egif&prop=info%7Cimageinfo&inprop=protection&iiprop=url%7Cmime%7Csize&iiurlwidth=150

Response:
{"query":{"pages":{"-1":{"ns":6,"title":"File:Magenta-dot-for-test.gif","missing":"","contentmodel":"wikitext","pagelanguage":"en","protection":[],"imagerepository":""}}}}

3) An API call to check if the title should be blacklisted:

  https://commons.wikimedia.org/w/api.php?action=titleblacklist&format=json&tbaction=create&tbtitle=File%3AMagenta-dot-for-test%2Egif

Response:
  {result:ok}

4) Ask the user for extra information about the image: Description, copyrights, tags, etc.

5) Request again an edit token
  https://commons.wikimedia.org/w/api.php?action=tokens&format=json&type=edit

6) An API call to move the already uploaded file from the stash space to the main storage and add the extra information we already have, something like:

curl 'https://commons.wikimedia.org/w/api.php' -H '<header_stuff...>' --data-urlencode 'action=upload' --data-urlencode format=json --data-urlencode 'filekey=11xxgr2del40.36x3m1.3268187.gif' --data-urlencode 'filename=Test-image-15x15.gif' --data-urlencode 'token=7fcb554f0f44cc77f2c44bcf9ce86496+\' --data-urlencode  'comment=Page uploaded by curl script' --data '<more-data-from-the-user:description, rights, tags, etc.>'

Response:
{"upload":{"result":"Success","filename":"Test-image-15x15.gif","imageinfo":{"timestamp":"2013-12-16T23:48:59Z","user":"Aaron arcos","userid":3268187,"size":68,"width":15,"height":15,"parsedcomment":"Page uploaded by curl script","comment":"Page uploaded by curl script","url":"https://upload.wikimedia.org/wikipedia/commons/2/2d/Test-image-15x15.gif","descriptionurl":"https://commons.wikimedia.org/wiki/File:Test-image-15x15.gif","sha1":"7b21163f31a7894f43342cd8839da82a6bab0f41","metadata":[{"name":"frameCount","value":1},{"name":"looped","value":false},{"name":"duration","value":0},{"name":"metadata","value":[{"name":"_MW_GIF_VERSION","value":1}]}],"extmetadata":{"DateTime":{"value":"2013-12-16T23:45:02Z","source":"mediawiki-metadata","hidden":""},"ObjectName":{"value":"20131216234502!phpZnZnJB","source":"mediawiki-metadata","hidden":""},"CommonsMetadataExtension":{"value":1.1,"source":"extension","hidden":""},"Categories":{"value":"","source":"commons-categories","hidden":""},"Assessments":{"value":"","source":"commons-categories","hidden":""}},"mime":"image/gif","mediatype":"BITMAP","bitdepth":1,"html":"<div>\n<div class=\"thumb tright\"><div class=\"thumbinner\" style=\"width:182px;\"><a href=\"/w/index.php?title=Special:Upload&amp;wpDestFile=Test-image-15x15.gif\" class=\"new\" title=\"File:Test-image-15x15.gif\">File:Test-image-15x15.gif</a>  <div class=\"thumbcaption\">Existing file</div></div></div>\n<p><span id=\"wpUploadWarningFileexists\">A file with this name already exists; please check <b><a href=\"/wiki/File:Test-image-15x15.gif\" title=\"File:Test-image-15x15.gif\">the existing file</a></b> if you are not sure whether you want to change it. Please choose another filename, unless you are uploading a technically improved version of the same file. 
<br>Do not overwrite an image with a different one of the same topic (see <a href=\"/wiki/Commons:File_naming\" title=\"Commons:File naming\">file naming</a>).</span>\n</p>\n<div style=\"clear:both;\"></div>\n</div>\n"}}}
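The calls above can be sketched as a small Python script. This is a minimal, stdlib-only sketch: the endpoint and parameter names come from the curl commands in this comment, while the helper names are my own and the multipart file encoding for the stash step is elided.

```python
# Sketch of the stash-then-publish workflow described above.
# Parameter names follow the curl commands in this comment; the helper
# function names are assumptions, not part of UploadWizard itself.
import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"

def token_url():
    """Steps 0 and 5: build the edit-token request URL."""
    return API + "?" + urllib.parse.urlencode(
        {"action": "tokens", "format": "json", "type": "edit"})

def stash_fields(token, filename):
    """Step 1: form fields for the stash upload (the file itself would be
    sent as an extra multipart part named 'file')."""
    return {"action": "upload", "format": "json", "token": token,
            "filename": filename, "stash": "1", "ignorewarnings": "true"}

def publish_fields(token, filekey, filename, comment):
    """Step 6: form fields that move the stashed file into place."""
    return {"action": "upload", "format": "json", "token": token,
            "filekey": filekey, "filename": filename, "comment": comment}

def fetch_edit_token():
    """Perform step 0 for real and pull the token out of the response."""
    with urllib.request.urlopen(token_url()) as resp:
        return json.load(resp)["tokens"]["edittoken"]
```

A real run would call fetch_edit_token(), POST the stash fields with the file attached, pull 'filekey' out of the JSON response, and POST the publish fields, mirroring the curl commands step by step.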


  I think that's pretty much it. As you can see above, I've been able to create curl commands that actually upload to the stash area and then move the file to the main storage, simulating the whole workflow. I think this would be a good starting point for smoke test candidates, but other suggestions are welcome.

  I plan to productionize these curl scripts so they can be run standalone. Suggestions on what languages (Ruby? Python?) and testing frameworks to use are very much welcome.
Comment 2 Tisza Gergő 2013-12-17 16:58:55 UTC
If we stick to PHP (not a great language for scripts; on the other hand, everyone is familiar with it), then Behat is a nice BDD framework. It is Gherkin-based, so the tests would be language-independent to some degree. (Of course, that also means more work than just using PHPUnit or a simple script with an exit status.)

If we don't care about accessibility, then Ruby would be the obvious choice - it is the most popular language for test automation, and the QA team uses Ruby as well, AFAIK.
Comment 3 Chris McMahon 2013-12-17 17:07:18 UTC
QA uses Ruby for browser tests because the tools available (Cucumber, RSpec, Selenium+watir-webdriver) are the best available for browser testing. 

Ruby was a great choice for browser testing.  

I think Ruby is a good choice for testing in general.  If we were to use it for API testing it would be convenient to manage the tests along with the browser tests in git/gerrit and in Jenkins.
Comment 4 Bawolff (Brian Wolff) 2013-12-17 17:44:55 UTC
(In reply to comment #3)
> QA uses Ruby for browser tests because the tools available (Cucumber,
> RSpec, Selenium+watir-webdriver) are the best available for browser
> testing.
>
> Ruby was a great choice for browser testing.
>
> I think Ruby is a good choice for testing in general. If we were to use it
> for API testing it would be convenient to manage the tests along with the
> browser tests in git/gerrit and in Jenkins.

I think it would be a good idea to stay consistent language-wise with what we are already doing.

-----


Keep in mind there are several code paths here. Chunked upload may be used on larger files (I'm not sure what the cutoff is; possibly around 5 MB). On the backend, the code path for chunked upload is quite a bit different from the normal upload, so both should probably be tested. Also, I'm not sure where this test is going to be conducted, but the code path on Wikimedia (with async job-queue-based chunked upload) is a little different from MediaWiki's default (which has $wgEnableAsyncUploads = false), so that should be kept in mind when setting up the test.
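The chunked code path described here can be sketched as follows. The offset/filesize/filekey parameters follow the MediaWiki action=upload API; the 5 MB figure is only the comment's guess, and the helper names are assumptions.

```python
# Sketch of the chunked-upload variant: the file is sent in pieces, each
# carrying an offset, with the filekey from the first response tying the
# later chunks together. The cutoff value below is an assumption.
CHUNK_SIZE = 5 * 1024 * 1024  # assumed ~5 MB cutoff mentioned above

def chunk_fields(token, filename, filesize, offset, filekey=None):
    """Form fields for one chunk; the chunk bytes themselves would be sent
    as a multipart part named 'chunk'."""
    fields = {"action": "upload", "format": "json", "token": token,
              "stash": "1", "filename": filename,
              "filesize": str(filesize), "offset": str(offset)}
    if filekey is not None:  # chunks after the first continue a stash file
        fields["filekey"] = filekey
    return fields

def iter_chunks(data, size=CHUNK_SIZE):
    """Split raw bytes into (offset, chunk) pairs."""
    for offset in range(0, len(data), size):
        yield offset, data[offset:offset + size]
```

A chunked smoke test would loop over iter_chunks(), POST each piece, and thread the filekey from each response into the next request.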
Comment 5 Jeff Hall 2013-12-17 17:45:38 UTC
I did a lot of automated test development at the API and web service layer in my previous job using Ruby. We used the generic Test::Unit (i.e. xUnit) framework with additional Rubygems for JSON and XML parsing, and the Faraday gem for the HTTP client action (although there are any number of Ruby HTTP clients to choose from…)
Comment 6 aarcos.wiki 2013-12-18 00:09:46 UTC
Hi there folks,

  Thanks for your comments; it seems I read them too late, though, because this morning I was able to create a Python test that uploads an image a la UW. I used a pointer given by Kunal Mehta (https://github.com/wikimedia/pywikibot-core/tree/master/tests). I just modified one of the examples there and was able to very quickly come up with:

    https://gerrit.wikimedia.org/r/#/c/102353/

  Still a work in progress and nothing is written in stone. I know this is not the right place for this test; I will have to move it to '../extensions/UploadWizard/tests', but I wanted to show you what I have so far. I liked the compactness and explicitness of the test. Let me know any comments and suggestions you might have.
Comment 7 aarcos.wiki 2013-12-18 02:16:08 UTC
Any pointers on how to move these proof-of-concept tests to '.../extensions/UploadWizard/test' would be appreciated.

Thanx !
Comment 8 aarcos.wiki 2013-12-19 00:34:08 UTC
Hi again folks,

  I followed some of your recommendations and decided to use a more lightweight library. One suggestion was:

  https://code.google.com/p/python-wikitools/

  This looked very lightweight indeed, so I went for another round
of coding using this library. You can find the code under:

  https://gerrit.wikimedia.org/r/#/c/102603/

  This is a standalone script with just a few dependencies (wikitools, poster).
I guess the idea is that this script is run as part of the sanity checks just before deployment. It can also be run periodically on any of the staging environments.

  Let me know your comments and advice on how to hook it in at the right
places. Thanx !
Comment 9 Chris McMahon 2013-12-19 21:18:26 UTC
I like this.  I was able to 'pip install' poster and wikitools easily, and assert() yields a message on failure like: 


$ python upload-wizard_tests.py
Traceback (most recent call last):
  File "upload-wizard_tests.py", line 77, in <module>
    main()
  File "upload-wizard_tests.py", line 63, in main
    assert result == "xSuccess"
AssertionError



I am looking at http://commons.wikimedia.org/wiki/Special:RecentChanges and not seeing contributions from user mw_test_uw_1

I think this is doing what we want; could it be run from Jenkins in such a way as to notify people upon encountering an AssertionError?
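One cheap way to make the failure Jenkins-friendly is to replace the bare assert with a check that exits nonzero with a readable message; any nonzero exit marks the build failed, and the console log then says what went wrong. A sketch (the wrapper name is made up):

```python
import sys

def check(result, expected="Success"):
    """Exit nonzero with a readable message when the API result is wrong;
    returns None when everything is fine."""
    if result != expected:
        sys.exit("UploadWizard smoke test FAILED: got %r, expected %r"
                 % (result, expected))
```

For example, `check(response["upload"]["result"])` would replace the `assert result == "Success"` line in the script above.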
Comment 10 Bawolff (Brian Wolff) 2013-12-19 21:36:21 UTC
> I am looking at http://commons.wikimedia.org/wiki/Special:RecentChanges and
> not seeing contributions from user mw_test_uw_1

I do not think tests that create data entries should be run on the real wiki.
Comment 11 Bawolff (Brian Wolff) 2013-12-19 21:41:44 UTC
(In reply to comment #10)
> > I am looking at http://commons.wikimedia.org/wiki/Special:RecentChanges
> > and not seeing contributions from user mw_test_uw_1
>
> I do not think tests that create data entries should be run on the real wiki.

And at the very least you should document what you are doing on that user's talk page.
Comment 12 Tisza Gergő 2013-12-19 21:51:47 UTC
(In reply to comment #10)
> I do not think tests that create data entries should be run on the real wiki.

That step is not really functional anyway; even if the upload was not successful, the test will succeed, because the image hasn't been removed after the previous test. Also, automated tests tend to break things in unexpected ways, because they do stuff many more times than real users would. (For example, not sure if we are prepared to handle images which have tens of thousands of versions because an automated test uploads a new one every few minutes. Deleting pages with very high revision counts used to have catastrophic results, it is probably possible to trigger something similar with images as well.)

Given the subtle differences between live and beta, running the full smoke test on live would be useful... but the costs probably outweigh the benefits.
Comment 13 Chris McMahon 2013-12-19 21:55:51 UTC
Note that Commons has a "Test images" category which this test should use if it loads a real file.
Comment 14 aarcos.wiki 2013-12-20 01:25:10 UTC
Thanx for your comments; my replies are below:

> I do not think tests that create data entries should be run on the real wiki.

How do we test that the functionality is working in the real wiki then? In my experience it is OK to run integration tests and end-to-end tests against production environments.

> And at the very least you should document what you are doing on that
> user's talk page.

This I can do, ;-).

> That step is not really functional anyway; even if the upload was not
> successful, the test will succeed, because the image hasn't been removed
> after the previous test.

Not true, the image is uploaded and a new revision is created, see:

  https://commons.wikimedia.org/wiki/File:Test-image-rosa-mx-15x15.png

  In fact, that's why I had to add "ignorewarnings=true" in the second API call;
without it, the upload operation is not even attempted because the file already exists.

> Also, automated tests tend to break things in unexpected ways, because
> they do stuff many more times than real users would.

  I am willing to change this and create a new image every time the test
is run, instead of generating a new version on every run. Still, I wouldn't blame the tests for this but the implementation.

> Given the subtle differences between live and beta, running the full smoke
> test on live would be useful... but the costs probably outweigh the
> benefits.

Disagree: having an automatic way of detecting that important functionality
is broken in production outweighs many costs, and I don't see that many costs in this case. The image is 215 bytes; if having many versions is an issue, then I can create a new image on every run.

> note that Commons has a Category "Test images" which this test should use
> if it loads a real file.

I manually added this category to the wiki page but it can be done as part of the script.
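Doing it from the script could be as simple as appending the category to the initial page text sent with the publish call (action=upload accepts a `text` parameter for the file page's initial wikitext). A sketch; the helper function is hypothetical:

```python
def page_text(description, categories=("Test images",)):
    """Build the initial wikitext for the file page, appending the test
    categories so patrollers can filter the uploads out."""
    lines = [description]
    lines += ["[[Category:%s]]" % c for c in categories]
    return "\n".join(lines)
```

The script would then pass `text=page_text("Smoke test upload")` along with the other publish fields.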
Comment 15 Gerrit Notification Bot 2013-12-20 09:43:48 UTC
Change 102603 had a related patch set uploaded by Hashar:
Smoke tests for Upload Wizard using the API.

https://gerrit.wikimedia.org/r/102603
Comment 16 Tisza Gergő 2013-12-20 16:25:20 UTC
(In reply to comment #14)
> > That step is not really functional anyway; even if the upload was not successful, the test will succeed, because the image hasn't been removed after
> > the previous test.
> 
> Not true, the image is uploaded and a new revision is created, see:
> 
>   https://commons.wikimedia.org/wiki/File:Test-image-rosa-mx-15x15.png

What I mean is that if the new revision is not created, the assert will succeed anyway, because the old revision has the same content. (True, the API returning success but not uploading the file is a lot less likely than simply erroring out, so the test would still catch most upload problems.)

> I am willing to change this and create a new image every time the test
> is run, instead of generating a new version on every run. Still, I wouldn't
> blame the tests for this but the implementation.

Creating new images would mean that eventually a non-trivial fraction of Commons images are test images. That would be even more disruptive. People who patrol recent changes, list of new files etc. would probably complain.

> Disagree, having an automatic way of testing that important functionality
> is broken in production outweighs many costs. I don't see that many costs in
> this case. The image is 215 bytes, if many versions is an issue, then I can
> create a new image on every run.

I guess the main issue is that it will show up in the various change lists (less often if you get a bot flag for the test user). Having many revisions might be a problem as well (for articles it used to be; the English Wikipedia was borked several times by someone trying to delete a page with a long history), but that should not be hard to prevent, as long as we do not forget about it.
Comment 17 Gerrit Notification Bot 2013-12-20 17:00:48 UTC
Change 102603 merged by jenkins-bot:
Smoke tests for Upload Wizard using the API.

https://gerrit.wikimedia.org/r/102603
Comment 18 aarcos.wiki 2013-12-20 17:48:22 UTC
  Hi again folks, it seems the script has been merged into HEAD. The idea is for these tests to be run as a precondition for a deployment in any of our environments (alpha, beta, prod). The tests can also be run continuously against prod to check the sanity of the UW functionality.

  Jeff tells me that Hashar is taking care of hooking the script in at the right places. Let me know if there are any questions and/or requests that may still be needed for the integration to happen.

Thanx again for the comments ! Here are the replies:

> What I mean is that if the new revision is not created, the assert will
> succeed anyway, because the old revision has the same content. (True, the
> API returning success but not uploading the file is a lot less likely than
> simply erroring out, so the test would still catch most upload problems.)

  Yeap ! That's what I thought. Remember, these are smoke tests, and what I am
doing is not exactly the same as what the UW does. I am just testing the very basics.

> Creating new images would mean that eventually a non-trivial fraction of
> Commons images are test images. That would be even more disruptive. People
> who patrol recent changes, list of new files etc. would probably complain.

  I can create them with the "Test images" label and I would assume they would be ignored. Also, keep in mind that these tests are not supposed to be run as frequently as unit tests. These tests only make sense to be run against a new deployment candidate. Anyways, let's start running them, and if there are any issues, I am sure we can address them.
Comment 19 Bawolff (Brian Wolff) 2013-12-21 07:38:04 UTC
>   I can create them with the "Test images" label and I would assume they
> would be ignored. Also, keep in mind that these tests are not supposed to
> be run as frequently as unit tests. These tests only make sense to be run
> against a new deployment candidate. Anyways, let's start running them and
> if there are any issues, I am sure we can address them.

Personally I would suggest asking permission instead of begging forgiveness in situations such as these.

------

>I guess the main issue is that it will show up in the various change lists
>(less often if you use get a bot flag for the test user). Having many revisions
>might be a problem as well (for articles it used to be; the English Wikipedia
>was borked several times by someone trying to delete a page with a long
>history), but that should not be hard to prevent, as long as we do not forget
>about it.

There are indeed scaling issues with deleting (or moving) an image with a large number of old versions (which probably come a lot sooner than the 100000 revisions or whatever it was for WP:sandbox). My suggestion would be just not to delete it. There would hopefully not be scaling issues with just uploading a lot of new versions, and if there are, it's something that should be a high priority to resolve, as we do have examples of regularly updated images on Commons that will have large numbers of old versions (at the very least in the hundreds range).

[Slightly off topic]
Could be interesting to have an Icinga-type check that runs, say, every 20 minutes, which uploads a small file to the upload stash on Commons (bonus points for also doing chunked in addition to normal), then downloads it from the upload stash, verifying it is the same file. This would leave no footprints in the wiki, and catch a good portion of the things that can go wrong with the upload pipeline.
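The verification half of such a check might look like this. A sketch only: the stash upload and the Special:UploadStash download are elided, and just the comparison step is shown; the function name is made up.

```python
import hashlib

def round_trip_ok(sent: bytes, fetched: bytes) -> bool:
    """Compare the bytes we stashed with the bytes we fetched back. The
    stash response also reports a sha1 (see the JSON in comment 1), so
    comparing digests catches corruption anywhere in the pipeline."""
    return hashlib.sha1(sent).hexdigest() == hashlib.sha1(fetched).hexdigest()
```

The monitoring script would stash a small file, fetch it back from the Special:UploadStash URL in the response, and alert when round_trip_ok() returns False.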
Comment 20 aarcos.wiki 2013-12-23 23:54:51 UTC
> Could be interesting to have an Icinga-type check that runs, say, every 20
> minutes, which uploads a small file to the upload stash on Commons (bonus
> points for also doing chunked in addition to normal), then downloads it
> from the upload stash, verifying it is the same file. This would leave no
> footprints in the wiki, and catch a good portion of the things that can go
> wrong with the upload pipeline.

This is possible; in fact, this was my first version of the test, but I decided to augment it so it would cover most of the basic functionality triggered by the UW. Another test can be created that covers just this particular functionality.
Comment 21 Andre Klapper 2014-02-17 19:33:08 UTC
aarcos:
https://gerrit.wikimedia.org/r/#/c/102353/ was abandoned and https://gerrit.wikimedia.org/r/#/c/102603/ was merged a while ago - resetting bug status.
Comment 22 Chris McMahon 2014-02-18 16:20:35 UTC
This is running now; marking FIXED.
