Commit Graph

1004 Commits (8fef62d46ee7b3fe708879c57ea0c8db8a7b6afb)
 

Author SHA1 Message Date
Federico Leva 8fef62d46e Implement continuation for --xmlrevisions with prop=revisions in MW 1.19 4 years ago
Federico Leva 8b58599645 Merge branch 'xmlrevisions' of github.com:nemobis/wikiteam into xmlrevisions 4 years ago
Federico Leva 17283113dd Wikia: make getXMLHeader() check more lenient
Otherwise we end up using Special:Export even though the export API
would work perfectly well with --xmlrevisions.

For some reason using the general requests session always got an empty
response from the Wikia API.

May also fix images on fandom.com:
https://github.com/WikiTeam/wikiteam/issues/330
4 years ago
Federico Leva 2c21eadf7c Wikia: make getXMLHeader() check more lenient
Otherwise we end up using Special:Export even though the export API
would work perfectly well with --xmlrevisions.

May also fix images on fandom.com:
https://github.com/WikiTeam/wikiteam/issues/330
4 years ago
Federico Leva 131e19979c Use mwclient generator for allpages
Tested with MediaWiki 1.31 and 1.19.
4 years ago
Federico Leva faf0e31b4e Don't set apfrom in initial allpages request, use suggested continuation
Helps with recent MediaWiki versions like 1.31 where variants of "!" can
give a bad title error and the continuation wants apcontinue anyway.
4 years ago
Federico Leva 49017e3f20 Catch HTTP Error 405 and switch from POST to GET for API requests
Seen on http://wiki.ainigma.eu/index.php?title=Hlavn%C3%AD_strana:
HTTPError: HTTP Error 405: Method Not Allowed
4 years ago
Federico Leva 8b5378f991 Fix query prop=revisions continuation in MediaWiki 1.22
This wiki has the old query-continue format but it's not exposed here.
4 years ago
Federico Leva 92da7388b0 Avoid asking allpages API if API not available
So that it doesn't have to iterate among non-existing titles.

Fixes https://github.com/WikiTeam/wikiteam/issues/348
4 years ago
Federico Leva 1645c1d832 More robust XML header fetch for getXMLHeader()
Avoid UnboundLocalError: local variable 'xml' referenced before assignment

If the page exists, its XML export is returned by the API; otherwise only
the header that we were looking for.

Fixes https://github.com/WikiTeam/wikiteam/issues/355
4 years ago
Federico Leva 0b37b39923 Define xml header as empty first so that it can fail gracefully
Fixes https://github.com/WikiTeam/wikiteam/issues/355
4 years ago
Federico Leva becd01b271 Use defined requests.exceptions.ConnectionError
Fixes https://github.com/WikiTeam/wikiteam/issues/356
4 years ago
Federico Leva f0436ee57c Make mwclient respect the provided HTTP/HTTPS scheme
Fixes https://github.com/WikiTeam/wikiteam/issues/358
4 years ago
Federico Leva 9ec6ce42d3 Finish xmlrevisions option for older wikis
* Actually proceed to the next page when no continuation.
* Provide the same output as with the usual per-page export.

Tested on a MediaWiki 1.16 wiki with success.
4 years ago
Federico Leva 0f35d03929 Remove rvlimit=max, fails in MediaWiki 1.16
For instance:
"Exception Caught: Internal error in ApiResult::setElement: Attempting to add element revisions=50, existing value is 500"
https://wiki.rabenthal.net/api.php?action=query&prop=revisions&titles=Hauptseite&rvprop=ids&rvlimit=max
4 years ago
Federico Leva 6b12e20a9d Actually convert the titles query method to mwclient too 4 years ago
Federico Leva f10adb71af Don't try to add revisions if the namespace has none
Traceback (most recent call last):
  File "dumpgenerator.py", line 2362, in <module>

  File "dumpgenerator.py", line 2354, in main
    resumePreviousDump(config=config, other=other)
  File "dumpgenerator.py", line 1921, in createNewDump
    getPageTitles(config=config, session=other['session'])
  File "dumpgenerator.py", line 755, in generateXMLDump
    for xml in getXMLRevisions(config=config, session=session):
  File "dumpgenerator.py", line 861, in getXMLRevisions
    revids.append(str(revision['revid']))
IndexError: list index out of range
4 years ago
Federico Leva 3760501f74 Add a couple comments 4 years ago
Federico Leva 11507e931e Initial switch to mwclient for the xmlrevisions option
* Still maintained and available for python 3 as well.
* Allows raw API requests as we need.
* Does not provide handy generators, we need to do continuation.
* Decides on its own which protocol and exact path to use, fails at it.
* Appears to use POST by default unless asked otherwise, what to do?
4 years ago
Federico Leva 3d04dcbf5c Use GET rather than POST for API requests
* It was just an old trick to get past some barriers which were waived with GET.
* It's not conformant and doesn't play well with some redirects.
* Some recent wikis seem to not like it at all, see also issue #311.
4 years ago
nemobis 0eeb6bfcb0 Upload all relevant wikidump.7z and history.xml.7z
Don't stop at the first 7z file found in the directory listing.
Should be fast enough for most users.

Fixes #326
4 years ago
emijrp 527401560c 2020 4 years ago
emijrp 7b03096ace update wikidot list 5 years ago
emijrp 714c9ea1f7 Merge branch 'master' of https://github.com/WikiTeam/wikiteam 5 years ago
emijrp 6aac36ce57 wikidot wiki list 5 years ago
emijrp 61b0b1b80b Merge branch 'master' of https://github.com/WikiTeam/wikiteam 5 years ago
emijrp 0cd4efb51c better spider for wikidot 5 years ago
emijrp f6c57d59e7 . 5 years ago
emijrp 5fd980c6b7 delay 1 second 5 years ago
emijrp aecee2dc53 Merge branch 'master' of https://github.com/WikiTeam/wikiteam 5 years ago
emijrp 33a93fd76a delay 1 second 5 years ago
emijrp 966df37c54 new url https://www.archiveteam.org/ 5 years ago
emijrp d43d017075 Update README.md 5 years ago
Emilio 080b723334 Update wikiapiary-update-ia-params.py 5 years ago
nemobis be0dcd8e55 Merge pull request #337 from zerote000/master
Wikiapiary update script - Change Internet Archive search string to search using both API URL and Index URL.
5 years ago
Christoffer Popp Nørskov 83f72db6cd Wikiapiary update script - Change Internet Archive search string to search using both API URL and Index URL. 5 years ago
Emilio 287b8b88a3 250,000 wikis 5 years ago
emijrp ffb39afd1e 800 wikidot sites 6 years ago
emijrp 28158f9b04 wikis 6 years ago
emijrp 7c72c27f2a wikidot 6 years ago
emijrp 4e8c92b6d2 Merge branch 'master' of https://github.com/WikiTeam/wikiteam 6 years ago
emijrp 0ebf86caf6 update, 1.8M users, 400K wikis 6 years ago
nemobis bee34f4b1b Merge pull request #319 from TyIsI/patch-1
Updated with vancouver.hackspace.ca -> vanhack.ca domain change
6 years ago
TyIsI 09fac2aeeb Updated with vancouver.hackspace.ca domain change 6 years ago
emijrp 5aac17ea03 update 6 years ago
emijrp 72b67c74f1 randomize saving 6 years ago
emijrp ca672426bb quotes issues in titles 6 years ago
emijrp a69f44caab ignore expired wikis 6 years ago
emijrp a359984932 ++ 6 years ago
emijrp 5525a3cc4a ++ 6 years ago