57 Commits (master)

Author SHA1 Message Date
Federico Leva 54d9d8051e Remove dead Miraheze wikis per checkalive.py
Closes issue #465
11 months ago
Federico Leva c09db669c9 Update checkalive.pl documentation 12 months ago
Federico Leva 1b02cee1d5 Revert "Update miraheze.org list with checkalive.py"
Some 70 % of the removed wikis still return an HTTP 200 although they
may be frozen or closed.

Tested with:

git show | grep ^- | cut -f3 -d/ | sed --regexp-extended 's,(.+),https://\1/wiki/,g' | sort | shuf -n 100 | xargs -I§ -P10 sh -c "curl -Is -w '%{stderr}%{http_code}\n' § > /dev/null" 2>&1 | sort | uniq -c

This reverts commit 0a3dc23f98.
12 months ago
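
The sampling test quoted in the revert message above can be approximated in Python. This is a minimal sketch, not the command the author ran: the names `head_status` and `tally_statuses` are hypothetical, requests run sequentially rather than ten in parallel, and `urllib` follows redirects whereas `curl -I` does not, so the tallies may differ from the shell pipeline's.

```python
import random
from collections import Counter
from typing import Callable, Iterable, Optional
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def head_status(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request and return the status code as a string.

    Returns "000" when no HTTP response is received, mirroring what
    curl prints for %{http_code} on a failed connection.
    """
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return str(resp.status)
    except HTTPError as err:  # 4xx/5xx responses still carry a status code
        return str(err.code)
    except (URLError, OSError):  # DNS failure, refused connection, timeout
        return "000"


def tally_statuses(
    domains: Iterable[str],
    fetch: Callable[[str], str] = head_status,
    sample_size: int = 100,
    seed: Optional[int] = None,
) -> Counter:
    """Pick a random sample of domains and count the status codes returned.

    Each domain is expanded to https://<domain>/wiki/, the same URL shape
    the sed expression in the commit message builds.
    """
    pool = list(domains)
    rng = random.Random(seed)
    sample = rng.sample(pool, min(sample_size, len(pool)))
    return Counter(fetch(f"https://{domain}/wiki/") for domain in sample)
```

Passing a different `fetch` callable makes it possible to try the tally without network access; a count dominated by "200" would correspond to the situation that motivated the revert.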
Federico Leva 0a3dc23f98 Update miraheze.org list with checkalive.py
Addresses issue #465
12 months ago
Federico Leva 40a1f35dae Update miraheze.org list of wikis 12 months ago
Liu d9885e0845 Update shoutwiki-spider to remove duplicates 2 years ago
Liu fcc4080b23 Update neoseeker.com.info instructions 2 years ago
Liu e7f7266550 Update fandom.com spider and remove duplicates 2 years ago
Liu 9c5c55342d Update miraheze.org spider and remove duplicates 2 years ago
Liu 4c970e358d Remove duplicates from wiki-site.com 2 years ago
Liu 74a8e9609f Update wiki-site.com spider and list 2 years ago
Liu ba7fab2e96 Add fandom-spider and update metadata and lists 2 years ago
Liu 49e41ee75d Update neoseeker.com spider and list 2 years ago
Liu 6346fd6553 Update shoutwiki.com spider and list 2 years ago
Liu f93988e9c6 Update fandom.com to HTTPS 2 years ago
Liu 91faa34529 Update shoutwiki.com list 2 years ago
Liu d6fe1d9ff8 Update battlestarwiki.org list 2 years ago
Liu 6f8f160d75 Update fandom.com list 2 years ago
Liu 6b39402ebf Update miraheze.org list 2 years ago
Liu f755153de9 Update neoseeker.com list 2 years ago
Federico Leva 10ee80ca3b Rename wikia list to fandom 2 years ago
RhinosF1 3b28efab80 Update miraheze.org list
Using https://gist.github.com/RhinosF1/18c83dfbfadb84e28ee083628c029b41
4 years ago
Federico Leva 8fb2b44fdb Update list of Wikia wikis with today's list from the API 4 years ago
Federico Leva ed46725a89 Sort list of Wikia wikis again
No change in content.
4 years ago
Federico Leva 7dad9a44cd Give up on Wikia-made dumps
There are fewer than 500 available right now, out of 400k active wikis.
4 years ago
Federico Leva accc7db019 Update list of MediaWikis
* Run checkalive.py on the "originalurl" URLs from existing items in the
  WikiTeam collection on the Internet Archive, minus dead wiki farms.
* Downloaded the list of unarchived wikis from WikiApiary.
4 years ago
Federico Leva aa0b133c1d Minimal update to list of Wikia wikis
* Change API URL to HTTPS and fandom.com.
* New output of the script (403k wikis), changed to wikia.com for diff purposes.
4 years ago
Federico Leva baae839a38 Complete update of the Wikia lists
* Reduce the offset to 100, the new limit for non-bots.
* Continue listing even when we get an empty request because all
  the wikis in a batch have become inactive and are filtered out.
* Print less from curl's requests.
* Automatically write the domain names to the files here.
6 years ago
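
The listing behaviour described in the commit above (batches of 100, no early stop on an empty batch) can be sketched as follows. This is an illustration under assumptions, not the code in wikia.py: `fetch_batch` is a hypothetical callable standing in for the API request, and the `max_offset` upper bound is an assumed way to end the loop, since an empty batch can no longer signal the end of the list.

```python
from typing import Callable, Iterator, List


def iter_active_domains(
    fetch_batch: Callable[[int, int], List[str]],
    max_offset: int,
    batch_size: int = 100,
) -> Iterator[str]:
    """Yield the domains of active wikis, one batch of `batch_size` at a time.

    `fetch_batch(offset, limit)` is assumed to return only the active
    wikis in that window, so it may legitimately return an empty list.
    """
    for offset in range(0, max_offset, batch_size):
        batch = fetch_batch(offset, batch_size)
        if not batch:
            # Every wiki in this window is inactive and was filtered out:
            # keep going instead of treating this as the end of the list.
            continue
        yield from batch
```

Stopping at the first empty batch would silently truncate the list wherever a run of 100 consecutive wikis happens to be inactive, which is the failure the commit message describes avoiding.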
Federico Leva b8909baa3d Update Wikia list with wikia.py 6 years ago
Federico Leva 293da80da9 Add alive MediaWikis from the WikiTeam archive.org collection 6 years ago
Federico Leva 6a34bf65ea Wikia dumps now use 7z, not gz
Note that existence doesn't mean the dump is usable.
6 years ago
emijrp 0e20be9a6e sort 7 years ago
emijrp bbdaf7723b update neoseeker 7 years ago
emijrp fc48c895ae update info 7 years ago
emijrp c7d5f9bb2e update, 2244 wikis 7 years ago
emijrp 75e7628a11 now get ALL wikis, even closed ones 7 years ago
Hydriz a8270a7769 Update Miraheze wiki farm 7 years ago
Hydriz 9fd6df7a3c Scan for closed wikis as well 7 years ago
Hydriz Scholz 9f97e21503 Update Miraheze wiki farm 8 years ago
Alexia E. Smith cb766de5ff Update gamepedia.com wikis.
This is current as of 2016-04-07 and is correct at 1,120 wikis.
8 years ago
emijrp dde7eb90ba wiki.wiki info 9 years ago
emijrp 8048b92029 adding wiki.wiki wikifarm list 9 years ago
emijrp e30cd44384 new wikifarm list of wikis 9 years ago
emijrp d44db951c2 update date 9 years ago
emijrp 64c30f2b50 updating neoseeker list and sorting, +1 new wiki 9 years ago
Southparkfan ebffb99f48 Add Miraheze wiki farm 9 years ago
Hydriz Scholz 1550d3755d Update orain.org wiki list 9 years ago
Federico Leva a1921f0919 Update list of wikia.com unarchived wikis
The list of unarchived wikis was compared to the list of wikis that we
managed to download with dumpgenerator.py:
https://archive.org/details/wikia_dump_20141219
To allow the comparison, the naming format was aligned to the format
used by dumpgenerator.py for 7z files.
9 years ago
Federico Leva ce6fbfee55 Use curl --fail instead and other fixes; add list
Now tested and used to produce the list of some 300k Wikia wikis
which don't yet have a public dump. Will soon be archived.
10 years ago
Federico Leva 7471900e56 It's easier if the list has the actual domains 10 years ago