Commit Graph

361 Commits (main)

Author SHA1 Message Date
Aloïs Micard e07ed8156e
scheduler: hash url before caching it 3 years ago
Aloïs Micard cae3bb514f
Merge pull request #128 from creekorful/114-fix-tests
indexer: sort headers to have deterministic output
3 years ago
Aloïs Micard faee8b48c1
indexer: sort headers to have deterministic output 3 years ago
Aloïs Micard 8297dc7616
Merge pull request #127 from creekorful/124-improve-scheduler-speed
scheduler: increase performances
3 years ago
Aloïs Micard 84a28c5be0
scheduler: increase event prefetch 3 years ago
Aloïs Micard afed403e6a
Remove useless regex 3 years ago
Aloïs Micard 4e33813b21
Merge remote-tracking branch 'origin/develop' into 124-improve-scheduler-speed 3 years ago
Aloïs Micard 7820820fa9
scheduler: add batch support for dialing with cache 3 years ago
Aloïs Micard de50ed02e3
Merge pull request #126 from creekorful/125-indexer-bulk-indexation
Indexer: implement bulk indexation
3 years ago
Aloïs Micard 9b46dc205e
Indexer: support buffered indexing 3 years ago
Aloïs Micard 71f82d4aad
process: Rework whole flags system
- Turn the flag into Feature system to allow easier configuration.
- Add prefetch flag to event feature
3 years ago
Aloïs Micard 4075dfc98a
Merge pull request #121 from creekorful/develop
Release 0.10.0
3 years ago
Aloïs Micard 829afcbb6a
Release 0.10.0 3 years ago
Aloïs Micard ec3357be5d
Big improvements
- Reduce debug noise
- Create scripts to blacklist 'famous' legit hostnames
- Make blacklister more resilient
- Merge archiver & indexer together
- Better prefix for cache key
- Rework scheduling process
- Update architecture.png
- Remove trandoshanctl
- Improve testing
3 years ago
Aloïs Micard 2d7499f7e2
Merge pull request #118 from creekorful/106-improve-blacklister
Implement new blacklister
3 years ago
Aloïs Micard 8da1f29a43
little fixes 3 years ago
Aloïs Micard d0dffb9928
Implement new blacklister 3 years ago
Aloïs Micard 6c4ecc1a7d
Merge pull request #117 from creekorful/develop
Release 0.9.0
3 years ago
Aloïs Micard 2133a1aeb5
bump app versions 3 years ago
Aloïs Micard 46a7a05e4a
Merge pull request #116 from creekorful/110-archiver-new-format
Implement new storage format
3 years ago
Aloïs Micard a27092fd13
Use new storage format 3 years ago
Aloïs Micard 1ac5c1e036
Merge remote-tracking branch 'origin/develop' into 110-archiver-new-format 3 years ago
Aloïs Micard 571b1e2628
Merge pull request #115 from creekorful/111-prevent-duplicates-urls
Prevent duplicates urls in crawlingQueue
3 years ago
Aloïs Micard cc3c0d62d6
remove hacky check 3 years ago
Aloïs Micard c8352d3299
Use url cache to determinate if crawling should be done 3 years ago
Aloïs Micard e245e5d79a
last fixes 3 years ago
Aloïs Micard 60a23f7182
Fix ttl 3 years ago
Aloïs Micard 12362e0100
Fix tests case 3 years ago
Aloïs Micard 4a0fbd0b9b
add configapi key prefix 3 years ago
Aloïs Micard 0aba4fa4f9
Finalize redis cache impl 3 years ago
Aloïs Micard 387a93b7b9
Create new flags for cache 3 years ago
Aloïs Micard d826fe73b6
Refactor configapi to use new cache 3 years ago
Aloïs Micard 477092316b
Implement cache logic 3 years ago
Aloïs Micard 55ae36f3b9
s/database/index 3 years ago
Aloïs Micard 87a2fb246f
Add new hostname to blacklist 3 years ago
Aloïs Micard 38a0a36de0
Merge pull request #113 from creekorful/109-pre-declared-mapping
elastic: pre-declare index mapping
3 years ago
Aloïs Micard 2d6beb26ce
elastic: pre-declare index mapping 3 years ago
Aloïs Micard 33ba6b4e7d
Merge pull request #112 from creekorful/101-scheduler-whitelisting
make scheduler use whitelisting instead of blacklisting
3 years ago
Aloïs Micard 15bae2143d
improve test cases 3 years ago
Aloïs Micard 4bddf39335
make scheduler use whitelisting instead of blacklisting 3 years ago
Aloïs Micard d5eb551d82
Merge pull request #104 from creekorful/103-turn-api-into-indexer
Turn api into indexer
3 years ago
Aloïs Micard 188df77541
improve logging 3 years ago
Aloïs Micard 039f8cb76c
update architecture.png 3 years ago
Aloïs Micard 2eb416845e
improve logging 3 years ago
Aloïs Micard c5bd0b3b87
remove old CD stuff 3 years ago
Aloïs Micard ad808e6b31
indexer: do not publish duplicate URLs 3 years ago
Aloïs Micard 797c3df9a5
move api client into appropriate package 3 years ago
Aloïs Micard 4d250b6cb0
Finalize refactoring 3 years ago
Aloïs Micard c42cb26a11
remove extractor 3 years ago
Aloïs Micard a996bf2d5b
Turn API into indexer 3 years ago