Commit Graph

459 Commits (eda8379e8cd07418f51c92b6a0d1f962616acb21)

Author SHA1 Message Date
Aleksa Sarai 5709b4c2f1
kopt: correctly handle CJK character detection for space insertion (#8438)
Previously getTextFromBoxes would just pass the first and last three
bytes of the current and previous words when trying to detect CJK
characters (which shouldn't have spaces inserted).

However, this handling was not correct because CJK characters can be
longer than 3 bytes, and internally BaseUtil.utf8charcode doesn't ensure
that it was only given a single utf8 character (it blindly does the bit
operations on whatever length code you give it).

As a result, before this patch selections in PDF documents would have
lots of spaces stripped because getTextFromBoxes would think that almost
all characters were CJK characters.

Fixes: 6f1b70e5eb ("util.utf8: improve CJK character detection")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
3 years ago
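The failure mode described in 5709b4c2f1 can be illustrated with a minimal Python sketch. It is not the KOReader code: `naive_three_byte_code`, `is_cjk`, and `needs_space` are hypothetical names, the code point ranges are a rough illustrative subset, and the rule "skip the space only when both boundary characters are CJK" is an assumption. The sketch shows how applying 3-byte bit masks to arbitrary bytes can misclassify plain ASCII, and how working on whole decoded characters avoids that.

```python
def naive_three_byte_code(raw: bytes) -> int:
    """Hypothetical decoder that applies 3-byte UTF-8 bit masks without validation."""
    b0, b1, b2 = raw[0], raw[1], raw[2]
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def is_cjk(codepoint: int) -> bool:
    """Rough CJK test over a few Unicode blocks (illustrative, not exhaustive)."""
    return (
        0x3040 <= codepoint <= 0x30FF        # Hiragana and Katakana
        or 0x3400 <= codepoint <= 0x4DBF     # CJK Unified Ideographs Extension A
        or 0x4E00 <= codepoint <= 0x9FFF     # CJK Unified Ideographs
        or 0xAC00 <= codepoint <= 0xD7AF     # Hangul Syllables
        or 0x20000 <= codepoint <= 0x2A6DF   # Extension B (4 bytes in UTF-8)
    )


def needs_space(prev_word: str, curr_word: str) -> bool:
    """Assumed rule: skip the space only when both boundary characters are CJK."""
    if not prev_word or not curr_word:
        return False
    # Index whole characters of decoded strings, never fixed-size byte slices.
    return not (is_cjk(ord(prev_word[-1])) and is_cjk(ord(curr_word[0])))
```

With these definitions, the last three bytes of an ASCII word such as `b"llo"` pass through the naive decoder and land in the Hangul range, which is the kind of false positive that would strip spaces between Latin words.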
NiLuJe 3483238546
Fix reflow calls for DjVu documents (#8379)
The second argument is a ddjvu_render_mode_t
Try to actually honor the user settings instead of enforcing COLOR
while we're there.

Fix #8376
Regression since #8250
3 years ago
Aleksa Sarai 3ffb4c1692 kopt: add fallbacks for cases where kctx is not in cache
There were a handful of cases where if there was no cached kctx there
was no fallback and several KoptInterface methods would return nil,
causing issues in various parts of KOReader (this happened with the
migration to selected_text everywhere but it's unclear how that change
caused this regression).

In any case, from a correctness perspective it makes sense to have the
corresponding fallback paths to create a new kctx if we couldn't find a
cached one.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
3 years ago
Aleksa Sarai b21029f1ac credocument: update getTextFromXPointers wrapper to support selections
With the latest koreader-base update, we can now create native
selections using getTextFromXPointers. In order to make the wrapper less
annoying to use, always enable segmented selection if selections are
enabled (to match getTextFromPositions).

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
3 years ago
Aleksa Sarai a29d24f86d geom: supplement :combine with more generic .boundingBox
It is a bit cleaner to do all of the necessary looping over lists of
Geoms within a straightforward Geom.boundingBox function rather than
looping over :combine every time (or reimplementing :combine in some
cases). Geom:combine can be trivially reimplemented in terms of
Geom.boundingBox as well.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
3 years ago
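The relationship between a generic bounding-box helper and a pairwise combine, as described in a29d24f86d, might look like the following Python sketch. `Rect`, `bounding_box`, and `combine` are hypothetical stand-ins for the Lua Geom API, and returning `None` for an empty list is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float


def bounding_box(rects):
    """Return the smallest Rect covering every Rect in the list, or None if empty."""
    rects = list(rects)
    if not rects:
        return None
    x0 = min(r.x for r in rects)
    y0 = min(r.y for r in rects)
    x1 = max(r.x + r.w for r in rects)
    y1 = max(r.y + r.h for r in rects)
    return Rect(x0, y0, x1 - x0, y1 - y0)


def combine(a, b):
    """Pairwise combine expressed in terms of the generic helper."""
    return bounding_box([a, b])
```

Callers that previously looped over a pairwise combine can pass the whole list to the helper in one call.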
yparitcher 888802f618 kopt: allow pdf auto straighten 3 years ago
hius07 456dfeaf8e
Fix segfault on exit after opening fb2.zip (#8232) 3 years ago
NiLuJe 48da545e32 Kobo/Elipsa: More fine-grained control over the amount of online CPU
cores

* Only keep a single core online most of the time.
* Device: Add an enableCPUCores method to allow controlling the amount of
  online CPU cores.
* Move the initial core onlining setup to Kobo:init, instead of the startup script.
* Enable two CPU cores while hinting new (e.g., cache miss) pages in PDF land.
* Enable two CPU cores while processing book metadata.
* Drive-by fix to isolate the DocCache pressure check to KoptInterface
  and actually apply it when it matters most (e.g., k2pdfopt stuff).
3 years ago
NiLuJe 3955f83019 DocCache: Only compute cache size once
Minor refinement to #8198
3 years ago
NiLuJe 18687e4666
DocCache: Allow disabling it (again) (#8198)
* Ensure DocCache will always have at least one slot
Fix #8181
3 years ago
NiLuJe acbf4b7a8c
Document: Round dimensions properly in getPageDimensions (#8170)
* Geom:transformByScale:
  * Apply the right scaling factor to the y axis
  * Round in a more sensible fashion (à la fz_round_rect, since we pretty much exclusively use it in a similar fashion).
* Bump base (https://github.com/koreader/koreader-base/pull/1407)
3 years ago
poire-z e3bac94db1 PDF written highlights: fix boxes, trash cached tiles
TileCacheItem: add created_ts property.
Document: manage a tile_cache_validity_ts and ignore
older cached tiles.
This timestamp is updated when highlights are written
as annotations in, or deleted from, the PDF, so we can
get the most current rendered bitmap from MuPDF and
avoid highlight ghosts on old tiles.
Save this timestamp in doc settings so older tiles cached to
disk will also be ignored across re-openings.
Bump base for: mupdf.lua: update frontend pboxes with
MuPDF adjusted ones.
3 years ago
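The timestamp-based invalidation described in e3bac94db1 can be sketched in Python as follows. This is an illustration under assumptions, not the actual implementation: the class layout, `get_tile`, and `on_highlights_written` are hypothetical names, and a simple counter stands in for a real clock so that the example is deterministic.

```python
import itertools

# Deterministic stand-in for a real timestamp source (assumption for the sketch).
_clock = itertools.count(1)


class Tile:
    def __init__(self, created_ts, bitmap):
        self.created_ts = created_ts
        self.bitmap = bitmap


class Document:
    def __init__(self, tile_cache_validity_ts=0):
        # The validity timestamp could be restored from saved document settings.
        self.tile_cache_validity_ts = tile_cache_validity_ts
        self.tile_cache = {}
        self.render_count = 0

    def _render(self, page):
        self.render_count += 1
        return Tile(next(_clock), f"page {page}, render {self.render_count}")

    def get_tile(self, page):
        tile = self.tile_cache.get(page)
        # Ignore tiles created before the last annotation write or delete.
        if tile is None or tile.created_ts < self.tile_cache_validity_ts:
            tile = self._render(page)
            self.tile_cache[page] = tile
        return tile

    def on_highlights_written(self):
        # Every tile created before this point is now considered stale.
        self.tile_cache_validity_ts = next(_clock)
```

Stale tiles are never deleted eagerly in this sketch; they are skipped and replaced the next time the page is requested.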
poire-z eeb09d2150 PDF text selection: fix/tweak spacing between words/boxes
We may get multiple boxes when selecting texts, one for each
word, and we have to add spaces between the extracted words
ourselves. Previously, we were only adding a space if the
last char of the previous word was ASCII, so spaces were missing
after accented characters or Greek words.
Try to do better by measuring the distances between boxes
and comparing to box heights, with a few heuristics.
3 years ago
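A geometry-based spacing heuristic of the kind described in eeb09d2150 might be sketched in Python like this. The `Box` type, `join_words`, and the `gap_ratio` threshold are hypothetical; the actual heuristics and constants used in KOReader are not given in the commit message.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def height(self) -> float:
        return self.y1 - self.y0


def join_words(words, gap_ratio=0.15):
    """Join (text, Box) pairs, inserting spaces based on box geometry.

    gap_ratio is an arbitrary illustrative threshold, not a KOReader value.
    """
    out = []
    prev = None
    for text, box in words:
        if prev is not None:
            # Boxes share a line when their vertical extents overlap.
            same_line = box.y0 < prev.y1 and prev.y0 < box.y1
            gap = box.x0 - prev.x1
            threshold = gap_ratio * min(box.height, prev.height)
            # A line break or a wide enough horizontal gap gets a space.
            if not same_line or gap > threshold:
                out.append(" ")
        out.append(text)
        prev = box
    return "".join(out)
```

Comparing the gap to the box height makes the test independent of font size, which a fixed pixel threshold would not be.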
zwim ab6867c8fa
FileManager: allow case sensitive file search (#7956)
Bump base for cre.cpp cleanup and utf8proc FFI.
Add a checkbutton for case sensitive search in FileBrowser,
and use Utf8Proc.lowercase() for case insensitive search.
Also use it in ReaderUserHyph as a replacement for
crengine getLowercasedWord().
3 years ago
zwim 4d9d599a6a
CRe: fix issues with case sensitive and regex search (#7947)
Fix crash with previous commit.
Show regex checkbox only with cre documents.
3 years ago
zwim 826a765705
CRe: support for case sensitive and regex search (#7883)
- bump crengine: findText(): add support for regular
  expression search.
- bump base: add thirdparty/srell/srell.hpp, a C++ library
  that provides Unicode regex support, used by crengine.
- ReaderSearch: with credocuments, add checkboxes for case
  sensitive and regular expression search.
3 years ago
patart 246b402d9c
Add another mimetype alias for FB2 files for OPDS (#7932)
I've encountered an issue when Calibre Content Server's OPDS feed produced ``text/fb2-xml`` mimetype. Don't know if it is actually Calibre to blame, but thought this simple fix will save some poor souls' time.
3 years ago
NiLuJe 62fd154629 DocCache: Log the effective cache size 3 years ago
NiLuJe e4a333a980 KOptInterface: Keep returning nil in get*Boxes when we don't actually
get any boxes

Exposed by #7624, but we were arguably putting garbage in the Cache
before that anyway, so, it wasn't all that great either ;p.

Fix #7850
3 years ago
zwim 594b4c9035
Add option for custom hyphenation rules (#7787)
This is the successor of #7746.
3 years ago
Frans de Jonge 039947886f
Revert "Hyphenation: add custom hyphenation rules (#7746)" (#7785)
This reverts commit f25da5d0d5.
3 years ago
zwim f25da5d0d5
Hyphenation: add custom hyphenation rules (#7746)
The hyphenation of a word can be changed from its default
by long pressing for 3 seconds and selecting 'Hyphenate'.
These overrides are stored in a per-language file, e.g.:
koreader/settings/user-German.hyph.
3 years ago
NiLuJe 2c4cbd12a2 DocumentRegistry: Downgrade refcount warnings to debug logging.
It can happen in perfectly sane contexts.

CReDocument: Don't destroy internal engine data when Document just
decreased the refcount (as opposed to actually tore down the document
userdata if it were the last ref).

PdfDocument: Only write edited documents if the Doc instance was torn
down.

PicDocument: Silence some DocumentRegistry related warnings
3 years ago
NiLuJe 94f708b53b BookInfoManager: Actually close the document after extraction
DocumentRegistry just decreases a ref, it doesn't close anything.

Plug the same Document leak in a few other places, and document this.
3 years ago
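The distinction drawn in 2c4cbd12a2 and 94f708b53b, between merely dropping a reference and actually tearing a document down, can be sketched in Python. All names here (`open_document`, `release`, `torn_down`) are hypothetical, and the boolean return value from `close` is an assumption made for illustration.

```python
class DocumentRegistry:
    def __init__(self):
        self._entries = {}  # path -> [document, reference count]

    def open_document(self, path):
        entry = self._entries.get(path)
        if entry is None:
            entry = [Document(self, path), 0]
            self._entries[path] = entry
        entry[1] += 1
        return entry[0]

    def release(self, path):
        """Only decrease the reference count; never close anything here."""
        entry = self._entries.get(path)
        if entry is None:
            return 0
        entry[1] -= 1
        if entry[1] <= 0:
            del self._entries[path]
            return 0
        return entry[1]


class Document:
    def __init__(self, registry, path):
        self.registry = registry
        self.path = path
        self.torn_down = False

    def close(self):
        """Return True only if this call dropped the last reference."""
        if self.registry.release(self.path) > 0:
            return False
        # Last reference gone: free engine data, write pending edits, and so on.
        self.torn_down = True
        return True
```

A caller that must act only on real teardown, such as writing an edited file, can branch on the value returned by `close`.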
NiLuJe 1ffbd8760d KOPTInterface: Minor optimization when hashing the configurable status
Use a table & table.concat instead of individual concats.
And then use that same table for every hash-related operation.

(Nothing else uses the configurable hash function, otherwise I'd have
limited the table shenanigans to the function itself).
3 years ago
NiLuJe 2635593890 Cache: Some more tweaks after #7624
* Allow doing away with CacheItem
  Now that we have working FFI finalizers on BBs, it's mostly useless overhead.
  We only keep it for DocCache, because it's slightly larger, and memory pressure might put us in a do or die situation where waiting for the GC might mean an OOM kill.
* Expose LRU's slot-only mode
  And use it for CatalogCache, which doesn't care about storage space
* Make GlyphCache slots only (storage space is insignificant here, it was
  always going to be evicted by running out of slots).
* More informative warning when we chop the cache in half
3 years ago
NiLuJe 05806abeaa CreDocument Call Cache: Minor modernization tweaks
* Neuter timekeeping when statistics are disabled
  Saves a few syscalls ;).
* Port to ffi/lru
  Only a tiny bit of it actually requires any sort of LRU logic, so it's fairly painless.
* Release the cache on close
* Use string.buffer to serialize function arguments
  Ought to be faster than the custom approach ;).
  (Still requires wrapping them in a table, though).
  It's much less human-readable, but then again, this doesn't need to be :).
3 years ago
NiLuJe 06a273b48d Port ffiUtil.getTimestamp users to TimeVal:now()
They were all using it to compute durations,
something which is going to be more sensible
from a monotonic clock source.
3 years ago
NiLuJe 21b067792d Cache: Rewrite based on lua-lru
Ought to be faster than our naive array-based approach.
Especially for the glyph cache, which has a solid amount of elements,
and is mostly cache hits.
(There are few things worse for performance in Lua than
table.remove @ !tail and table.insert @ !tail, which this was full of :/).

DocCache: New module that's now an actual Cache instance instead of a
weird hack. Replaces "Cache" (the instance) as used across Document &
co.
Only Cache instance with on-disk persistence.

ImageCache: Update to new Cache.

GlyphCache: Update to new Cache.
Also, actually free glyph bbs on eviction.
3 years ago
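The idea behind 21b067792d, replacing array shuffling with a structure that moves and evicts entries in constant time, can be sketched in Python using `collections.OrderedDict`. This is not lua-lru or the KOReader Cache module: `LRUCache`, its `slots` parameter, and the `on_evict` callback (standing in for freeing a glyph buffer on eviction) are hypothetical names.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, slots, on_evict=None):
        if slots < 1:
            raise ValueError("slots must be at least 1")
        self.slots = slots
        self.on_evict = on_evict
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        # Mark as most recently used without shifting other entries.
        self._items.move_to_end(key)
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        while len(self._items) > self.slots:
            # Evict the least recently used entry from the front.
            old_key, old_value = self._items.popitem(last=False)
            if self.on_evict is not None:
                self.on_evict(old_key, old_value)
```

A cache hit only relinks one entry, whereas removing from and inserting into the middle of an array shifts every element after it.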
NiLuJe ce624be8b8 Cache: Fix a whole lot of things.
* Minor updates to the min & max cache sizes (16 & 64MB). Mostly to satisfy my power-of-two OCD.
  * Purge broken on-disk cache files
  * Optimize free RAM computations
  * Start dropping LRU items when running low on memory before pre-rendering (hinting) pages in non-reflowable documents.
  * Make serialize dump the most recently *displayed* page, as the actual MRU item is the most recently *hinted* page, not the current one.
  * Use more accurate item size estimations across the whole codebase.

TileCacheItem:

  * Drop lua-serialize in favor of Persist.

KoptInterface:

  * Drop lua-serialize in favor of Persist.
  * Make KOPTContext caching actually work by ensuring its hash is stable.
3 years ago
poire-z 9ef435c97a
bump crengine: more granular font weights (#7616)
Includes:
- MathML: a few minor fixes
- (Upstream) lvtext: fix possible index out of range
- Fonts: RegisterExternalFont() should take a documentId
- Fonts: fix: letter-spacing should not be applied on diacritic
- (Upstream) Fonts: more granular synthetic weights
- Fonts: synthesized weights: tweak some comments
- Fonts: keep hinting with synthetic weight
- Fonts: fix synthesized weight inconsistencies
- Fonts: fix getFontFileNameAndFaceIndex()
- Fonts: adds LVFontMan::RegularizeRegisteredFontsWeights()
- Fonts: handle synth_weight tweaks in glyph/glyphinfo slots
- (Upstream) Fonts: fix some compiler warnings
- Fix hyphenation on Armenian and Georgian text

Update the bottom menu widget "Font Weight" to allow more
granular weights than the previous "regular | bold".

Also bump thirdparty/luasec to v1.0.1.
3 years ago
Martín Fernández 53234fcdc1
add hasSystemFonts device property (#7535)
Add system + user paths to the ReMarkable (has normal linux paths)
3 years ago
poire-z b9ffc3d05b
bump crengine: add support for MathML (#7465)
Includes (among others):
- LVImg: Tweak JPEG decoding some more
- toStringV2(): fix (again) when target node is a boxing node
- LVFontCache::find(): give more weight to first fonts in list
- Page splitting: more accurate rendering progress
- getRenderedWidths(): fix nowrap around image/inlineBoxes
- Tables rendering: tweak column widths algorithm
- CSS: parse/handle "currentcolor", default for border-color
- CSS: add units 'ch' (just like 'ex')
- SVG images: proper alpha blending
- MathML: add parsing and rendering support files
- MathML: plug MathML code into crengine core
- MathML: <epub:switch/case/default>: accept MathML
- (Upstream) Make crengine.font.fallback.faces plural
- (Upstream) Option to not limit font size to a set
- Text: don't adjust space after consecutive initial marks/dashes
- Update German hyphenation patterns
3 years ago
NiLuJe f3341d9dc0
PdfDocument: Unbreak highlights (#7457)
Regression since #7411
Fix #7456
3 years ago
Toromtomtom 3706196bfe
Update PDF annotations when changing bookmark text (#7411) 3 years ago
NiLuJe bf6c0cdd6c
LuaSettings: Add a method to initialize a setting properly (#7371)
* LuaSettings/DocSettings: Updated readSetting API to allow proper initialization to default.
Use it to initialize tables, e.g., fixing corner-cases in readerFooter that could prevent settings from being saved.
(Fixes an issue reported on Gitter).
* LuaSettings/DocSettings: Add simpler API than the flip* ones to toggle boolean settings.
* Update LuaSettings/DocSettings usage throughout the codebase to use the dedicated boolean methods where appropriate, and clean up some of the more mind-bending uses.
* FileChooser: Implement an extended default exclusion list (fix #2360)
* ScreenSaver: Refactor to avoid the pile of kludges this was threatening to become. Code should be easier to follow and use, and fallbacks now behave as expected (fix #4418).
3 years ago
Frans de Jonge ac668ecb64
Add a few more mimetypes for OPDS (#7258)
Doesn't include application/zip as CBZ, but it will be downloaded (as ZIP).

Doesn't include CBR since that's not supported.

Closes #7218, closes #5997.
3 years ago
zwim 3118d0dba0 Refresh AltStatusBar once a minute, if there are changes 3 years ago
poire-z 05126b94b6 Dual pages: shown as 2 columns on a single page
Rework Dual pages code so that the view is considered
a single page number, so it looks more like 2-columns
on a single page.
This solves a few issues like:
- Page number and count are consistent between top
  and bottom status bars
- SkimTo -1/+1 doing nothing every other tap
- Statistics being wrong (like "Pages read" never
  going over half of the book page count)
3 years ago
poire-z 7779e2d8e7 CRe: use getDocumentRenderingHash() to detect rendering changes
Instead of just relying on document full height
and number of pages.
3 years ago
poire-z 8ff50a9e24 CreDocument: disable crengine image scaling options
Since their handling in crengine has been re-enabled.
3 years ago
Frans de Jonge 1ef6d0b257
[feat] Support mimetypes in DocumentRegistry:hasProvider() (#7155)
And make .djvu the canonical extension for DjVu.

Fixes #5478.
3 years ago
poire-z 396d1fbf46
bump crengine: parsing, lists, 2-pages mode fixes & tweaks (#7138)
Includes:
- EPUB: fix truncated HEAD>STYLE stylesheet
- XML parsing: slightly better parsing of <script>
- Update German hyphenation patterns
- (chore) Silence some clang warnings
- (Upstream) CSS content: fix regression with open-quote/close-quote
- (Upstream) HTML lists: support the 'reversed' attribute
- (Upstream) Tweak list items disc/circle/square symbols
- 2-pages mode: option to skip geometry checks

CRE bottom menu: allow toggling Dual Pages in portrait mode.
3 years ago
poire-z dd74194e0a cre.getWordFromPosition(): fix a few issues
Drop the use of crengine's getWordFromPosition() which
is a bit unreliable: it may return wrong coordinates,
or words from far away in the book (e.g., when holding
in the margins).
Rely only on the robust getTextFromPositions() that
we already use for multi words selection.
Having good coordinates allows refreshing a smaller region
(the highlighted word, or the 2 lines if hyphenated).
3 years ago
NiLuJe d80d6dc562 Handle the BlitBuffer struct changes
* stride is now a size_t
  On some platforms, that's 64 bits, which means it's no longer
  automatically converted to a Lua number to avoid precision loss.
  Do that ourselves, because lua-serialize doesn't know how to handle an
  uint64_t cdata ;).
3 years ago
jperon 8b7d60299f
JPG/PNG: MuPDF as default provider (#6931)
As said in #6929
4 years ago
jperon 8eeb010dc9
Paged documents: rework zoom options (#6885)
- Move zoom options from top menu to bottom config
- Add option to manually define zoom (relative to
  page width) and overlap (in percent)
- Add options to zoom to columns or rows, possibly
  with overlap. Add panning direction options when
  paging forward in these modes
4 years ago
Jellby 5e3c554dd7 Hide non-linear fragments
Add option to hide (skip) non-linear fragments, only working
in 1-page mode. Tweaks mostly to footer, toc and skim code
to make it clear(er) which pages belong to linear or non-linear
fragments.
4 years ago
Jellby f892d4559f Fix typos 4 years ago
yparitcher edec69ac8b
[CRe] Tweak nightmode and CRe call cache interaction (#6859)
Simplify 4caf8f28 (#6854), allowing us not to track
nightmode in two places.
4 years ago