koreader/frontend
Aleksa Sarai 6f1b70e5eb util.utf8: improve CJK character detection
Previously the CJK character detection defined only characters in the
range U+4000..U+AFFF as "CJK characters". This excludes an incredibly
large number of CJK characters within the BMP, let alone the two whole
planes dedicated to rarer CJK characters (the SIP and TIP). As a result,
a very large number of Chinese, Japanese, and Korean characters were not
detected as being CJK characters.

While slightly less elegant-looking, it is far more accurate to compute
the codepoint from the utf8 character and then see if it falls within
one of the defined CJK blocks. This is not future-proof against CJK
ideograph extensions added in later Unicode versions, but there is no
real way to accurately predict such changes, so this is the best we can
do without accidentally treating characters explicitly defined as
non-CJK in Unicode as CJK.
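
A minimal sketch of the approach described above: decode the utf8
character to a codepoint, then test it against a table of block ranges.
The helper names (utf8_codepoint, is_cjk) and the range table are
hypothetical and only an illustrative subset of Unicode blocks; they are
not the list or the function names used in util.lua. The decoder assumes
well-formed input and does no validation.

```lua
-- Illustrative subset of CJK-related Unicode blocks (assumption: NOT the
-- exact list used by the commit).
local cjk_ranges = {
    { 0x3000, 0x303F },   -- CJK Symbols and Punctuation
    { 0x3040, 0x309F },   -- Hiragana
    { 0x30A0, 0x30FF },   -- Katakana
    { 0x3400, 0x4DBF },   -- CJK Unified Ideographs Extension A
    { 0x4E00, 0x9FFF },   -- CJK Unified Ideographs
    { 0xAC00, 0xD7AF },   -- Hangul Syllables
    { 0x20000, 0x2A6DF }, -- CJK Unified Ideographs Extension B (SIP)
}

-- Hypothetical helper: decode the first utf8 sequence in `c` to a codepoint.
-- Plain arithmetic is used so that it also runs on Lua 5.1 / LuaJIT.
local function utf8_codepoint(c)
    local b1, b2, b3, b4 = c:byte(1, 4)
    if b1 < 0x80 then
        return b1
    elseif b1 < 0xE0 then
        return (b1 - 0xC0) * 0x40 + (b2 - 0x80)
    elseif b1 < 0xF0 then
        return (b1 - 0xE0) * 0x1000 + (b2 - 0x80) * 0x40 + (b3 - 0x80)
    else
        return (b1 - 0xF0) * 0x40000 + (b2 - 0x80) * 0x1000
             + (b3 - 0x80) * 0x40 + (b4 - 0x80)
    end
end

-- Hypothetical helper: true if the character falls in one of the ranges.
local function is_cjk(c)
    local cp = utf8_codepoint(c)
    for _, range in ipairs(cjk_ranges) do
        if cp >= range[1] and cp <= range[2] then
            return true
        end
    end
    return false
end

print(is_cjk("中"), is_cjk("한"), is_cjk("a"))
```

A range table keeps the check easy to extend when a later Unicode
version adds another block, which is the trade-off the message accepts.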

While we're at it, copy Lua 5.3's utf8.charpattern constant definition
so that we can more easily write utf8 iterators with string.gmatch (at
least in the interim until there is a rework of utf8 handling in
KOReader and everything is rebuilt on top of utf8proc).

Some unit tests are added for Korean and Japanese text, and the existing
unit tests needed a minor adjustment to handle the fact that
isSplittable now correctly detects CJK punctuation as a character to
compare against the forbidden split rules.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
3 years ago
apps ReaderSearch: remove stray newline from regex help (#8358) 3 years ago
device Add initial support for Kobo Sage (Cadmus) (#8361) 3 years ago
document kopt: allow pdf auto straighten 3 years ago
ui ProgressWidget: Simplify painting logic. 3 years ago
cache.lua DocCache: Allow disabling it (again) (#8198) 3 years ago
cacheitem.lua Cache: Some more tweaks after #7624 3 years ago
configurable.lua KOPTInterface: Minor optimization when hashing the configurable status 3 years ago
dbg.lua [doc] Documentation stub for Dbg module (#7677) 3 years ago
depgraph.lua Tame a few tests that relied on `pairs` being somewhat deterministic (#6371) 4 years ago
device.lua Truly silence the attempt at loading SDL2 3 years ago
dispatcher.lua Dispatcher: fix horizontal margins (#8344) 3 years ago
docsettings.lua DocSettings/Purge .sdr: reword, don't purge other books (#8348) 3 years ago
dump.lua Order keys in settings.reader.lua (#6868) 4 years ago
fontlist.lua fontlist: disable/enable some Kindle fonts (#8233) 3 years ago
gettext.lua [fix] Always initiate empty context table (#6874) 4 years ago
httpclient.lua build: enforce luacheck in travis build 8 years ago
logger.lua use android log categories 5 years ago
luadata.lua Use fsync() for more robust setting files saving 5 years ago
luasettings.lua LuaSettings: Add a method to initialize a setting properly (#7371) 3 years ago
luxl.lua [fix] Don't break OPDS parsing on HR tags (#5949) 4 years ago
optmath.lua [doc] Add documentation to optmath (#6258) 4 years ago
persist.lua Cache: Fix a whole lot of things. 3 years ago
pluginloader.lua crash.log: write plugin regular information only in debug mode (#8230) 3 years ago
pluginshare.lua Move PluginShare.backgroundJobs into PluginShare module (#3128) 7 years ago
random.lua Faster blitting @ BB8/BBRGB32 when no processing is needed (#4847) 5 years ago
readcollection.lua Minor util & ffi/util cleanups (#6657) 4 years ago
readhistory.lua ReadHistory: nil guard a Document instance access 3 years ago
socketutil.lua Unify LuaSocket usage (#7405) 3 years ago
util.lua util.utf8: improve CJK character detection 3 years ago
version.lua Centralize one time migration code after updates (#7531) 3 years ago