Fix whitespace normalization in title metadata

When switching to a newer version of JSDOM, it is more literal
about listing whitespace as part of textContent, including
newlines and not normalizing multiple spaces.

It seems prudent to just always normalize whitespace for titles,
which are guaranteed to be pretty short anyway.
pull/509/head
Gijs Kruitbosch 6 years ago committed by Gijs
parent 3610476663
commit e8bb7f722f

@ -409,7 +409,7 @@ Readability.prototype = {
curTitle = this._getInnerText(hOnes[0]);
}
curTitle = curTitle.trim();
curTitle = curTitle.trim().replace(this.REGEXPS.normalize, " ");
// If we now have 4 words or fewer as our title, and either no
// 'hierarchical' separators (\, /, > or ») were found in the original
// title or we decreased the number of words by more than 1 word, use

@ -1,5 +1,5 @@
{
"title": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy\n eirmod tempor invidunt",
"title": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt",
"byline": "Henri Sivonen",
"excerpt": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy\n eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam\n voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet\n clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit\n amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam\n nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat,\n sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum.\n Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor\n sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed\n diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat,\n sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum.\n Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor\n sit amet.",
"readerable": true,

Loading…
Cancel
Save