182 Commits (master)

Author SHA1 Message Date
Evan Tseng 63230a307a Bug 1142312 - Add two more types of unlikely candidates: cover-wrap and yom-remote, r=Gijs 8 years ago
andrei-ch 4a0d08c56a font-to-span conversion skips half the font elements on 'real' DOMs 8 years ago
Evan Tseng e84c0c3f07 Bug 1285543 - Only use "og:title" or "twitter:title" if _getArticleTitle does not return a valid title, r=Gijs 8 years ago
Evan Tseng 33dc8fa023 Bug 1255978 - Remove legends candidate, r=Gijs 8 years ago
Evan Tseng af0aa5c59f Bug 1173548 - Find out text direction from ancestors of final candidate, r=Gijs 8 years ago
Evan Tseng 4fa0d1b207 Bug 1177619 - Score div nodes which have br nodes. r=Gijs 8 years ago
Taylor Hunt 71aa562387 Add microformats2 class names to heuristics (#303)
Microformats updated their old `hentry` to [a newer
`h-entry`](http://microformats.org/wiki/h-entry).

With the [number of IndieWeb sites breaking into the
ten-thousands](http://tantek.com/2016/190/b1/state-of-indieweb-summit),
this seems like a fair idea.
8 years ago
Gijs 1a12befa41 Fix code style, tighten up eslint rules (#301) 8 years ago
Ivan Persidsky fd11f92adb Use a dedicated method and backward iteration for removing nodes (#300)
This improves compat with "real" DOMs that provide a live NodeList as the return value of getElementsByTagName.
8 years ago
Gijs Kruitbosch 140d4c4aca Only compute textContent once. 8 years ago
usergit 327bfcb93f exposed textContent to be returned
this returns the text content only, this is useful as it allows the content to be easily accessible
8 years ago
Gijs 69b81f5d70 Fix #287: convert getElementsByTagName result to an array (#288) 8 years ago
Gijs Kruitbosch 46b08a5ea5 Address issue #277 by marking 'modal' unlikely+negative 8 years ago
Peter deHaan b380917b4b Convert nested function declaration to function expression 8 years ago
Gijs Kruitbosch e830ac9dd8 Fix eslint issues identified in m-c 8 years ago
Gijs Kruitbosch dffa760c04 Fix issue #267 by ignoring hash URIs when making URIs absolute 8 years ago
Gijs Kruitbosch a9597efc17 Fix bug 1230050 by checking for the 'hid' class specifically, r?MattN 9 years ago
Gijs a801846a45 Merge pull request #204 from mozilla/tweak-great-grandparent-scoring
Updated great grandparent node scoring.
9 years ago
Nicolas Perriault ae0833522c Improved embedded video elements detection. 9 years ago
Nicolas Perriault 46304bb5fe Updated great grandparent node scoring. 9 years ago
Nicolas Perriault 88ef3893b5 Fixes #180 - Score intermediary headings. 9 years ago
Nicolas Perriault dc1b2c9fa0 Refs #195 - Exclude nodes likely to be related content. 9 years ago
Nicolas Perriault cc18cb5787 Ref #195 - Add support for dailymotion videos. 9 years ago
Nicolas Perriault 9dbc009376 Fixes #113 - Recursive node ancestor scoring. 9 years ago
Nicolas Perriault 44879722b6 Fixes #183 - Preserve list items. 9 years ago
Gijs 79aa2fca87 Merge pull request #189 from mozilla/dont-remove-headings
Fixes #150 - Keep article intermediary headings.
9 years ago
Margaret Leibovic af6da2a87d Merge pull request #190 from mozilla/improved-author-meta-extraction
Improved author metadata detection.
9 years ago
Nicolas Perriault 7aee44adb2 Improved author metadata detection. 9 years ago
Gijs Kruitbosch 5f184053cd Make isProbablyReaderable include <pre>, and deal with long <br>-separated paragraphs and/or shorter-than-5-paragraph text and such. 9 years ago
Nicolas Perriault 2451a07a7d Fixes #150 - Keep article intermediary headings. 9 years ago
Margaret Leibovic 319a50b4f0 Fixes #184 - Don't strip class names from article content 9 years ago
Gijs 49e40768aa Merge pull request #185 from mozilla/score-section-tags-by-default
Fixes #139 #143: Added more weight to section tags.
9 years ago
Nicolas Perriault f6ffa6acde Fixes #139 #143: Added more weight to section tags. 9 years ago
Nicolas Perriault 58cd789cd3 Improved title extraction 'algorithm'. 9 years ago
Gijs b37ff08bc7 Merge pull request #169 from mozilla/clean-footer-tags
Fixes #163 - Avoid including footer tag contents.
9 years ago
Nicolas Perriault 12c6a11f67 Fixes #163 - Avoid including footer tag contents. 9 years ago
Nicolas Perriault 6eeabf90c1 Fixes #164 - Add support for title alt semantic metadata. 9 years ago
Gijs Kruitbosch 0ff82de0f4 Implement createTextNode, do more relaxed escaping there, update testcase. 9 years ago
Margaret Leibovic 37a8cd4171 Bug 1147584 - Don't remove unlikely <a> tags, and replace <a> tags with their text content if they won't be useful links 9 years ago
Gijs a6014f5854 Merge pull request #132 from gijsk/heise-ad-prioritization
Don't look at banners and skyscrapers, remove <noscript> elements
9 years ago
Gijs Kruitbosch a6346a0ad4 Don't look at banners and skyscrapers, remove <noscript> elements 9 years ago
Nicolas Perriault 4424b0bad7 Refs #128 - Add support for options to Readability constructor. r=@gijsk 9 years ago
Nicolas Perriault 4d41f5e4ed Refs #117 - Drop social/share buttons. 9 years ago
Gijs Kruitbosch 7c60dba3b6 Fix Readability.js to work with jsdom's DOM implementation (in particular: no firstElementChild implementation...) 9 years ago
Margaret Leibovic eb3a8e8dc4 Bug 1150695 - Move isProbablyReaderable function to Readability.js 9 years ago
Nicolas Perriault f8d37e4276 Don't remove elements containing figures or having them as a parent. 9 years ago
Nicolas Perriault b6730703a1 Fixes #81 - Keep article images. 9 years ago
Gijs 194a5376c8 Merge pull request #63 from mozilla/preserve-embedded-tweets
Preserve inline tweets as they're part of article contents.
9 years ago
Gijs Kruitbosch b4332328f3 Fix an issue where we don't track scores for the parents appropriately. 9 years ago
Gijs 14b33b69db Merge pull request #65 from mozilla/support-embed-videos
Fixes #56 - Updated support for embedded Youtube & Vimeo videos.
9 years ago
Nicolas Perriault ad52d8ee30 Fixes #53 - Fixed dot-slash relative URI resolution. 9 years ago
Nicolas Perriault 2d5f59f3eb Fixes #56 - Updated support for embedded Youtube & Vimeo videos. 9 years ago
Nicolas Perriault d83763c8a1 Preserve inline tweets as they're part of article contents. 9 years ago
Nicolas Perriault cf3dce6cf2 Refs #58 - Stripped embed tags. 9 years ago
Nicolas Perriault eee224560b Addressed review comments from @Gijsk. 9 years ago
Nicolas Perriault 4f9615cb9a Use forEach when it makes sense. 9 years ago
Gijs Kruitbosch 955951659d Bug 1143725 - fix the Herald Sun website 9 years ago
Gijs Kruitbosch eb81444946 Improve logic to rely on children instead of childNodes 9 years ago
Margaret Leibovic 3c2d93cd09 Improve byline algorithm 9 years ago
Gijs Kruitbosch d94f3158d3 Fix readability.js to do a DOM traversal rather than relying on a wonky DOMCollection, fix trims, fix a potential null access, etc. 9 years ago
Margaret Leibovic fc53e1a315 Set 'name' variable to null in _getExcerpt to avoid old values in future for loop iterations 9 years ago
Margaret Leibovic 2c7c504a36 Merge pull request #32 from gijsk/regex-issues-with-class-and-id-stuff
Fix regex issues. r=margaret
9 years ago
Gijs aec1ce774d Merge pull request #31 from gijsk/testing-generates
Allow generating tests from the web, make testing more closely match Firefox
9 years ago
Gijs Kruitbosch 1c42f29aa5 Create a script to generate testcases, actually use our version of JSDOMParser 9 years ago
Gijs 17062c1ccf Fix video regular expression to support https 9 years ago
Gijs d9f1e884dd Fix regex issues 9 years ago
Margaret Leibovic 98ee8f7463 Merge pull request #27 from gijsk/fix-missing-paragraphs
Bug 1144441 - avoid leaving out paragraphs. r=margaret
9 years ago
Gijs Kruitbosch 1d2df4a70e Bug 1144441 - avoid leaving out paragraphs 9 years ago
Margaret Leibovic a9bd60154d Bug 1144355 - Bail if we don't have a body to parse. r?Gijs 9 years ago
Gijs Kruitbosch d3f84a1e58 Fix class-related logging exception 9 years ago
Gijs Kruitbosch ce0ebe24e0 Improve logging of elements 9 years ago
Margaret Leibovic 03d9e36161 Merge pull request #22 from gijsk/fix-empty-classes
Don't create/leave empty class attributes around all the nodes we're using. r=margaret
9 years ago
Nicolas Perriault 99f338a03a Added logging to test output. 9 years ago
Gijs Kruitbosch b62fd27ba6 Don't create/leave empty class attributes around all the nodes we're using. 9 years ago
Gijs Kruitbosch a563714567 Bug 1127778 - while we're at it, add more logs. 9 years ago
Gijs Kruitbosch 3c277a1701 Bug 1127778 - fix paragraph reordering and add a test for it. 9 years ago
Peter deHaan 78b61ccbcd Convert `const` to `var`
Per https://github.com/mozilla/readability/issues/18#issuecomment-77229549
9 years ago
srlakhe a93aa7d0ad Updated Readability.js 9 years ago
shreyas 8061bf0254 Bug 958735 Function purgeNode moved 9 years ago
Stefan Arentz (Mozilla) 7057e46c4f Fixes #3 Let Readability.parse() also return the uri 9 years ago
Stefan Arentz (Mozilla) 255595cc70 Fixes #1 Replace occurrences of let with var 9 years ago
Tarek Ziade 55587d91ac initial file 9 years ago
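
Note on fd11f92adb and 4a0d08c56a: both messages refer to "real" DOMs returning a live list from getElementsByTagName, and fd11f92adb names backward iteration as the remedy. The sketch below is not the code from those commits. It assumes a hypothetical `makeLiveList` helper that imitates a live collection over a plain array, so it runs without a DOM, and it only illustrates why forward removal can skip every other element while backward removal does not.

```javascript
// Hypothetical stand-in for a live DOM collection (an assumption for
// illustration): its length and items always reflect the current state
// of the backing array, as a live NodeList/HTMLCollection would.
function makeLiveList(backing) {
  return {
    get length() {
      return backing.length;
    },
    item(i) {
      return backing[i];
    },
  };
}

// Forward removal: once index i is removed, the next element slides into
// slot i, and the following i++ steps over it.
function removeForward(backing) {
  const live = makeLiveList(backing);
  for (let i = 0; i < live.length; i++) {
    backing.splice(i, 1);
  }
  return backing;
}

// Backward removal: removing index i only shifts elements at higher
// indices, which have already been visited, so nothing is skipped.
function removeBackward(backing) {
  const live = makeLiveList(backing);
  for (let i = live.length - 1; i >= 0; i--) {
    backing.splice(i, 1);
  }
  return backing;
}

const forwardLeft = removeForward(["a", "b", "c", "d"]);
const backwardLeft = removeBackward(["a", "b", "c", "d"]);

console.log(forwardLeft); // elements the forward loop failed to remove
console.log(backwardLeft); // elements the backward loop failed to remove
```

With four elements, the forward loop leaves half of them behind, which matches the symptom described in 4a0d08c56a; the backward loop empties the list.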