Commit Graph

  • c9e087d077 Cleanups Mišo Belica 2013-03-26 23:13:57 +0100
  • e0c87223ae Better log messages while scoring candidates Mišo Belica 2013-03-26 22:03:03 +0100
  • df5cb8c8f6 Added scored nodes into candidates Mišo Belica 2013-03-26 22:02:10 +0100
  • f858f0dbb0 1 pt for 100 inner text chars is computed as float Mišo Belica 2013-03-26 21:34:14 +0100
  • 31b75c1cd8 Updated docstring for 'get_link_density' [ci skip] Mišo Belica 2013-03-26 20:08:28 +0100
  • d054823958 Added simple test for parser of annotated text Mišo Belica 2013-03-26 19:56:37 +0100
  • 05d2230015 Load articles/snippets as binary strings Mišo Belica 2013-03-26 19:55:50 +0100
  • e6191fe0d1 Link density is computed with normalized whitespace Mišo Belica 2013-03-26 19:55:18 +0100
  • 671580ac2c Use groupby to group annotated texts Mišo Belica 2013-03-25 16:32:52 +0100
  • c2a5b74230 Changed representation of annotated text Mišo Belica 2013-03-25 14:26:03 +0100
  • e366721873 Convert <hr> tag into paragraphs Mišo Belica 2013-03-25 13:57:33 +0100
  • e198b94ffb Added string utils for handling whitespace Mišo Belica 2013-03-25 13:41:43 +0100
  • 3449a33d87 Test for changing multiple <br> into <p> Mišo Belica 2013-03-23 17:04:30 +0100
  • 7bd7231e25 Renamed property of 'OriginalDocument': 'html' -> 'dom' Mišo Belica 2013-03-23 17:03:54 +0100
  • 0e748a80a6 Cleaned class 'Article' Mišo Belica 2013-03-23 16:07:42 +0100
  • 530b7d8f22 Drop unlikely candidates as soon as you can Mišo Belica 2013-03-23 16:02:43 +0100
  • 69dd9ef4fd Changed 'readable_annotated_text' -> 'main_text' Mišo Belica 2013-03-23 15:47:14 +0100
  • c47530bfe0 Updated changelog Mišo Belica 2013-03-21 19:53:07 +0100
  • 0df3a95c1e Property of ``Article`` with annotated text Mišo Belica 2013-03-21 19:43:22 +0100
  • 7337e2fb38 Join node with 1 child of the same type Mišo Belica 2013-03-21 19:42:18 +0100
  • ade957cb47 Don't change <div> to <p> if it contains <p> elements Mišo Belica 2013-03-21 19:41:00 +0100
  • 35dd10f546 Better logging messages Mišo Belica 2013-03-21 19:38:54 +0100
  • f5939f4608 Skip unused tests instead of useless passing Mišo Belica 2013-03-21 19:36:04 +0100
  • 6b87ac5e07 Use unicode literals from future, not 'to_string' Mišo Belica 2013-03-19 23:49:07 +0100
  • c9e8e00b92 Refactored class ``OriginalDocument`` Mišo Belica 2013-03-19 23:48:14 +0100
  • eb8a8c5248 Replaced deprecated method 'getiterator' by 'iter' Mišo Belica 2013-03-19 16:06:49 +0100
  • 2159625626 Function 'callable' has returned in Python 3.2 Mišo Belica 2013-03-19 15:33:49 +0100
  • 76832530b4 I don't use Makefile Mišo Belica 2013-03-19 01:28:30 +0100
  • 5abe69d917 Added new test article Mišo Belica 2013-03-19 01:13:46 +0100
  • 5e41280f77 Updated helper for creating an article test Mišo Belica 2013-03-19 00:31:44 +0100
  • 0178cfff5c Added compatibility file with unittest2 import Mišo Belica 2013-03-18 22:01:11 +0100
  • 26fe24789c Made packages from all tests Mišo Belica 2013-03-18 21:45:33 +0100
  • ee483a7f91 Changed location of test HTML files Mišo Belica 2013-03-18 21:40:19 +0100
  • 3b5b2b1522 Renamed to readability Mišo Belica 2013-03-18 21:25:09 +0100
  • cf781bc595 Updated implementation of cached property Mišo Belica 2013-03-17 00:57:28 +0100
  • 4e3227521e Less code - fewer bugs (I hope) Mišo Belica 2013-03-15 01:40:41 +0100
  • 1a5970b238 Better names and positions for variables Mišo Belica 2013-03-15 00:52:56 +0100
  • 930b6ced12 Fixed transformation of leaf <div> into <p> Mišo Belica 2013-03-15 00:48:13 +0100
  • 314c999730 Drop useless tags by HTML cleaner Mišo Belica 2013-03-15 00:23:41 +0100
  • 272fe480a3 Updated setup.py Mišo Belica 2013-03-15 00:10:55 +0100
  • 9eacbd579c Updated LICENSE, AUTHORS, README Mišo Belica 2013-03-15 00:10:41 +0100
  • 18b5c9b447 Refactored file 'scoring.py' Mišo Belica 2013-03-11 23:06:21 +0100
  • dcb7c18fd5 Refactored file 'document.py' Mišo Belica 2013-03-11 22:10:26 +0100
  • 03ff0be266 Moved client script into 'breadability.scripts' Mišo Belica 2013-03-11 21:18:04 +0100
  • c92f61fa53 Fixed docopt version Mišo Belica 2013-03-11 12:43:17 +0100
  • ec88a4efe6 Use docopt as an argument parser Mišo Belica 2013-03-11 12:37:15 +0100
  • 8470ef2b45 Purification of file readable.py Mišo Belica 2013-03-09 13:15:05 +0100
  • b3b987440d Added test runner via nosetests Mišo Belica 2013-03-09 13:05:16 +0100
  • 2e2e906da7 Purification of document.py Mišo Belica 2013-03-09 00:05:49 +0100
  • 9f0fc2d433 Purification Mišo Belica 2013-03-08 23:48:35 +0100
  • baaefeda3c Refactored computing of link density Mišo Belica 2013-03-08 23:23:30 +0100
  • 3f71e1b7d4 Refactored checking of node's attribute Mišo Belica 2013-03-08 23:19:24 +0100
  • 636a38d705 Refactored generating of hash ID Mišo Belica 2013-03-08 23:06:57 +0100
  • 9a613317c0 Make package from tests Mišo Belica 2013-03-08 23:05:14 +0100
  • cc00976533 Replace implementation of 'cached_property' Mišo Belica 2013-03-08 19:29:15 +0100
  • e3b6ee2fd6 Suppress warning "ResourceWarning: unclosed file" Mišo Belica 2013-03-08 17:46:18 +0100
  • c69cd4b2ba Purification Mišo Belica 2013-03-08 17:42:01 +0100
  • 101950478e Simplify logging Mišo Belica 2013-03-08 17:41:39 +0100
  • 81be8ccbfb Updated readme Mišo Belica 2013-03-07 17:48:17 +0100
  • 9f83ea973a Fixed setup.py Mišo Belica 2013-03-07 17:12:14 +0100
  • 726fe59ecd Show build status from master branch [ci skip] Mišo Belica 2013-03-07 17:05:47 +0100
  • c7299b9852 Updated makefile [ci skip] Mišo Belica 2013-03-07 17:01:38 +0100
  • 671d940ded Removed branches from Travis configuration Mišo Belica 2013-03-07 16:57:41 +0100
  • ea90ee5a5e Updated changelog [ci skip] Mišo Belica 2013-03-07 16:52:50 +0100
  • c89010221e Changed/renamed/added AUTHORS, CHANGELOG, LICENSE Mišo Belica 2013-03-07 16:48:54 +0100
  • d31d804167 Exclude coverage file from repo Mišo Belica 2013-03-07 15:43:56 +0100
  • 231d251536 Added commands test into README Mišo Belica 2013-03-07 15:43:02 +0100
  • 3322681166 Use 'charade' for detecting encoding Mišo Belica 2013-03-07 15:42:18 +0100
  • 544220e9a3 Replaced u"" literal with function 'to_unicode' Mišo Belica 2013-03-07 15:13:15 +0100
  • 915876b675 Added Travis status image to README Mišo Belica 2013-03-07 14:57:14 +0100
  • 8c79d4c04b Set white-list branches for @travisbot Mišo Belica 2013-03-07 14:40:11 +0100
  • 94f6b0a84e Tests pass for both Python v2.7 and v3.3 Mišo Belica 2013-03-07 14:15:10 +0100
  • 912bb50b76 Skip failing test that I don't know how to fix Mišo Belica 2013-03-07 13:22:51 +0100
  • c4dbe24a65 New repository structure Mišo Belica 2013-03-07 13:14:04 +0100
  • 75b3151de9 Update the unittest import to grab unittest2 for 2.6 Richard Harding 2012-12-12 20:37:24 -0500
  • 84f6a079f9 Try to adjust the travis command to test py2.6 Richard Harding 2012-12-12 20:16:08 -0500
  • b18589ced8 Use the right package doh Richard Harding 2012-12-12 20:08:09 -0500
  • 316c550709 Add python 2.6 to the travis ci Richard Harding 2012-12-12 20:00:23 -0500
  • fee5c37b39 Add argparse as a install req for py <2.7 Richard Harding 2012-12-12 19:58:27 -0500
  • 3dea2f349b Update ignore file Richard Harding 2012-10-29 11:00:06 +0100
  • 920094c81a Add a penalty for double quote chars in paragraphs. Nathan Nifong 2012-09-13 15:24:02 -0700
  • a902f29803 Merge bd226ad093 into 60da675da5 nhnifong 2012-09-13 15:25:27 -0700
  • bd226ad093 Added a penalty for double quote chars in paragraphs. They are far more common in random commented code and proprietary metadata that keeps slipping by the filter as actual content. Also downgraded the score value of commas for the same reason Nathan Nifong 2012-09-13 15:24:02 -0700
  • 60da675da5 Reprocess without candidate in case of errors using one Richard Harding 2012-08-27 17:31:14 -0400
  • 3984e04668 Add better handling around xml parsing issues Richard Harding 2012-08-27 15:31:28 -0400
  • fe9364295f prep for 0.1.7 release Richard Harding 2012-07-21 21:37:12 -0400
  • ae355e9f2f Update kwarg for older python Richard Harding 2012-07-21 21:36:03 -0400
  • 6623de15b3 Create gh-pages branch via GitHub gh-pages Rick Harding 2012-07-18 18:05:11 -0700
  • 0de17a7b81 Update readme Richard Harding 2012-06-21 15:55:09 -0400
  • e592f5322e Prep for 0.1.6 Richard Harding 2012-06-17 10:49:13 -0400
  • bf35e3410e Do some link filtering to drop stupid permalinks from the content. Richard Harding 2012-06-17 10:47:11 -0400
  • 9cf19d9970 Prep for 0.1.5 Richard Harding 2012-06-16 21:17:37 -0400
  • ff37f3169f Add checks to links to remove really bad links from the scripting site Richard Harding 2012-06-16 21:16:29 -0400
  • 5157b4570d Prep for the 0.1.4 release Richard Harding 2012-06-16 20:59:49 -0400
  • 5704eb4c15 Start process of adding a newtest script for generating test cases Richard Harding 2012-06-16 07:58:13 -0400
  • 3b00d33ad3 Prep for 0.1.3 release Richard Harding 2012-06-15 21:07:06 -0400
  • c2f935bf51 Remove code we didn't need Richard Harding 2012-06-15 21:03:50 -0400
  • 326fbfe107 Fix the processing and clean up the antipope article Richard Harding 2012-06-15 21:00:03 -0400
  • 3ae64f165e Update and merge Richard Harding 2012-06-15 20:15:37 -0400
  • edca1c74ba Add in test files for antipope blog post Richard Harding 2012-05-28 17:09:23 -0400