Strip HTML and apply a heuristic to detect faux extensions (e.g., a suffix longer than 10 characters probably isn't a real extension). Fixes #5049.
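
A minimal sketch of the heuristic described above, in Python for illustration only. The names `strip_html`, `split_extension`, and `MAX_EXT_LEN`, the regex-based tag stripping, and the alphanumeric check are assumptions made for this example, not the code from this change.

```python
import re
from typing import Tuple

# Assumed cutoff, taken from the "more than 10 characters" rule of thumb.
MAX_EXT_LEN = 10


def strip_html(text: str) -> str:
    """Remove anything that looks like an HTML tag (naive regex, illustration only)."""
    return re.sub(r"<[^>]*>", "", text)


def split_extension(name: str) -> Tuple[str, str]:
    """Return (base, ext), treating the suffix as an extension only if it is plausible."""
    clean = strip_html(name).strip()
    base, dot, ext = clean.rpartition(".")
    # No dot, nothing before the dot, an empty or over-long suffix, or a suffix
    # with non-alphanumeric characters: assume it is not a real extension.
    if not dot or not base or not ext or len(ext) > MAX_EXT_LEN or not ext.isalnum():
        return clean, ""
    return base, ext
```

With these assumptions, `split_extension("<b>report</b>.pdf")` separates `pdf` from `report`, while a title such as `"Version 2.0 of the final draft"` is returned whole because the text after the last dot is too long to be an extension.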