Successful return code on encoding error

Bug #1564317 reported by Gert van den Berg
This bug affects 2 people
Affects                Status     Importance  Assigned to  Milestone
One Hundred Papercuts  New        Undecided   Unassigned
html2text (Ubuntu)     Confirmed  Undecided   Unassigned

Bug Description

When running html2text on a file with incorrect encoding (e.g. indicated as us-ascii, but actually UTF-8), it fails with an error message but returns a successful result code (0).

Actual output:
-----------------
$ html2text test.html; echo $?
Input recoding failed due to invalid input sequence. Unconverted part of text follows.
“Test”)
</body>
</html>

0

Expected output: (substitute 1 for any appropriate failure code)
----------------------
$ html2text test.html; echo $?
Input recoding failed due to invalid input sequence. Unconverted part of text follows.
“Test”)
</body>
</html>

1

This seems to have been introduced by 611-recognize-input-encoding.patch.

OS: Ubuntu 14.04 / Ubuntu 16.04
Package: html2text-1.3.2a-17 / html2text-1.3.2a-18
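
For illustration only, here is a minimal sketch of how a recoding failure of this kind could be turned into a non-zero status. It assumes the recoding goes through POSIX iconv, which this report does not confirm; the function name recode_exit_status is hypothetical and is not taken from html2text.C.

```cpp
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <string>

// Hypothetical helper (not from html2text.C): try to recode `input` from the
// charset the document claims to use into UTF-8. Returns 0 on success and 1
// on any recoding failure, so the caller can pass it on as the exit status.
int recode_exit_status(std::string input, const char* declared_charset) {
    iconv_t cd = iconv_open("UTF-8", declared_charset);
    if (cd == (iconv_t)-1) {
        return 1;  // the charset name is not known to iconv
    }

    char* in_ptr = &input[0];
    std::size_t in_left = input.size();
    int status = 0;

    while (in_left > 0) {
        char buffer[256];  // converted bytes are discarded in this sketch
        char* out_ptr = buffer;
        std::size_t out_left = sizeof buffer;

        std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
        if (rc == (std::size_t)-1 && errno != E2BIG) {
            status = 1;  // EILSEQ or EINVAL: invalid or truncated input
            break;
        }
        // E2BIG only means the scratch buffer filled up; loop again.
    }

    iconv_close(cd);
    return status;
}
```

Under these assumptions, recode_exit_status("\xE2\x80\x9CTest\xE2\x80\x9D", "US-ASCII") should return 1, because the bytes of the UTF-8 curly quotes are above 0x7F and so are not valid US-ASCII.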

Tags: patch
Gert van den Berg (mohag1) wrote :
description: updated
Gert van den Berg (mohag1) wrote :

Quick method to test:
sudo apt-get install html2text
curl -s https://launchpadlibrarian.net/250515419/test.html | html2text; echo $?

Gert van den Berg (mohag1) wrote :

The simplest fix is to replace the "continue" in the recode error handlers with "exit(1)" (lines 555 and 595 of the patched html2text.C). Using a distinct exit value for encoding errors might also be useful.

This makes the error handling consistent with other cases, like failing to open a file. (See line 482)

A better solution would be to keep track of whether an error occurred in the loop and then set the final return value based on that. This requires adding a variable declaration and changing lines 482, 510, 555, 582 and 595.

(Line numbers are from the patched Wily version)
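
A minimal sketch of the second approach, written as a standalone function rather than as a change to html2text.C. The names convert_all and Converter are hypothetical stand-ins for the real input loop and per-file conversion, which are not shown in this report.

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for converting one input. It returns false on any
// failure, for example a file that cannot be opened or a recoding error.
using Converter = std::function<bool(const std::string&)>;

// Process every input, remember whether any of them failed, and report that
// through the return value instead of always returning 0.
int convert_all(const std::vector<std::string>& inputs,
                const Converter& convert) {
    int exit_status = 0;  // the added variable: stays 0 only if all succeed

    for (const std::string& name : inputs) {
        if (!convert(name)) {
            std::cerr << "Conversion failed for " << name << '\n';
            exit_status = 1;  // record the failure
            continue;         // still attempt the remaining inputs
        }
    }

    return exit_status;  // main() would return this value
}
```

Compared with calling exit(1) at the first error, this keeps the existing behaviour of processing the remaining inputs and only changes the final status.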

Gert van den Berg (mohag1) wrote :

My attempt at an actual patch.

The other method, changing all the error handling to record that an error has occurred and returning the corresponding code at the end, might still make sense for later.

Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "Patch to modify patches to return an error code on recoding errors" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Gert van den Berg (mohag1) wrote :

Note: This is probably also present in Debian, since it seems like the Ubuntu package is mostly unmodified.

Gert van den Berg (mohag1) wrote :

Also affects 1.3.2a-18 on Ubuntu 16.04.

description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in html2text (Ubuntu):
status: New → Confirmed
Roland Giesler (lifeboy) wrote :

I can confirm the same incorrect response on Ubuntu 14.04.5 LTS with all updates installed.

Roland Giesler (lifeboy) wrote :

I logged this upstream at https://github.com/Alir3z4/html2text/issues/156 as well.

Gert van den Berg (mohag1) wrote :

@Roland: That is the wrong upstream; it is the project behind python-html2text / python3-html2text. This package comes from http://www.mbayer.de/html2text/. That upstream won't have the issue, since it was introduced by Debian's UTF-8 patch, though Debian itself might also be affected.

Roland Giesler (lifeboy) wrote :

Gert, you're right, my bad.

I get the full picture now. Have you reported this to the Debian bugtracker?

Roland Giesler (lifeboy) wrote :

Confirmed on Debian 8:

$ curl -s https://launchpadlibrarian.net/250515419/test.html | html2text; echo $?
Input recoding failed due to invalid input sequence. Unconverted part of text follows.
“Test”)
</body>
</html>

0

Gert van den Berg (mohag1) wrote :

I haven't reported it to Debian, since I did not have a test system to confirm that it is present there as well. (I'm quite sure it would be, as it is the same package source.) Also, the manual reporting process at https://www.debian.org/Bugs/Reporting seems quite scary...
