Comment 6 for bug 505078

Martin Pool (mbp) wrote : Re: [Bug 505078] Re: crashes with "invalid literal for int() with base 16: ''" if linefeeds are missing from the file

Copying from the gnome terminal into gvim lost the linefeeds. I don't
know whose fault it is, and in a sense it doesn't matter because
similar problems will occur in different situations, such as

* pasting data from a failing test run into an email or bug report
* opening the failing run in an editor, trimming it, and saving it
* pasting from a failed run in a terminal, from mail sent by pqm, or
from a failure shown in a web page
* new implementations of the subunit protocol
* running existing implementations connected to a file stream that
does end-of-line conversion

I think supporting these cases is useful. Obviously there is no limit
to how badly text can be mangled, and it would probably be asking too
much for subunit to be robust against programs that wrap the text or
delete whitespace, though both of those will eventually happen.

A format that specifically requires both \n and \r\n at different
points is pretty perverse: it looks like text, but it can't actually be
treated as text. I think the smallest sensible fix would be to define
the whole stream as using Unix line endings (\n), and to give a clean
error if it is not transmitted that way.
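
As a rough illustration of the "clean error" half of that proposal,
here is a minimal sketch assuming the stream is defined as \n-only. The
names check_unix_line_endings and LineEndingError are hypothetical and
not part of subunit; the point is just to reject a \r\n line ending
with a message that says what went wrong, instead of failing later
inside int().

```python
class LineEndingError(ValueError):
    """Hypothetical error: the stream does not use Unix (\\n) line endings."""


def check_unix_line_endings(data: bytes) -> list[bytes]:
    """Split a \\n-terminated stream into lines, rejecting CRLF endings.

    Hypothetical helper for illustration only; not subunit's API.
    """
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece left after a trailing newline
    for number, line in enumerate(lines, start=1):
        if line.endswith(b"\r"):
            raise LineEndingError(
                f"line {number} ends with \\r\\n; "
                "the stream must use Unix (\\n) line endings"
            )
    return lines
```

For example, check_unix_line_endings(b"test: foo\nsuccess: foo\n")
returns the two lines, while the same input with \r\n endings fails
immediately with an error naming line 1.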

> So, the following:
> 3\r\n
> 0\r\n
> 0\r\n
>
> (a single chunk consisting of '0\r\n') if copied and pasted will fail to
> parse even with a more relaxed parse because it would become:
> 3\n
> 0\n
> 0\n
>
> Now, I appreciate that many attachments will not contain \r and so not
> fail in this way - but tabs are also lost by some terminals.
>
> Sadly, the tensions:
>  - relatively easy for humans to read
>  - binary attachment safe
>
> have caused simple copy and paste to stop being as robust. Now, I've
> documented the multipart stuff as experimental, to permit it to be
> changed if needed. We could uuencode the contents of the attachments,
> but that would make them considerably less human inspectable IMO.
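
To make the quoted failure concrete, here is a small decoder sketch.
The framing is an assumption inferred from the quoted example rather
than taken from a spec: a hex length terminated by \r\n, then that many
raw bytes, repeated until a zero-length chunk. decode_chunks and its
relaxed flag are hypothetical names. Strict mode rejects the pasted
form cleanly; relaxed mode (accepting a bare \n after the length) reads
"0\n0" as the 3-byte chunk and then hits an empty length field, which
raises the same kind of int() error as in this bug's title.

```python
def decode_chunks(data: bytes, relaxed: bool = False) -> bytes:
    """Decode hex-length-prefixed chunks (framing assumed from the example).

    Hypothetical helper for illustration; not subunit's implementation.
    With relaxed=True, a bare \\n is accepted after the length field.
    """
    body = b""
    pos = 0
    while True:
        end = data.find(b"\n", pos)
        if end == -1:
            raise ValueError(f"missing line ending after chunk length at offset {pos}")
        field = data[pos:end]
        if field.endswith(b"\r"):
            field = field[:-1]
        elif not relaxed:
            raise ValueError(f"chunk length at offset {pos} is not terminated by CRLF")
        length = int(field, 16)  # ValueError on an empty or non-hex field
        pos = end + 1
        if length == 0:
            return body  # a zero-length chunk ends the stream
        chunk = data[pos:pos + length]
        if len(chunk) != length:
            raise ValueError(f"chunk at offset {pos} is truncated")
        body += chunk
        pos += length
```

decode_chunks(b"3\r\n0\r\n0\r\n") yields the single chunk b"0\r\n";
the pasted b"3\n0\n0\n" fails in both modes, as the quoted text says.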

You're already partly using MIME here, so I think you might as well go
the rest of the way by specifying a content-transfer-encoding for the
attachments. It could be quoted-printable for content that is mostly
text but needs to be sent byte-for-byte, plain text for things like
tracebacks, and base64 for binary data such as tarballs.
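
A sketch of what per-attachment encodings could look like, using only
Python's standard binascii and base64 modules. The helper names and the
choice of "8bit" as the label for plain pass-through text are
assumptions for illustration; subunit does not define these. With
istext=False, quoted-printable escapes CR and LF as =0D and =0A, so the
exact bytes survive end-of-line conversion, and base64 decoding ignores
line-break characters entirely.

```python
import base64
import binascii


def encode_attachment(data: bytes, encoding: str) -> bytes:
    """Apply a MIME-style transfer encoding (hypothetical helper)."""
    if encoding == "8bit":
        # Passed through unchanged: fine when exact line endings don't matter.
        return data
    if encoding == "quoted-printable":
        # istext=False escapes \r and \n, keeping the bytes exact.
        return binascii.b2a_qp(data, istext=False)
    if encoding == "base64":
        # Output is wrapped at 76 characters with \n line breaks.
        return base64.encodebytes(data)
    raise ValueError(f"unsupported transfer encoding: {encoding}")


def decode_attachment(body: bytes, encoding: str) -> bytes:
    """Reverse encode_attachment (hypothetical helper)."""
    if encoding == "8bit":
        return body
    if encoding == "quoted-printable":
        return binascii.a2b_qp(body)
    if encoding == "base64":
        # Non-alphabet characters such as \r and \n are discarded.
        return base64.decodebytes(body)
    raise ValueError(f"unsupported transfer encoding: {encoding}")
```

The attachment from the quoted example, b"0\r\n", round-trips through
both quoted-printable and base64 even if the encoded form has its line
endings converted between \n and \r\n along the way.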

--
Martin <http://launchpad.net/~mbp/>