Support for live unicode character replacement

Bug #646861 reported by Tom "spot" Callaway
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Gwibber | Triaged | Wishlist | Unassigned |

Bug Description

As requested here (https://bugzilla.redhat.com/show_bug.cgi?id=634789), I looked into adding support for replacing common multi-character strings with their single-character Unicode equivalents, to minimize the character count for microblog services that impose a limit (which is pretty much all of them).

Specifically, "..." and "--" can be simplified to "…" and "—".
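
As a rough illustration of the saving, here is a minimal Python sketch of the substitution on a plain string. The two replacement pairs come from this report; the function name `minimize` and the sample message are made up for the example and are not taken from the attached patch.

```python
# Replacement pairs from the report: "..." -> U+2026, "--" -> U+2014.
REPLACEMENTS = [("...", "\u2026"), ("--", "\u2014")]


def minimize(text):
    """Return text with each multi-character sequence swapped for one character."""
    for pattern, single in REPLACEMENTS:
        text = text.replace(pattern, single)
    return text


message = "Wait for it... done -- finally"
shorter = minimize(message)
# Number of characters freed up for the rest of the post.
saved = len(message) - len(shorter)
```

Each "..." saves two characters and each "--" saves one, so the gain depends entirely on how often the sequences appear.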

I hacked together a patch for this, but I admit it is not pretty. It does work for me, but the console doesn't love it, as I get:

found ... trying to replace
/usr/bin/gwibber:73: GtkWarning: Invalid text buffer iterator: either the iterator is uninitialized, or the characters/pixbufs/widgets in the buffer have been modified since the iterator was created.
You must use marks, character numbers, or line numbers to preserve a position across buffer modifications.
You can apply tags and insert marks without invalidating your iterators,
but any mutation that affects 'indexable' buffer contents (contents that can be referred to by character offset)
will invalidate all outstanding iterators
  gtk.main()
/usr/bin/gwibber:73: GtkWarning: _gtk_text_buffer_get_line_log_attrs: assertion `GTK_IS_TEXT_BUFFER (buffer)' failed
  gtk.main()
/usr/bin/gwibber:73: GtkWarning: IA__gtk_text_buffer_remove_tag: assertion `gtk_text_iter_get_buffer (end) == buffer' failed
  gtk.main()

** (gwibber:11855): CRITICAL **: enchant_dict_check: assertion `len' failed
/usr/bin/gwibber:73: GtkWarning: gtk_text_buffer_set_mark: assertion `gtk_text_iter_get_buffer (iter) == buffer' failed
  gtk.main()

I can't seem to find a good example of searching for a string in a textbuf, and replacing it in place whenever found. I did it without using marks or iters, which is why the warnings get thrown. I'm attaching this patch as a starting point, with the hope that someone who understands textbufs better than I do will clean it up a bit so it is no longer such a warning generator.
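
The warning text above says a position only survives a buffer modification if it is kept as a mark, a character number, or a line number. The following is a toolkit-independent sketch of the character-number approach, assuming the buffer can be modeled as a plain Python string; `find_offsets` and `replace_by_offset` are hypothetical names, not GTK calls.

```python
def find_offsets(text, pattern):
    """Return the start offsets of non-overlapping matches, left to right."""
    offsets = []
    start = text.find(pattern)
    while start != -1:
        offsets.append(start)
        start = text.find(pattern, start + len(pattern))
    return offsets


def replace_by_offset(text, pattern, replacement):
    """Replace every match, working from the last offset back to the first."""
    for start in reversed(find_offsets(text, pattern)):
        # Editing at `start` only shifts text after it, so the offsets
        # still waiting to be processed (all smaller) stay valid.
        text = text[:start] + replacement + text[start + len(pattern):]
    return text
```

In a real text buffer the same idea would presumably mean storing offsets (or marks) and asking the buffer for a fresh iterator from each one just before every edit, rather than holding iterators across edits.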

Tom "spot" Callaway (tcallawa) wrote :
Toshio Kuratomi (toshio) wrote :

This patch works with the text added event rather than the changed event, so it should fire closer to when the insertion actually happens. It uses iterator methods so that we don't get the warnings listed in the last comment. It handles both adding characters one at a time and pasting a string that contains multiple instances of the replaceable sequences.
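
The patch itself is attached rather than shown here, so the following is only a sketch of the idea in plain Python, with the buffer modeled as a string and positions as character offsets (both assumptions). The triples mirror the `self.replacements` line quoted later in this thread; reading the first element as a "trigger" character is a guess, and `on_text_inserted` is a hypothetical name.

```python
# (trigger character, pattern, single-character replacement)
REPLACEMENTS = [(".", "...", "\u2026"), ("-", "--", "\u2014")]


def on_text_inserted(text, pos, inserted):
    """Insert `inserted` at offset `pos`, then collapse any pattern it completed.

    Returns the new buffer contents.
    """
    text = text[:pos] + inserted + text[pos:]
    # Only patterns whose trigger character was just inserted can be new.
    active = [(p, s) for trigger, p, s in REPLACEMENTS if trigger in inserted]
    if not active:
        return text
    # Rescan the inserted span plus enough context on each side to catch
    # a pattern that straddles the edge of the insertion.
    reach = max(len(p) for p, _ in active) - 1
    lo = max(0, pos - reach)
    hi = min(len(text), pos + len(inserted) + reach)
    window = text[lo:hi]
    for pattern, single in active:
        window = window.replace(pattern, single)
    return text[:lo] + window + text[hi:]


# Typing one character at a time at the end of the buffer.
typed = ""
for ch in "Hi...":
    typed = on_text_inserted(typed, len(typed), ch)

# Pasting a string that contains several sequences at once.
pasted = on_text_inserted("ab", 1, "x...y--z")
```

Limiting the rescan to a window around the insertion keeps the work per keystroke small and leaves untouched any text the user has already left as-is elsewhere in the buffer.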

Tom "spot" Callaway (tcallawa) wrote :

Very nice Toshio!

Bilal Shahid (s9iper1)
Changed in gwibber:
importance: Undecided → Wishlist
Bilal Shahid (s9iper1) wrote :

Join us in #gwibber on Freenode and we can try to guide you through it.

Bilal Shahid (s9iper1) wrote :

"Thanks for your patch. Unfortunately, our busy developers haven't been
able to review your patch in a timely manner. The gwibber codebase has
seen significant change and it is likely this patch no longer applies.
Please review it again and if it is still applicable, update it to work
with the latest gwibber trunk. We will be doing a patch review day in
the next few weeks and would like to review your patch. Thanks again for
your contribution!"

Changed in gwibber:
status: New → Incomplete
tags: added: patch-day-old
dobey (dobey) wrote :

This patch no longer applies, as the UI has been rewritten in Vala. However, the patch also looks wrong: its "unicode" characters seem to be encoded in ISO-8859-15, perhaps, rather than as actual UTF-8 characters. The same functionality might be useful in the Vala version, but a new patch would need to be written.

tags: removed: patch-day-old
Changed in gwibber:
status: Incomplete → Triaged
Toshio Kuratomi (toshio) wrote :

Actually -- that's incorrect. The file is encoded in UTF-8. It's probably just that your browser is displaying it in a different encoding... Maybe Launchpad is specifying a different default content encoding?

$ wget https://launchpadlibrarian.net/56422721/gwibber-867bzr-minimize-chars.patch
$ file gwibber-867bzr-minimize-chars.patch
gwibber-867bzr-minimize-chars.patch: unified diff output, UTF-8 Unicode text
$ LC_ALL=en_US.utf8 cat gwibber-867bzr-minimize-chars.patch|grep '\.\.\.'
+ self.replacements = [(".", "...", "…"), ("-", "--", "—")]

I'm afraid I don't know Vala and I'm no longer using GNOME/GTK for coding, so I don't know if I can help from here.
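
For anyone wanting to see how correct UTF-8 can look like the wrong encoding, here is a small Python sketch. It is an illustration only and not a diagnosis of what Launchpad or any particular browser actually did; the variable names are arbitrary.

```python
ELLIPSIS = "\u2026"  # the single-character ellipsis used in the patch

# The bytes as they would sit in a UTF-8 encoded file.
raw = ELLIPSIS.encode("utf-8")

# Decoded with the right codec, the character comes back intact.
as_utf8 = raw.decode("utf-8")

# Decoded as ISO-8859-15, every byte becomes its own character,
# so one ellipsis turns into three unrelated characters.
as_latin9 = raw.decode("iso8859_15")
```

The file on disk is identical in both cases; only the decoder's assumption differs, which is consistent with `file` reporting UTF-8 while a viewer shows garbage.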
